Making Machines Learn


At the core of any adaptive artificial intelligence technology is the idea of learning. A system that doesn’t learn is pre-programmed – the “correct” solution is built into the program at launch, and the system’s task is to navigate through a series of conditions and caveats to determine which of the pre-calculated decisions best fits its current situation. A large percentage of AI engines work that way. A “fixed” system like that certainly has advantages:

  1. The outcomes are “managed” and under control.
  2. The programmers can better debug and maintain the source code.
  3. The design teams can help push any action in the desired direction to move the story along.
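
As a rough illustration of what such a “fixed” system looks like, here is a minimal sketch of a pre-programmed bot. The function name, the inputs and the thresholds are all hypothetical, chosen only to show how every decision is calculated in advance by the programmer.

```python
def fixed_bot_decision(health, has_line_of_sight, distance_to_player):
    """Walk a hard-coded series of conditions and return a pre-calculated decision.

    Every threshold below is a hypothetical value picked by the programmer;
    nothing here changes while the game is being played.
    """
    if health < 20:
        return "retreat"
    if has_line_of_sight and distance_to_player < 30:
        return "shoot"
    if has_line_of_sight:
        return "advance"
    return "patrol"


# The same situation always produces the same decision.
print(fixed_bot_decision(health=80, has_line_of_sight=True, distance_to_player=10))
```

Because the same inputs always lead to the same branch, this kind of code is easy to debug and easy to steer, which is exactly why it is so common.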

These advantages, especially the second one, have traditionally tipped the scale towards pre-programmed decision-tree systems. The people responsible for creating AI in games are programmers, and programmers like to be able to predict what happens at any given moment. Since designers’ guidance on artificial intelligence very often amounts to “make them not too dumb”, it’s no surprise that programmers choose systems they can maintain.

But these advantages come with downsides:

  1. New situations, often created by human players who find circumstances overlooked during development, aren’t handled.
  2. Decision patterns can be deduced and “reverse-engineered” by astute players.

Often, development teams work around these disadvantages by giving these “dumb” AI opponents superior force, agility, hit points, etc. to level the playing field with the player. An enemy bot can, for instance, always hit you between the eyes with its gun as soon as it has a line of sight. The balance needed to create an interesting play experience is difficult to achieve, and pleasing both novice and experienced players is almost impossible. Usually, once players become more expert in a game, playing against the AI no longer offers an interesting experience and they look for human opponents online.

But what if the system could reproduce the learning patterns of the human player: starting out inexperienced and being taught, through its own actions, how to play better? After all, playing a game means reproducing simple patterns in an increasingly complex set of situations, something computers are made to do. What would it mean to make the system learn how to play better while it’s playing?

To answer that question, we need to look outside the field of computer software and go into psychology and biology – what does it mean to learn, and how does that process shape our expertise in playing a game? How come two different players, playing the same game, will build two completely different styles of play (a question we’re addressing in our next article about personalities and emotions)?

Let’s look at three different ways of learning and see how machines could use them. 

• The first kind is learning through action: the stove top is turned on, you stick a finger on it, it burns, and you have just learned the hard way not to touch the stove. This (Pavlovian?) way of learning is a simple example of action/reaction. Looking at the consequence of the action, the effects are so negative and severe that they outweigh the expected positive stimulus (taking food now). Teaching computers to learn through this process is not that difficult – you need to weigh the real consequences of an action and compare them with the expected, or ideal, consequences. The worse the real effects are, the more strongly you learn not to repeat that specific action.

• The second kind builds upon the first one: learning through observation. You see the stove top, and you can see the water boiling. You deduce that there is a heat source underneath and that putting your finger there wouldn’t be wise. This means that you can predict the consequences of an action without having to experience it yourself. A computer doing the same would, of course, need basic information on the reality of the world – it needs to know what a heat source is, and its possible side effects. Even if it has never experienced direct harm, it’s possible to have it “know” the effects nonetheless. This is achieved through what we call the common core knowledge, which will be the topic of an upcoming article. Basically, we know that the stove burns because some people got burned before us. They learned through action, the effects on them were severe (maybe fatal), and society as a whole learned from the mistakes of these people. The common core (or “Borg” as we call it) is designed to reproduce that.

• The third kind, the most interesting for gamers, is learning through planning. Again, it builds on its predecessor. If putting you in contact with the stove top inflicts significant, possibly fatal damage, then it’s possible to use that information on others. On a nemesis, for instance (by essentially following the same reasoning as above, but with different measurements of what counts as a positive or a negative outcome). I don’t want to burn myself, but I might want to burn someone else. Again, I’ve never burned myself, and I haven’t used the stove top ever in my life, but I have general knowledge of its use and possible side effects, and I’m using that to project a plan forward in time during a fight. If I push my opponent onto the stove top now, it should bring him pain, and this brings me closer to my goal of winning the fight.
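
The first kind of learning, weighing real consequences against expected ones, can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `ActionLearner`, the learning rate and the outcome values are hypothetical and are not taken from any actual engine.

```python
from collections import defaultdict


class ActionLearner:
    """Builds an aversion to actions whose real outcome falls short of the expected one."""

    def __init__(self, learning_rate=0.5):
        # Hypothetical tuning value: how strongly one bad experience counts.
        self.learning_rate = learning_rate
        # Learned penalty per action; 0.0 means "no bad experience so far".
        self.aversion = defaultdict(float)

    def score(self, action, expected_outcome):
        """How attractive an action looks: expected payoff minus learned aversion."""
        return expected_outcome - self.aversion[action]

    def experience(self, action, expected_outcome, real_outcome):
        """The worse reality was compared to the expectation, the larger the penalty."""
        shortfall = max(0.0, expected_outcome - real_outcome)
        self.aversion[action] += self.learning_rate * shortfall


learner = ActionLearner()
# Hypothetical numbers: taking food off the stove is expected to be worth +1,
# but the burn makes the real outcome -10.
before = learner.score("touch_stove", expected_outcome=1.0)
learner.experience("touch_stove", expected_outcome=1.0, real_outcome=-10.0)
after = learner.score("touch_stove", expected_outcome=1.0)
print(before, after)
```

With these made-up numbers, the action looks attractive before the burn and clearly unattractive after it, and a more severe burn would push the score down further.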

These three types of learning get progressively more complex to translate into computer terms, yet they rest on simple, binary ways of thinking. Breaking down information and action into simple elements enables computers to comprehend them and work with them. This creates a very different challenge for game designers and programmers – instead of scripting behaviors in an area, they need to teach the system about the rules of the world around it, and then let the system “understand” how best to use them. The major drawback is that we give up the first big advantage of fixed systems listed above, namely control over the behavior of the entities. If the system is poorly constructed, and the rules of the physical world aren’t translated properly to the system, then the entities will behave chaotically (the garbage-in, garbage-out rule).
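
To make the idea of teaching rules instead of scripting behaviors a little more concrete, here is a minimal sketch. It assumes the rules of the world can be written down as plain data and that each entity simply picks the action whose predicted effect suits it best. The rule table, the action names and the numbers are all hypothetical.

```python
# Shared knowledge about the world: what each action does, and to whom.
# Every entry here is a made-up example, not real game data.
WORLD_RULES = {
    "push_onto_stove": {"target": "opponent", "health_change": -8},
    "touch_stove": {"target": "self", "health_change": -8},
    "eat_food": {"target": "self", "health_change": 2},
    "wait": {"target": "self", "health_change": 0},
}


def value(effect):
    """Damage to myself is bad; the same damage to my opponent is good."""
    sign = 1 if effect["target"] == "self" else -1
    return sign * effect["health_change"]


def plan(available_actions, rules=WORLD_RULES):
    """Pick the action with the best predicted effect, without ever having tried it."""
    return max(available_actions, key=lambda action: value(rules[action]))


# Alone in the kitchen, the entity leaves the stove alone.
print(plan(["touch_stove", "wait"]))
# In a fight, the same knowledge about the stove becomes a weapon.
print(plan(["touch_stove", "wait", "push_onto_stove"]))

# Garbage in, garbage out: a mistranslated rule claiming the stove heals.
bad_rules = dict(WORLD_RULES, touch_stove={"target": "self", "health_change": 8})
print(plan(["touch_stove", "wait"], bad_rules))
```

Nothing in `plan` mentions stoves or fights. The behavior comes entirely from the rules, which is why a single mistranslated rule is enough to make the entity happily burn itself.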

Building and training the systems to go through the various ways of learning is the main challenge of a technology like the EHE, but we believe the final outcome is well worth the effort. 

In the next article, I will expand on what I believe are the roles and effects of emotions and personality on learning and decision making, and explain further the concept of common core knowledge.

Thank you for reading and I look forward to hearing your thoughts on the matter.
