Update #4 – Joys of Reinforcement Learning

The premise of our game is to knock the other players off a platform without falling off yourself. Matches are either 1v1v1v1 or 2v2.

I had never trained an AI in a game context using reinforcement learning before, so I’ll outline the major iterations we went through to get a usable AI. We used ML-Agents, a well-documented reinforcement learning package from Unity that makes advanced AI techniques feasible for a small team like JustKruisin.

The accompanying video shows the very first AI I trained.

For those who are new to the AI training space, there are three key levers you can tweak to get a smarter AI:

  • Training Hyperparameters: how the learning algorithm itself is configured
  • Reward Systems: what behavior the agent is rewarded or punished for
  • Observations: what the agent can see of the game state

We tweaked all three, although I think hyperparameters and reward systems are the most difficult to build an intuition for. Our TrainingScene runs multiple stages in parallel, which speeds up training time. Below, I’ve listed the experiments we ran, starting from the beginning.
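
To make those three levers concrete, here is a minimal sketch of the shape of an ML-Agents agent script. The class name, field names, and force constant are my own illustrations rather than our actual code; the hyperparameters live in a separate YAML config handed to the trainer, while observations, actions, and rewards are written in C#:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Illustrative skeleton only, not our production agent.
public class SumoAgent : Agent
{
    Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    // Observations: what the policy sees each step.
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition); // 3 floats
        sensor.AddObservation(rb.velocity);             // 3 floats
        // The Ray Perception Sensor is attached as a component in the
        // Inspector, so it feeds the policy automatically.
    }

    // Actions: how the policy drives the agent. Rewards (AddReward /
    // SetReward) are sprinkled through the gameplay code; see the
    // per-experiment sketches below.
    public override void OnActionReceived(ActionBuffers actions)
    {
        var move = new Vector3(actions.ContinuousActions[0], 0f,
                               actions.ContinuousActions[1]);
        rb.AddForce(move * 10f); // force scale is a placeholder
    }
}
```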

Experiment 1

  • Training Hyperparameters
    • Copied from Unity’s training configs for AgentSoccer
  • Reward Systems
    • +0.2 reward for hitting another player
    • +1 reward for winning (last one on the platform), -1 reward for losing (falling off the platform); see the sketch after this list
  • Observations
    • Agent’s position and speed
    • Ray Perception Sensor with 40 rays surrounding Agent
  • Result
    • Players go to the center of the platform and stay at low speed
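
Here is a minimal sketch of how Experiment 1’s rewards might be wired up inside the agent above; the "Player" tag and the end-of-round hook are assumptions on my part:

```csharp
// Experiment 1 (sketch): a small shaping reward for hits, plus a
// terminal +1/-1 for winning or falling off.
void OnCollisionEnter(Collision collision)
{
    if (collision.gameObject.CompareTag("Player"))
    {
        AddReward(0.2f); // any hit on another player
    }
}

// Called by a (hypothetical) round manager when the round ends.
public void OnRoundEnded(bool won)
{
    SetReward(won ? 1f : -1f); // +1 last one standing, -1 fell off
    EndEpisode();
}
```

With only a flat hit bonus and terminal rewards, "creep to the middle and stand still" is a low-risk policy, which matches the behavior we saw.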

Experiment 2 (delta)

  • Training Hyperparameters
    • No change
  • Reward Systems
    • +0.1 reward for hitting another player, plus 0.13 * the player’s own speed (see the sketch below)
  • Observations
    • No change
  • Result
    • Players moved at low-to-medium speed and hit each other at glancing, near-parallel angles instead of head-on
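
The Experiment 2 change, sketched with the same hypothetical collision hook, pays out based on the agent’s own speed at the moment of the hit:

```csharp
// Experiment 2 (sketch): scale the hit reward by our own speed so
// the agent is paid more for hitting while moving fast.
void OnCollisionEnter(Collision collision)
{
    if (collision.gameObject.CompareTag("Player"))
    {
        AddReward(0.1f + 0.13f * rb.velocity.magnitude);
    }
}
```

Note that this rewards the hitter’s speed rather than the impact itself, which presumably explains the fast but glancing, near-parallel hits.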

Experiment 3 (delta)

  • Training Hyperparameters
    • No change
  • Reward Systems
    • Added a small reward for moving at speed >= 10 and a small penalty for moving at speed < 10
    • Changed the collision reward to be based on impact velocity instead of the player’s own velocity, to incentivize direct head-on collisions over tangential ones (see the sketch below)
  • Observations
    • No change
  • Result
    • Agents acted in a more “exciting” way, hitting each other head-on, but they reached a steady state of colliding at high speed without anyone falling off the platform
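
A sketch of both Experiment 3 changes; the speed threshold of 10 is from our notes, but the shaping and impact coefficients here are made-up placeholders:

```csharp
// Experiment 3 (sketch): pay for impact velocity rather than the
// agent's own velocity, and shape movement speed around a threshold.
const float SpeedThreshold = 10f;
const float SpeedShaping = 0.001f; // "small reward", value assumed
const float ImpactScale = 0.01f;   // value assumed

void OnCollisionEnter(Collision collision)
{
    if (collision.gameObject.CompareTag("Player"))
    {
        // relativeVelocity is large for head-on hits and small for
        // glancing, near-parallel ones.
        AddReward(ImpactScale * collision.relativeVelocity.magnitude);
    }
}

void FixedUpdate()
{
    // Per-step shaping: reward fast movement, penalize slow movement.
    AddReward(rb.velocity.magnitude >= SpeedThreshold
        ? SpeedShaping
        : -SpeedShaping);
}
```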

Experiment 4 (delta)

  • Training Hyperparameters
    • team_change from 100k -> 300k, swap_steps from 2k -> 20k (see the config sketch below)
  • Reward Systems
    • Added Draw reward of 0
  • Observations
    • No change
  • Result
    • The balls moved slowly near the center; the agents seemed to optimize for a draw
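
team_change and swap_steps are self-play settings in the trainer’s YAML config rather than in the agent code. Roughly where Experiment 4’s values live; the behavior name and every value other than team_change and swap_steps are placeholders:

```yaml
behaviors:
  SumoAgent:
    trainer_type: poca          # placeholder trainer choice
    # ...trainer hyperparameters omitted...
    self_play:
      save_steps: 50000
      team_change: 300000       # was 100000
      swap_steps: 20000         # was 2000
      window: 10
      play_against_latest_model_ratio: 0.5
```

Roughly speaking, raising team_change lets each team learn for longer before the learning team switches, and raising swap_steps swaps in opponent snapshots less often.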

Experiment 5 (delta)

  • Training Hyperparameters
    • No change
  • Reward Systems
    • Set the draw reward to -0.7 (see the sketch below)
  • Observations
    • No change
  • Result
    • Hit interactions were better than in Experiment 4, but the agents still made no aggressive moves
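
The draw penalty is a one-line change in the same hypothetical end-of-round hook:

```csharp
// Experiment 5 (sketch): make a draw clearly worse than a win, so
// "wait it out near the middle" stops being the safest policy.
public void OnDraw()
{
    SetReward(-0.7f);
    EndEpisode();
}
```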

Experiment 6 (delta)

  • Training Hyperparameters
    • No change
  • Reward Systems
    • Doubled the negative reward for moving at speed < 10, and applied a negative reward otherwise
    • Capped the cumulative reward per episode at 1.0, as recommended by the Unity docs (see the sketch below)
  • Observations
    • No change
  • Result
    • A clear regression in the AI: all the balls went toward the center and moved at slow speed
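
One way the cap could be implemented, reusing SpeedThreshold and SpeedShaping from the Experiment 3 sketch. The capping helper is my own illustration (GetCumulativeReward() is part of the ML-Agents Agent API), and I’m reading the first bullet as "always penalize, twice as hard when slow":

```csharp
// Experiment 6 (sketch): cap the cumulative per-episode reward at 1.0.
void AddRewardCapped(float reward)
{
    float headroom = 1f - GetCumulativeReward();
    // Positive rewards are clipped to the remaining headroom;
    // negative rewards always pass through.
    AddReward(Mathf.Min(reward, Mathf.Max(headroom, 0f)));
}

void FixedUpdate()
{
    // Penalize slow movement twice as hard as before; a smaller
    // penalty now applies even above the threshold.
    AddRewardCapped(rb.velocity.magnitude < SpeedThreshold
        ? -2f * SpeedShaping
        : -SpeedShaping);
}
```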

Conclusion

Reinforcement learning requires us to be patient as we search for the ideal AI: every experiment takes at least 8-10 hours before we can see whether the AI is even potentially useful, which makes testing hypotheses much slower and more difficult.

With that said, while we are still tweaking various inputs to the AI system, reinforcement learning using ML-Agents seems to be an interesting way to build smart AI for games.

If you are optimizing for development speed, however, I would probably not go this route. Since we wanted to release ASAP, we made the decision to go back to good ol’ heuristic-based AI with state machines, and we hope to come back and refine this reinforcement-learned AI in the future.
