The Demo Problem: 6.5-minute video

This short video introduces the problem of training an agent to program an execution unit that moves a piece on an infinite board whose rules keep changing.

  • The learning agent communicates with the environment through an interface
  • The environment contains an infinite board of square fields with a piece on it
  • The board rules allow the piece to be shifted only by a certain number of squares horizontally and vertically
  • The board rules change regularly
  • There is an execution unit in the environment
  • The execution unit may be programmed to move the piece
  • The problem is to train the agent to program and launch the execution unit each time the board rules change
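
The setup listed above can be sketched in a few lines of Python. This is only an illustrative model, not the implementation shown in the video: the names `Board`, `ExecutionUnit`, `step_x`, `step_y`, `load` and `launch` are hypothetical, and the rule format (one fixed lateral step and one fixed vertical step) is an assumption about what "a certain number of squares" means.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Move = Tuple[int, int]  # (lateral shift, vertical shift), in squares


@dataclass
class Board:
    """Infinite board: the piece position is an unbounded pair of integers."""
    step_x: int  # assumed rule: allowed lateral shift
    step_y: int  # assumed rule: allowed vertical shift
    piece: Tuple[int, int] = (0, 0)

    def is_legal(self, move: Move) -> bool:
        # Assumption: a legal move is purely lateral or purely vertical
        # and covers exactly the number of squares the current rules allow.
        dx, dy = move
        return (abs(dx), abs(dy)) in {(self.step_x, 0), (0, self.step_y)}

    def change_rules(self, step_x: int, step_y: int) -> None:
        self.step_x, self.step_y = step_x, step_y


@dataclass
class ExecutionUnit:
    program: List[Move] = field(default_factory=list)

    def load(self, program: List[Move]) -> None:
        self.program = list(program)

    def launch(self, board: Board) -> int:
        """Run the program, stopping at the first illegal move.

        Returns the number of moves actually applied.
        """
        applied = 0
        for move in self.program:
            if not board.is_legal(move):
                break
            x, y = board.piece
            board.piece = (x + move[0], y + move[1])
            applied += 1
        return applied


board = Board(step_x=2, step_y=3)
unit = ExecutionUnit()
unit.load([(2, 0), (0, 3)])
unit.launch(board)        # piece moves to (2, 3)
board.change_rules(1, 1)  # the old program is no longer legal
unit.launch(board)        # applies 0 moves: the unit must be reprogrammed
```

In this sketch a rule change invalidates the loaded program, which is why the agent has to program and launch the execution unit again after every change.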

The video below describes the communication cycle and the alphabet used to train the reinforcement learning agent.