by Andrew Cochran

In a virtual gamespace populated by algorithmic avatars, the characters started displaying complex behaviours after 25 million games. The researchers stopped them at 500 million.

The avatars’ play was another step toward machine-derived ‘thinking’ that can be applied to many tasks in many areas.

The algorithms evolved on their own over time

Sets of algorithms, portrayed visually as ‘agents,’ were put in a game environment and given simple rules and tools.

‘Hiders’ developed better concealment strategies over time, while their opponents developed more elaborate ways of ‘seeking.’ Eventually, one hid behind barriers it had fortified, only to be discovered by its adversary.

The ‘seeker’ had climbed a box to gain height and see over the wall of the barrier.

Image from OpenAI

The experiment provided clues to help achieve life-like intelligence

California-based OpenAI did the hide-and-seek experiments. OpenAI is one of the principal labs pursuing artificial general intelligence, often called AGI, the branch of AI research that aims to have machines achieve human-level abilities across many kinds of tasks.

For this experiment, the OpenAI research team combined two techniques:

  • Multi-agent games – simulated environments where algorithms interact with each other using defined rules of engagement. Algorithmic behaviour is displayed by figures known as ‘agents.’ Games are ideal simulation environments because ‘success’ is clear, as are conditions for how it can and can’t be achieved.
  • Reinforcement learning – a form of machine learning where an AI model ‘learns’ from examples created by its own experience. The model measures whether each outcome brings it closer to or farther from its goal, then self-adjusts accordingly. Reinforcement learning can be thought of as ‘trial and error’ or ‘learning from mistakes,’ not unlike how humans acquire knowledge in many situations.

Put together, the agents ‘learned’ by adjusting their tactics in response to the actions of the others. The agents were given the rules of the game and access to a few elements in the environment: some rectangular objects, some boxes, and some wedge-shaped ramps. Beyond these basics, they were left to operate independently.
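The dynamic described above can be illustrated with a toy sketch. This is not OpenAI’s code (their experiments used large-scale deep reinforcement learning in a 3-D physics simulation); it is a deliberately minimal, assumed setup: two tabular agents repeatedly play a three-spot hide-and-seek game, each adjusting its estimate of which spot works best based only on the rewards from its own experience. Because each agent’s best choice depends on the other’s behaviour, they co-adapt.

```python
import random

class Agent:
    """Tabular epsilon-greedy learner over a fixed set of actions (hiding spots)."""
    def __init__(self, n_actions, epsilon=0.1, lr=0.1):
        self.values = [0.0] * n_actions   # estimated reward for each spot
        self.epsilon = epsilon            # how often to explore a random spot
        self.lr = lr                      # how strongly each outcome shifts the estimate

    def act(self):
        # Mostly exploit the best-known spot, but occasionally explore.
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Trial and error: nudge the estimate toward the observed outcome.
        self.values[action] += self.lr * (reward - self.values[action])

def play(hider, seeker, rounds=10000):
    """Run repeated games; return the fraction of rounds the seeker found the hider."""
    found = 0
    for _ in range(rounds):
        h, s = hider.act(), seeker.act()
        caught = (h == s)
        found += caught
        # Opposed rewards: the seeker wins when the spots match, the hider when they don't.
        seeker.learn(s, 1.0 if caught else 0.0)
        hider.learn(h, 0.0 if caught else 1.0)
    return found / rounds

random.seed(0)
hider, seeker = Agent(n_actions=3), Agent(n_actions=3)
rate = play(hider, seeker)
```

If the hider settles on a favourite spot, the seeker’s value estimates shift toward it, which in turn pushes the hider elsewhere: a miniature version of the co-adaptive arms race the researchers observed.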

The agents evolved through six distinct strategies

Images from OpenAI
  • (A) early rounds – seekers learn to chase hiders around the environment
  • (B) after 25 million games – hiders learn how to build forts
  • (C) after 75 million games – seekers learn to jump the walls of the forts using ramps
  • (D) after 115 million games – hiders learn to immobilize the ramps so they can’t be relocated to scale walls
  • (E) after 388 million games – seekers learn to use an immobilized ramp to scale a box and ‘box surf’ around the environment, gaining a higher vantage point
  • (F) after 458 million games – hiders learn how to immobilize all the pieces so seekers cannot engage in ‘box surfing’

Like the age-old game, hiders were given a head start. The researchers began with a closed environment bounded by four walls, then tried an open environment. Some rounds auto-generated random barriers, so the game environment varied from round to round. The research team found the agents exploited any opportunity, including imperfections in the simulation.

The experiment was conducted as a proof of concept. There was no mention of how long the 500 million iterations took. As well as revealing the emergent behaviours, the experiment demonstrates how high-performance computing opens new avenues of AI advancement.

OK, but how does this apply to journalism?

OpenAI’s work likely won’t be felt in journalism for a long time. Yet it does suggest future possibilities for story-seeking algorithms. And conceptually, that’s not far from human behaviour: self-adjusting after hitting blind alleys is how reporters pursue new developments, too.

‘The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.’

OpenAI statement describing their findings