Researchers at OpenAI in California found their algorithms started displaying complex behaviours after 25 million games. They stopped at 500 million.
The algorithms evolved on their own over time
The proof is in the results. Sets of algorithms, portrayed visually as ‘agents,’ were placed in a game environment and given simple rules and tools. Over time, ‘hiders’ developed better and better concealment strategies, while their opponents developed more elaborate ways of ‘seeking.’ Eventually the hiders took to barricading themselves behind barriers they had fortified, only to be found out by an adversary that climbed a box to gain height and see over the barrier’s wall.
The experiment provided clues to help achieve life-like intelligence
OpenAI is one of the principal labs pursuing AGI, or artificial general intelligence, the branch of AI research that aims to have machines achieve human-level abilities across a broad range of tasks. For this experiment the OpenAI research team combined two techniques:
- Multi-agent games – simulated environments where algorithms interact with one another under defined rules of engagement. Algorithmic behaviour is displayed by figures known as ‘agents.’ Games are ideal simulation environments because ‘success’ is clearly defined, as are the conditions under which it can and cannot be achieved.
- Reinforcement learning – a form of machine learning where an AI model ‘learns’ from examples created by its own experience. The model measures each outcome as being either closer to, or farther from, its goal, then self-adjusts accordingly. Reinforcement learning can be thought of as ‘trial and error’ or ‘learning from one’s mistakes,’ not unlike how humans acquire much of their knowledge.
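The trial-and-error loop at the heart of reinforcement learning can be sketched with a minimal example: an epsilon-greedy agent repeatedly picks one of three slot-machine arms, observes a win or a loss, and nudges its estimate of each arm toward what it has experienced. Everything here – the payout probabilities, the parameters, the function name `run_bandit` – is hypothetical and chosen for illustration; it is not drawn from OpenAI’s experiment.

```python
import random

def run_bandit(steps=5000, epsilon=0.1, seed=0):
    """Trial-and-error learning on a 3-armed bandit (epsilon-greedy)."""
    rng = random.Random(seed)
    true_payouts = [0.2, 0.5, 0.8]   # hidden from the agent
    estimates = [0.0, 0.0, 0.0]      # the agent's learned value of each arm
    counts = [0, 0, 0]

    for _ in range(steps):
        # Explore a random arm occasionally; otherwise exploit the best guess.
        if rng.random() < epsilon:
            arm = rng.randrange(3)
        else:
            arm = max(range(3), key=lambda a: estimates[a])
        # The outcome is a win (1.0) or a loss (0.0)...
        reward = 1.0 if rng.random() < true_payouts[arm] else 0.0
        counts[arm] += 1
        # ...and the estimate is nudged toward the observed outcome:
        # the agent measures the result and self-adjusts accordingly.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates
```

Run long enough, the agent settles on the highest-paying arm without ever being told the payout odds – the essence of learning from its own experience.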
Put together, the agents ‘learned’ by adjusting their tactics in response to the actions of the others. The agents were given the rules of the game and access to a few elements in the environment: some rectangular objects, some boxes, and some wedges. Beyond these basics, they were left to operate on their own.
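The mutual adjustment described above can be sketched as two agents playing matching pennies, each best-responding to the other’s observed behaviour – a simple form of ‘fictitious play.’ This is a toy stand-in for multi-agent co-adaptation, not OpenAI’s actual training setup; the agent names and rules are assumptions made for illustration.

```python
def co_adapt(rounds=200):
    """Two agents co-adapting in matching pennies.

    The 'matcher' (agent 0) wins by playing the same action as its
    opponent; the 'mismatcher' (agent 1) wins by playing a different
    one. Each round, each agent best-responds to the fraction of 1s
    it has seen its opponent play so far.
    """
    ones_seen = [0, 0]   # ones_seen[i]: how often agent i's opponent played 1
    history = ([], [])
    for t in range(1, rounds + 1):
        a0 = 1 if ones_seen[0] / t >= 0.5 else 0   # matcher copies the majority
        a1 = 0 if ones_seen[1] / t >= 0.5 else 1   # mismatcher plays the minority
        ones_seen[0] += a1
        ones_seen[1] += a0
        history[0].append(a0)
        history[1].append(a1)
    return history
```

Neither agent can settle on a fixed tactic: each shift by one side creates pressure on the other to adapt, producing the kind of escalating back-and-forth seen in the hide-and-seek phases below.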
The agents evolved through six distinct strategies
- (A) early rounds – seekers learn to chase hiders around the environment
- (B) after 25 million games – hiders learn how to build forts
- (C) after 75 million games – seekers learn to jump the walls of the forts using ramps
- (D) after 115 million games – hiders learn to immobilize the ramps so they can’t be relocated to scale walls
- (E) after 388 million games – seekers learn to use an immobilized ramp to climb onto a box and ‘box surf’ around the environment, gaining a higher vantage point
- (F) after 458 million games – hiders learn to immobilize all the pieces so seekers cannot engage in ‘box surfing’
As in the age-old game, hiders were given a head start. The researchers began with a closed environment bounded by four walls, then tried an open one. Parts of the environment were auto-generated with random barriers, so the playing field varied from game to game. The research team found the agents exploited any opportunity available, including imperfections in the simulation itself.
The experiment was conducted as a proof of concept. There was no mention of how long the 500 million iterations took. Besides revealing the emergent behaviours, the experiment is another demonstration of how high-performance computing is opening new avenues for AI advances.
OK, but how does this apply to journalism?
OpenAI’s work likely won’t be felt in journalism for a long time. Yet it does suggest future possibilities for story-seeking algorithms. Self-adjusting after hitting blind alleys is how reporters pursue new developments, too.
‘The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.’
– OpenAI statement describing their findings
SEE RELATED ITEMS
- Video showing stages for the behaviours emerging – OpenAI (see top of page)
- Emergent Tool Use From Multi-Agent Autocurricula – OpenAI Paper
- AI learned to use tools after nearly 500 million games of hide and seek – MIT TECHNOLOGY REVIEW | September 17, 2019 | by Karen Hao