Researchers at OpenAI in California found their algorithms started displaying complex behaviours after 25 million games. They stopped at 500 million.
The algorithms evolved on their own over time
The proof is in the results. Sets of algorithms, portrayed visually as ‘agents,’ were placed in a game environment and given simple rules and tools. Over time, ‘hiders’ developed better and better concealment strategies, while their opponents developed more elaborate ways of ‘seeking.’ Eventually the hiders took to barricading themselves behind barriers they had fortified, only to be found out by an adversary that climbed a box to gain height and see over the barrier’s wall.
The experiment provided clues to help achieve life-like intelligence
OpenAI is one of the principal labs pursuing AGI, or artificial general intelligence, the branch of AI research that aims to have machines achieve human-level abilities across a broad range of tasks. For this experiment the OpenAI research team combined two techniques:
- Multi-agent games – simulated environments where algorithms interact with one another under defined rules of engagement. Algorithmic behaviour is displayed by figures known as ‘agents.’ Games are ideal simulation environments because ‘success’ is clearly defined, as are the conditions under which it can and cannot be achieved.
- Reinforcement learning – a form of machine learning where an AI model ‘learns’ from examples created by its own experience. The model measures each outcome as being either closer to, or farther from, its goal, then self-adjusts accordingly. Reinforcement learning can be thought of as ‘trial and error’ or ‘learning from one’s mistakes,’ not unlike how humans acquire much of their knowledge.
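The trial-and-error loop at the heart of reinforcement learning can be sketched with a minimal example: an epsilon-greedy agent repeatedly picks one of three slot-machine arms, observes a win or a loss, and nudges its estimate of each arm toward what it has experienced. Everything here – the payout probabilities, the parameters, the function name `run_bandit` – is hypothetical and chosen for illustration; it is not drawn from OpenAI’s experiment.

```python
import random

def run_bandit(steps=5000, epsilon=0.1, seed=0):
    """Trial-and-error learning on a 3-armed bandit (epsilon-greedy)."""
    rng = random.Random(seed)
    true_payouts = [0.2, 0.5, 0.8]   # hidden from the agent
    estimates = [0.0, 0.0, 0.0]      # the agent's learned value of each arm
    counts = [0, 0, 0]

    for _ in range(steps):
        # Explore a random arm occasionally; otherwise exploit the best guess.
        if rng.random() < epsilon:
            arm = rng.randrange(3)
        else:
            arm = max(range(3), key=lambda a: estimates[a])
        # The outcome is a win (1.0) or a loss (0.0)...
        reward = 1.0 if rng.random() < true_payouts[arm] else 0.0
        counts[arm] += 1
        # ...and the estimate is nudged toward the observed outcome:
        # the agent measures the result and self-adjusts accordingly.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates
```

Run long enough, the agent settles on the highest-paying arm without ever being told the payout odds – the essence of learning from its own experience.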
Put together, the agents ‘learned’ by adjusting their tactics in response to the actions of the others. The agents were given the rules of the game and access to a few elements in the environment: some rectangular objects, some boxes, and some wedges. Beyond these basics, they were left to operate on their own.
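The mutual adjustment described above can be sketched as two agents playing matching pennies, each best-responding to the other’s observed behaviour – a simple form of ‘fictitious play.’ This is a toy stand-in for multi-agent co-adaptation, not OpenAI’s actual training setup; the agent names and rules are assumptions made for illustration.

```python
def co_adapt(rounds=200):
    """Two agents co-adapting in matching pennies.

    The 'matcher' (agent 0) wins by playing the same action as its
    opponent; the 'mismatcher' (agent 1) wins by playing a different
    one. Each round, each agent best-responds to the fraction of 1s
    it has seen its opponent play so far.
    """
    ones_seen = [0, 0]   # ones_seen[i]: how often agent i's opponent played 1
    history = ([], [])
    for t in range(1, rounds + 1):
        a0 = 1 if ones_seen[0] / t >= 0.5 else 0   # matcher copies the majority
        a1 = 0 if ones_seen[1] / t >= 0.5 else 1   # mismatcher plays the minority
        ones_seen[0] += a1
        ones_seen[1] += a0
        history[0].append(a0)
        history[1].append(a1)
    return history
```

Neither agent can settle on a fixed tactic: each shift by one side creates pressure on the other to adapt, producing the kind of escalating back-and-forth seen in the hide-and-seek phases below.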
The agents evolved through six distinct strategies
- (A) early rounds – seekers learn to chase hiders around the environment
- (B) after 25 million games – hiders learn how to build forts
- (C) after 75 million games – seekers learn to jump the walls of the forts using ramps
- (D) after 115 million games – hiders learn to immobilize the ramps so they can’t be relocated to scale walls
- (E) after 388 million games – seekers learn to use an immobilized ramp to climb onto a box and ‘box surf’ around the environment, gaining a higher vantage point
- (F) after 458 million games – hiders learn to immobilize all the pieces so seekers cannot engage in ‘box surfing’
As in the age-old game, hiders were given a head start. The researchers began with a closed environment bounded by four walls, then tried an open one. Parts of the environment were auto-generated with random barriers, so the playing field varied from game to game. The research team found the agents exploited any opportunity available, including imperfections in the simulation itself.
The experiment was conducted as a proof of concept. There was no mention of how long the 500 million iterations took. Besides revealing the emergent behaviours, the experiment is another demonstration of how high-performance computing is opening new avenues for AI advances.
OK, but how does this apply to journalism?
OpenAI’s work likely won’t be felt in journalism for a long time. Yet it does suggest future possibilities for story-seeking algorithms. Self-adjusting after hitting blind alleys is how reporters pursue new developments, too.
‘The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.’
– OpenAI statement describing their findings
SEE RELATED ITEMS
- Video showing stages for the behaviours emerging – OpenAI (see top of page)
- Emergent Tool Use From Multi-Agent Autocurricula – OpenAI Paper
- AI learned to use tools after nearly 500 million games of hide and seek – MIT TECHNOLOGY REVIEW | September 17, 2019 | by Karen Hao