But can it play… never mind

A dish of neurons may have taught itself to play Pong (badly)

Given control of a paddle and feedback, the neurons figured out what to do.

John Timmer
In culture, nerve cells spontaneously form the structures needed to communicate with each other. Credit: JUAN GAERTNER / Getty Images

One of the more exciting developments in AI has been the arrival of algorithms that can teach themselves the rules of a system. Early game-playing algorithms had to be given the basics of the game they'd play. But newer versions don't need that—they simply need a system that keeps track of some reward, like a score, and they can figure out which actions maximize that reward without a formal description of the game's rules.

A paper released by the journal Neuron takes this a step further by using actual neurons grown in a dish full of electrodes. This added an additional level of complication, as there was no way to know what neurons would actually find rewarding. The fact that the system seems to have worked may tell us something about how neurons can self-organize their responses to the outside world.

Say hello to DishBrain

The researchers behind the new work, who were primarily based in Melbourne, Australia, call their system DishBrain. And it's based on, yes, a dish with a set of electrodes on its floor. When neurons are grown in the dish, these electrodes can do two things: sense the activity of the neurons above them or stimulate those neurons. The electrodes are large relative to the size of neurons, so both the sensing and stimulation (which can be thought of as similar to reading and writing information) involve a small population of neurons rather than a single one.

Beyond that, it’s a standard culture dish, meaning a variety of cell types can be grown in it—for some control experiments, the researchers used cells that don’t respond to electrical signals. For these experiments, the researchers tested two types of neurons: some dissected from mouse embryos, and others produced by inducing human stem cells to form neurons. In both cases, as seen in other experiments, the neurons spontaneously formed connections with each other, creating networks that had spontaneous activity.

While the hardware is completely flexible, the researchers configured it as part of a closed-loop system with a computer controller. In this configuration, electrodes in a couple of regions of the dish were designated to carry output from the neurons to the computer; they're collectively termed the motor region, since activity there controls the system's response.

Another eight regions were designated to receive input in the form of stimulation by the electrodes, which act a bit like a sensory area of the brain. The computer could also use these electrodes to provide feedback to the system, which we’ll get into below.

Collectively, these provide everything necessary for a neural network to learn what's going on in the computer environment. The motor electrodes allow the neurons to alter the behavior of the environment, and the sensory ones receive both input on the state of the environment and a signal that indicates whether the network's actions were successful. The system is generic enough that all sorts of environments could be set up in the computer portion of the experiment—pretty much anything where simple inputs alter the environment.
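As a rough sketch of that loop (all names here are hypothetical; the paper's actual hardware and software interfaces aren't described in this article), the computer repeatedly reads the motor electrodes, advances the simulated environment, and writes the result back to the sensory electrodes:

```python
# Hypothetical sketch of the closed loop described above. All names and
# numbers are illustrative, not the paper's actual interfaces.

class ToyEnv:
    """Stand-in for the computer-side environment."""
    def __init__(self):
        self.paddle = 0

    def step(self, action):
        # The environment only understands two motor commands.
        self.paddle += 1 if action == "up" else -1
        state = {"paddle": self.paddle}
        feedback = "predictable"   # placeholder for the reward signal
        return state, feedback

def closed_loop_step(read_motor, write_sensory, env):
    """One cycle: sense motor activity, advance the world, stimulate back."""
    up, down = read_motor()                 # spike counts from the two motor regions
    action = "up" if up > down else "down"
    state, feedback = env.step(action)      # environment responds to the action
    write_sensory(state, feedback)          # drive the sensory regions
    return state, feedback

# Minimal usage with stubbed electrode I/O:
env = ToyEnv()
stimulations = []
state, fb = closed_loop_step(
    read_motor=lambda: (5, 2),                        # fake spike counts: "up" wins
    write_sensory=lambda s, f: stimulations.append((s, f)),
    env=env,
)
```

The key design point is the closed loop itself: the neurons' activity changes the environment, and the environment's next state flows straight back into the neurons as stimulation.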

The researchers chose Pong.

Pong meets theoretical neuroscience

Pong, in many ways, is an excellent choice. The environment involves only a couple of variables: the location of the paddle and the location of the ball. The paddle can only move along a single line, so the motor portion needs just two inputs: move up or move down. And there's a clear reward for doing things well: you avoid the end state where the ball gets past the paddle and the game stops. It's a great setup for testing a simple neural network.
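That two-input structure is easy to write down. Here's a hypothetical single-paddle version (all parameters illustrative, not taken from the paper) showing just how little state the game needs:

```python
# Minimal single-paddle Pong, purely illustrative. The only motor
# commands are "up" and "down".

class MiniPong:
    def __init__(self, height=8, width=8):
        self.h, self.w = height, width
        self.ball_x, self.ball_y = 0, height // 2
        self.vx, self.vy = 1, 1
        self.paddle = height // 2            # paddle moves along one line

    def step(self, action):
        """Advance one tick; return (state, done). done=True on a miss."""
        delta = 1 if action == "up" else -1
        self.paddle = max(0, min(self.h - 1, self.paddle + delta))
        self.ball_x += self.vx
        self.ball_y += self.vy
        if not 0 <= self.ball_y < self.h:    # bounce off top/bottom walls
            self.vy = -self.vy
            self.ball_y += 2 * self.vy
        if self.ball_x == self.w - 1:        # ball reaches the paddle's line
            if self.ball_y == self.paddle:
                self.vx = -self.vx           # hit: the rally continues
            else:
                return self._state(), True   # miss: the game ends
        return self._state(), False

    def _state(self):
        return (self.ball_x, self.ball_y, self.paddle)

env = MiniPong()
state, done = env.step("up")
```

Three numbers of state and a binary action space is about as simple as a game environment gets, which is what makes Pong such a common first benchmark.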

But there’s a notable issue here: There’s no reason for neurons to consider a state where the ball’s still in play rewarding. So there’s no way for humans to know what sort of signal the computer should generate to indicate when the neural network has been successful. And without that sort of signal, there’s no way for the neural network to learn anything.

This is where the research team turned to theoretical neurobiology. One proposal for how sensory networks learn to interpret the world is that they try to minimize the mismatch between what the network thinks is going to happen and the actual state of the world. In this view, learning networks naturally try to minimize the discrepancy between the predicted and actual states.

Put in Pong terms, the sensory portion of the network will take the positional inputs, determine an action (move the paddle up or down), and then generate an expectation for what the next state will be. If it’s interpreting the world correctly, that state will be similar to its prediction, and thus the sensory input will be its own reward. If it gets things wrong, then there will be a large mismatch, and the network will revise its connections and try again.

To drive the point home, if the network lost the game by letting the ball cross the end line, the researchers fed the network a burst of random positional information, presumably unrelated to any predictions it made. This would create a large mismatch with any predictions and induce the system to reorganize before the game restarted shortly afterward.
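One way to picture that feedback rule (a sketch of the idea only; the paper's actual stimulation patterns are more involved, and every name here is hypothetical): when the rally continues, the sensory regions get a stimulus that tracks the game state, so a correct internal model can predict it; on a miss, they get unstructured noise that no prediction can match.

```python
import random

# Illustrative version of the predictable-vs-random feedback rule
# described above, not the paper's actual stimulation protocol.

def sensory_feedback(ball_y, missed, n_channels=8, rng=random):
    """Return one stimulation value per sensory region."""
    if missed:
        # Unpredictable burst: uncorrelated noise on every channel,
        # which should mismatch whatever the network predicted.
        return [rng.random() for _ in range(n_channels)]
    # Predictable signal: encode the ball's position consistently,
    # so an accurate internal model can anticipate it.
    return [ball_y / 10.0] * n_channels

random.seed(0)
hit = sensory_feedback(ball_y=3, missed=False)   # structured, repeatable
miss = sensory_feedback(ball_y=3, missed=True)   # noise burst
```

Under the minimize-the-mismatch view, the noise burst is the "punishment": it guarantees a large prediction error, pushing the network to rewire.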

While this all sounds sensible, it’s important to remember that this is just a proposal about how neurons might self-organize into a learning system. We don’t know whether parts of intact brains behave this way, much less whether a bunch of neural cells dumped randomly into a dish would spontaneously form a learning system.

It kinda works

Amazingly, the system appears to have worked, for at least some definitions of “worked.” Systems comprising either mouse or human neurons saw the average length of Pong rallies go up over time, indicating they might be learning the game’s rules. Systems based on non-neural cells, or those lacking a reward system, didn’t see this sort of improvement.

That said, there are a host of caveats. Even the best-behaving systems didn't play Pong all that well. The best-performing control systems, which were likely moving the paddle randomly, consistently outperformed the average trained system, and the worst performances by trained systems were worse than the average control. So, while the boost in performance was statistically significant, you couldn't necessarily identify a trained, functional system just by watching it play Pong.

The second caveat is that the researchers ran a lot of individual tests: different measures of performance, different amounts of training, multiple controls, and so on. As such, you'd expect some good-looking results to pop up purely by chance. So what you have to look for here is whether those results consistently point in the same direction.

That consistency appears to be there. The effect was seen in both human and mouse neurons, and several measures of success all moved in parallel: average length of rallies and total number of rallies with at least three bounces off the paddle went up, while the number of “aces” where the paddle never touches the ball went down. Giving no feedback when a game ends with the ball crossing the end line produced performance that was intermediate between trained systems and control systems. Combined, these suggest that there’s a real effect there, even if the individual tests produced fairly weak results.

Fortunately, this should be relatively simple to replicate. There are probably plenty of games roughly as simple as Pong that could be used to search for a similar improvement with experience.

But, assuming this holds up, it provides some evidence that neural networks formed from actual neurons spontaneously develop the ability to learn. And that could explain some of the learning capabilities of actual brains, where smaller groups of neurons are organized into functional units, much like the sensory and motor units used here.

Neuron, 2022. DOI: 10.1016/j.neuron.2022.09.001  (About DOIs).

John Timmer Senior Science Editor
John is Ars Technica's science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.