Measuring Progress Toward AGI Is Hard

Summary:  Artificial General Intelligence (AGI) is still some way off, but surprisingly there’s been very little conversation about how to measure whether we’re getting close.  This article reviews a proposal to benchmark existing AIs against animal capabilities in an Animal-AI Olympics.  It’s a real thing and is just now accepting entrants.


Artificial General Intelligence (AGI) is something that many AI researchers have an opinion about, but those opinions show surprisingly little consistency.  Broadly, we believe that achieving human-level AGI requires a system that has all of the following:

  • Consciousness: To have subjective experience and thought.
  • Self-awareness: To be aware of oneself as a separate individual, especially to be aware of one’s own thoughts and uniqueness.
  • Sentience: The ability to feel perceptions or emotions subjectively.
  • Sapience: The capacity for wisdom.

OK, so those are the characteristics described in science fiction.  We’d probably think we are pretty close if it could just:

  1. Learn from one source and apply it to another, completely unrelated field. In other words, generalize.
  2. Recall a task once learned and, again, apply it to other data or other environments.
  3. Be small and fast. Today’s systems are very energy hungry, which stands in the way of making them tiny.
  4. Learn in a truly unsupervised manner.

There’s also quite a wide range of opinion about when we’ll achieve AGI.  About a year ago we reported on a panel held at the 2017 conference on Machine Learning at the University of Toronto on the theme of ‘How far away is AGI’.  The participants were an impressive group of 7 leading thinkers and investors in AI (including Ben Goertzel and Steve Jurvetson).  Here’s what they thought:

  • 5 years to subhuman capability
  • 7 years
  • 13 years maybe (By 2025 we’ll know if we can have it by 2030)
  • 23 years (2040)
  • 30 years (2047)
  • 30 years
  • 30 to 70 years

There’s significant disagreement, but the median is 23 years (2040), with three of the seven panelists estimating 30 years or more.
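
The median quoted above can be checked with a short calculation. This is a minimal sketch, not anything the panel produced: it assumes 2017 as the base year (the year of the conference), and the variable and function names are made up for illustration. Because one panelist gave a range of 30 to 70 years, the sketch computes the median at both ends of that range to show that the result does not depend on which end is used.

```python
from statistics import median

BASE_YEAR = 2017  # assumption: estimates are counted from the year of the panel

# Six point estimates in years, plus one panelist's range of 30 to 70 years.
point_estimates = [5, 7, 13, 23, 30, 30]
range_estimate = (30, 70)


def panel_median(range_value):
    """Median of the seven estimates, with the range replaced by a single value."""
    return median(point_estimates + [range_value])


low = panel_median(range_estimate[0])   # range counted as 30 years
high = panel_median(range_estimate[1])  # range counted as 70 years

# Number of panelists whose estimate is longer than the median.
longer = sum(1 for years in point_estimates + [range_estimate[0]] if years > low)

print("median (years):", low, high)
print("median (calendar year):", BASE_YEAR + low)
print("panelists above the median:", longer)
```

With seven estimates the median is simply the fourth value once sorted, so any value in the 30 to 70 range leaves it unchanged.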

How Do We Measure Progress and Not Just Final Success

Needless to say, full achievement of either of those lists is a tall order.  Not all researchers agree on the degree to which these characteristics are necessary and sufficient before we declare victory.  After all, we are on the journey to achieve AGI and no one has yet actually seen the destination.

Several tests of final success have been proposed, most of which you’ve probably heard of.

  • The Turing Test: Can a computer convince a human that it is also human?  This one is now 69 years old.
  • The Employment Test: From Nils Nilsson in 2005.  A robot should be able to automate economically important jobs.
  • The Coffee Test: From Steve Wozniak, cofounder of Apple, in 2007.  When a robot can enter a strange house and make a decent cup of coffee.
  • The Robot College Student: From Ben Goertzel in 2012.  When a robot can enroll in a college and earn a degree using the same resources and methods as a human. 

Curiously there don’t seem to have been any significant new tests of final success added in the last 7 years.  Is the matter settled?

Actually, what seems not to be settled is how to measure our progress toward these goals.  As with most progress in our field, we ought to be able to see these successes coming some years in advance as incremental improvements allow better performance.  But how do we tell if we’re 50% there or 75%?

The Animal-AI Olympics

One interesting approach was floated this last February in a project partnership between the University of Cambridge Leverhulme Centre for the Future of Intelligence and GoodAI, a research institution based in Prague.  Their thought is to benchmark the current level of various AIs against different animal species using a variety of already established animal cognition tasks.  Hence, the Animal-AI Olympics.

In June they announced the details of what these tests would be and are now taking submissions from potential competitors.  They propose that the following 10 tests represent increasing levels of difficulty and therefore sophistication in reasoning, for both animal and AI.

  1. Food: A single positive reward.  Get as much food as possible within the time limits.
  2. Preferences: Modifies the food test to include a choice between getting more food and getting food that is easier to obtain.
  3. Obstacles: Some immovable barriers that impede the agent’s navigation require the agent to explore the environment to solve the task.
  4. Avoidance: Introduces ‘hot zones’ and ‘death zones’ requiring the agent to avoid negative stimuli.
  5. Spatial Reasoning: Tests for complex navigational abilities and requires some knowledge of simple physics by which the environment operates.
  6. Generalization: Includes variations of the environment that may look superficially different to the agent even though the properties and solutions to problems remain the same.
  7. Internal Models: The agent must be able to store an internal model of the environment. Lights may turn off after a while requiring the agent to remember the layout and navigate in the dark.
  8. Object Permanence: Many animals seem to understand that when an object goes out of sight it still exists. This is a property of our world, and of our environment, but is not necessarily respected by many AI systems. There are many simple interactions that aren’t possible without understanding object permanence.
  9. Advanced Preferences: Tests the agent’s ability to make more complex decisions to ensure it gets the highest possible reward. Expect tests with choices that lead to different achievable rewards.
  10. Causal Reasoning: Includes the ability to plan ahead so that the consequences of actions are considered before they are undertaken. All the tests in this category have been passed by some non-human animals, and these include some of the more striking examples of intelligence from across the animal kingdom.
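
Since the ten categories are presented as an ordered ladder, one simple way to summarize an agent’s progress is to report the highest rung it reaches without skipping any below it. The sketch below illustrates that idea only. The article does not describe the organizers’ actual scoring method, so the pass-rate threshold of 0.5, the function name, and the example numbers are all assumptions made up for illustration.

```python
# The ten test categories, in the order of increasing difficulty listed above.
CATEGORIES = [
    "Food",
    "Preferences",
    "Obstacles",
    "Avoidance",
    "Spatial Reasoning",
    "Generalization",
    "Internal Models",
    "Object Permanence",
    "Advanced Preferences",
    "Causal Reasoning",
]


def highest_level_reached(pass_rates, threshold=0.5):
    """Count consecutive categories, starting from category 1, whose pass
    rate meets the threshold. Counting stops at the first category that
    falls short, so later successes do not raise the level.

    Hypothetical scoring rule, not the competition's own.
    """
    level = 0
    for name in CATEGORIES:
        if pass_rates.get(name, 0.0) < threshold:
            break
        level += 1
    return level


# Made-up pass rates for an imaginary agent (fraction of tests passed).
example_agent = {
    "Food": 0.9,
    "Preferences": 0.8,
    "Obstacles": 0.7,
    "Avoidance": 0.6,
    "Spatial Reasoning": 0.3,
    "Generalization": 0.6,
}

print("highest level reached:", highest_level_reached(example_agent))
```

Stopping at the first failed category is a deliberately strict choice: it treats the list as a true ladder, so an agent that passes Generalization but fails Spatial Reasoning is still scored at level 4.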

This strikes me as valuable to know but not particularly definitive in predicting how far we’ve progressed toward AGI.  It also seems to focus exclusively on reinforcement learning.  My guess is that 8 out of 10 AGI researchers would probably say reinforcement learning is the most likely path, yet we shouldn’t rule out a breakthrough coming from other efforts like spiking or neuromorphic chips, or even literal biological wetware chips.

I’m thinking that those Boston Dynamics robots get to at least number 6 on that scale and maybe a little higher.  Still, it’s good to see someone put a stake in the ground and take a shot at this.  I’m eager to see the results.

Other articles on AGI:

A Wetware Approach to Artificial General Intelligence (AGI) (2018)

In Search of Artificial General Intelligence (AGI) (2017)

Artificial General Intelligence – The Holy Grail of AI (2016)

Other articles on Spiking / Neuromorphic Neural Nets

Off the Beaten Path – HTM-based Strong AI Beats RNNs and CNNs at Prediction and Anomaly Detection

The Three Way Race to the Future of AI. Quantum vs. Neuromorphic vs. High Performance Computing

More on 3rd Generation Spiking Neural Nets

Beyond Deep Learning – 3rd Generation Neural Nets

Other articles by Bill Vorhies

About the author:  Bill is Contributing Editor for Data Science Central.  Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.  His articles have been read more than 2 million times.
