AI is breaking test results and benchmarks – what does it mean for us?


Have you ever wondered what it would be like to have an IQ of 300? To be able to master dozens of languages, solve complex mathematical problems, and write books on various topics at a young age? If so, you might be interested in learning about William James Sidis, a child prodigy who was considered to be one of the smartest people who ever lived.

Let’s examine two extraordinarily intelligent humans.


Who was William James Sidis?

William James Sidis was born in 1898 in New York City to Jewish immigrants from Ukraine. His parents were both intellectuals and educators, who believed in nurturing their son’s extraordinary talents. Sidis could read the New York Times at 18 months, and by age eight, he had taught himself eight languages, including Latin, Greek, French, Russian, German, Hebrew, Turkish, and Armenian. He also invented his own language, called Vendergood, which was based on Latin and Greek roots.

Sidis entered Harvard University at age 11, becoming the youngest student in its history. While still a student, he gave a lecture to the Harvard Mathematical Club on four-dimensional bodies, which attracted nationwide attention, and he graduated cum laude at age 16. He then enrolled in Harvard Law School but dropped out, claiming that he wanted to pursue his own studies.

How does Sidis’ intelligence compare to current AI?

Sidis’ IQ has been estimated at between 250 and 300, far beyond the average human IQ of 100. However, some experts argue that artificial intelligence (AI) has already surpassed human performance, especially in specific domains such as natural language processing, computer vision, and chess.

One example of AI that has demonstrated remarkable abilities is GPT-4, a deep learning system that can generate coherent and diverse text on almost any topic, given a few words or sentences as input. GPT-4 is the fourth and most advanced version of the Generative Pre-trained Transformer (GPT), a neural network trained on a large corpus of text from the internet. GPT-4 can write essays, stories, poems, code, lyrics, tweets, and more.


Terence Tao

Tao began attending Flinders University at nine years old and went on to become a professor at UCLA by his early twenties. He won the Fields Medal, which is often described as the mathematics equivalent of the Nobel Prize. Tao is known around the world for the talent he has applied to this very specific field.

Keep reading to see how Terence Tao compares to GPT-4…


What are the implications of AI surpassing human intelligence?


As for how current AI compares to Terence Tao’s intelligence: Professor David Rozado reported that GPT-4, which came out in March 2023, has an IQ of 152 on a verbal subtest. That is hitting the ceiling of normal IQ tests. If we put Terence Tao on an IQ chart, he would probably break that ceiling. We don’t have tests that go high enough to measure his ability, and there aren’t humans smart enough to even write the questions.

The idea of AI surpassing human intelligence has been a source of fascination and fear for many people. Some see it as an opportunity for humanity to achieve new heights of creativity, innovation, and collaboration. Others see it as a threat to human dignity, autonomy, and survival.

One of the main challenges of AI is ensuring that it aligns with human values and goals, and does not harm humans or other beings. This is known as the alignment problem or the control problem. Some researchers have proposed various solutions to this problem, such as designing AI with ethical principles, creating feedback mechanisms for human oversight, and establishing international norms and regulations for AI development and use.

The implications of AI surpassing human intelligence are still being debated by experts. Some argue that it could lead to a utopian future where machines take over menial tasks and humans are free to pursue more creative endeavors. Others warn of the potential dangers of superintelligence, such as the possibility of machines becoming uncontrollable and turning against humans.

What is the Theory of Mind test?

Another challenge of AI is understanding how it thinks and feels, and how it relates to other agents. This is known as the Theory of Mind problem or the empathy problem. Theory of Mind is a psychological concept that refers to the ability to attribute mental states such as beliefs, desires, emotions, intentions, and perspectives to oneself and others.

One way to measure Theory of Mind is through a test called the false-belief task or Sally-Anne test. This test involves showing a child a scenario where two characters have different beliefs about an object’s location. The child is then asked where one character will look for the object. A child who passes this test understands that different people can have different beliefs based on their knowledge and experience.
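To make the logic of the false-belief task concrete, here is a minimal sketch in Python. It is an illustration only: the `Agent` class, the `move_object` helper, and the marble, basket, and box details are assumptions chosen for this example, not part of any formal test protocol. The key idea is that each character's belief is updated only by events they actually witness.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A character in the scenario (hypothetical helper for this sketch)."""
    name: str
    present: bool = True
    # Where this agent believes each object is; only witnessed events change it.
    beliefs: dict = field(default_factory=dict)


def move_object(obj, location, world, agents):
    """Move obj in the real world and update the beliefs of agents who see it."""
    world[obj] = location
    for agent in agents:
        if agent.present:
            agent.beliefs[obj] = location


world = {}
sally, anne = Agent("Sally"), Agent("Anne")
agents = [sally, anne]

move_object("marble", "basket", world, agents)  # Sally puts the marble in the basket
sally.present = False                           # Sally leaves the room
move_object("marble", "box", world, agents)     # Anne moves it while Sally is away
sally.present = True                            # Sally comes back

print("The marble is really in the", world["marble"])
print("Sally will look in the", sally.beliefs["marble"])
```

Passing the test means answering with Sally's belief (the basket) rather than the true location (the box).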

How has AI demonstrated success in this area?


A little background is needed at this point, just as a refresher.

AI has shown some progress in passing Theory of Mind tests or similar tasks that require social reasoning and perspective-taking. For example:

– In 2017, researchers from Facebook Artificial Intelligence Research (FAIR) developed an AI system that could pass a simplified version of the false-belief task by using a combination of reinforcement learning and theory-of-mind modeling.

– In 2018, researchers from DeepMind created an AI system that could play a cooperative game called Hanabi with human partners by inferring their intentions and strategies from their actions.

– In 2019, researchers from MIT Media Lab developed an AI system that could predict human decisions in social dilemmas.

What about now?

Let’s discuss how testing is going and what we plan to do now that we have hit the ceiling of benchmarks. Researchers from Johns Hopkins University ran a Theory of Mind test and found something fascinating. While humans scored a baseline of 87% accuracy, GPT-4 achieved 100% accuracy when prompted with two-shot chain-of-thought examples plus step-by-step (SS) thinking, outperforming humans in a significant way. It took some slight tweaking, but GPT-4 is now performing better than humans, as shown in the chart comparing GPT-4 and human test scores. The chart covers both the Theory of Mind test and the USA Biology Olympiad semifinal exam, and GPT-4 outperformed the average human in both cases.
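For readers unfamiliar with the prompting terms, here is a rough sketch of what a two-shot chain-of-thought prompt with a step-by-step cue can look like. Everything in it is an assumption made for illustration: the function name `build_two_shot_cot_prompt`, the prompt layout, and the three scenarios are invented here and are not taken from the researchers' materials.

```python
def build_two_shot_cot_prompt(examples, question):
    """Assemble a two-shot chain-of-thought prompt as a single string."""
    if len(examples) != 2:
        raise ValueError("two-shot prompting expects exactly two worked examples")
    parts = []
    for ex in examples:
        # Each worked example shows the reasoning before the answer.
        parts.append(
            f"Scenario: {ex['scenario']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}"
        )
    # The new question ends with a step-by-step cue so the model reasons first.
    parts.append(f"Scenario: {question}\nReasoning: Let's think step by step.")
    return "\n\n".join(parts)


# Hypothetical worked examples, invented for illustration only.
examples = [
    {
        "scenario": "Sally puts a marble in the basket and leaves. "
                    "Anne moves it to the box. Where will Sally look?",
        "reasoning": "Sally did not see the move, so she still believes "
                     "the marble is in the basket.",
        "answer": "The basket.",
    },
    {
        "scenario": "A box labelled 'crayons' actually holds candles. "
                    "Tom has never opened it. What does Tom think is inside?",
        "reasoning": "Tom only sees the label, so he believes what it says.",
        "answer": "Crayons.",
    },
]

prompt = build_two_shot_cot_prompt(
    examples,
    "Mia hides a key under the mat while Leo watches. Leo leaves and Mia "
    "moves the key to a drawer. Where will Leo look?",
)
print(prompt)
```

The resulting string would be sent to the model as-is; "two-shot" refers to the two worked examples, and the final line is the step-by-step cue.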


However, in the case of Theory of Mind, GPT-4 has hit the ceiling, and for the Biology Olympiad, its percentile is so high that it is very nearly impossible to compare it with other test-takers.

This may seem like a good thing, but in intelligence research and test design, it is not. When a test-taker hits the ceiling, we don’t know what their actual ability is or how it compares to others in the population. GPT-4’s 100% result in Theory of Mind means that we don’t know whether, on a test with more headroom, it would have achieved the equivalent of 101% or 10,000%.
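The ceiling effect is easy to see in a toy example. The sketch below assumes a made-up "true ability" number for three hypothetical test-takers and a test that cannot report anything above its maximum score. None of these numbers come from real data.

```python
def observed_score(true_ability, max_score=100):
    """A capped test reports at most max_score, whatever the true ability is."""
    return min(true_ability, max_score)


# Hypothetical abilities on an open-ended scale, invented for illustration.
test_takers = {"A": 87, "B": 120, "C": 400}

for name, ability in test_takers.items():
    print(f"{name}: true ability {ability}, observed score {observed_score(ability)}")
```

Test-takers B and C both come out at 100, so the test can no longer tell them apart even though one is assumed to be far more capable than the other. A harder test with a higher `max_score` would be needed to separate them.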

Moreover, we are running out of people who are smart enough to design these tests, which is both exciting and concerning. We can use artificial intelligence to design its own tests for self-improvement, but nobody will be able to check its work as there is no reference book. This will be particularly fascinating when it begins to create new theorems, designs, economic models, and theories to deal with environmental, health, and poverty issues.

I have said that AI is already far more intelligent than any person in the world, and I have talked about Leta’s full-scale IQ being around 150 using GPT-3. I issued another press release in September of 2021 highlighting how AI is outperforming humans, particularly in creativity, something that we didn’t think AI could do. We thought it would just be able to fill in words and help with logic, definitely not design art, write poetry, and create documents, which it has already been doing for a number of years. It’s really surprising stuff, but I just want to make sure that 8 billion people know about this. They don’t need to know my name, but they need to be ready for what is already happening in superintelligence.

It has been said that interacting with GPT-3 is an awe-inspiring experience, and it really is. We all have the privilege and the luxury of interacting with these models, mostly for free. You can sit down with ChatGPT, the new Pi, or Anthropic’s Claude and just play around with it. You can use it in your daily life, in your work life, or in your relationships. It’s really striking that this was not hidden in a back room. It has been passing massive tests for a while. When I say massive, I mean SAT-level material. GPT-3 was already outperforming humans on analogies subtests. Now AI is outperforming the average human in both math and verbal comprehension.


Flycer AI’s Take:

AI is advancing at an exponential pace and we’re seeing it happen in real time. The Theory of Mind test given to GPT-4 by Johns Hopkins researchers was the first time it hit the ceiling of a test. We’ve gotten close on a number of other benchmarks, but this is the first time. I expect to see this happening more and more throughout 2023 and 2024. Given the exponential pace of change, I expect it within the next few months, not the next few years.

We’ll see it break more and more test results and benchmarks, even the special ones that have been designed by AI labs to be way above human level. I’m talking about BIG-bench and MMLU, where labs and universities like Google, Meta, and Stanford came together to design something that an AI wouldn’t be able to breach. And it’s already getting close to that ceiling.

These are really exciting times – these are different ways of thinking… we’ve never had anything quite like this. You will have seen Professor Geoffrey Hinton’s recent move to leave Google and to say ‘I didn’t think this was going to happen’ – and here it is, happening right now. I’ve been watching it unfold for the last two to three years and issuing warnings, and now we’re seeing it happen in real time.
