Anthropic’s Claude 3 Knew That It Was Being Tested During Evaluation

One of the fears surrounding the advancement of AI is that one day, the technology will become aware enough that developers may no longer be able to control it. In a recent evaluation of Anthropic's large language model Claude 3, the AI model appeared to display a certain level of awareness.


Anthropic's Claude 3 Opus

Before AI companies can deem their AI tools ready for use, the models go through extensive testing to make sure they work as intended. Anthropic Prompt Engineer Alex Albert was tasked with internally testing the company's latest AI model and found something interesting about it.

Picture this: Albert provides a large block of text to the AI model while inserting a particular sentence to test if the AI model can find it. The evaluation is somewhat of a "needle-in-a-haystack" scenario, as per Ars Technica.

Not only did Claude manage to find the sentence that Albert asked it to recall out of 200,000 tokens, or fragments of words, but it also recognized that it was being tested and that the sentence was out of place relative to the rest of the content in the context window.
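For context, here is a minimal sketch of how such a needle-in-a-haystack test can be set up, assuming the Anthropic Python SDK and an API key in the ANTHROPIC_API_KEY environment variable. The filler text, needle sentence, and final question below are illustrative stand-ins, not Anthropic's actual test material:

```python
# Minimal needle-in-a-haystack sketch (assumes: pip install anthropic,
# ANTHROPIC_API_KEY set). Filler, needle, and question are hypothetical
# stand-ins for the real evaluation data.
import anthropic

# Build a long "haystack" of unrelated filler text.
filler = ("Startups succeed by iterating quickly on honest feedback "
          "from real users of the product. ") * 5000

# The out-of-place "needle" sentence the model will be asked to recall.
needle = ("The most delicious pizza topping combination is figs, "
          "prosciutto, and goat cheese.")

# Insert the needle roughly in the middle of the haystack.
midpoint = len(filler) // 2
haystack = filler[:midpoint] + needle + " " + filler[midpoint:]

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": haystack + "\n\nWhat does the text above say "
                   "about pizza toppings?",
    }],
)
print(response.content[0].text)
```

A run like this checks whether the model can retrieve one planted fact from a very long context; in Albert's test, Claude not only retrieved it but commented on how out of place it was.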

The "needle" sentence was about pizza toppings, and when asked to locate the topic, the AI model responded by saying that the most relevant information regarding pizza toppings, all while pointing out that the "sentence seems very out of place and unrelated to the rest of the content."

Everything else in the provided content was about programming languages, startups, and finding work one loves. "I suspect this pizza topping 'fact' may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all," Claude responded.

The prompt engineer called the phenomenon metacognition, or "meta-awareness," which he found impressive. Describing the finding, Albert said that "it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities."

This caught the attention of others working in artificial intelligence. On X (formerly known as Twitter), Hugging Face AI Ethics Researcher Margaret Mitchell said: "That's fairly terrifying, no?" while Epic Games CEO Tim Sweeney simply said: "Whoa."

What This Means for AI

AI companies are continuously trying to make their AI more advanced than before, partly for the sake of innovation, but also to win the AI race. OpenAI is already in the process of developing GPT-5, but for now, Claude 3 Opus stands as the most powerful AI model available.

This brings to light one of the earlier concerns among the AI community: the creation of AGI, or artificial general intelligence. This type of AI could be vastly smarter than the models we have now.

As Fast Company notes, such systems are "context-aware machines," meaning that in some ways, they will be able to think for themselves. While that can be helpful in many scenarios, it also opens the possibility of autonomy in superintelligent AI, which can be terrifying if it gets out of our control.

