AGI and knowledge: we have ways of making him talk

Our conception of information and data in IT systems is heavily shaped by the capabilities and design of those systems. In a traditional system, it is easy to differentiate between data, application logic and user interface. This model of an IT system is widely used, has been refined over many years, and heavily influences our understanding of data protection and confidentiality. Because data is clearly separated from logic and visualization, it is easy to identify and manipulate, and rules for its storage and usage are easy to apply.

Until recently, modern deep learning language models did not change much about this picture. Models such as BERT are good for classifying text or gaining semantic insight into individual snippets of text. Such models use implicit knowledge to interpret complex relationships between words and their meanings. This is beautifully illustrated by visualizations of how the model uses attention when analyzing the sentences “The animal didn’t cross the street because it was too tired” and “The animal didn’t cross the street because it was too wide”. While the model associates the word “it” with the word “animal” in the first sentence, the association shifts to “street” in the second, demonstrating an understanding of the intended meaning of the sentence. Some sort of implicit knowledge is clearly encoded in the model weights, but it is mostly superficial, and there is little danger of the model retaining, for example, classified information that could later be re-extracted.

The new generation of (early) generalizable models

This has started to change with the recent spread of a new generation of language models that are several orders of magnitude larger in size and capacity, allowing them to memorize far more data. Models such as GPT-3 differ from BERT not just in their capacity but also in their purpose: GPT-3 is a generative model, built both to construct novel outputs and to reconstruct old inputs. This means that while BERT would answer a question by pointing to where in a provided text the answer most likely is (extractive), GPT-3 can generate new text containing the answer (generative). Until now, language models could not produce new information and could only help users find the place to look for the answer. The new generation of generative models can “memorize” information during training and use it to provide answers to the user.

Let us illustrate this by asking GPT-3 about Angela Merkel. Starting with the prompt “Angela Merkel was born in”, the model continues with “Hamburg as Angela Dorothea Kasner. She studied Physics at Leipzig University, […]”. Opening the hood of the machinery and looking at the individual word predictions, we can see that the model assigns high confidence to “Hamburg” and “Physics”, but that it also considered and rejected other options.

Information about Merkel's life
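To make the idea of "individual word predictions" concrete, here is a minimal sketch of how raw scores for candidate next words turn into a ranked list of probabilities. The scores in EDUCATION_LOGITS are hypothetical values chosen for illustration and are not real GPT-3 output; the function names are also assumptions.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def top_candidates(logits, k=3):
    """Return the k most likely next tokens with their probabilities."""
    probs = softmax(logits)
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical scores for the word after "She studied"; not real model output.
EDUCATION_LOGITS = {
    "Physics": 4.0,
    "Mathematics": 2.0,
    "Chemistry": 1.5,
    "Politics": 1.0,
}

for token, prob in top_candidates(EDUCATION_LOGITS):
    print(f"{token}: {prob:.2f}")
```

The model emits the top-ranked word, but the lower-ranked candidates remain visible in the distribution, which is what makes the "considered but rejected" options inspectable.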

A brief remark that warrants its own article: the associations (and thereby also the mistakes) that these models make are remarkably similar to the kinds humans make. “Asked” about the education of Angela Merkel, the model seems to consider options that a human might articulate as “something technical, physics I believe but could also be mathematics or chemistry. I also associate politics with Angela Merkel, so maybe that is also part of her education.”

One of the challenges of interacting with information implicitly stored in big language models is that there is no clear, systematic way to query it. In the example above, the prompt was selected simply based on intuition, and it happened to be well suited to extracting this specific information. This query is probably comparatively easy for the model, as information about Angela Merkel is likely to appear frequently in the training corpus.

Problems with the use of language models for their knowledge

Are we heading towards a world where bigger and bigger language models encapsulate all of the world’s knowledge? Maybe, but there are also significant challenges when attempting to use the implicit knowledge in large models:

Hendrycks et al. (2020) looked at GPT-3’s ability to answer questions that require specific knowledge (e.g. “What is the embryological origin of the hyoid bone?”) by picking an option in a multiple-choice questionnaire (e.g. “(D) The second and third pharyngeal arches”). They found that GPT-3 performs rather poorly at this task, but that its performance improves massively with increasing model size. This may indicate that even extremely specific knowledge can be learned and used by even bigger language models in the future.

Increase of GPT-3's knowledge performance with size
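As a rough sketch of how such a multiple-choice evaluation can be scored, the snippet below picks the option with the highest model score for each question and reports the fraction answered correctly. The per-option scores (for example log-probabilities) and the answer keys are hypothetical values, and this is not the evaluation code used by Hendrycks et al.

```python
def pick_option(scores):
    """Return the label of the option with the highest model score."""
    return max(scores, key=scores.get)

def accuracy(items):
    """Fraction of items where the top-scoring option matches the answer key."""
    correct = sum(1 for item in items if pick_option(item["scores"]) == item["answer"])
    return correct / len(items)

# Hypothetical per-option scores for three questions; not real benchmark data.
ITEMS = [
    {"scores": {"A": -2.1, "B": -1.9, "C": -2.5, "D": -0.7}, "answer": "D"},
    {"scores": {"A": -1.2, "B": -1.1, "C": -1.3, "D": -1.4}, "answer": "A"},
    {"scores": {"A": -0.4, "B": -2.0, "C": -2.2, "D": -3.0}, "answer": "A"},
]

print(f"accuracy: {accuracy(ITEMS):.2f} (chance level with four options: 0.25)")
```

With four options per question, random guessing lands at 25% accuracy, which is the baseline any reported result should be read against.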

There is also remarkably interesting work by a large group of researchers from Google, Stanford, UC Berkeley, Northeastern University, OpenAI, Harvard and Apple illustrating that small details in the prompt significantly influence the model’s ability to reproduce knowledge, making this a rather unreliable way of retrieving information. They showed that, depending on the prompt and search strategy, the model was able to reproduce either 25 or up to 500 digits of pi.

One of the most significant challenges is that there is no clear way to prevent the model from generating misinformation, either because the model hasn’t memorized the correct information or because there is no correct answer (the question is nonsensical or the answer is unknown). Some experiments suggest that prompting can help. Daniel Bigham showed that GPT-3 can be used to identify and explain nonsensical statements: the model replied to the statement “My sneeze lasted 10 hours” with “Sneeze can’t last that long.” and to “Leonardo da Vinci architected the Eiffel Tower.” with “Leonardo da Vinci dies in 1519, and the Eiffel Tower was built in 1889.”. Arram Sabeti demonstrated that GPT-3 can be taught to reply to nonsense questions with a certain phrase (here: “yo be real”) instead of giving false information. The model then answers “What is human life expectancy in the United States?” with “Human life expectancy in the United States is 78 years.”, while replying “yo be real” to the question “How many rainbows does it take to jump from Hawaii to seventeen?”.
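The "yo be real" behaviour described above is typically taught through demonstrations placed in the prompt. The sketch below assembles such a few-shot prompt from the two question/answer pairs quoted in the text. The "Q:"/"A:" layout, the function name and the final test question are assumptions for illustration, not the exact prompt from the original experiment.

```python
NONSENSE_REPLY = "yo be real"

# Demonstrations taken from the examples quoted in the text.
EXAMPLES = [
    ("What is human life expectancy in the United States?",
     "Human life expectancy in the United States is 78 years."),
    ("How many rainbows does it take to jump from Hawaii to seventeen?",
     NONSENSE_REPLY),
]

def build_prompt(examples, question):
    """Join Q/A demonstrations and end with an open answer slot for the model."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

# Hypothetical new question; the model would be asked to continue after "A:".
prompt = build_prompt(EXAMPLES, "Who architected the Eiffel Tower?")
print(prompt)
```

The resulting string would be sent to the language model as is. The demonstrations only bias the model toward the fallback phrase and do not guarantee it.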

A structured way to extract knowledge from big language models was demonstrated by Wang et al. in October 2020. They showed that the implicitly learned knowledge can be transformed into entities and relationships representing a knowledge graph. This work could be a promising approach to making the knowledge in AGI models more compatible with traditional application logic and databases.

Knowledge graph about Bob Dylan generated out of a language model
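To show why a knowledge graph fits traditional application logic, here is a minimal sketch of the underlying data structure: facts stored as (subject, relation, object) triples that ordinary code can index and query. The triples are hand-written for illustration and are not output of the method by Wang et al.; the relation names are assumptions.

```python
from collections import defaultdict
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str

# Hand-written illustrative triples, not extracted from a language model.
TRIPLES = [
    Triple("Bob Dylan", "occupation", "songwriter"),
    Triple("Bob Dylan", "place_of_birth", "Duluth"),
    Triple("Duluth", "located_in", "Minnesota"),
]

def build_graph(triples):
    """Index triples by subject so each entity maps to its outgoing edges."""
    graph = defaultdict(list)
    for triple in triples:
        graph[triple.subject].append((triple.relation, triple.obj))
    return dict(graph)

def lookup(graph, subject, relation):
    """Return all objects linked to `subject` by `relation`."""
    return [obj for rel, obj in graph.get(subject, []) if rel == relation]

graph = build_graph(TRIPLES)
print(lookup(graph, "Bob Dylan", "place_of_birth"))
```

Once knowledge is in this form, it can be stored in a database, validated and access-controlled like any other structured data, unlike knowledge that exists only in model weights.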

Visual knowledge

Recent remarkable progress in generative image models will lead to advances similar to those we have seen in language models. A giant image model trained on pictures from the internet would be able to reproduce the face of a person based on characteristic traits (e.g. recognizing Queen Elizabeth by her hat) or learned objects (e.g. the Eiffel Tower).

Implications for interactions with information, data protection and confidentiality

This new generation of general AI systems, in which information is no longer stored solely in databases and logic is no longer stored solely in source code, presents unique opportunities and challenges. There is no separation between language, information and logic in these models, and they can apply and combine knowledge in ways similar to how a human might.

  • AGI models can be used to learn and apply knowledge. Even extremely specific domain knowledge can be memorized given a big enough model size.
  • This learned knowledge can be applied through natural language or transformed into structured interfaces like knowledge graphs, lists or scores.
  • It is difficult to guarantee or prevent the learning of specific information. AGI models may unintentionally learn (and leak) confidential data.
