Google DeepMind introduces Aeneas, an AI model for restoring ancient texts

Dmytro Dzhuhalyk - 24 July, 02:14 PM

The Roman world was rich in writing, found on everything from imperial monuments to everyday objects, but most of these texts survive only as fragments, worn down, damaged or deliberately defaced. To help historians reconstruct these inscriptions, the Google DeepMind team has introduced Aeneas, an open-source artificial intelligence model for restoring ancient texts.

Restoring, dating, and locating ancient inscriptions is virtually impossible without contextual information. Traditionally, historians have relied on their own expertise and specialized resources to identify "parallels"—texts that share common features in wording, syntax, standardized formulas, or provenance. Aeneas was designed to speed up this work by analyzing thousands of Latin inscriptions and finding textual and contextual parallels in seconds.

Aeneas can also be adapted to other ancient languages, scripts and media, from papyri to coins. The model was developed in collaboration with the University of Nottingham and in partnership with researchers from the Universities of Warwick and Oxford and the Athens University of Economics and Business. Google wants this research to benefit as many people as possible, so the model is freely available to researchers, students, teachers, museum workers and others at predictingthepast.com.

The model's key capabilities include finding parallels across collections of Latin inscriptions, processing multimodal input such as the geographical provenance of a text, and restoring gaps in texts. Google DeepMind says that Aeneas achieves state-of-the-art performance and "sets a new standard in restoring damaged texts and predicting the time and place of their writing."

To train Aeneas, Google built a large dataset based on decades of work by historians. All records were cleaned, harmonized, and linked into a single "Latin Epigraphic Dataset" (LED) containing over 176,000 Latin inscriptions. The model uses a transformer-based decoder to process the text of an inscription. Specialized networks then handle character restoration and dating using the text, while geographic attribution also takes images of the inscriptions as input.
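The division of labor described above, with one text representation feeding several task-specific networks, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the layer sizes, the names (`restore_head`, `date_head`, `place_head`, `predict`) and the random weights are hypothetical and do not reflect the actual Aeneas implementation. The sketch also assumes the geographic network receives both text and image features.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

# Hypothetical sizes, chosen only to keep the example small.
HIDDEN, IMAGE, N_CHARS, N_DATE_BINS, N_PLACES = 8, 4, 5, 6, 3

def linear(n_in, n_out):
    """Random weight matrix standing in for a trained layer."""
    return [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]

def apply(vec, weights):
    """Multiply a feature vector by a weight matrix (n_in x n_out)."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weights)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One head per task; only the place head also sees image features (assumption).
restore_head = linear(HIDDEN, N_CHARS)
date_head = linear(HIDDEN, N_DATE_BINS)
place_head = linear(HIDDEN + IMAGE, N_PLACES)

def predict(text_features, image_features):
    """Run the three task heads on shared text features."""
    return {
        "restoration": softmax(apply(text_features, restore_head)),
        "date": softmax(apply(text_features, date_head)),
        "place": softmax(apply(text_features + image_features, place_head)),
    }

# Stand-ins for the outputs of a text decoder and an image encoder.
text_features = [random.uniform(-1, 1) for _ in range(HIDDEN)]
image_features = [random.uniform(-1, 1) for _ in range(IMAGE)]
outputs = predict(text_features, image_features)
for task, probs in outputs.items():
    print(task, [round(p, 3) for p in probs])
```

Each head returns a probability distribution over its own label set (candidate characters, date ranges, places), which is one common way to structure a model that answers several questions about the same input.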

For each inscription, Aeneas produces a list of similar examples. The list is built using "embeddings", a way of encoding the content and context of an inscription into a kind of historical profile. This approach takes into account the subject of the text, its language, the time and place of its creation, and its connections to other inscriptions.
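To make the idea of embedding-based lookup concrete, here is a minimal sketch that ranks stored inscriptions by cosine similarity to a query vector. Cosine similarity is a common choice for comparing embeddings, but the article does not say which measure Aeneas uses, so treat it as an assumption. The inscription names and three-dimensional vectors are made up for illustration, and real embeddings would have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_parallels(query, corpus, top_k=3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings; names and values are invented for this example.
corpus = {
    "inscription_A": [1.0, 0.0, 0.0],
    "inscription_B": [0.9, 0.1, 0.0],
    "inscription_C": [0.0, 1.0, 0.0],
    "inscription_D": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

for name, score in rank_parallels(query, corpus, top_k=2):
    print(name, round(score, 3))
```

Because similar inscriptions end up with vectors pointing in similar directions, a lookup like this can surface candidates in a fraction of the time a manual search through printed corpora would take.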