TOPPAN Group Develops AI-OCR Engine Capable of Deciphering Medieval Greek
TOPPAN has developed an AI-OCR engine to decode medieval Greek, applying its expertise in decoding Japanese cursive script. It aims for a recognition accuracy of 95% or higher using Vatican Library data.
Published: April 7, 2026 at 19:02
TOPPAN Holdings Inc. (Headquarters: Bunkyo-ku, Tokyo; President & COO: Satoshi Oya; hereinafter "TOPPAN Holdings") and its group company TOPPAN Inc. (Headquarters: Bunkyo-ku, Tokyo; President & CEO: Haruhiko Noguchi; hereinafter "TOPPAN") have developed an AI-OCR engine (hereinafter "this AI-OCR engine") capable of deciphering medieval Greek, which is generally considered difficult to read.
Going forward, the companies will use image and text data of Greek manuscripts from the Vatican Apostolic Library, which has a cooperative relationship with the Printing Museum operated by TOPPAN Holdings, to continuously accumulate training data and improve precision, with the aim of achieving a recognition accuracy of 95% or higher with this AI-OCR engine.
The results of this initiative will be demonstrated at the special exhibition "Birth of Masterpieces Vatican Apostolic Library III+" to be held at the Printing Museum starting Saturday, April 25, 2026.
■ Background of the Development of this AI-OCR Engine
Old documents record diverse information about historically valuable facts and regional cultures, but much of it is handwritten in scripts that are difficult for modern readers to decipher. Accurately deciphering these contents and passing on the culture they record has become a global social challenge, not one limited to Japan.
For approximately 30 years, the TOPPAN Group has collaborated on multiple projects with the Vatican Apostolic Library to promote cultural preservation. The Vatican Apostolic Library has made part of its collection of over 2 million items publicly available as high-definition images in the IIIF (*1) format for the purpose of promoting research and educational use. The number of published images exceeds 9 million and continues to expand steadily. In addition, additional information such as "transcriptions (*2)" and "annotations" is being structured as data for some Greek manuscript images. However, extending this additional information to the entire collection has required highly specialized personnel capable of deciphering medieval Greek to work over a long period.
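As a rough illustration of what publication in the IIIF format means in practice, the sketch below builds an image request URL following the path pattern of the IIIF Image API (identifier, region, size, rotation, quality, format). The server address and the identifier are hypothetical placeholders, not actual Vatican Apostolic Library endpoints, and the `max` size keyword assumes a server implementing version 3 of the API.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Path pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that characters such as "/" do not
    get mistaken for path separators.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, used only for illustration.
BASE = "https://iiif.example.org/image"
full_page = iiif_image_url(BASE, "example-manuscript/page1")
# Request only the left half of the page, scaled to 800 pixels wide.
left_half = iiif_image_url(BASE, "example-manuscript/page1",
                           region="pct:0,0,50,100", size="800,")
print(full_page)
print(left_half)
```

Because region and size are part of the URL, a client can request just the part of a manuscript page it needs, for example a single line of text, without downloading the full high-definition image.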
To support the research and utilization of precious historical materials across Japan, TOPPAN has been working on deciphering old documents written in "Kuzushiji" (cursive script), which are difficult for modern people to read. In 2015, it began R&D on "Kuzushiji OCR" to decipher Kuzushiji using AI image recognition technology, and has since collaborated with various research institutions and organized events. It also launched the old document deciphering and utilization service "Fuminoha®" in 2021 and, in 2023, the smartphone app "Komonjo Camera®", which allows the general public to easily decipher old documents.
Against this background, TOPPAN has now developed an AI-OCR engine capable of deciphering medieval Greek, utilizing the AI-OCR technology and knowledge cultivated thus far in the deciphering of Kuzushiji.
■ Features of this AI-OCR Engine
・Deciphering Medieval Greek
Medieval Greek is characterized by inconsistent notation: letter shapes vary depending on the era and the writer, parts of words are sometimes omitted, and spellings differ from modern usage. In addition, sentences were sometimes written without spaces between words, making the text extremely difficult for modern people without specialized knowledge to read. This AI-OCR engine deciphers medieval Greek characters using a database of one million character shapes and lines prepared as training data.
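The announcement does not state how the 95% recognition accuracy target is measured. One common convention for OCR, assumed here purely for illustration, is character-level accuracy: one minus the edit distance between the engine's output and a reference transcription, divided by the length of the reference. The function names below are hypothetical and do not describe TOPPAN's evaluation method.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Character-level accuracy: 1 - edit_distance / len(reference), floored at 0."""
    # Normalize so that composed and decomposed accented Greek letters compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - levenshtein(ref, hyp) / len(ref))


# Assumed example: the OCR output confuses final sigma with medial sigma.
reference = "λόγος"
ocr_output = "λόγοσ"
print(f"{character_accuracy(reference, ocr_output):.2f}")
```

Under this assumed metric, a 95% target would correspond to at most about one character error for every twenty characters of reference text.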