Presentation

Handwritten documents, especially those held in archives, are full of valuable data, but they need to be digitized before that data can easily be used. Artificial intelligence models can do this, but they need large training datasets to learn and improve their results.
Taliesin is a web application born of a desire to make it easier to transcribe these handwritten documents into digital text. It generates the training datasets needed to train these models in handwriting recognition.
Through an easy-to-use interface, Taliesin lets users transcribe any handwritten document paragraph by paragraph, sentence by sentence or word by word. The application combines several artificial intelligence algorithms and merges their results into a single, accurate transcription.

A computer science project by fourth-year students at INSA Rennes

How does the application work?

Starting from a handwritten document:

- The document is transcribed by several artificial intelligence recognizers.
- Their results are merged to obtain a single transcription of the document.
- The user can then correct the transcription with the help of the color code.
- Once the final result is obtained, it is added to the database.

Example of a transcription before correction: "Helo wonderful worlcl !"



Merging the results of the recognizers

To speed up transcription and make correction easier for the user,
Taliesin merges the results of the different document recognizers. The merging algorithm then returns a single result to the user.

The following recognizers transcribe the handwritten documents:
- CRNN trained on the "Read" database
- CRNN trained on the "IAM" database
- PyLaia trained on the "IAM" database

A new merging algorithm retrieves the three results.
By comparing them, the algorithm weights every word in order to return the most plausible transcription possible.
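
The idea of weighting each word can be illustrated with a minimal sketch of a weighted vote. This is not Taliesin's actual algorithm, which is not described here. It assumes the three transcriptions are already aligned word by word, and the recognizer keys, the weights and the per-recognizer outputs are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-recognizer weights; the real values are not published here.
WEIGHTS = {"crnn_read": 1.0, "crnn_iam": 1.0, "pylaia_iam": 1.0}


def merge_words(transcriptions, weights=WEIGHTS):
    """Merge word-aligned transcriptions with a weighted vote.

    `transcriptions` maps a recognizer name to its list of words.
    All lists are assumed to have the same length (already aligned).
    Returns a list of (word, score) pairs, with score in [0, 1].
    """
    lengths = {len(words) for words in transcriptions.values()}
    if len(lengths) != 1:
        raise ValueError("transcriptions must be aligned word by word")

    total = sum(weights[name] for name in transcriptions)
    merged = []
    for position in range(lengths.pop()):
        votes = defaultdict(float)
        for name, words in transcriptions.items():
            votes[words[position]] += weights[name]
        # On a tie, max() keeps the candidate that was seen first.
        best = max(votes, key=votes.get)
        merged.append((best, votes[best] / total))
    return merged


# Invented recognizer outputs, for illustration only.
results = {
    "crnn_read": ["Helo", "wonderful", "worlcl", "!"],
    "crnn_iam": ["Hello", "wonderful", "world", "!"],
    "pylaia_iam": ["Helo", "wonderfull", "worlcl", "!"],
}
merged = merge_words(results)
```

With equal weights, a word wins when at least two recognizers agree on it, and its score is the share of the total weight that voted for it. A real system would also need an alignment step, since recognizers do not always split a line into the same number of words.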

The final result is displayed with a color code that makes correction easier for users:
- A word in red: the transcription is unreliable, a correction is needed.
- A word in orange: there are doubts about the transcription, a check is needed.
- A word in black: the transcription is considered correct.
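
As a rough illustration of the color code, the sketch below maps a per-word confidence score to one of the three colors. The thresholds, the function name and the sample scores are assumptions made for this example; the values Taliesin actually uses are not stated here.

```python
# Hypothetical thresholds; the actual values are not given in this document.
ORANGE_THRESHOLD = 0.5
BLACK_THRESHOLD = 0.9


def color_for(score):
    """Map a confidence score in [0, 1] to the display color of a word."""
    if score >= BLACK_THRESHOLD:
        return "black"   # considered correct
    if score >= ORANGE_THRESHOLD:
        return "orange"  # doubtful, should be checked
    return "red"         # unreliable, must be corrected


# Invented scores, for illustration only.
scored_words = [("Helo", 0.4), ("wonderful", 0.7), ("worlcl", 0.3), ("!", 1.0)]
colored = [(word, color_for(score)) for word, score in scored_words]
```

Keeping the thresholds as named constants makes it easy to tune how strict the interface is about asking the user for a check.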


Interface

Our team

Our team is made up of eight fourth-year computer science students at INSA Rennes.

Marine ANIS

Rémi BOUCHER

Chloé MARCOZ

Nathan MAURY

Killan MOAL

Nathan MOUREAUX

Sarah OURY

Aymeric SANCHEZ

Our partners

We would like to thank all our partners, as well as our supervisors: Alexandre GIMENEZ PUIG, an engineer at Sopra Steria, and Bertrand COUASNON, a teacher-researcher at INSA.

Archives Départementales d'Ille-et-Vilaine

The departmental archives of Ille-et-Vilaine provide us with handwritten documents and are among Taliesin's beta testers.

Sopra Steria

A French digital services company, where Alexandre GIMENEZ works. He helps us with Agile project management and with the technical side of the project.

INSA Rennes

Our engineering school, thanks to which we were able to carry out this project.

Doptim

A company that builds AI and Big Data solutions. One of its goals is to enable the transcription of old handwritten parish registers into digital text. Doptim is one of Taliesin's beta testers.

IntuiDoc

The IntuiDoc team at IRISA (a computer science research institute) focuses its research on handwriting, gesture and document processing. The team provides us with recognizers and is one of Taliesin's beta testers.