Handwritten documents, especially those held in archives, contain valuable data, but they must be digitized before that data can be used easily.
Artificial intelligence models can do this, but they need large training datasets to learn and to improve their results.
Taliesin is a web application born of a desire to make it easier to transcribe these handwritten documents into digital text.
It also generates the training datasets needed to teach these models handwriting recognition.
Through an easy-to-use, ergonomic interface, Taliesin lets users transcribe any handwritten document paragraph by paragraph, sentence by sentence, or word by word.
The application combines several artificial intelligence algorithms and merges their results to obtain a single, accurate transcription.
Computer science project of the fourth-year students of INSA Rennes
From a handwritten document
A single transcription of the document is obtained by merging the results
Once the final result is obtained, it is added to the database
The document is transcribed by several artificial intelligence models
The user can then correct the document with the help of the color code
To speed up transcription and make correction easier for the user,
Taliesin merges the results of the different document recognizers.
This merging algorithm then returns a single result to the user.
The following recognizers transcribe the handwritten documents:
CRNN trained on the "Read" database
CRNN trained on the "IAM" database
PyLaia trained on the "IAM" database
A new merging algorithm retrieves the three results.
Comparing them, the algorithm weights every word to return the most plausible transcription possible.
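The word-weighting idea can be illustrated with a small weighted vote. This is only a minimal sketch, not Taliesin's actual merging algorithm: it assumes the recognizer outputs are already aligned word by word and have the same length, and the function name `merge_transcriptions`, the sample sentences, and the weight values are all hypothetical.

```python
from collections import defaultdict


def merge_transcriptions(outputs, weights):
    """Merge word-aligned transcriptions by weighted voting.

    outputs: one list of words per recognizer, all of equal length
             (an assumption of this sketch).
    weights: one positive trust weight per recognizer (hypothetical values).
    Returns a list of (word, score) pairs, with score between 0 and 1.
    """
    if len(outputs) != len(weights):
        raise ValueError("one weight per recognizer is required")
    if len({len(words) for words in outputs}) > 1:
        raise ValueError("this sketch assumes outputs of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    merged = []
    for candidates in zip(*outputs):
        # Add up the weight of every recognizer that proposed each word.
        votes = defaultdict(float)
        for word, weight in zip(candidates, weights):
            votes[word] += weight
        best = max(votes, key=votes.get)
        # The score is the share of the total weight that backs the winner.
        merged.append((best, votes[best] / total))
    return merged


# Hypothetical outputs of three recognizers for the same line of text.
outputs = [
    ["the", "quick", "brown", "fox"],
    ["the", "quick", "brawn", "fox"],
    ["the", "quiet", "brown", "fax"],
]
weights = [1.0, 1.0, 1.5]  # illustrative trust weights, not measured values

merged = merge_transcriptions(outputs, weights)
words = [word for word, _ in merged]
```

In this sketch a word backed by every recognizer gets a score of 1.0, while a word on which the recognizers disagree gets a lower score. Real recognizer outputs can differ in length, so an alignment step would be needed before voting.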
The final result is colored according to a color code that makes correction easier for users:
- A word in red: the transcription is unreliable and a correction is needed.
- A word in orange: the transcription is doubtful and should be checked.
- A word in black: the transcription is considered correct.
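The color code above can be thought of as a mapping from a per-word confidence score to a color. The sketch below is one possible way to express it, under stated assumptions: the source does not say how confidence is measured or where the boundaries lie, so the score range (0 to 1), the thresholds `low` and `high`, and the function name `color_for` are hypothetical.

```python
def color_for(score, low=0.5, high=0.8):
    """Map a confidence score in [0, 1] to a display color.

    The thresholds are illustrative assumptions, not Taliesin's values.
    """
    if score < low:
        return "red"     # unreliable: a correction is needed
    if score < high:
        return "orange"  # doubtful: a check is needed
    return "black"       # considered correct


# Hypothetical scored words, e.g. produced by a merging step.
scored = [("the", 1.0), ("quick", 0.57), ("brawn", 0.3)]
colored = [(word, color_for(score)) for word, score in scored]
```

Keeping the thresholds as parameters makes it easy to tune how many words are flagged for the user to review.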
Our team is made up of eight fourth-year computer science students at INSA Rennes.
We would like to thank all our partners, as well as our supervisors: Alexandre GIMENEZ PUIG, engineer at Sopra Steria, and Bertrand COUASNON, teacher-researcher at INSA.