A learning base generation application for handwriting recognition
Fourth year annual project of the INSA computer science department
Handwritten documents, especially old ones, are difficult to read. To exploit them, we can use recognizers based on artificial intelligence.
However, automatic recognition requires a lot of training data. Therefore, thousands of examples need to be annotated.
Taliesin facilitates the import, slicing, and annotation of handwritten documents, and thus offers a solution for generating learning bases. A learning base is a set of annotated examples containing images and their associated transcriptions.
Taliesin is an application for generating learning bases for handwriting recognition systems. These data allow recognizers to build a model capable of making predictions on new documents.
To generate these learning bases, Taliesin offers an interface that facilitates the work of annotators. The annotations are first produced automatically by deep neural network-based recognizers that annotate the different pages. If a prediction is wrong, the user can correct it manually with the help of auto-completion. Once the image database is annotated, the user can export the examples and use them to train handwriting recognizers.
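The workflow above (automatic prediction, optional manual correction, then export) can be illustrated with a minimal sketch. It is not Taliesin's actual data model or export format: the class `AnnotatedExample`, the function `export_learning_base`, the file names, and the JSON Lines output are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedExample:
    """One entry of a learning base: an image and its transcription (hypothetical model)."""
    image_path: str                   # hypothetical path to a sliced line or word image
    prediction: str                   # transcription proposed by a recognizer
    correction: Optional[str] = None  # manual fix entered by the annotator, if any

    @property
    def transcription(self) -> str:
        # The annotator's correction takes precedence over the automatic prediction.
        return self.correction if self.correction is not None else self.prediction


def export_learning_base(examples: List[AnnotatedExample]) -> str:
    """Serialize the examples as JSON Lines: one image/transcription pair per line."""
    lines = [
        json.dumps(
            {"image": ex.image_path, "transcription": ex.transcription},
            ensure_ascii=False,  # keep accented characters readable in the export
        )
        for ex in examples
    ]
    return "\n".join(lines)


# Illustrative data only: the second prediction is corrected by the annotator.
examples = [
    AnnotatedExample("page1_line1.png", prediction="In the yeare 1650"),
    AnnotatedExample(
        "page1_line2.png",
        prediction="the towne of Rcnnes",
        correction="the towne of Rennes",
    ),
]
exported = export_learning_base(examples)
```

Keeping the recognizer's prediction alongside the correction, rather than overwriting it, is a design choice of this sketch: it makes it possible to see afterwards which examples needed human intervention.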
Our team is composed of seven fourth-year students from the Computer Science department of INSA Rennes.
We would like to thank all our partners, as well as our supervisors: Alexandre GIMENEZ PUIG and Erwan FOUCHE, engineers at Sopra Steria, and Bertrand COUASNON, teacher-researcher at INSA/IRISA.