Deep Learning with Multilingual Transformers for Image-to-Text Recognition

The goal of this internship is to design a Transformer model, a learning strategy, or both, so that a single model can recognize text in multiple languages and across multiple page layouts. See the PDF of the subject for details.