The main objective of the Shadoc team is to model the content of man-made data for written communication.

Studied data

Image recognition

The first field of interest is image recognition of documents. We work on scanned images of historical and recent documents with complex structures. We also consider born-digital documents, such as PDFs, whose structure is not always directly interpretable.

Analysis of Sequences / Time series

The team works on the analysis, interpretation and recognition of time series and information sequences, at several granularities and across several modalities. We first consider low-level time series associated with trajectories formed by handwritten traces or 2D/3D gestures. They come from different types of sensors: inertial sensors, pen-based and (multi-)touch capture on touch screens, motion capture, and Kinect or Leap Motion sensors. At a higher level, time series are studied to provide context (temporal, spatial and semantic) and to develop evolutionary or incremental analysis and learning approaches. We can also consider the many sequences found in collections of documents.

Research directions

Combination / Hybridization

The originality of the Shadoc team lies in combining deep learning based systems with
syntactical ones. We study different ways of implementing this combination:

  • The syntactical part brings contextual information to generative neural networks, which helps them converge;
  • Some low-level elements can be extracted using deep learning systems: text lines, simple gestures, symbols, etc. They are then combined using two-dimensional grammars. This type of combination builds hybrid systems with greater generalization capabilities than neural-only systems, while requiring a smaller amount of annotated data;
  • Combination of document structure recognition and handwriting recognition;
  • Combination of syntactical language models with transformer neural networks;
  • Combination of handwriting recognition based on explicit segmentation with Seq2Seq recognition;
  • Strong combination of two-dimensional grammars and transformers, where syntactical rules drive the transformer architecture.

Exploring these different mechanisms for combining syntactic and neural models allows us
to confine the syntactic expression of a priori knowledge, as far as possible, to elements that are
difficult for deep neural networks to learn (or learnable only at the cost of very large amounts of annotated data), while
taking advantage of the modeling capabilities of deep learning on elements that require less annotated data.
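
As a rough illustration of the second combination above (low-level elements found by a network, then assembled by explicit rules), the sketch below groups detected text lines into paragraphs. Everything in it is hypothetical: the `TextLine` type, the `group_into_paragraphs` function, the `max_gap_ratio` parameter and the hard-coded detections stand in for the output of a neural detector, and the single hand-written rule is only a toy stand-in for a two-dimensional grammar, not the team's formalism.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """Bounding box of a text line, as a neural detector might return it (hypothetical)."""
    x: float
    y: float
    width: float
    height: float


def group_into_paragraphs(lines, max_gap_ratio=0.5):
    """Toy structural rule: paragraph -> line+.

    Consecutive lines (top to bottom) belong to the same paragraph when the
    vertical gap between them is small relative to the line height and their
    horizontal extents overlap.
    """
    paragraphs = []
    for line in sorted(lines, key=lambda l: l.y):
        if paragraphs:
            prev = paragraphs[-1][-1]
            gap = line.y - (prev.y + prev.height)
            overlap = (min(prev.x + prev.width, line.x + line.width)
                       - max(prev.x, line.x))
            if gap <= max_gap_ratio * prev.height and overlap > 0:
                paragraphs[-1].append(line)
                continue
        paragraphs.append([line])
    return paragraphs


# Hard-coded boxes standing in for detector output (assumption).
detected = [
    TextLine(10, 10, 200, 20),
    TextLine(10, 35, 180, 20),
    TextLine(10, 120, 190, 20),
]
paragraphs = group_into_paragraphs(detected)
```

The point of the split is that the rule itself needs no annotated data: only the line detector has to be trained, while the layout knowledge is written down explicitly.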

Learning with few data

Learning with few data is a recurring constraint in our applications. Much of the team's work
is carried out with human users. Data therefore have to be acquired from these users, which limits the amount
that can be collected.

Various approaches can be investigated to overcome this limitation. One way is to design network architectures that build a relevant latent representation of the data, even when trained on a small training
set. Another way is to design a semi-supervised approach. Such approaches make it possible to benefit from large sets
of unlabeled data when only a small number of labeled examples is available. Users can also be involved in the labeling process through a semi-automatic approach, called active learning, in which a model selects the data examples of interest that the user will then label manually.
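
The selection step of active learning can be sketched as follows. This is a minimal example of one common strategy, uncertainty sampling by predictive entropy, and is not presented as the team's method; the function names and the `pool_probs` array (class probabilities that a model would output on an unlabeled pool) are assumptions made for illustration.

```python
import numpy as np


def predictive_entropy(probs):
    """Entropy of each row of an (n_samples, n_classes) probability array."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)


def select_for_labeling(probs, k):
    """Return the indices of the k most uncertain samples, most uncertain first."""
    scores = predictive_entropy(probs)
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()


# Hypothetical model outputs on four unlabeled samples, three classes.
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # almost uniform: most uncertain
])
to_label = select_for_labeling(pool_probs, k=2)
print(to_label)  # [3, 1]
```

The selected samples would be shown to the user for manual labeling, added to the training set, and the model retrained before the next selection round.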

Self-adaptive systems

Building self-adaptive systems that can automatically adapt to a new corpus of documents
with few or no labeled data is a challenging objective. It can be reached by combining
syntactic and unsupervised deep learning methods.

Rejection capabilities

The construction of recognition systems with rejection capabilities is important both for integrating
these systems into interactive processes, with humans or other systems, and for exploiting automatically
generated annotations in semi-supervised processes. In an interactive system, rejection is what
determines when a human expert is asked to answer a question.
We will also study the rejection capabilities of deep neural networks, in order to select automatically annotated data that can be used as new training data.
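
Both uses of rejection can be illustrated with a simple confidence threshold. The sketch below is only one possible rejection criterion (maximum class probability), chosen for brevity; the function `split_by_rejection`, the `threshold` value and the example probabilities are assumptions, not the team's system.

```python
import numpy as np


def split_by_rejection(probs, threshold=0.9):
    """Split predictions into accepted pseudo-labels and rejected samples.

    probs: (n_samples, n_classes) array of class probabilities.
    Returns a dict {sample_index: predicted_class} for accepted samples and
    a list of rejected sample indices to be referred to a human expert.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    accepted = np.flatnonzero(confidence >= threshold)
    rejected = np.flatnonzero(confidence < threshold)
    pseudo_labels = {int(i): int(labels[i]) for i in accepted}
    return pseudo_labels, rejected.tolist()


# Hypothetical network outputs on three samples, two classes.
probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.08, 0.92],
])
pseudo_labels, ask_expert = split_by_rejection(probs, threshold=0.9)
print(pseudo_labels)  # {0: 0, 2: 1}
print(ask_expert)     # [1]
```

Accepted predictions can feed a semi-supervised training loop as new training data, while rejected samples are the ones worth a question to the expert. Raw softmax confidence is often poorly calibrated, so a practical system would likely need a more robust rejection criterion.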
