IRISA-KIHT-S and KIHT-public Datasets


Digital devices can help pupils and teachers in the learning process by promoting active learning techniques and providing immediate feedback. The e-learning literature shows that computer-based handwriting analysis can be accurate, sensitive, and reliable enough to produce relevant and consistent feedback for correction or guidance.

The IRISA-KIHT-S dataset was presented in [1] for the task of handwriting reconstruction from sensor data. The sensor data come from a digital pen, the STABILO Digipen. Note that these data can also be used for classification purposes.

The IRISA-KIHT-S dataset is available free of charge to the research community, for research purposes only. Publications using this dataset should cite the reference below [1].

Datasets description

The IRISA-KIHT-S dataset contains 30 recordings and the KIHT-public dataset contains 149 recordings.

Every recording session generates several files from the data acquisition mobile app. The sensor signals file has 15 columns, listed in full under Sensor Data below: milliseconds, front accelerometer (x, y, z), rear accelerometer (x, y, z), gyroscope (x, y, z), magnetometer (x, y, z), force, and a sample counter. Tablet signal files contain milliseconds, position coordinates (x, y, z), and pressure force signals.

The transcription (labels) file contains the label and the start and stop timestamps of every sample. Additional files with sensor calibration data and recording metadata are also provided.
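
Because every label comes with start and stop timestamps, the sensor stream can be cut into one segment per written sample. The following is a minimal sketch of that idea, not the procedure used in [1]. The function name slice_by_label and the (label, start, stop) tuple layout are hypothetical, and the timestamps are made up; the actual layout of the labels file should be checked before reuse.

```python
from bisect import bisect_left, bisect_right


def slice_by_label(timestamps, labels):
    """Map each labeled sample to the range of sensor rows it covers.

    timestamps: sorted list of sensor timestamps in milliseconds.
    labels: list of (label, start_ms, stop_ms) tuples. This tuple layout is
            hypothetical; the real labels file may be organized differently.
    Returns a list of (label, first_index, end_index), end_index exclusive.
    """
    segments = []
    for label, start_ms, stop_ms in labels:
        first = bisect_left(timestamps, start_ms)  # first row with t >= start
        end = bisect_right(timestamps, stop_ms)    # one past last row with t <= stop
        segments.append((label, first, end))
    return segments


# Made-up timestamps and labels, for illustration only.
timestamps = [0, 10, 20, 30, 40, 50, 60]
labels = [("a", 10, 30), ("b", 40, 60)]
segments = slice_by_label(timestamps, labels)
```

Each returned index pair can be used to slice the rows of the sensor signals file, provided the sensor and label timestamps share the same clock.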

Data Acquisition

The recording process begins by selecting a set of predefined scripts to be written on the tablet surface using the Digipen. The two datasets are made up of the following two recording types:

  • KIHT_TABLET_MIXED consists of 34 samples to be written one by one during a single recording session. It is composed of five groups: 15 characters, 10 words, 5 equations, 2 shapes, and 2 word groups.
  • KIHT_TABLET_MIXED_EXTENDED consists of 57 samples to be written one by one during a single recording session. It is composed of five groups: 30 characters, 10 words, 5 equations, 4 shapes, and 8 word groups.
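
The composition of the two recording types can be written down as a small lookup table, which is handy for checking that a loaded recording contains the expected number of samples. This is only an illustrative sketch: the dictionary name and the group keys are assumptions, and only the counts come from the description above.

```python
# Group sizes per recording type, copied from the description above.
# The dictionary and key names are illustrative, not part of the dataset.
RECORDING_TYPES = {
    "KIHT_TABLET_MIXED": {
        "characters": 15, "words": 10, "equations": 5,
        "shapes": 2, "word_groups": 2,
    },
    "KIHT_TABLET_MIXED_EXTENDED": {
        "characters": 30, "words": 10, "equations": 5,
        "shapes": 4, "word_groups": 8,
    },
}

# Total number of samples expected in one recording session of each type.
EXPECTED_SAMPLES = {
    name: sum(groups.values()) for name, groups in RECORDING_TYPES.items()
}
```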

While recording, the user holds the pen with its on/off switch facing up. This is the natural way to hold the Digipen, because the grip is shaped to guide the fingers into the correct position.


Each Digipen is equipped with five sensors:

  • Front accelerometer (STM LSM6DSL)
  • Gyroscope (STM LSM6DSL)
  • Rear accelerometer (Freescale MMA8451Q)
  • Magnetometer (ALPS HSCDTD008A)
  • Force sensor (ALPS HSFPAR003A)

Sensor Data

The sensors’ raw data stream is provided in the files called sensor_data.csv. Each file consists of 15 columns:

  • Millis: The timestamp when the data were processed on the tablet computer that the pen was connected to during recording
  • Acc1 X, Acc1 Y, Acc1 Z: The values of the front accelerometer in three dimensions
  • Acc2 X, Acc2 Y, Acc2 Z: The values of the rear accelerometer in three dimensions
  • Gyro X, Gyro Y, Gyro Z: The gyroscope values in three dimensions
  • Mag X, Mag Y, Mag Z: The magnetometer values in three dimensions
  • Force: The force with which the pen tip touches the surface
  • Time: A sample counter
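
The column list above can be turned into a small reader for sensor_data.csv. The sketch below uses only the Python standard library. The exact header spelling and the delimiter (";" here) are assumptions that should be checked against a real file, and the function name read_sensor_rows is hypothetical. The demo input is synthetic, not taken from the dataset.

```python
import csv
import io

# Column names as listed above; the exact header spelling and the
# delimiter used in the real files are assumptions.
SENSOR_COLUMNS = [
    "Millis",
    "Acc1 X", "Acc1 Y", "Acc1 Z",
    "Acc2 X", "Acc2 Y", "Acc2 Z",
    "Gyro X", "Gyro Y", "Gyro Z",
    "Mag X", "Mag Y", "Mag Z",
    "Force",
    "Time",
]


def read_sensor_rows(handle, delimiter=";"):
    """Return one dict per data row, mapping column name to a float value."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = [name.strip() for name in next(reader)]
    if header != SENSOR_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    return [dict(zip(header, map(float, row))) for row in reader if row]


# Synthetic two-row stand-in for a real file (values are made up).
_demo = io.StringIO(
    ";".join(SENSOR_COLUMNS) + "\n"
    + ";".join(str(v) for v in range(15)) + "\n"
    + ";".join(str(v) for v in range(15, 30)) + "\n"
)
rows = read_sensor_rows(_demo)
```

For a real recording, the same function can be called on a file opened with open("sensor_data.csv", newline=""), adjusting the delimiter if the header check fails.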


If you use the IRISA-KIHT-S dataset, please cite:

[1] Swaileh, W., Imbert, F., Soullard, Y. et al. Online handwriting trajectory reconstruction from kinematic sensors using temporal convolutional network. IJDAR (2023).


  TITLE = {{Online Handwriting Trajectory Reconstruction from Kinematic Sensors using Temporal Convolutional Network}},
  AUTHOR = {Swaileh, Wassim and Imbert, Florent and Soullard, Yann and Tavenard, Romain and Anquetil, Eric},
  URL = {},
  JOURNAL = {{International Journal on Document Analysis and Recognition}},
  PUBLISHER = {{Springer Verlag}},
  YEAR = {2023},
  KEYWORDS = {Online Handwriting ; Trajectory Reconstruction ; Digital Pen ; Temporal Convolutional Neural Network ; Inertial Measurement Units},
  PDF = {},
  HAL_ID = {hal-04076399},
  HAL_VERSION = {v2},