IRISA-KIHT-S and KIHT-Public Datasets

Digital devices can help pupils and teachers in the learning process by promoting active learning techniques and providing immediate feedback. The e-learning literature shows that computer-based analysis of handwriting can be accurate, sensitive, and reliable enough to produce relevant and consistent feedback for correction or guidance.

The IRISA-KIHT-S dataset was presented in [1] for the task of handwriting reconstruction from sensor data. The sensor data come from a digital pen, the STABILO Digipen. Note that these data can also be used for classification purposes.

Conditions of Use

1. Purpose and Scope

  • 1.1 The database is provided for research purposes only.
  • 1.2 Users must agree to use the database solely for academic, educational, or scientific research. Commercial use is strictly prohibited unless explicitly authorized in writing.

2. Citations

  • 2.1 For publications using the IRISA-KIHT-S database, please cite reference [1].
  • 2.2 For publications using the KIHT-Public database, please cite reference [2].

Datasets Description

The IRISA-KIHT-S dataset comprises 30 recordings and the KIHT-Public dataset comprises 149 recordings.

Every recording session generates files from the data acquisition mobile app. The sensor signals file has 15 columns: milliseconds, front accelerometer (x, y, z), rear accelerometer (x, y, z), gyroscope (x, y, z), magnetometer (x, y, z), force, and a sample counter. Tablet signal files contain milliseconds, position coordinates (x, y, z), and pressure force signals.

The transcription (labels) file contains the label and the start and stop timestamps for every sample. Additional files with the sensor calibration and recording metadata are provided.
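
The start and stop timestamps make it possible to cut the sensor stream into one segment per written sample. The Python sketch below illustrates one way this could be done. It is not the authors' tooling: the function name `segment_by_labels`, the `(label, start, stop)` tuple layout, and the assumption that label timestamps share the sensor file's millisecond clock are all hypothetical and should be checked against the actual files.

```python
from bisect import bisect_left, bisect_right


def segment_by_labels(timestamps, labels):
    """Map each label to a half-open row range [first, end) of the sensor stream.

    timestamps: sensor timestamps in milliseconds, sorted ascending.
    labels: iterable of (label, start_ms, stop_ms) tuples (assumed layout).
    """
    segments = []
    for label, start, stop in labels:
        first = bisect_left(timestamps, start)   # first row with t >= start
        end = bisect_right(timestamps, stop)     # one past last row with t <= stop
        segments.append((label, first, end))
    return segments


# Synthetic values for illustration only, not taken from the datasets.
timestamps = [0, 10, 20, 30, 40, 50, 60]
labels = [("a", 10, 30), ("b", 40, 60)]
segments = segment_by_labels(timestamps, labels)
```

Each returned range can then be used to slice the rows of the sensor file for the corresponding sample.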

Data Acquisition

The recording process begins by selecting a set of predefined scripts to be written on the tablet surface using the Digipen. Both datasets are made up of the following two recording types:

  • KIHT_TABLET_MIXED consists of 34 samples to be written one by one during a single recording session. It is composed of five groups: 15 characters, 10 words, 5 equations, 2 shapes and 2 word groups.
  • KIHT_TABLET_MIXED_EXTENDED consists of 57 samples to be written one by one during a single recording session. It is composed of five groups: 30 characters, 10 words, 5 equations, 4 shapes and 8 word groups.
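
The composition of the two recording types can be written down as a small lookup table, for example to check that a parsed recording contains the expected number of samples. The sketch below uses only the counts stated above; the dictionary name `RECORDING_TYPES` and the group keys are illustrative choices, not identifiers from the datasets.

```python
# Group sizes per recording type, as stated in the list above.
RECORDING_TYPES = {
    "KIHT_TABLET_MIXED": {
        "characters": 15,
        "words": 10,
        "equations": 5,
        "shapes": 2,
        "word groups": 2,
    },
    "KIHT_TABLET_MIXED_EXTENDED": {
        "characters": 30,
        "words": 10,
        "equations": 5,
        "shapes": 4,
        "word groups": 8,
    },
}

# Total number of samples expected in one recording session of each type.
totals = {name: sum(groups.values()) for name, groups in RECORDING_TYPES.items()}
```

The totals come out to 34 and 57 samples, matching the figures given for each recording type.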

While recording, the user holds the pen with its on/off switch facing up. This is the natural way to hold the Digipen, because the grips on the pen are designed to position the fingers properly.

Sensors

Each Digipen is equipped with five sensors.

  • Front accelerometer (STM LSM6DSL)
  • Gyroscope (STM LSM6DSL)
  • Rear accelerometer (Freescale MMA8451Q)
  • Magnetometer (ALPS HSCDTD008A)
  • Force sensor (ALPS HSFPAR003A)

Sensor Data

The sensors’ raw data stream is provided in the files called sensor_data.csv. Each file consists of 15 columns:

  • Millis: The timestamp when the data were processed on the tablet computer that the pen was connected to during recording
  • Acc1 X, Acc1 Y, Acc1 Z: The values of the front accelerometer in three dimensions
  • Acc2 X, Acc2 Y, Acc2 Z: The values of the rear accelerometer in three dimensions
  • Gyro X, Gyro Y, Gyro Z: The gyroscope values in three dimensions
  • Mag X, Mag Y, Mag Z: The magnetometer values in three dimensions
  • Force: The force with which the pen tip touches the surface
  • Time: A sample counter
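
As a minimal sketch of how such a file could be read, the Python snippet below parses a sensor_data.csv stream using only the standard library. The column names follow the list above, but the exact header spelling, the comma delimiter, and the sample values in the demo are assumptions made for illustration; the helper `read_sensor_rows` is hypothetical and should be adapted to the actual files.

```python
import csv
import io

# Column names as listed above; the exact header spelling in the real
# files is an assumption.
SENSOR_COLUMNS = [
    "Millis",
    "Acc1 X", "Acc1 Y", "Acc1 Z",
    "Acc2 X", "Acc2 Y", "Acc2 Z",
    "Gyro X", "Gyro Y", "Gyro Z",
    "Mag X", "Mag Y", "Mag Z",
    "Force",
    "Time",
]


def read_sensor_rows(stream, delimiter=","):
    """Parse a sensor_data.csv stream into a list of {column: float} dicts."""
    reader = csv.reader(stream, delimiter=delimiter)
    header = [name.strip() for name in next(reader)]
    if header != SENSOR_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    # Skip empty lines; convert every field to float.
    return [dict(zip(header, map(float, row))) for row in reader if row]


# Synthetic two-row file for illustration only, not real sensor values.
demo = io.StringIO(
    ",".join(SENSOR_COLUMNS) + "\n"
    "1000,1,2,3,4,5,6,7,8,9,10,11,12,0.5,0\n"
    "1010,1,2,3,4,5,6,7,8,9,10,11,12,0.7,1\n"
)
rows = read_sensor_rows(demo)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory demo stream.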

Citation

If you use the IRISA-KIHT-S or KIHT-Public dataset, please cite:

[1] Swaileh, W., Imbert, F., Soullard, Y. et al. Online handwriting trajectory reconstruction from kinematic sensors using temporal convolutional network. IJDAR (2023).

[2] Florent Imbert, Eric Anquetil, Yann Soullard, Romain Tavenard. Mixture-of-experts for handwriting trajectory reconstruction from IMU sensors. Pattern Recognition, 2024, 161, pp.111231. ⟨10.1016/j.patcog.2024.111231⟩. ⟨hal-04811975⟩.