Description
This dataset contains handwriting from 40 adult writers, recorded on paper. 5 recordings are reserved for the test set and come from writers who do not appear in the training set. This amounts to 2983 examples for training and 281 for the test set.
Each recording session produces a set of files through the data acquisition mobile app. The sensor signals file has 13 columns: milliseconds, rear accelerometer (x, y, z), gyroscope (x, y, z), magnetometer (x, y, z), and force signals. Tablet signal files contain milliseconds, position coordinates (x, y, z), and pressure force signals. Other KIHT datasets contain an additional front accelerometer (x, y, z), which is not taken into account in the experiments when training on paper and tablet data jointly.
The transcription (labels) file contains the label and the start and stop timestamps of every sample. Additional files with the sensor calibration and recording metadata are also provided.
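The start and stop timestamps can be used to cut a recording into one segment per labelled sample. The following is a minimal sketch, not the dataset's official tooling. It assumes the signals are already loaded into a NumPy array whose first column holds the milliseconds, and that the labels are available as `(label, start_ms, stop_ms)` tuples. The function name `segment_by_labels`, the in-memory layout, and the toy values are hypothetical, since the on-disk file format is not described here.

```python
import numpy as np


def segment_by_labels(signals, labels):
    """Cut one recording into per-sample segments.

    signals: 2D array; column 0 is the time in milliseconds (ascending),
             the remaining columns are sensor channels (assumed layout).
    labels:  iterable of (label, start_ms, stop_ms) tuples (assumed layout).
    Returns a list of (label, segment) pairs, start and stop inclusive.
    """
    times = signals[:, 0]
    segments = []
    for label, start_ms, stop_ms in labels:
        # Binary search for the first row at or after start_ms
        # and the first row strictly after stop_ms.
        lo = np.searchsorted(times, start_ms, side="left")
        hi = np.searchsorted(times, stop_ms, side="right")
        segments.append((label, signals[lo:hi]))
    return segments


# Toy recording: 10 rows sampled every 10 ms with 3 made-up channels.
rng = np.random.default_rng(0)
recording = np.column_stack([np.arange(0, 100, 10), rng.normal(size=(10, 3))])
toy_labels = [("a", 10, 30), ("b", 50, 90)]

segments = segment_by_labels(recording, toy_labels)
for label, segment in segments:
    print(label, segment.shape)
```

Using `np.searchsorted` keeps the lookup logarithmic per label, which assumes the timestamps are sorted in ascending order.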
Data Acquisition
The recording process begins by selecting a set of predefined scripts to be written on the tablet surface using the Digipen. While recording, the user holds the pen with the on/off switch facing up. The grips on the pen guide the fingers into this position, so it is the natural way to hold the Digipen. The number of samples per category and split is given below.
| | characters | words | sentences | equations | drawings |
| Train | 1411 | 404 | 276 | 185 | 147 |
| Valid | 334 | 98 | 56 | 42 | 30 |
| Test | 147 | 50 | 39 | 25 | 20 |
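As a quick consistency check, the per-category counts can be summed and compared with the totals quoted in the description. The sketch below copies the numbers from the table. Reading the 2983 training examples as the sum of the Train and Valid rows is an interpretation, although the arithmetic supports it.

```python
# Sample counts copied from the table, in the order:
# characters, words, sentences, equations, drawings.
counts = {
    "Train": [1411, 404, 276, 185, 147],
    "Valid": [334, 98, 56, 42, 30],
    "Test": [147, 50, 39, 25, 20],
}

# Total number of samples per split.
totals = {split: sum(values) for split, values in counts.items()}

# Assumption: the "training" figure in the description covers Train + Valid.
train_plus_valid = totals["Train"] + totals["Valid"]

print(totals)            # {'Train': 2423, 'Valid': 560, 'Test': 281}
print(train_plus_valid)  # 2983
```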
References
If you use the KIHT-Paper dataset, you agree to cite the following reference:
[1] Paper in submission
How to get the dataset?
By downloading the dataset, you agree that it is distributed under the CLIC licence and may be used for research purposes only. To receive the download link, please complete the following contact form.