Article by William Mocaër
Online action detection is a challenging task in computer vision that involves recognizing and localizing human actions in real-time video streams. To facilitate research in this area, several datasets have been developed, each offering unique characteristics and challenges. In this article, we present six datasets widely used for skeleton-based online action detection: G3D, OAD, MSRC-12, MAD, Chalearn, and PKU-MMD. Download links are provided below.
Warning: Only the skeleton modality can be downloaded here. For other modalities, please refer to the original websites mentioned in the corresponding sections below.
All datasets are converted to the same format (see the Format section). Split files (train/validation/test) are also included in the zips.
- Download G3D : https://www.irisa.fr/intuidoc/data/database/G3D.zip
- Download OAD : https://www.irisa.fr/intuidoc/data/database/OAD.zip
- Download MSRC-12 : https://www.irisa.fr/intuidoc/data/database/MSRC12.zip
- Download only the subset MSRC6-IconicC4 : https://www.irisa.fr/intuidoc/data/database/MSRC6_IconicC4.zip
- Download MAD : https://www.irisa.fr/intuidoc/data/database/MAD.zip
- Download Chalearn : https://www.irisa.fr/intuidoc/data/database/Chalearn.zip
- Download PKUMMD : https://www.irisa.fr/intuidoc/data/database/PKUMMD.zip
For untrimmed 2D gesture datasets (ILGDB and MTGSetB), see https://www-shadoc.irisa.fr/mtgsetb-and-ilgdb-untrimmed/
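The skeleton archives listed above can also be fetched from a script. The following is a minimal sketch using only the Python standard library; the names `ARCHIVES`, `archive_url` and `download_and_extract`, as well as the default `datasets` output folder, are hypothetical choices made for this example.

```python
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "https://www.irisa.fr/intuidoc/data/database/"

# Archive names taken from the download links of this article.
ARCHIVES = {
    "G3D": "G3D.zip",
    "OAD": "OAD.zip",
    "MSRC-12": "MSRC12.zip",
    "MSRC6-IconicC4": "MSRC6_IconicC4.zip",
    "MAD": "MAD.zip",
    "Chalearn": "Chalearn.zip",
    "PKUMMD": "PKUMMD.zip",
}


def archive_url(name):
    """Return the download URL of a dataset archive."""
    return BASE_URL + ARCHIVES[name]


def download_and_extract(name, dest="datasets"):
    """Download one archive (if not already present) and unzip it.

    The output layout (dest/<name>/) is an assumption of this sketch.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    zip_path = dest_dir / ARCHIVES[name]
    if not zip_path.exists():
        urllib.request.urlretrieve(archive_url(name), zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir / name)
    return dest_dir / name
```

For instance, `download_and_extract("OAD")` would download OAD.zip into the `datasets` folder and extract it there.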
Format
Data
- 1 file = 1 sequence
- 1 line = 1 frame
- 1 line = X groups of 3 position values (x,y,z), where X is the number of joints. (Traditional order of the Kinect joints; the first joint is the root (hipcenter/spinebase).)
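The data layout above can be read with a few lines of Python. This is a minimal sketch rather than an official loader: the function names `parse_frame` and `load_sequence` are hypothetical, and the value separator is an assumption (whitespace, commas and semicolons are all accepted), since it is not specified here.

```python
import re


def parse_frame(line):
    """Turn one line (= one frame) into a list of (x, y, z) joint tuples."""
    # Assumption: values are separated by whitespace, commas or semicolons.
    values = [float(v) for v in re.split(r"[\s,;]+", line.strip()) if v]
    if len(values) % 3 != 0:
        raise ValueError(f"expected a multiple of 3 values, got {len(values)}")
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


def load_sequence(path):
    """Read one data file (= one sequence) as a list of frames."""
    with open(path, encoding="utf-8") as f:
        return [parse_frame(line) for line in f if line.strip()]


# Toy frame with two joints; the first joint is the root.
frame = parse_frame("0.1 0.2 0.3 1.0 1.1 1.2")
root_joint = frame[0]
```

Each frame is returned as a list whose length is the joint count, so `len(sequence)` gives the number of frames and `len(sequence[0])` the number of joints.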
Label
- 1 file = 1 sequence
- Name of the file is exactly the same as the corresponding data
- 1 line = 1 gesture
- Each line is made of 3 or 4 values: “Class id, start frame, end frame[, action point frame]”.
For PKU-MMD, the 4th element is a “confidence” value instead. As stated on the original website:
"Note that confidence is either 1 or 2 for slight and strong recommendation respectively."
https://www.icst.pku.edu.cn/struct/Projects/PKUMMD.html
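A label file can be parsed and expanded into per-frame labels along the following lines. This is a sketch under stated assumptions: values are taken to be comma-separated integers, frames are taken to be 0-based with an inclusive end frame, and the names `Gesture`, `parse_label_line` and `frame_labels` are hypothetical. Check the indexing against the actual files before relying on it.

```python
from typing import NamedTuple, Optional


class Gesture(NamedTuple):
    class_id: int
    start: int
    end: int
    # Action point frame, or confidence (1 or 2) for PKU-MMD.
    extra: Optional[int] = None


def parse_label_line(line):
    """Parse one label line made of 3 or 4 comma-separated values."""
    fields = [p.strip() for p in line.split(",")]
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 values, got {len(fields)}")
    return Gesture(*(int(float(v)) for v in fields))


def frame_labels(gestures, n_frames):
    """Build one class id per frame; 0 means "nothing"."""
    labels = [0] * n_frames
    for g in gestures:
        # Assumption: 0-based frame indices, end frame inclusive.
        for t in range(max(g.start, 0), min(g.end, n_frames - 1) + 1):
            labels[t] = g.class_id
    return labels


gestures = [parse_label_line("3,2,4,3")]
per_frame = frame_labels(gestures, 6)
```

Since class id 0 never appears in the label files, it can safely stand for unlabeled frames in the per-frame representation.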
Actions.csv
contains the list of the action classes
- 1 line = 1 class
- id 0 is always “nothing” (it is never used to label anything in the label files)
- format of a line is
id;class name
More than one Actions.csv can be provided if necessary. For example, an “Action1pers.csv” is provided with PKU-MMD, giving new ids for the actions that concern only the 1-skeleton sequence subset. A “Label1pers” folder is then provided with the corresponding ids.
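The “id;class name” lines can be turned into a lookup table as sketched below. The function name `parse_actions` and the class names in the example are hypothetical, and the sketch assumes the file has no header line.

```python
def parse_actions(lines):
    """Map each class id to its name from "id;class name" lines."""
    classes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        class_id, name = line.split(";", 1)
        classes[int(class_id)] = name.strip()
    return classes


# Illustrative content only; real class names come from Actions.csv.
classes = parse_actions(["0;nothing", "1;wave", "2;clap"])
```

With a real file, the same function can be called on an open file object, for example `parse_actions(open("Actions.csv", encoding="utf-8"))`.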
Split files
A split file contains 4 or 6 lines listing the files that should be used for training, [validation,] and testing. File names are separated by a comma “,”.
Example :
Train files: 1.txt,2.txt, ... [Validation files : 3.txt, .... ] Test files: 4.txt, 8.txt, ....
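A split file can be read as sketched below. The exact line layout is an assumption here: the parser looks for the “Train files”, “Validation files” and “Test files” labels and accepts the file list either on the same line or on the following one. The name `parse_split` is hypothetical.

```python
import re

_SECTION = re.compile(r"(Train|Validation|Test) files\s*:", re.IGNORECASE)


def parse_split(text):
    """Return {"train": [...], "validation": [...], "test": [...]}.

    The "validation" key is absent when the split file has no such section.
    """
    split = {}
    matches = list(_SECTION.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end]
        names = [name.strip() for name in re.split(r"[,\n]", body)]
        split[m.group(1).lower()] = [name for name in names if name]
    return split


example = "Train files:\n1.txt,2.txt\nValidation files :\n3.txt\nTest files:\n4.txt, 8.txt\n"
split = parse_split(example)
# Sanity check: no file should appear in both training and test sets.
overlap = set(split["train"]) & set(split["test"])
```

The overlap check is a cheap way to verify that a custom split does not leak test sequences into training.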
Dataset descriptions
G3D Dataset
Full name : Gaming 3D
Paper :
V. Bloom, D. Makris and V. Argyriou, "G3D: A gaming action dataset and real time action recognition evaluation framework," 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 2012, pp. 7-12, doi: 10.1109/CVPRW.2012.6239175.
Original link : http://velastin.dynu.com/G3D/G3D.html
20 actions in total, separated into 7 categories: fighting, golf, tennis, bowling, FPS, driving, and miscellaneous actions. Fighting is the most used category in the literature.
The dataset consists of 20 gaming actions performed by 10 subjects.
Each of the 30 sequences in the fighting category contains the same five actions, always in the same order.
G3D has both frame-level and action point annotations.
To strengthen the evaluation and further challenge recognition systems, we propose an extended “re-arranged” test set. The new sequences are identified as “test….” in the data. Make sure not to use the corresponding original sequences for training, since the new test sequences are generated from these 3 original sequences. See splitFighting_unbiased for a correct split.
Some errors, described in the PDF included in the .zip, have been corrected in our version of the dataset.
Download G3D : https://www.irisa.fr/intuidoc/data/database/G3D.zip
OAD
Full name : Online Action Detection
Paper :
Li, Y., Lan, C., Xing, J., Zeng, W., Yuan, C., Liu, J. (2016). Online Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds) Computer Vision – ECCV 2016. ECCV 2016. Lecture Notes in Computer Science, vol 9911. Springer, Cham. https://doi.org/10.1007/978-3-319-46478-7_13
Original link : https://www.icst.pku.edu.cn/struct/Projects/OAD.html
Collected with Kinect v2, 10 classes, 700 action instances, 59 sequences, 8 fps.
Download OAD : https://www.irisa.fr/intuidoc/data/database/OAD.zip
MSRC-12 and MSRC6_IconicC4
Full name : Microsoft Research Cambridge-12
Paper :
Simon Fothergill, Helena Mentis, Pushmeet Kohli, and Sebastian Nowozin. 2012. Instructing people for training gestural interactive systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12). Association for Computing Machinery, New York, NY, USA, 1737–1746.
Original link: https://www.microsoft.com/en-us/download/details.aspx?id=52283
5 instruction modalities:
- Images
- Text
- Video
- Images + Text
- Video + Text
12 classes, 30 subjects, 2 categories: iconic and metaphoric. 30 fps.
The MSRC6_IconicC4 subset contains only the iconic category recorded with the C4 instruction modality (Video + Text). It has been used in several experiments in the literature, such as early recognition (Boulahia et al. 2018, RFIAP), and by Bloom et al.
Download MSRC-12 : https://www.irisa.fr/intuidoc/data/database/MSRC12.zip
Download only the subset MSRC6-IconicC4 : https://www.irisa.fr/intuidoc/data/database/MSRC6_IconicC4.zip
MAD
Full name: Multi-Modal Action Detection
Paper:
Huang, D., Yao, S., Wang, Y., De La Torre, F. (2014). Sequential Max-Margin Event Detectors. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_27
Original link : http://humansensing.cs.cmu.edu/mad/download.html
40 sequences. 35 actions, always in the same order.
To strengthen the evaluation and further challenge recognition systems, we propose an extended “re-arranged” test set. The new sequences are identified as “test….” in the data. Make sure not to use the corresponding original sequences for training. See split_unbiased for a correct split.
Download MAD : https://www.irisa.fr/intuidoc/data/database/MAD.zip
Chalearn (2013) / Montalbano V1
Full name : ChaLearn Gesture dataset (2013). It seems to have been renamed Montalbano V1 later.
Paper:
Sergio Escalera, Jordi Gonzàlez, Xavier Baró, Miguel Reyes, Oscar Lopes, Isabelle Guyon, Vassilis Athitsos, and Hugo Escalante. 2013. Multi-modal gesture recognition challenge 2013: dataset and results. In Proceedings of the 15th ACM on International conference on multimodal interaction (ICMI '13). Association for Computing Machinery, New York, NY, USA, 445–452. https://doi.org/10.1145/2522848.2532595
Original link: http://sunai.uoc.edu/chalearn [no longer available]
Current link: https://chalearnlap.cvc.uab.cat/dataset/12/data/8/description/
- 27 users
- 20 Italian gesture classes
- 20 fps
- the gestures are performed in continuous sequences lasting 1-2 minutes (8-20 actions)
- a single user is recorded
- Number of sequences:
  - development: 393 (7,754 gestures)
  - validation: 287 (3,362 gestures)
  - test: 276 (2,742 gestures) → no per-frame annotation, not used in most works.
Download Chalearn : https://www.irisa.fr/intuidoc/data/database/Chalearn.zip
PKU-MMD
Full name: PKU (Peking University) Multi-Modality Dataset
Paper:
Chunhui Liu, Yueyu Hu, Yanghao Li, Sijie Song, and Jiaying Liu. PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding. arXiv preprint arXiv:1703.07475, 2017.
Original link: https://www.icst.pku.edu.cn/struct/Projects/PKUMMD.html
The largest action detection dataset for 3D data:
- 51 actions, decomposed into 43 single-skeleton classes and 8 two-skeleton classes.
- 30 fps
- Kinect V2
- 1076 long sequences in total, with 3 views recorded for the same sequences
- 66 subjects (57 for training and 9 for testing in the cross-subject protocol)
- Two protocols: cross-view and cross-subject.
Download PKUMMD : https://www.irisa.fr/intuidoc/data/database/PKUMMD.zip
Contact
For any question, contact William Mocaër or Eric Anquetil
william.mocaer@irisa.fr, eric.anquetil@irisa.fr