List of datasets in computer vision and image processing
This is a list of datasets for machine-learning research in computer vision and image processing. It is part of the list of datasets for machine-learning research. These datasets consist primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.
Object detection and recognition
3D Objects
A 2015 review covers 33 datasets of 3D objects, and a 2022 review covers more datasets.
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created | Reference | Creator |
| Princeton Shape Benchmark | 3D polygonal models collected from the Internet | | 1814 models in 92 categories | 3D polygonal models, categories | shape-based retrieval and analysis | 2004 | Shilane et al. | |
| Berkeley 3-D Object Dataset | Depth and color images collected from crowdsourced Microsoft Kinect users. Annotated in 50 object categories. | | 849 images, in 75 scenes | color image, depth image, object class, bounding boxes, 3D center points | Predict bounding boxes | 2011, updated 2014 | Janoch et al. | |
| ShapeNet | 3D models. Some are classified into WordNet synsets, like ImageNet. Partially classified into 3,135 categories. | | 3,000,000 models, 220,000 of which are classified. | 3D models, class labels | Predict class label. | 2015 | Chang et al. | |
| ObjectNet3D | Images, 3D shapes, and objects in 100 categories. | | 90127 images, 201888 objects, 44147 3D shapes | images, 3D shapes, object bounding boxes, category labels | recognizing the 3D pose and 3D shape of objects from 2D images | 2016 | Xiang et al. | |
| Common Objects in 3D | Video frames from videos capturing objects from 50 MS-COCO categories, filmed by people on Amazon Mechanical Turk. | | 6 million frames from 40000 videos | multi-view images, camera poses, 3D point clouds, object category | Predict object category. Generate objects. | 2021, updated 2022 as CO3Dv2 | Meta AI | |
| Google Scanned Objects | Scanned objects in SDF format. | | over 1,000 | | | 2022 | Google AI | |
| Objaverse-XL | 3D objects | | over 10 million | 3D objects, metadata | novel view synthesis, 3D object generation | 2023 | Deitke et al. | |
| OmniObject3D | Scanned objects, labelled in 190 daily categories | | 6,000 | textured meshes, point clouds, multiview images, videos | robust 3D perception, novel-view synthesis, surface reconstruction, 3D object generation | 2023 | Wu et al. | |
| UnCommon Objects in 3D | Objects from 1,070 categories in the LVIS taxonomy | | | | | 2025 | Meta AI | |
Object detection and recognition for autonomous vehicles
| Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created | Reference | Creator |
| Cityscapes Dataset | Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included. | Pixel-level segmentation and labeling | 25,000 | Images, text | Classification, object detection | 2016 | Daimler AG et al. | |
| German Traffic Sign Detection Benchmark Dataset | Images from vehicles of traffic signs on German roads. These signs comply with UN standards and therefore are the same as in other countries. | Signs manually labeled | 900 | Images | Classification | 2013 | S. Houben et al. | |
| KITTI Vision Benchmark Dataset | Autonomous vehicles driving through a mid-size city captured images of various areas using cameras and laser scanners. | Many benchmarks extracted from data. | >100 GB of data | Images, text | Classification, object detection | 2012 | A. Geiger et al. | |
| FieldSAFE | Multi-modal dataset for obstacle detection in agriculture including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization. | Classes labelled geographically. | >400 GB of data | Images and 3D point clouds | Classification, object detection, object localization | 2017 | M. Kragh et al. | |
| Daimler Monocular Pedestrian Detection dataset | It is a dataset of pedestrians in urban environments. | Pedestrians are box-wise labeled. | Labeled part contains 15560 samples with pedestrians and 6744 samples without. Test set contains 21790 images without labels. | Images | Object recognition and classification | 2006 | Daimler AG | |
| CamVid | The Cambridge-driving Labeled Video Database is a collection of videos. | The dataset is labeled with semantic labels for 32 semantic classes. | over 700 images | Images | Object recognition and classification | 2008 | Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla | |
| RailSem19 | RailSem19 is a dataset for understanding scenes for vision systems on railways. | The dataset is labeled semantically and box-wise. | 8500 | Images | Object recognition and classification, scene recognition | 2019 | Oliver Zendel, Markus Murschitz, Marcel Zeilinger, Daniel Steininger, Sara Abbasi, Csaba Beleznai | |
| BOREAS | BOREAS is a multi-season autonomous driving dataset. It includes data from a Velodyne Alpha-Prime lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS. | The data is annotated with 3D bounding boxes. | 350 km of driving data | Images, Lidar and Radar data | Object recognition and classification, scene recognition | 2023 | Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y.K. Leung, Angela P. Schoellig, Timothy D. Barfoot | |
| Bosch Small Traffic Lights Dataset | It is a dataset of traffic lights. | The labeling includes bounding boxes of traffic lights together with their state. | 5000 images for training and a video sequence of 8334 frames for evaluation | Images | Traffic light recognition | 2017 | Karsten Behrendt, Libor Novak, Rami Botros | |
| FRSign | It is a dataset of French railway signals. | The labeling includes bounding boxes of railway signals together with their state. | more than 100000 | Images | Railway signal recognition | 2020 | Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri | |
| GERALD | It is a dataset of German railway signals. | The labeling includes bounding boxes of railway signals together with their state. | 5000 | Images | Railway signal recognition | 2023 | Philipp Leibner, Fabian Hampel, Christian Schindler | |
| Multi-cue pedestrian | Multi-cue onboard pedestrian detection dataset is a dataset for detection of pedestrians. | The dataset is labeled box-wise. | 1092 image pairs with 1776 boxes for pedestrians | Images | Object recognition and classification | 2009 | Christian Wojek, Stefan Walk, Bernt Schiele | |
| RAWPED | RAWPED is a dataset for detection of pedestrians in the context of railways. | The dataset is labeled box-wise. | 26000 | Images | Object recognition and classification | 2020 | Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver | |
| OSDaR23 | OSDaR23 is a multi-sensory dataset for detection of objects in the context of railways. | The dataset is labeled box-wise. | 16874 frames | Images, Lidar, Radar and Infrared | Object recognition and classification | 2023 | Roman Tilly, Rustam Tagiew, Pavel Klasek; Philipp Neumaier, Patrick Denzler, Tobias Klockau, Martin Boekhoff, Martin Köppel; Karsten Schwalbe | |
| Argoverse | Argoverse is a multi-sensory dataset for detection of objects in the context of roads. | The dataset is annotated box-wise. | 320 hours of recording | Data from 7 cameras and LiDAR | Object recognition and classification, object tracking | 2022 | Argo AI, Carnegie Mellon University, Georgia Institute of Technology | |
| Rail3D | Rail3D is a LiDAR dataset for railways recorded in Hungary, France, and Belgium | The dataset is annotated semantically | 288 million annotated points | LiDAR | Object recognition and classification, object tracking | 2024 | Abderrazzaq Kharroubi, Ballouch Zouhair, Rafika Hajji, Anass Yarroudh, and Roland Billen; University of Liège and Hassan II Institute of Agronomy and Veterinary Medicine | |
| WHU-Railway3D | WHU-Railway3D is a LiDAR dataset for urban, rural, and plateau railways recorded in China | The dataset is annotated semantically | 4.6 billion annotated data points | LiDAR | Object recognition and classification, object tracking | 2024 | Bo Qiu, Yuzhou Zhou, Lei Dai; Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, Bisheng Yang; Wuhan University, University of Oxford, Hong Kong Polytechnic University, Nanyang Technological University, Xiamen University and Tsinghua University | |
| RailFOD23 | A dataset of foreign objects on railway catenary | The dataset is annotated boxwise | 14,615 images | Images | Object recognition and classification, object tracking | 2024 | Zhichao Chen, Jie Yang, Zhicheng Feng, Hao Zhu; Jiangxi University of Science and Technology | |
| ESRORAD | A dataset of images and point clouds for urban road and rail scenes from Le Havre and Rouen | The dataset is annotated boxwise | 2,700 k virtual images and 100,000 real images | Images, LiDAR | Object recognition and classification, object tracking | 2022 | Redouane Khemmar, Antoine Mauri, Camille Dulompont, Jayadeep Gajula, Vincent Vauchey, Madjid Haddad and Rémi Boutteau; Le Havre Normandy University and SEGULA Technologies | |
| RailVID | Data recorded by AT615X infrared thermography from InfiRay in diverse railway scenarios, including carport, depot, and straight. | The dataset is annotated semantically | 1,071 images | infrared images | Object recognition and classification, object tracking | 2022 | Hao Yuan, Zhenkun Mei, Yihao Chen, Weilong Niu, Cheng Wu; Soochow University | |
| RailPC | LiDAR dataset in the context of railways | The dataset is annotated semantically | 3 billion data points | LiDAR | Object recognition and classification, object tracking | 2024 | Tengping Jiang, Shiwei Li, Qinyu Zhang, Guangshuai Wang, Zequn Zhang, Fankun Zeng, Peng An, Xin Jin, Shan Liu, Yongjun Wang; Nanjing Normal University, Ministry of Natural Resources, Eastern Institute of Technology, Tianjin Key Laboratory of Rail Transit Navigation Positioning and Spatio‐temporal Big Data Technology, Northwest Normal University, Washington University in St. Louis and Ningbo University of Technology | |
| RailCloud-HdF | LiDAR dataset in the context of railways | The dataset is annotated semantically | 8060.3 million data points | LiDAR | Object recognition and classification, object tracking | 2024 | Mahdi Abid, Mathis Teixeira, Ankur Mahtani and Thomas Laurent; Railenium | |
| RailGoerl24 | RGB and LiDAR dataset in the context of railways | The dataset is annotated boxwise | 12205 HD RGB frames and 383922305 LiDAR colored cloud points | RGB, LiDAR | Person recognition and classification | 2025 | Rustam Tagiew, Ilkay Wunderlich, Philipp Zanitzer, Mark Sastuba, Carsten Knoll, Kilian Göller, Haadia Amjad, Steffen Seitz | |
| MRSI | RGB and Infrared dataset in the context of railways | The dataset is annotated boxwise and pixelwise, eleven classes including background | 23000 RGB images and 4000 infrared images | RGB, Infrared | Object recognition and classification | 2022 | Yihao Chen, Ning Zhu, Qian Wu, Cheng Wu, Weilong Niu and Yiming Wang | |
| RailDriVE February 2019 | Data set for rail vehicle positioning experiments | The dataset is not annotated | 26:46 min of back-and-forth driving on a 1.2 km track segment | GNSS, IMU, Speed/distance sensors, RGB | Localization and mapping | 2019 | Hanno Winter, Michael Helmut Roth | |
Facial recognition
In computer vision, face images have been used extensively to develop facial recognition systems, face detection, and many other projects that use images of faces. A curated list of face datasets, focused on the pre-2005 period, has also been published.
| Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created | Reference | Creator |
| Labeled Faces in the Wild | Images of named individuals obtained by Internet search. | frontal face detection, bounding box cropping | 13233 images of 5749 named individuals | images, labels | unconstrained face recognition | 2008 | Huang et al. | |
| Aff-Wild | 298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions | the detected faces, facial landmarks and valence-arousal annotations | ~1,250,000 manually annotated images | video | affect recognition | 2017 | CVPR IJCV | D. Kollias et al. |
| Aff-Wild2 | 558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of i) categorical affect, ii) dimensional affect, iii) action units; in-the-wild setting; color database; various resolutions | the detected faces, detected and aligned faces and annotations | ~2,800,000 manually annotated images | video | affect recognition | 2019 | BMVC FG | D. Kollias et al. |
| FERET (facial recognition technology) | 11338 images of 1199 individuals in different positions and at different times. | None. | 11,338 | Images | Classification, face recognition | 2003 | United States Department of Defense | |
| Ryerson Audio-Visual Database of Emotional Speech and Song | 7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities. | Files labelled with expression. Perceptual validation ratings provided by 319 raters. | 7,356 | Video, sound files | Classification, face recognition, voice recognition | 2018 | S.R. Livingstone and F.A. Russo | |
| SCFace | Color images of faces at various angles. | Location of facial features extracted. Coordinates of features given. | 4,160 | Images, text | Classification, face recognition | 2011 | M. Grgic et al. | |
| Yale Face Database | Faces of 15 individuals in 11 different expressions. | Labels of expressions. | 165 | Images | Face recognition | 1997 | J. Yang et al. | |
| Cohn-Kanade AU-Coded Expression Database | Large database of images with labels for expressions. | Tracking of certain facial features. | 500+ sequences | Images, text | Facial expression analysis | 2000 | T. Kanade et al. | |
| JAFFE Facial Expression Database | 213 images of 7 facial expressions posed by 10 Japanese female models. | Images are cropped to the facial region. Includes semantic ratings data on emotion labels. | 213 | Images, text | Facial expression cognition | 1998 | Lyons, Kamachi, Gyoba | |
| FaceScrub | Images of public figures scrubbed from image searching. | Name and m/f annotation. | 107,818 | Images, text | Face recognition | 2014 | H. Ng et al. | |
| BioID Face Database | Images of faces with eye positions marked. | Manually set eye positions. | 1521 | Images, text | Face recognition | 2001 | BioID | |
| Skin Segmentation Dataset | Randomly sampled color values from face images. | B, G, R values extracted. | 245,057 | Text | Segmentation, classification | 2012 | R. Bhatt | |
| Bosphorus | 3D Face image database. | 34 action units and 6 expressions labeled; 24 facial landmarks labeled. | 4652 | Images, text | Face recognition, classification | 2008 | A Savran et al. | |
| UOY 3D-Face | neutral face, 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised. | labeling. | 5250 | Images, text | Face recognition, classification | 2004 | University of York | |
| CASIA 3D Face Database | Expressions: Anger, smile, laugh, surprise, closed eyes. | None. | 4624 | Images, text | Face recognition, classification | 2007 | Institute of Automation, Chinese Academy of Sciences | |
| CASIA NIR | Expressions: anger, disgust, fear, happiness, sadness, surprise. | None. | 480 | Annotated Visible Spectrum and Near Infrared Video captures at 25 frames per second | Face recognition, classification | 2011 | Zhao, G. et al. | |
| BU-3DFE | neutral face, and 6 expressions: anger, happiness, sadness, surprise, disgust, fear. 3D images extracted. | None. | 2500 | Images, text | Facial expression recognition, classification | 2006 | Binghamton University | |
| Face Recognition Grand Challenge Dataset | Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D Data. | None. | 4007 | Images, text | Face recognition, classification | 2004 | National Institute of Standards and Technology | |
| Gavabdb | Up to 61 samples for each subject. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images. | None. | 549 | Images, text | Face recognition, classification | 2008 | King Juan Carlos University | |
| 3D-RMA | Up to 100 subjects, expressions mostly neutral. Several poses as well. | None. | 9971 | Images, text | Face recognition, classification | 2004 | Royal Military Academy (Belgium) | |
| SoF | Images of 112 people wearing glasses under different illumination conditions. | A set of synthetic filters with different levels of difficulty. | 42,592 | Images, Mat file | Gender classification, face detection, face recognition, age estimation, and glasses detection | 2017 | Afifi, M. et al. | |
| IMDb-WIKI | IMDb and Wikipedia face images with gender and age labels. | - | 523,051 | Images | Gender classification, face detection, face recognition, age estimation | 2015 | R. Rothe, R. Timofte, L. V. Gool | |