List of datasets in computer vision and image processing


This is a list of image and video datasets for machine learning research. It is part of the list of datasets for machine-learning research. These datasets consist primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.

Object detection and recognition

3D Objects

A 2015 review covers 33 datasets of 3D objects, and a 2022 review covers more.
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created | Reference | Creator
Princeton Shape Benchmark | 3D polygonal models collected from the Internet | | 1814 models in 92 categories | 3D polygonal models, categories | Shape-based retrieval and analysis | 2004 | | Shilane et al.
Berkeley 3-D Object Dataset | Depth and color images collected from crowdsourced Microsoft Kinect users | Annotated in 50 object categories | 849 images, in 75 scenes | Color image, depth image, object class, bounding boxes, 3D center points | Predict bounding boxes | 2011, updated 2014 | | Janoch et al.
ShapeNet | 3D models. Some are classified into WordNet synsets, like ImageNet. | Partially classified into 3,135 categories | 3,000,000 models, 220,000 of which are classified | 3D models, class labels | Predict class label | 2015 | | Chang et al.
ObjectNet3D | Images, 3D shapes, and objects in 100 categories | | 90,127 images, 201,888 objects, 44,147 3D shapes | Images, 3D shapes, object bounding boxes, category labels | Recognizing the 3D pose and 3D shape of objects from 2D images | 2016 | | Xiang et al.
Common Objects in 3D | Video frames from videos capturing objects from 50 MS-COCO categories, filmed by people on Amazon Mechanical Turk | | 6 million frames from 40,000 videos | Multi-view images, camera poses, 3D point clouds, object category | Predict object category; generate objects | 2021, updated 2022 as CO3Dv2 | | Meta AI
Google Scanned Objects | Scanned objects in SDF format | | Over 1,000 | | | 2022 | | Google AI
Objaverse-XL | 3D objects | | Over 10 million | 3D objects, metadata | Novel view synthesis, 3D object generation | 2023 | | Deitke et al.
OmniObject3D | Scanned objects, labeled in 190 daily categories | | 6,000 | Textured meshes, point clouds, multiview images, videos | Robust 3D perception, novel-view synthesis, surface reconstruction, 3D object generation | 2023 | | Wu et al.
UnCommon Objects in 3D | Objects in 1,070 categories from LVIS | | | | | 2025 | | Meta AI

Object detection and recognition for autonomous vehicles

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created | Reference | Creator
Cityscapes Dataset | Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included. | Pixel-level segmentation and labeling | 25,000 | Images, text | Classification, object detection | 2016 | | Daimler AG et al.
German Traffic Sign Detection Benchmark Dataset | Images of traffic signs on German roads, taken from vehicles. The signs comply with UN standards and are therefore the same as in other countries that follow them. | Signs manually labeled | 900 | Images | Classification | 2013 | | S. Houben et al.
KITTI Vision Benchmark Dataset | Images of various areas of a mid-size city, captured with cameras and laser scanners mounted on a vehicle driving through it | Many benchmarks extracted from the data | >100 GB of data | Images, text | Classification, object detection | 2012 | | A. Geiger et al.
FieldSAFE | Multi-modal dataset for obstacle detection in agriculture, including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization data | Classes labeled geographically | >400 GB of data | Images and 3D point clouds | Classification, object detection, object localization | 2017 | | M. Kragh et al.
Daimler Monocular Pedestrian Detection dataset | A dataset of pedestrians in urban environments | Pedestrians are labeled with bounding boxes | Labeled part contains 15,560 samples with pedestrians and 6,744 samples without; test set contains 21,790 images without labels | Images | Object recognition and classification | 2006 | | Daimler AG
CamVid | The Cambridge-driving Labeled Video Database, a collection of videos | Labeled with semantic labels for 32 semantic classes | Over 700 images | Images | Object recognition and classification | 2008 | | Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla
RailSem19 | A dataset for scene understanding for vision systems on railways | Labeled semantically and with bounding boxes | 8,500 | Images | Object recognition and classification, scene recognition | 2019 | | Oliver Zendel, Markus Murschitz, Marcel Zeilinger, Daniel Steininger, Sara Abbasi, Csaba Beleznai
BOREAS | A multi-season autonomous driving dataset with data from a Velodyne Alpha-Prime lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS | Annotated with 3D bounding boxes | 350 km of driving data | Images, lidar and radar data | Object recognition and classification, scene recognition | 2023 | | Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y.K. Leung, Angela P. Schoellig, Timothy D. Barfoot
Bosch Small Traffic Lights Dataset | A dataset of traffic lights | Bounding boxes of traffic lights together with their state | 5,000 images for training and a video sequence of 8,334 frames for evaluation | Images | Traffic light recognition | 2017 | | Karsten Behrendt, Libor Novak, Rami Botros
FRSign | A dataset of French railway signals | Bounding boxes of railway signals together with their state | More than 100,000 | Images | Railway signal recognition | 2020 | | Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri
GERALD | A dataset of German railway signals | Bounding boxes of railway signals together with their state | 5,000 | Images | Railway signal recognition | 2023 | | Philipp Leibner, Fabian Hampel, Christian Schindler
Multi-cue pedestrian | Multi-cue onboard pedestrian detection dataset, for detection of pedestrians | Labeled with bounding boxes | 1,092 image pairs with 1,776 pedestrian boxes | Images | Object recognition and classification | 2009 | | Christian Wojek, Stefan Walk, Bernt Schiele
RAWPED | A dataset for detection of pedestrians in the context of railways | Labeled with bounding boxes | 26,000 | Images | Object recognition and classification | 2020 | | Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver
OSDaR23 | A multi-sensor dataset for detection of objects in the context of railways | Labeled with bounding boxes | 16,874 frames | Images, lidar, radar and infrared | Object recognition and classification | 2023 | | Roman Tilly, Rustam Tagiew, Pavel Klasek; Philipp Neumaier, Patrick Denzler, Tobias Klockau, Martin Boekhoff, Martin Köppel; Karsten Schwalbe
Argoverse | A multi-sensor dataset for detection of objects in the context of roads | Annotated with bounding boxes | 320 hours of recording | Data from 7 cameras and LiDAR | Object recognition and classification, object tracking | 2022 | | Argo AI, Carnegie Mellon University, Georgia Institute of Technology
Rail3D | A LiDAR dataset for railways recorded in Hungary, France, and Belgium | Annotated semantically | 288 million annotated points | LiDAR | Object recognition and classification, object tracking | 2024 | | Abderrazzaq Kharroubi, Ballouch Zouhair, Rafika Hajji, Anass Yarroudh, and Roland Billen; University of Liège and Hassan II Institute of Agronomy and Veterinary Medicine
WHU-Railway3D | A LiDAR dataset for urban, rural, and plateau railways recorded in China | Annotated semantically | 4.6 billion annotated data points | LiDAR | Object recognition and classification, object tracking | 2024 | | Bo Qiu, Yuzhou Zhou, Lei Dai; Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, Bisheng Yang; Wuhan University, University of Oxford, Hong Kong Polytechnic University, Nanyang Technological University, Xiamen University and Tsinghua University
RailFOD23 | A dataset of foreign objects on railway catenary | Annotated with bounding boxes | 14,615 images | Images | Object recognition and classification, object tracking | 2024 | | Zhichao Chen, Jie Yang, Zhicheng Feng, Hao Zhu; Jiangxi University of Science and Technology
ESRORAD | A dataset of images and point clouds for urban road and rail scenes from Le Havre and Rouen | Annotated with bounding boxes | 2,700 k virtual images and 100,000 real images | Images, LiDAR | Object recognition and classification, object tracking | 2022 | | Redouane Khemmar, Antoine Mauri, Camille Dulompont, Jayadeep Gajula, Vincent Vauchey, Madjid Haddad and Rémi Boutteau; Le Havre Normandy University and SEGULA Technologies
RailVID | Data recorded by an AT615X infrared thermography camera from InfiRay in diverse railway scenarios, including carport, depot, and straight | Annotated semantically | 1,071 images | Infrared images | Object recognition and classification, object tracking | 2022 | | Hao Yuan, Zhenkun Mei, Yihao Chen, Weilong Niu, Cheng Wu; Soochow University
RailPC | LiDAR dataset in the context of railways | Annotated semantically | 3 billion data points | LiDAR | Object recognition and classification, object tracking | 2024 | | Tengping Jiang, Shiwei Li, Qinyu Zhang, Guangshuai Wang, Zequn Zhang, Fankun Zeng, Peng An, Xin Jin, Shan Liu, Yongjun Wang; Nanjing Normal University, Ministry of Natural Resources, Eastern Institute of Technology, Tianjin Key Laboratory of Rail Transit Navigation Positioning and Spatio-temporal Big Data Technology, Northwest Normal University, Washington University in St. Louis and Ningbo University of Technology
RailCloud-HdF | LiDAR dataset in the context of railways | Annotated semantically | 8060.3 million data points | LiDAR | Object recognition and classification, object tracking | 2024 | | Mahdi Abid, Mathis Teixeira, Ankur Mahtani and Thomas Laurent; Railenium
RailGoerl24 | RGB and LiDAR dataset in the context of railways | Annotated with bounding boxes | 12,205 HD RGB frames and 383,922,305 colored LiDAR points | RGB, LiDAR | Person recognition and classification | 2025 | | Rustam Tagiew, Ilkay Wunderlich, Philipp Zanitzer, Mark Sastuba, Carsten Knoll, Kilian Göller, Haadia Amjad, Steffen Seitz
MRSI | RGB and infrared dataset in the context of railways | Annotated with bounding boxes and pixel-wise, in eleven classes including background | 23,000 RGB images and 4,000 infrared images | RGB, infrared | Object recognition and classification | 2022 | | Yihao Chen, Ning Zhu, Qian Wu, Cheng Wu, Weilong Niu and Yiming Wang
RailDriVE February 2019 | Data set for rail vehicle positioning experiments | Not annotated | 26:46 min of back-and-forth driving on a 1.2 km track segment | GNSS, IMU, speed/distance sensors, RGB | Localization and mapping | 2019 | | Hanno Winter, Michael Helmut Roth

Facial recognition

In computer vision, face images have been used extensively to develop facial recognition systems, face detectors, and many other applications that use images of faces. A separate curated list of datasets focuses on the pre-2005 period.
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created | Reference | Creator
Labeled Faces in the Wild | Images of named individuals obtained by Internet search | Frontal face detection, bounding box cropping | 13,233 images of 5,749 named individuals | Images, labels | Unconstrained face recognition | 2008 | | Huang et al.
Aff-Wild | 298 videos of 200 individuals, ~1,250,000 manually annotated images, annotated in terms of dimensional affect; in-the-wild setting; color database; various resolutions | Detected faces, facial landmarks, and valence-arousal annotations | ~1,250,000 manually annotated images | Video | Affect recognition | 2017 | CVPR, IJCV | D. Kollias et al.
Aff-Wild2 | 558 videos of 458 individuals, ~2,800,000 manually annotated images, annotated in terms of i) categorical affect, ii) dimensional affect, iii) action units; in-the-wild setting; color database; various resolutions | Detected faces, detected and aligned faces, and annotations | ~2,800,000 manually annotated images | Video | Affect recognition | 2019 | BMVC, FG | D. Kollias et al.
FERET (facial recognition technology) | 11,338 images of 1,199 individuals in different positions and at different times | None | 11,338 | Images | Classification, face recognition | 2003 | | United States Department of Defense
Ryerson Audio-Visual Database of Emotional Speech and Song | 7,356 video and audio recordings of 24 professional actors; 8 emotions, each at two intensities | Files labeled with expression. Perceptual validation ratings provided by 319 raters. | 7,356 | Video, sound files | Classification, face recognition, voice recognition | 2018 | | S.R. Livingstone and F.A. Russo
SCFace | Color images of faces at various angles | Location of facial features extracted; coordinates of features given | 4,160 | Images, text | Classification, face recognition | 2011 | | M. Grgic et al.
Yale Face Database | Faces of 15 individuals in 11 different expressions | Labels of expressions | 165 | Images | Face recognition | 1997 | | J. Yang et al.
Cohn-Kanade AU-Coded Expression Database | Large database of images with labels for expressions | Tracking of certain facial features | 500+ sequences | Images, text | Facial expression analysis | 2000 | | T. Kanade et al.
JAFFE Facial Expression Database | 213 images of 7 facial expressions posed by 10 Japanese female models | Images are cropped to the facial region. Includes semantic ratings data on emotion labels. | 213 | Images, text | Facial expression cognition | 1998 | | Lyons, Kamachi, Gyoba
FaceScrub | Images of public figures gathered from image searches | Name and male/female annotation | 107,818 | Images, text | Face recognition | 2014 | | H. Ng et al.
BioID Face Database | Images of faces with eye positions marked | Manually set eye positions | 1521 | Images, text | Face recognition | 2001 | | BioID
Skin Segmentation Dataset | Randomly sampled color values from face images | B, G, R values extracted | 245,057 | Text | Segmentation, classification | 2012 | | R. Bhatt
Bosphorus | 3D face image database | 34 action units and 6 expressions labeled; 24 facial landmarks labeled | 4652 | Images, text | Face recognition, classification | 2008 | | A. Savran et al.
UOY 3D-Face | Neutral face and 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised | Labeling | 5250 | Images, text | Face recognition, classification | 2004 | | University of York
CASIA 3D Face Database | Expressions: anger, smile, laugh, surprise, closed eyes | None | 4624 | Images, text | Face recognition, classification | 2007 | | Institute of Automation, Chinese Academy of Sciences
CASIA NIR | Expressions: anger, disgust, fear, happiness, sadness, surprise | None | 480 | Annotated visible-spectrum and near-infrared video captures at 25 frames per second | Face recognition, classification | 2011 | | Zhao, G. et al.
BU-3DFE | Neutral face and 6 expressions: anger, happiness, sadness, surprise, disgust, fear; 3D images extracted | None | 2500 | Images, text | Facial expression recognition, classification | 2006 | | Binghamton University
Face Recognition Grand Challenge Dataset | Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D data. | None | 4007 | Images, text | Face recognition, classification | 2004 | | National Institute of Standards and Technology
Gavabdb | 61 subjects with 9 samples each. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images. | None | 549 | Images, text | Face recognition, classification | 2008 | | King Juan Carlos University
3D-RMA | Up to 100 subjects, expressions mostly neutral; several poses as well | None | 9971 | Images, text | Face recognition, classification | 2004 | | Royal Military Academy (Belgium)
SoF | 112 people wearing glasses under different illumination conditions | A set of synthetic filters with different levels of difficulty | 42,592 | Images, Mat file | Gender classification, face detection, face recognition, age estimation, and glasses detection | 2017 | | Afifi, M. et al.
IMDb-WIKI | IMDb and Wikipedia face images with gender and age labels | - | 523,051 | Images | Gender classification, face detection, face recognition, age estimation | 2015 | | R. Rothe, R. Timofte, L. Van Gool