List of datasets in computer vision and image processing

This is a list of datasets for computer vision and image processing research. It is part of the list of datasets for machine-learning research. These datasets consist primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.

Object detection and recognition

3D Objects

Reviews of 3D object datasets are available covering 33 datasets as of 2015 and a larger set of datasets as of 2022.

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created	Reference	Creator
Princeton Shape Benchmark	3D polygonal models collected from the Internet		1814 models in 92 categories	3D polygonal models, categories	shape-based retrieval and analysis	2004		Shilane et al.
Berkeley 3-D Object Dataset	Depth and color images collected from crowdsourced Microsoft Kinect users. Annotated in 50 object categories.		849 images, in 75 scenes	color image, depth image, object class, bounding boxes, 3D center points	Predict bounding boxes	2011, updated 2014		Janoch et al.
ShapeNet	3D models. Some are classified into WordNet synsets, like ImageNet. Partially classified into 3,135 categories.		3,000,000 models, 220,000 of which are classified.	3D models, class labels	Predict class label.	2015		Chang et al.
ObjectNet3D	Images, 3D shapes, and objects in 100 categories.		90127 images, 201888 objects, 44147 3D shapes	images, 3D shapes, object bounding boxes, category labels	recognizing the 3D pose and 3D shape of objects from 2D images	2016		Xiang et al.
Common Objects in 3D	Video frames from videos capturing objects from 50 MS-COCO categories, filmed by people on Amazon Mechanical Turk.		6 million frames from 40000 videos	multi-view images, camera poses, 3D point clouds, object category	Predict object category. Generate objects.	2021, updated 2022 as CO3Dv2		Meta AI
Google Scanned Objects	Scanned household objects in SDF format.		over 1,000			2022		Google AI
Objaverse-XL	3D objects		over 10 million	3D objects, metadata	novel view synthesis, 3D object generation	2023		Deitke et al.
OmniObject3D	Scanned objects, labelled in 190 everyday categories		6,000	textured meshes, point clouds, multiview images, videos	robust 3D perception, novel-view synthesis, surface reconstruction, 3D object generation	2023		Wu et al.
UnCommon Objects in 3D	Objects in 1,070 categories from the LVIS vocabulary					2025		Meta AI

Object detection and recognition for autonomous vehicles

Dataset Name	Brief description	Preprocessing	Instances	Format	Default Task	Created	Reference	Creator
Cityscapes Dataset	Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included.	Pixel-level segmentation and labeling	25,000	Images, text	Classification, object detection	2016		Daimler AG et al.
German Traffic Sign Detection Benchmark Dataset	Images of traffic signs on German roads, taken from vehicles. These signs comply with UN standards and therefore resemble those used in other countries that follow the same standards.	Signs manually labeled	900	Images	Classification	2013		S. Houben et al.
KITTI Vision Benchmark Dataset	Images of various areas captured with cameras and laser scanners by autonomous vehicles driving through a mid-size city.	Many benchmarks extracted from data.	>100 GB of data	Images, text	Classification, object detection	2012		A. Geiger et al.
FieldSAFE	Multi-modal dataset for obstacle detection in agriculture including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization.	Classes labelled geographically.	>400 GB of data	Images and 3D point clouds	Classification, object detection, object localization	2017		M. Kragh et al.
Daimler Monocular Pedestrian Detection dataset	It is a dataset of pedestrians in urban environments.	Pedestrians are box-wise labeled.	Labeled part contains 15560 samples with pedestrians and 6744 samples without. Test set contains 21790 images without labels.	Images	Object recognition and classification	2006		Daimler AG
CamVid	The Cambridge-driving Labeled Video Database is a collection of videos.	The dataset is labeled with semantic labels for 32 semantic classes.	over 700 images	Images	Object recognition and classification	2008		Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla
RailSem19	RailSem19 is a dataset for scene understanding by vision systems on railways.	The dataset is labeled semantically and box-wise.	8500	Images	Object recognition and classification, scene recognition	2019		Oliver Zendel, Markus Murschitz, Marcel Zeilinger, Daniel Steininger, Sara Abbasi, Csaba Beleznai
BOREAS	BOREAS is a multi-season autonomous driving dataset. It includes data from a Velodyne Alpha-Prime lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS.	The data is annotated with 3D bounding boxes.	350 km of driving data	Images, Lidar and Radar data	Object recognition and classification, scene recognition	2023		Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y.K. Leung, Angela P. Schoellig, Timothy D. Barfoot
Bosch Small Traffic Lights Dataset	It is a dataset of traffic lights.	The labels include bounding boxes of traffic lights together with their state.	5000 images for training and a video sequence of 8334 frames for evaluation	Images	Traffic light recognition	2017		Karsten Behrendt, Libor Novak, Rami Botros
FRSign	It is a dataset of French railway signals.	The labels include bounding boxes of railway signals together with their state.	more than 100000	Images	Railway signal recognition	2020		Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri
GERALD	It is a dataset of German railway signals.	The labels include bounding boxes of railway signals together with their state.	5000	Images	Railway signal recognition	2023		Philipp Leibner, Fabian Hampel, Christian Schindler
Multi-cue pedestrian	Multi-cue onboard pedestrian detection dataset is a dataset for detection of pedestrians.	The dataset is labeled box-wise.	1092 image pairs with 1776 boxes for pedestrians	Images	Object recognition and classification	2009		Christian Wojek, Stefan Walk, Bernt Schiele
RAWPED	RAWPED is a dataset for detection of pedestrians in the context of railways.	The dataset is labeled box-wise.	26000	Images	Object recognition and classification	2020		Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver
OSDaR23	OSDaR23 is a multi-sensory dataset for detection of objects in the context of railways.	The dataset is labeled box-wise.	16874 frames	Images, Lidar, Radar and Infrared	Object recognition and classification	2023		Roman Tilly, Rustam Tagiew, Pavel Klasek; Philipp Neumaier, Patrick Denzler, Tobias Klockau, Martin Boekhoff, Martin Köppel; Karsten Schwalbe
Argoverse	Argoverse is a multi-sensory dataset for detection of objects in the context of roads.	The dataset is annotated box-wise.	320 hours of recording	Data from 7 cameras and LiDAR	Object recognition and classification, object tracking	2022		Argo AI, Carnegie Mellon University, Georgia Institute of Technology
Rail3D	Rail3D is a LiDAR dataset for railways recorded in Hungary, France, and Belgium	The dataset is annotated semantically	288 million annotated points	LiDAR	Object recognition and classification, object tracking	2024		Abderrazzaq Kharroubi, Ballouch Zouhair, Rafika Hajji, Anass Yarroudh, and Roland Billen; University of Liège and Hassan II Institute of Agronomy and Veterinary Medicine
WHU-Railway3D	WHU-Railway3D is a LiDAR dataset for urban, rural, and plateau railways recorded in China	The dataset is annotated semantically	4.6 billion annotated data points	LiDAR	Object recognition and classification, object tracking	2024		Bo Qiu, Yuzhou Zhou, Lei Dai; Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, Bisheng Yang; Wuhan University, University of Oxford, Hong Kong Polytechnic University, Nanyang Technological University, Xiamen University and Tsinghua University
RailFOD23	A dataset of foreign objects on railway catenary	The dataset is annotated boxwise	14,615 images	Images	Object recognition and classification, object tracking	2024		Zhichao Chen, Jie Yang, Zhicheng Feng, Hao Zhu; Jiangxi University of Science and Technology
ESRORAD	A dataset of images and point clouds for urban road and rail scenes from Le Havre and Rouen	The dataset is annotated boxwise	2,700,000 virtual images and 100,000 real images	Images, LiDAR	Object recognition and classification, object tracking	2022		Redouane Khemmar, Antoine Mauri, Camille Dulompont, Jayadeep Gajula, Vincent Vauchey, Madjid Haddad and Rémi Boutteau; Le Havre Normandy University and SEGULA Technologies
RailVID	Data recorded by AT615X infrared thermography from InfiRay in diverse railway scenarios, including carport, depot, and straight.	The dataset is annotated semantically	1,071 images	infrared images	Object recognition and classification, object tracking	2022		Hao Yuan, Zhenkun Mei, Yihao Chen, Weilong Niu, Cheng Wu; Soochow University
RailPC	LiDAR dataset in the context of railways	The dataset is annotated semantically	3 billion data points	LiDAR	Object recognition and classification, object tracking	2024		Tengping Jiang, Shiwei Li, Qinyu Zhang, Guangshuai Wang, Zequn Zhang, Fankun Zeng, Peng An, Xin Jin, Shan Liu, Yongjun Wang; Nanjing Normal University, Ministry of Natural Resources, Eastern Institute of Technology, Tianjin Key Laboratory of Rail Transit Navigation Positioning and Spatio‐temporal Big Data Technology, Northwest Normal University, Washington University in St. Louis and Ningbo University of Technology
RailCloud-HdF	LiDAR dataset in the context of railways	The dataset is annotated semantically	8060.3 million data points	LiDAR	Object recognition and classification, object tracking	2024		Mahdi Abid, Mathis Teixeira, Ankur Mahtani and Thomas Laurent; Railenium
RailGoerl24	RGB and LiDAR dataset in the context of railways	The dataset is annotated boxwise	12205 HD RGB frames and 383,922,305 colored LiDAR points	RGB, LiDAR	Person recognition and classification	2025		Rustam Tagiew, Ilkay Wunderlich, Philipp Zanitzer, Mark Sastuba, Carsten Knoll, Kilian Göller, Haadia Amjad, Steffen Seitz
MRSI	RGB and Infrared dataset in the context of railways	The dataset is annotated boxwise and pixelwise, eleven classes including background	23000 RGB images and 4000 infrared images	RGB, Infrared	Object recognition and classification	2022		Yihao Chen, Ning Zhu, Qian Wu, Cheng Wu, Weilong Niu and Yiming Wang
RailDriVE February 2019	Data Set for Rail Vehicle Positioning Experiments	The dataset is not annotated	26 min 46 s of back-and-forth driving on a 1.2 km track segment	GNSS, IMU, speed/distance sensors, RGB	Localization and mapping	2019		Hanno Winter, Michael Helmut Roth

Facial recognition

In computer vision, face images have been used extensively to develop facial recognition and face detection systems, and in many other projects that use images of faces. A curated list of such datasets, focused on the pre-2005 period, is also available.

Dataset name	Brief description	Preprocessing	Instances	Format	Default task	Created	Reference	Creator
Labeled Faces in the Wild	Images of named individuals obtained by Internet search.	frontal face detection, bounding box cropping	13233 images of 5749 named individuals	images, labels	unconstrained face recognition	2008		Huang et al.
Aff-Wild	298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions	the detected faces, facial landmarks and valence-arousal annotations	~1,250,000 manually annotated images	video	affect recognition	2017	CVPR IJCV	D. Kollias et al.
Aff-Wild2	558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of i) categorical affect, ii) dimensional affect, and iii) action units; in-the-wild setting; color database; various resolutions	the detected faces, detected and aligned faces and annotations	~2,800,000 manually annotated images	video	affect recognition	2019	BMVC FG	D. Kollias et al.
FERET (facial recognition technology)	11338 images of 1199 individuals in different positions and at different times.	None.	11,338	Images	Classification, face recognition	2003		United States Department of Defense
Ryerson Audio-Visual Database of Emotional Speech and Song	7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities.	Files labelled with expression. Perceptual validation ratings provided by 319 raters.	7,356	Video, sound files	Classification, face recognition, voice recognition	2018		S.R. Livingstone and F.A. Russo
SCFace	Color images of faces at various angles.	Location of facial features extracted. Coordinates of features given.	4,160	Images, text	Classification, face recognition	2011		M. Grgic et al.
Yale Face Database	Faces of 15 individuals in 11 different expressions.	Labels of expressions.	165	Images	Face recognition	1997		J. Yang et al.
Cohn-Kanade AU-Coded Expression Database	Large database of images with labels for expressions.	Tracking of certain facial features.	500+ sequences	Images, text	Facial expression analysis	2000		T. Kanade et al.
JAFFE Facial Expression Database	213 images of 7 facial expressions posed by 10 Japanese female models.	Images are cropped to the facial region. Includes semantic ratings data on emotion labels.	213	Images, text	Facial expression cognition	1998		Lyons, Kamachi, Gyoba
FaceScrub	Images of public figures collected from image searches.	Name and gender (male/female) annotation.	107,818	Images, text	Face recognition	2014		H. Ng et al.
BioID Face Database	Images of faces with eye positions marked.	Manually set eye positions.	1521	Images, text	Face recognition	2001		BioID
Skin Segmentation Dataset	Randomly sampled color values from face images.	B, G, R values extracted.	245,057	Text	Segmentation, classification	2012		R. Bhatt.
Bosphorus	3D Face image database.	34 action units and 6 expressions labeled; 24 facial landmarks labeled.	4652	Images, text	Face recognition, classification	2008		A Savran et al.
UOY 3D-Face	Neutral face and 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised.	Labeling.	5250	Images, text	Face recognition, classification	2004		University of York
CASIA 3D Face Database	Expressions: Anger, smile, laugh, surprise, closed eyes.	None.	4624	Images, text	Face recognition, classification	2007		Institute of Automation, Chinese Academy of Sciences
CASIA NIR	Expressions: anger, disgust, fear, happiness, sadness, surprise.	None.	480	Annotated visible-spectrum and near-infrared video captures at 25 frames per second	Face recognition, classification	2011		Zhao, G. et al.
BU-3DFE	Neutral face and 6 expressions: anger, happiness, sadness, surprise, disgust, fear. 3D images extracted.	None.	2500	Images, text	Facial expression recognition, classification	2006		Binghamton University
Face Recognition Grand Challenge Dataset	Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D Data.	None.	4007	Images, text	Face recognition, classification	2004		National Institute of Standards and Technology
Gavabdb	Up to 61 samples for each subject. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images.	None.	549	Images, text	Face recognition, classification	2008		King Juan Carlos University
3D-RMA	Up to 100 subjects, expressions mostly neutral. Several poses as well.	None.	9971	Images, text	Face recognition, classification	2004		Royal Military Academy (Belgium)
SoF	112 persons wearing glasses under different illumination conditions.	A set of synthetic filters with different levels of difficulty.	42,592	Images, Mat file	Gender classification, face detection, face recognition, age estimation, and glasses detection	2017		Afifi, M. et al.
IMDb-WIKI	IMDb and Wikipedia face images with gender and age labels.	-	523,051	Images	Gender classification, face detection, face recognition, age estimation	2015		R. Rothe, R. Timofte, L. Van Gool