The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes

International Conference on Computer Vision (ICCV) 2017
By Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulò, Peter Kontschieder


The Mapillary Vistas Dataset is a novel, large-scale street-level image dataset containing 25,000 high-resolution images annotated into 66 object categories, with additional instance-specific labels for 37 classes. Annotation is performed in a dense and fine-grained style, using polygons to delineate individual objects. Our dataset is 5× larger than the total amount of fine annotations for Cityscapes and contains images from all around the world, captured under varying weather, season, and time-of-day conditions. Images come from different imaging devices (mobile phones, tablets, action cameras, professional capturing rigs) and from photographers with different levels of experience. In this way, our dataset has been designed and compiled to cover diversity, richness of detail, and geographic extent. As default benchmark tasks, we define semantic image segmentation and instance-specific image segmentation, aiming to significantly further the development of state-of-the-art methods for visual road-scene understanding.
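To illustrate how polygon annotations relate to the per-pixel labels used in semantic segmentation, here is a minimal Python sketch that rasterizes polygons into a label mask. It is not the dataset's tooling or file format. The function names, the `(label_id, polygon)` input layout, and the label ids are hypothetical, and the even-odd ray-casting test is just one simple way to fill a polygon.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray-casting test of a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y0 > y) != (y1 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(
    width: int,
    height: int,
    annotations: Sequence[Tuple[int, Sequence[Point]]],
    background: int = 0,
) -> List[List[int]]:
    """Paint polygons in order into a label mask; later polygons overwrite earlier ones."""
    mask = [[background] * width for _ in range(height)]
    for label_id, polygon in annotations:
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        # Restrict the scan to the polygon's bounding box, clamped to the image.
        col_min = max(0, math.floor(min(xs)))
        col_max = min(width - 1, math.ceil(max(xs)))
        row_min = max(0, math.floor(min(ys)))
        row_max = min(height - 1, math.ceil(max(ys)))
        for row in range(row_min, row_max + 1):
            for col in range(col_min, col_max + 1):
                # Test the pixel centre, not its corner.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = label_id
    return mask


# Hypothetical example: one square object with label id 5 in a 4x4 image.
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
mask = rasterize(4, 4, [(5, square)])
```

For instance-specific segmentation, the same routine could be run with a distinct id per polygon instead of a shared category id. Real pipelines would normally use an optimized rasterizer rather than this per-pixel loop.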