Benchmarking City-Scale Semantic 3D Map Making with Mapillary Metropolis

We are organizing a tutorial entitled Benchmarking City-Scale Semantic 3D Map Making with Mapillary Metropolis as part of ICCV 2021. It will be held as a half-day, fully virtual tutorial on Saturday, Oct. 16 (12pm-3.30pm EDT).

List of organizers (Facebook Reality Labs)

Aleksander Colovic, Arno Knapitsch, Lorenzo Porzi, Samuel Rota Bulò, Jerneja Mislej, Manuel Lopez-Antequera, Vasileios Balntas, Edward Miller, Yubin Kuang, Peter Kontschieder.

The Mapillary Metropolis dataset

Next-generation, location-based computer vision (CV) applications like augmented reality or autonomous driving require robust CV algorithms. Robustness means that algorithms can cope with variability in the input data, such as seasonal and weather-related appearance changes, low-quality imagery from cheap cameras, or data captured under suboptimal lighting conditions. Producing reliable predictions in such challenging scenarios and across highly varying, city-scale environments has a direct impact on downstream applications.

To make this impact measurable, we are introducing a new, publicly accessible city-scale dataset called Mapillary Metropolis. The dataset is designed to establish a completely novel and complex benchmarking paradigm for training and testing computer vision algorithms in the context of semantic 3D map making. It comprises multiple data modalities at city scale, registered across different representations and enriched with human- and machine-generated annotations for a range of object recognition and tracking tasks. These modalities include professional- and consumer-grade street-level images, aerial images, 3D point clouds from street-level LiDAR, aerial LiDAR, and image-based reconstruction (SfM and MVS), as well as CAD models. All data modalities are aligned based on manual correspondence annotations and the ingestion of survey-grade ground control point data. Our dataset is designed to take city-scale 3D semantic modeling to the next level by enabling researchers to study the shortcomings of current methods in areas including, but not limited to, object recognition and tracking, 3D modeling, depth estimation, relocalization, image retrieval, change detection, and sensor fusion.

Given the complexity of our new Metropolis dataset and the positive impact it might have on the community, the goal of this tutorial is to provide an introduction to the dataset, related benchmarking tools, and pre-trained models to facilitate experimentation by interested researchers and engineers.

Provided Course Material

The dataset is available for download from the Mapillary dataset page, and the SDK for data handling and visualization examples is available on GitHub.
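For a first impression of how data access might look in practice, here is a minimal sketch. All module, function, and attribute names below (metropolis_sdk, open_dataset, get_sample, and the sample fields) are hypothetical placeholders rather than the actual SDK API; please consult the GitHub repository for the real interface and data formats.

    # Hypothetical usage sketch: the names below are placeholders, NOT the real
    # Metropolis SDK API -- see the GitHub repository for the actual interface.
    from pathlib import Path

    import metropolis_sdk as ms  # hypothetical module name

    root = Path("/data/mapillary_metropolis")  # assumed local download location

    dataset = ms.open_dataset(root)   # hypothetical: open the dataset index
    sample = dataset.get_sample(0)    # hypothetical: fetch one street-level sample

    image = sample.image              # e.g. an HxWx3 image array
    boxes_2d = sample.annotations_2d  # e.g. 2D object boxes with class labels
    points = sample.lidar_points      # e.g. an Nx3 LiDAR point cloud
    print(image.shape, len(boxes_2d), points.shape)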

Tutorial Agenda

The tutorial will focus on four main topics: i) introducing Metropolis and its design philosophy, ii) describing the 3D data generation process, the alignment of the different data modalities, the dataset's extent, and the annotation processes, iii) introducing the dataset SDK and how to access the different data modalities and annotations, and iv) describing selected tasks and novel benchmarking concepts.

Introduction and Motivation (20min): Overview of the tutorial, the dataset's contents, and its general design philosophy. Introductory explanation of the data modalities and their complementarity, as well as camera and LiDAR model heterogeneity.
Dataset Creation Process (70min): Large-scale 3D data generation pipelines from panoramic image data, leveraging semantic image segmentation. Alignment of reconstructed 3D models with multiple, complementary data modalities including street-level and aerial LiDAR scans and CAD models. Integration of survey-grade ground control landmark points. Description of the annotation UI and the tools developed for 2D/3D and cross-modal correspondence annotations. Fusion/registration of multiple modalities as part of the annotation process. Description of both manual and machine-generated annotation approaches.
Coffee Break (15min)
Dataset Usage (45min): In-depth description of the dataset's internal organization and data formats. Full walk-through of typical use cases, with a focus on best practices.
CV Applications and Evaluation Metrics (45min): Description of the current benchmarking tasks developed on top of Metropolis (e.g., 2D detection, 3D detection, tracking, single-image depth estimation). Overview of evaluation metrics, protocols, and the incorporation of uncertainty into benchmarking; an illustrative metric sketch follows the agenda below.
Facebook Research Award Recipient Announcements (5min)
Conclusions, Questions, and Feedback (10min): Tutorial summary, feedback to the organizers, and feature requests for further dataset extensions.
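To make the evaluation-metric discussion above concrete, the following is a generic, illustrative computation of intersection-over-union (IoU) between two axis-aligned 2D boxes, a standard building block of 2D detection metrics such as average precision. It is not the official Metropolis evaluation code; the actual matching rules, thresholds, and protocols are defined by the benchmark itself.

    def box_iou(box_a, box_b):
        """Intersection-over-union of two axis-aligned boxes given as
        (x_min, y_min, x_max, y_max) tuples."""
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    # Example: a predicted box partially overlapping a ground-truth box.
    print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # prints ~0.143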