Joint COCO and Mapillary Recognition Workshop at ECCV 2018

Mapillary Research is co-organizing the Joint COCO and Mapillary Recognition workshop at this year’s ECCV conference on September 9 in Munich, Germany. Previous COCO workshops have contributed significantly to advancing the state of the art in object recognition. This year we are hosting challenges for Object Detection with Instance Segmentation and a new task on Panoptic Image Segmentation, both using images from the Mapillary Vistas dataset [1]. We look forward to receiving high-quality submissions that advance the field of visual object recognition.


The goals and participation rules for both tasks are described below. The Mapillary challenges are based on the Mapillary Vistas Research Dataset (MVD), version 1.1. Please register here to receive download instructions and a more detailed description of the dataset via email. The dataset comprises training (18,000), validation (2,000) and test (5,000) images, with labels provided for the training and validation sets. The email contains a single link to a zip file comprising the training and validation images (with annotations for semantic segmentation, object detection and panoptic segmentation) and the 5,000 RGB test images. After downloading, please verify the integrity of the zip file by comparing its md5 checksum, which should match 56259a7e3539712187d34762ddf027f8. In its research edition, MVD comprises 66 object categories (28 stuff classes, 37 thing classes, 1 void class), and the corresponding label IDs can be found by running

> python

in a console from the root folder of the extracted training data. Please check the accompanying README file in the zip file to make sure you have dataset version 1.1, which mostly affects the test data. If you are not sure, we provide the latest version of the test set (used in all competitions from 2018 onwards) here.
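The checksum comparison mentioned above can be sketched in a few lines of Python. This is only an illustration using the standard hashlib module; the function names are made up for this example and are not part of any official tooling.

```python
import hashlib

# Checksum quoted in the text above for the dataset zip file.
EXPECTED_MD5 = "56259a7e3539712187d34762ddf027f8"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's md5 digest with the expected one (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Calling archive_is_intact with the path of the downloaded zip file should return True if the download is complete and unmodified.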

Mapillary Vistas Object Detection Task

The goal of this task is to provide instance-specific segmentation results for a subset of 37 object classes on MVD. Such results make it possible to count individual instances of a class, for example the number of cars or pedestrians in an image. Details on the submission format are shown below and (mostly) follow the specification of the corresponding COCO task to ease participation in both dataset challenges.

For detection with object segments (instance segmentation), please use the following format:

    "image_id" : str, 
    "category_id" : int, 
    "segmentation" : RLE, 
    "score" : float,

Please note that the value for image_id is a string (the image filename without its extension), whereas for the COCO dataset it is an integer. This change is due to the different naming conventions used for the Mapillary Vistas and COCO datasets. The category_id is a 1-based integer giving the position of the respective class label in the config.json file, found in the dataset zip file described above. For example, class Bird is the first entry in the config file, so its instances should receive category_id: 1 (rather than 0). In addition, please note that the config file also contains stuff classes, so the values for category_id are not contiguous from 1 to 37. The segmentation itself is stored as a run-length-encoded binary mask, and you can find helper scripts for encoding/decoding in Python or Matlab.
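The 1-based category_id and the string image_id can be illustrated with a small sketch. It assumes that config.json holds a "labels" list whose entries carry a "readable" class name and an "instances" flag marking thing classes; that layout is an assumption of this example, and the shortened label list below is purely illustrative (only Bird being the first entry is taken from the text). The helper names are hypothetical.

```python
import json
import os

# Shortened, illustrative stand-in for config.json. The "labels" / "readable" /
# "instances" layout is an assumption; the ordering after "Bird" is made up.
SAMPLE_CONFIG = json.loads("""
{"labels": [
  {"readable": "Bird", "instances": true},
  {"readable": "Ground Animal", "instances": true},
  {"readable": "Curb", "instances": false},
  {"readable": "Fence", "instances": false},
  {"readable": "Person", "instances": true}
]}
""")


def thing_category_ids(config):
    """Map class name -> 1-based position in the label list, for instance classes only."""
    return {
        label["readable"]: position
        for position, label in enumerate(config["labels"], start=1)
        if label["instances"]
    }


def make_detection(image_path, category_id, rle, score):
    """Build one result entry; image_id is the file name without its extension."""
    image_id = os.path.splitext(os.path.basename(image_path))[0]
    return {
        "image_id": image_id,
        "category_id": int(category_id),
        "segmentation": rle,  # run-length-encoded mask, passed through unchanged
        "score": float(score),
    }
```

Because stuff classes occupy positions in the same list, the resulting ids have gaps: in the sample above, the thing classes receive the ids 1, 2 and 5.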

All detection results should be stored in a single json file, zipped, and submitted to our CodaLab benchmark server. Additional information can be found in the COCO upload and result format descriptions for detection. The main performance metric is Average Precision (AP), computed on the basis of instance-level segmentations per object category and averaged over the overlap range 0.5:0.05:0.95 (start and end inclusive), following [2].
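As a rough illustration of the overlap range, the sketch below enumerates the ten thresholds from 0.5 to 0.95 and averages a set of per-threshold AP values for one category. It is not the official evaluation code; the function names and the input dictionary are assumptions made for this example.

```python
def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """Return the overlap thresholds start:step:stop with both ends included."""
    count = int(round((stop - start) / step)) + 1
    # Rounding avoids floating-point drift such as 0.6500000000000001.
    return [round(start + i * step, 2) for i in range(count)]


def mean_over_thresholds(ap_per_threshold):
    """Average AP values (a dict keyed by threshold) over the full overlap range."""
    thresholds = iou_thresholds()
    return sum(ap_per_threshold[t] for t in thresholds) / len(thresholds)
```

The per-category values obtained this way would then be averaged over all evaluated categories to give a single score.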

Mapillary Vistas Panoptic Segmentation Task

This task is an image labeling problem where each pixel in an image under test has to be assigned a discrete label from a set of pre-defined object categories or the ignore label void. In addition to labeling stuff classes, each category with instance-specific annotations should be labeled as in the object detection task above, such that object instances are separately segmented and enumerable. This task follows the definition recently introduced in [3], and will be evaluated with the Panoptic metric and corresponding code.
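For orientation, the Panoptic Quality (PQ) measure defined in the Panoptic Segmentation paper by Kirillov et al. (see the reference list) divides the summed IoU of matched segment pairs by the number of matches plus half the number of unmatched predicted and ground-truth segments. The sketch below computes this for one category from already matched segments; it is a simplified illustration, not the official evaluation code, and the function name is hypothetical.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ for one category.

    matched_ious: IoU of each matched (predicted, ground-truth) segment pair;
                  a pair counts as a match only if its IoU is above 0.5.
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched segment pairs must have IoU > 0.5")
    true_positives = len(matched_ious)
    denominator = true_positives + 0.5 * num_false_positives + 0.5 * num_false_negatives
    if denominator == 0:
        return 0.0  # category absent from both prediction and ground truth
    return sum(matched_ious) / denominator
```

For example, two matches with IoU 0.8 and 0.6, one unmatched prediction and one unmatched ground-truth segment give (0.8 + 0.6) / (2 + 0.5 + 0.5). In the paper, the per-category values are averaged over categories.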

Again, we mostly follow the Panoptic result submission format (see Section 4), with the difference that image_id is a string rather than an integer as in COCO (see the description of the object detection task above).

    "image_id" : str,
    "file_name" : str,
    "segments_info" : [segment_info],

    "id" : int,
    "category_id" : int,

Panoptic submissions have to contain exactly one json file encoding all annotations and one folder with PNGs, where the file names and segment indices match the file_name values and ids in the annotations, respectively. As for the detection task, you can find additional information in the COCO upload and result format descriptions for Panoptic segmentation.
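In the COCO panoptic format, the segment id of a pixel is stored in the PNG color channels as id = R + 256 * G + 256^2 * B. Assuming the same convention applies here, the following sketch converts between ids and colors and checks that the ids found in a PNG agree with those listed in segments_info. The function names are hypothetical, and treating id 0 as unlabeled is an assumption of this example.

```python
def id_to_rgb(segment_id):
    """Split a segment id into (R, G, B) so that id = R + 256*G + 256**2*B."""
    red = segment_id % 256
    green = (segment_id // 256) % 256
    blue = (segment_id // 256 ** 2) % 256
    return red, green, blue


def rgb_to_id(color):
    """Recover the segment id from an (R, G, B) triple."""
    red, green, blue = color
    return red + 256 * green + 256 ** 2 * blue


def ids_match_annotation(pixel_colors, annotation):
    """Check that the ids used in the PNG equal those listed in segments_info.

    pixel_colors: iterable of (R, G, B) triples found in the PNG;
    id 0 is treated here as unlabeled and ignored (an assumption).
    """
    png_ids = {rgb_to_id(color) for color in pixel_colors} - {0}
    listed_ids = {segment["id"] for segment in annotation["segments_info"]}
    return png_ids == listed_ids
```

A check like this can be run on validation-set results before uploading to catch mismatches between the PNGs and the json file.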

As for the detection task, the corresponding benchmark server will accept at most 5 submissions on the test data per team until the challenge deadline, but you can submit your results on the validation set in order to verify the format.

Performance Evaluation

All test results have to be stored in a single .zip file per task and can be uploaded to the corresponding benchmark server. You can make up to 5 submissions on the test data to the benchmark server before the challenge deadline. However, to validate your submission and upload format, you can submit your validation data results to the server as well. These guidelines are similar to COCO and can be found here.
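Packaging a json result list into a single zip file, as needed for the detection task, might look like the sketch below. The function name and the file name inside the archive are hypothetical; please check the benchmark server's guidelines for any naming requirements. A panoptic submission would additionally need the PNG folder in the same archive, which this sketch does not cover.

```python
import json
import zipfile


def write_submission_zip(results, zip_path, json_name="results.json"):
    """Write a list of result entries as one json file inside one zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(json_name, json.dumps(results))
    return zip_path
```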

Important Dates

Training, validation and test data: Available here
Challenge submission deadline: August 17, 2018 (11:59 PST; originally August 10)
Challenge winners notified: August 29, 2018
Winners present at ECCV 2018 Workshop: September 9, 2018

Confirmed Speaker

Andreas Geiger is a full professor at the University of Tübingen and a group leader at the Max Planck Institute for Intelligent Systems (MPI-IS). Prior to this, he was a visiting professor at ETH Zürich and a research scientist in the Perceiving Systems department of Dr. Michael Black at the MPI-IS. He received his PhD degree in 2013 from the Karlsruhe Institute of Technology. His research interests are at the intersection of 3D reconstruction, motion estimation and visual scene understanding. His work has been recognized with several prizes, including the 3DV best paper award, the CVPR best paper runner-up award, the Heinz Maier-Leibnitz Prize and the German Pattern Recognition Award. He serves as an area chair and associate editor for several computer vision conferences and journals (CVPR, ICCV, ECCV, PAMI, IJCV).


We thank NVIDIA for generously sponsoring a Titan Xp GPU for each winner of our tasks. We also thank Amazon AWS for sponsoring a total of $20,000 in AWS credits for the winning submissions. We reserve the right to withhold prizes in case of limited submission quality.


  1. G. Neuhold, T. Ollmann, S. Rota Bulò, and P. Kontschieder. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. In International Conference on Computer Vision (ICCV), 2017.

  2. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV), 2014.

  3. A. Kirillov, K. He, R. B. Girshick, C. Rother, and P. Dollár. Panoptic Segmentation. arXiv Tech Report, 2018.