Joint COCO and Mapillary Recognition Workshop at ICCV 2019

Mapillary Research is co-organizing the Joint COCO and Mapillary Recognition workshop for the second time. This year, it will be part of the ICCV conference and held as a full-day workshop on Sunday, October 27, in Seoul, South Korea. Our previous joint COCO and Mapillary workshop contributed significantly to pushing the state of the art in object recognition, and this year will see a revival of the Object Detection with Instance Segmentation and Panoptic Image Segmentation challenges using images from the Mapillary Vistas dataset [1]. We again look forward to receiving high-quality submissions that advance the field of visual object recognition!

New Rules and Awards for 2019 Challenges

Below we summarize the main changes with respect to last year's Joint COCO and Mapillary Recognition workshop:

  1. Participants must submit a technical report that includes a detailed ablation study of their submission (suggested length: 1-4 pages). The reports will be made public. Please use this LaTeX template for the report and send it to both, and. This report replaces the short text description that we requested previously. Only submissions with a report will be considered for an award and listed on the Mapillary Vistas leaderboard.

  2. This year, each challenge track will have two awards: a best result award and a most innovative award. The most innovative award will be based on the method descriptions in the submitted technical reports and decided by the COCO award committee. The committee will invite teams to present at the workshop based on the innovation of their submissions rather than on their scores.

  3. This year we introduce a single best paper award for the most innovative and successful solution across all challenges. The winner will be determined by the workshop organization committee.

  4. We will allow at most 3 submissions to the test server per challenge before the challenge deadline.


The goals and participation rules for both tasks are described below. The Mapillary challenges are based on the Mapillary Vistas Research Dataset (MVD), version 1.1. Please register here to receive download instructions by email, giving access to the training (18,000), validation (2,000) and test (5,000) images together with labels for the training and validation sets, as well as a more detailed description of the dataset. The email contains a single link to a zip file comprising the training and validation images (with annotations for semantic segmentation, object detection and panoptic segmentation) and the 5,000 RGB test images. After downloading, please verify the integrity of the zip file by checking that its md5 checksum matches 56259a7e3539712187d34762ddf027f8. In its research edition, MVD comprises 66 object categories (28 stuff classes, 37 thing classes and 1 void class), and the corresponding label IDs can be found by running

> python

in a console from the extracted training data root folder. Please check the README file included in the zip file to make sure you have dataset version 1.1, which mostly affects the test data. If you are unsure, we provide the latest version of the test set (used in all competitions from 2018 onwards) here.
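The md5 check described above can be scripted with Python's standard hashlib module. The sketch below reads the archive in chunks so the large zip file never has to fit in memory; the function names are our own, and on Linux running `md5sum` on the file should print the same digest.

```python
import hashlib

# md5 checksum published for the MVD v1.1 zip file.
EXPECTED_MD5 = "56259a7e3539712187d34762ddf027f8"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's digest matches the expected one (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` on the downloaded archive (whatever its local filename) should return True; a False result suggests an incomplete or corrupted download.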

Mapillary Vistas Object Detection Task

The goal of this task is to provide instance-specific segmentation results for a subset of 37 object classes of MVD. This makes it possible to count individual instances of classes such as cars or pedestrians in an image. Details on the submission format are given below; they (mostly) follow the specification of the corresponding COCO task to ease participation in both dataset challenges.

For detection with object segments (instance segmentation), please use the following format:

    [{
        "image_id" : str,
        "category_id" : int,
        "segmentation" : RLE,
        "score" : float,
    }]

Please note that the value of image_id is a string (the image filename without extension), whereas for the COCO dataset it is an integer. This difference is due to the different naming conventions of the Mapillary Vistas and COCO datasets. The category_id is a 1-based integer corresponding to the position of the respective class label in the config.json file included in the dataset zip file described above. For example, the class Bird is the first entry in the config file, so its instances should receive category_id: 1 (rather than 0). Please also note that the config file contains stuff classes as well, so the category_id values of the 37 thing classes are not contiguous from 1 to 37. The segmentation itself is stored as a run-length-encoded (RLE) binary mask; helper scripts for encoding and decoding are available for Python and Matlab.
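To make these conventions concrete, here is a minimal plain-Python sketch of building one detection result entry. It rests on two assumptions that should be checked against the dataset and the official helper scripts: that config.json holds a top-level "labels" list whose entries carry a "readable" name and an "instances" flag marking thing classes, and that the RLE is COCO's compressed format as produced by pycocotools (column-major run lengths serialized to a string). All function names are hypothetical; for real submissions the official helpers or `pycocotools.mask.encode` are the safer choice.

```python
import json
import os
from typing import Dict, List


def load_config(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def thing_category_ids(config: dict) -> Dict[str, int]:
    """Map readable names of thing classes to 1-based category_id values."""
    # Assumption: config["labels"] is a list of dicts with "readable" and
    # "instances" keys, where "instances" is true for thing classes.
    ids = {}
    for position, label in enumerate(config["labels"], start=1):
        if label.get("instances"):
            ids[label["readable"]] = position  # stuff classes leave gaps
    return ids


def rle_counts(mask: List[List[int]]) -> List[int]:
    """Run lengths of a binary mask in column-major order, zeros first."""
    height, width = len(mask), len(mask[0])
    counts, current, run = [], 0, 0
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value != current:
                counts.append(run)  # appends 0 if the mask starts with a 1
                current, run = value, 0
            run += 1
    counts.append(run)
    return counts


def counts_to_string(counts: List[int]) -> str:
    """Serialize run lengths like the COCO mask API's compressed RLE."""
    chars = []
    for i, count in enumerate(counts):
        # From the fourth run on, store the difference to the run two back.
        x = count - counts[i - 2] if i > 2 else count
        more = True
        while more:
            c = x & 0x1F  # 5 data bits; bit 0x10 also signals the sign
            x >>= 5  # arithmetic shift keeps negative deltas negative
            more = (x != -1) if (c & 0x10) else (x != 0)
            if more:
                c |= 0x20  # continuation flag
            chars.append(chr(c + 48))  # map 0..63 to ASCII '0'..'o'
    return "".join(chars)


def make_detection(image_path: str, category_id: int,
                   mask: List[List[int]], score: float) -> dict:
    """Build one result entry; image_id is the filename without extension."""
    image_id = os.path.splitext(os.path.basename(image_path))[0]
    return {
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": {
            "size": [len(mask), len(mask[0])],  # [height, width]
            "counts": counts_to_string(rle_counts(mask)),
        },
        "score": score,
    }
```

Under the stated assumptions, `thing_category_ids(load_config("config.json"))["Bird"]` would give 1, and a list of `make_detection(...)` entries can be written out with `json.dump` to form the results file.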

All detection results should be submitted as a single json file, zipped, to our CodaLab benchmark server. Further details can be found in the COCO upload and result format descriptions for detection. The main performance metric is Average Precision (AP), computed on instance-level segmentations per object category and averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (start and end inclusive, i.e. 10 thresholds), following [2].
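The threshold range 0.5:0.05:0.95 expands to ten IoU values. The sketch below shows only the final averaging step, assuming AP has already been computed for every category at every threshold; the official COCO evaluation code handles matching detections to ground truth, building precision/recall curves and special cases such as categories without ground truth, none of which is reproduced here. The helper names are hypothetical.

```python
from typing import Dict, List


def iou_thresholds() -> List[float]:
    """0.5:0.05:0.95 with inclusive start and end, i.e. 10 thresholds."""
    return [round(0.5 + 0.05 * k, 2) for k in range(10)]


def mean_ap(ap_table: Dict[int, List[float]]) -> float:
    """Average AP over thresholds, then over categories.

    ap_table maps category_id -> AP at each threshold of iou_thresholds().
    """
    n = len(iou_thresholds())
    per_category = []
    for category_id, aps in ap_table.items():
        if len(aps) != n:
            raise ValueError(f"category {category_id}: expected {n} AP values")
        per_category.append(sum(aps) / n)
    return sum(per_category) / len(per_category)
```

Because every category contributes the same number of thresholds, averaging per category first gives the same result as averaging all values at once.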

Mapillary Vistas Panoptic Segmentation Task

This task is an image labeling problem where each pixel of a test image has to be assigned either a label from a set of predefined object categories or the ignore label void. In addition to labeling stuff classes, each category with instance-specific annotations should be labeled as in the object detection task above, such that object instances are separately segmented and enumerable. This task follows the definition recently introduced in [3] and will be evaluated with the Panoptic Quality (PQ) metric and the corresponding evaluation code.
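For reference, the PQ metric from [3] matches predicted and ground-truth segments of the same category when their IoU exceeds 0.5 (which makes matches unique), and then computes per category PQ = (sum of IoUs over matched pairs) / (|TP| + 0.5 |FP| + 0.5 |FN|), which factors into segmentation quality (SQ) times recognition quality (RQ). The sketch below assumes the matching has already been done and that categories appearing in neither prediction nor ground truth are skipped when averaging; it illustrates the formula and is not the official evaluation code.

```python
from typing import Dict, List, Optional, Tuple


def panoptic_quality(tp_ious: List[float], num_fp: int,
                     num_fn: int) -> Optional[Tuple[float, float, float]]:
    """Return (PQ, SQ, RQ) for one category, or None if it has no segments.

    tp_ious holds the IoU of every matched (prediction, ground truth) pair;
    a pair only counts as a match if its IoU is greater than 0.5.
    """
    if any(not 0.5 < iou <= 1.0 for iou in tp_ious):
        raise ValueError("matched pairs must have 0.5 < IoU <= 1")
    tp = len(tp_ious)
    denominator = tp + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        return None  # category absent from both prediction and ground truth
    iou_sum = sum(tp_ious)
    pq = iou_sum / denominator
    sq = iou_sum / tp if tp else 0.0  # mean IoU of matched segments
    rq = tp / denominator  # F1-style recognition score
    return pq, sq, rq


def mean_pq(per_category: Dict[int, Optional[Tuple[float, float, float]]]) -> float:
    """Average PQ over categories, skipping those without any segments."""
    values = [r[0] for r in per_category.values() if r is not None]
    return sum(values) / len(values)
```

For example, a category with matched IoUs 0.8 and 0.6, one false positive and one false negative gets SQ = 0.7, RQ = 2/3 and PQ = 1.4 / 3.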

Again, we mostly follow the Panoptic result submission format (see Section 4), the only difference being that image_id is a string rather than an integer as in COCO (see the description of the object detection task above).

    annotation{
        "image_id" : str,
        "file_name" : str,
        "segments_info" : [segment_info],
    }

    segment_info{
        "id" : int,
        "category_id" : int,
    }

Panoptic submissions have to contain exactly one json file encoding all annotations and one folder of PNGs, where the file names and segment IDs match the file_name values and id fields in the annotations, respectively. As for the detection task, further details can be found in the COCO upload and result format descriptions for panoptic segmentation.
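In the COCO panoptic format, each PNG stores segment ids as RGB colors via id = R + 256*G + 256^2*B, and pixels with id 0 are treated as void. A minimal sketch of that conversion, plus a check that the ids in a PNG agree with the segments_info entries, might look as follows; the function names are hypothetical, and actually reading and writing PNG files (for example with an imaging library) is left out.

```python
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]


def id_to_rgb(segment_id: int) -> RGB:
    """Encode a segment id as (R, G, B) with id = R + 256*G + 256**2*B."""
    if not 0 <= segment_id < 256 ** 3:
        raise ValueError("segment id must fit in 24 bits")
    return (segment_id % 256, (segment_id // 256) % 256, segment_id // 256 ** 2)


def rgb_to_id(color: RGB) -> int:
    """Inverse of id_to_rgb."""
    r, g, b = color
    return r + 256 * g + 256 ** 2 * b


def make_panoptic_annotation(image_id: str, png_name: str,
                             segments: Iterable[Tuple[int, int]]) -> dict:
    """segments yields (segment id, category_id) pairs for one image."""
    return {
        "image_id": image_id,  # a string, as in the detection task
        "file_name": png_name,
        "segments_info": [{"id": sid, "category_id": cid} for sid, cid in segments],
    }


def ids_match(id_map: List[List[int]], annotation: dict) -> bool:
    """Check that the non-void ids in the PNG equal the declared segment ids."""
    used = {sid for row in id_map for sid in row if sid != 0}  # 0 = void
    declared = {s["id"] for s in annotation["segments_info"]}
    return used == declared
```

Running a check like `ids_match` on every image before zipping is a cheap way to catch mismatches between the PNGs and the json file before spending one of the limited test submissions.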

As for the detection task, the corresponding benchmark server will accept at most 3 submissions on the test data per team before the challenge deadline, but you can also submit results on the validation set to verify the format.

Performance Evaluation

All test results have to be stored in a single .zip file per task and uploaded to the corresponding benchmark server. You can make up to 3 submissions on the test data before the challenge deadline. To validate your submission and upload format, you can also submit your validation set results to the server. These guidelines are similar to those of COCO and can be found here.
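Packaging can be done with Python's standard zipfile module. The sketch below writes a single JSON file and, for the panoptic task, an optional folder of PNGs into one archive. The file and folder names are placeholders, since the expected names are not specified here, so please check the benchmark server's instructions before uploading.

```python
import json
import zipfile
from typing import Dict, Optional


def write_submission(zip_path: str, results: object,
                     json_name: str = "results.json",
                     png_files: Optional[Dict[str, bytes]] = None,
                     png_folder: str = "panoptic") -> None:
    """Write one JSON file (plus optional PNGs in a folder) into a zip archive.

    json_name and png_folder are placeholder names, not official requirements.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(json_name, json.dumps(results))
        # PNG payloads are passed in as bytes, keyed by file name.
        for name, data in (png_files or {}).items():
            zf.writestr(f"{png_folder}/{name}", data)
```

For the detection task only `results` is needed; for the panoptic task, `png_files` would map each PNG file name to its encoded bytes.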

Important Dates

Training, validation and test data: available here
Challenge submission deadline: October 11, 2019, 11:59 PST (extended from October 4)
Technical report submission deadline: October 11, 2019
Challenge winners notified: October 19, 2019
Winners present at ICCV 2019 Workshop: October 27, 2019

Invited Speakers

Alex Berg (Facebook & UNC Chapel Hill)

I am a research scientist at Facebook. My research spans a wide range of topics in computational visual recognition and its connections to natural language processing and psychology, with a concentration on computational efficiency. I completed my PhD in computer science at UC Berkeley in 2005, have worked alongside many wonderful people at Yahoo! Research, Columbia University and Stony Brook University, and am currently an associate professor (on leave) at UNC Chapel Hill.

Andrej Karpathy (Tesla)

I am the Sr. Director of AI at Tesla, where I lead the team responsible for all neural networks on the Autopilot. Previously, I was a Research Scientist at OpenAI working on Deep Learning in Computer Vision, Generative Modeling and Reinforcement Learning. I received my PhD from Stanford, where I worked with Fei-Fei Li on Convolutional/Recurrent Neural Network architectures and their applications in Computer Vision, Natural Language Processing and their intersection. Over the course of my PhD I squeezed in two internships at Google, where I worked on large-scale feature learning over YouTube videos, and in 2015 I interned at DeepMind on the Deep Reinforcement Learning team. Together with Fei-Fei, I designed and was the primary instructor for a new Stanford class on Convolutional Neural Networks for Visual Recognition (CS231n). The class was the first Deep Learning course offered at Stanford and has grown from 150 students enrolled in 2015 to 330 in 2016 and 750 in 2017.


  1. G. Neuhold, T. Ollmann, S. Rota Bulò, and P. Kontschieder. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. In International Conference on Computer Vision (ICCV), 2017.

  2. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV), 2014.

  3. A. Kirillov, K. He, R. B. Girshick, C. Rother, and P. Dollár. Panoptic Segmentation. arXiv Tech Report, 2018.