LSUN'17 Segmentation Tasks

Mapillary Research is co-organizing the Large-Scale Scene Understanding (LSUN) workshop at this year’s CVPR conference in Honolulu, Hawaii. LSUN comprises several challenges in the context of scene understanding and we are hosting tasks for Semantic Image Segmentation and Instance-specific Image Segmentation of Street-level Images. We are looking forward to receiving submissions based on novel semantic segmentation and instance-aware semantic segmentation models, respectively.

Post-LSUN Updates

We congratulate the winners of our LSUN’17 semantic segmentation [1] and instance-specific semantic segmentation [2] tasks. Details about their winning architectures can be found in their slides and in our Mapillary Vistas ICCV’17 paper and its supplementary material. Even though the LSUN’17 challenge is over, our benchmark server remains online and we look forward to receiving further submissions in the formats described below. You can take a look at the leaderboard here.


The goals and participation rules for these two tasks are described below; both are based on the recently released Mapillary Vistas Dataset (MVD). Please register here to get access to the training (18,000) and validation (2,000) images and a more detailed description of the dataset. After downloading, please verify the integrity of the zip file by checking its MD5 checksum, which should match cf59d2f0db54d8b4ae952c00e275f015. The research edition of MVD comprises 66 object categories, 37 of which carry instance-specific labels; the corresponding label IDs can be listed by running

> python

in a console from the extracted data root folder. Test data is made available to registered and eligible users via email. Please also check back on this site for further updates and upload instructions for results.
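Verifying the checksum can be done with Python's standard `hashlib` module. Below is a minimal sketch that streams the file in chunks so the multi-gigabyte zip never has to fit in memory; the filename in the comment is a placeholder, not the actual download name:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the published checksum before extracting, e.g.:
# assert md5sum("<downloaded-zip-file>") == "cf59d2f0db54d8b4ae952c00e275f015"
```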

Semantic Image Segmentation Task

This task is an image labeling problem where each pixel of a test image has to be assigned a discrete label from a set of pre-defined object categories. We expect test results to follow the ground truth labeling convention, i.e., assigned pixel labels are interpreted in the same manner as our ground truth. As the main performance metric, we use the well-established mean Intersection-over-Union (IoU) score [3] (also known as the Jaccard index). Test results have to be stored as 8-bit .png files using the exact same filename as the RGB test image. For example, if the test image filename is M2kh294N9c72sICO990Uew.jpg, the corresponding label file needs to be named M2kh294N9c72sICO990Uew.png. The color palettes included in the training label images for visualization purposes can be ignored. All test results have to be stored in a single .zip file and can be uploaded to our benchmark server.
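For intuition, mean IoU can be computed from a confusion matrix over all pixels. The following is a minimal numpy sketch (not our official evaluation script); it ignores classes that appear in neither the prediction nor the ground truth:

```python
import numpy as np

def mean_iou(gt, pred, num_classes):
    """Mean Intersection-over-Union over classes, from a pixel confusion matrix."""
    # conf[i, j]: number of pixels with ground-truth label i predicted as j
    conf = np.bincount(num_classes * gt.reshape(-1) + pred.reshape(-1),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0  # skip classes absent from both prediction and ground truth
    return (inter[valid] / union[valid]).mean()
```

For per-image 8-bit label maps, `gt` and `pred` are simply the flattened integer label arrays read from the .png files.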

Instance-specific Semantic Segmentation Task

The goal of this task is to provide individual, instance-specific segmentation results for a subset of 37 of the 66 object classes in MVD. Such results make it possible to count individual instances of a class, e.g. the number of cars or pedestrians in an image. We expect submissions to provide, for each test image, a corresponding filename.txt file, where each line describes one segmented instance:

<instance mask0>␣<label id0>␣<confidence0>
<instance mask1>␣<label id1>␣<confidence1>

<instance mask> is the path to a mask file whose non-zero entries belong to the instance segmentation, <label id> is an integer corresponding to the class label, and <confidence> is a floating-point number indicating the confidence of the prediction. The main performance metric is Average Precision (AP), computed from instance-level segmentations per object category and averaged over a range of overlap thresholds 0.5:0.05:0.95 with inclusive start and end. This procedure follows [4] and [5]; in addition, we penalize multiple predictions assigned to the same ground truth instance annotation as false positives. Again, all test results have to be stored in a single .zip file and can be uploaded to our benchmark server.
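The matching and scoring at a single overlap threshold can be sketched as follows. This is a simplified illustration, not our official evaluation code: predictions are matched greedily in order of descending confidence, a second hit on an already matched ground truth instance counts as a false positive, and AP is taken as the area under the resulting precision-recall curve. The full protocol additionally averages over the thresholds 0.5:0.05:0.95 and over categories:

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean instance masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

def match_predictions(preds, gts, thresh=0.5):
    """Greedily match (mask, confidence) predictions to ground truth masks.

    Returns a true/false flag per prediction, in descending confidence order;
    duplicate hits on an already matched ground truth become false positives.
    """
    preds = sorted(preds, key=lambda p: -p[1])
    matched = [False] * len(gts)
    tp_flags = []
    for mask, _conf in preds:
        best, best_iou = -1, thresh
        for i, gt in enumerate(gts):
            iou = mask_iou(mask, gt)
            if not matched[i] and iou >= best_iou:
                best, best_iou = i, iou
        if best >= 0:
            matched[best] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    return tp_flags

def average_precision(tp_flags, num_gt):
    """Area under the precision-recall curve (simple rectangle rule)."""
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(np.logical_not(tp_flags))
    recall = tp / max(num_gt, 1)
    precision = tp / (tp + fp)
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```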

Performance Evaluation

During the challenge period, no performance feedback on test data will be provided, i.e. there is no point in submitting multiple results to the evaluation server, since scores will not be disclosed to participants. In case of multiple submissions, the last successfully committed results are evaluated and enter the ranking. Finally, we provide performance evaluation scripts in our GitHub repository for assessing scores of predictions on the publicly available validation (and training) data.

Important Dates

Training and validation data available: May 2017
Test data available and test server running: June 26, 2017
Challenge submission deadline (extended from July 9): July 14, 2017 (23:59 CEST)
LSUN workshop at CVPR, announcement of winners on stage, public leaderboard going live: July 26, 2017

Sponsorships, Prizes and Acknowledgements

We greatly appreciate workshop sponsorship from the following companies, which significantly supports our image annotation costs. We also acknowledge financial support from the Austrian Research Promotion Agency (FFG) via project DIGIMAP.

Diamond sponsorship

Gold sponsorship

Silver sponsorship

Bronze sponsorship

Prize announcement for winning teams

The winning entries for the Semantic Image Segmentation and Instance-specific Semantic Segmentation tasks will each receive a prize: Amazon Web Services (AWS) has generously offered to sponsor US$ 10,000 in the form of AWS credits for each winning team.


  1. Y. Zhang, H. Zhao and J. Shi, Team PSPNet

  2. S. Liu, L. Qi, H. Qin, J. Shi and J. Jia, Team UCenter 

  3. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. In International Journal of Computer Vision (IJCV), 2010. 

  4. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV), 2014. 

  5. M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.