Learning Multi-Object Tracking and Segmentation from Automatic Annotations

Conf. on Computer Vision and Pattern Recognition (CVPR) 2020
By Lorenzo Porzi, Markus Hofinger, Idoia Ruiz, Joan Serrat, Samuel Rota Bulò, Peter Kontschieder

Abstract

In this work we contribute a novel pipeline to automatically generate training data, and to improve over state-of-the-art multi-object tracking and segmentation (MOTS) methods. Our proposed track mining algorithm turns raw street-level videos into high-fidelity MOTS training data; it is scalable and removes the need for expensive and time-consuming manual annotation. We leverage state-of-the-art instance segmentation results in combination with optical flow predictions, the latter also trained on automatically harvested training data. Our second major contribution is MOTSNet, a deep learning, tracking-by-detection architecture for MOTS that deploys a novel mask-pooling layer for improved object association over time. Training MOTSNet with our automatically extracted data leads to significantly improved sMOTSA scores on the novel KITTI MOTS dataset (+1.9%/+7.5% on cars/pedestrians), and MOTSNet improves by +4.1% over the previous best methods on the MOTSChallenge dataset. Our most impressive finding is that we can improve over previous best-performing works, even in the complete absence of manually annotated MOTS training data.

More publications

Modeling the Background for Incremental Learning in Semantic Segmentation

By Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo
Conf. on Computer Vision and Pattern Recognition (CVPR) 2020

Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition

By Frederik Warburg, Søren Hauberg, Manuel López-Antequera, Pau Gargallo, Yubin Kuang, Javier Civera
Conf. on Computer Vision and Pattern Recognition (CVPR) 2020

The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale

By Christian Ertler, Jerneja Mislej, Tobias Ollmann, Lorenzo Porzi, Gerhard Neuhold, Yubin Kuang
arXiv (technical report)

Towards Generalization Across Depth for Monocular 3D Object Detection

By Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Elisa Ricci, Peter Kontschieder
arXiv (technical report)

Disentangling Monocular 3D Object Detection

By Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Manuel López-Antequera, Peter Kontschieder
International Conf. on Computer Vision (ICCV) 2019

AdaGraph: Unifying Predictive and Continuous Domain Adaptation through Graphs

By Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Conf. on Computer Vision and Pattern Recognition (CVPR) 2019

Seamless Scene Segmentation

By Lorenzo Porzi, Samuel Rota Bulò, Aleksander Colovic, Peter Kontschieder
Conf. on Computer Vision and Pattern Recognition (CVPR) 2019

Deep Single Image Camera Calibration with Radial Distortion

By Manuel López-Antequera, Roger Marí, Pau Gargallo, Yubin Kuang, Javier Gonzalez-Jimenez, Gloria Haro
Conf. on Computer Vision and Pattern Recognition (CVPR) 2019

Unsupervised Domain Adaptation using Feature-Whitening and Consensus Loss

By Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Samuel Rota Bulò, Nicu Sebe, Elisa Ricci
Conf. on Computer Vision and Pattern Recognition (CVPR) 2019

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

By Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
Conf. on Computer Vision and Pattern Recognition (CVPR) 2018

Boosting Domain Adaptation by Discovering Latent Domains

By Massimiliano Mancini, Lorenzo Porzi, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Conf. on Computer Vision and Pattern Recognition (CVPR) 2018

Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View

By Albert Pumarola, Antonio Agudo, Lorenzo Porzi, Alberto Sanfeliu, Vincent Lepetit, Francesc Moreno-Noguer
Conf. on Computer Vision and Pattern Recognition (CVPR) 2018

The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes

By Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulò, Peter Kontschieder
International Conf. on Computer Vision (ICCV) 2017

AutoDIAL: Automatic DomaIn Alignment Layers

By Fabio Maria Carlucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, Samuel Rota Bulò
International Conf. on Computer Vision (ICCV) 2017

Loss Max-Pooling for Semantic Image Segmentation

By Samuel Rota Bulò, Gerhard Neuhold, Peter Kontschieder
Conf. on Computer Vision and Pattern Recognition (CVPR) 2017

Online Learning with Bayesian Classification Trees

By Samuel Rota Bulò, Peter Kontschieder
Conf. on Computer Vision and Pattern Recognition (CVPR) 2016

Dropout Distillation

By Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
International Conf. on Machine Learning (ICML) 2016