OATAO - Open Archive Toulouse Archive Ouverte

Instance Sequence Queries for Video Instance Segmentation with Transformers

Xu, Zhujun and Vivet, Damien. Instance Sequence Queries for Video Instance Segmentation with Transformers. (2021) Sensors, 21 (13). 4507. ISSN 1424-8220

(Document in English)

PDF (Publisher's version), 15MB

Official URL: https://doi.org/10.3390/s21134507

Abstract

Existing methods for video instance segmentation (VIS) mostly rely on one of two strategies: (1) building sophisticated post-processing to associate frame-level segmentation results, or (2) modeling a video clip as a 3D spatio-temporal volume, which limits resolution and clip length because of memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. Instance consistency is preserved by the fixed correspondence between query slots and network outputs. As a result, there is no need for complex data association. On a TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the YouTube-VIS dataset.
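
The idea of a bipartite matching that spans two frames can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `two_frame_matching`, the cost matrices, and the choice to sum the per-frame costs are assumptions made for illustration, and a brute-force search over permutations stands in for the Hungarian algorithm usually used for this kind of matching. Each query slot is assigned to at most one ground-truth instance, and the same assignment is shared by both frames, so slot order carries instance identity.

```python
from itertools import permutations


def two_frame_matching(cost_t, cost_t1):
    """Find one slot-to-instance assignment shared by two frames.

    cost_t[q][g] and cost_t1[q][g] are hypothetical matching costs between
    query slot q and ground-truth instance g in frames t and t+1.
    Returns (assignment, total_cost), where assignment[g] is the query slot
    matched to instance g in BOTH frames.
    Assumes at least as many query slots as instances.
    """
    num_queries = len(cost_t)
    num_instances = len(cost_t[0])
    best_assignment, best_cost = None, float("inf")
    # Brute force over all ways to pick distinct slots for the instances
    # (illustrative only; exponential in the number of instances).
    for slots in permutations(range(num_queries), num_instances):
        total = sum(cost_t[q][g] + cost_t1[q][g] for g, q in enumerate(slots))
        if total < best_cost:
            best_assignment, best_cost = slots, total
    return best_assignment, best_cost


# Toy example: 3 query slots, 2 ground-truth instances (made-up costs).
cost_t = [[0.9, 0.2], [0.1, 0.8], [0.5, 0.5]]
cost_t1 = [[0.8, 0.1], [0.2, 0.9], [0.6, 0.4]]
assignment, total_cost = two_frame_matching(cost_t, cost_t1)
print(assignment)  # instance 0 -> slot 1, instance 1 -> slot 0
```

Because the assignment is shared across frames, the output of slot 1 refers to the same instance in frame t and frame t+1, which is why no separate association step is needed in this toy setting.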

Item Type: Article
Audience (journal): International peer-reviewed journal
Institution: Université de Toulouse > Institut Supérieur de l'Aéronautique et de l'Espace - ISAE-SUPAERO (FRANCE)
Funders: FUI (FUI STAR: DOS0075476 00) - ANR AVISE (ANR-17-CE22-0001-01)
Deposited On: 02 Jul 2021 17:05