You Only Look at Once for Real-time and Generic Multi-Task

Wang, Jiayuan; Wu, Q. M. Jonathan; Zhang, Ning

(Submitted on 2 Oct 2023 (v1), revised 2 Nov 2023 (this version, v3), latest version 24 Apr 2024 (v4))

Abstract: High precision, a lightweight design, and real-time responsiveness are three essential requirements for autonomous driving. In this study, we present an adaptive, real-time, and lightweight multi-task model that concurrently addresses object detection, drivable area segmentation, and lane line segmentation. Specifically, we developed an end-to-end multi-task model with a unified and streamlined segmentation structure. We introduced a learnable parameter that adaptively concatenates features in the segmentation necks, and we use the same loss function for all segmentation tasks. This eliminates the need for per-task customization and enhances the model's generalization capabilities. We also introduced a segmentation head composed only of a series of convolutional layers, which reduces inference time. We achieved competitive results on the BDD100k dataset, particularly in visualization outcomes. The model reaches a mAP50 of 81.1% for object detection, a mIoU of 91.0% for drivable area segmentation, and an IoU of 28.8% for lane line segmentation. Additionally, we evaluated the model on real-world scenes, where it significantly outperforms competitors. This demonstrates that our model not only exhibits competitive performance but is also more flexible and faster than existing multi-task models. The source code and pre-trained models are released at this https URL
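
The segmentation figures in the abstract are intersection-over-union scores. As a rough illustration of what IoU and mIoU measure, here is a minimal sketch on a toy mask. It is not the authors' evaluation code. The function names, the toy data, and the choice to average mIoU over the background and drivable classes are assumptions made for this example; the abstract does not specify the averaging.

```python
from typing import Sequence


def iou(pred: Sequence[int], target: Sequence[int], cls: int) -> float:
    """Intersection over union for one class, over flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Pixels where both maps carry the class.
    intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    # Pixels where at least one map carries the class.
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    # Assumed convention: a class absent from both maps scores 1.0.
    return intersection / union if union else 1.0


def mean_iou(pred: Sequence[int], target: Sequence[int],
             classes: Sequence[int]) -> float:
    """Unweighted mean of per-class IoU (assumed averaging scheme)."""
    return sum(iou(pred, target, c) for c in classes) / len(classes)


# Hypothetical 2x4 drivable-area mask, flattened: 1 = drivable, 0 = background.
target = [0, 0, 1, 1, 0, 1, 1, 1]
pred = [0, 1, 1, 1, 0, 0, 1, 1]

drivable_iou = iou(pred, target, cls=1)      # 4 shared pixels / 6 in the union
background_iou = iou(pred, target, cls=0)    # 2 shared pixels / 4 in the union
drivable_miou = mean_iou(pred, target, classes=(0, 1))

print(f"IoU(drivable)={drivable_iou:.3f}  "
      f"IoU(background)={background_iou:.3f}  mIoU={drivable_miou:.3f}")
```

A thin structure such as a lane line covers few pixels, so a small positional offset removes a large share of the intersection. This is one reason a lane-line IoU can sit far below a drivable-area mIoU.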

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.01641 [cs.CV]
	(or arXiv:2310.01641v3 [cs.CV] for this version)

Submission history

From: Jiayuan Wang
[v1] Mon, 2 Oct 2023 21:09:43 GMT (9334kb,D)
[v2] Tue, 10 Oct 2023 03:50:28 GMT (9444kb,D)
[v3] Thu, 2 Nov 2023 16:52:42 GMT (9446kb,D)
[v4] Wed, 24 Apr 2024 20:05:04 GMT (8776kb,D)
