Multi-View Domain Adaptation
for Nighttime Aerial Tracking

RAL 2023

Tongji University, The University of Hong Kong

MVDANT performs multi-view domain adaptation for nighttime aerial tracking with high precision and robustness.

Abstract

We present MVDANT, a multi-view domain adaptation framework for nighttime aerial tracking. Our approach addresses the challenge of adapting daytime tracking models to nighttime scenarios while accounting for multiple viewpoints. MVDANT combines multi-view knowledge fusion, feature alignment, and adversarial learning to bridge the gap between the daytime and nighttime domains. The framework includes a novel Transformer-based multi-view feature aligner and a Transformer-based hierarchical discriminator. Together, these components capture knowledge of diverse perspectives and lighting distributions, improving the robustness of tracking objects seen from various views. Experimental results on challenging nighttime UAV benchmarks show significant improvements in precision, normalized precision, and success rate over state-of-the-art trackers.

Real-World Tests

To demonstrate its applicability to real-world nighttime drone tracking, MVDANT was deployed on a typical embedded system, the NVIDIA Jetson AGX Xavier. Without TensorRT acceleration, MVDANT runs in real time at 31.25 frames per second (FPS). The following videos show our real-world tests, which cover a variety of nighttime tracking scenarios.
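As a quick sanity check on the reported speed, the short sketch below converts a frame rate into an average per-frame time budget. The function names and the 30 FPS camera rate used as the real-time threshold are illustrative assumptions, not values taken from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the average processing time per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def is_real_time(fps: float, camera_fps: float = 30.0) -> bool:
    """Check whether the tracker keeps up with the camera.

    camera_fps defaults to 30.0, an assumed capture rate for illustration.
    """
    return fps >= camera_fps


if __name__ == "__main__":
    reported_fps = 31.25  # speed reported on the Jetson AGX Xavier
    print(frame_budget_ms(reported_fps))  # 32.0 ms per frame
    print(is_real_time(reported_fps))     # True under the assumed 30 FPS camera
```

At 31.25 FPS the tracker spends 32 ms per frame on average, which stays just inside the roughly 33.3 ms budget of the assumed 30 FPS camera.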

Real-World Evaluation 1

Real-World Evaluation 2

Real-World Evaluation 3

Real-World Evaluation 4

Real-World Evaluation 5

Real-World Evaluation 6

Method

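Our MVDANT framework addresses the challenges of nighttime aerial tracking through multi-view domain adaptation. As outlined in the abstract, its key components are a Transformer-based multi-view feature aligner and a Transformer-based hierarchical discriminator, trained with the objective described next.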

Overall Objective

The overall training loss of our framework combines classification and regression losses with adversarial and consistency losses. This combination ensures that the model not only performs well on the tracking task but also effectively adapts to the target domain. The consistency loss regularizes the tracker’s prediction results for the same target image under different perspectives, further enhancing robustness.
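The combination described above can be sketched as a weighted sum. This is only an illustration: the weighting scheme, the default weight values, the mean-squared form of the consistency term, and all names (LossWeights, consistency_loss, total_loss) are assumptions, since this page does not state the exact formulation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical balancing weights; the actual values are not given here."""
    adversarial: float = 0.01
    consistency: float = 0.1


def consistency_loss(pred_view_a: Sequence[float], pred_view_b: Sequence[float]) -> float:
    """Mean squared difference between predictions for the same target
    seen from two views (an assumed form of the consistency term)."""
    if len(pred_view_a) != len(pred_view_b):
        raise ValueError("predictions must have the same length")
    if not pred_view_a:
        raise ValueError("predictions must not be empty")
    squared = [(a - b) ** 2 for a, b in zip(pred_view_a, pred_view_b)]
    return sum(squared) / len(squared)


def total_loss(cls_loss: float, reg_loss: float, adv_loss: float,
               con_loss: float, weights: Optional[LossWeights] = None) -> float:
    """Tracking losses plus weighted adversarial and consistency terms."""
    w = weights if weights is not None else LossWeights()
    return cls_loss + reg_loss + w.adversarial * adv_loss + w.consistency * con_loss


if __name__ == "__main__":
    # Placeholder numbers purely for demonstration.
    con = consistency_loss([0.2, 0.8, 0.5], [0.25, 0.7, 0.5])
    print(total_loss(cls_loss=0.6, reg_loss=0.4, adv_loss=0.9, con_loss=con))
```

In this sketch the consistency term is zero when the predictions for both views agree, so it only penalizes view-dependent disagreement.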

Overview of MVDANT

Results

We conducted comprehensive experiments on two challenging nighttime UAV benchmarks: NAT2021 and UAVDark70. Our MVDANT framework demonstrates superior performance compared to state-of-the-art trackers in terms of precision, normalized precision, and success rate.

Overall Performance: On the NAT2021-test set, MVDANT achieves a success rate of 0.483, outperforming the baseline tracker by 2.6%. On the UAVDark70 dataset, MVDANT achieves a success rate of 0.496, which is a 1.2% improvement over the best-performing existing tracker.

Long-term Tracking Evaluation: To validate the effectiveness of our framework in long-term tracking performance, we evaluated it on the NAT2021-L-test set. MVDANT outperformed the runner-up by 7.1% in precision, 11.0% in normalized precision, and 5.9% in success rate, demonstrating its robust performance in long-term tracking scenarios.

Attribute-Based Performance: We also assessed the robustness of our tracker against specific challenges such as illumination variation, low resolution, fast motion, and viewpoint change. MVDANT achieved a success rate of 0.521 for viewpoint change on UAVDark70 and 0.476 for fast motion on NAT2021-test, improving on the previous best performance by approximately 4.3%.
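For readers unfamiliar with the metrics reported above, the sketch below shows how success rate and precision are commonly computed in one-pass tracking evaluation. It assumes the widely used definitions (success as the area under the overlap-threshold curve, precision as the fraction of frames whose center error is within 20 pixels) and boxes given as (x, y, width, height). The benchmark toolkits used for NAT2021 and UAVDark70 may differ in detail, and normalized precision is omitted here.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance between the two box centers, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def _check(preds: Sequence[Box], gts: Sequence[Box]) -> None:
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally sized, non-empty box lists")


def success_rate(preds: Sequence[Box], gts: Sequence[Box],
                 num_thresholds: int = 21) -> float:
    """Area under the success curve: mean fraction of frames whose IoU
    exceeds each threshold in [0, 1]."""
    _check(preds, gts)
    overlaps: List[float] = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)


def precision(preds: Sequence[Box], gts: Sequence[Box],
              pixel_threshold: float = 20.0) -> float:
    """Fraction of frames whose center error is within pixel_threshold."""
    _check(preds, gts)
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= pixel_threshold for e in errors) / len(errors)
```

Calling success_rate and precision with a tracker's per-frame predictions and the matching ground-truth boxes yields one score per sequence. Benchmark-level numbers are typically obtained by averaging over sequences.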

Performance Comparison on Nighttime Aerial Tracking Benchmarks

Ablation Study

To investigate the performance contributions of different components in MVDANT, we conducted ablation studies. We compared variations of our framework with different modules activated, including the adversarial multi-source domain adaptation (ADA), multi-view feature aligner (MFA), and tracker alignment (TA).

The results indicate that the full MVDANT framework significantly improves normalized precision and success rate over the baseline tracker: normalized precision increases by 26.67% and success rate by 32.01%, demonstrating the effectiveness of the added modules.
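When reading percentage gains like these, it helps to separate relative improvement from absolute score difference. The sketch below shows both calculations. The page does not say which convention its percentages follow, and the scores in the example are hypothetical placeholders, not results from the paper.

```python
def absolute_gain(baseline: float, improved: float) -> float:
    """Difference between two scores, in score points."""
    return improved - baseline


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical scores chosen only to illustrate the two conventions.
    baseline_score, improved_score = 0.5, 0.625
    print(absolute_gain(baseline_score, improved_score))          # 0.125
    print(relative_gain_percent(baseline_score, improved_score))  # 25.0
```

The same pair of scores reads as a gain of 0.125 points or as a 25% relative improvement, so the convention matters when comparing numbers across papers.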

Ablation Study Results on NAT2021-L-test


BibTeX

@article{li2023multiview,
  title={Multi-View Domain Adaptation for Nighttime Aerial Tracking},
  author={Li, Haoyang and Zheng, Guangze and Li, Sihang and Ye, Junjie and Fu, Changhong},
  journal={arXiv preprint arXiv:2310.12345},
  year={2023}
}