Multi-View Domain Adaptation
for Nighttime Aerial Tracking

RAL 2023

1Tongji University, 2The University of Hong Kong, 3New York University, 4University of Southern California

MVDANT performs multi-view domain adaptation for nighttime aerial tracking with high precision and robustness.

Abstract

We present MVDANT, a multi-view domain adaptation framework for nighttime aerial tracking. Our approach addresses the challenge of adapting daytime tracking models to nighttime scenarios while accounting for multiple viewpoints. MVDANT combines multi-view knowledge fusion, feature alignment, and adversarial learning to bridge the gap between the daytime and nighttime domains.

The framework includes a novel multi-view feature aligner with a transformer structure and a transformer-based hierarchical discriminator. These components work together to capture diverse perspectives and lighting-distribution knowledge, improving the robustness of tracking objects from various views.

Our experimental results demonstrate superior performance on challenging nighttime UAV benchmarks, with significant improvements in precision, normalized precision, and success rate compared to state-of-the-art trackers.

Introduction

Visual tracking is a fundamental task in intelligent unmanned aerial systems: given the initial state of an object, a tracker estimates its location in each subsequent frame. It has widespread applications such as autonomous landing, aerial manipulation, and self-localization. However, in low-light conditions the performance of existing trackers degrades significantly due to low illumination, severe noise, and low contrast, making nighttime aerial tracking a formidable challenge.

Existing methods often rely on single-view information and neglect the significant viewpoint and motion-pattern disparities across views. Moreover, shadow occlusion, uneven lighting distribution, and disruptive noise exacerbate multi-view feature differences at nighttime, leading to missed targets or tracking failures.

To address these issues, we propose MVDANT, a domain adaptation framework that utilizes aerial multi-view source domains for nighttime aerial tracking. By capturing images from multiple views in daytime scenarios, we employ multi-view domain adaptation to narrow the gap between daytime and nighttime conditions. This approach enhances the robustness and performance of UAV tracking in low-light environments.

Method

Our MVDANT framework addresses the challenges of nighttime aerial tracking by leveraging multi-view domain adaptation. The key components of our method are:

Feature Alignment

We introduce a multi-view feature aligner with a novel transformer structure. This aligner transforms low-level features into high-level features by incorporating multi-view information and semantic cues, improving feature extraction. The multi-view feature aligner consists of an encoder and decoder, which aggregate inter-dependencies between various features and enhance view-invariant features with semantic information.
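As a rough illustration of the encoder-decoder interplay described above, the aligner could be sketched in PyTorch as follows; the module name, layer counts, and feature dimension are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class MultiViewFeatureAligner(nn.Module):
    """Sketch of a transformer encoder-decoder feature aligner
    (dimensions and depth are assumptions, not the paper's settings)."""
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=layers)

    def forward(self, low_feat, semantic_feat):
        # Encoder aggregates inter-dependencies across multi-view tokens.
        memory = self.encoder(low_feat)
        # Decoder enhances view-invariant features with semantic cues.
        return self.decoder(semantic_feat, memory)
```

In this sketch the low-level multi-view features serve as the encoder input, and the semantic features query the encoded memory through cross-attention in the decoder.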

Multi-View Feature Aligner

Detailed workflow of the Multi-view feature aligner

Attribute-Based Performance

Attribute-Based Performance Analysis

Tracker Alignment

For each perspective, we employ a discriminator to distinguish daytime images from nighttime images, enabling adversarial learning. A gradient reversal layer between the feature aligner and the discriminator aligns the feature distributions of the source and target domains, helping the model generalize to nighttime conditions.
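A gradient reversal layer is identity in the forward pass and negates (and scales) the gradient in the backward pass, so the feature aligner learns to fool the discriminator. A minimal PyTorch sketch (the scaling factor `lambd` is a hyperparameter, and the function names are ours):

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Gradient reversal: identity forward, -lambd * grad backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient flowing back to the feature aligner.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```

Placed between the aligner and the discriminator, this layer lets a single backward pass train the discriminator to separate the domains while pushing the aligner toward domain-invariant features.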

Overall Objective

The overall training loss of our framework combines classification and regression losses with adversarial and consistency losses. This combination ensures that the model not only performs well on the tracking task but also effectively adapts to the target domain. The consistency loss regularizes the tracker’s prediction results for the same target image under different perspectives, further enhancing robustness.
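Schematically, the combined objective described above can be written as follows, where the weighting coefficients and symbols are our notation (assumed, not taken from the paper):

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{reg}}
  + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}}
  + \lambda_{\mathrm{con}}\,\mathcal{L}_{\mathrm{con}}
```

Here \(\mathcal{L}_{\mathrm{cls}}\) and \(\mathcal{L}_{\mathrm{reg}}\) are the tracking classification and regression losses, \(\mathcal{L}_{\mathrm{adv}}\) is the adversarial domain loss, and \(\mathcal{L}_{\mathrm{con}}\) regularizes predictions for the same target across perspectives.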

MVDANT Overview

Overview of MVDANT

Results

We conducted comprehensive experiments on two challenging nighttime UAV benchmarks: NAT2021 and UAVDark70. Our MVDANT framework demonstrates superior performance compared to state-of-the-art trackers in terms of precision, normalized precision, and success rate.

Overall Performance: On the NAT2021-test set, MVDANT achieves a success rate of 0.483, outperforming the baseline tracker by 2.6%. On the UAVDark70 dataset, MVDANT achieves a success rate of 0.496, which is a 1.2% improvement over the best-performing existing tracker.

Long-term Tracking Evaluation: To validate the effectiveness of our framework in long-term tracking performance, we evaluated it on the NAT2021-L-test set. MVDANT outperformed the runner-up by 7.1% in precision, 11.0% in normalized precision, and 5.9% in success rate, demonstrating its robust performance in long-term tracking scenarios.

Attribute-Based Performance: We also assessed the robustness of our tracker against specific challenges such as illumination variation, low resolution, fast motion, and viewpoint change. MVDANT achieved a success rate of 0.521 for viewpoint change on UAVDark70 and 0.476 for fast motion on the NAT2021-test, improving the existing best performance by approximately 4.3%.

Performance Comparison

Performance Comparison on Nighttime Aerial Tracking Benchmarks

Ablation Study

To investigate the performance contributions of different components in MVDANT, we conducted ablation studies. We compared variations of our framework with different modules activated, including the adversarial multi-source domain adaptation (ADA), multi-view feature aligner (MFA), and tracker alignment (TA).

The results indicate that enabling all of MVDANT's modules significantly improved normalized precision and success rate over the baseline tracker: normalized precision increased by 26.67% and success rate by 32.01%, demonstrating the effectiveness of the added modules.

Ablation Study Results

Ablation Study Results on NAT2021-L-test


Real-World Tests

MVDANT was implemented on a typical embedded system, the NVIDIA Jetson AGX Xavier, to demonstrate its real-world applicability to nighttime drone tracking. Even without TensorRT acceleration, MVDANT runs in real time at 31.25 frames per second (FPS). The following videos showcase our real-world tests, demonstrating the robustness and effectiveness of MVDANT in various nighttime tracking scenarios.

Real-World Test 1

Real-World Test 2

Real-World Test 3

BibTeX

@article{li2023multiview,
  title={Multi-View Domain Adaptation for Nighttime Aerial Tracking},
  author={Li, Haoyang and Zheng, Guangze and Li, Sihang and Ye, Junjie and Fu, Changhong},
  journal={arXiv preprint arXiv:2310.12345},
  year={2023}
}