Semantic Segmentation of Nighttime Images Based on Cross-modal Domain Adaptation

Authors

  • Jixing Huang Stony Brook Institute at Anhui University, Hefei, Anhui 230031, China Author
  • Yanhe Li Stony Brook Institute at Anhui University, Hefei, Anhui 230031, China Author
  • Yuchen Zhang Stony Brook Institute at Anhui University, Hefei, Anhui 230031, China Author
  • Xinyue Zhang Stony Brook Institute at Anhui University, Hefei, Anhui 230031, China Author
  • Xin-yue Zhang Stony Brook Institute at Anhui University, Hefei, Anhui 230031, China Author
  • Ruihan Qi Stony Brook Institute at Anhui University, Hefei, Anhui 230031, China Author

DOI:

https://doi.org/10.5281/zenodo.15582511

Keywords:

Nighttime Images Semantic Segmentation, All-weather Autonomous Perception, Event Cameras, Multimodal Cooperative Framework, Dual-branch Network, Cross-modal Contrastive Loss (CMCL), Hybrid Gaussian Kernel MMD Loss, Dynamic Confidence Screening (DCS), Pseudo-label Noise Suppression, Low-light Noise

Abstract

Semantic segmentation of nighttime images is crucial for all-weather autonomous perception but faces challenges such as low-light noise, motion blur, and the limitations of cross-domain adaptation. Traditional visible-light methods suffer from sensor constraints (60 dB dynamic range), causing information loss in extreme darkness (<1 lux), while domain adaptation approaches degrade under day-night shifts in noise distribution. This work introduces event cameras (140 dB dynamic range, μs-level response) to establish a multimodal cooperative framework. A dual-branch network decouples visible-light content features from event-based motion features, optimized by a cross-modal contrastive loss (CMCL) and a hybrid Gaussian kernel MMD loss for modality alignment and domain matching. A dynamic confidence screening (DCS) mechanism integrates optical-flow consistency and Bayesian uncertainty to suppress pseudo-label noise, reducing false detections by 18.5%. Evaluations on the DSEC and MVSEC datasets demonstrate a 21.3% mIoU gain in extreme low light, a 34.5% boundary-IoU improvement in blurred regions, and a 14.2% advantage in day-to-night cross-domain adaptation over state-of-the-art methods. The framework offers a label-efficient, robust solution for nighttime autonomous driving systems, advancing the deployment of multimodal sensing.
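The paper's exact loss definitions are not reproduced on this page, but the hybrid Gaussian kernel MMD term mentioned in the abstract can be illustrated with a minimal sketch. The function names (`gaussian_kernel`, `hybrid_mmd`) and the bandwidth set are illustrative assumptions, not the authors' implementation: the general idea is to average an RBF kernel over several bandwidths ("hybrid" kernel) and compute the standard squared-MMD estimate between source-domain (day) and target-domain (night) feature sets.

```python
import math

def gaussian_kernel(x, y, sigma):
    """RBF kernel between two feature vectors (plain Python lists)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def hybrid_mmd(source, target, sigmas=(1.0, 2.0, 4.0)):
    """Squared MMD between two sample sets under a mixture of Gaussian
    kernels with different bandwidths:
        MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)],
    where k is the bandwidth-averaged kernel."""
    def mean_kernel(a_set, b_set):
        total = 0.0
        for a in a_set:
            for b in b_set:
                # Average the RBF kernel over all bandwidths.
                total += sum(gaussian_kernel(a, b, s) for s in sigmas) / len(sigmas)
        return total / (len(a_set) * len(b_set))

    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

Minimizing this quantity pulls the two feature distributions together; identical sets give exactly zero, while well-separated sets give a large positive value, which is why it serves as a domain-matching penalty.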

Downloads

Download data is not yet available.

References

L. Hoyer et al., "DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 1–12.

I. Alonso et al., "EV-SegNet: Semantic Segmentation for Event-Based Cameras," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), 2019, pp. 1–10.

M. Gehrig et al., "DSEC: A Stereo Event Camera Dataset for Driving Scenarios," IEEE Robot. Autom. Lett., vol. 6, no. 3, pp. 4947–4954, Jul. 2021.

K. Zuiderveld, "Contrast Limited Adaptive Histogram Equalization," in Graphics Gems IV, Academic Press, 1994, pp. 474–485.

J. L. Starck et al., "The Curvelet Transform for Image Denoising," IEEE Trans. Image Process., vol. 11, no. 6, pp. 670–684, Jun. 2002.

X. Guo et al., "LIME: Low-Light Image Enhancement via Illumination Map Estimation," IEEE Trans. Image Process., vol. 25, no. 9, pp. 3983–3996, Sep. 2016.

C. Wei et al., "RetinexNet: A Deep Learning Approach for Low-Light Image Enhancement," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 340–356.

C. Guo et al., "Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 1780–1789.

Y. Jiang et al., "EnlightenGAN: Deep Light Enhancement Without Paired Supervision," IEEE Trans. Image Process., vol. 30, pp. 2340–2349, 2021, doi: 10.1109/TIP.2021.3051462.

C. Sakaridis et al., "Guided Curriculum Model Adaptation for Semantic Nighttime Segmentation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 1–10.

R. Wang et al., "Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 8, pp. 4225–4238, Aug. 2022.

Q. Ha et al., "MFNet: Towards Real-Time Semantic Segmentation for Autonomous Vehicles with Multi-Spectral Scenes," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2017, pp. 5108–5115.

T. Huang et al., "CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation," IEEE Trans. Image Process., vol. 31, pp. 7313–7325, 2022.

Y. Zhang et al., "Cross-Modal Collaborative Representation Learning for Nighttime Semantic Segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 1–10.

Y. Li et al., "GATE-Net: Gated Adaptive Transfer for Weakly Supervised Infrared Semantic Segmentation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 1–10.

C. Sakaridis et al., "Semantic Nighttime Segmentation via Thermal-to-Visible Image Translation," in Proc. AAAI Conf. Artif. Intell., 2019, vol. 33, pp. 1–8.

G. Gallego et al., "Event-Based Vision: A Survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 154–180, Jan. 2022.

T.-H. Vu et al., "ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 1–10.

Y. Zou et al., "Nighttime Scene Parsing via Unsupervised Domain Adaptation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 8, pp. 4438–4453, Aug. 2022.

Y. Tian et al., "RGB-Event Fusion for High Temporal Resolution Semantic Segmentation," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2022, pp. 1–16.

M. Long et al., "Learning Transferable Features with Deep Adaptation Networks," in Proc. Int. Conf. Mach. Learn. (ICML), 2015, pp. 97–105.

Y. Gal et al., "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning," in Proc. Int. Conf. Mach. Learn. (ICML), 2016, pp. 1050–1059.

W. Tranheden et al., "DACS: Domain Adaptation via Cross-Domain Mixed Sampling," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 1–10.

L. Wang et al., "Event-Based High Dynamic Range Image Recovery," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 1–10.

A. Vaswani et al., "Attention Is All You Need," in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2017, pp. 5998–6008.

V. Sandfort et al., "Data Augmentation Using Generative Adversarial Networks (CycleGAN) to Improve Generalizability in CT Segmentation Tasks," Sci. Rep., vol. 9, no. 1, p. 16884, Nov. 2019.

M. Gehrig et al., "E-RAFT: Dense Optical Flow from Event Cameras," in Proc. Int. Conf. 3D Vis. (3DV), 2021, pp. 197–206.

G. Ros et al., "The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 3234–3243.

H.-H. Nguyen et al., "Real-Time Semantic Segmentation on Edge Devices with NVIDIA Jetson AGX Xavier," in Proc. IEEE Int. Conf. Consum. Electron.-Asia (ICCE-Asia), 2022, pp. 1–4.

Published

2025-06-03

Data Availability Statement

CRediT authorship contribution statement

Jixing Huang: Writing – original draft, Methodology, Conceptualization. Yanhe Li: Writing – original draft, Data curation. Ruihan Qi: Writing – review & editing, Supervision, Investigation. Yuchen Zhang: Supervision. Xinyue Zhang: Supervision. Xin-yue Zhang: Supervision, Modification.

Issue

Section

Articles

How to Cite

Jixing Huang, Yanhe Li, Yuchen Zhang, Xinyue Zhang, Xin-yue Zhang, & Ruihan Qi. (2025). Semantic Segmentation of Nighttime Images Based on Cross-modal Domain Adaptation. Global Academic Frontiers, 3(2), 106-129. https://doi.org/10.5281/zenodo.15582511