Abstract: Multimodal object detection is crucial for autonomous driving and intelligent surveillance. However, existing methods face critical challenges, such as cross-modal semantic misalignment ...