*: equal contribution
Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception
CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP
Region-aware Knowledge Distillation for Efficient Image-to-Image Translation