Publications

(2024). Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight. Tech report.
(2024). Towards Better Vision-Inspired Vision-Language Models. In CVPR.
(2024). Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone. In NeurIPS.
(2023). Towards Real-World Visual Tracking with Temporal Contexts. In TPAMI.
(2023). Temporally-Adaptive Models for Efficient Video Understanding. Tech report.