Article

Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

Through multi-scale visual prompt and post-pretrain token scaling, Chain-of-Sight achieves 3.7x pre-training speedup without performance sacrifice.

Jul 4, 2024

Temporally-Adaptive Models for Efficient Video Understanding

Temporally-adaptive convolutions and transformers.

Aug 10, 2023