Multi-modal large language models with hierarchical visual features and deeper vision-language interactions.
Jun 1, 2024
Memory- and parameter-efficient fine-tuning.
Feb 13, 2024
Temporal context-aware object tracker.
Aug 22, 2023