I'm Orr Zohar 👋
My research focuses on Large Multimodal Models, with the goal of making these models more capable of understanding images and videos.
SmolVLM2: The tiniest video-LMM ever!
Apollo: Exploring video understanding in LMMs
- 💫 Video-STAR: Introduced a method that allows any labeled video dataset to be used for instruction tuning.
- 🤖 VideoAgent: A novel agent-based system that uses a large language model to iteratively identify and compile crucial information from long-form videos.