Generative Video Motion Editing with 3D Point Tracks Paper • 2512.02015 • Published 26 days ago • 2
Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers Article • Nov 3, 2022 • 336
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 147
CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation Paper • 2505.21904 • Published May 28 • 3
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning Paper • 2505.24871 • Published May 30 • 23
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models Paper • 2505.24025 • Published May 29 • 27