We introduce MultiVSR, a large-scale dataset for multilingual visual speech recognition. MultiVSR comprises ~12,000 hours of video data paired with word-aligned transcripts in 13 languages. We ...

- The `calvin_env` conda environment must be active.
- `calvin_env` and `pybullet` must be importable.
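
The two prerequisites above can be checked before running anything else. The following is a minimal sketch, not part of the original setup: the helper name `check_prerequisites` is hypothetical, and it assumes the environment was activated by name, so that conda's `CONDA_DEFAULT_ENV` variable holds `calvin_env` (when an environment is activated by path, the variable may hold the path instead). It uses `importlib.util.find_spec`, which only locates a module without importing it, so a package that is found but fails at import time would still pass this check.

```python
import importlib.util
import os


def check_prerequisites(env_name="calvin_env",
                        modules=("calvin_env", "pybullet"),
                        environ=None):
    """Return a list of problems found; an empty list means all checks passed.

    `environ` defaults to os.environ and can be overridden for testing.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # Assumption: conda exports the active environment's name here.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != env_name:
        problems.append(
            f"conda environment {env_name!r} is not active (found {active!r})"
        )

    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module {name!r} is not importable")

    return problems
```

Calling `check_prerequisites()` with no arguments checks the defaults listed above. A caller could print each returned message and exit with a non-zero status if the list is non-empty.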