title: Ministral WebGPU
emoji: ⚡️
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
short_description: Frontier multimodal AI, running entirely in your browser.
app_build_command: npm run build
app_file: dist/index.html
models:
- mistralai/Ministral-3-3B-Instruct-2512-ONNX
- mistralai/Ministral-3-3B-Instruct-2512
AI Multimodal WebGPU Assistant
Developer: Muhammad Abdullah Rasheed, Research Assistant @ Cambridge | MSc Data Science & AI '25 | Google WTM Scholar
Overview
This project demonstrates browser-based AI by running a complete 3B-parameter multimodal language model entirely client-side with WebGPU acceleration. Once the model weights have been downloaded, inference involves no servers and no API calls, and no camera or prompt data leaves the device.
Key Features
- Privacy-First Architecture: The entire Ministral-3-3B-Instruct model runs locally in your browser using WebGPU, so your video feed never leaves your device
- Real-Time Multimodal AI: Live camera feed processing with visual question answering capabilities
- WebGPU Acceleration: Inference runs on the local GPU through the browser's WebGPU API
- Zero Backend Dependencies: Inference requires no API keys, server calls, or external services
- Cross-Platform: Works in any modern browser with WebGPU support
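Because everything depends on WebGPU being available, a page like this would typically check for it before fetching a multi-gigabyte model. The following is a minimal sketch of such a check, not the app's actual code; the function name `detectWebGPU` and its return shape are hypothetical, while `navigator.gpu` and `requestAdapter()` are part of the standard WebGPU API.

```javascript
// Sketch: check whether WebGPU is usable before loading a large model.
// `nav` defaults to the browser's navigator; it is a parameter so the
// function can also be exercised with a stub outside the browser.
// The name and return shape are illustrative assumptions.
async function detectWebGPU(nav = globalThis.navigator) {
  // navigator.gpu is undefined in browsers without WebGPU support.
  if (!nav || !nav.gpu) {
    return { supported: false, reason: "navigator.gpu is not available" };
  }
  try {
    // requestAdapter() resolves to null when no suitable GPU is found.
    const adapter = await nav.gpu.requestAdapter();
    if (!adapter) {
      return { supported: false, reason: "no GPU adapter found" };
    }
    return { supported: true, reason: "" };
  } catch (err) {
    return { supported: false, reason: String(err) };
  }
}
```

In a page, `detectWebGPU().then((r) => { if (!r.supported) showMessage(r.reason); })` would let the UI explain why the demo cannot start, where `showMessage` stands in for whatever notification helper the page provides.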
Technical Stack
- Model: Ministral-3-3B-Instruct-2512 (ONNX export, quantized for browser deployment)
- Runtime: Transformers.js for in-browser inference
- Compute: WebGPU API for GPU acceleration
- Frontend: Modern JavaScript with WebAssembly integration
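To illustrate how the stack fits together, the sketch below picks Transformers.js loading options depending on whether WebGPU is available. The model ID is taken from this Space's metadata; the helper name `chooseLoadOptions`, the specific `dtype` values, and the WebAssembly fallback are assumptions for illustration and may not match what the app actually uses.

```javascript
// Sketch: pick Transformers.js loading options from WebGPU availability.
// The model ID comes from this Space's metadata; the dtype choices and the
// WASM fallback are assumptions, not the app's confirmed settings.
const MODEL_ID = "mistralai/Ministral-3-3B-Instruct-2512-ONNX";

function chooseLoadOptions(hasWebGPU) {
  return hasWebGPU
    ? { device: "webgpu", dtype: "q4f16" } // GPU path: 4-bit quantization with fp16
    : { device: "wasm", dtype: "q4" };     // CPU fallback: likely far slower for a 3B model
}
```

An options object of this shape is what Transformers.js accepts as the final argument of its loading functions, for example `from_pretrained(MODEL_ID, chooseLoadOptions(true))`. Whether the published ONNX files include these particular quantization variants would need to be checked against the model repository.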
Use Cases
- Visual question answering from live camera feed
- Real-time scene understanding and description
- Privacy-sensitive AI applications
- Edge computing demonstrations
- Educational tool for AI and browser technologies
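For the camera-based use cases, the browser side usually comes down to two steps: opening the camera and copying the current frame into pixel data that can be handed to the model. The following is a rough sketch using the standard `getUserMedia` and canvas APIs; the helper names `startCamera` and `grabFrame` are hypothetical and the app's own capture code may differ.

```javascript
// Sketch: open the camera and copy the current video frame into pixel data.
// `mediaDevices`, `video`, and `canvas` are passed in so the helpers can be
// tried with stubs; in a page they would be navigator.mediaDevices, a
// <video> element, and a <canvas> element. Helper names are illustrative.
async function startCamera(mediaDevices, video) {
  // Ask for video only; the browser shows a permission prompt.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: false });
  video.srcObject = stream;
  await video.play();
  return stream;
}

function grabFrame(video, canvas) {
  // Match the canvas to the video's intrinsic size, then draw the frame.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  // ImageData holds RGBA bytes that stay in page memory.
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}
```

Both steps run entirely in the page, which is what allows the video feed to stay on the device: the frame is read from the canvas and never needs to be uploaded.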
Why This Matters
This project shows how capable language models can move from cloud servers to the edge, where they run privately on the user's own hardware and need no dedicated inference infrastructure.
Author
Muhammad Abdullah Rasheed
Research Assistant | AI & Machine Learning Researcher
- 🎓 MSc Data Science & AI '25, Google WTM Scholar
- 🔬 Research areas: Computer Vision, NLP, Climate AI
- 💼 Experience: Gesture Recognition, Backend Development, ML Engineering
- 🔗 LinkedIn | GitHub | HuggingFace
License
Apache-2.0