---
title: Ministral WebGPU
emoji: ⚡️
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
short_description: Frontier multimodal AI, running entirely in your browser.
app_build_command: npm run build
app_file: dist/index.html
models:
  - mistralai/Ministral-3-3B-Instruct-2512-ONNX
  - mistralai/Ministral-3-3B-Instruct-2512
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

AI Multimodal WebGPU Assistant

Developer: Muhammad Abdullah Rasheed | Research Assistant @ Cambridge | MSc Data Science & AI '25 | Google WTM Scholar

Overview

This project demonstrates browser-based AI by running a complete 3B-parameter multimodal language model entirely client-side, with WebGPU providing GPU acceleration. Once the model weights have been downloaded to the browser, inference needs no server and no API calls, and no user data is sent anywhere. Your inputs stay private, and responses involve no network round-trip.
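Before downloading the model weights, a page like this would normally confirm that WebGPU is actually usable. The sketch below is an illustration, not this Space's actual code. `checkWebGPU` and its status labels are hypothetical names, and only the standard `navigator.gpu.requestAdapter()` call is real. That call resolves to `null` when the browser exposes WebGPU but no suitable GPU adapter is available. Minimal local interfaces stand in for the official WebGPU type definitions so the snippet is self-contained.

```typescript
// Minimal structural types (assumption: stand-ins for @webgpu/types).
interface GPULike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GPULike;
}

type WebGPUStatus = "supported" | "no-api" | "no-adapter";

/**
 * Hypothetical helper: WebGPU is usable only if the API exists
 * (navigator.gpu) AND the browser hands back a GPU adapter.
 */
async function checkWebGPU(nav: NavigatorLike): Promise<WebGPUStatus> {
  if (!nav.gpu) return "no-api"; // browser does not expose WebGPU at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "supported" : "no-adapter"; // null means no usable GPU
  } catch {
    return "no-adapter"; // defensively treat errors like a missing adapter
  }
}
```

In the browser you would pass the global `navigator` object (a cast may be needed depending on which WebGPU type definitions are installed). On `"no-api"` or `"no-adapter"` you would show a fallback message instead of starting the model download.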

Key Features

  • Privacy-First Architecture: The entire Ministral-3-3B model runs locally in your browser using WebGPU, so your video feed never leaves your device
  • Real-Time Multimodal AI: Processes a live camera feed and answers questions about what it sees (visual question answering)
  • WebGPU Acceleration: Runs model computation on your GPU through the browser's WebGPU API rather than on the CPU
  • Zero Backend Dependencies: No API keys and no inference server; once the page and model weights have loaded, inference makes no network calls
  • Cross-Platform: Works in any modern browser with WebGPU support
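A live camera produces frames far faster than a 3B-parameter model can answer questions about them. One common way to keep such a loop responsive is to drop frames while an inference is still running, so no backlog builds up. This is shown here as an assumption, not as a description of how this app works. `FrameGate` and `infer` are hypothetical names, and `infer` stands for whatever function sends a frame and a question to the model.

```typescript
/**
 * Hypothetical "drop frames while busy" gate for a live-camera loop.
 * Instead of queuing every frame, a new frame is accepted only when
 * no inference is currently running.
 */
class FrameGate<T> {
  private busy = false;
  private readonly infer: (frame: T) => Promise<void>;

  constructor(infer: (frame: T) => Promise<void>) {
    this.infer = infer;
  }

  /** Returns true if the frame was sent to inference, false if it was dropped. */
  offer(frame: T): boolean {
    if (this.busy) return false; // previous frame is still being processed
    this.busy = true;
    this.infer(frame).then(
      () => {
        this.busy = false; // ready for the next frame
      },
      (err: unknown) => {
        this.busy = false; // don't stay stuck after a failed run
        console.error("inference failed:", err);
      },
    );
    return true;
  }
}
```

In a browser, a `requestAnimationFrame` loop could call `gate.offer(...)` with the latest camera frame on every tick. Frames offered while the model is busy are skipped, so each answer is at most one inference behind the camera instead of falling further behind a growing queue.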

Technical Stack

  • Model: Ministral-3-3B-Instruct-2512 (ONNX export, quantized for browser deployment)
  • Runtime: Transformers.js for in-browser inference
  • Compute: WebGPU API for GPU acceleration
  • Frontend: Modern JavaScript built as a static site (npm run build, output in dist/), with WebAssembly used by the inference runtime
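Quantization is what makes a model of this size practical in a browser. A rough estimate of weight storage is parameters × bits per weight ÷ 8. The sketch below assumes a nominal 3 × 10^9 parameters (the exact count is not stated here) and a few common precisions. It does not reflect the specific quantization this Space ships, and it ignores the KV cache, activations, and any layers kept at higher precision.

```typescript
// Back-of-the-envelope weight-storage estimate: bytes = params × bits / 8.
// Nominal parameter count; the real model's count differs (assumption).
const NOMINAL_PARAMS = 3e9;
const BYTES_PER_GB = 1e9; // decimal gigabytes

function weightBytes(numParams: number, bitsPerWeight: number): number {
  return (numParams * bitsPerWeight) / 8;
}

// Common precisions, from unquantized 32-bit floats down to 4-bit weights.
const precisions: Array<[string, number]> = [
  ["32-bit float", 32],
  ["16-bit float", 16],
  ["8-bit", 8],
  ["4-bit", 4],
];

for (const [label, bits] of precisions) {
  const gb = weightBytes(NOMINAL_PARAMS, bits) / BYTES_PER_GB;
  // Prints ~12.0, ~6.0, ~3.0 and ~1.5 GB respectively.
  console.log(`${label}: ~${gb.toFixed(1)} GB of weights`);
}
```

At this nominal size, going from 32-bit floats to 4-bit weights cuts the weight download and GPU memory by 8×, from roughly 12 GB to roughly 1.5 GB.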

Use Cases

  • Visual question answering from live camera feed
  • Real-time scene understanding and description
  • Privacy-sensitive AI applications
  • Edge computing demonstrations
  • Educational tool for AI and browser technologies

Why This Matters

This project showcases the future of AI deployment: moving powerful language models from cloud servers to the edge, where they can respond with low latency, keep user data on the device, and run without expensive server infrastructure.

Author

Muhammad Abdullah Rasheed
Research Assistant | AI & Machine Learning Researcher

  • 🎓 MSc Data Science & AI '25, Google WTM Scholar
  • 🔬 Research areas: Computer Vision, NLP, Climate AI
  • 💼 Experience: Gesture Recognition, Backend Development, ML Engineering
  • 🔗 LinkedIn | GitHub | Hugging Face

License

Apache-2.0