metadata

title: Ministral WebGPU
emoji: ⚡️
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
license: apache-2.0
short_description: Frontier multimodal AI, running entirely in your browser.
app_build_command: npm run build
app_file: dist/index.html
models:
  - mistralai/Ministral-3-3B-Instruct-2512-ONNX
  - mistralai/Ministral-3-3B-Instruct-2512

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

AI Multimodal WebGPU Assistant

Developer: Muhammad Abdullah Rasheed, Research Assistant @ Cambridge | MSc Data Science & AI '25 | Google WTM Scholar

Overview

This project runs a complete 3B-parameter multimodal language model entirely client-side, using WebGPU acceleration in the browser. Inference involves no backend servers and no API calls, and no camera data leaves the device, so prompts and video stay private and responses do not wait on network round trips.

Key Features

Privacy-First Architecture: The entire Ministral-3-3B model runs locally in your browser using WebGPU - your video feed never leaves your device
Real-Time Multimodal AI: Processes the live camera feed and answers questions about what it sees
WebGPU Acceleration: Runs inference on the local GPU through the browser's WebGPU API, targeting near-native performance
Zero Backend Dependencies: No API keys, no server calls, no external services required for inference
Cross-Platform: Works in modern browsers that support WebGPU
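
Because the app depends on WebGPU being available, a page would typically check for it before loading the model. The following is a minimal sketch of such a check, not the code this Space actually ships. `navigator.gpu.requestAdapter()` is the standard WebGPU entry point; the `pickDevice` function, the `NavigatorLike` type, and the `"wasm"` fallback label are hypothetical names introduced for illustration.

```typescript
// Structural stand-in for the browser's `navigator`, so the sketch
// type-checks without WebGPU type packages (hypothetical type).
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
}

type Device = "webgpu" | "wasm";

// Returns "webgpu" only when the API exists AND an adapter is granted.
// Falling back to "wasm" is an assumption; an app could instead show
// an "unsupported browser" message.
async function pickDevice(nav: NavigatorLike): Promise<Device> {
  if (!nav.gpu) {
    return "wasm"; // Browser does not expose WebGPU at all.
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    // requestAdapter() resolves to null when no suitable GPU is found.
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm"; // Treat any failure as "not available".
  }
}

// In a browser: const device = await pickDevice(navigator as NavigatorLike);
```

Checking for a non-null adapter matters because some browsers expose `navigator.gpu` even on hardware where no adapter can be created.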

Technical Stack

Model: Ministral-3-3B-Instruct (quantized for browser deployment)
Runtime: Transformers.js for in-browser inference
Compute: WebGPU API for GPU acceleration
Frontend: Modern JavaScript with WebAssembly integration
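
To see why quantization matters for browser deployment, it helps to estimate how much space the weights alone take at different precisions. The sketch below is back-of-the-envelope arithmetic under stated assumptions: it treats the model as exactly 3 billion parameters and ignores quantization metadata, activations, and the KV cache. The precision labels and function names are hypothetical, and the README does not state which quantization level the deployed model uses.

```typescript
// Bits stored per weight for a few common precisions (illustrative labels).
const BITS_PER_WEIGHT = { fp32: 32, fp16: 16, q8: 8, q4: 4 } as const;
type Precision = keyof typeof BITS_PER_WEIGHT;

// Approximate size of the raw weights in bytes: parameters * bits / 8.
function estimateWeightBytes(parameterCount: number, precision: Precision): number {
  return (parameterCount * BITS_PER_WEIGHT[precision]) / 8;
}

// Convert bytes to decimal gigabytes (1 GB = 1e9 bytes).
function toGigabytes(bytes: number): number {
  return bytes / 1e9;
}

const PARAMS = 3e9; // Assumption: "3B" taken as exactly 3 billion parameters.

console.log(toGigabytes(estimateWeightBytes(PARAMS, "fp16"))); // 6
console.log(toGigabytes(estimateWeightBytes(PARAMS, "q4")));   // 1.5
```

Under these assumptions, 4-bit weights need roughly a quarter of the space of 16-bit weights, which is the difference between a download that is plausible for a browser tab and one that is not.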

Use Cases

Visual question answering from live camera feed
Real-time scene understanding and description
Privacy-sensitive AI applications
Edge computing demonstrations
Educational tool for AI and browser technologies
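
For the live camera use cases above, each frame has to be turned into an image the model can consume. One plausible step, shown here as an assumption rather than a description of this app's pipeline, is to downscale the frame so its longest side stays under a limit while keeping the aspect ratio. The `fitWithin` helper and the `maxSide` parameter are hypothetical names.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale `source` down so that its longest side is at most `maxSide`,
// preserving aspect ratio. Frames that already fit are returned unchanged.
function fitWithin(source: Size, maxSide: number): Size {
  const longest = Math.max(source.width, source.height);
  if (longest <= maxSide) {
    return { ...source };
  }
  const scale = maxSide / longest;
  return {
    // Round to whole pixels, never below 1.
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
  };
}

// In a browser, the result could size a canvas before drawing a frame:
//   const target = fitWithin({ width: video.videoWidth, height: video.videoHeight }, 640);
//   canvas.width = target.width;
//   canvas.height = target.height;
//   canvas.getContext("2d")?.drawImage(video, 0, 0, target.width, target.height);

console.log(fitWithin({ width: 1280, height: 720 }, 640)); // { width: 640, height: 360 }
```

Smaller frames reduce the work done per question, at the cost of fine visual detail; the right limit depends on the model's image preprocessing, which this README does not specify.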

Why This Matters

This project shows one direction for AI deployment: moving capable language models from cloud servers to the edge, where they can provide low-latency, private, and accessible inference without compromising user privacy or requiring expensive server infrastructure.

Author

Muhammad Abdullah Rasheed
Research Assistant | AI & Machine Learning Researcher

🎓 MSc Data Science & AI '25, Google WTM Scholar
🔬 Research areas: Computer Vision, NLP, Climate AI
💼 Experience: Gesture Recognition, Backend Development, ML Engineering
🔗 LinkedIn | GitHub | HuggingFace

License

Apache-2.0