Abdullahrasheed45/AI_Multimodal_Web_GPU_Assistant

Abdullahrasheed45 committed 4 days ago

Commit 413bdc4 (verified) · 1 Parent(s): 7b4c8a5

Update README.md

Files changed (1)

README.md +49 -11

@@ -1,13 +1,51 @@
----
-title: AI Multimodal Web GPU Assistant
-emoji: 🌖
-colorFrom: purple
-colorTo: red
-sdk: gradio
-sdk_version: 6.1.0
-app_file: app.py
 pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+title: Ministral WebGPU
+emoji: ⚡️
+colorFrom: red
+colorTo: yellow
+sdk: static
 pinned: false
+license: apache-2.0
+short_description: Frontier multimodal AI, running entirely in your browser.
+app_build_command: npm run build
+app_file: dist/index.html
+models:
+  - mistralai/Ministral-3-3B-Instruct-2512-ONNX
+  - mistralai/Ministral-3-3B-Instruct-2512
+AI Multimodal WebGPU Assistant
+Developer: Muhammad Abdullah Rasheed | Research Assistant @ Cambridge | MSc Data Science & AI '25 | Google WTM Scholar
+Overview
+This project runs a complete 3B-parameter multimodal language model entirely client-side, accelerated by WebGPU. Once the model weights have been downloaded to the browser, inference needs no servers or API calls and sends no user data anywhere, so it stays private and avoids network round trips.
+Key Features
+Privacy-First Architecture: The entire Ministral-3-3B-Instruct model runs locally in your browser using WebGPU, so your video feed never leaves your device
+Real-Time Multimodal AI: Live camera feed processing with visual question answering capabilities
+WebGPU Acceleration: Uses the browser's WebGPU API to run inference on the local GPU
+Zero Backend Dependencies: No API keys, no server calls, no external services required
+Cross-Platform: Works in any modern browser with WebGPU support
+Technical Stack
+Model: Ministral-3-3B-Instruct (quantized for browser deployment)
+Runtime: Transformers.js for in-browser inference
+Compute: WebGPU API for GPU acceleration
+Frontend: Modern JavaScript with WebAssembly integration
+Use Cases
+Visual question answering from live camera feed
+Real-time scene understanding and description
+Privacy-sensitive AI applications
+Edge computing demonstrations
+Educational tool for AI and browser technologies
+Why This Matters
+This project showcases the future of AI deployment: moving powerful language models from cloud servers to the edge, where they can provide fast, private, and accessible intelligence without requiring expensive infrastructure.
+Author
+Muhammad Abdullah Rasheed
+Research Assistant | AI & Machine Learning Researcher
+🎓 MSc Data Science & AI '25, Google WTM Scholar
+🔬 Research areas: Computer Vision, NLP, Climate AI
+💼 Experience: Gesture Recognition, Backend Development, ML Engineering
+🔗 LinkedIn | GitHub | HuggingFace
+License
+Apache-2.0
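
Note on the WebGPU requirement: the README says the app works in browsers "with WebGPU support", which implies some capability check before the model loads. The following is a minimal sketch of such a check, not the code in this repository. The helper name `selectDevice` is hypothetical, and the returned strings `"webgpu"` and `"wasm"` are assumed to match the device names Transformers.js accepts; whether this Space actually falls back to WebAssembly is not stated in the README.

```javascript
// Hypothetical helper (not from this repository): decide which inference
// backend to request, given a navigator-like object.
async function selectDevice(nav) {
  // `navigator.gpu` is undefined in browsers that do not expose WebGPU.
  if (!nav || !nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing the GPU as "no WebGPU available".
    return "wasm";
  }
}

// In a browser:
// selectDevice(navigator).then((device) => console.log("backend:", device));
```

Passing the navigator object in as a parameter, rather than reading the global, keeps the check testable outside a browser with a stubbed `gpu` object.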