# MD3 Preview - Int4 Quantized (MLX)
Pre-quantized version of Moondream 3 Preview for MLX inference.
## Quantization Details
- MoE experts: int4 affine quantization (bits=4, group_size=64); see the sketch after this list
- Other weights: bf16 (unchanged)
- Memory savings: ~60% reduction in MoE weight memory
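The snippet below is a minimal sketch of what int4 affine quantization with these settings looks like in MLX, using `mx.quantize` / `mx.dequantize`. The weight shape is illustrative; only the `bits` and `group_size` values come from this card.

```python
import mlx.core as mx

bits, group_size = 4, 64

# Stand-in for a single MoE expert weight matrix (shape is illustrative).
w = mx.random.normal((2048, 2048)).astype(mx.bfloat16)

# Affine quantization: 4-bit values packed into uint32 words, plus
# per-group (64 elements) scales and biases.
w_q, scales, biases = mx.quantize(w, group_size=group_size, bits=bits)

# The packed weights can be dequantized back to the working dtype if needed.
w_hat = mx.dequantize(w_q, scales, biases, group_size=group_size, bits=bits)

print(w_q.dtype, w_q.shape)  # uint32 storage, 8x fewer columns than the bf16 weight
```

In practice MLX runs the forward pass directly on the packed weights through its quantized matmul kernels rather than dequantizing them up front.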
## Usage
This model is designed for use with the moondream-station MLX backend.
```bash
# Serve the model with the MLX backend:
moondream-station serve --backend mlx
```
## Source
Quantized from [moondream/moondream3-preview](https://huggingface.co/moondream/moondream3-preview).