MD3 Preview - Int4 Quantized (MLX)

Pre-quantized version of Moondream 3 Preview for MLX inference.

Quantization Details

  • MoE Experts: int4 affine quantization (bits=4, group_size=64); a sketch follows this list
  • Other weights: bf16 (unchanged)
  • Memory savings: ~60% reduction in MoE weight memory
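
For reference, the sketch below shows what int4 affine quantization with group_size=64 looks like using MLX's built-in quantization utilities (mx.quantize and nn.quantize). It is illustrative only: the expert-selection predicate and the "expert" module naming are assumptions, not the exact script used to produce this checkpoint.

# Illustrative sketch, not the exact quantization script for this checkpoint.
import mlx.core as mx
import mlx.nn as nn

# Per-tensor view: mx.quantize packs each group of 64 weights into 4-bit
# values plus an affine scale and bias per group; mx.dequantize reverses it.
w = mx.random.normal((2048, 2048))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)

# Module-level view: nn.quantize swaps matching Linear layers for
# QuantizedLinear in place. Restricting it to paths containing "expert"
# is a hypothetical way to quantize only MoE expert weights while leaving
# all other weights in bf16.
def quantize_moe_experts(model: nn.Module) -> None:
    nn.quantize(
        model,
        group_size=64,
        bits=4,
        class_predicate=lambda path, m: isinstance(m, nn.Linear) and "expert" in path,
    )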

Usage

This model is designed for use with the moondream-station MLX backend.

# Start moondream-station with the MLX backend:
moondream-station serve --backend mlx
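
Once the server is running, one hypothetical way to query it is through the moondream Python client pointed at the station's local endpoint. The endpoint URL and port below are assumptions; use the address moondream-station prints when it starts.

# Hypothetical client usage against a locally running moondream-station.
# The endpoint URL is an assumption; check the address the station reports.
from PIL import Image
import moondream as md

model = md.vl(endpoint="http://localhost:2020/v1")

image = Image.open("example.jpg")
print(model.caption(image)["caption"])
print(model.query(image, "What is in this image?")["answer"])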

Source

Quantized from moondream/moondream3-preview
