Training details and a stable version of the model will be released soon. In the meantime, feel free to give this model a try.

Model Details

Tarka-Embedding-10M-V1 has the following features:

  • Model Type: Text Embedding
  • Supported Languages: English
  • Number of Parameters: 10M
  • Context Length: Optimal performance is observed with inputs under 1K tokens
  • Embedding Dimension: 1024
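
For a quick start, here is a minimal usage sketch with sentence-transformers. It assumes the checkpoint loads through the sentence-transformers interface and that the repository id matches this page; treat it as a sketch rather than a confirmed API.

```python
# Minimal usage sketch, assuming sentence-transformers compatibility.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Tarka-AIR/Tarka-Embedding-10M-V1-Preview")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
]

# Encode to dense vectors; the embedding dimension should be 1024 (see above).
embeddings = model.encode(sentences)
print(embeddings.shape)  # expected: (2, 1024)

# Pairwise similarity scores between the embedded sentences.
print(model.similarity(embeddings, embeddings))
```

Per the context-length note above, keep inputs under roughly 1K tokens for best results.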

Training Details

  • Initialization: Based on Qwen/Qwen3-Embedding-0.6B
  • Architecture Modifications: The tokenizer is replaced with the ModernBERT tokenizer. Both the Transformer layers and the embedding layer are compressed via truncated SVD with rank 64 (see the sketch after this list).
  • Teacher Model: Qwen/Qwen3-Embedding-0.6B
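
To make the compression step concrete, below is a minimal PyTorch sketch of rank-64 truncated-SVD factorization of a linear layer. This illustrates the general technique, not the authors' released procedure; the function name and the layer-replacement strategy are assumptions for illustration.

```python
# Illustrative sketch of low-rank (SVD) compression; not the authors' exact code.
import torch
import torch.nn as nn

def svd_compress_linear(linear: nn.Linear, rank: int = 64) -> nn.Sequential:
    """Approximate a Linear layer by two smaller Linear layers via truncated SVD."""
    W = linear.weight.data                      # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]

    # W ≈ (U_r * S_r) @ Vh_r, so the layer factors into "down" then "up".
    down = nn.Linear(linear.in_features, rank, bias=False)
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    down.weight.data = Vh_r                     # (rank, in_features)
    up.weight.data = U_r * S_r                  # (out_features, rank)
    if linear.bias is not None:
        up.bias.data = linear.bias.data
    return nn.Sequential(down, up)

# Example: a square 1024x1024 projection drops from ~1.05M parameters
# to 64 * (1024 + 1024) ≈ 131K parameters at rank 64.
layer = nn.Linear(1024, 1024)
compressed = svd_compress_linear(layer, rank=64)
x = torch.randn(4, 1024)
rel_err = (layer(x) - compressed(x)).norm() / layer(x).norm()
print(f"relative reconstruction error: {rel_err:.3f}")
```

An embedding matrix of shape (vocab_size, hidden_dim) can be factored the same way, into a (vocab_size, rank) lookup table followed by a (rank, hidden_dim) projection.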

Evaluation

| Model | Number of Parameters (B) | Embedding Dimensions | Mean (Task) | Mean (TaskType) | Classification | Clustering | Pair Classification | Reranking | Retrieval | STS | Summarization |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gte-micro | 0.017 | 384 | 53.89 | 52.50 | 67.47 | 41.86 | 80.76 | 43.16 | 27.66 | 77.86 | 28.76 |
| Wartortle | 0.017 | 384 | 54.11 | 52.64 | 70.31 | 40.56 | 80.72 | 42.18 | 26.91 | 78.52 | 29.31 |
| Bulbasaur | 0.017 | 384 | 57.75 | 55.19 | 72.89 | 42.51 | 82.73 | 44.63 | 36.96 | 78.84 | 27.76 |
| gte-micro-v4 | 0.019 | 384 | 58.90 | 56.04 | 73.04 | 43.89 | 82.67 | 44.78 | 39.51 | 79.78 | 28.59 |
| all-MiniLM-L6-v2 | 0.023 | 384 | 59.03 | 55.93 | 69.25 | 44.90 | 82.37 | 47.14 | 42.92 | 78.95 | 25.96 |
| snowflake-arctic-embed-xs | 0.023 | 384 | 59.77 | 56.12 | 67.00 | 42.44 | 81.33 | 45.26 | 52.65 | 76.21 | 27.96 |
| Tarka-Embedding-10M-V1 | 0.010 | 1024 | 58.15 | 55.19 | 74.05 | 44.66 | 77.27 | 42.69 | 39.98 | 76.21 | 31.50 |