Training details and a stable version of the model will be released soon. In the meantime, feel free to give this model a try.

Model Details

Tarka-Embedding-10M-V1 has the following features:

  • Model Type: Text Embedding
  • Supported Languages: English
  • Number of Parameters: 10M
  • Context Length: Optimal performance is observed with inputs under 1K tokens
  • Embedding Dimension: 1024
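
For a quick start, here is a minimal usage sketch with sentence-transformers. It assumes the checkpoint loads through the sentence-transformers interface and that the repository id matches this page; treat it as a sketch rather than a confirmed API.

```python
# Minimal usage sketch, assuming sentence-transformers compatibility.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Tarka-AIR/Tarka-Embedding-10M-V1-Preview")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
]

# Encode to dense vectors; the embedding dimension should be 1024 (see above).
embeddings = model.encode(sentences)
print(embeddings.shape)  # expected: (2, 1024)

# Pairwise similarity scores between the embedded sentences.
print(model.similarity(embeddings, embeddings))
```

Per the context-length note above, keep inputs under roughly 1K tokens for best results.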

Training Details

  • Initialization: Based on Qwen/Qwen3-Embedding-0.6B
  • Architecture Modifications: The tokenizer is replaced with the ModernBERT tokenizer. Both the Transformer layers and the embedding layer are compressed via truncated SVD with rank 64 (see the sketch after this list).
  • Teacher Model: Qwen/Qwen3-Embedding-0.6B
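
To make the compression step concrete, below is a minimal PyTorch sketch of rank-64 truncated-SVD factorization of a linear layer. This illustrates the general technique, not the authors' released procedure; the function name and the layer-replacement strategy are assumptions for illustration.

```python
# Illustrative sketch of low-rank (SVD) compression; not the authors' exact code.
import torch
import torch.nn as nn

def svd_compress_linear(linear: nn.Linear, rank: int = 64) -> nn.Sequential:
    """Approximate a Linear layer by two smaller Linear layers via truncated SVD."""
    W = linear.weight.data                      # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]

    # W ≈ (U_r * S_r) @ Vh_r, so the layer factors into "down" then "up".
    down = nn.Linear(linear.in_features, rank, bias=False)
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    down.weight.data = Vh_r                     # (rank, in_features)
    up.weight.data = U_r * S_r                  # (out_features, rank)
    if linear.bias is not None:
        up.bias.data = linear.bias.data
    return nn.Sequential(down, up)

# Example: a square 1024x1024 projection drops from ~1.05M parameters
# to 64 * (1024 + 1024) ≈ 131K parameters at rank 64.
layer = nn.Linear(1024, 1024)
compressed = svd_compress_linear(layer, rank=64)
x = torch.randn(4, 1024)
rel_err = (layer(x) - compressed(x)).norm() / layer(x).norm()
print(f"relative reconstruction error: {rel_err:.3f}")
```

An embedding matrix of shape (vocab_size, hidden_dim) can be factored the same way, into a (vocab_size, rank) lookup table followed by a (rank, hidden_dim) projection.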

Evaluation

| Model | Number of Parameters (B) | Embedding Dimensions | Mean (Task) | Mean (TaskType) | Classification | Clustering | Pair Classification | Reranking | Retrieval | STS | Summarization |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gte-micro | 0.017 | 384 | 53.89 | 52.50 | 67.47 | 41.86 | 80.76 | 43.16 | 27.66 | 77.86 | 28.76 |
| Wartortle | 0.017 | 384 | 54.11 | 52.64 | 70.31 | 40.56 | 80.72 | 42.18 | 26.91 | 78.52 | 29.31 |
| Bulbasaur | 0.017 | 384 | 57.75 | 55.19 | 72.89 | 42.51 | 82.73 | 44.63 | 36.96 | 78.84 | 27.76 |
| gte-micro-v4 | 0.019 | 384 | 58.90 | 56.04 | 73.04 | 43.89 | 82.67 | 44.78 | 39.51 | 79.78 | 28.59 |
| all-MiniLM-L6-v2 | 0.023 | 384 | 59.03 | 55.93 | 69.25 | 44.90 | 82.37 | 47.14 | 42.92 | 78.95 | 25.96 |
| snowflake-arctic-embed-xs | 0.023 | 384 | 59.77 | 56.12 | 67.00 | 42.44 | 81.33 | 45.26 | 52.65 | 76.21 | 27.96 |
| Tarka-Embedding-10M-V1 | 0.010 | 1024 | 58.15 | 55.19 | 74.05 | 44.66 | 77.27 | 42.69 | 39.98 | 76.21 | 31.50 |