Fine-tuning Small Model (Qwen3-0.6B) for Domain Knowledge + Reasoning: Seeking Optimization Advice

Background & Goal

I’m working with a small model (Qwen3-0.6B, <1B parameters) due to resource constraints, aiming to:

1. Achieve high accuracy on domain-specific knowledge (mechanical engineering/CAD, text format)

2. Maintain general conversational ability

3. Enable reasoning capability for MCP tool selection

Current Setup

· Model: Qwen3-0.6B

· Platform: LLaMA-Factory

· Method: Fine-tuning only

Training Experiments & Results

Experiment 1: Domain Knowledge Only

Dataset:

· Chinese mechanical-engineering QA (mostly structured, plus some unstructured text)

· Format: Alpaca (an example record is sketched after this list)

o Self-Instruct/Evol-Instruct augmentation did not yield good results due to the closed-domain QA constraints

· Size: 2,300 samples
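
For reference, a single record in that Alpaca format looks roughly like this (the content below is an invented placeholder, not an actual sample from my dataset):

```python
# Hypothetical Alpaca-format record; the field names follow the standard
# Alpaca schema that LLaMA-Factory expects, but the content is invented.
sample = {
    "instruction": "What fit is typically specified for a press-fit bearing seat?",
    "input": "",  # left empty for plain QA pairs
    "output": "A press-fit bearing seat is usually specified with an interference fit such as H7/p6 ...",
}
```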

Training config:

· Method: LoRA (rank=192; lower ranks gave lower domain accuracy; a PEFT-style sketch follows this list)

· Cutoff length: 1024

· Epochs: 1 (kept low to avoid catastrophic forgetting of general ability)
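
In PEFT terms (which LLaMA-Factory wraps), that LoRA setup corresponds roughly to the config below; only the rank is something I set explicitly, so alpha, dropout, and target modules are assumptions:

```python
from peft import LoraConfig

# Rough PEFT equivalent of the LoRA settings above. r=192 matches my run;
# lora_alpha, lora_dropout, and target_modules are assumed defaults.
lora_config = LoraConfig(
    r=192,
    lora_alpha=384,        # assumption: 2x rank, a common heuristic
    lora_dropout=0.05,     # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```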

Results:

· High accuracy on single-turn domain QA

· Limited ability in 2-4 turn multi-turn conversations within the domain

· Limited general conversation ability; the model sometimes answers general questions with domain knowledge

Experiment 2: Domain + Reasoning (1:1 ratio)

Motivation:

· Qwen3-0.6B can select MCP tools with prompting (without fine-tuning)

· After domain fine-tuning, the model lost its reasoning/thinking process

· Need to restore reasoning capability

Dataset:

· Domain QA: 2,300 samples

· Reasoning: 2,300 samples from twinkle-ai/tw-reasoning-instruct-50k (mixed 1:1 with the domain set, as sketched after this list)
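
The 1:1 mix itself was nothing fancy; conceptually it is just concatenating equal-sized sets and shuffling, e.g. with the `datasets` library (the file names below are placeholders):

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder file names; both files hold Alpaca-format records.
domain = load_dataset("json", data_files="domain_qa.json", split="train")
reasoning = load_dataset("json", data_files="reasoning_2300.json", split="train")

# 1:1 mix: equal counts, concatenated, then shuffled together.
mixed = concatenate_datasets([domain, reasoning]).shuffle(seed=42)
mixed.to_json("mixed_train.json")
```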

Training config:

· Method: Full fine-tuning (switched from LoRA because even rank=512 did not outperform full fine-tuning once data diversity and volume increased; rough hyperparameters are sketched after this list)

· Epochs: 1
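
The full fine-tuning hyperparameters map onto something like the `transformers` sketch below; only the epoch count is taken from my run, everything else is an assumption standing in for LLaMA-Factory defaults on limited hardware:

```python
from transformers import TrainingArguments

# num_train_epochs=1 matches my run; the remaining values are assumptions.
args = TrainingArguments(
    output_dir="qwen3-0.6b-domain-reasoning",
    num_train_epochs=1,
    per_device_train_batch_size=2,   # assumed, memory-constrained
    gradient_accumulation_steps=8,   # assumed
    learning_rate=1e-5,              # assumed, typical for full FT
    bf16=True,
    logging_steps=10,
)
```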

Results:

· Domain knowledge accuracy dropped significantly

· General conversation improved

· Recovered reasoning ability on reasoning-style questions

· Reasonable MCP tool selection accuracy

· Could not maintain both strong domain knowledge AND reasoning ability

Experiment 3: Training on All the Domain Data

Dataset:

· Domain QA: 7,000 samples

· Reasoning: 7,000 samples

· Result: Domain knowledge accuracy degraded even further, and MCP tool-calling ability decreased

Experiment 4: Overfitting Attempt

· Extended the length of each domain QA sample to reduce the sample count (to 1,000 samples), and reduced the reasoning data to 1,000 samples to keep the 1:1 ratio (see the packing sketch after this list)

· Trained on both datasets to the point of overfitting (3-5 epochs)

· Result: High domain accuracy, some reasoning ability, no MCP tool-calling ability

Key Questions

1. Training Strategy: Is this an inherent limitation of fine-tuning small models (<1B) on multiple datasets at these data volumes, or is there room for optimization?

2. MCP Tool Selection: Does MCP tool selection require its own dedicated training dataset in my scenario? (A sketch of the kind of sample I mean follows below.)
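
For question 2, this is the kind of dedicated sample I have in mind; the tool name, schema, and dialogue are all invented for illustration, and only the overall layout mirrors the ShareGPT function-calling format that LLaMA-Factory documents:

```python
# Hypothetical tool-selection training sample. Every message and the
# "tools" string are made up; only the structure follows the ShareGPT
# function-calling layout (human / function_call / observation / gpt).
tool_sample = {
    "conversations": [
        {"from": "human", "value": "Find the CAD file for part number BRK-2041."},
        {"from": "function_call",
         "value": '{"name": "search_cad_files", "arguments": {"part_number": "BRK-2041"}}'},
        {"from": "observation", "value": '{"results": ["BRK-2041_rev3.step"]}'},
        {"from": "gpt", "value": "I found one matching file: BRK-2041_rev3.step."},
    ],
    "tools": '[{"name": "search_cad_files", "description": "Search the CAD vault by part number", "parameters": {"type": "object", "properties": {"part_number": {"type": "string"}}}}]',
}
```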

Any insights on balancing multiple capabilities in resource-constrained scenarios would be greatly appreciated!

Improvements seem possible. Given size constraints, it’s unclear how much can be resolved…

Bro, that model is too small. Even if you perfect the fine-tuning, you won’t achieve your goal. Maybe you should try RAG.

True… when there are no particular constraints, the RAG mechanism allows domain-specific knowledge to be used more accurately.
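
A rough sketch of what retrieval could look like here (the embedding model and the two-line toy corpus are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder embedding model and a toy domain corpus.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "Press-fit bearing seats are typically toleranced with an interference fit.",
    "The standard coarse thread pitch for an M8 bolt is 1.25 mm.",
]
corpus_emb = embedder.encode(corpus, convert_to_tensor=True)

query = "What tolerance is used for bearing seats?"
query_emb = embedder.encode(query, convert_to_tensor=True)

# Retrieve the best-matching snippet and prepend it to the prompt,
# so the small model answers from context instead of from its weights.
hit = util.semantic_search(query_emb, corpus_emb, top_k=1)[0][0]
prompt = f"Context: {corpus[hit['corpus_id']]}\n\nQuestion: {query}"
print(prompt)
```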

Hey, if you’re still working with that model or if you want to experiment with larger ones, I have some unused A100s/V100s I can let you use for a bit. Email me at jack.lee - @ - rice.edu
