Announcing LiteCoder-Terminal: Lightweight Terminal Agents with <1k Synthesized Trajectories

Community Article Published December 18, 2025

Today, we are excited to release LiteCoder-Terminal-Preview, a series of models specialized in terminal-based interactions. This release is part of our recent efforts to develop capable small and medium-sized code agent models.

Notably, LiteCoder achieves competitive results using fewer than 1,000 training samples. By relying entirely on a fully synthetic pipeline—without converting any existing datasets—we were able to secure significant gains on the challenging Terminal Bench, matching the performance of leading open-source models in the same weight class with extreme data efficiency.

Released Artifacts

2025/12/17
LiteCoder-4b-Terminal-preview	Model	https://huggingface.co/Lite-Coder/LiteCoder-4b-Terminal-preview
LiteCoder-SFT-Terminal-preview	Dataset	https://huggingface.co/datasets/Lite-Coder/LiteCoder-SFT-Terminal-preview
icip-cas/LiteCoder	Code	https://github.com/icip-cas/LiteCoder

Data Construction Pipeline

To build a robust terminal agent model, we developed a rigorous data synthesis pipeline consisting of three stages: Task Curation, Environment Preparation, and Trajectory Generation.

Task Sampling

In the first version of our data, we established a taxonomy covering seven core domains of terminal usage: ai_ml, build_tools, data_science, networking, security, system_admin, and version_control.

Based on the taxonomy, We adapt MAGPIE-like method to synthesize long-horizon agentic tasks. By feeding the model a domain-specific system message followed by the standard chat template prefix for a user turn (e.g., <|user|>), the model "autocompletes" the sequence, generating a plausible and high-quality task tailored to the specified domain.

Feasibility Check

To ensure data integrity, we employ an LLM-as-a-Judge to validate raw tasks. This stage evaluates entries against criteria—including complexity balance, clarity of specification, and resource availability—filtering out unfeasible or ambiguous tasks to maintain a high-quality task set.

Environment Preparation

Many terminal tasks (e.g., fixing a bug in an existing repo or managing git conflicts) rely on specific starting states. To address this, we utilize an agent to interactively generate the necessary starting artifacts within a Docker container. Once setup is complete, we extract the final state to serve as the initial environment for the actual task execution.

Trajectory Generation

We utilize the Harbor framework to generate trajectories based on the curated tasks using strong models as the teacher. We further filter out trajectories exhibiting looping behavior.

Implementation

We employ Kimi-K2-Instruct-0905 for task sampling and MiniMax-M2 for environment preparation and trajectory generation.

Results

Our models achieve competitive results on Terminal Bench, significantly outperforming general-purpose models of similar (and even larger) sizes.

Terminal Bench 1.0 Performance

Model	Agent	Results
LiteCoder-30a3b-Terminal-preview	Terminus 2	18.75%
LiteCoder-4b-Terminal-preview	Terminus 2	13.75%
Qwen3-30B-A3B-Instruct	Terminus 2	12.5%
Qwen3-4B-Instruct	Terminus 2	5.0%

Terminal Bench 2.0 Performance

Model	Agent	Results
LiteCoder-30a3b-Terminal-preview	Terminus 2	5.6%
LiteCoder-4b-Terminal-preview	Terminus 2	3.3%
Qwen3-32B	Terminus 2	1.9%
InternLM3-8B-Nex-N1	Terminus 2	0%
Qwen3-8B	Terminus 2	0%

Findings

Environment Adaptability: High-performing models show strong capability in interpreting system feedback (stdout/stderr) and dynamically adjusting their strategies, rather than simply following a rigid plan.
Context Maintenance: Successful agents maintain coherence over long interaction turns without losing track of the original objective.
Scaffolding Sensitivity: We identified a significant sensitivity to the agent framework. Models trained heavily within a specific scaffolding (prompt structure/tool definition) struggle to generalize when transferred to different agent frameworks. This highlights the importance of framework-agnostic training data.

Citation

@misc{LiteCoder Team,
  title={LiteCoder: Advancing Small and Medium-sized Code Agents},
  author={Xiaoxuan Peng and Xinyu Lu and Kaiqi Zhang and Taosong Fang and Boxi Cao and Yaojie Lu},
  year={2025},
}

Future Directions

Scaling Environments: Expanding the diversity of Docker environments and teacher models to improve generalization.
Agentic RL: Implementing Reinforcement Learning specifically for multi-turn agentic workflows.

Team & Contributions

Xiaoxuan Peng: Main Contributor
Xinyu Lu: Project Lead
Kaiqi Zhang: Contributor
Taosong Fang: Contributor
Boxi Cao: Contributor
Yaojie Lu: Contributor

Acknowledgements

LiteCoder builds upon multiple open-source projects, including Harbor. The models are trained using AutoAlign.

Join Us

Join the discussion on our Discord.

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote