Cheng Qian

chengq9

https://qiancheng0.github.io

qiancheng0

AI & ML interests

Agent, Tool Learning

Recent Activity

upvoted a paper 7 days ago

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

upvoted a paper 2 months ago

Multimodal Policy Internalization for Conversational Agents

upvoted a paper 2 months ago

Self-Improving LLM Agents at Test-Time

upvoted a paper 7 days ago

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Paper • 2512.16649 • Published 8 days ago • 22

upvoted 2 papers 2 months ago

Multimodal Policy Internalization for Conversational Agents

Paper • 2510.09474 • Published Oct 10 • 4

Self-Improving LLM Agents at Test-Time

Paper • 2510.07841 • Published Oct 9 • 9

upvoted 3 papers 3 months ago

Where LLM Agents Fail and How They can Learn From Failures

Paper • 2509.25370 • Published Sep 29 • 11

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning

Paper • 2509.19736 • Published Sep 24 • 12

Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts

Paper • 2509.04500 • Published Sep 2 • 4

upvoted a paper 4 months ago

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Paper • 2502.16143 • Published Feb 22 • 6

upvoted 2 papers 5 months ago

UserBench: An Interactive Gym Environment for User-Centric Agents

Paper • 2507.22034 • Published Jul 29 • 29

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28 • 82

upvoted a paper 6 months ago

MIRIX: Multi-Agent Memory System for LLM-Based Agents

Paper • 2507.07957 • Published Jul 10 • 79

upvoted 3 papers 7 months ago

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Paper • 2505.24846 • Published May 30 • 15

ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Paper • 2505.22961 • Published May 29 • 8

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs

Paper • 2505.13508 • Published May 16 • 15

upvoted a collection 8 months ago

RM-R1

Collection

RM-R1: Reward Modeling as Reasoning • 16 items • Updated Jun 29 • 9

upvoted a paper 8 months ago

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 79

upvoted a collection 8 months ago

Qwen3

Collection

84 items • Updated Aug 6 • 1.52k

upvoted a paper 8 months ago

Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model

Paper • 2502.08820 • Published Feb 12 • 5

upvoted a collection 8 months ago

ToolRL

Collection

The ToolRL model trained for tool use through GRPO • 3 items • Updated Apr 22 • 2

upvoted 2 papers 8 months ago

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 35

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 48
