Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 187
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning Paper • 2505.16022 • Published May 21 • 4
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Paper • 2511.20102 • Published Nov 25 • 26
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12 • 201
Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance Paper • 2510.03528 • Published Oct 3 • 17
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation Paper • 2412.13649 • Published Dec 18, 2024 • 21
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published Oct 13 • 51
Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation? Paper • 2508.19827 • Published Aug 27 • 33
Direct Preference Optimization Using Sparse Feature-Level Constraints Paper • 2411.07618 • Published Nov 12, 2024 • 17
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems Paper • 2508.07407 • Published Aug 10 • 98
Spectrum Projection Score: Aligning Retrieved Summaries with Reader Models in Retrieval-Augmented Generation Paper • 2508.05909 • Published Aug 8 • 21
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation Paper • 2502.21074 • Published Feb 28 • 4