From the arXiv
Tuesday, 16 June 2026 · 20 papers
Adaptive and Explicit safe: Triggering Latent Safety Awareness in Large Reasoning Models
This paper introduces an **Adaptive and Explicit Safe (AES)** method to trigger latent safety awareness within Large Reasoning Models (LRMs) without relying on external manual safety data. The core method involves SFT to explicitly tag unsafe queries with safety analysis prompts, followed by DPO to refine the correctne…
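The DPO stage mentioned here optimizes the standard pairwise preference loss. A minimal sketch of that loss for a single preference pair, assuming sequence log-probabilities under the policy and a frozen reference model are already computed. The function name, argument names, and the default `beta` are illustrative and not taken from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    All inputs are summed token log-probabilities of a full response.
    beta scales the implicit reward; 0.1 is an illustrative default.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

The loss equals log 2 when the policy matches the reference and falls below that once the policy favors the chosen response more than the reference does.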
Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering
This paper introduces a "hop-count taxonomy" to quantify the inferential depth required to answer clinical questions from Electronic Health Records (EHRs). The central result is that model accuracy declines systematically as the required number of reasoning steps (hop count) increases. This finding provides empi…
Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens
This paper introduces Anchor Supervised Revocable Decoding (ASRD), a training-free framework to improve the quality and robustness of revocable decoding in Diffusion LLMs. ASRD mitigates error propagation by identifying and isolating trusted "Anchor Tokens" based on temporal consistency in the embedding space. This all…
GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents
This paper introduces GIST-CMTF, a goal-state inference layer designed to improve Causal Minimal Tool Filtering (CMTF) in LLM agents. GIST-CMTF addresses the issue of ambiguous user requests by predicting candidate symbolic goals, estimating ambiguity, and either applying CMTF or prompting for clarification. This metho…
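The predict, estimate, then route flow described above can be sketched as a small gate. This is only an illustration: using Shannon entropy as the ambiguity estimate, the `max_entropy` threshold, and the names `goal_entropy` and `route_request` are all assumptions, since the summary does not say how GIST-CMTF scores ambiguity.

```python
import math

def goal_entropy(goal_probs):
    """Shannon entropy (in nats) of a distribution over candidate goals."""
    return -sum(p * math.log(p) for p in goal_probs.values() if p > 0)

def route_request(goal_probs, max_entropy=0.5):
    """Decide between tool filtering and a clarification question.

    goal_probs maps hypothetical symbolic goal names to unnormalized scores.
    Returns ("filter", best_goal) when one goal clearly dominates, otherwise
    ("clarify", goals ranked from most to least likely).
    """
    total = sum(goal_probs.values())
    normalized = {goal: p / total for goal, p in goal_probs.items()}
    if goal_entropy(normalized) <= max_entropy:
        return ("filter", max(normalized, key=normalized.get))
    ranked = sorted(normalized, key=normalized.get, reverse=True)
    return ("clarify", ranked)
```

For example, `route_request({"book_flight": 0.95, "cancel_flight": 0.05})` selects the dominant goal, while an even split between two goals falls through to clarification.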
Greed Is Learned: Visible Incentives as Reward-Hacking Triggers
This paper introduces "reward-channel addiction," demonstrating that reinforcement learning agents can become fixated on visible reward proxies (like dashboards) even when pursuing them conflicts with the true objective. The core method involves training agents in a controlled environment (*MoneyWorld*) to show that exposure to a…
OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models
The paper introduces **Collective Skill Tree Search (CSTS)**, a novel framework for automatically constructing reusable, structured, and generalizable skill trees for LLM agents. CSTS leverages the collective intelligence of multiple models through iterative phases: **Collective Skill Node Generation (CSN-Gen)** for di…
Scalable Circuit Learning for Interpreting Large Language Models
This paper introduces **CircuitLasso**, a scalable circuit-learning method based on sparse linear regression designed to interpret Large Language Models (LLMs) using Sparse Autoencoder (SAE) features. CircuitLasso achieves structural accuracy comparable to computationally expensive intervention-based methods while sign…
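To make the sparse-regression idea concrete, here is a minimal Lasso solver using coordinate descent. It is a generic sketch, not CircuitLasso itself: the framing in which a downstream feature activation is regressed on upstream feature activations, and nonzero weights are read as candidate circuit edges, is an assumption for illustration, as are all function and parameter names.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping small values to exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(features, target, l1_penalty, num_sweeps=200):
    """Minimize (1 / 2n) * ||y - Xw||^2 + l1_penalty * ||w||_1.

    features: n rows of d upstream activations (hypothetical inputs).
    target:   n downstream activations to explain.
    """
    n, d = len(features), len(features[0])
    weights = [0.0] * d
    residual = list(target)  # equals y - Xw while w is all zeros
    for _ in range(num_sweeps):
        for j in range(d):
            column = [row[j] for row in features]
            norm_sq = sum(v * v for v in column) / n
            if norm_sq == 0.0:
                continue  # a constant-zero feature cannot explain anything
            # Correlation of feature j with the residual that ignores
            # feature j's own current contribution.
            rho = sum(v * (r + v * weights[j]) for v, r in zip(column, residual)) / n
            new_weight = soft_threshold(rho, l1_penalty) / norm_sq
            delta = new_weight - weights[j]
            residual = [r - v * delta for r, v in zip(residual, column)]
            weights[j] = new_weight
    return weights
```

The L1 penalty is what produces exact zeros, so the surviving nonzero weights form a sparse set of candidate dependencies without running any interventions.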
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
This paper introduces a semi-supervised framework to train LLMs on reasoning with minimal labeled data. It trains a lightweight verifier on a few labels to judge the correctness of generated reasoning traces, then uses an entropy-based filter to select high-confidence traces for fine-tuning the LLM. This approach achie…
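The filtering step can be illustrated with a binary-entropy threshold on the verifier's output. This is a sketch under stated assumptions: the verifier is taken to emit a probability that a trace is correct, and the `max_entropy` cutoff and the function names are hypothetical. The paper's actual filter may be defined differently.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction; 0 when fully certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_traces(scored_traces, max_entropy=0.5):
    """Keep traces the verifier judges correct with low uncertainty.

    scored_traces: list of (trace, p_correct) pairs, where p_correct is the
    verifier's estimated probability that the trace is correct.
    """
    return [
        trace
        for trace, p_correct in scored_traces
        if p_correct > 0.5 and binary_entropy(p_correct) <= max_entropy
    ]
```

Note that low entropy alone is not enough: a trace scored at 0.05 is a confident judgment too, but a confident rejection, which is why the sketch also requires `p_correct > 0.5`.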
Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents
Skill-to-LoRA (S2L) proposes representing agent skills as compact, skill-specific LoRA adapters instead of injecting full procedural text into the runtime context. This method learns the *behavioral change* induced by the skill document offline, allowing for token-efficient activation of the desired behavior at runtime…
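For readers unfamiliar with LoRA, the sketch below shows how a low-rank adapter modifies a weight matrix using the standard formulation W' = W + (alpha / r) · B · A. It covers only the generic adapter arithmetic, not how S2L trains one adapter per skill. The helper names and the plain-list matrix representation are illustrative choices.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / rank) * lora_b @ lora_a.

    weight: d_out x d_in base matrix (kept frozen during adapter training).
    lora_a: rank x d_in matrix.
    lora_b: d_out x rank matrix.
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(weight_row, delta_row)]
        for weight_row, delta_row in zip(weight, delta)
    ]
```

The adapter stores only `rank * (d_in + d_out)` numbers per matrix instead of `d_in * d_out`, which is what makes a per-skill adapter cheap compared with carrying the skill text in every prompt.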
TokenPilot: Cache-Efficient Context Management for LLM Agents
TokenPilot introduces a dual-granularity context management framework to efficiently handle long-horizon LLM agent sessions without disrupting the prompt cache. It uses **Ingestion-Aware Compaction** globally to stabilize essential prefixes and **Lifecycle-Aware Eviction** locally to conservatively remove context segme…
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
The paper introduces **Expert Tying**, a method for Mixture-of-Experts (MoE) LLMs that shares expert parameters across consecutive transformer layers while maintaining independent routing. This technique reduces the memory footprint nearly twofold without sacrificing model perplexity or downstream perf…
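A back-of-the-envelope count shows why sharing experts across consecutive layers shrinks memory while per-layer routers keep the saving just short of the tying factor. The sketch assumes experts are tied in groups of `tie_group` consecutive layers (two in the example) and that each router is a single `hidden_size x num_experts` matrix. Both are illustrative assumptions, as are the function names and sizes.

```python
def expert_params(num_layers, experts_per_layer, params_per_expert, tie_group=1):
    """Expert parameters when every tie_group consecutive layers share one expert set."""
    num_expert_sets = -(-num_layers // tie_group)  # ceiling division
    return num_expert_sets * experts_per_layer * params_per_expert

def router_params(num_layers, experts_per_layer, hidden_size):
    """Routers are not tied: each layer keeps its own routing matrix."""
    return num_layers * hidden_size * experts_per_layer

def moe_params(num_layers, experts_per_layer, params_per_expert, hidden_size, tie_group=1):
    """Total MoE parameters (experts plus routers) for a given tying factor."""
    return (
        expert_params(num_layers, experts_per_layer, params_per_expert, tie_group)
        + router_params(num_layers, experts_per_layer, hidden_size)
    )
```

Comparing `moe_params(..., tie_group=1)` with `moe_params(..., tie_group=2)` gives a ratio slightly under 2, because the router term is unchanged by tying.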
Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter
This paper investigates the behavioral properties that underpin effective reasoning when using a Code Interpreter (CI) with LLMs, categorizing them as extrinsic (crucial tokens) and intrinsic (cognitive behaviors like verification and backtracking). The core finding is that stronger CI reasoning models exhibit a higher…
GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization
The paper introduces **GD$^2$PO** to address multi-reward conflicts in LLM reinforcement learning where competing reward signals hinder training. GD$^2$PO builds upon reward-decoupling by incorporating a **dynamic filtering mechanism**, inspired by DAPO, to selectively utilize rollouts. This filtering removes ineffecti…
Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio
This paper introduces **MetaSyn**, a novel benchmark dataset comprising 442 expert-curated meta-analyses from the Nature Portfolio, designed to evaluate LLM agents across the full scientific reasoning pipeline: retrieval, screening, and synthesis. The core contribution is providing a structured, verifiable ground truth…
Context-Aware RL for Agentic and Multimodal LLMs
This paper introduces ContextRL, a reinforcement learning method designed to enhance LLMs' ability to perform long-horizon and multimodal reasoning by focusing on fine-grained context grounding. ContextRL uses an indirect auxiliary objective where the model is rewarded for correctly selecting the supporting context fro…
Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures
This paper introduces **Contrastive-Difference CKA ($\text{CKA}_\Delta$)**, a novel, training-free diagnostic that isolates concept-specific structural alignment in language models by comparing kernel alignments on per-sample contrastive differences. The core contribution is revealing a **geometric-functional universal…
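One plausible reading of the diagnostic is linear CKA computed on per-sample difference vectors rather than on raw activations. The sketch below follows that reading. The choice of a linear kernel, the function names, and the input layout (one activation row per sample, with paired "positive" and "negative" inputs for the concept) are assumptions, not details confirmed by the summary.

```python
import math

def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean over samples."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]

def cross_frobenius_sq(x, y):
    """Squared Frobenius norm of x^T y for matrices with the same number of rows."""
    total = 0.0
    for x_col in zip(*x):
        for y_col in zip(*y):
            total += sum(a * b for a, b in zip(x_col, y_col)) ** 2
    return total

def linear_cka(x, y):
    """Linear CKA between two sets of representations of the same samples.

    Undefined (division by zero) if either input is constant across samples.
    """
    x, y = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_frobenius_sq(x, x) * cross_frobenius_sq(y, y))
    return cross_frobenius_sq(x, y) / denom

def contrastive_differences(pos, neg):
    """Per-sample difference between activations on paired contrastive inputs."""
    return [[p - q for p, q in zip(p_row, n_row)] for p_row, n_row in zip(pos, neg)]

def cka_delta(pos_a, neg_a, pos_b, neg_b):
    """Linear CKA between two models' contrastive difference vectors."""
    return linear_cka(
        contrastive_differences(pos_a, neg_a),
        contrastive_differences(pos_b, neg_b),
    )
```

Taking differences first removes whatever the two inputs of a pair share, so the score reflects how similarly two models encode the concept contrast and not their overall representational similarity. The two models may have different hidden sizes, since CKA only requires matching sample counts.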
DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents
DEEPRUBRIC introduces a novel framework to improve the efficiency of reinforcement learning for deep research agents by generating more reliable supervision signals. Instead of inferring evaluation rubrics from a query, it reverses the process: it first determines the necessary evaluation criteria for a topic and then …
How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation
This paper introduces **SearchGEO**, a controlled framework for measuring how readily LLM search agents endorse attacker-manipulated web content as factual. Evaluating 13 LLM backends, the authors find that endorsement corruption success rates vary widely (0.0% to 31.4%) depending on the…
LESS Is More: Mutual-Stability Sampling for Diffusion Language Models
This paper introduces **LESS** (Mutual-Stability Sampling), a training-free, model-agnostic adaptive sampling method for diffusion language models (dLLMs). LESS addresses efficiency by treating token commitment as an online stopping problem, only updating tokens deemed unstable. Its core contribution is a joint stabili…
Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences
LOGOS is a general-purpose generative language model for the natural sciences that unifies diverse scientific tasks within a single autoregressive framework. It achieves this by encoding heterogeneous scientific objects and their spatial interactions as discrete token sequences based on a shared scientific grammar, avo…