№01
cs.AI arxiv:2606.16808v1

Adaptive and Explicit safe: Triggering Latent Safety Awareness in Large Reasoning Models

Ke Miao, Jiaxin Li, Hongliang Chen et al.

This paper introduces an **Adaptive and Explicit Safe (AES)** method to trigger latent safety awareness within Large Reasoning Models (LRMs) without relying on external manual safety data. The core method uses supervised fine-tuning (SFT) to explicitly tag unsafe queries with safety analysis prompts, followed by Direct Preference Optimization (DPO) to refine the correctne…

9
№02
cs.AI arxiv:2606.16890v1

Compositional Reasoning Depth Predicts Clinical AI Failure: Empirical Evidence Consistent with Transformer Compositionality Limits in Electronic Health Record Question Answering

Sanjay Basu

This paper introduces a "hop-count taxonomy" to quantify the inferential depth required to answer clinical questions from Electronic Health Records (EHRs). The core analysis shows that model accuracy systematically declines as the required number of reasoning steps (hop count) increases. This finding provides empi…

9
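The hop-count analysis amounts to grouping evaluated questions by inferential depth and computing accuracy per group. A minimal sketch, assuming each result is a hypothetical `(hop_count, is_correct)` pair; the toy outcomes are invented for illustration and are not the paper's data.

```python
from collections import defaultdict

def accuracy_by_hops(results):
    """Group (hop_count, is_correct) pairs and return {hop_count: accuracy}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for hops, ok in results:
        totals[hops] += 1
        correct[hops] += int(ok)
    return {h: correct[h] / totals[h] for h in sorted(totals)}

def is_monotone_decline(acc):
    """True if accuracy never increases as the hop count grows."""
    values = [acc[h] for h in sorted(acc)]
    return all(a >= b for a, b in zip(values, values[1:]))

# Toy, made-up outcomes purely for illustration.
toy = [(1, True), (1, True), (1, False), (2, True), (2, False), (3, False), (3, False)]
acc = accuracy_by_hops(toy)
print(acc)
print(is_monotone_decline(acc))
```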
№03
cs.AI arxiv:2606.16847v1

Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens

Yizhen Yao, Qinglin Zhu, Runcong Zhao et al.

This paper introduces Anchor Supervised Revocable Decoding (ASRD), a training-free framework to improve the quality and robustness of revocable decoding in Diffusion LLMs. ASRD mitigates error propagation by identifying and isolating trusted "Anchor Tokens" based on temporal consistency in the embedding space. This all…

9
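One way to read "temporal consistency in the embedding space" is that a position whose embedding barely moves across recent decoding steps can be trusted as an anchor. The sketch below assumes cosine similarity, a threshold `tau`, and a window of recent steps; `find_anchor_tokens`, `tau`, and `window` are hypothetical names, not ASRD's published criterion.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def find_anchor_tokens(history, tau=0.95, window=2):
    """history[t][i] is the embedding of position i at decoding step t.
    A position counts as an anchor if its embedding stayed nearly fixed
    (cosine >= tau) across each of the last `window` step transitions."""
    if len(history) < window + 1:
        return set()
    recent = history[-(window + 1):]
    anchors = set()
    for i in range(len(history[-1])):
        sims = [cosine(recent[t][i], recent[t + 1][i]) for t in range(window)]
        if all(s >= tau for s in sims):
            anchors.add(i)
    return anchors

# Position 0 is stable across steps; position 1 keeps changing.
history = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
print(find_anchor_tokens(history))
```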
№04
cs.AI arxiv:2606.16813v1

GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents

Rahul Suresh Babu, Rohit Shukla

This paper introduces GIST-CMTF, a goal-state inference layer designed to improve Causal Minimal Tool Filtering (CMTF) in LLM agents. GIST-CMTF addresses the issue of ambiguous user requests by predicting candidate symbolic goals, estimating ambiguity, and either applying CMTF or prompting for clarification. This metho…

9
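The summary describes a three-way decision: predict candidate goals, estimate ambiguity, then either filter tools or ask for clarification. The sketch below assumes ambiguity is the normalized entropy of the goal distribution and that each goal maps to a fixed minimal tool set; `route_request`, `threshold`, and the goal/tool names are hypothetical, and the truncated summary does not specify GIST-CMTF's actual estimator.

```python
import math

def normalized_entropy(probs):
    """Entropy of a goal distribution, scaled to [0, 1]."""
    if len(probs) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def route_request(goal_probs, goal_tools, threshold=0.6):
    """goal_probs: {goal: probability}; goal_tools: {goal: set of tools}.
    Returns ("filter", tools) for a confident goal, else ("clarify", candidates)."""
    ambiguity = normalized_entropy(list(goal_probs.values()))
    if ambiguity < threshold:
        best = max(goal_probs, key=goal_probs.get)
        return "filter", goal_tools[best]
    # Too ambiguous: ask the user to choose among the candidate goals.
    return "clarify", sorted(goal_probs)

tools = {"book_flight": {"search_flights", "pay"}, "book_hotel": {"search_hotels", "pay"}}
print(route_request({"book_flight": 0.9, "book_hotel": 0.1}, tools))
print(route_request({"book_flight": 0.5, "book_hotel": 0.5}, tools))
```

Normalizing by the log of the number of candidates keeps one threshold meaningful whether two or twenty goals are in play.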
№05
cs.AI arxiv:2606.16914v1

Greed Is Learned: Visible Incentives as Reward-Hacking Triggers

Tong Che, Rui Wu

This paper introduces "reward-channel addiction," demonstrating that reinforcement learning agents can become fixated on visible reward proxies (like dashboards) even when doing so conflicts with the true objective. The core method involves training agents in a controlled environment (*MoneyWorld*) to show that exposure to a…

9
№06
cs.AI arxiv:2606.16774v1

OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models

Tianyi Lin, Chuanyu Sun, Jingyi Zhang et al.

The paper introduces **Collective Skill Tree Search (CSTS)**, a novel framework for automatically constructing reusable, structured, and generalizable skill trees for LLM agents. CSTS leverages the collective intelligence of multiple models through iterative phases: **Collective Skill Node Generation (CSN-Gen)** for di…

9
№07
cs.AI arxiv:2606.16939v1

Scalable Circuit Learning for Interpreting Large Language Models

Naiyu Yin, Dennis Wei, Tian Gao et al.

This paper introduces **CircuitLasso**, a scalable circuit-learning method based on sparse linear regression designed to interpret Large Language Models (LLMs) using Sparse Autoencoder (SAE) features. CircuitLasso achieves structural accuracy comparable to computationally expensive intervention-based methods while sign…

9
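CircuitLasso is described as sparse linear regression over SAE features. A generic Lasso fitted by cyclic coordinate descent illustrates the idea: regress a downstream feature's activations on upstream feature activations and read the nonzero weights as candidate circuit edges. This is a textbook Lasso, not the authors' implementation; `circuit_edges`, `alpha`, and the toy data are assumptions.

```python
def soft_threshold(x, lam):
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_cd(X, y, alpha, iters=200):
    """Minimise (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.
    X is a list of n rows with d upstream-feature activations each."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # y - Xw, starting from w = 0
    for _ in range(iters):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta:
                residual = [r - c * delta for r, c in zip(residual, col)]
                w[j] = new_wj
    return w

def circuit_edges(X, y, alpha=0.05, tol=1e-6):
    """Indices of upstream features with nonzero Lasso weight (candidate edges)."""
    return [j for j, wj in enumerate(lasso_cd(X, y, alpha)) if abs(wj) > tol]

# Toy data: the downstream feature depends only on upstream feature 0.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, 0, 0], [0, -1, 0], [0, 0, -1]]
y = [2 * row[0] for row in X]
print(circuit_edges(X, y))
```

The L1 penalty shrinks the true weight slightly (here from 2 toward 1.85) in exchange for zeroing out irrelevant features, which is what makes the recovered circuit sparse.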
№08
cs.AI arxiv:2606.16811v1

Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

Keizo Kato, Chenhui Chu, Yugo Murawaki et al.

This paper introduces a semi-supervised framework to train LLMs on reasoning with minimal labeled data. It trains a lightweight verifier on a few labels to judge the correctness of generated reasoning traces, then uses an entropy-based filter to select high-confidence traces for fine-tuning the LLM. This approach achie…

9
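The selection step can be pictured as keeping only traces on which the verifier is both positive and confident. A minimal sketch, assuming the verifier outputs a probability of correctness and that "entropy-based" means the binary entropy of that prediction; the threshold `max_entropy` and `select_traces` are hypothetical, and the paper's exact filter may differ.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) verifier prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_traces(traces, max_entropy=0.5):
    """traces: list of (trace_text, p_correct) from the verifier.
    Keep traces judged correct (p > 0.5) with low uncertainty;
    confidently-wrong traces are excluded by the p > 0.5 check."""
    return [t for t, p in traces if p > 0.5 and binary_entropy(p) <= max_entropy]

traces = [("a", 0.95), ("b", 0.7), ("c", 0.02)]
print(select_traces(traces))
```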
№09
cs.AI arxiv:2606.16769v1

Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents

Tianyi Zhang, Zhonghao Qi

Skill-to-LoRA (S2L) proposes representing agent skills as compact, skill-specific LoRA adapters instead of injecting full procedural text into the runtime context. This method learns the *behavioral change* induced by the skill document offline, allowing for token-efficient activation of the desired behavior at runtime…

9
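S2L stores each skill as a LoRA adapter. For reference, a LoRA-adapted linear layer computes $Wx + \frac{\alpha}{r} BAx$ with low-rank factors $A$ ($r \times d_{in}$) and $B$ ($d_{out} \times r$). The sketch below shows that computation in plain Python plus a hypothetical per-skill adapter lookup (`run_with_skill`, `adapters`); how S2L trains the adapters offline is not covered.

```python
def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + (alpha / r) * B (A x), with A of shape r x d_in and B of shape d_out x r."""
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

def run_with_skill(W, adapters, skill, x):
    """Hypothetical registry: activate a skill by picking its (A, B) pair
    rather than placing the skill document in the prompt."""
    A, B = adapters[skill]
    return lora_forward(W, A, B, x)

W = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "noop": ([[1.0, 1.0]], [[0.0], [0.0]]),   # B = 0: behaves like the base layer
    "shift": ([[1.0, 1.0]], [[1.0], [0.0]]),  # rank-1 update to the first output
}
print(run_with_skill(W, adapters, "shift", [2.0, 3.0]))
```

Only the small `A` and `B` matrices differ per skill, which is why swapping adapters costs no context tokens at runtime.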
№10
cs.AI arxiv:2606.17016v1

TokenPilot: Cache-Efficient Context Management for LLM Agents

Buqiang Xu, Zirui Xue, Dianmou Chen et al.

TokenPilot introduces a dual-granularity context management framework to efficiently handle long-horizon LLM agent sessions without disrupting the prompt cache. It uses **Ingestion-Aware Compaction** globally to stabilize essential prefixes and **Lifecycle-Aware Eviction** locally to conservatively remove context segme…

9
№11
cs.AI arxiv:2606.16825v1

Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

Martin Jaggi

The paper introduces **Expert Tying**, a method for Mixture-of-Experts (MoE) LLMs that shares expert parameters across consecutive transformer layers while maintaining independent routing. This technique reduces the memory footprint nearly twofold without sacrificing model perplexity or downstream perf…

9
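The "nearly twofold" figure follows from simple parameter counting: sharing one expert set between each pair of consecutive layers halves the expert weights, while per-layer routers keep the ratio just under 2. A back-of-the-envelope sketch, assuming pairs of layers are tied and using hypothetical dimensions; attention and embedding weights are ignored, and counting them would lower the overall ratio.

```python
def moe_block_params(num_layers, num_experts, d_model, d_ff, tie_group=1):
    """Parameters in the MoE blocks only (attention and embeddings ignored).
    Each expert is a two-matrix FFN; every layer keeps its own router."""
    expert = 2 * d_model * d_ff
    router_per_layer = d_model * num_experts
    # With tie_group=k, each run of k consecutive layers shares one expert set.
    expert_sets = -(-num_layers // tie_group)  # ceiling division
    return expert_sets * num_experts * expert + num_layers * router_per_layer

# Hypothetical configuration, not taken from the paper.
untied = moe_block_params(24, 8, 1024, 4096)
tied = moe_block_params(24, 8, 1024, 4096, tie_group=2)
print(untied / tied)
```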
№12
cs.LG arxiv:2606.16934v1

Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter

Patomporn Payoungkhamdee, Napat Laosaengpha, Jenta Wonglertsakul et al.

This paper investigates the behavioral properties that underpin effective reasoning when using a Code Interpreter (CI) with LLMs, categorizing them as extrinsic (crucial tokens) and intrinsic (cognitive behaviors like verification and backtracking). The core finding is that stronger CI reasoning models exhibit a higher…

9
№13
cs.LG arxiv:2606.16771v1

GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Haotian Liu, Yihao Liu, Jingwei Ni et al.

The paper introduces **GD$^2$PO** to address multi-reward conflicts in LLM reinforcement learning where competing reward signals hinder training. GD$^2$PO builds upon reward-decoupling by incorporating a **dynamic filtering mechanism**, inspired by DAPO, to selectively utilize rollouts. This filtering removes ineffecti…

9
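Reward decoupling and DAPO-style filtering can be sketched together: normalize each reward component within a rollout group separately, sum the per-component advantages, and drop groups where no component varies (every advantage would be zero). The exact decoupling and filter rule in GD$^2$PO are truncated in the summary, so `decoupled_advantages` and `keep_group` are assumptions about the general shape, not the paper's algorithm.

```python
import statistics

def decoupled_advantages(group_rewards, eps=1e-6):
    """group_rewards: one list of K reward components per rollout in the group.
    Each component is normalised within the group separately, then summed."""
    k = len(group_rewards[0])
    adv = [0.0] * len(group_rewards)
    for c in range(k):
        vals = [r[c] for r in group_rewards]
        mu = statistics.fmean(vals)
        sd = statistics.pstdev(vals)
        for i, v in enumerate(vals):
            adv[i] += (v - mu) / (sd + eps)
    return adv

def keep_group(group_rewards):
    """Assumed filter: keep a group only if at least one reward component varies."""
    k = len(group_rewards[0])
    return any(statistics.pstdev([r[c] for r in group_rewards]) > 0 for c in range(k))

# Two rewards that disagree on which rollout is better: the summed advantages cancel.
conflict = [[1.0, 0.0], [0.0, 1.0]]
print(decoupled_advantages(conflict))
print(keep_group([[1.0, 1.0], [1.0, 1.0]]))
```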
№14
cs.CL arxiv:2606.17041v1

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

Anzhe Xie, Weihang Su, Yujia Zhou et al.

This paper introduces **MetaSyn**, a novel benchmark dataset comprising 442 expert-curated meta-analyses from the Nature Portfolio, designed to evaluate LLM agents across the full scientific reasoning pipeline: retrieval, screening, and synthesis. The core contribution is providing a structured, verifiable ground truth…

9
№15
cs.CL arxiv:2606.17053v1

Context-Aware RL for Agentic and Multimodal LLMs

Peiyang Xu, Bangzheng Li, Sijia Liu et al.

This paper introduces ContextRL, a reinforcement learning method designed to enhance LLMs' ability to perform long-horizon and multimodal reasoning by focusing on fine-grained context grounding. ContextRL uses an indirect auxiliary objective where the model is rewarded for correctly selecting the supporting context fro…

9
№16
cs.CL arxiv:2606.16897v1

Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures

Xueping Gao

This paper introduces **Contrastive-Difference CKA ($\text{CKA}_\Delta$)**, a novel, training-free diagnostic that isolates concept-specific structural alignment in language models by comparing kernel alignments on per-sample contrastive differences. The core contribution is revealing a **geometric-functional universal…

9
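$\text{CKA}_\Delta$ compares kernel alignment on per-sample contrastive differences rather than on raw activations. The sketch below assumes linear CKA and paired concept-positive/concept-negative inputs for the same samples in both models; the paper's kernel choice and pairing scheme are not stated in the truncated summary, and all function names are hypothetical.

```python
import math

def center_columns(X):
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]

def gram(X):
    return [[sum(a * b for a, b in zip(r, s)) for s in X] for r in X]

def linear_cka(X, Y):
    """Linear CKA between two representations (rows = the same n samples)."""
    K = gram(center_columns(X))
    L = gram(center_columns(Y))
    dot = sum(k * l for kr, lr in zip(K, L) for k, l in zip(kr, lr))
    nk = math.sqrt(sum(k * k for kr in K for k in kr))
    nl = math.sqrt(sum(l * l for lr in L for l in lr))
    if nk == 0 or nl == 0:
        return 0.0  # degenerate case: all differences identical after centering
    return dot / (nk * nl)

def contrastive_difference(pos, neg):
    """Per-sample difference between paired concept-positive and concept-negative activations."""
    return [[p - q for p, q in zip(pr, qr)] for pr, qr in zip(pos, neg)]

def cka_delta(pos_a, neg_a, pos_b, neg_b):
    return linear_cka(contrastive_difference(pos_a, neg_a),
                      contrastive_difference(pos_b, neg_b))

X = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.0, 0.0]]
Y = [[3 * r[1], 3 * r[0]] for r in X]  # columns swapped and scaled
print(linear_cka(X, Y))
```

Because linear CKA is invariant to orthogonal transforms and isotropic scaling, two models whose concept differences span the same geometry score near 1 even with different widths.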
№17
cs.CL arxiv:2606.17029v1

DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents

Minghang Zhu, Chuyang Wei, Junhao Xu et al.

DEEPRUBRIC introduces a novel framework to improve the efficiency of reinforcement learning for deep research agents by generating more reliable supervision signals. Instead of inferring evaluation rubrics from a query, it reverses the process: it first determines the necessary evaluation criteria for a topic and then …

9
№18
cs.CL arxiv:2606.16821v1

How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation

Yimeng Chen, Zhe Ren, Firas Laakom et al.

This paper introduces **SearchGEO**, a controlled framework to measure the vulnerability of LLM search agents to having attacker-manipulated web content endorsed as factual. Evaluating 13 LLM backends, the authors demonstrate significant variation in endorsement corruption success rates (0.0% to 31.4%) depending on the…

9
№19
cs.CL arxiv:2606.16908v1

LESS Is More: Mutual-Stability Sampling for Diffusion Language Models

Amr Mohamed, Guokan Shang, Michalis Vazirgiannis

This paper introduces **LESS** (Mutual-Stability Sampling), a training-free, model-agnostic adaptive sampling method for diffusion language models (dLLMs). LESS addresses efficiency by treating token commitment as an online stopping problem, only updating tokens deemed unstable. Its core contribution is a joint stabili…

9
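Treating commitment as an online stopping problem means each position decides, step by step, whether to keep updating or to freeze. LESS's actual joint stability criterion is truncated above, so the sketch below substitutes a simple per-position patience rule: commit once the same token is predicted `patience` times in a row, and stop recomputing committed positions. `stability_commit_decode`, `patience`, and the toy predictor are hypothetical.

```python
def stability_commit_decode(predict, length, patience=2, max_steps=50):
    """predict(step, tokens, active) returns {position: token} for each active position.
    A position is committed (frozen) once the same token has been predicted
    `patience` times in a row; committed positions are never updated again."""
    tokens = [None] * length
    streak = [0] * length
    committed = [False] * length
    calls = 0
    for step in range(max_steps):
        active = [i for i in range(length) if not committed[i]]
        if not active:
            break
        preds = predict(step, tokens, active)
        calls += 1
        for i in active:
            tok = preds[i]
            # Extend the streak if the prediction repeats, otherwise restart it.
            streak[i] = streak[i] + 1 if tok == tokens[i] else 1
            tokens[i] = tok
            if streak[i] >= patience:
                committed[i] = True
    return tokens, calls

def toy_predict(step, tokens, active):
    """Position 0 is stable from the start; position 1 settles after one step."""
    return {i: "a" if i == 0 else ("x" if step == 0 else "y") for i in active}

print(stability_commit_decode(toy_predict, length=2))
```

Position 0 is frozen after two steps, so the third call only recomputes position 1; skipping stable positions is the source of the efficiency gain the summary describes.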
№20
cs.CL arxiv:2606.16905v1

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

Mingyang Li, Yurou Liu, Jieping Ye et al.

LOGOS is a general-purpose generative language model for the natural sciences that unifies diverse scientific tasks within a single autoregressive framework. It achieves this by encoding heterogeneous scientific objects and their spatial interactions as discrete token sequences based on a shared scientific grammar, avo…

9