№01
cs.AI arxiv:2605.07926v1

AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

Zhengkang Guo, Yiyang Li, Lin Qiu et al.

AgentEscapeBench is a novel benchmark designed to evaluate LLM agents' ability to perform complex, out-of-domain tool-grounded reasoning. It uses escape-room style tasks with long-range dependencies, requiring agents to infer and execute multi-step procedures involving real external tools and state tracking. The benchm…

9
№02
cs.AI arxiv:2605.08037v1

Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph

Ning Liu, Chuanneng Sun, Kristina Klinkner et al.

This paper introduces **Graph Direct Preference Optimization (GraphDPO)**, a principled generalization of DPO that moves beyond simple pairwise comparisons. GraphDPO leverages richer preference data structured as directed acyclic graphs (induced by ranked rollouts) to enforce transitivity and aggregate supervision acro…

9
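A minimal sketch of how ranked rollouts can induce a preference graph, assuming each rollout carries a scalar reward and that a standard DPO-style logistic loss is averaged over the graph's edges. The names `induce_edges` and `edge_dpo_loss`, the `beta` value, and the per-edge loss are illustrative assumptions, not the paper's objective.

```python
import math
from itertools import combinations

def induce_edges(rewards):
    """Return directed edges (winner, loser) for every pair with different rewards.

    Edges always point from higher to lower reward, so the graph is acyclic.
    Tied rollouts get no edge.
    """
    edges = []
    for i, j in combinations(range(len(rewards)), 2):
        if rewards[i] > rewards[j]:
            edges.append((i, j))
        elif rewards[j] > rewards[i]:
            edges.append((j, i))
    return edges

def edge_dpo_loss(edges, policy_logps, ref_logps, beta=0.1):
    """Average -log(sigmoid(beta * margin)) over all edges (assumed loss form)."""
    if not edges:
        return 0.0
    total = 0.0
    for winner, loser in edges:
        margin = (policy_logps[winner] - ref_logps[winner]) - (
            policy_logps[loser] - ref_logps[loser]
        )
        # -log(sigmoid(x)) == log(1 + exp(-x))
        total += math.log1p(math.exp(-beta * margin))
    return total / len(edges)

# Toy example: three rollouts with hypothetical rewards and log-probabilities.
rewards = [1.0, 3.0, 2.0]
edges = induce_edges(rewards)
logps = [-4.0, -5.0, -6.0]
loss_at_reference = edge_dpo_loss(edges, logps, logps)
```

With three strictly ranked rollouts the graph has three edges rather than one chosen/rejected pair, which is the sense in which graph-structured supervision aggregates more comparisons per prompt.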
№03
cs.AI arxiv:2605.07830v1

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju, Munhyeok Kim et al.

This paper introduces **CyBiasBench**, a comprehensive benchmark to quantify the attack-selection bias exhibited by LLM agents in cyber-attack scenarios. The core method involves systematically testing five agents across various targets and prompts to reveal that each agent disproportionately favors a narrow subset of …

9
№04
cs.AI arxiv:2605.08019v1

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

Botos Csaba, Sreejan Kumar, Austin Tudor David Andrews et al.

This paper investigates whether frontier Large Reasoning Models (LRMs) can mimic human learning and planning in novel game environments. The core method involves jointly evaluating LRMs against RL agents using human gameplay data, concurrent fMRI recordings, and a Bayesian model. The key contribution is demonstrating t…

9
№05
cs.AI arxiv:2605.08060v1

The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents

Jiayuan Liu, Tianqin Li, Shiyi Du et al.

This paper introduces the "memory curse," demonstrating that expanding the context window for LLM agents systematically *erodes* cooperation in multi-agent social dilemmas. The core mechanism identified is not increased paranoia, but the degradation of forward-looking intent within the agent's reasoning traces. Restori…

9
№06
cs.AI arxiv:2605.07990v1

Tool Calling is Linearly Readable and Steerable in Language Models

Zekun Wu, Ze Wang, Seonglae Cho et al.

This paper demonstrates that tool selection in language models is **linearly readable and steerable**, based on an analysis of internal activations across various models. By manipulating the mean difference between tool activation vectors, the authors can reliably **switch the model's chosen tool** (up to 100% accuracy) an…

9
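A minimal sketch of mean-difference readout and steering on toy 2-D activations. It assumes that activations for two tools cluster around different means. The functions `steering_vector`, `readout`, and `steer`, the toy data, and the scale `alpha` are hypothetical and are not taken from the paper.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(acts_src, acts_dst):
    """Mean-difference direction pointing from the source tool to the target tool."""
    mu_src, mu_dst = mean_vector(acts_src), mean_vector(acts_dst)
    return [d - s for s, d in zip(mu_src, mu_dst)]

def readout(x, direction, midpoint):
    """Linear readout: which side of the midpoint x falls on along the direction."""
    score = sum((xi - mi) * di for xi, mi, di in zip(x, midpoint, direction))
    return "dst" if score > 0 else "src"

def steer(x, direction, alpha=1.0):
    """Shift an activation along the steering direction by a factor alpha."""
    return [xi + alpha * di for xi, di in zip(x, direction)]

# Toy activations (made up) recorded while the model picks tool A or tool B.
acts_a = [[1.0, 0.0], [1.2, 0.2]]
acts_b = [[-1.0, 0.0], [-0.8, 0.2]]

direction = steering_vector(acts_a, acts_b)
midpoint = mean_vector([mean_vector(acts_a), mean_vector(acts_b)])

x = [1.0, 0.0]                      # an activation from a tool-A example
before = readout(x, direction, midpoint)
after = readout(steer(x, direction), direction, midpoint)
```

In a real model the vectors would be hidden states at a chosen layer and the shift would be applied during the forward pass; the toy version only shows the arithmetic of the readout and the intervention.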
№07
cs.LG arxiv:2605.07840v1

RelAgent: LLM Agents as Data Scientists for Relational Learning

Xingyue Huang, Louis Tichelman, Jinwoo Kim et al.

RelAgent is an LLM-based autonomous agent designed for relational learning, operating in two phases. First, the agent uses tools to autonomously construct feature-generating SQL programs and select a predictive model. The core contribution is that the final predictor relies solely on the executed SQL queries and a clas…

9
№08
cs.LG arxiv:2605.07977v1

Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback

Seohyun Lee, Wenzhi Fang, Dong-Jun Han et al.

This paper introduces SPEAR (Self-Play Enhancement via Advantage-Weighted Refinement), an efficient online learning algorithm for federated LLM fine-tuning. SPEAR enables a self-improvement loop by using incoming real-time feedback to generate naturally contrastive self-play pairs for training, without requiring offlin…

9
№09
cs.CL arxiv:2605.07883v1

Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement

Ying Zhang, Congyu Qiao, Xin Geng et al.

This paper introduces **LANCE** to combat rigid rejection in LLMs by moving beyond binary refusal. LANCE uses variational inference to enhance safety labels, predicting a continuous distribution across multiple rejection categories. This fine-grained distribution provides textual gradients that guide a refinement model…

9
№10
cs.CL arxiv:2605.07982v1

GLiGuard: Schema-Conditioned Classification for LLM Safeguard

Urchade Zaratiana, Mary Newhauser, George Hurn-Maloney et al.

GLiGuard reframes LLM content moderation as a schema-conditioned classification task, moving away from slow, large autoregressive models. It uses a small (0.3B parameter) bidirectional encoder that encodes task definitions and label semantics directly into the input sequence as structured schemas. This allows for the s…

9
№11
cs.CL arxiv:2605.07933v1

How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov et al.

This paper introduces the Latent Diffusion Language Model (LDLM), which jointly trains a latent encoder, diffusion model, and decoder for non-autoregressive text generation. The core method involves constructing a suitable latent space by reshaping pre-trained language model representations via a trainable encoder. The…

9
№12
cs.CL arxiv:2605.07925v1

How Value Induction Reshapes LLM Behaviour

Arnav Arora, Natalie Schluter, Katherine Metcalf et al.

This paper investigates the unintended consequences of value induction (fine-tuning LLMs with value-laden language) on model behavior. The authors fine-tune models using curated value subsets and measure the impact on related values, safety, anthropomorphism, and QA performance. They find that inducing specific values …

9
№13
cs.CL arxiv:2605.08083v1

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Tong Zheng, Haolin Liu, Chengsong Huang et al.

This paper introduces **AutoTTS**, an environment-driven framework that automates the discovery of optimal Test-Time Scaling (TTS) strategies for Large Language Models (LLMs). Instead of manual heuristic design, AutoTTS creates a tractable discovery environment where a controller learns when to allocate computation (br…

9
№14
cs.AI arxiv:2605.08011v1

Abductive Reasoning with Probabilistic Commonsense

Joseph Cotnareanu, Chiara Roverato, Han Zhou et al.

This paper introduces **PACS (Probabilistic Abductive CommonSense)**, a novel framework for abductive reasoning that explicitly models the variation in human commonsense beliefs. It combines an LLM and a formal solver to sample proofs representing individual perspectives, aggregating these conclusions to determine the …

8
№15
cs.AI arxiv:2605.08063v1

Flow-OPD: On-Policy Distillation for Flow Matching Models

Zhen Fang, Wenxuan Huang, Yu Zeng et al.

Flow-OPD introduces a novel post-training framework for Flow Matching text-to-image models to overcome multi-task alignment issues like reward sparsity and gradient interference. It employs a two-stage strategy: first training specialized teacher models via single-reward fine-tuning, and then using On-Policy Distillati…

8
№16
cs.AI arxiv:2605.07865v1

KL for a KL: On-Policy Distillation with Control Variate Baseline

Minjae Oh, Sangjun Song, Gyubin Choi et al.

This paper introduces **vOPD (On-Policy Distillation with a control variate baseline)** to stabilize On-Policy Distillation (OPD) for LLMs by framing it as policy-gradient Reinforcement Learning. The core contribution is deriving a **closed-form control variate baseline** directly from the per-token negative reverse KL…

8
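A minimal sketch of one plausible reading of a closed-form baseline, assuming the per-token reward is the teacher-to-student log-ratio at the sampled token. Under that assumption the expected reward under the student equals the negative reverse KL, so subtracting it centres the reward. The function names and the toy distributions are hypothetical, and this is not the paper's estimator.

```python
import math

def reverse_kl(student, teacher):
    """KL(student || teacher) over a shared vocabulary."""
    return sum(p * math.log(p / q) for p, q in zip(student, teacher) if p > 0)

def token_reward(token, student, teacher):
    """Assumed per-token reward: log teacher(token) - log student(token)."""
    return math.log(teacher[token]) - math.log(student[token])

def advantage(token, student, teacher):
    """Reward minus the closed-form baseline E_student[reward] = -KL(student || teacher)."""
    baseline = -reverse_kl(student, teacher)
    return token_reward(token, student, teacher) - baseline

# Toy next-token distributions over a three-token vocabulary (made up).
student = [0.7, 0.2, 0.1]
teacher = [0.5, 0.3, 0.2]

# The advantage averages to zero under the student's own sampling distribution.
expected_advantage = sum(
    p * advantage(tok, student, teacher) for tok, p in enumerate(student)
)
```

Because the baseline depends only on the two distributions and not on the sampled token, subtracting it leaves the expected policy gradient unchanged.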
№17
cs.AI arxiv:2605.08013v1

Learning CLI Agents with Structured Action Credit under Selective Observation

Haoyang Su, Ying Wen

This paper introduces a novel method for training Command Line Interface (CLI) agents by leveraging the inherent structure of CLI actions for better credit assignment. The core contribution involves two mechanisms: $\sigma$-Reveal, which selectively extracts task-relevant context from partial observations, and Action A…

8
№18
cs.AI arxiv:2605.08012v1

Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims

Zezheng Lin, Fengming Liu

This paper argues that mechanistic interpretability research, which frequently employs causal language, often fails to explicitly state the necessary identification assumptions underpinning its causal claims. The authors audit existing literature, finding a pervasive pattern where validation metrics are presented as ca…

8
№19
cs.AI arxiv:2605.07935v1

TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples

Shuren Xia, Qiwei Li, Taqiya Ehsan et al.

TraceFix is a verification-first pipeline that uses the TLA+ model checker to iteratively repair LLM-generated coordination protocols for multi-agent systems. The method synthesizes a protocol topology, generates PlusCal logic, and uses TLA+ counterexamples to drive repairs until formal verification succeeds. This ensu…

8
№20
cs.LG arxiv:2605.07863v1

ADKO: Agentic Decentralized Knowledge Optimization

Lucas Nerone Rillo, Zhanhong Jiang, Nastaran Saadati et al.

ADKO is a framework for sample-efficient, privacy-preserving collaborative black-box optimization among autonomous agents. Agents use private Gaussian Processes and communicate only via compact "knowledge tokens" summarizing directional signals and advantage scores, avoiding raw data sharing. The paper's core contribut…

8