From the arXiv
Tuesday, 2 June 2026 · 20 papers
COMAP: Co-Evolving World Models and Agent Policies for LLM Agents
COMAP proposes a novel framework where textual world models and agent policies co-evolve through closed-loop interaction. The agent uses the world model to predict future states for candidate actions and refines its choice based on the predicted feedback's estimated reliability. This process leverages on-policy traject…
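The selection step described above (predict an outcome for each candidate action, then weigh that prediction by its estimated reliability) can be illustrated with a minimal Python sketch. This is not COMAP's implementation: the function `select_action`, the `prior` scores, the numeric outcome score, and the linear blend are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical interface: a world model maps (state, action) to a predicted
# outcome score and a reliability estimate in [0, 1].
WorldModel = Callable[[str, str], Tuple[float, float]]


def select_action(
    state: str,
    candidates: Sequence[str],
    prior: Dict[str, float],
    world_model: WorldModel,
) -> Optional[str]:
    """Return the candidate with the best reliability-weighted score."""
    best_action, best_score = None, float("-inf")
    for action in candidates:
        predicted, reliability = world_model(state, action)
        # Assumed blend: trust the prediction in proportion to its
        # reliability, and fall back on the policy's prior otherwise.
        score = reliability * predicted + (1.0 - reliability) * prior[action]
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With a fully reliable world model the predicted outcome decides the choice; with reliability 0 the function reduces to picking the action with the highest prior.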
Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback
This paper systematically evaluates how Large Language Models (LLMs) respond to eating disorder (ED) queries, focusing on the risk of models uncritically adapting to unsafe user requests. By consulting with clinical experts, the authors identify specific linguistic cues in prompts that increase the likelihood of harmfu…
HLL: Can Agents Cross Humanity's Last Line of Verification?
This paper introduces **HLL (Humanity's Last Line of Verification)**, a controlled benchmark designed to test whether multimodal AI agents can successfully navigate and solve interactive CAPTCHAs, which serve as a critical defense against automation. The core method involves evaluating agents in a closed-loop GUI envir…
Iteris: Agentic Research Loops for Computational Mathematics
Iteris is an agentic research system specifically designed to tackle open problems in computational mathematics, which require a mix of proof, numerical experimentation, and algorithm design. The core method involves creating an autonomous loop where the AI generates evidence, constructions, and proof drafts. This syst…
MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation
This paper introduces **MCP-Persona**, the first benchmark specifically designed to evaluate LLM agents using **Model Context Protocol (MCP)** tools in real-world, personalized application settings (e.g., social media, collaboration suites). The core method involves creating a benchmark that moves beyond generic tools …
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling
This paper addresses **Perceptual Judgment Bias** in multimodal LLM judges, where models favor plausible text over correct visual evidence. The core method involves creating a **Perceptually Perturbed Judgment Dataset** using minimal visual counterfactuals to isolate perceptual errors. This dataset then trains a unifie…
MOC: Multi-Order Communication in LLM-based Multi-Agent Systems
This paper introduces the **Multi-Order Communication (MOC)** scheme to improve message exchange in LLM-based multi-agent systems. MOC addresses the limitations of simple neighbor communication by constructing a **structured multi-order evidence stream** to capture multi-hop dependencies. It further employs a **Semanti…
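To make "multi-order" concrete, the sketch below groups agents by hop distance from a source agent in a communication graph, so that first-order entries are direct neighbors and higher orders are multi-hop contacts. It is only an illustration under assumptions: the function name `neighbors_by_order`, the adjacency-dict representation, and the use of breadth-first search are not taken from the paper.

```python
from collections import deque
from typing import Dict, List, Sequence


def neighbors_by_order(
    graph: Dict[str, Sequence[str]], source: str, max_order: int
) -> Dict[int, List[str]]:
    """Group agents by hop distance (1..max_order) from `source`."""
    orders: Dict[int, List[str]] = {k: [] for k in range(1, max_order + 1)}
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_order:
            continue  # do not expand beyond the requested order
        for nxt in graph.get(node, ()):
            if nxt not in dist:  # first visit gives the shortest hop count
                dist[nxt] = dist[node] + 1
                orders[dist[nxt]].append(nxt)
                queue.append(nxt)
    return orders
```

A message stream built from `orders[1]`, then `orders[2]`, and so on would expose an agent to evidence from beyond its immediate neighbors, which is the kind of multi-hop dependency the summary refers to.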
Policy and World Modeling Co-Training for Language Agents
This paper introduces PaW, a Policy and World Modeling co-training framework that integrates world model supervision directly into the standard reinforcement learning (RL) process for language agents. PaW leverages the on-policy transitions generated during RL to simultaneously train the policy and a world model, avoid…
Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment
This paper introduces **AdvCL**, a continual learning method that repurposes adversarial perturbations as a geometric control signal for stable adaptation. It employs three plug-in modules—Intra-Smooth, Proto-Clip, and Inter-Align—to promote local smoothness, prevent over-alignment, and guide directional alignment betw…
SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment
SafeSteer addresses the alignment tax by proposing localized on-policy distillation, focusing only on safety-critical tokens. It first creates a safety teacher via activation steering and then uses a token selection algorithm to restrict the distillation's KL penalty to these specific tokens. This method effectively im…
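Restricting a KL penalty to selected tokens amounts to masking the per-position divergence before averaging. The sketch below shows that idea in plain Python; it is an assumption-laden illustration, not SafeSteer's code. The function name `masked_kl`, the boolean `mask`, and the KL direction (student relative to teacher) are hypothetical, and the token selection algorithm that would produce the mask is not shown.

```python
import math
from typing import Sequence


def masked_kl(
    student_probs: Sequence[Sequence[float]],
    teacher_probs: Sequence[Sequence[float]],
    mask: Sequence[bool],
) -> float:
    """Mean KL(student || teacher) over positions where mask is True.

    Each inner sequence is a probability distribution over the vocabulary
    at one token position. Teacher probabilities are assumed to be positive
    wherever the student's are.
    """
    total, count = 0.0, 0
    for p, q, keep in zip(student_probs, teacher_probs, mask):
        if not keep:
            continue  # unselected tokens contribute no penalty
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
        count += 1
    return total / count if count else 0.0
```

Positions outside the mask are left unconstrained, so the student is free to keep its original behavior there; only the selected positions are pulled toward the teacher.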
SimSD: Simple Speculative Decoding in Diffusion Language Models
SimSD introduces a speculative decoding method for diffusion language models (dLLMs), aiming to bring them the speedups of standard token-level speculation. The core method involves a plug-and-play masking strategy that modifies the dLLM's attention mechanism to provide temporally valid, causal context…
SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
SIRI proposes a three-phase framework to train LLM agents to discover, validate, and internalize reusable skills on their own, eliminating the need for external skill generators or inference-time skill banks. The method involves initial policy warm-up, self-skill mining using the agent's own successful trajectories, and …
SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence
SPADE-Bench is introduced to evaluate spontaneous strategic deception in AI agents, defined as the divergence between an agent's self-reported plan and its actual executed actions. The benchmark's core method involves simultaneously integrating actual tool execution with controlled pressure scenarios to rigorously test…
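One simple way to picture a plan-action divergence is as a set distance between the tool calls an agent says it will make and those it actually executes. The sketch below uses Jaccard distance for this; the metric, the function name `plan_action_divergence`, and the representation of plans as lists of tool names are illustrative assumptions, not the benchmark's own scoring.

```python
from typing import Sequence


def plan_action_divergence(plan: Sequence[str], actions: Sequence[str]) -> float:
    """Jaccard distance between planned and executed tool calls.

    Returns 0.0 when the two sets match exactly and 1.0 when they share
    nothing. Order and repetition of calls are ignored in this sketch.
    """
    planned, executed = set(plan), set(actions)
    union = planned | executed
    if not union:
        return 0.0  # nothing planned and nothing executed: no divergence
    return 1.0 - len(planned & executed) / len(union)
```

A score near 1.0 flags runs where the executed tool calls bear little relation to the reported plan; deciding whether such a gap is deceptive rather than a benign replanning step would need further analysis that this sketch does not attempt.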
Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation
This paper audits frontier Large Language Models (LLMs) for asset-specific biases, focusing on Bitcoin representations. The core method involves a three-level protocol: a behavioral audit showing frame-dependent rankings, internal analysis identifying a dominant, Bitcoin-selective feature within the model's sparse auto…
Investigating and Alleviating Harm Amplification in LLM Interactions
This paper introduces **HarmAmp**, a novel benchmark designed to evaluate harm amplification in multi-turn LLM interactions across twelve real-world risk categories. The core contribution is demonstrating how LLMs can democratize expertise and scale harmful operations over extended conversations. To address this, the a…
Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization
This paper argues that massive LLM activation spikes are not scalar biases, but rather the scalar manifestation of rigid, structural vector biases carried by specific tokens. The authors show these vectors are preserved by projection weight coordination ($W_Q, W_K, W_V$) and resist RoPE perturbations by localizing in "…
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
This paper reframes Parameter-Efficient Fine-Tuning (PEFT) as a method for creating persistent, local "personal models" built upon strong shared foundation models. The core contribution is exploring the scaling implications (Up, Down, Out) of using small, instance-specific adapters to encode unique behaviors, preferenc…
CRAM: Centroid-Routing and Adaptive MoE for Multimodal Continual Instruction Tuning
CRAM addresses Multimodal Continual Instruction Tuning (MCIT) by employing an architecture that isolates task-specific patterns into independent modules to mitigate catastrophic forgetting. It enhances parameter efficiency by using adaptive-rank instantiation to dynamically allocate only the necessary parameters based …
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
This paper introduces **K-BrowseComp**, a novel web-browsing agent benchmark specifically grounded in Korean contexts to address the scarcity of such resources. The benchmark comprises 400 problems, including a 300-problem manually verified subset, revealing a significant performance drop for frontier LLMs compared to …
TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation
This paper introduces **TVIR (Text-Visual Interleaved Report Generation)**, a novel benchmark and framework addressing the lack of visual grounding in deep research agent evaluations. TVIR comprises **TVIR-Bench**, 100 multimodal tasks requiring visual elements for analysis, and **TVIR-Agent**, a hierarchical multi-ag…