№01
cs.AI arxiv:2606.02372v1

COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

Youwei Liu, Jian Wang, Hanlin Wang et al.

COMAP proposes a novel framework where textual world models and agent policies co-evolve through closed-loop interaction. The agent uses the world model to predict future states for candidate actions and refines its choice based on the predicted feedback's estimated reliability. This process leverages on-policy traject…

10
№02
cs.AI arxiv:2606.02444v1

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

Giulia Pucci, Emily Hemendinger, Ruizhe Li et al.

This paper systematically evaluates how Large Language Models (LLMs) respond to eating disorder (ED) queries, focusing on the risk of models uncritically adapting to unsafe user requests. By consulting with clinical experts, the authors identify specific linguistic cues in prompts that increase the likelihood of harmfu…

9
№03
cs.AI arxiv:2606.02449v1

HLL: Can Agents Cross Humanity's Last Line of Verification?

Xinhao Song, Su Su, Sirui Song et al.

This paper introduces **HLL (Humanity's Last Line of Verification)**, a controlled benchmark designed to test whether multimodal AI agents can successfully navigate and solve interactive CAPTCHAs, which serve as a critical defense against automation. The core method involves evaluating agents in a closed-loop GUI envir…

9
№04
cs.AI arxiv:2606.02484v1

Iteris: Agentic Research Loops for Computational Mathematics

Leheng Chen, Zihao Liu, Wanyi He et al.

Iteris is an agentic research system specifically designed to tackle open problems in computational mathematics, which require a mix of proof, numerical experimentation, and algorithm design. The core method involves creating an autonomous loop where the AI generates evidence, constructions, and proof drafts. This syst…

9
№05
cs.AI arxiv:2606.02470v1

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

Wenhao Wang, Peizhi Niu, Gongyi Zou et al.

This paper introduces **MCP-Persona**, the first benchmark specifically designed to evaluate LLM agents using **Model Context Protocol (MCP)** tools in real-world, personalized application settings (e.g., social media, collaboration suites). The core method involves creating a benchmark that moves beyond generic tools …

9
№06
cs.AI arxiv:2606.02578v1

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

Seojeong Park, Jiho Choi, Junyong Kang et al.

This paper addresses **Perceptual Judgment Bias** in multimodal LLM judges, where models favor plausible text over correct visual evidence. The core method involves creating a **Perceptually Perturbed Judgment Dataset** using minimal visual counterfactuals to isolate perceptual errors. This dataset then trains a unifie…

9
№07
cs.AI arxiv:2606.02359v1

MOC: Multi-Order Communication in LLM-based Multi-Agent Systems

Yao Guan, Lin Wang, Zhihu Lu et al.

This paper introduces the **Multi-Order Communication (MOC)** scheme to improve message exchange in LLM-based multi-agent systems. MOC addresses the limitations of simple neighbor communication by constructing a **structured multi-order evidence stream** to capture multi-hop dependencies. It further employs a **Semanti…

9
№08
cs.AI arxiv:2606.02388v1

Policy and World Modeling Co-Training for Language Agents

Ning Lu, Baijiong Lin, Shengcai Liu et al.

This paper introduces PaW, a Policy and World Modeling co-training framework that integrates world model supervision directly into the standard reinforcement learning (RL) process for language agents. PaW leverages the on-policy transitions generated during RL to simultaneously train the policy and a world model, avoid…

9
№09
cs.AI arxiv:2606.02322v1

Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

Ran Liu, Min Yu, Mingqi Liu et al.

This paper introduces **AdvCL**, a continual learning method that repurposes adversarial perturbations as a geometric control signal for stable adaptation. It employs three plug-in modules—Intra-Smooth, Proto-Clip, and Inter-Align—to promote local smoothness, prevent over-alignment, and guide directional alignment betw…

9
№10
cs.AI arxiv:2606.02530v1

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

Hao Li, Jingkun An, Zijun Song et al.

SafeSteer addresses the alignment tax by proposing localized on-policy distillation, focusing only on safety-critical tokens. It first creates a safety teacher via activation steering and then uses a token selection algorithm to restrict the distillation's KL penalty to these specific tokens. This method effectively im…

9
№11
cs.AI arxiv:2606.02544v1

SimSD: Simple Speculative Decoding in Diffusion Language Models

Junxia Cui, Haotian Ye, Runchu Tian et al.

SimSD introduces a speculative decoding method for diffusion language models (dLLMs), aiming to bring the speedups of standard token-level speculation to this setting. The core method involves a plug-and-play masking strategy that modifies the dLLM's attention mechanism to provide temporally valid, causal context…

9
№12
cs.AI arxiv:2606.02355v1

SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training

Zhongyu He, Yuanfan Li, Fei Huang et al.

SIRI proposes a three-phase framework that trains LLM agents to discover, validate, and internalize reusable skills on their own, eliminating the need for external skill generators or inference-time skill banks. The method involves initial policy warm-up, self-skill mining using the agent's own successful trajectories, and …

9
№13
cs.AI arxiv:2606.02380v1

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

Yuyan Bu, Haowei Li, Qirui Zheng et al.

SPADE-Bench evaluates spontaneous strategic deception in AI agents, defined as the divergence between an agent's self-reported plan and its actual executed actions. The benchmark's core method combines actual tool execution with controlled pressure scenarios to rigorously test…

9
№14
cs.LG arxiv:2606.02528v1

Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation

Wenbin Wu

This paper audits frontier Large Language Models (LLMs) for asset-specific biases, focusing on Bitcoin representations. The core method involves a three-level protocol: a behavioral audit showing frame-dependent rankings, internal analysis identifying a dominant, Bitcoin-selective feature within the model's sparse auto…

9
№15
cs.LG arxiv:2606.02423v1

Investigating and Alleviating Harm Amplification in LLM Interactions

Ruohao Guo, Wei Xu, Alan Ritter

This paper introduces **HarmAmp**, a novel benchmark designed to evaluate harm amplification in multi-turn LLM interactions across twelve real-world risk categories. The core contribution is demonstrating how LLMs can democratize expertise and scale harmful operations over extended conversations. To address this, the a…

9
№16
cs.LG arxiv:2606.02288v1

Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

Yung-Chin Chen, Chung Peng Lee, Ze-Wei Liou et al.

This paper argues that massive LLM activation spikes are not scalar biases, but rather the scalar manifestation of rigid, structural vector biases carried by specific tokens. The authors show these vectors are preserved by projection weight coordination ($W_Q, W_K, W_V$) and resist RoPE perturbations by localizing in "…

9
№17
cs.LG arxiv:2606.02437v1

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Mind Lab: Song Cao et al.

This paper reframes Parameter-Efficient Fine-Tuning (PEFT) as a method for creating persistent, local "personal models" built upon strong shared foundation models. The core contribution is exploring the scaling implications (Up, Down, Out) of using small, instance-specific adapters to encode unique behaviors, preferenc…

9
№18
cs.CL arxiv:2606.02502v1

CRAM: Centroid-Routing and Adaptive MoE for Multimodal Continual Instruction Tuning

Jun-Tao Tang, Zhen-Hao Xie, Yu-Cheng Shi et al.

CRAM addresses Multimodal Continual Instruction Tuning (MCIT) by employing an architecture that isolates task-specific patterns into independent modules to mitigate catastrophic forgetting. It enhances parameter efficiency by using adaptive-rank instantiation to dynamically allocate only the necessary parameters based …

9
№19
cs.CL arxiv:2606.02404v1

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Nahyun Lee, Dongkeun Yoon, Guijin Son et al.

This paper introduces **K-BrowseComp**, a novel web-browsing agent benchmark specifically grounded in Korean contexts to address the scarcity of such resources. The benchmark comprises 400 problems, including a 300-problem manually verified subset, revealing a significant performance drop for frontier LLMs compared to …

9
№20
cs.CL arxiv:2606.02320v1

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

Xinkai Ma, Zhiqi Bai, Dingling Zhang et al.

This paper introduces **TVIR (Text–Visual Interleaved Report Generation)**, a novel benchmark and framework addressing the lack of visual grounding in deep research agent evaluations. TVIR comprises **TVIR-Bench**, 100 multimodal tasks requiring visual elements for analysis, and **TVIR-Agent**, a hierarchical multi-ag…

9