№01
cs.AI arxiv:2605.13652v1

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

Namrata Shivagunde, Vijeta Deshpande, Sherin Muckatira et al.

This paper moves beyond simple perplexity comparisons, analyzing geometrically and spectrally how the solutions produced by five distinct low-rank pre-training methods differ from those of full-rank training. The core contribution is a rigorous characterization of how rank constraints alter the learned internal representations and loss…

9
№02
cs.AI arxiv:2605.13825v1

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

Alberto G. Rodríguez Salgado

This paper introduces **HistoryAnchor-100**, a benchmark that tests whether prior harmful actions steer Large Language Models (LLMs) toward continued unsafe behavior. The core finding is that frontier LLMs, even highly aligned ones, exhibit a striking vulnerability: a simple instruction to "stay consistent with the prior histo…

9
№03
cs.AI arxiv:2605.13625v1

How to Interpret Agent Behavior

Jie Gao, Kaiser Sun, Jen-tse Huang et al.

This paper introduces **ACT*ONOMY**, a novel, three-level hierarchical taxonomy (10 actions, 46 subactions, 120 leaf categories) designed to systematically describe and analyze the runtime behavior of autonomous agents from their natural-language traces. The core contribution is providing a structured framework, couple…

9
№04
cs.AI arxiv:2605.13579v1

Position: Assistive Agents Need Accessibility Alignment

Jie Hu, Changyuan Yan, Yu Zheng et al.

This paper argues that current assistive AI systems fail blind and visually impaired (BVI) users because they are designed assuming sighted interaction and low-cost verification. The core contribution is introducing the concept of **accessibility alignment** as a first-class design objective, rather than a usability afterthought. The authors propos…

9
№05
cs.AI arxiv:2605.13737v1

Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

Trung Nguyen Quang, Yiming Gao, Fanyi Pu et al.

This paper introduces IMAVB, a benchmark that tests whether omnimodal LLMs can detect contradictions between a textual premise and their own sensory input (vision/audio). The core finding is a "Representation-Action Gap": models reliably encode these premise-perception mismatches in their internal states but almost always fail…

9
№06
cs.AI arxiv:2605.13537v1

Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment

Ye Wang, Jing Liu, Toshiaki Koike-Akino

This paper introduces **SLOP (Sharpened Logarithmic Opinion Pool)**, an extension of inference-time alignment that generalizes techniques to combine ensembles of generative reward models using temperature-adjusted reference models. The core contribution is a novel algorithm for calibrating the SLOP weight parameters to…

9
№07
cs.AI arxiv:2605.13772v1

Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry

Tyler Alvarez, Ali Baheri

This paper introduces a novel method for detecting step-level hallucinations in LLM reasoning by analyzing the geometry of the hidden-state trajectory during a single forward pass. The core idea is that correct reasoning follows a stable manifold, and the first error manifests as a localized excursion in transport cost…

9
№08
cs.CL arxiv:2605.13839v1

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

Wenrui Bao, Huan Wang, Jian Wang et al.

This paper introduces TFlow, a novel weight-space communication framework for multi-agent LLMs that replaces costly natural language message passing with direct weight updates. The core method involves frozen sender agents generating internal activations, which a learned parameter generator maps into low-rank LoRA pert…

9
№09
cs.AI arxiv:2605.13548v1

AttenA+: Rectifying Action Inequality in Robotic Foundation Models

Daojie Peng, Fulong Ma, Jiahang Cao et al.

This paper introduces **AttenA+**, a framework designed to address the "action inequality" in robotic foundation models where all actions are treated equally during training. AttenA+ rectifies this by implementing a **velocity-driven action attention mechanism** that dynamically reweights the training objective, priori…

8
№10
cs.AI arxiv:2605.13709v1

Children's English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety

Qian Shen, Fanghua Cao, Min Yao et al.

This paper introduces a method for generating controllable and age-appropriate children's English reading stories by **supervised fine-tuning of compact (8B-parameter) LLMs** using expert-designed curriculum data. The core contribution is demonstrating that **fine-tuning prioritizes controllability and affordability over …

8
№11
cs.AI arxiv:2605.13540v1

Decoupled and Divergence-Conditioned Prompt for Multi-domain Dynamic Graph Foundation Models

Haonan Yuan, Qingyun Sun, Junhua Shi et al.

This paper introduces **DyGFM**, a novel Dynamic Graph Foundation Model designed for multi-domain generalization. The core method employs a **decoupled and divergence-conditioned prompting** strategy: a dual-branch pre-training disentangles transferable semantics from domain-specific temporal dynamics, and a divergence…

8
№12
cs.AI arxiv:2605.13841v1

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz et al.

EVA-Bench is a novel end-to-end framework designed to evaluate voice agents by addressing two key challenges: generating realistic, multi-turn audio conversations and comprehensively measuring quality. It achieves realistic simulation through bot-to-bot orchestration with automatic error detection and regeneration. The…

8
№13
cs.AI arxiv:2605.13821v1

Harnessing Agentic Evolution

Jiayi Zhang, Yongfeng Gu, Jianhao Ruan et al.

This paper introduces **AEvo**, a harnessed meta-editing framework for agentic evolution. It models the evolution process as an interactive environment where the accumulated context acts as the state. The core contribution is using a **meta-agent to observe this state and edit the underlying evolution procedure** itsel…

8
№14
cs.AI arxiv:2605.13542v1

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

Chengzhi Shen, Weixiang Shen, Tobias Susetzky et al.

This paper introduces **RealICU**, a novel benchmark designed to evaluate LLMs on long-context ICU data by moving beyond imitating potentially suboptimal past clinician actions. Its core contribution is using **hindsight annotations** created by senior physicians reviewing the *full* patient trajectory to establish mor…

8
№15
cs.AI arxiv:2605.13725v1

ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles

Yitian Yang, Yiqun Duan, Linghan Huang et al.

ScioMind introduces a cognitively grounded framework for LLM-based multi-agent social simulation, bridging fixed rules and unconstrained LLM interaction. Its core method integrates a belief update rule modulated by personality-conditioned anchoring strength, a hierarchical memory for experience-driven belief formation,…

8
№16
cs.AI arxiv:2605.13846v1

WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data

Ziheng Zhang, Yunzhong Hou, Naijing Liu et al.

WARDEN is a system designed to transcribe and translate the endangered Wardaman language into English using only 6 hours of training data. It addresses the low-resource challenge by employing a two-stage pipeline: a dedicated model for audio-to-phonemic transcription, followed by a separate model for transcription-to-E…

8
№17
cs.LG arxiv:2605.13740v1

Learning POMDP World Models from Observations with Language-Model Priors

Valentin Six, Frederik Panse, Mathis Fajeau et al.

This paper introduces **Pinductor**, a method that leverages **Large Language Model (LLM) priors** to learn **Partially Observable Markov Decision Process (POMDP) world models** from limited observation-action trajectories. Pinductor uses the LLM to propose and iteratively refine candidate POMDP models based on a belie…

8
№18
cs.LG arxiv:2605.13711v1

MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling

Hsing-Huan Chung, Shijun Li, Yoav Wald et al.

MILM addresses multimodal irregular time series (MITS) by converting them into time-ordered XML triplets to leverage Large Language Models (LLMs). The core method involves a two-stage fine-tuning strategy: first, training the LLM solely on sampling patterns (with redacted values) to learn temporal structure, and second…

8
№19
cs.LG arxiv:2605.13681v1

Sampling from Flow Language Models via Marginal-Conditioned Bridges

Iskander Azangulov, Leo Zhang

This paper introduces a novel sampling method for Flow Language Models (FLMs) that leverages their unique structure where each denoising block yields a posterior marginal distribution over the clean token. Instead of collapsing to a single conditional mean, the proposed "marginal-conditioned bridge" sampler works by it…

8
№20
cs.CL arxiv:2605.13793v1

An LLM-Based System for Argument Reconstruction

Paulo Pirozelli, Victor Hugo Nascimento Rocha, Fabio G. Cozman et al.

This paper introduces an end-to-end LLM-based system designed to reconstruct natural language arguments into abstract argument graphs. The system employs a multi-stage pipeline to identify argumentative components (premises and conclusions) and their logical relations (support, attack, undercut). Its contribution lies …

8