№01
cs.AI arxiv:2606.06448v1

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

Yasmine Omri, Ziyu Gan, Zachary Broveak et al.

This paper presents the first systems characterization of memory management in long-horizon LLM agents. The authors introduce a taxonomy to classify memory systems and develop a profiling harness to attribute costs across memory construction, retrieval, and generation phases. Their analysis of ten systems reveals how d…

9
№02
cs.AI arxiv:2606.06462v1

Benchmark Everything Everywhere All at Once

Shiyun Xiong, Dongming Wu, Peiwen Sun et al.

This paper introduces **Benchmark Agent**, a fully autonomous agentic system designed to automate the entire pipeline of benchmark construction, addressing the labor-intensive and unsustainable nature of current methods. The core contribution is a scalable framework that handles everything from query analysis and subta…

9
№03
cs.AI arxiv:2606.06388v1

Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration

Jiaju Chen, Yuxuan Lu, Jiayi Su et al.

The paper introduces **ALMANAC**, a novel dataset designed to advance agent collaboration capabilities beyond mere task completion. It provides **action-level mental model annotations** derived from human dyadic routing tasks, capturing participants' internal reasoning, intentions, and shared goals at each step. This r…

9
№04
cs.AI arxiv:2606.06315v1

LLM Self-Recognition: Steering and Retrieving Activation Signatures

Thibaud Ardoin, Jonas Schäfer, Gerhard Wunder

This paper introduces a method to reliably attribute text to a specific Large Language Model (LLM) by steering its internal residual stream with a random sparse vector during generation, creating a detectable "activation signature." This signature acts as a fingerprint that a separate LLM detector can recover with high…

9
№05
cs.AI arxiv:2606.06286v1

LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

Gianluca Barmina, Peter Schneider-Kamp, Lukas Galke Poech

This paper introduces **PropMe**, a propensity-aware framework to evaluate Large Language Model (LLM) memorization by contrasting adversarial prefix attacks with non-adversarial use cases. Using the lightweight **SimpleTrace** pipeline, the authors consistently find a significant gap, showing that models exhibit substa…

9
№06
cs.AI arxiv:2606.06473v1

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Shangheng Du, Xiangchao Yan, Jinxin Shi et al.

MLEvolve is a self-evolving, LLM-based multi-agent framework designed for automated machine learning algorithm discovery. It overcomes limitations in existing agents by using Progressive MCGS for cross-branch information flow and an entropy-inspired schedule for shifting search from exploration to exploitation. The fra…

9
№07
cs.AI arxiv:2606.06256v1

RedKnot: Efficient Long-Context LLM Serving with Head-Aware KV Reuse and SegPagedAttention

Yang Liu, ZhaoKai Luo, HuaYi Jin et al.

RedKnot addresses the KV cache bottleneck in long-context LLM serving by introducing a novel, head-aware KV cache management system. It leverages the observation that different attention heads have varying utility, allowing for selective reuse and compression. The core contribution is the **Head-Aware KV Reuse** and **…

9
№08
cs.AI arxiv:2606.06337v1

TokenMizer: Graph-Structured Session Memory for Long-Horizon LLM Context Management

Shweta Mishra

TokenMizer addresses the LLM context limit for long tasks by modeling session history as a typed knowledge graph, preserving critical relational structure lost in flat text methods. It uses a hybrid pipeline to incrementally build this graph and a multi-tier system to serialize it into compact resume blocks. This appro…

9
№09
cs.AI arxiv:2606.06284v1

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

Rahul Suresh Babu, Laxmipriya Ganesh Iyer

This paper introduces Causal Minimal Tool Filtering (CMTF), a training-free method to improve LLM agent reliability by addressing tool confusion caused by large tool sets. CMTF selects tools based on **causal sufficiency** using lightweight precondition-effect contracts to expose only the minimal set of tools necessary…
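The idea of exposing only the tools whose precondition-effect contracts are needed for a goal can be illustrated with a small sketch. This is not the paper's CMTF algorithm: the `ToolContract` type, the `select_tools` function, the fact names, and the greedy backward-chaining strategy are all assumptions made for illustration, and the result is not guaranteed to be minimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical contract: facts a tool requires and facts it produces."""
    name: str
    preconditions: frozenset
    effects: frozenset


def select_tools(tools, state, goal):
    """Greedy backward chaining from the goal facts.

    Returns the names of tools whose effects are needed to reach `goal`
    from `state`. Illustrative only; not guaranteed to be minimal.
    """
    selected = []
    satisfied = set(state)
    pending = [fact for fact in sorted(goal) if fact not in satisfied]
    while pending:
        fact = pending.pop()
        if fact in satisfied:
            continue
        # Pick the first tool that produces the missing fact.
        provider = next((t for t in tools if fact in t.effects), None)
        if provider is None:
            raise ValueError(f"no tool produces {fact!r}")
        if provider.name not in selected:
            selected.append(provider.name)
        satisfied |= provider.effects
        # The provider's unmet preconditions become new sub-goals.
        pending.extend(p for p in sorted(provider.preconditions) if p not in satisfied)
    return selected


# Hypothetical tool set: only two of the four tools matter for this goal.
TOOLS = [
    ToolContract("search_flights", frozenset({"destination_known"}), frozenset({"flight_options"})),
    ToolContract("book_flight", frozenset({"flight_options", "payment_ready"}), frozenset({"flight_booked"})),
    ToolContract("send_email", frozenset(), frozenset({"email_sent"})),
    ToolContract("check_weather", frozenset({"destination_known"}), frozenset({"forecast"})),
]

if __name__ == "__main__":
    print(select_tools(TOOLS, {"destination_known", "payment_ready"}, {"flight_booked"}))
```

In this toy setting, `send_email` and `check_weather` are never shown to the agent because nothing in the goal depends on their effects.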

9
№10
cs.AI arxiv:2606.06453v1

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

Zhuoming Chen, Xinrui Zhong, Qilong Feng et al.

Vortex is a system designed to efficiently serve diverse sparse attention algorithms for LLMs by combining a Python-embedded frontend language with a page-centric tensor abstraction. This framework simplifies the development, deployment, and evaluation of new sparse attention mechanisms. Its core contribution is accele…

9
№11
cs.AI arxiv:2606.06356v1

Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Mo

Renjith Prasad, Chathurangi Shyalika, Anushka Pawar et al.

This paper introduces a **Layered Framework for Knowledge Infusion** in iterative multimodal generative models, conceptualizing knowledge injection as an **intervention-layer problem**. It defines four distinct layers—surface, trajectory, latent, and parametric—based on which structural component of the generation proc…

9
№12
cs.LG arxiv:2606.06238v1

Generative Criticality in Large Language Model Temperature Scaling

Huajian Ruan, Jinyang Li, Xingyu Guo et al.

This paper introduces a statistical-field framework, treating LLM token embeddings as continuous spin variables on a 1D chain, to analyze text generation controlled by softmax temperature ($T$). The core contribution is observing a sharp susceptibility peak near a characteristic critical temperature ($T_c$), analogous …
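For readers unfamiliar with the control parameter, softmax temperature rescales the logits $z_i$ before normalization, $p_i = \exp(z_i / T) / \sum_j \exp(z_j / T)$. The sketch below shows only this standard scaling and how the entropy of the token distribution changes with $T$; it does not implement the paper's statistical-field framework or its susceptibility measurement, and the logits are made-up values.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
    for T in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, T)
        print(T, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Lower $T$ concentrates probability on the top logit (lower entropy), while higher $T$ flattens the distribution (higher entropy).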

9
№13
cs.CL arxiv:2606.06399v1

CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments

Jiaju Chen, Bo Sun, Yuxuan Lu et al.

CollabSim is a novel, configurable simulation framework designed to systematically investigate the collaborative competence of LLM agents in multi-agent systems. It grounds its methodology in established Computer-Supported Cooperative Work (CSCW) research to move beyond simple task outcomes, allowing researchers to con…

9
№14
cs.CL arxiv:2606.06428v1

Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation

Hanxu Hu, Zdeněk Šnajdr, Pinzhen Chen et al.

This paper proposes a Reinforcement Learning (RL) approach to improve the translation of unseen, low-resource languages by leveraging rich linguistic information provided in-context. The RL agent is trained using a surface-level translation metric (chrF) as a reward signal to encourage the model to learn the *meta-skill* o…
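To make the reward signal concrete, here is a simplified character n-gram F-score in the spirit of chrF. It is a sketch, not the reference implementation: it averages precision and recall over character n-grams up to order 6 and combines them with $\beta = 2$, but it ignores the whitespace and tokenization options that real chrF tooling handles, so its scores will not match published chrF numbers exactly. The function names are assumptions, and how the paper plugs the score into its RL objective is not shown.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count all character n-grams of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in [0, 1]; higher means closer to the reference."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is shorter than n
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Hypothetical model output scored against a reference translation.
    print(simple_chrf("the cat sat on a mat", "the cat sat on the mat"))
```

Because the score is computed on characters rather than words, partially correct word forms still earn partial credit, which is one reason surface metrics of this kind are often used for morphologically rich or low-resource languages.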

9
№15
cs.AI arxiv:2606.06303v1

Plug-and-Play Guidance for Discrete Diffusion Models via Gradient-Informed Logit Correction

Hongkun Dou, Zike Chen, Fengji Li et al.

This paper introduces Gradient-Informed Logit Correction (GILC), a plug-and-play framework for controllable generation in discrete diffusion models. GILC efficiently estimates guidance signals by using the pretrained denoising network as a proxy, employing a Jacobian-free mechanism to stably correct clean prediction lo…

8
№16
cs.AI arxiv:2606.06333v1

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

Seyed Arshan Dalili, Mehrdad Mahdavi

This paper introduces **Subspace-Aware Sparse Autoencoders (SAEs)** to address the limitation of standard SAEs, which incorrectly assume latent features are one-dimensional. The authors demonstrate that this assumption forces features with intrinsic dimension $d_i \ge 2$ to split across multiple dictionary atoms, leadi…

8
№17
cs.AI arxiv:2606.06240v1

TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory

Ziming Wang

This paper introduces **TOKI**, a bitemporal operator algebra designed to explicitly manage and resolve contradictions arising from versioned writes in LLM agent persistent memory. TOKI formalizes four common resolution heuristics as distinct bitemporal operators, each defined with an explicit isolation precondition an…

8
№18
cs.AI arxiv:2606.06285v1

TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models

Ziwen Kan, Yishuo Chen, Kecheng Li et al.

TRACE introduces a novel conditional estimation paradigm for multimodal time series foundation models to address temporal misalignment and missing data. It systematically infers incomplete target modalities using available auxiliary modalities, overcoming limitations of naive imputation methods. This approach yields mo…

8
№19
cs.AI arxiv:2606.06416v1

Unsupervised Skill Discovery for Agentic Data Analysis

Zhisong Qiu, Kangqi Song, Shengwei Tang et al.

This paper introduces **DataCOPE**, an unsupervised framework for discovering reusable data-analysis skills for agents without relying on labeled supervision. It iteratively coordinates an agent, an unsupervised verifier, and a skill manager to generate trajectories and distill skills based on quality signals derived d…

8
№20
cs.AI arxiv:2606.06460v1

Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

Thamilvendhan Munirathinam

This paper introduces the **Recuse Signal**, a lightweight, in-band communication mechanism (like an SSH banner) allowing servers to request that an autonomous LLM agent voluntarily withdraw from accessing a resource. The core contribution is empirically measuring whether current LLM agents comply with this non-security-cri…

8