№01
cs.AI arxiv:2606.13608v1

AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility

Xiaoyuan Liu, Jianhong Tu, Yuqi Chen et al.

The paper introduces Agentified Agent Assessment (AAA), a novel framework in which evaluation is conducted by judge agents that interact with participants via standardized protocols (A2A and MCP). This approach unifies the assessment interface, decoupling evaluation logic from agent implementation. AgentBeats is the concret…

9
№02
cs.AI arxiv:2606.13669v1

Agents-K1: Towards Agent-native Knowledge Orchestration

Zongsheng Cao, Bihao Zhan, Jinxin Shi et al.

Agents-K1 introduces an end-to-end pipeline to transform raw scientific documents into agent-native knowledge graphs, addressing the limitations of existing LLM agents in scientific knowledge orchestration. Its core method involves a multimodal parser capturing detailed entities, evidence, and relations across the full…

9
№03
cs.AI arxiv:2606.13572v1

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

Tanmoy Kanti Halder, Akash Ghosh, Subhadip Baidya et al.

ArogyaSutra is a multi-agent framework designed to enhance multimodal medical reasoning in Indic languages. It leverages a novel actor-critic architecture with dual-memory mechanisms and tool grounding to perform step-wise reasoning on complex medical queries involving text and images. The framework is supported by Aro…

9
№04
cs.AI arxiv:2606.13361v1

Can I Buy Your KV Cache?

Luoyuan Zhang

This paper proposes a simple yet impactful method to eliminate redundant computation in large language models: **precomputing and selling the Key-Value (KV) cache for documents.** By allowing agents to buy and load a precomputed cache instead of re-running the expensive prefill step, the author achieves significant com…

9
№05
cs.AI arxiv:2606.13662v1

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Amy Xin, Jiening Siow, Junjie Wang et al.

The paper introduces **EurekAgent**, an agent system arguing that the bottleneck for autonomous scientific discovery is shifting to **agent environment engineering**. EurekAgent focuses on designing the environment—including resources, constraints, and interfaces—to amplify desired agent behaviors (like exploration and…

9
№06
cs.AI arxiv:2606.13607v1

Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning

Zach Studdiford, Gary Lupyan

This paper challenges the notion that human reasoning relies on abstract world models while LLMs only perform pattern matching. By testing both humans and LLMs on everyday common-sense reasoning, the authors find similar error patterns in both groups. They further demonstrate that specific LLM attention heads impleme…

9
№07
cs.AI arxiv:2606.13598v1

Reward Modeling for Multi-Agent Orchestration

King Yeung Tsang, Zihao Zhao, Vishal Venkataramani et al.

The paper introduces **Orchestration Reward Modeling (OrchRM)**, a self-supervised framework to evaluate the quality of multi-agent orchestration without requiring human labels. OrchRM constructs win-lose pairs from intermediate execution artifacts to train a Bradley-Terry reward model, enabling efficient, reward-guide…

9
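
As a rough illustration of the Bradley-Terry objective mentioned in the OrchRM summary above, the sketch below fits a reward model on win-lose pairs. It is not the paper's implementation: the linear reward `r(x) = w . x`, the feature vectors, and the names `bt_loss`, `train_linear_reward`, and `score` are all assumptions made for this example.

```python
import math


def bt_loss(score_win: float, score_lose: float) -> float:
    """Negative log-likelihood of the winner under Bradley-Terry.

    P(win beats lose) = sigmoid(score_win - score_lose).
    """
    diff = score_win - score_lose
    # Numerically stable log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def score(w, x):
    """Hypothetical linear reward r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_reward(pairs, dim, lr=0.1, epochs=200):
    """Fit w on (winner_features, loser_features) pairs by gradient descent."""
    w = [0.0] * dim
    for _ in range(epochs):
        for win, lose in pairs:
            diff = score(w, win) - score(w, lose)
            # d/dw of -log sigmoid(diff) is -(1 - sigmoid(diff)) * (win - lose).
            g = 1.0 - 1.0 / (1.0 + math.exp(-diff))
            w = [wi + lr * g * (a - b) for wi, a, b in zip(w, win, lose)]
    return w


# Toy win-lose pairs: the first feature is higher in every winner.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
w = train_linear_reward(pairs, dim=2)
```

After training, `score(w, winner)` should exceed `score(w, loser)` for each pair. A learned scorer of this kind could then rank candidate orchestrations, though how OrchRM builds its pairs and features is not specified in the summary.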
№08
cs.CL arxiv:2606.13681v1

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Jundong Xu, Qingchuan Li, Jiaying Wu et al.

EvoArena is a novel benchmark suite designed to evaluate LLM agents in dynamic environments by modeling progressive changes across terminal, software, and social domains. The core contribution is the introduction of EvoMem, a patch-based memory paradigm that explicitly tracks and structures memory evolution as update h…

9
№09
cs.CL arxiv:2606.13663v1

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

Yaxin Du, Yifan Zhou, Yujie Ge et al.

HyperTool addresses the execution-granularity mismatch in tool-augmented agents by introducing a unified, executable interface that allows models to invoke complex, multi-step tool workflows within a single outer call. This "folding" of deterministic subroutines reduces the number of model-visible decisions, saving con…

9
№10
cs.CL arxiv:2606.13643v1

Recursive Agent Harnesses

Elias Lumer, Sahil Sen, Kevin Paul et al.

The paper introduces the **Recursive Agent Harness (RAH)**, framing it as a code-first extension to model recursion, where the recursive unit is a full agent harness with tools and planning, not just a model call. RAH leverages a parent agent to generate and execute scripts that spawn parallel subagent harnesses for fi…

9
№11
cs.AI arxiv:2606.13566v1

A Three-Layer Framework for AI in Scientific Discovery

Guojun Liao

This paper introduces a **three-layer framework** for AI in scientific discovery, arguing that the crucial, yet underdeveloped, layer is **Layer 2: model formation through qualitative reasoning**. This layer involves recognizing the structural inadequacy of existing frameworks and understanding the problem within a bro…

8
№12
cs.AI arxiv:2606.13544v1

Adaptive Turn-Taking for Real-time Multi-Party Voice Agents

Soumyajit Mitra, Prabhat Pandey, Abhinav Jain et al.

This paper introduces **ModeratorLM**, a streaming speech large language model that adapts turn-taking behavior in multi-party conversations by conditioning it on an explicitly assigned conversational role. The core contribution is demonstrating that role-conditioning, especially enhanced with chain-of-thought reasonin…

8
№13
cs.AI arxiv:2606.13392v1

MiniMax Sparse Attention

Xunhao Lai, Weiqi Xu, Yufeng Yang et al.

MiniMax Sparse Attention (MSA) addresses the quadratic cost of long-context attention by integrating a lightweight Index Branch with Grouped Query Attention (GQA). This branch independently scores and selects a Top-k subset of key-value blocks for each GQA group, allowing the Main Branch to perform exact attention only…

8
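
The select-then-attend idea in the MSA summary above can be sketched for a single query as follows. This is only a toy under stated assumptions: scoring a block by the query's dot product with the block's mean key is a stand-in for the paper's Index Branch, and the function names are hypothetical. The per-GQA-group selection described in the summary is not modeled.

```python
import math


def block_scores(query, keys, block_size):
    """Score each key block with an assumed proxy: query . mean(keys in block)."""
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean = [sum(col) / len(block) for col in zip(*block)]
        scores.append(sum(q * m for q, m in zip(query, mean)))
    return scores


def top_k_blocks(scores, k):
    """Return the indices of the k highest-scoring blocks, in ascending order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def sparse_attention(query, keys, values, block_size, k):
    """Exact softmax attention restricted to the keys of the selected blocks."""
    selected = top_k_blocks(block_scores(query, keys, block_size), k)
    idx = [
        i
        for b in selected
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in idx]
    m = max(logits)  # subtract the max for a stable softmax
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    dim_v = len(values[0])
    out = [
        sum(w * values[i][d] for w, i in zip(weights, idx)) / total
        for d in range(dim_v)
    ]
    return out, selected


# Two blocks of two keys each; block 0 aligns with the query, block 1 opposes it.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [1.0], [5.0], [5.0]]
out, selected = sparse_attention(query, keys, values, block_size=2, k=1)
```

With `k=1` only block 0 is attended to, so the output depends on its values alone. The cost of the exact attention step then scales with `k * block_size` rather than with the full sequence length.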
№14
cs.AI arxiv:2606.13405v1

Neuro-Symbolic Agents for Regulated Process Automation: Challenges and Research Agenda

Alexander Rombach, Chantale Lauer, Nijat Mehdiyev

This paper proposes **compliance-by-construction** as a core architectural paradigm for LLM agents operating in regulated industries, integrating existing symbolic structures (like regulations and process models) directly into the agent's decision-making framework. The core contribution is advocating for this structura…

8
№15
cs.AI arxiv:2606.13449v1

Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests

Ali Arabat, Mohammed Sayagh

This paper investigates the impact of providing explicit instruction files on the performance of AI agents generating pull requests (agentic PRs). Analyzing 15,549 agentic PRs, the authors compare project performance (merge rate, complexity, merge time) before and after instruction file creation. The core finding is th…

8
№16
cs.AI arxiv:2606.13468v1

Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset

Mahmoud Abujadallah, Ali Arabat, Mohammed Sayagh

This paper investigates why AI-generated code fixes in pull requests are frequently rejected, using a representative sample from the AIDev dataset. The core method involves a qualitative study followed by quantitative analysis to categorize the rejection reasons. The main contribution is the identification of 14 distin…

8
№17
cs.AI arxiv:2606.13385v1

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

Zihao Wang, Yiming Li, Yutong Wu et al.

This paper introduces **StakeBench**, a novel benchmark for evaluating prompt injection attacks against web agents from a **stakeholder-centric** perspective. Unlike existing attack-centric methods, StakeBench systematically categorizes and attributes the resulting harm based on which specific stakeholder (e.g., user, …

8
№18
cs.AI arxiv:2606.13441v1

Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models

Joseph Keshet

This paper argues that Large Language Models (LLMs) do not possess the agency necessary for moral responsibility. The author contends that genuine moral responsibility requires commitment-bearing agency grounded in *intrinsic* intentionality and self-attributed action, which LLMs lack. Their operation is purely probabi…

8
№19
cs.LG arxiv:2606.13565v1

A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

Sophia Tang, Yuchen Zhu, Molei Tao et al.

A2D2 introduces a unified framework for reward-guided fine-tuning of any-length discrete diffusion models by jointly optimizing insertion and unmasking policies. The core contribution is deriving the Radon-Nikodym derivative for the joint path measure, enabling theoretically guaranteed convergence to the reward-tilted …

8
№20
cs.LG arxiv:2606.13426v1

Accelerating Speculative Diffusions via Block Verification

Alexander Soen, Hisham Husain, Valentin De Bortoli et al.

This paper introduces a novel method to efficiently adapt speculative decoding, traditionally used in LLMs, to continuous diffusion models by enabling block verification. This adaptation significantly improves the acceptance rate of draft predictions compared to existing diffusion acceleration techniques. The authors a…

8