№01
cs.AI arxiv:2605.16245v1

AI-Mediated Communication Can Steer Collective Opinion

Stratis Tsirtsis, Kai Rawal, Chris Russell et al.

This paper investigates how AI, specifically LLMs editing user posts, influences collective opinion formation during human-to-human online communication. Empirically, the authors demonstrate that popular LLMs introduce directional biases when revising human text on contested topics. They then model this phenomenon math…

9
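The summary is cut off before the paper's own model. As a loose illustration of the general mechanism only, the sketch below swaps in a standard DeGroot averaging model and assumes an LLM editor shifts every post by a small fixed amount (`bias`) before others read it. The function name, trust matrix, and bias value are hypothetical.

```python
def simulate(opinions, weights, bias, steps):
    """DeGroot-style averaging in which every shared post is nudged by `bias`.

    opinions: list of floats in [-1, 1], one per user
    weights:  row-stochastic trust matrix (list of lists)
    bias:     hypothetical directional shift added by the editing LLM
    """
    x = list(opinions)
    for _ in range(steps):
        # Users read the edited posts: shifted by the bias, clipped to [-1, 1].
        posted = [max(-1.0, min(1.0, v + bias)) for v in x]
        # Each user adopts a trust-weighted average of what they read.
        x = [sum(w * p for w, p in zip(row, posted)) for row in weights]
    return x


W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
start = [-0.5, 0.0, 0.5]
print(sum(simulate(start, W, 0.0, 20)) / 3)   # unbiased editing
print(sum(simulate(start, W, 0.05, 20)) / 3)  # small directional bias
```

With a doubly stochastic trust matrix the unbiased run preserves the average opinion, while even a small per-post shift drags the whole population in one direction over repeated rounds.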
№02
cs.AI arxiv:2605.16217v1

Argus: Evidence Assembly for Scalable Deep Research Agents

Zhen Zhang, Liangcai Su, Zhuo Chen et al.

Argus introduces a cooperative agent framework, pairing a Searcher and a Navigator, to efficiently tackle complex information-seeking tasks. Instead of parallelizing redundant searches, Argus treats research as assembling complementary evidence pieces into a shared graph. This method aims to complete the required evide…

9
№03
cs.AI arxiv:2605.16207v1

Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most

Tahreem Yasir, Wenbo Li, Sam Gilson et al.

This paper evaluates the diagnostic precision of LLM tutoring agents in propositional logic using a knowledge-graph-derived benchmark of over 10,000 solution-feedback pairs. The core finding is that while LLMs perform well on optimal solutions, they systematically fail to distinguish between valid-suboptimal and incorr…

9
№04
cs.AI arxiv:2605.16205v1

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz et al.

This paper systematically investigates the impact of context representation, reasoning mechanisms, and task hierarchy on the performance and cost of compound LLM agents operating in adversarial, partially observable environments (modeled as a POMDP). The core contribution is a controlled, cost-aware study demonstrating…

9
№05
cs.AI arxiv:2605.16113v1

DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation

Rui Chu, Bingyin Zhao, Thanh Quoc Hung Le et al.

DebiasRAG introduces a novel, tuning-free framework leveraging Retrieval-Augmented Generation (RAG) to dynamically mitigate social biases in Large Language Models (LLMs) during inference. By retrieving contextually relevant debiasing information, the method achieves fairer generation without requiring additional train…

9
№06
cs.AI arxiv:2605.16233v1

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz et al.

FORGE is a population-based protocol that enables LLM agents to improve decision-making by evolving natural-language memory (Rules, Examples, or Mixed) without any weight updates. It uses a dedicated reflection agent to convert failed trajectories into reusable knowledge artifacts, which are then broadcast to the popul…

9
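A minimal sketch of the broadcast step, assuming each agent's memory is a list of natural-language artifacts and stubbing out the reflection agent (an LLM call in practice) with a fixed template. The class and function names are hypothetical; the point is that the population improves only by sharing text, never by changing weights.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    memory: list[str] = field(default_factory=list)  # Rules/Examples as plain text


def reflect(trajectory: list[str]) -> str:
    # Placeholder for the reflection agent: turns a failed trajectory into a rule.
    return f"Rule: avoid '{trajectory[-1]}' when it led to failure."


def broadcast(population: list[Agent], artifact: str) -> None:
    # Every agent receives the same artifact; duplicates are skipped.
    for agent in population:
        if artifact not in agent.memory:
            agent.memory.append(artifact)


pop = [Agent("a"), Agent("b"), Agent("c")]
failed = ["open door", "step into pit"]
broadcast(pop, reflect(failed))
print([a.memory for a in pop])
```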
№07
cs.AI arxiv:2605.16198v1

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

Parand A. Alamdari, Toryn Q. Klassen, Sheila A. McIlraith

This paper introduces a novel framework that integrates formal methods, specifically Linear Temporal Logic (LTL), with state-of-the-art machine learning to audit and monitor advanced AI systems like LLMs. The core contribution is providing techniques for both offline auditing and online runtime monitoring of complex, t…

9
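As a concrete picture of online runtime monitoring, the sketch below hand-codes monitors for two common LTL patterns: a response property `G(trigger -> F response)` and a safety property `G(not bad)`. Proposition names are hypothetical, and these hand-written monitors are not the paper's framework, which the truncated summary does not detail. Under finite-trace semantics, a trace that ends while the response monitor reports `pending` violates the property.

```python
class ResponseMonitor:
    """Online monitor for G(trigger -> F response).

    Each step receives the set of atomic propositions true at that position.
    "ok" means no obligation is outstanding, not that the property is settled.
    """

    def __init__(self, trigger: str, response: str):
        self.trigger, self.response = trigger, response
        self.open = False  # is a trigger still waiting for its response?

    def step(self, props: set[str]) -> str:
        if self.trigger in props:
            self.open = True
        if self.response in props:  # F includes the current position
            self.open = False
        return "pending" if self.open else "ok"


class SafetyMonitor:
    """Online monitor for G(not bad): once violated, the verdict is final."""

    def __init__(self, bad: str):
        self.bad, self.violated = bad, False

    def step(self, props: set[str]) -> str:
        self.violated = self.violated or self.bad in props
        return "violated" if self.violated else "ok"


resp, safe = ResponseMonitor("request", "ack"), SafetyMonitor("leak")
for props in [{"request"}, set(), {"ack"}, {"leak"}]:
    print(resp.step(props), safe.step(props))
```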
№08
cs.AI arxiv:2605.16143v1

Look Before You Leap: Autonomous Exploration for LLM Agents

Ziang Ye, Wentao Shi, Yuxin Liu et al.

This paper addresses the tendency of LLM agents to prematurely exploit knowledge in new environments by introducing **autonomous exploration** as a key capability. The authors formalize this with the **Exploration Checkpoint Coverage (ECC)** metric to quantify broad state discovery. They propose an Explore-then-Act p…

9
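The summary names the ECC metric but is truncated before defining it. A plausible reading, assumed here rather than taken from the paper, is the fraction of predefined checkpoint states that an agent reaches during exploration:

```python
def exploration_checkpoint_coverage(visited_states, checkpoints):
    """Assumed definition: |checkpoints reached| / |checkpoints|."""
    checkpoints = set(checkpoints)
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    return len(checkpoints & set(visited_states)) / len(checkpoints)


trajectory = ["kitchen", "hall", "kitchen"]  # repeated visits count once
print(exploration_checkpoint_coverage(trajectory, {"kitchen", "hall", "garden", "attic"}))
```

Because it uses set intersection, revisiting the same state does not raise coverage, which matches the summary's emphasis on broad rather than repeated discovery.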
№09
cs.AI arxiv:2605.16194v1

paper.json: A Coordination Convention for LLM-Agent-Actionable Papers

Arquimedes Canedo

This paper introduces **`paper.json`**, a standardized companion JSON file for academic papers designed to improve machine readability for LLM agents. Its core contribution is a lightweight convention featuring stable IDs for claims (C1), explicit scope limitations (C2), figure-specific shell commands (C3), and definit…

9
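The field names below (`claims`, `scope`, `figures`, `command`) are hypothetical; the convention's actual schema is not given in the truncated summary. The sketch only shows how an agent might index claims by stable ID, read a claim's stated scope, and look up the shell command that regenerates a figure.

```python
import json

# Hypothetical companion file; the keys are illustrative, not the convention's schema.
PAPER_JSON = """
{
  "claims": [
    {"id": "claim-1", "text": "Example claim text.", "scope": "English-only benchmarks"}
  ],
  "figures": [
    {"id": "fig-1", "command": "python make_fig1.py --seed 0"}
  ]
}
"""


def load(text):
    doc = json.loads(text)
    claims = {c["id"]: c for c in doc["claims"]}                 # stable ID -> claim
    figures = {f["id"]: f["command"] for f in doc["figures"]}   # figure -> shell command
    return claims, figures


claims, figures = load(PAPER_JSON)
print(claims["claim-1"]["scope"])
print(figures["fig-1"])
```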
№10
cs.AI arxiv:2605.16045v1

RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents

Zijie Dai, Shiyuan Deng, Sheng Guan et al.

RecMem proposes a novel, recurrence-based memory consolidation method for long-running LLM agents to reduce token consumption. Instead of eagerly processing every interaction, it stores them in a lightweight subconscious layer and only invokes the LLM to extract episodic and semantic memory when sustained recurrence of…

9
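One way to picture "only invoke the LLM on sustained recurrence" is a sliding-window counter that calls an expensive consolidation function the first time a normalized key recurs often enough. The window size, threshold, and key normalization below are assumptions, not values from the paper, and `consolidate` stands in for the LLM extraction step.

```python
from collections import Counter, deque


class RecurrenceBuffer:
    """Cheap 'subconscious' store that calls `consolidate` only on sustained recurrence."""

    def __init__(self, consolidate, key_fn=str.lower, window=20, threshold=3):
        self.consolidate = consolidate       # expensive step (an LLM call in practice)
        self.key_fn = key_fn                 # hypothetical normalization of interactions
        self.recent = deque(maxlen=window)   # last `window` keys
        self.counts = Counter()              # occurrences of each key inside the window
        self.consolidated = set()
        self.threshold = threshold

    def add(self, interaction: str) -> None:
        key = self.key_fn(interaction)
        if len(self.recent) == self.recent.maxlen:
            # The deque is about to drop its oldest key; keep counts in sync.
            self.counts[self.recent[0]] -= 1
        self.recent.append(key)
        self.counts[key] += 1
        if self.counts[key] >= self.threshold and key not in self.consolidated:
            self.consolidated.add(key)
            self.consolidate(key)


calls = []
buf = RecurrenceBuffer(calls.append, window=5, threshold=3)
for msg in ["Coffee", "weather", "coffee", "news", "COFFEE", "coffee"]:
    buf.add(msg)
print(calls)
```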
№11
cs.CL arxiv:2605.16117v1

SGR: A Stepwise Reasoning Framework for LLMs with External Subgraph Generation

Xin Zhang, Yang Cao, Baoxing Wu et al.

SGR is a stepwise reasoning framework that enhances the complex inference capabilities of Large Language Models (LLMs) by integrating external knowledge. The core method involves generating query-specific subgraphs from external knowledge bases to ground intermediate reasoning steps. This approach mitigates LLM inconsistenci…

9
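The summary does not say how SGR generates its subgraphs. A common baseline for query-specific subgraphs, swapped in here purely for illustration, is k-hop extraction around the entities mentioned in the query over a set of (head, relation, tail) triples. The toy triples and function name are invented.

```python
from collections import deque


def k_hop_subgraph(triples, seeds, k=2):
    """Return the triples reachable within k hops of the seed entities.

    Edges are treated as undirected for reachability; triples keep their direction.
    """
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((h, r, t))
        adj.setdefault(t, []).append((h, r, t))
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    kept = set()
    while frontier:  # breadth-first search bounded by depth k
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for h, r, t in adj.get(node, []):
            kept.add((h, r, t))
            nxt = t if node == h else h
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return kept


KB = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
    ("EU", "has_member", "Germany"),
    ("Berlin", "capital_of", "Germany"),
]
print(sorted(k_hop_subgraph(KB, ["Paris"], k=2)))
```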
№12
cs.AI arxiv:2605.16054v1

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

Fan Feng, Selena Ge, Minghao Fu et al.

Ada-Diffuser introduces a causal diffusion model framework that explicitly incorporates the inference of evolving latent dynamics into sequence generation for decision-making. The core method simultaneously learns the temporal structure of observed interactions and these hidden processes, theoretically justified to be …

8
№13
cs.AI arxiv:2605.16052v1

Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law

Parisa Kordjamshidi, Samer Aslan, Madhavan Seshadri et al.

This paper rigorously evaluates LLMs in tax law reasoning by introducing a contamination detection protocol to assess true performance. The core contribution is demonstrating that neuro-symbolic systems, which translate text for symbolic solvers, offer significantly more reliable and robust reasoning than monolithic LL…

8
№14
cs.AI arxiv:2605.16024v1

ScreenSearch: Uncertainty-Aware OS Exploration

Michael Solodko, Justin Wagle

ScreenSearch addresses the challenge of partial observability in desktop GUI agents by framing OS exploration as a search problem. The core method combines a structural screen retrieval and deduplication layer with an ambiguity-aware PUCT graph-bandit algorithm. This allows the agent to efficiently explore the state sp…

8
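The standard PUCT selection rule, as used in AlphaZero-style tree search, scores each child by its mean value plus a prior-weighted exploration bonus that shrinks as the child is visited. The sketch below implements only that textbook rule; the paper's ambiguity-aware modification and its graph-bandit bookkeeping are not described in the summary and are not reproduced. Action names and statistics are made up.

```python
import math


def puct_select(children, c_puct=1.5):
    """Pick the action maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child).

    children: dict action -> {"Q": mean value, "P": prior probability, "N": visits}
    """
    n_parent = sum(ch["N"] for ch in children.values())

    def score(ch):
        return ch["Q"] + c_puct * ch["P"] * math.sqrt(n_parent) / (1 + ch["N"])

    return max(children, key=lambda a: score(children[a]))


stats = {
    "open_menu": {"Q": 0.2, "P": 0.6, "N": 10},
    "click_icon": {"Q": 0.5, "P": 0.1, "N": 30},
    "scroll": {"Q": 0.0, "P": 0.3, "N": 0},
}
print(puct_select(stats))
```

Here the unvisited `scroll` action wins on its exploration bonus despite the lowest value; setting `c_puct=0` reduces the rule to greedy selection on `Q`.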
№15
cs.AI arxiv:2605.16165v1

Second-Order Multi-Level Variance Correction for Modality Competition in Multimodal Models

Yishun Lu, Wes Armour

This paper addresses modality competition in multimodal autoregressive models, which destabilizes training, by proposing **ML-FOP-SOAP**, a second-order optimization framework. It leverages **SOAP preconditioning** for stability and introduces **Multi-Level Variance Correction** via Fisher-Orthogonal Projection to supp…

8
№16
cs.AI arxiv:2605.16116v1

ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents

Chinmay Savadikar, Mingyu Zhao, Yuanzheng Zhu et al.

ShopGym is an integrated framework designed to overcome the limitations of existing e-commerce agent evaluation by providing environments that are simultaneously realistic, diverse, controllable, and reproducible. Its core method involves the ShopArena simulation layer, which converts live storefronts into self-contain…

8
№17
cs.AI arxiv:2605.16085v1

Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

Jingcheng Wu, Ratan Bahadur Thapa, Mojtaba Nayyeri et al.

This paper proposes a hybrid deep learning architecture to better model relational databases by integrating Language Models (LMs) and Graph Neural Networks (GNNs). The method uses a fine-tuned BART encoder for intra-row semantics and a GraphSAGE GNN operating on a Relational Entity Graph (REG) to incorporate relational…

8
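A single GraphSAGE layer with the mean aggregator updates each node (here, a table row in the Relational Entity Graph) from its own embedding concatenated with the mean of its neighbors' embeddings, followed by a linear map and ReLU. This pure-Python sketch omits neighbor sampling and the L2 normalization of the original GraphSAGE; the toy customer/order graph and weights are invented, and in the paper the input features would come from the fine-tuned BART encoder.

```python
def sage_mean_layer(features, neighbors, weight):
    """One GraphSAGE layer, mean aggregator, no sampling or normalization.

    features:  dict node -> list[float] of dimension d
    neighbors: dict node -> list of neighbor nodes (e.g. rows linked by foreign keys)
    weight:    list of rows, each with 2*d entries
    """
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        d = len(h_v)
        if nbrs:
            agg = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(d)]
        else:
            agg = [0.0] * d
        z = h_v + agg  # concatenation [h_v || mean of neighbor embeddings]
        out[v] = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in weight]
    return out


feats = {"cust1": [1.0, 0.0], "order1": [0.0, 1.0], "order2": [0.0, 3.0]}
nbrs = {"cust1": ["order1", "order2"], "order1": ["cust1"], "order2": ["cust1"]}
W = [[1, 0, 0, 0], [0, 0, 0, 1]]  # keeps h_v[0] and the mean of neighbors' dim 1
print(sage_mean_layer(feats, nbrs, W))
```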
№18
cs.AI arxiv:2605.16079v1

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

Yiming Zhao, Yu Zeng, Wenxuan Huang et al.

VideoSeeker introduces a novel paradigm for instance-level video understanding by replacing text prompts with **native agentic tool invocation based on visual prompts**. This method allows Large Vision-Language Models (LVLMs) to **proactively perceive and retrieve precise spatiotemporal video segments** on demand, dire…

8
№19
cs.AI arxiv:2605.16035v1

Who Owns This Agent? Tracing AI Agents Back to Their Owners

Ruben Chocron, Doron Jonathan Ben Chayim, Eyal Lenga et al.

This paper formalizes the critical problem of **agent attribution**: reliably linking the observed actions of a deployed AI agent back to the specific user account that deployed it. The core contribution is defining this gap, which currently prevents accountability for both unintentional misuse and malicious deployment…

8
№20
cs.CL arxiv:2605.16077v1

Can Large Language Models Imitate Human Speech for Clinical Assessment? LLM-Driven Data Augmentation for Cognitive Score Prediction

Si-Belkacem Yamine Ketir, Lenard Paulo Tamayo, Shohei Hisada et al.

This paper introduces an LLM-driven data augmentation framework to address limited data in cognitive assessment from speech. The method uses participants' written responses as semantic anchors to generate diverse, synthetic speech samples via GPT-5. The core contribution is demonstrating that similarity-guided augmenta…

8
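The summary is truncated before the augmentation details, so the filter below is an assumption: it keeps generated candidates whose embedding similarity to the participant's written response (the semantic anchor) falls inside a band, high enough to stay on-topic but below a ceiling that rejects near-copies. The embeddings, thresholds, and function names are hypothetical; any sentence-embedding model could supply the vectors.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_synthetic(anchor_vec, candidates, low=0.7, high=0.98):
    """Keep candidate texts whose similarity to the anchor lies in [low, high].

    candidates: list of (text, embedding) pairs
    """
    return [text for text, vec in candidates if low <= cosine(anchor_vec, vec) <= high]


anchor = [1.0, 0.0]
cands = [("near-copy", [1.0, 0.0]), ("paraphrase", [0.9, 0.3]), ("off-topic", [0.0, 1.0])]
print(filter_synthetic(anchor, cands))
```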