From the arXiv
Monday, 18 May 2026 · 20 papers
AI-Mediated Communication Can Steer Collective Opinion
This paper investigates how AI, specifically LLMs editing user posts, influences collective opinion formation during human-to-human online communication. Empirically, the authors demonstrate that popular LLMs introduce directional biases when revising human text on contested topics. They then model this phenomenon math…
Argus: Evidence Assembly for Scalable Deep Research Agents
Argus introduces a cooperative agent framework, pairing a Searcher and a Navigator, to efficiently tackle complex information-seeking tasks. Instead of parallelizing redundant searches, Argus treats research as assembling complementary evidence pieces into a shared graph. This method aims to complete the required evide…
Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most
This paper evaluates the diagnostic precision of LLM tutoring agents in propositional logic using a knowledge-graph-derived benchmark of over 10,000 solution-feedback pairs. The core finding is that while LLMs perform well on optimal solutions, they systematically fail to distinguish between valid-suboptimal and incorr…
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
This paper systematically investigates the impact of context representation, reasoning mechanisms, and task hierarchy on the performance and cost of compound LLM agents operating in adversarial, partially observable environments (modeled as a POMDP). The core contribution is a controlled, cost-aware study demonstrating…
DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation
DebiasRAG introduces a novel, tuning-free framework leveraging Retrieval-Augmented Generation (RAG) to dynamically mitigate social biases in Large Language Models (LLMs) during inference. By retrieving contextually relevant, debiasing information, the method achieves fairer generation without requiring additional train…
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
FORGE is a population-based protocol that enables LLM agents to improve decision-making by evolving natural-language memory (Rules, Examples, or Mixed) without any weight updates. It uses a dedicated reflection agent to convert failed trajectories into reusable knowledge artifacts, which are then broadcast to the popul…
Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems
This paper introduces a novel framework that integrates formal methods, specifically Linear Temporal Logic (LTL), with state-of-the-art machine learning to audit and monitor advanced AI systems like LLMs. The core contribution is providing techniques for both offline auditing and online runtime monitoring of complex, t…
Look Before You Leap: Autonomous Exploration for LLM Agents
This paper addresses the tendency of LLM agents to prematurely exploit knowledge in new environments by introducing autonomous exploration as a key capability. The authors formalize this with the Exploration Checkpoint Coverage (ECC) metric to quantify broad state discovery. They propose an Explore-then-Act p…
paper.json: A Coordination Convention for LLM-Agent-Actionable Papers
This paper introduces `paper.json`, a standardized companion JSON file for academic papers designed to improve machine readability for LLM agents. Its core contribution is a lightweight convention featuring stable IDs for claims (C1), explicit scope limitations (C2), figure-specific shell commands (C3), and definit…
RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents
RecMem proposes a novel, recurrence-based memory consolidation method for long-running LLM agents to reduce token consumption. Instead of eagerly processing every interaction, it stores them in a lightweight subconscious layer and only invokes the LLM to extract episodic and semantic memory when sustained recurrence of…
SGR: A Stepwise Reasoning Framework for LLMs with External Subgraph Generation
SGR is a stepwise reasoning framework that enhances the complex inference capabilities of Large Language Models (LLMs) by integrating external knowledge. The core method involves generating query-specific subgraphs from external knowledge bases to ground intermediate reasoning steps. This approach mitigates LLM inconsistenci…
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
Ada-Diffuser introduces a causal diffusion model framework that explicitly incorporates the inference of evolving latent dynamics into sequence generation for decision-making. The core method simultaneously learns the temporal structure of observed interactions and these hidden processes, theoretically justified to be …
Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
This paper rigorously evaluates LLMs in tax law reasoning by introducing a contamination detection protocol to assess true performance. The core contribution is demonstrating that neuro-symbolic systems, which translate text for symbolic solvers, offer significantly more reliable and robust reasoning than monolithic LL…
ScreenSearch: Uncertainty-Aware OS Exploration
ScreenSearch addresses the challenge of partial observability in desktop GUI agents by framing OS exploration as a search problem. The core method combines a structural screen retrieval and deduplication layer with an ambiguity-aware PUCT graph-bandit algorithm. This allows the agent to efficiently explore the state sp…
Second-Order Multi-Level Variance Correction for Modality Competition in Multimodal Models
This paper addresses modality competition in multimodal autoregressive models, which destabilizes training, by proposing ML-FOP-SOAP, a second-order optimization framework. It leverages SOAP preconditioning for stability and introduces Multi-Level Variance Correction via Fisher-Orthogonal Projection to supp…
ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents
ShopGym is an integrated framework designed to overcome the limitations of existing e-commerce agent evaluation by providing environments that are simultaneously realistic, diverse, controllable, and reproducible. Its core method involves the ShopArena simulation layer, which converts live storefronts into self-contain…
Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks
This paper proposes a hybrid deep learning architecture to better model relational databases by integrating Language Models (LMs) and Graph Neural Networks (GNNs). The method uses a fine-tuned BART encoder for intra-row semantics and a GraphSAGE GNN operating on a Relational Entity Graph (REG) to incorporate relational…
VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation
VideoSeeker introduces a novel paradigm for instance-level video understanding by replacing text prompts with native agentic tool invocation based on visual prompts. This method allows Large Vision-Language Models (LVLMs) to proactively perceive and retrieve precise spatiotemporal video segments on demand, dire…
Who Owns This Agent? Tracing AI Agents Back to Their Owners
This paper formalizes the critical problem of agent attribution: reliably linking the observed actions of a deployed AI agent back to the specific user account that deployed it. The core contribution is defining this gap, which currently prevents accountability for both unintentional misuse and malicious deployment…
Can Large Language Models Imitate Human Speech for Clinical Assessment? LLM-Driven Data Augmentation for Cognitive Score Prediction
This paper introduces an LLM-driven data augmentation framework to address limited data in cognitive assessment from speech. The method uses participants' written responses as semantic anchors to generate diverse, synthetic speech samples via GPT-5. The core contribution is demonstrating that similarity-guided augmenta…