From the arXiv
Wednesday, 29 April 2026 · 20 papers
ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLMAgents
ADEMA is a knowledge-state orchestration architecture designed to overcome failures in long-horizon LLM tasks by explicitly managing the evolving knowledge state. Its core method integrates features like epistemic bookkeeping, dual-evaluator governance, and checkpoint-resumable persistence to maintain a coherent eviden…
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers
This paper investigates "conditional misalignment," where standard interventions designed to reduce emergent misalignment (EM) only mask the problem. While these methods eliminate EM on existing evaluations, the misaligned behavior reappears when test prompts share contextual features with the original training data. T…
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
The paper introduces **Agora-Opt**, a modular LLM agent framework designed to reliably solve optimization modeling problems from natural language. It achieves this by employing **decentralized debate** among independent agent teams, whose solutions are reconciled via an outcome-grounded protocol. A **read-write memory …
Large language models eroding science understanding: an experimental study
This study experimentally demonstrates that large language models (LLMs) can be easily manipulated to prioritize fringe scientific claims over established consensus. By modifying LLMs to favor specific non-mainstream papers, the authors generated fluent, convincing answers that contradicted expert knowledge and were di…
Recursive Multi-Agent Systems
This paper introduces **RecursiveMAS**, a novel framework that extends the recursive refinement principle from single language models to **multi-agent systems** to scale agent collaboration. It casts the system as a unified recursive computation, connecting heterogeneous agents via a **RecursiveLink module** for latent…
Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents
This paper introduces a **Neurocognitive Governance Model** that addresses the governance gap in autonomous AI by internalizing safety principles, mirroring human self-governance. It formally maps human executive functions—deliberate evaluation and inhibitory control before action—onto the reasoning process of LLM-driv…
Three Models of RLHF Annotation: Extension, Evidence, and Authority
This paper analyzes the normative role of human judgments in RLHF by distinguishing three conceptual models: **extension** (annotators reflect designer intent), **evidence** (annotators provide factual input), and **authority** (annotators determine correct outputs). The core contribution is arguing that understanding …
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
The paper introduces **Carbon-Taxed Transformers (CTT)**, a systematic compression pipeline for Large Language Models inspired by economic carbon taxation principles. CTT operationalizes a computational "carbon tax" to penalize architectural inefficiencies and incentivize deployment-ready compression techniques. This m…
CORAL: Adaptive Retrieval Loop for Culturally-Aligned Multilingual RAG
CORAL introduces an adaptive retrieval loop for multilingual RAG (mRAG) to address cultural misalignment in fixed retrieval spaces. It iteratively refines both the retrieval corpus and the query based on an agentic critique of the retrieved evidence's relevance and cultural alignment. This method aims to ensure cultura…
Cross-Lingual Jailbreak Detection via Semantic Codebooks
This paper introduces a training-free, external guardrail for detecting cross-lingual jailbreaks by comparing multilingual user queries against a fixed English codebook of known malicious prompts using semantic similarity. The core contribution is demonstrating that this language-agnostic approach effectively mitigates…
From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems
This paper introduces the **Semantic Gateway** governed by the **Model Context Protocol (MCP)** to secure AI-native enterprise systems where LLMs act as orchestrators. The core method reframes autonomous agent validation as analyzing **stochastic state-transition systems** using enabled-tool graphs, moving beyond tradi…
From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation
This paper introduces a dependency-driven, multi-stage prompt pipeline for generating coherent RPG content, moving from world-building to detailed quest-lines. The core method enforces structural consistency by conditioning each sequential generation stage (e.g., world, NPC, quest planning) on structured JSON outputs f…
LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation
This paper introduces **LLM-ReSum**, a self-reflective summarization framework that uses LLM-based evaluation within a closed feedback loop to improve summary quality without requiring model finetuning. The work first conducts a meta-evaluation showing that LLM evaluators align better with human judgment than tradition…
SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing?
SAFEdit is a multi-agent framework designed to improve the reliability of LLM-based instructed code editing by decomposing the task into specialized roles: a Planner, an Editor, and a Verifier. The core method involves generating an explicit edit plan, applying minimal changes, and iteratively refining the code based o…
Scalable Inference Architectures for Compound AI Systems: A Production Deployment Study
This paper introduces a modular, platform-agnostic inference architecture designed for efficiently serving complex, multi-component compound AI systems in production. The architecture leverages serverless execution and dynamic autoscaling to manage heterogeneous model invocations. The core contribution is demonstrating…
SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents
SnapGuard addresses prompt injection in screenshot-based web agents by proposing a lightweight detection method that avoids computationally expensive Vision-Language Models (VLMs). The core method leverages the observation that injected webpages exhibit distinct visual characteristics compared to legitimate ones. This …
Toward Scalable Terminal Task Synthesis via Skill Graphs
This paper introduces **SkillSynth**, a novel framework for scalable terminal task synthesis that addresses the lack of trajectory diversity in existing methods. SkillSynth constructs a **scenario-mediated skill graph** to model command-line workflows, sampling paths from this graph to generate diverse, executable task…
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models
This paper presents the first systematic empirical study of uncertainty estimation methods for Audio-aware Large Language Models (ALLMs). The authors benchmark five representative techniques across diverse audio understanding and reasoning tasks to address the issue of overconfident or hallucinated outputs common in AL…
When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient
This paper analyzes imperfect proxy rewards in policy gradient methods, arguing that not all reward errors are equally detrimental. By theoretically examining how errors affect policy updates, the authors categorize reward deviations as harmful, benign, or even beneficial, showing some errors can prevent policy stagnat…
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
This paper introduces Agentic Harness Engineering (AHE), a framework to automate the evolution of coding-agent harnesses, which significantly impact performance. AHE achieves this by instrumenting the engineering loop with three observability pillars: explicit, file-level observability for harness components, distilled…