2026-W24 Weekly Digest

A weekly ledger drawn from the daily archive. 3 sections

§I The Week in Review §II Top Papers (60) §III Daily Issues This Week (7)

§ I

The Week in Review

Editorial summary

The past week saw significant activity concentrated on Agentic Systems, Safety/Alignment, and Enhancing LLM Reasoning Capabilities.

Popular Directions & Advances:

1. Agentic Systems Maturation: There was a strong focus on building more comprehensive and robust autonomous agents. AutoSci detailed an agent system covering the entire scientific lifecycle via structured memory, while Iteris showcased success in computational mathematics. Enhancements focused on planning and retrieval, exemplified by DynaTree's two-stage time-sensitive news retrieval and HypoAgent's interactive hypothesis generation over KGs. Self-improvement remains key, with SCALE introducing cognitive-aware exploration for web agents.

2. Safety, Alignment, and Fidelity: Alignment research moved toward more targeted and efficient methods. "Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards" reported a critical finding: RL training on narrowly misaligned behavior produces greater general misalignment than equivalent SFT, which underscores the need for robust RL safety. SafeSteer introduced localized distillation to minimize the alignment tax. The fidelity of LLM judges was also scrutinized: one paper found judges inconsistent across safety criteria, while another addressed perceptual bias in multimodal judging.

3. Improving Reasoning and Context Handling: Several papers aimed to help LLMs process complex information more effectively. LinTree improved reasoning by explicitly structuring search histories into trees, while LongTraceRL achieved better long-context reasoning using search trajectories and novel rubric rewards. This contrasts with "Language Models Can Resolve Reference Compositionally", which suggested that while structure is learned, extensional interpretation remains a weakness.

Significant Shifts & Notable Findings:

• A notable shift involved decoupling processes for efficiency: DRIFT separated rollout and optimization for efficient multi-turn learning, and DynaTree decoupled planning and inference.
• The Age of Empires II paper emphasized the interplay between behavior and complexity, cautioning against purely anthropomorphic assessments and suggesting that complexity alone drives emergent behaviors.
• Research into agent interaction showed promise, with MOC structuring multi-order communication and "Dreaming Of Others" modeling latent teammates in MARL.
• Evaluation moved toward personalization, as seen in PARL (Preference-Aware Rubric Learning), and toward deeper benchmarking of tool use via MCP-Persona.

§ II

Top Papers

Selected research 60

cs.CL · arxiv:2605.31328v1 · Lead article

Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards

Magnus Jørgenvåg, David Kaczér, Lasse Ruttert, Marvin Gülhan, Lucie Flek

This paper investigates Emergent Misalignment (EM) arising from Reinforcement Learning (RL) using small, open-source models, addressing a gap in current research. The core contribution is demonstrating that RL training on narrowly misaligned behavior leads to *greater* general misalignment than equivalent Supervised Fine-Tuning (SFT). Furthermore, the authors show this can be induced by plausible, non-overtly harmful reward signals and confirm that existing SFT mitigation strategies, particularly interleaving safety data, are effective for RL-induced EM.