Daily Issue
Vol. I — No. 17
21 · 05
Thursday, 21 May 2026
Generated 2026-05-21 12:37
google/gemini-2.5-flash-lite-preview-09-2025
"If we make it out of here alive, would you be willing to see the mountains and rivers of the world with me?" - 狐妖小红娘
45 items · 4 sections
§ 0

The Morning

Local weather 1
This morning in
London
Clear sky
Today's range
23.7° / 12.7°
currently 21.6°
Feels
21.2°
Rain
4%
Wind
12 km/h
Humid
48%
Rise
05:00
Set
20:54
§ I

US Stocks

Pre-market signal radar 12
US pre-market radar
premarket 2026-05-21
0 Bullish
0 Bearish
12 Neutral
Sector Tape
Hyperscale Cloud 4 names
59 Top: MSFT · Neutral · RS -2.3% Bullish 0 / Bearish 0 / 5d -0.6%
Energy Infrastructure 1 name
58 Top: VST · Neutral · RS -0.7% Bullish 0 / Bearish 0 / 5d +1.0%
Servers and Thermal Management 2 names
57 Top: DELL · Neutral · RS -6.9% Bullish 0 / Bearish 0 / 5d -7.5%
Networking Equipment 4 names
54 Top: APH · Neutral · RS -2.4% Bullish 0 / Bearish 0 / 5d -2.2%
Foundry 2 names
53 Top: INTC · Neutral · RS +1.1% Bullish 0 / Bearish 0 / 5d -0.3%
Battery and Energy Storage 3 names
49 Top: EOSE · Neutral · RS -11.6% Bullish 0 / Bearish 0 / 5d -12.6%
Compute Mining 4 names
49 Top: IREN · Neutral · RS -4.9% Bullish 0 / Bearish 0 / 5d -7.5%
Manufacturing 4 names
50 Top: FLEX · Neutral · RS -4.4% Bullish 0 / Bearish 0 / 5d -5.9%
Ticker Setup Move Score Evidence Quality
MSFT Microsoft Hyperscale Cloud
Neutral Sector tailwind Low confidence
-0.3% $418.80 5d +3.9%
61 sector positive RS +2.1%

Watchlist item from positive sector tape, 3 recent headline(s).

This Will Be Microsoft’s Stock Price in 2028 - 24/7 Wall St.
Needs fresh price/news confirmation before becoming an actionable setup.
quote: delayed fallback · news: fresh · financials: fresh · news: 3
EOSE Eos Energy Battery and Energy Storage
Neutral News watch Low confidence
-0.1% $7.10 5d -14.1%
46 sector negative RS -13.1%

Watchlist item from negative sector tape, 3 recent headline(s).

Why is EOSE stock surging today? - MSN
Needs fresh price/news confirmation before becoming an actionable setup.
quote: delayed fallback · news: fresh · financials: fresh · news: 3
quotes: nasdaq 24 24/24 · news: google_news_rss 24 24/24 · filings: sec 24 24/24, fallback 24

Generated from public market data and news for research and education. Not financial advice; data may be delayed, incomplete, or wrong.

§ II

From the arXiv

arXiv preprints 10 of 20
cs.AI · arxiv:2605.21240v1 · Lead article

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui

APEX is a framework for self-evolving LLM agents that counters exploration collapse by explicitly managing a strategy space through a **strategy map** (a DAG of milestones). Its two core steps are **Fork Discovery**, which expands the map with new, evidence-grounded directions, and **Policy Selection**, which balances exploration and exploitation during planning. Together they let agents keep discovering and pursuing better long-horizon behaviors without any model weight updates.
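To make the strategy-map idea concrete, here is a minimal sketch of a DAG of milestones with a fork step and a selection step. It is not the paper's implementation: the class and method names are hypothetical, and the UCB1 rule used for selection is a stand-in assumption, since the summary does not say how APEX scores strategies.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Milestone:
    """One node of a hypothetical strategy map (illustrative names only)."""
    name: str
    parents: list = field(default_factory=list)  # prerequisite milestones
    visits: int = 0
    total_reward: float = 0.0


class StrategyMap:
    def __init__(self):
        self.nodes = {"root": Milestone("root")}

    def fork(self, parents, name):
        """Fork step: add a new direction under existing milestones.

        New nodes only point back at nodes that already exist,
        so the graph stays acyclic by construction.
        """
        if name in self.nodes:
            raise ValueError(f"milestone already exists: {name}")
        missing = [p for p in parents if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown parents: {missing}")
        self.nodes[name] = Milestone(name, list(parents))

    def record(self, name, reward):
        """Store the outcome of one episode that pursued this milestone."""
        node = self.nodes[name]
        node.visits += 1
        node.total_reward += reward

    def select(self, c=1.0):
        """Selection step (assumed UCB1): untried directions go first."""
        candidates = [n for n in self.nodes.values() if n.name != "root"]
        if not candidates:
            return "root"
        untried = [n for n in candidates if n.visits == 0]
        if untried:
            return untried[0].name
        total = sum(n.visits for n in candidates)

        def ucb(node):
            mean = node.total_reward / node.visits
            bonus = c * math.sqrt(math.log(total) / node.visits)
            return mean + bonus

        return max(candidates, key=ucb).name
```

The exploration bonus shrinks as a milestone is visited more often, so rarely tried directions keep getting a chance even when a familiar one has a good average.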

Illustration of exploration collapse in a maze experiment (5 × 5 grid, 20 episodes, 10 steps each). Room visitation heatmaps (color intensity shows visit proportion; reward cells (⋆) indicate bonus locations). Static explores broadly but inconsistently. Reflexion locks into a narrow corridor and achieves a higher average while missing high-value rooms. APEX maintains broad coverage and consistently reaches high-reward cells. APEX avoids collapse by explicitly tracking which strategies have been tried and which remain unexplored, and actively directing the agent toward unexplored directions rather than refining familiar ones.
Overview of DeepWeb-Bench. (a) Each task is an 8 × 8 matrix of entities against research dimensions; every cell is scored independently using a four-tier rubric ({1, 0.5, 0.25, 0}) and carries a reference answer with source-provenance labels and cross-source agreement. (b) The dimension axis covers four capability families, and every task spans multiple families.
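The per-cell rubric described in the caption above can be sketched as a small scoring helper. This is an illustration only: the caption does not say how cell scores are aggregated, so the plain averaging here, the row-equals-entity layout, and the name `summarize_task` are all assumptions.

```python
# The four rubric tiers named in the caption.
ALLOWED_TIERS = {1.0, 0.5, 0.25, 0.0}


def summarize_task(matrix):
    """Average a grid of rubric scores (rows: entities, columns: dimensions).

    Returns (overall, per_entity, per_dimension). Averaging is an assumed
    aggregation, not one stated in the source.
    """
    if not matrix or not matrix[0]:
        raise ValueError("matrix must be non-empty")
    cols = len(matrix[0])
    if any(len(row) != cols for row in matrix):
        raise ValueError("matrix must be rectangular")
    for row in matrix:
        for cell in row:
            if cell not in ALLOWED_TIERS:
                raise ValueError(f"score outside rubric tiers: {cell}")

    rows = len(matrix)
    per_entity = [sum(row) / cols for row in matrix]
    per_dimension = [
        sum(matrix[i][j] for i in range(rows)) / rows for j in range(cols)
    ]
    overall = sum(per_entity) / rows
    return overall, per_entity, per_dimension
```

Scoring each cell independently means a weak dimension shows up in its own column average instead of being hidden inside a single task-level number.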
cs.AI · arxiv:2605.21482v1

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

Sixiong Xie, Zhuofan Shi et al.

DeepWeb-Bench is a new, challenging benchmark designed to evaluate the "deep research" capabilities of frontier language models, which involve extensive web searching, evidence collection, and multi-step reasoning. Its difficulty stems from the requirement for…

cs.AI · arxiv:2605.21312v1

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

Yicheng Feng, Xin Tan et al.

Frontier is a novel discrete-event simulator designed to accurately model the complexities of modern, disaggregated LLM inference serving systems. It achieves high fidelity by explicitly modeling architectural features like Prefill-Decode Disaggregation (PDD) …

Figure 1. Measured vLLM TPOT with and without CUDA Graph under different workloads (64 requests per workload, mean ISL/OSL, tested on 8× A800-SXM GPUs). Left: co-location. Right: PDD. Percentages show reduction.
Insights Generator (IG) system overview. Left: the input layer provides a diagnostic question $Q$, a trace corpus $\mathcal{C}$, and a processed data store $\mathcal{S}$. Center: the Orchestrator dispatches Scout agents ($\mathcal{H}$: hypothesize over sampled traces) and Investigator agents ($\mathcal{H}^{*}$: validate via corpus-scale cohort comparison). The Investigator analyzes $\mathcal{H}^{*}$ to generate findings $\mathcal{F}_{r}$, which are sent to the Orchestrator. The Orchestrator then synthesizes and de-duplicates $\mathcal{F}_{r}$ to generate the final report. Right: the output is an evidence-backed report with findings, fixes, citations, and prevalence estimates. Bottom: the shared tool layer. Algorithm 1 formalizes the analysis loop.
cs.AI · arxiv:2605.21347v1

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Akshay Manglik, Apaar Shanker et al.

This paper introduces the **Insights Generator (IG)**, a multi-agent system designed to automate the diagnosis of systematic failures in large sets of LLM agent execution traces. IG formalizes corpus-level trace diagnostics by proposing and testing hypotheses …

cs.AI · arxiv:2605.21463v1

Mem-$π$: Adaptive Memory through Learning When and What to Generate

Xiaoqiang Wang, Chao Wang et al.

Mem-$\pi$ introduces an adaptive memory framework where a separate model generates context-specific guidance on demand, moving beyond static retrieval. This system jointly learns *when* to generate guidance and *what* to generate using a decoupled reinforcemen…

Comparison of (a) workflow-based memory systems, where memory operations are governed by predefined retrieval and update pipelines, (b) learning-based memory systems, where memory operations are jointly optimized with downstream agent outcomes, and (c) our Mem-$\pi$, which models memory as a generative policy $\pi_{\text{mem}}$ separate from the downstream agent and internalizes reusable experience through offline experience distillation and online adaptation distillation.
№06
cs.AI
9

Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment

Roland Pihlakas, Jan Llenzl Dagohoy

This paper adapted the Milgram obedience experiment to test the behavior of 11 open-source Large Language Models (LLMs) under sustained authority pressure. The core finding is that…

№07
cs.AI
9

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

Can Hankendi, Rana Shahout et al.

PALS is a power-aware runtime for LLM serving that treats GPU power caps as a dynamic control knob, optimizing them alongside software parameters like batch size. It uses lightweig…

№08
cs.AI
9

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

Richa Verma, Bavish Kulur et al.

PREFINE adapts the Direct Preference Optimization (DPO) framework to sequential decision-making for safety alignment. It fine-tunes a pre-trained RL policy using trajectory-level p…

№09
cs.AI
9

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

Bingchen Zhao, Dhruv Srikanth et al.

SpecBench introduces a method to quantify reward hacking in long-horizon coding agents by comparing performance on two test suites: visible validation tests and held-out compositio…

№10
cs.AI
9

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

Lucheng Fu, Ye Yu et al.

TextReg addresses prompt distributional overfitting in LLMs, where iterative prompt optimization leads to poor generalization. The core method introduces a regularization framework…

§ III

The Town Square

Hacker News 4
compiled overnight by google/gemini-2.5-flash-lite-preview-09-2025 · end of issue no. 17 · thank you for reading