The Morning
US Stocks
Weekend or US market holiday
Generated from public market data and news for research and education. Not financial advice; data may be delayed, incomplete, or wrong.
From the arXiv
AI Assurance: A Comprehensive Testing Strategy for Enterprise AI Systems
This paper proposes a comprehensive AI assurance strategy for enterprise AI systems, shifting focus from classical verification to continuous risk reduction. The method treats evaluation as a core engineering discipline, structured around a new AI Failure Taxonomy and a five-layer AI Assurance Pyramid. The contribution is a practical framework for managing the probabilistic risks that LLM-based systems introduce in enterprise settings.

Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment
This paper introduces Latent Adversarial Robustification (LAR) to improve the generality of intrinsic multimodal knowledge editing in MLLMs. LAR generates adversarial, semantically coherent variants in the latent space to expose fragile editing regions, ensuri…
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
DiLaDiff addresses the token correlation issue in diffusion language models by introducing a continuous, semantically rich latent space learned via an autoencoder. This latent space guides a diffusion model, and a subsequent consistency model distills this pro…


From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
This paper systematically studies the full lifecycle of model-generated agent skills, spanning experience generation, extraction, and consumption. The core contribution is a utility-grounded evaluation framework applied across five diverse domains to determine…
It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt
This paper demonstrates that geopolitical bias in LLMs primarily originates during the **post-training (fine-tuning/alignment) phase**, contrary to common assumptions about pre-training data. The authors found that models consistently develop biases favoring t…

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
This paper introduces the **Shannon Scaling Law**, modeling LLM training as information transmission over a noisy channel, mapping parameters to bandwidth and data to signal power.…
MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection
MemAudit is a post-hoc auditing framework designed to identify malicious memories injected into LLM agents' persistent storage. It combines a counterfactual memory influence score …
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
SkillOpt introduces a novel method to systematically optimize agent skills by treating the skill itself as an external, trainable state, analogous to weight optimization in deep le…
Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents
This paper introduces **Quantitative Goal Persistence (QGP)**, a metric to measure whether long-horizon LLM agents continue working until an external verifier confirms a specific c…
Strong Teacher Not Needed? On Distillation in LLM Pretraining
This paper investigates the conventional assumption that stronger teachers are necessary for effective knowledge distillation during Large Language Model (LLM) pretraining. The aut…
The Town Square
Memory components now account for almost two-thirds of the total cost of AI chips, significantly impacting overall hardware expenses.
Workshops
This repository provides open-source plugins designed for knowledge workers to enhance their productivity when using the Claude AI coworker.
This repository provides a comprehensive, hands-on guide to learning, building, and deploying AI engineering projects from the ground up.