Snorkel AI: Coding agents don’t need to be perfect, they need to recover
Eight frontier models are analyzed for how they recover from errors in agentic coding tasks, showing that recovery, not perfection, is the differentiator and outlining actionable patterns and fixes to boost resilience in automated agents.
Pinterest: GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction
GPU-accelerated serving of two-tower models enables lightweight ads engagement prediction.
AWS ML: Customize AI agent browsing with proxies, profiles, and extensions in Amazon Bedrock AgentCore Browser
Guidance on configuring AgentCore Browser for AI agents with proxy routing, persistent browser profiles, and Chrome extensions to enable secure, stateful, enterprise web automation.
Snorkel AI: What Separates Success from Failure?
Analyzes error patterns and recovery dynamics across eight frontier models on the Agentic Coding benchmark to reveal how resilience, not perfection, separates success from failure.
OpenAI: Scaling social science research
A concise, technical guide to scaling social science research by applying scalable data collection, analysis, and workflow methods to large datasets.
OpenAI: GPT-5.2 derives a new result in theoretical physics
GPT-5.2 derives a novel result in theoretical physics, highlighting an AI-assisted approach and its implications for future research.
OpenAI: Beyond rate limits: scaling access to Codex and Sora
A practical guide to scaling access to Codex and Sora beyond rate limits, outlining high-throughput API patterns and resilient access strategies.
OpenAI: Introducing Lockdown Mode and Elevated Risk labels in ChatGPT
Technical overview of Lockdown Mode and Elevated Risk labels in ChatGPT, detailing the security controls and risk-aware behavioral changes.
Apple ML: A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation
Cadmus is a small-scale autoregressive program-synthesis system with an integer VM and a DSL, enabling controlled experimentation to study inductive reasoning, training-distribution control, and affordable, transparent model analysis.
Cloudflare: Shedding old code with ecdysis: graceful restarts for Rust services at Cloudflare
A Rust-based exploration of ecdysis, a graceful-restart library that enables zero-downtime upgrades for Cloudflare's high-traffic services by forking and execing a new process while preserving live connections.
Apple ML: Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
Examines transferring optimal global hyperparameters across model sizes through a unified Complete Parameterisation that spans width, depth, batch size, and training duration, and demonstrates per-module hyperparameter transfer to speed up training of large language models.
Apple ML: Faster Rates For Federated Variational Inequalities
Faster convergence in federated stochastic variational inequalities is achieved by refining Local Extra SGD guarantees and introducing the Local Inexact Proximal Point Algorithm with Extra Step (LIPPAX) to mitigate client drift and extend guarantees to composite VIs.