Apple ML | Closing the Gap Between Text and Speech Understanding in LLMs
An analysis of the text-speech understanding gap in LLMs, presenting SALAD (Sample-efficient Alignment with Learning through Active selection and cross-modal Distillation) as a data-efficient cross-modal distillation approach to improve alignment between speech and text in 3B/7B LLMs, with implications for multimodal decoding and streaming TTS as seen in Visatronic and SpeakStream.
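This is a minimal sketch of the cross-modal distillation idea. It assumes that the model's next-token distributions on a text input act as the teacher for its distributions on the paired speech input. It also assumes that "active selection" means prioritizing the samples with the largest text-speech gap. `distillation_loss` and `select_active` are hypothetical illustrations, not SALAD's published loss or selection rule.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    text_logits: Sequence[Sequence[float]],
    speech_logits: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Mean per-position KL between predictions on the text input (teacher)
    and predictions on the paired speech input (student). Assumed objective."""
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(text_logits, speech_logits)
    ]
    return sum(losses) / len(losses)


def select_active(gaps: Sequence[float], k: int) -> List[int]:
    """Hypothetical active selection: indices of the k samples whose
    text-vs-speech gap (e.g. distillation loss) is largest."""
    return sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[:k]


# Toy example: 2 positions over a 3-token vocabulary.
text = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
speech = [[1.0, 1.0, -0.5], [0.1, 1.5, 0.3]]
print(distillation_loss(text, speech))
print(select_active([0.02, 0.40, 0.15, 0.31], k=2))  # -> [1, 3]
```

In this reading, data efficiency comes from spending the training budget on the samples where the speech path diverges most from the text path.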
Apple ML | depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
depyf demystifies the PyTorch 2.x compiler by decompiling its bytecode back into source code. It also hooks in-memory code objects to on-disk source files, which enables line-by-line debugging through a lightweight workflow built on two context managers.
OpenAI | Arvind KC appointed Chief People Officer
An examination of Arvind KC's appointment as Chief People Officer and its implications for HR technology adoption, people analytics, and data-driven talent strategy.
Meta | RCCLX: Innovating GPU communications on AMD platforms
RCCLX is the open-source AMD-optimized RCCL backend integrated with Torchcomms, introducing Direct Data Access (DDA) and Low-Precision Collectives to accelerate AI training and inference on AMD GPUs, with CTran integration enabling GPU-resident AllToAllvDynamic.
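The following pure-Python simulation illustrates the general idea behind low-precision collectives. It models a sum all-reduce in which each rank sends int8 codes plus a single float scale instead of full-precision values. The int8 format, per-tensor scaling, and function names are assumptions for illustration only; RCCLX's actual formats and kernels are not described in this summary.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: int codes plus one float scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    return [c * scale for c in codes]


def low_precision_allreduce(per_rank: Sequence[Sequence[float]]) -> List[List[float]]:
    """Simulated sum all-reduce: each rank 'sends' int8 codes plus one scale
    instead of full-precision floats; the reduction itself runs in float."""
    n = len(per_rank[0])
    total = [0.0] * n
    for values in per_rank:
        codes, scale = quantize_int8(values)
        for i, x in enumerate(dequantize(codes, scale)):
            total[i] += x
    # Every rank ends up holding the same reduced vector.
    return [list(total) for _ in per_rank]


ranks = [[1.0, -2.0, 0.5], [0.25, 4.0, -1.0]]
print(low_precision_allreduce(ranks)[0])  # close to the exact sum [1.25, 2.0, -0.5]
```

The trade-off is bandwidth for accuracy. Each element carries a rounding error of at most half a scale step per contributing rank.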
Cloudflare | How we rebuilt Next.js with AI in one week
A one-week, AI-assisted reimplementation of Next.js as vinext on Vite, deploying to Cloudflare Workers with faster builds and smaller client bundles.
Apple ML | AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
Explores AMUSE, a multimodal audio-visual benchmark for agentic multi-speaker understanding, and RAFT, a data-efficient alignment framework that enhances agentic reasoning via reward-based optimization and intrinsic self-evaluation in multimodal models.
Academic Publications & Airbnb Tech: 2025 Year in Review
A concise year-in-review of Airbnb's 2025 academic publications and related engineering innovations.
AWS ML | Build an intelligent photo search using Amazon Rekognition, Amazon Neptune, and Amazon Bedrock
Build a scalable, serverless photo-search system that combines Amazon Rekognition for face and object detection, Amazon Neptune for relationship graphs, and Amazon Bedrock for contextual captioning to enable natural-language, semantic search across large image collections.
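The graph part of such a system can be sketched in memory. Photos link to the people and labels detected in them, and a query becomes "photos connected to all of these nodes". This is a hypothetical stand-in for the Neptune graph, and no AWS APIs are called. In the described architecture, detections would come from Rekognition, and a natural-language request would first be turned into structured filters; this sketch assumes that step has already happened.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, str]  # (kind, value), e.g. ("person", "alice") or ("label", "dog")


class PhotoGraph:
    """In-memory photo/person/label graph. Not the Amazon Neptune API."""

    def __init__(self) -> None:
        self.photo_to_nodes: Dict[str, Set[Node]] = defaultdict(set)

    def add_detection(self, photo_id: str, kind: str, value: str) -> None:
        # Assumed source of edges: face/label detections for one photo.
        self.photo_to_nodes[photo_id].add((kind, value.lower()))

    def search(self, people: Iterable[str] = (), labels: Iterable[str] = ()) -> List[str]:
        """Photo IDs linked to ALL requested people and labels."""
        wanted = {("person", p.lower()) for p in people} | {("label", lab.lower()) for lab in labels}
        return sorted(pid for pid, nodes in self.photo_to_nodes.items() if wanted <= nodes)

    def co_occurring_people(self, person: str) -> Set[str]:
        """People who share at least one photo with `person` (a relationship query)."""
        me = ("person", person.lower())
        return {v for nodes in self.photo_to_nodes.values() if me in nodes
                for k, v in nodes if k == "person" and (k, v) != me}


g = PhotoGraph()
g.add_detection("p1", "person", "Alice")
g.add_detection("p1", "person", "Bob")
g.add_detection("p1", "label", "Dog")
g.add_detection("p2", "person", "Alice")
g.add_detection("p2", "label", "Beach")
g.add_detection("p3", "label", "Dog")
print(g.search(people=["alice"], labels=["dog"]))  # ['p1']
print(g.co_occurring_people("alice"))  # {'bob'}
```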
Pinterest | Piqama: Pinterest Quota Management Ecosystem
A technical overview of Piqama, an ecosystem for Pinterest quota management.
Jane Street | Can you reverse engineer our neural network?
A concise deep-dive into reverse-engineering a handcrafted neural network puzzle using mechanistic interpretability and constraint solving (linear/integer programming and SAT) to expose an MD5-like computation encoded in its layers.
Apple ML | Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
Examines HTML-to-text extraction for LLM pretraining and shows that unioning multiple extractors increases token yield by up to 71% and improves coverage of structured content such as tables and code blocks, without compromising benchmark performance.
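One way to read "unioning multiple extractors" is sketched below. A page is kept if any extractor's output passes the quality filter, with extractors tried in priority order. Both toy extractors, the filter thresholds, and whitespace tokenization are hypothetical stand-ins, not the paper's setup.

```python
from html.parser import HTMLParser
from typing import Callable, List, Sequence

Extractor = Callable[[str], str]


class _TextCollector(HTMLParser):
    """Collects non-empty text nodes, optionally skipping anything inside <table>."""

    def __init__(self, skip_tables: bool) -> None:
        super().__init__()
        self.skip_tables = skip_tables
        self.table_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1

    def handle_data(self, data):
        if self.skip_tables and self.table_depth > 0:
            return
        if data.strip():
            self.chunks.append(data.strip())


def make_extractor(skip_tables: bool) -> Extractor:
    def extract(html: str) -> str:
        parser = _TextCollector(skip_tables)
        parser.feed(html)
        parser.close()
        return " ".join(parser.chunks)
    return extract


def passes_filter(text: str, min_tokens: int = 5, max_short_frac: float = 0.5) -> bool:
    """Hypothetical quality filter: enough tokens, not dominated by 1-char tokens."""
    tokens = text.split()
    if len(tokens) < min_tokens:
        return False
    return sum(len(t) == 1 for t in tokens) / len(tokens) <= max_short_frac


def token_yield(pages: Sequence[str], extractors: Sequence[Extractor]) -> int:
    """Keep a page if ANY extractor's output passes, using the first passing
    extractor in priority order. Tokens are whitespace-separated words."""
    total = 0
    for html in pages:
        for extract in extractors:
            text = extract(html)
            if passes_filter(text):
                total += len(text.split())
                break
    return total


prose_only = make_extractor(skip_tables=True)
with_tables = make_extractor(skip_tables=False)
pages = [
    "<p>one two three four five six</p>",
    "<p>Intro</p><table><tr><td>price</td><td>quantity</td><td>total</td><td>notes</td></tr></table>",
    "<p>alpha beta gamma delta epsilon</p><table><tr>"
    + "".join(f"<td>{i}</td>" for i in range(1, 10)) + "</tr></table>",
]
print(token_yield(pages, [prose_only]), token_yield(pages, [with_tables]),
      token_yield(pages, [prose_only, with_tables]))  # 11 11 16
```

On these three toy pages, either extractor alone yields 11 tokens, while the union yields 16. The gain comes from each extractor rescuing a page that the other loses to the filter.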
Apple ML | The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics
A technical synthesis of Chain-of-Thought reasoning, introducing Trace Dynamics and a 'potential' metric to quantify how CoT steps influence the likelihood of correct completions, with insights on transferability across LLMs and implications for LRMs and VLMs.
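Below is a hedged sketch of a "potential" curve. It assumes that potential means the probability that a model reaches a correct final answer when conditioned on the first k CoT steps, estimated here by Monte Carlo sampling. `make_toy_sampler` is a hypothetical stand-in for an LLM; with a real model, `sample_completion` would generate a completion from the prompt plus the prefix.

```python
import random
from typing import Callable, List, Sequence

Sampler = Callable[[Sequence[str]], str]


def estimate_potential(prefix: Sequence[str], sample_completion: Sampler,
                       is_correct: Callable[[str], bool], n_samples: int = 64) -> float:
    """Monte Carlo estimate of P(correct final answer | CoT prefix)."""
    hits = sum(1 for _ in range(n_samples) if is_correct(sample_completion(prefix)))
    return hits / n_samples


def potential_curve(steps: Sequence[str], sample_completion: Sampler,
                    is_correct: Callable[[str], bool], n_samples: int = 64) -> List[float]:
    """Potential after conditioning on 0, 1, ..., len(steps) CoT steps."""
    return [estimate_potential(steps[:k], sample_completion, is_correct, n_samples)
            for k in range(len(steps) + 1)]


def make_toy_sampler(seed: int = 0) -> Sampler:
    """Hypothetical toy model: each step containing '=' makes a correct
    final answer more likely."""
    rng = random.Random(seed)

    def sample(prefix: Sequence[str]) -> str:
        p = min(1.0, 0.1 + 0.3 * sum("=" in s for s in prefix))
        return "42" if rng.random() < p else "wrong"

    return sample


steps = ["6 * 7 = 42", "double-check: 42 / 7 = 6", "so the answer is 42"]
curve = potential_curve(steps, make_toy_sampler(), lambda ans: ans == "42", n_samples=200)
print(curve)  # expected values are 0.1, 0.4, 0.7, 0.7; estimates vary with sampling
```

Under this reading, a step's influence is the jump in potential it causes. In the toy trace, the final restatement adds nothing.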