// Thinking
Writing from the team.
Technical analysis, thesis formation, and field notes on inference cost, model serving, quantization, GPU markets, and the infrastructure layer we invest in. Written by practitioners, not communications teams.
May 22, 2026
Edge Inference: Emerging Architectures and Where the Value Goes
Mar 17, 2026
Inference Market Consolidation: A March 2026 Update
Jan 19, 2026
Model Serving Cost Economics in Early 2026
Oct 14, 2025
Multi-Modal Inference: The Next Frontier for Infrastructure Builders
Jul 22, 2025
Inference Optimization: State of the Art and Where It Goes Next
May 8, 2025
AI Winter Is Not Coming — But the Application Layer Will Thin
Mar 19, 2025
Fund II: Eighteen Months In
Feb 3, 2025
Continuous Batching and the Path to Real Throughput Gains
Dec 16, 2024
Sovereign AI Clouds: Why Geography Will Matter for Inference
Nov 4, 2024
What 'Operator Capital' Actually Means in AI Infrastructure
Sep 17, 2024
Real-Time Recommendation as an Inference Architecture Problem
Jul 29, 2024
Model Compression at Production Scale
Jun 10, 2024
Why CI/CD for ML Pipelines Is the Right Abstraction
Apr 22, 2024
Latency vs. Throughput: The Fundamental Trade-off in Inference Systems
Mar 6, 2024
GPU Cloud Consolidation: The Next Structural Shift
Jan 30, 2024
Quantization and the Efficiency Frontier
Nov 14, 2023
The Case for Serverless GPU Infrastructure
Sep 25, 2023
Serverless Model Serving as a Developer Primitive
Aug 7, 2023
Model Routing Is the New Load Balancer
Jun 18, 2023
Why Infrastructure Bets Win Before Application Layer Shakes Out
May 2, 2023
The Inference Cost Inflection Point
Mar 12, 2023
The Fine-Tuning Landscape in Early 2023
Jan 23, 2023
Inference vs. Training: Where Value Accrues Over Time
Nov 8, 2022
A Mental Model for the AI Infrastructure Stack
Sep 14, 2022
Why We Backed Inference Before the World Cared