Fund II closed in September 2024 at $40M. We've been actively deploying for about eighteen months, and I want to give a candid update on what we've seen: what the market looks like from a seed investor's vantage point in early 2026, what we got right in our initial thesis, and what we've updated based on what we actually found when we went looking for companies to back.

This is not a marketing document. It's a working memo for founders in our space who want to understand how we're thinking, and for LPs who want more signal than a quarterly letter provides.

What we got right: inference cost as the organizing axis

The central thesis of both funds is that inference cost and latency — not model capability — would determine which AI applications actually ship to users. Eighteen months into Fund II deployment, that bet looks correct, but in a more compressed and intense way than we anticipated.

Commoditization at the model layer has accelerated. The open-weights ecosystem has moved faster than most infrastructure investors predicted: Llama 3 and its successors closed a significant portion of the capability gap to frontier models, and inference optimization techniques (quantization, distillation, speculative decoding) have made deploying capable open-weights models economically viable at production scale. The result: companies that built their moat on API access to a particular model have had their advantage compressed. Companies that built their moat on inference infrastructure (the serving layer, the optimization stack, the deployment automation) have held up better.

This validates the infrastructure-first framing. The model is increasingly a commodity; the systems that make models economically deployable are not.

What we updated: the multi-modal acceleration

When we were writing the Fund II thesis in early 2024, multi-modal models — vision-language models, audio-language models — were an interesting research area with limited production deployments. They accounted for a small fraction of the inference workloads we saw across the portfolio.

By the end of 2024, the trajectory had changed. Production deployments of vision-language models were growing faster than pure-text LLM deployments in several enterprise segments. The inference infrastructure challenges for multi-modal models are meaningfully different: higher memory pressure from image embeddings, different batching constraints because image encoding and text generation have different compute profiles, and serving architectures that have to handle both modalities efficiently in a single request pipeline.

We've updated our investment focus to include teams specifically addressing multi-modal inference infrastructure. The serving software that handles pure-text LLMs efficiently is not automatically efficient for multi-modal workloads — the abstraction layer isn't as portable as it might look from the outside.

Where we've deployed capital

Four investments from Fund II so far, with roughly $12M deployed against $40M committed. The remaining $28M is available for new investments and follow-on rounds in existing portfolio companies.

Tensorwave: GPU cloud infrastructure with a specific focus on the inference performance profile rather than the training-first capacity model. The bet here is that inference-optimized clusters — different networking topology, different memory configuration, different software stack — will win a meaningful share of the production inference market from training-first GPU clouds. We led the seed round in early 2024.

Unsloth: fine-tuning optimization that reduces the compute and memory cost of training small-to-medium models on domain-specific data. The core technical insight is that standard implementations of training operations leave significant efficiency on the table for fine-tuning use cases, and Unsloth's kernel optimizations recover that efficiency without sacrificing numerical precision. We invested at seed in 2025.

Nscale: European-domiciled GPU infrastructure with a compliance and data residency focus. The regulatory thesis — GDPR, DORA, and sector-specific data localization requirements creating demand for EU-sovereign AI compute — has played out faster than we expected as enterprise AI adoption has moved from pilot to procurement. Seed investment in 2025.

Inferless: serverless model serving with a focus on developer experience and cold-start latency reduction. The bet is that the developer primitives for deploying models as serverless functions are still too complex, and that a serving platform with dramatically lower cold-start latency unlocks a class of use cases currently impractical on existing serverless infrastructure. Seed investment in 2026.

What we're still looking for

About $28M remains to deploy over roughly 12-18 months. The areas where we're actively looking:

Inference observability. Production LLM systems have poor observability compared to what mature web infrastructure teams are accustomed to. Request-level telemetry, quality signal collection, and anomaly detection for serving systems are underdeveloped. The companies building this well will have a structural advantage as AI systems move into regulated industries where auditability is a requirement.
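To make "request-level telemetry" and "anomaly detection" concrete, here is a minimal sketch of what the smallest useful version might look like. The record fields, the function name, and the z-score threshold are all illustrative assumptions, not a description of any portfolio company's product or of any existing tool.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class RequestRecord:
    """One row of request-level telemetry (hypothetical schema)."""
    request_id: str
    model: str
    prompt_tokens: int
    output_tokens: int
    time_to_first_token_ms: float
    total_latency_ms: float


def latency_outliers(records, z_threshold=3.0):
    """Return records whose total latency is more than z_threshold
    population standard deviations away from the batch mean."""
    latencies = [r.total_latency_ms for r in records]
    if len(latencies) < 2:
        return []
    mu = mean(latencies)
    sigma = pstdev(latencies)
    if sigma == 0:
        # Every request took the same time, so nothing stands out.
        return []
    return [
        r for r in records
        if abs(r.total_latency_ms - mu) / sigma > z_threshold
    ]
```

A production system would need far more than this (per-model baselines, streaming windows, quality signals rather than latency alone), which is part of why we think the category is underbuilt.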

Long-context serving infrastructure. As context windows grow (128K and even 1M tokens are now realistic deployment configurations), the KV cache management problem becomes qualitatively harder. The cache grows linearly with sequence length, so a single long request can occupy the accelerator memory that would otherwise hold many short ones. Standard continuous batching implementations struggle at these context lengths. Teams building serving architectures specifically for long-context workloads are addressing a real problem that will grow in importance.
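A rough back-of-the-envelope calculation shows why the KV cache dominates at these lengths. The model shape below (32 layers, 8 key/value heads, head dimension 128, 16-bit values) is an assumed, illustrative configuration, and the function name is ours; real deployments vary with architecture and cache precision.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2):
    """Approximate KV cache size for one sequence, in bytes.

    The leading factor of 2 accounts for storing both a key tensor
    and a value tensor in every layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


GIB = 2 ** 30

# Assumed illustrative model shape, not a specific product's configuration.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128)

cache_128k = kv_cache_bytes(seq_len=128 * 1024, **shape)
cache_1m = kv_cache_bytes(seq_len=1024 * 1024, **shape)

print(f"128K tokens: {cache_128k / GIB:.0f} GiB per sequence")
print(f"1M tokens:   {cache_1m / GIB:.0f} GiB per sequence")
```

Under these assumptions a single 128K-token request holds about 16 GiB of cache and a single 1M-token request about 128 GiB, before counting model weights. That is more than the memory of most single accelerators, which is why long-context serving tends to require paging, offloading, or cache compression rather than simply larger batches.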

Compound AI serving. Single-model inference is being replaced by multi-step pipelines — retrieval, reasoning, tool use, generation — that have to be orchestrated efficiently. The infrastructure for efficient compound AI system serving is poorly abstracted in current tooling. We're looking for teams tackling the serving layer for these systems specifically.

If you're building in any of these areas and want a technical conversation, the fastest path is a direct email to [email protected]. We review every inbound message and respond to those where there's genuine alignment, usually within a week.