We invest where inference meets production.
Pre-seed and seed investments in AI inference infrastructure and model optimization: the systems that close the gap between a trained model and a deployed one.
We invest in startups building the systems layer beneath foundation models.
The model is not the moat. The infrastructure is.
Foundation models are increasingly commoditized. GPT-4-class capability is available as open weights. The durable value in the AI stack is not who trained the largest model — it is who built the infrastructure that makes models economically deployable at the request volumes that real applications require.
Inference cost per token is still dropping by roughly an order of magnitude every 18 months. Latency constraints are tightening as AI enters real-time workloads such as recommendation, code generation, and document processing. Fine-tuning is shifting from research experiment to production requirement. Model routing between providers is an active engineering problem with measurable cost consequences, not a future abstraction.
We invest in the founding teams who have built enough of this infrastructure to see its failure modes clearly — and who are building the production primitives that didn't exist when they needed them. GPU scheduling, serverless serving, continuous batching, quantization tooling, model routing, CI/CD for ML pipelines. The companies that solve these problems compound across every application vertical above them.
Continuous batching, INT4/INT8 quantization, speculative decoding, hardware-aware kernel scheduling. Teams driving down cost per token at the throughput levels production workloads actually require.
Serverless GPU inference, model compression for edge deployment, multi-cloud serving primitives, cold-start elimination. The gap between a model checkpoint and a production endpoint — these companies close it.
Parameter-efficient fine-tuning (LoRA, QLoRA), training memory optimization, domain adaptation pipelines. Moving custom model training from research scripts into repeatable engineering workflows.
CI/CD for ML pipelines, intelligent model routing and cost-based selection, observability and latency tracing for inference systems. The operational layer that production ML teams are still building by hand.
Two funds. One thesis.
Technical operators, not just capital.
Firntal's partners built GPU scheduling layers, ML serving platforms, and low-latency execution systems before they managed capital. The support is grounded in that experience — not in a playbook written for a different category.
Sarah reviews system architecture directly with founding teams as part of diligence: serving stack, batching design, hardware assumptions, scaling model. Founders who engage deeply with those questions are typically the ones we back. The review is also an accurate preview of how we operate post-investment.
Our network, built through years of practitioner work, spans hyperscale cloud engineering teams, GPU hardware suppliers, and open-source ML communities. We make introductions that matter: to engineers who can become hires, to cloud allocation contacts, and to infrastructure founders who have solved adjacent problems.
Lukas joins as board observer or director on most investments. We arrive with prepared positions on technical strategy, financial trajectory, and fundraising posture. We do not arrive at board meetings to ask what the team has been doing.
The ETH Zurich distributed systems and ML engineering community runs through Firntal. Introductions to senior ML engineers, inference infrastructure leads, and potential founding CPOs — from networks built as practitioners, not assembled as investors.
Niklas maps infrastructure buyers across financial services, healthcare, and manufacturing, sectors where AI inference is entering procurement cycles. Introductions are specific rather than generic warm notes: they go to engineering directors and CTO-level buyers who are actively evaluating inference infrastructure.
We start fundraising preparation nine months before portfolio companies need capital. Financial model construction, investor targeting, and narrative positioning are built with time to iterate, not assembled in the two weeks before a runway conversation becomes urgent.
Building in this space?
We review every inbound note from founders working on inference infrastructure, model optimization, or the systems layer beneath foundation models. The fastest path to a conversation is a direct note to Lukas; no deck is required to start.
[email protected]