BoolSi announces its $6M seed round

§ 00 TL;DR

We enable software developers to compile hotspots into custom hardware accelerators.

Software engineers can already get 100× speedups for their workloads. They just have to spend a decade learning to design digital logic first. We're building the compiler that removes that prerequisite.

Feed BoolSi a hotspot in C, C++, or any other high-level language, and out comes a custom-generated circuit and driver, in minutes instead of months. The target is reconfigurable silicon (FPGAs) sitting next to the CPU, and our initial focus is embedded developers in robotics, the people who feel every wasted microsecond in their product.

The core idea: we train machine learning (ML) models to learn a program's behavior, not its implementation. We've developed an architecture in which neural networks naturally converge into 100% accurate digital circuits, analyzable and optimizable with the existing chip design toolchain.

The AI isn't designing a chip. The AI gets solidified into a chip.

We've raised $6M led by Fine Structure Ventures (an F-Prime fund), with participation from Pillar VC, Fifth Quarter Ventures, and Coalition Ventures, to put this in the hands of developers who need better performance, lower latency, and faster time to market.

We're opening a private beta in Q3 2026. If your product is bottlenecked on a CPU or a microcontroller, we want to hear from you.

§ 01 Problem

Chip design is closer to watchmaking than to writing software.

General-purpose CPUs are remarkably wasteful on any single workload. Every instruction has to be fetched, cached, decoded, and pushed through a pipeline designed to be okay at everything and great at nothing.

[Figure: CPUs have to serially work around architectural limitations.]

Custom hardware sidesteps all of that by trading silicon area for time. Instead of streaming a workload serially through a generalist pipeline, a purpose-built chip spreads the computation out in space: every operation gets its own gates, every data path its own wires, all running in parallel. The narrower the workload it has to support, the more aggressively it can be specialized, which is why purpose-built chips routinely run fixed workloads orders of magnitude faster than CPUs.
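To make the area-for-time trade concrete, here is a toy sketch (not BoolSi code; the function names `serial_sum` and `tree_sum` are made up for illustration). It sums eight values two ways: as a serial chain, the way a loop on a CPU would, and as a balanced adder tree, the way a circuit could. Both use seven additions, but the tree's longest chain of dependent additions is three deep instead of seven.

```python
def serial_sum(values):
    """Sum left to right; every add depends on the previous one."""
    total, depth = values[0], 0
    for v in values[1:]:
        total += v
        depth += 1  # one more dependent step on the critical path
    return total, depth


def tree_sum(values):
    """Sum pairwise in levels; adds within a level are independent."""
    level, depth, adders = list(values), 0, 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders += len(nxt)
        if len(level) % 2:  # an odd element passes through to the next level
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth, adders


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(serial_sum(data))  # (31, 7): result, critical-path depth
print(tree_sum(data))    # (31, 3, 7): result, depth, adders used
```

In hardware terms the tree spends seven physical adders to get an answer after three adder delays. The serial version reuses one adder seven times.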

[Figure: Field Programmable Gate Arrays (FPGAs) adapt their architecture to fit the problem.]

So why isn't everyone using custom hardware? Because designing chips is closer to watchmaking than to programming.

In software, mistakes degrade gracefully. Bugs get skipped on most runs, exceptions get caught, stack state clears every time you exit a function, and you can reason about a single point of execution at a time. Chips, on the other hand, are extremely complex distributed systems with millions of subcircuits all talking to each other a billion times a second, often for years without downtime. Every part of a chip is tailor-made for its neighbors, more like cogs in a mechanical watch than functions in a library. You can't plug in a module the way you import a library, because you also have to know exactly when it does what it does. Does this multiplier return a result in one cycle or four? Does it finish faster if one of the inputs is zero?

The result is that hardware is painful to build, worse to debug, and treated by the companies that build it as cherished IP that mustn't be shared. Open-source resources are thin, the cost of entry is enormous, and time to market is measured in years.

The vast majority of software engineers have no viable path into hardware, even when their applications would massively benefit from it.

§ 02 Approach

Introducing the hardware glue layer.

Think of the scene in Episode V where Chewbacca fixes C-3PO by jury-rigging a head onto a neck and splicing a couple of (maybe random?!) wires. That is what hardware engineering should feel like. You should be able to grab the parts, wire them up, and iterate until it works.

I started BoolSi because I wanted to live in a world where I can build chips at the same speed I write software. In software I can pull in a few libraries, glue them together, and solve my problem in an afternoon. LLMs have further collapsed the distance between an idea and working code, and the pool of people who can ship code has never been larger. We're nowhere close to that in hardware, and the gap is growing. General-purpose code is now reappearing as the slow part of the stack, and the next compression is from working code to working silicon.

Today, hardware is specified in prose documents and then manually translated into digital logic by an engineer who interprets the spec. But a program that exercises the desired behavior is the best kind of spec there is: exact, executable, exhaustive, and already machine-readable. What's needed is a second compiler stage that takes that code and turns it into circuits, closing the loop from intent to silicon.

The catch is that code doesn't map onto circuits the way it maps onto a CPU. A line of C doesn't correspond to a wire or a gate the way a node in a schematic does. Software is sequential, hardware is spatial.

The right compilation target isn't the program's text, it's the program's behavior: the transformation it performs from inputs to outputs. What does this function do, not how does it do it.

[Figure: Running a program on inputs and collecting outputs reveals program behavior.]

That reframing is the whole game. Once you accept that what you're trying to reproduce is the transformation, not the source, the question shifts from "how do I translate syntax into gates" to "how do I learn this function and emit a circuit that computes it." And learning input-output behavior turns out to be a problem ML handles well.
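A small illustration of "behavior, not implementation" (a sketch with hypothetical function names, unrelated to BoolSi's actual tooling): two routines that count the set bits of an 8-bit value in structurally different ways. Compared by source text they have nothing in common. Compared by their input-to-output tables they are the same function.

```python
def popcount_loop(x):
    """Count set bits one at a time (8-bit input)."""
    count = 0
    for i in range(8):
        count += (x >> i) & 1
    return count


def popcount_swar(x):
    """Count set bits with parallel partial sums (8-bit input)."""
    x = (x & 0x55) + ((x >> 1) & 0x55)  # sums of adjacent bits
    x = (x & 0x33) + ((x >> 2) & 0x33)  # sums of adjacent 2-bit fields
    x = (x & 0x0F) + ((x >> 4) & 0x0F)  # sum of the two 4-bit fields
    return x


def behavior(fn, inputs):
    """The input-to-output table is the only thing being compared."""
    return {x: fn(x) for x in inputs}


all_inputs = range(256)
same = behavior(popcount_loop, all_inputs) == behavior(popcount_swar, all_inputs)
print(same)  # True
```

Any circuit that reproduces this table is a valid compilation of either routine, whichever one the developer happened to write.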

You might reasonably ask how this differs from high-level synthesis. HLS tools like Vivado HLS, Catapult, and Bambu have made existing computer architects more productive, but no HLS tool to date has opened the field to software engineers. BoolSi is aimed squarely at that gap.

§ 03 Method

Breaking away from ML intuitions.

My background is in neural networks, but I did my PhD in computer architecture, and the thing that became impossible to ignore is that the two fields keep reinventing the same primitives under different names. Two communities, separated mostly by vocabulary, have spent decades building parallel taxonomies of the same underlying objects. That correspondence is what makes the whole approach possible: a neural network isn't a foreign thing being coerced into hardware, it already is (temporarily analog) hardware, described in a different language.

Feedforward DNN → Combinational circuit
RNN / LSTM / GRU → Latches and registers
Attention → Content-addressable memory
Neural Turing Machine → Addressing unit

Once you see the correspondence, the question stops being "can ML produce a circuit" and starts being "can a neural network be trained to behave like a digital circuit?" That's the architecture work BoolSi has been doing.
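The first row of that correspondence is easy to see in miniature. The sketch below is a textbook illustration, not BoolSi's architecture, and the weights are hand-picked for the example. A single neuron with a hard step activation is a NAND gate, and a small feedforward arrangement of such neurons is a combinational XOR circuit.

```python
def neuron(inputs, weights, bias):
    """One unit with a hard step activation: the output is 0 or 1."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0


def nand(a, b):
    # Hand-picked weights: the sum is negative only when both inputs are 1.
    return neuron((a, b), (-1.0, -1.0), 1.5)


def xor(a, b):
    # Four NAND neurons wired feedforward, exactly like the 4-NAND XOR circuit.
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


table = [(a, b, xor(a, b)) for a in (0, 1) for b in (0, 1)]
print(table)  # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Here the weights were set by hand. The open question the text raises is whether training can reach such exact, discrete settings on its own.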

The setup is also unusual on the data side. Once you frame the problem as learning input-output behavior, the source program is a synthetic data generator. A fuzzer explores the input space: edge cases, hard-to-reach states, and the unglamorous middle. Every run is a perfectly labeled training example: these inputs, that output. There is no dataset to collect and no labels to annotate. The fuzzer does the job a data labeling pipeline would do in a normal ML project, except it's exact and effectively unlimited. That alone inverts a few ML intuitions:

The dataset is exhaustive and exact, so 100% accuracy is both achievable and required.
The training and test sets effectively never overlap, because the input space is so large that your fuzzer is unlikely to ever sample the same input twice.
Overfitting is essentially impossible. With an unbounded dataset, any circuit that reaches zero loss tends to be correct because it's small, not because it's memorized.
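The "program as data generator" idea can be sketched in a few lines. Everything here is an assumption for illustration: `reference` stands in for the developer's hotspot (a saturating 8-bit add), and `fuzz_examples` is a plain random sampler biased toward boundary values, which is far simpler than a real coverage-guided fuzzer.

```python
import random


def reference(a, b):
    """Stand-in for the source program: saturating 8-bit add."""
    return min(a + b, 255)


def fuzz_examples(fn, n, seed=0):
    """Yield n labeled examples ((a, b), fn(a, b)) from random inputs."""
    rng = random.Random(seed)
    edge = [0, 1, 127, 128, 254, 255]  # boundary values worth over-sampling
    for _ in range(n):
        a = rng.choice(edge) if rng.random() < 0.25 else rng.randrange(256)
        b = rng.choice(edge) if rng.random() < 0.25 else rng.randrange(256)
        yield (a, b), fn(a, b)  # the program itself supplies the label


dataset = list(fuzz_examples(reference, 1000))
print(len(dataset), dataset[0])
```

Every label comes from running the program, so it is correct by construction, and asking for more data is a matter of raising `n`.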

On the architecture side, we've developed networks that are a strict superset of the circuits that can implement a program's behavior. Training drives the weights toward exact discrete values, and reaching zero loss is synonymous with the network having become a fully digital circuit.
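To give a feel for what "weights converging to exact discrete values" can mean, here is a minimal sketch of one generic relaxation. It is an assumption chosen for illustration and is not claimed to be BoolSi's architecture. A single "soft gate" is a softmax-weighted mixture of AND, OR, and XOR, each written in a form that is exact on 0/1 inputs. Gradient descent on the mixture logits pushes the weight toward one gate, and snapping to that gate yields an ordinary digital gate.

```python
import math

# Candidate gates, exact on {0, 1} inputs and smooth in between.
GATES = {
    "AND": lambda a, b: a * b,
    "OR": lambda a, b: a + b - a * b,
    "XOR": lambda a, b: a + b - 2 * a * b,
}
NAMES = list(GATES)
ROWS = [(a, b) for a in (0.0, 1.0) for b in (0.0, 1.0)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train(target, steps=300, lr=1.0):
    """Gradient descent on the logits of a soft mixture over GATES."""
    z = [0.0] * len(NAMES)
    for _ in range(steps):
        p = softmax(z)
        grad = [0.0] * len(NAMES)
        for a, b in ROWS:
            outs = [GATES[n](a, b) for n in NAMES]
            y = sum(pk * o for pk, o in zip(p, outs))  # soft gate output
            err = y - target(a, b)
            for j, o in enumerate(outs):
                grad[j] += 2 * err * p[j] * (o - y)  # d(err^2)/d(z_j)
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return softmax(z)


def harden(p):
    """Snap the soft mixture to its dominant gate."""
    return NAMES[max(range(len(p)), key=p.__getitem__)]


probs = train(lambda a, b: float(a != b))  # target behavior: XOR
gate = harden(probs)
print(gate, [round(x, 3) for x in probs])  # the selected gate is XOR
```

In this toy the loss only approaches zero as the mixture approaches a one-hot choice, so the final snap is a separate step. The text above describes an architecture in which zero loss and full discreteness coincide.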

The model doesn't approximate the circuit it's converging on. It collapses into it.

[Figure: BoolSi trains neural networks that naturally converge into digital circuits.]

Verification falls out of the same setup. We run functional verification against the source program and train multiple independent models in parallel, then formally check those models against each other. A bug has to corrupt every replica in the same way to slip past both checks.
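The two-layer check can be sketched as follows. The assumptions are loud ones: the replicas are ordinary Python functions rather than trained circuits, the "functional verification" uses a fixed sample of inputs, and exhaustive enumeration over a 4-bit domain stands in for a formal equivalence checker. All names are hypothetical.

```python
from itertools import product


def reference(a, b):
    """Stand-in for the source program: maximum of two 4-bit values."""
    return a if a > b else b


def replica_1(a, b):
    return max(a, b)


def replica_2(a, b):
    gt = int(a > b)  # branch-free variant of the same behavior
    return gt * a + (1 - gt) * b


def buggy_replica(a, b):
    # Wrong on exactly one input pair, to show what each check can catch.
    return 0 if (a, b) == (15, 15) else max(a, b)


DOMAIN = list(product(range(16), repeat=2))
SAMPLED = DOMAIN[::7]  # stand-in for fuzzed test inputs; misses (15, 15)


def functional_ok(candidate):
    """Sampled check of a candidate against the source program."""
    return all(candidate(a, b) == reference(a, b) for a, b in SAMPLED)


def first_mismatch(f, g):
    """Exhaustive cross-check: first input where f and g disagree."""
    for a, b in DOMAIN:
        if f(a, b) != g(a, b):
            return (a, b)
    return None


print(functional_ok(buggy_replica))             # True: the sample missed the bug
print(first_mismatch(replica_1, buggy_replica))  # (15, 15)
print(first_mismatch(replica_1, replica_2))      # None
```

The sampled check alone lets the rare bug through. The cross-check catches it because the other replica does not share the same error.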

A concrete example: a 10,000-line C regex library, pointed at the hotspot that scans a text stream for email addresses. Compiled with gcc -O3 for an ARM Cortex-A9, the matcher runs in 2.66 ms; the same code compiled to a single BoolSi hardware agent runs in 0.325 ms, an 8.2× speedup. Because the fabric parallelizes, eight agents bring it to 0.042 ms, 63× over the CPU baseline. No timing was hand-tuned; the circuit is a build artifact, verified against the original C.
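The quoted ratios can be rechecked directly from the three timings. The scaling-efficiency figure is derived here from those same numbers and is not a claim made in the original measurements.

```python
cpu_ms = 2.66           # gcc -O3 on ARM Cortex-A9
one_agent_ms = 0.325    # single BoolSi hardware agent
eight_agents_ms = 0.042  # eight agents in parallel

speedup_1 = cpu_ms / one_agent_ms
speedup_8 = cpu_ms / eight_agents_ms
scaling = one_agent_ms / eight_agents_ms  # gain from 1 to 8 agents
efficiency = scaling / 8                  # fraction of ideal 8x scaling

print(f"{speedup_1:.1f}x")   # 8.2x
print(f"{speedup_8:.0f}x")   # 63x
print(f"{scaling:.2f}x")     # 7.74x
print(f"{efficiency:.0%}")   # 97%
```

Going from one agent to eight recovers about 7.7× of the ideal 8×, which is consistent with the statement that the fabric parallelizes.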

That's the technical core of what we're building. Feed in sensor inputs and desired actuator outputs, and the compiler learns the control logic. Feed in a UART stream and an SPI interface, and it learns the protocol translation. The developer describes what, not how.

§ 04 Vision

Every CPU deserves a co-processor.

Our long-term bet is that every CPU in existence benefits from some amount of reconfigurable logic next to it. General-purpose processors are a remarkable feat of engineering, but the workloads people actually run on them aren't general-purpose. They're loops, kernels, and protocol handlers that the same silicon executes a billion times over. There's no good reason for those hotspots to keep paying the fetch-decode-execute tax forever.

Because BoolSi compiles behavior rather than parsing syntax, the input language is mostly a matter of taste. C, C++, and Rust are the obvious targets, but Python, MATLAB, JavaScript, or anything else that can be executed on inputs and produce outputs works the same way. The compiler doesn't care how the spec was written, only what it does.

§ 05 Next steps

Embedded first.

In the near term we're focused on the developers who feel the CPU tax most acutely: embedded engineers in robotics, where every microsecond and every wasted cycle shows up in the product. The workloads are concrete: tight motor control loops, sensor fusion, optical flow estimation, model-predictive control, and the list goes on. They all share the same problem: a general-purpose processor burning cycles and adding latency on the same hot loop, over and over. A custom accelerator changes what's physically possible for these teams.

This isn't only for engineers who can't build hardware. Even a seasoned hardware team that could design the accelerator by hand rarely has the months it takes. BoolSi compresses that into a loop they can run in an afternoon. We're starting with FPGAs because they let a developer ship something today, and because the toolchain we're building generalizes naturally to ASICs as workloads stabilize.

— Mihailo Isakov Founder & CEO, BoolSi