Show HN: I built Wool, a lightweight distributed Python runtime

Show HN: I built Wool, a lightweight distributed Python runtime (github.com)

4 points by bzurak 5 hours ago | 2 comments

I spent a long time working in the payments industry, specifically on a rather niche reporting/aggregation platform with spiky workloads that were not easily parallelized. To pump as much data through our pipeline as possible, we had to rely on complex locking schemes across half a dozen or so not-so-micro services - keeping a clear mental picture of how the services interacted for a given data source was a major headache. This problem always intrigued me, even after I no longer worked at the company, and lead to the development of Wool.

If you've worked with frameworks like Ray or Prefect, you're probably familiar with the promise of going from script to scale in two lines of code (or something along those lines). This is essentially the solution I was looking for: a framework with limited boilerplate that facilitated arbitrary distribution schemes within a single, coherent codebase. What I was hoping for, though, was something a little bit more focused - I wasn't working on ML pipelines and didn't need much else other than the distribution layer. This is where Wool comes in. While it's API is very similar to those of Ray and Prefect, where it differentiates itself is in its scope and architecture.

First, Wool is not a task orchestrator. It provides push-based, best-effort, at-most-once execution. There is no built-in coordination state, retry logic, or durable task tracking. Those concerns remain application-defined. The beauty of Wool is that it looks and feels like native async Python, allowing you to use purpose-built libraries for your needs as you would for any other Python app (with some caveats).

Second, Wool was designed with speed in mind. Because it's not bloated with features, it's actually pretty fast, even in its current nascent state. Wool routines are dispatched directly to a decentralized peer-to-peer network of gRPC workers, who can distribute nested routines amongst themselves in turn. This results in low dispatch latencies and high throughput. I won't make any performance claims until I can assemble some more robust benchmarks, but running local workers on my M4 MacBook Pro (a trivial example, I know), I can easily achieve sub-millisecond dispatch latencies.

Anyway, check it out, any and all feedback is welcome. Regarding docs- the code is the documentation for now, but I promise I'll sort that out soon. I've got plenty of ideas for next steps, but it's always more fun when people actually use what you've built, so I'm open to suggestions for impactful features.

-Conrad