Beavers

Beavers is a python library for stream processing, optimized for analytics.

It is used at Tradewell Technologies, to calculate analytics and serve model predictions, for both realtime and batch jobs.

Key Features

Works in real time (eg: reading from Kafka) and replay mode (eg: reading from Parquet files).
Optimized for analytics, using micro-batches (instead of processing records one by one).
Similar to incremental, it updates nodes in a dag incrementally.
Taking inspiration from kafka streams, there are two types of nodes in the dag:
- Stream: ephemeral micro-batches of events (cleared after every cycle).
- State: durable state derived from streams.
Clear separation between the business logic and the IO. So the same dag can be used in real time mode, replay mode or can be easily tested.
Functional interface: no inheritance or decorator required.
Support for complicated joins, not just "linear" data flow.

No concurrency support. To speed up calculation use libraries like pandas, pyarrow or polars.
No async code. To speed up IO use kafka driver native thread or parquet IO thread pool.
No support for persistent state. Instead of saving state, replay historic data from kafka to prime stateful nodes.