Bytewax v0.18 is out now!

Stream processing
purely in Python

Open source framework and distributed stream processing engine. Build streaming data pipelines and real-time apps with everything you need: recovery, scalability, windowing, aggregations, and connectors.

How it works

Build streaming data applications easily. In Python.

Step 1: Easy install
> pip install bytewax
Step 2: Connect to data sources
from bytewax import operators as op
from bytewax.connectors.kafka import operators as kop
from bytewax.dataflow import Dataflow

BROKERS = ["localhost:19092"]
IN_TOPICS = ["in_topic"]
OUT_TOPIC = "out_topic"

flow = Dataflow("kafka_in_out")
kinp = kop.input("inp", flow, brokers=BROKERS, topics=IN_TOPICS)
op.inspect("inspect-errors", kinp.errs)
op.inspect("inspect-oks", kinp.oks)
kop.output("out1", kinp.oks, brokers=BROKERS, topic=OUT_TOPIC)
Step 3: Stateful operations like windowing and aggregations
from datetime import timedelta
import numpy as np
from bytewax.operators import window as wop
from bytewax.operators.window import SystemClockConfig, TumblingWindow

cc = SystemClockConfig()
wc = TumblingWindow(length=timedelta(seconds=1))

def build_array():
    return np.empty(0)

def insert_value(np_array, value):
    return np.insert(np_array, 0, value)

windowed_stream = wop.fold_window("window", stream, cc, wc, build_array, insert_value)
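The builder/folder pair above can be understood without Bytewax at all: a builder creates empty per-window state, and a folder merges each incoming value into it. A minimal stdlib sketch (the `fold` helper here is hypothetical, standing in for what the engine does when a window closes):

```python
def build_list():
    """Builder: called once per new window to create its empty state."""
    return []

def insert_value(acc, value):
    """Folder: merge one incoming value into the window's state."""
    acc.append(value)
    return acc

def fold(values, builder, folder):
    """Hypothetical helper: apply the builder/folder pair to a batch
    of values, mimicking one window's lifecycle."""
    acc = builder()
    for v in values:
        acc = folder(acc, v)
    return acc
```

With this contract, the engine never needs to know what the state looks like; it only calls the two functions you supply.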
Step 4: Use the Python tools you are familiar with
import numpy as np

avg_stream = op.map("average", windowed_stream, lambda x: np.mean(x[1]))
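The lambda above assumes windowed items arrive as `(key, values)` pairs and reduces the values to their mean. A plain-Python equivalent of that step function, using only the standard library (the name `average` is illustrative, not part of the Bytewax API):

```python
from statistics import mean

def average(item):
    """Take a (key, values) pair and replace the window's values
    with their arithmetic mean, keeping the key."""
    key, values = item
    return key, mean(values)
```

Because map steps are ordinary callables, you can unit-test them in isolation before wiring them into a dataflow.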
Step 5: Run locally
> python -m bytewax.run my_dataflow:flow
Step 6: Deploy anywhere
> waxctl df deploy my_dataflow.py
Scrapy, PyTorch, Hugging Face, pandas, NumPy, TensorFlow, Streamlit, Polars, spaCy, Requests, scikit-learn, Matplotlib, SQLAlchemy
Easy integrations

Leverage the Python Ecosystem

Bytewax can be used out of the box with any Python library, letting you connect to hundreds of data sources and use the entire ecosystem of data processing libraries.
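For example, since every step is an ordinary Python callable, a standard-library or third-party function slots straight into a map step. A minimal sketch (the function name `parse_event` is illustrative):

```python
import json

def parse_event(raw_bytes):
    """Decode a raw message payload into a Python dict.
    Any plain callable like this can be passed to op.map."""
    return json.loads(raw_bytes)
```

Swap `json` for pandas, spaCy, or a PyTorch model call and the pattern is the same: the library does the work, Bytewax moves the data.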

Guides

What can you build with Bytewax?