Visualize Streaming Data in Python

Illustration credit to Stable Diffusion

Earlier in December, I gave a workshop at PyData Global (you can check out the slides and code) on how to work with streaming data in Python. In the workshop, there was a step in the dataflow where we visualized data from multiple sensors in real time. I wanted to show a visualization in the dataflow because it illustrates how easily we, as humans, can spot outliers and draw conclusions from a chart, while writing a program to do the same can be much more difficult.

When I was developing the tutorial, I realized that it isn’t easy to use the existing Python visualization ecosystem to build a chart that updates in real time and doesn’t get overwhelmed when the data is coming fast and furious. After trying a myriad of different Python libraries, I decided to go where real-time, interactive plotting has been happening forever: JavaScript. I landed on Chart.js, which is one of the most widely used plotting libraries and makes it easy to create a simple and nice-looking visualization.

For the workshop, we used a single Google Colab notebook for everything so that the environment was easy to replicate. For the rest of this post, I’ll outline what I used to get things working and how you can add visualizations to your dataflow in Jupyter Notebook environments.

Want to skip the tutorial and go straight to the complete code? See the Colab notebook and gist.

What we will cover:

What IPython Display is and how it works
Creating plots with Chart.js
Adding plots to a dataflow to visualize streaming data

IPython Display

If you aren’t familiar with IPython, it stands for Interactive Python. IPython is a set of tools, libraries and frameworks that allow for interactive, media-rich Python environments and programs. One of these tools is a Jupyter kernel, which powers Jupyter Notebooks, an interactive computing platform.

In IPython environments, including Jupyter Notebooks, you have access to a suite of functionality for modifying and interacting with the environment. One of these is the display module, IPython.display, which provides functionality for displaying rich media within an IPython environment. This allows us to display a wide range of content, including HTML, images, videos, and more, directly within an IPython notebook. You can use this to create HTML elements and use JavaScript to dynamically update them, or even display a JavaScript-powered dashboard in a notebook.

For example, you could execute the code below in a notebook cell to render and display some content.

from IPython.display import display, HTML, Javascript

display(HTML('''<p>Having fun learning about IPython?</p>
<br>
<p>You can find out more <a href="https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html">in the docs</a></p>
'''))

Chart.js

Chart.js is a common JavaScript charting library. Its website describes it as “Simple yet flexible JavaScript charting for designers & developers”. The docs and website have many examples of how to build and design different charts with the library, and of how to use it either with a framework like React or as vanilla JavaScript embedded in an HTML script tag. It’s the latter that we are going to take advantage of with the IPython display module. Let’s take a look at some boilerplate Chart.js code embedded inline in some HTML and walk through what is going on.

<canvas id="myChart"></canvas>
<script src="https://cdn.jsdelivr.net/npm/chart.js@4.0.1/dist/chart.umd.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/chart.js/dist/chart.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-date-fns/dist/chartjs-adapter-date-fns.bundle.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-streaming@2.0.0"></script>
<script>
  var ctx = document.getElementById('myChart').getContext('2d');
  var chart = new Chart(ctx, {
    // The type of chart we want to create
    type: 'line',

    // The data for our dataset
    data: {
      datasets: []
    },

    // Configuration options go here
    options: {
      animation: false,  // disable animations
      plugins: {
        streaming: {
          frameRate: 5   // chart is drawn 5 times every second
        }
      },
      scales: {
        x: {
          type: 'time',
        }
      }
    }
  });
</script>

In the HTML above, we first create a canvas element with an id, which lets us find and update the element with JavaScript later. Next, we load a few minified JavaScript files. These are Chart.js itself plus a date adapter and a streaming plugin, which add time series functionality that is useful for real-time plots.

Below the linked files, we have JavaScript code that creates our chart. If you aren’t familiar with JavaScript, I am no expert either, but this was straightforward enough for me to piece together. In plain English, we grab our canvas element and then define our chart with some configuration. The configuration above consists of the chart type, a data object with an empty datasets array, and options that disable animation and set the x-axis to a time scale.

Now that we have our chart we need to add some additional JavaScript to update it. Inside the same script we can add a function that will take new data and update the chart.

function addData(label, value){
    // check if dataset exists
    const exists = chart.data.datasets.findIndex(o => o.label === label)
    // if not add the dataset
    if(exists === -1) {
      // pad to six hex digits so the color string is always valid
      const randomColor = Math.floor(Math.random()*16777215).toString(16).padStart(6, '0');
      chart.data.datasets.push({label: label, data: [value], borderColor : "#" + randomColor})
    } else {
      const len = chart.data.datasets.find(o => o.label === label).data.push(value)
      // optional windowing
      if(len > 10) {
        chart.data.datasets.find(o => o.label === label).data.shift()
      }
    }
    
    chart.update();
}

To better understand what is going on above, you need to know a bit about how Chart.js works. Chart.js displays anything added as a dataset and redraws the chart when chart.update() is called after a dataset is added or its data is modified. So, in this function we check whether a dataset with the given label exists in the datasets array. If it does not exist, we generate a random color and push the label, data, and border color to the datasets array. If it does exist, we push the new value onto the existing data array. push returns the new length of the array, so when that length exceeds 10 we remove the oldest point with shift(), which keeps the window of the visualization to a reasonable size.
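If JavaScript isn’t your first language, it may help to see the same bookkeeping in Python. This is only a sketch of the logic in addData, not something the chart uses: the names add_data, datasets and WINDOW are mine, and I’ve swapped the manual push and shift for a deque with a maximum length, which drops the oldest point for us.

```python
from collections import defaultdict, deque

WINDOW = 10  # same window size as the `len > 10` check in addData

# label -> most recent points; deque(maxlen=...) discards the oldest
# point automatically once the window is full
datasets = defaultdict(lambda: deque(maxlen=WINDOW))


def add_data(label, value):
    """Append a point to the dataset for `label`, creating it if needed."""
    datasets[label].append(value)
    return len(datasets[label])


# push 15 points for one sensor; only the last 10 should remain
for i in range(15):
    add_data("sensor1", {"x": i, "y": i * 2})

print(len(datasets["sensor1"]))     # 10
print(datasets["sensor1"][0]["x"])  # 5
```

The JavaScript version has to do the windowing by hand because plain arrays have no maximum length, but the effect is the same: each label keeps at most ten points, and the oldest one falls off first.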

Adding it to Python

To use the HTML and JavaScript in Python, we can build on the IPython code we wrote earlier. To have it render properly, we wrap the markup in an HTML object and pass that to display().

from IPython.display import display, HTML, Javascript

def build_chart():
  display(HTML('''
        <canvas id="myChart"></canvas>
        <script src="https://cdn.jsdelivr.net/npm/chart.js@4.0.1/dist/chart.umd.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/chart.js/dist/chart.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-date-fns/dist/chartjs-adapter-date-fns.bundle.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-streaming@2.0.0"></script>
        <script>
                  var ctx = document.getElementById('myChart').getContext('2d');
                  var chart = new Chart(ctx, {
                    // The type of chart we want to create
                    type: 'line',
                
                    // The data for our dataset
                    data: {
                      datasets: []
                    },
                
                      // Configuration options go here
                    options: {
                      animation: false,  // disable animations
                      plugins: {
                        streaming: {
                          frameRate: 5   // chart is drawn 5 times every second
                        }
                      },
                      scales: {
                        x: {
                          type: 'time',
                        }
                      }
                    }
                  });

          
          
          function addData(label, value){
            // check if dataset exists
            const exists = chart.data.datasets.findIndex(o => o.label === label)
            // if not add the dataset
            if(exists === -1) {
              // pad to six hex digits so the color string is always valid
              const randomColor = Math.floor(Math.random()*16777215).toString(16).padStart(6, '0');
              chart.data.datasets.push({label: label, data: [value], borderColor : "#" + randomColor})
            } else {
              const len = chart.data.datasets.find(o => o.label === label).data.push(value)
              // optional windowing
              if(len > 10) {
                chart.data.datasets.find(o => o.label === label).data.shift()
              }
            }

            chart.update();
          }
        </script>
      '''))
build_chart()

This code, when run in a Jupyter notebook, will display an empty chart. To add data to this chart, we need to call our JavaScript function addData. We can do this with the IPython display module too, using the Javascript class. Our addData function requires a label and a value; let’s look at an example below.

func_string = '''addData("sensor1", {x: "2022-12-12 12:12:12", y: 25})'''
display(Javascript(func_string))

If we ran this in the same cell, we would get an output of our graph with a single plotted value on it from sensor1.

Since we ultimately want this chart to plot the data as it is received, we will want to automate the string creation shown above. We can put this in a Python function called plot that receives a dictionary with the sensor reading data.

def plot(sensor_reading):
  data_str = '{' + f'''x: "{sensor_reading['created_at']}", y: {sensor_reading['value']}''' + '}'
  dis_str = f'''addData("{sensor_reading['sensor_id']}", {data_str})'''
  display(Javascript(dis_str))

This is great! We have a working JavaScript chart that we can dynamically update in our Jupyter Notebook.
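One thing to watch with hand-built strings is quoting: a sensor id containing a quote character would break the generated JavaScript. Here is a sketch of an alternative that lets json.dumps do the quoting. The helper name build_add_data_call is hypothetical, and I’m assuming created_at is a datetime and that the date adapter accepts an ISO 8601 timestamp, which is worth checking in your own setup.

```python
import json
from datetime import datetime


def build_add_data_call(sensor_reading):
    """Return the JavaScript source for one addData(...) call."""
    point = {
        # assumption: created_at is a datetime; isoformat() gives ISO 8601
        "x": sensor_reading["created_at"].isoformat(),
        "y": sensor_reading["value"],
    }
    # json.dumps quotes and escapes both arguments, and a JSON object
    # is also a valid JavaScript object literal
    label = json.dumps(sensor_reading["sensor_id"])
    return f"addData({label}, {json.dumps(point)})"


reading = {
    "sensor_id": "a12",
    "created_at": datetime(2022, 12, 12, 12, 12, 12),
    "value": 25,
}
print(build_add_data_call(reading))
# addData("a12", {"x": "2022-12-12T12:12:12", "y": 25})
```

In a notebook you would pass the returned string to display(Javascript(...)) exactly as plot does.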

Putting it all together

Now let’s put it all together in a dataflow that will generate some sensor readings and output them to our chart dynamically in our notebook. To do this, let’s start with a dataflow input.

import json
from random import randint
from time import sleep
from datetime import datetime

from bytewax.dataflow import Dataflow
from bytewax.execution import run_main, spawn_cluster
from bytewax.inputs import ManualInputConfig
from bytewax.outputs import ManualOutputConfig

# Add data to input topic
sensors = [
    {"sensor_id":"a12"}, 
    {"sensor_id":"a34"}, 
    {"sensor_id":"a56"}, 
    {"sensor_id":"a78"}, 
    {"sensor_id":"a99"}
    ]

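def input_builder(worker_index, worker_count, state):
    def sensor_input():
        for i in range(50):
            sleep(0.1)
            dt = datetime.now()
            for sensor in sensors:
                # copy so each reading is its own dict rather than
                # a mutation of the shared entry in `sensors`
                sensor_reading = dict(sensor)
                sensor_reading['created_at'] = dt
                sensor_reading['value'] = randint(0, 100)

                # None is passed as the state, since we don't track any here
                yield None, sensor_reading

    return sensor_input()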

flow = Dataflow()
flow.input('gen_sensor', ManualInputConfig(input_builder))

In the dataflow, we are defining a custom input via ManualInputConfig. This allows us to mock sensor readings from 5 different sensors. It will pass a dictionary with the sensor id, the time the sensor reading occurred, and the sensor reading value on to the next step in the dataflow.
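If you want to look at the shape of these readings without running a dataflow, a plain generator is enough. This is a sketch rather than the workshop code: mock_readings and SENSOR_IDS are names I made up, the sleep is left out, and the random generator is seeded so repeated runs produce the same values.

```python
from datetime import datetime
from random import Random

SENSOR_IDS = ["a12", "a34", "a56", "a78", "a99"]


def mock_readings(rounds, seed=0):
    """Yield one reading per sensor per round, without sleeping."""
    rng = Random(seed)  # seeded so the values are reproducible
    for _ in range(rounds):
        created_at = datetime.now()  # shared by all sensors in a round
        for sensor_id in SENSOR_IDS:
            yield {
                "sensor_id": sensor_id,
                "created_at": created_at,
                "value": rng.randint(0, 100),
            }


readings = list(mock_readings(rounds=2))
print(len(readings))  # 10
print(readings[0]["sensor_id"], readings[5]["sensor_id"])  # a12 a12
```

Each round produces five dictionaries that share a timestamp, which is why the chart ends up with one line per sensor and one point per line for every round.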

Now that we have sensor readings generated in our custom input, we are going to leverage the capture operator to visualize the values using the code we wrote in the previous sections.

from IPython.display import display, HTML, Javascript

def build_chart():
  display(HTML('''
        <canvas id="myChart"></canvas>
        <script src="https://cdn.jsdelivr.net/npm/chart.js@4.0.1/dist/chart.umd.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/chart.js/dist/chart.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-date-fns/dist/chartjs-adapter-date-fns.bundle.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-streaming@2.0.0"></script>
        <script>
                  var ctx = document.getElementById('myChart').getContext('2d');
                  var chart = new Chart(ctx, {
                    // The type of chart we want to create
                    type: 'line',
                
                    // The data for our dataset
                    data: {
                      datasets: []
                    },
                
                      // Configuration options go here
                    options: {
                      animation: false,  // disable animations
                      plugins: {
                        streaming: {
                          frameRate: 5   // chart is drawn 5 times every second
                        }
                      },
                      scales: {
                        x: {
                          type: 'time',
                        }
                      }
                    }
                  });

          
          
          function addData(label, value){
            // check if dataset exists
            const exists = chart.data.datasets.findIndex(o => o.label === label)
            // if not add the dataset
            if(exists === -1) {
              // pad to six hex digits so the color string is always valid
              const randomColor = Math.floor(Math.random()*16777215).toString(16).padStart(6, '0');
              chart.data.datasets.push({label: label, data: [value], borderColor : "#" + randomColor})
            } else {
              const len = chart.data.datasets.find(o => o.label === label).data.push(value)
              // optional windowing
              if(len > 10) {
                chart.data.datasets.find(o => o.label === label).data.shift()
              }
            }

            chart.update();
          }
        </script>
      '''))
  
build_chart()

def output_builder(worker_index, worker_count):
    return plot

def plot(sensor_reading):
  data_str = '{' + f'''x: "{sensor_reading['created_at']}", y: {sensor_reading['value']}''' + '}'
  dis_str = f'''addData("{sensor_reading['sensor_id']}", {data_str})'''
  display(Javascript(dis_str))

flow.capture(ManualOutputConfig(output_builder))
run_main(flow)

The modifications we made to the code are that we have added a capture operator and, in the operator, used ManualOutputConfig, which allows us to run custom logic on each sensor reading. The plot function generates the JavaScript function call, which then updates our display. Finally, we use the run_main execution method, which runs the dataflow in the current process and is required for the notebook functionality to work properly.

If you are following along in the notebook, let’s run it! Below the cell with our capture step, you should see something similar to the graphic below (colors will vary as they are dynamically generated).