{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Processing Pipelines" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import holoviews as hv\n", "from bokeh.sampledata import stocks\n", "from holoviews.operation.timeseries import rolling, rolling_outlier_std\n", "from holoviews.streams import Stream\n", "hv.notebook_extension('bokeh')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous guides we discovered how to load and declare [dynamic, live data](./5-Live_Data.ipynb) and how to [transform elements](./05-Transforming_Elements.ipynb) using operations. In this guide we will discover how to combine dynamic data with operations to declare lazy and declarative data processing pipelines, which can be used for interactive exploration but can also drive complex dashboards or even bokeh apps.\n", "\n", "## Declaring dynamic data\n", "\n", "We will begin by declaring a function which loads some data. In this case we will just load some stock data from the bokeh but you could imagine querying this data using REST interface or some other API or even loading some large collection of data from disk or generating the data from some simulation or data processing job." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def load_symbol(symbol, **kwargs):\n", " df = pd.DataFrame(getattr(stocks, symbol))\n", " df['date'] = df.date.astype('datetime64[ns]')\n", " return hv.Curve(df, ('date', 'Date'), ('adj_close', 'Adjusted Close'))\n", "\n", "stock_symbols = ['AAPL', 'FB', 'GOOG', 'IBM', 'MSFT']\n", "dmap = hv.DynamicMap(load_symbol, kdims='Symbol').redim.values(Symbol=stock_symbols)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We begin by displaying our DynamicMap to see what we are dealing with. Recall that a ``DynamicMap`` is only evaluated when you request the key so the ``load_symbol`` function is only executed when first displaying the ``DynamicMap`` and whenever we change the widget dropdown:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%opts Curve [width=600] {+framewise}\n", "dmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Processing data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is very common to want to process some data, for this purpose HoloViews provides so-called ``Operations``, which are described in detail in the [Transforming Elements](../user_guide/Transforming_Elements.ipynb). ``Operations`` are simply parameterized functions, which take HoloViews objects as input, transform them in some way and then return the output.\n", "\n", "In combination with [Dimensioned Containers](../user_guide/Dimensionsed_Containers.ipynb) such as ``HoloMap`` and ``GridSpace`` they are a powerful way to explore how the parameters of your transform affect the data. We will start with a simple example. HoloViews provides a ``rolling`` function which smoothes timeseries data with a rolling window. We will apply this operation with a ``rolling_window`` of 30, i.e. roughly a month of our daily timeseries data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "smoothed = rolling(dmap, rolling_window=30)\n", "smoothed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see the ``rolling`` operation applies directly to our ``DynamicMap``, smoothing each ``Curve`` before it is displayed. Applying an operation to a ``DynamicMap`` keeps the data as a ``DynamicMap``, this means the operation is also applied lazily whenever we display or select a different symbol in the dropdown widget." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Controlling operations via Streams" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous section we briefly mentioned that in addition to regular widgets ``DynamicMap`` also supports streams, which allow us to define custom events our ``DynamicMap`` should subscribe to. To learn more about streams see the [Responding to Events](../user_guide/11-Responding_to_Events.ipynb). Here we will declare a stream that controls the rolling window:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rolling_stream = Stream.define('rolling', rolling_window=5)\n", "stream = rolling_stream()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can define a function that both loads the symbol and applies the ``rolling`` operation passing our ``rolling_window`` parameter to the operation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%opts Curve [width=600] {+framewise}\n", "\n", "def rolled_data(symbol, rolling_window, **kwargs):\n", " curve = load_symbol(symbol)\n", " return rolling(curve, rolling_window=rolling_window)\n", " \n", "rolled_dmap = hv.DynamicMap(rolled_data, kdims='Symbol',\n", " streams=[stream]).redim.values(Symbol=stock_symbols)\n", "\n", "rolled_dmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we have a handle on the ``Stream`` we can now send events to it and watch the plot above update, let's start by setting the ``rolling_window=50``. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stream.event(rolling_window=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of manually defining a function we can also do something much simpler, namely we can just apply the rolling operation to the original ``DynamicMap`` we defined and pass our ``rolling_stream`` to the operation. To make things a bit more interesting we will also apply the ``rolling_outlier_std`` function which computes outliers within the ``rolling_window``. We supply our stream to both:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%opts Scatter (color='red' marker='triangle')\n", "stream = rolling_stream()\n", "\n", "smoothed = rolling(dmap, streams=[stream])\n", "outliers = rolling_outlier_std(dmap, streams=[stream])\n", "\n", "smoothed * outliers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the ``rolling_stream`` instance we created is bound to both operations, triggering an event on the stream will trigger both the ``Curve`` and the ``Scatter`` of outliers to be updated:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stream.event(rolling_window=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can chain operations like this indefinitely and attach streams to each stage. By chaining we can watch our visualization update whenever we change a stream value anywhere in the pipeline and HoloViews will be smart about which parts of the pipeline are recomputed, which allows us to build complex visualizations very quickly.\n", "\n", "In later guides we will discover how to tie custom streams to custom widgets letting us easily control the stream values and making it trivial to define complex dashboards. ``paramNB`` is only one widget framework we could use: we could also choose ``paramBokeh`` to make use of bokeh widgets and deploy the dashboard on bokeh server, or we could manually link ``ipywidgets`` to our streams. For more information on how to deploy bokeh apps from HoloViews and build dashboards see the [Deploying Bokeh Apps](./Deploying_Bokeh_Apps.ipynb) and [Dashboards](./15-Dashboards.ipynb) guides." ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }