{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Numpy and Matplotlib #\n", "\n", "These are two of the most fundamental parts of the scientific Python \"ecosystem\". Almost everything else is built on top of them.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What did we just do? We _imported_ a package. This binds the name `np` to the numpy module; everything numpy provides (mostly functions) is then accessed as an attribute of `np`, e.g. `np.zeros`. We can inspect what is available as follows." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# find out what is in our namespace\n", "dir()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# find out what's in numpy\n", "dir(np)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# find out what version we have\n", "np.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The numpy documentation is crucial!\n", "\n", "http://docs.scipy.org/doc/numpy/reference/\n", "\n", "## NDArrays ##\n", "\n", "The core class is the numpy ndarray (n-dimensional array)." 
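, "\n", "\n", "Three related objects describe an array: the `ndarray` itself, a `dtype` object describing the layout of a single element, and the array scalar returned when a single element is accessed. The cell below is a minimal sketch of how they relate; the small array `arr` is purely illustrative." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# a small illustrative array (hypothetical values)\n", "arr = np.array([1.5, 2.5, 3.5])\n", "\n", "# 1) the ndarray itself\n", "print(type(arr))\n", "\n", "# 2) the dtype object describing each element, and its size in bytes\n", "print(arr.dtype, arr.dtype.itemsize)\n", "\n", "# 3) the array scalar returned by indexing a single element\n", "print(type(arr[0]))"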
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image\n", "Image(url='http://docs.scipy.org/doc/numpy/_images/threefundamental.png')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# create an array from a list\n", "a = np.array([9,0,2,1,0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# find out the datatype\n", "a.dtype" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# find out the shape\n", "a.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# what type is the shape? (a tuple)\n", "type(a.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# another array with a different datatype and shape\n", "b = np.array([[5,3,1,9],[9,2,3,0]], dtype=np.float64)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# check dtype and shape\n", "b.dtype, b.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Important Concept__: In memory, the last dimension varies fastest, and the first dimension is the outermost level of the hierarchy. This is called \"C-style\" (row-major) ordering, and it is numpy's default." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More array creation ##\n", "\n", "There are lots of ways to create arrays." 
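, "\n", "\n", "One more common idiom, sketched in the cell below with a small illustrative 3 x 4 array, combines `np.arange` with `reshape`. Because numpy uses C-style ordering by default, consecutive values fill the last dimension first, so each row holds a run of consecutive numbers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# 12 consecutive integers arranged as 3 rows x 4 columns\n", "m = np.arange(12).reshape(3, 4)\n", "print(m)\n", "\n", "# C order: the last index varies fastest, so each row holds consecutive values\n", "print(m[0])     # first row\n", "print(m[:, 0])  # first column: values step by the row length, 4\n", "\n", "# ravel() reads the elements back in the same C order\n", "print(m.ravel())"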
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# create some uniform arrays\n", "c = np.zeros((9,9))\n", "d = np.ones((3,6,3), dtype=np.complex128)\n", "e = np.full((3,3), np.pi)\n", "# the *_like functions copy the shape and dtype of an existing array\n", "ones_c = np.ones_like(c)\n", "zeros_d = np.zeros_like(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create some ranges\n", "np.arange(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# arange includes the start but excludes the stop\n", "np.arange(2,4,0.25)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 20 linearly spaced points from 2 to 4 (both endpoints included)\n", "np.linspace(2,4,20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 10 log-spaced points from 10**1 to 10**2\n", "np.logspace(1,2,10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# two-dimensional grids: with the default settings, meshgrid returns\n", "# arrays of shape (len(y), len(x))\n", "x = np.linspace(-2*np.pi, 2*np.pi, 100)\n", "y = np.linspace(-np.pi, np.pi, 50)\n", "xx, yy = np.meshgrid(x, y)\n", "xx.shape, yy.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Indexing ##\n", "\n", "Basic indexing is similar to indexing Python lists, except that there is one index (or slice) per dimension, separated by commas." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get some individual elements of xx\n", "xx[0,0], xx[-1,-1], xx[3,-5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# get some whole rows and columns\n", "xx[0].shape, xx[:,-1].shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get some ranges (slices)\n", "xx[3:10,30:40].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many advanced ways to index arrays. You can read about them in the manual. Here is one example." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# use a boolean array as an index: select the yy values where xx is negative\n", "idx = xx<0\n", "yy[idx].shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# boolean indexing returned a flattened (1D) array;\n", "# ravel() flattens an array explicitly\n", "xx.ravel().shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array Operations ##\n", "\n", "There are a huge number of operations available on arrays. All the familiar arithmetic operators are applied on an element-by-element basis.\n", "\n", "### Basic Math ###" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# element-wise: each value of f depends only on the matching values of xx and yy\n", "f = np.sin(xx) * np.cos(0.5*yy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point you might be getting curious about what these arrays \"look\" like, so we need to introduce some visualization." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.pcolormesh(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Manipulating array dimensions ##" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# transpose\n", "plt.pcolormesh(f.T)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# reshape an array (wrong size): f has 50*100 = 5000 elements, not 8*9 = 72,\n", "# so uncommenting the next line raises a ValueError\n", "#g = np.reshape(f, (8,9))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# reshape an array (right size: 200*25 = 5000) and scramble the picture\n", "print(f.size)\n", "g = np.reshape(f, (200,25))\n", "plt.pcolormesh(g)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# tile an array: repeat f 6 times along the first axis\n", "plt.pcolormesh(np.tile(f,(6,1)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Broadcasting ##\n", "\n", "Broadcasting lets numpy apply element-wise operations (not only multiplication) to arrays of different shapes without copying the smaller array. Shapes are compared starting from the last dimension (a missing leading dimension counts as 1); two dimensions are compatible when they are equal or when one of them is 1.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Image(url='http://scipy-lectures.github.io/_images/numpy_broadcasting.png',\n", " width=720)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# multiply f by x: x has shape (100,), which matches f's last dimension\n", "print(f.shape, x.shape)\n", "g = f * x\n", "print(g.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# multiply f by y: this raises a ValueError, because y's length (50)\n", "# does not match f's last dimension (100)\n", "print(f.shape, y.shape)\n", "h = f * y\n", "print(h.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# use newaxis special syntax: y[:,np.newaxis] has shape (50, 1),\n", "# which broadcasts against f's shape (50, 100)\n", "h = f * y[:,np.newaxis]\n", "print(h.shape)" ] }, { "cell_type": "code", "execution_count": 
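null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch of the broadcasting rule for two shapes, written as a\n", "# hypothetical helper (broadcast_shape is not a numpy function): pad the\n", "# shorter shape with leading 1s, then compare dimensions pairwise; each\n", "# pair must be equal or contain a 1, otherwise numpy raises a ValueError.\n", "import numpy as np\n", "\n", "def broadcast_shape(shape1, shape2):\n", "    n = max(len(shape1), len(shape2))\n", "    # pad the shorter shape with leading 1s\n", "    s1 = (1,) * (n - len(shape1)) + tuple(shape1)\n", "    s2 = (1,) * (n - len(shape2)) + tuple(shape2)\n", "    result = []\n", "    for d1, d2 in zip(s1, s2):\n", "        if d1 == d2 or d2 == 1:\n", "            result.append(d1)\n", "        elif d1 == 1:\n", "            result.append(d2)\n", "        else:\n", "            raise ValueError('shapes %r and %r are not compatible' % (shape1, shape2))\n", "    return tuple(result)\n", "\n", "# compare the sketch against the shapes numpy actually produces\n", "print(broadcast_shape((50, 100), (100,)), (np.ones((50, 100)) * np.ones(100)).shape)\n", "print(broadcast_shape((50, 100), (50, 1)), (np.ones((50, 100)) * np.ones((50, 1))).shape)\n", "# broadcast_shape((50, 100), (50,)) raises ValueError, just like f * y" ] }, { "cell_type": "code", "execution_count": 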
null, "metadata": {}, "outputs": [], "source": [ "plt.pcolormesh(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reduction Operations ##\n", "\n", "Reductions such as `sum`, `mean` and `std` collapse a whole array, or one of its axes, into fewer values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sum\n", "g.sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# mean\n", "g.mean()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# standard deviation\n", "g.std()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# apply on just one axis\n", "# axis=0 averages over y (the rows), leaving a function of x with shape (100,)\n", "# axis=1 averages over x (the columns), leaving a function of y with shape (50,)\n", "g_ymean = g.mean(axis=0)\n", "g_xmean = g.mean(axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(x, g_ymean)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(g_xmean, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fancy Plotting ##\n", "\n", "Enough lessons, let's have some fun." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# first sketch the layout: a 6x6 grid with a wide axis on top and a tall axis on the left\n", "fig = plt.figure(figsize=(12,8))\n", "ax1 = plt.subplot2grid((6,6),(0,1),colspan=5)\n", "ax2 = plt.subplot2grid((6,6),(1,0),rowspan=5)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(10,6))\n", "ax1 = plt.subplot2grid((6,6),(0,1),colspan=5)\n", "ax2 = plt.subplot2grid((6,6),(1,0),rowspan=5)\n", "ax3 = plt.subplot2grid((6,6),(1,1),rowspan=5, colspan=5)\n", "\n", "ax1.plot(x, g_ymean)\n", "ax2.plot(g_xmean, y)\n", "ax3.pcolormesh(x, y, g)\n", "\n", "ax1.set_xlim([x.min(), x.max()])\n", "ax3.set_xlim([x.min(), x.max()])\n", "ax2.set_ylim([y.min(), y.max()])\n", "ax3.set_ylim([y.min(), y.max()])\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Real Data ##\n", "\n", "Profile data from ARGO float 4901412 in the North Atlantic." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# download with curl\n", "!curl -O http://www.ldeo.columbia.edu/~rpa/argo_float_4901412.npz" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load numpy file and examine keys\n", "data = np.load('argo_float_4901412.npz')\n", "list(data.keys())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# access some data\n", "T = data['T']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the data contain NaNs (missing values), which propagate through\n", "# reductions such as min(): the result is nan\n", "T.min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Masked Arrays ##\n", "\n", "This is how we deal with missing data in numpy. A masked array pairs the data with a boolean mask; entries where the mask is `True` are ignored by operations such as `mean`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# mask out the first two values\n", "ar_w_mask = np.ma.masked_array([1, 2, 3, 4, 5],\n", " mask=[True, True, False, False, False])\n", "ar_w_mask" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# only the unmasked values 3, 4 and 5 are averaged, giving 4.0\n", "ar_w_mask.mean()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# masked_invalid masks NaN (and inf) values\n", "T_ma = np.ma.masked_invalid(T)\n", "T_ma.mean()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create masked array\n", "T = np.ma.masked_invalid(data['T'])\n", "type(T)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# max and min now ignore the missing values\n", "T.max(), T.min()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# load other data\n", "S = np.ma.masked_invalid(data['S'])\n", "P = np.ma.masked_invalid(data['P'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# scatter plot of T against S, colored by P\n", "plt.scatter(S, T, c=P)\n", "plt.grid()\n", "plt.colorbar()" ] } ], "metadata": { "gist_info": { "gist_id": null, "gist_url": null }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }