{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Numpy and Matplotlib #\n",
    "\n",
    "These are two of the most fundamental parts of the scientific python \"ecosystem\". Most everything else is built on top of them.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What did we just do? We _imported_ a package. This brings new variables (mostly functions) into our interpreter. We access them as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# find out what is in our namespace\n",
    "dir()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# find out what's in numpy\n",
    "dir(np)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# find out what version we have\n",
    "np.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numpy documentation is crucial!\n",
    "\n",
    "http://docs.scipy.org/doc/numpy/reference/\n",
    "\n",
    "## NDArrays ##\n",
    "\n",
    "The core class is the numpy ndarray (n-dimensional array)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import Image\n",
    "Image(url='http://docs.scipy.org/doc/numpy/_images/threefundamental.png')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# create an array from a list\n",
    "a = np.array([9,0,2,1,0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# find out the datatype\n",
    "a.dtype"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# find out the shape\n",
    "a.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# what is the shape\n",
    "type(a.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# another array with a different datatype and shape\n",
    "b = np.array([[5,3,1,9],[9,2,3,0]], dtype=np.float64)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# check dtype and shape\n",
    "b.dtype, b.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Important Concept__: The fastest varying dimension is the last dimension! The outer level of the hierarchy is the first dimension. (This is called \"c-style\" indexing)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## More array creation ##\n",
    "\n",
    "There are lots of ways to create arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# create some uniform arrays\n",
    "c = np.zeros((9,9))\n",
    "d = np.ones((3,6,3), dtype=np.complex128)\n",
    "e = np.full((3,3), np.pi)\n",
    "e = np.ones_like(c)\n",
    "f = np.zeros_like(d)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create some ranges\n",
    "np.arange(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# arange is left inclusive, right exclusive\n",
    "np.arange(2,4,0.25)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# linearly spaced\n",
    "np.linspace(2,4,20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# log spaced\n",
    "np.logspace(1,2,10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# two dimensional grids\n",
    "x = np.linspace(-2*np.pi, 2*np.pi, 100)\n",
    "y = np.linspace(-np.pi, np.pi, 50)\n",
    "xx, yy = np.meshgrid(x, y)\n",
    "xx.shape, yy.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Indexing ##\n",
    "\n",
    "Basic indexing is similar to lists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get some individual elements of xx\n",
    "xx[0,0], xx[-1,-1], xx[3,-5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# get some whole rows and columns\n",
    "xx[0].shape, xx[:,-1].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get some ranges\n",
    "xx[3:10,30:40].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many advanced ways to index arrays. You can read about them in the manual. Here is one example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use a boolean array as an index\n",
    "idx = xx<0\n",
    "yy[idx].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the array got flattened\n",
    "xx.ravel().shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Array Operations ##\n",
    "\n",
    "There are a huge number of operations available on arrays. All the familiar arithemtic operators are applied on an element-by-element basis.\n",
    "\n",
    "### Basic Math ##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'np' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-1-13440c77e25b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mxx\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcos\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0.5\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0myy\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m: name 'np' is not defined"
     ]
    }
   ],
   "source": [
    "f = np.sin(xx) * np.cos(0.5*yy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point you might be getting curious what these arrays \"look\" like. So we need to introduce some visualization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.pcolormesh(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Manipulating array dimensions ##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# transpose\n",
    "plt.pcolormesh(f.T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# reshape an array (wrong size)\n",
    "#g = np.reshape(f, (8,9))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# reshape an array (right size) and mess it up\n",
    "print(f.size)\n",
    "g = np.reshape(f, (200,25))\n",
    "plt.pcolormesh(g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# tile an array\n",
    "plt.pcolormesh(np.tile(f,(6,1)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Broadcasting ##\n",
    "\n",
    "Broadcasting is an efficient way to multiply arrays of different sizes\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Image(url='http://scipy-lectures.github.io/_images/numpy_broadcasting.png',\n",
    "     width=720)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# multiply f by x\n",
    "print(f.shape, x.shape)\n",
    "g = f * x\n",
    "print(g.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# multiply f by y\n",
    "print(f.shape, y.shape)\n",
    "h = f * y\n",
    "print(h.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use newaxis special syntax\n",
    "h = f * y[:,np.newaxis]\n",
    "print(h.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.pcolormesh(g)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reduction Operations ##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sum\n",
    "g.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# mean\n",
    "g.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# std\n",
    "g.std()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# apply on just one axis\n",
    "g_ymean = g.mean(axis=0)\n",
    "g_xmean = g.mean(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(x, g_ymean)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(g_xmean, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fancy Plotting ##\n",
    "\n",
    "Enough lessons, let's have some fun."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(12,8))\n",
    "ax1 = plt.subplot2grid((6,6),(0,1),colspan=5)\n",
    "ax2 = plt.subplot2grid((6,6),(1,0),rowspan=5)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(10,6))\n",
    "ax1 = plt.subplot2grid((6,6),(0,1),colspan=5)\n",
    "ax2 = plt.subplot2grid((6,6),(1,0),rowspan=5)\n",
    "ax3 = plt.subplot2grid((6,6),(1,1),rowspan=5, colspan=5)\n",
    "\n",
    "ax1.plot(x, g_ymean)\n",
    "ax2.plot(g_xmean, y)\n",
    "ax3.pcolormesh(x, y,  g)\n",
    "\n",
    "ax1.set_xlim([x.min(), x.max()])\n",
    "ax3.set_xlim([x.min(), x.max()])\n",
    "ax2.set_ylim([y.min(), y.max()])\n",
    "ax3.set_ylim([y.min(), y.max()])\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Real Data ##\n",
    "\n",
    "ARGO float profile from North Atlantic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# download with curl\n",
    "!curl -O http://www.ldeo.columbia.edu/~rpa/argo_float_4901412.npz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load numpy file and examine keys\n",
    "data = np.load('argo_float_4901412.npz')\n",
    "data.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access some data\n",
    "T = data['T']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# there are \"nans\", missing data, which screw up our routines\n",
    "T.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ar_w_mask = np.ma.masked_array([1, 2, 3, 4, 5],\n",
    "                        mask=[True, True, False, False, False])\n",
    "ar_w_mask"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ar_w_mask.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "T_ma = np.ma.masked_invalid(T)\n",
    "T_ma.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Masked Arrays ##\n",
    "\n",
    "This is how we deal with missing data in numpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create masked array\n",
    "T = np.ma.masked_invalid(data['T'])\n",
    "type(T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# max and min\n",
    "T.max(), T.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# load other data\n",
    "S = np.ma.masked_invalid(data['S'])\n",
    "P = np.ma.masked_invalid(data['P'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# scatter plot\n",
    "plt.scatter(S, T, c=P)\n",
    "plt.grid()\n",
    "plt.colorbar()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "gist_info": {
   "gist_id": null,
   "gist_url": null
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}