Introduction to Scientific Programming with Python¶

Session 4: NumPy, SciPy & matplotlib

Denis Tome' -- [email protected] -- http://www0.cs.ucl.ac.uk/staff/D.Tome/

Importing numpy, scipy, and matplotlib¶

In [57]:

%matplotlib inline

Start the Ipython process with¶

ipython notebook or ipython qtconsole, depending on your prefered interface.

Start your notebook with the following command - it enables the interactive plotting in your notebook or qtconsole:

or use

%matplotlib qt

if you are using the qtconsole interface

Get plotting tools¶

we need both plotting tools and numpy for this tutorial

In [58]:

import matplotlib.pyplot as plt
import numpy as np

Basic Plotting¶

Basic line and scatter plots can be created using the plot function from the pylab package
Below, the first list determines the x-coordinates and the second list the y -coordinates.

In [59]:

plt.plot([1,2,3],[4,2,6])

Out[59]:

[<matplotlib.lines.Line2D at 0x107b67f98>]

No description has been provided for this image

Use the clf() function to clear a figure, or the figure() function to start a new one.
Commonly used markers are ".", "o", "x" and line types "-", "--", "-". but have a look at the help of the plot command if you need more.

In [60]:

plt.plot([1,2,3,4],[1,2,6,7], 'o');

Note the ; at the end of the plot statement above: it stops the statement from returning and so the [<matplotlib.lines.Line2D at 0xb64b0b8>] is not displayed.

In [61]:

help(plt.plot)

Help on function plot in module matplotlib.pyplot:

plot(*args, **kwargs)
    Plot lines and/or markers to the
    :class:`~matplotlib.axes.Axes`.  *args* is a variable length
    argument, allowing for multiple *x*, *y* pairs with an
    optional format string.  For example, each of the following is
    legal::
    
        plot(x, y)        # plot x and y using default line style and color
        plot(x, y, 'bo')  # plot x and y using blue circle markers
        plot(y)           # plot y using x as index array 0..N-1
        plot(y, 'r+')     # ditto, but with red plusses
    
    If *x* and/or *y* is 2-dimensional, then the corresponding columns
    will be plotted.
    
    If used with labeled data, make sure that the color spec is not
    included as an element in data, as otherwise the last case
    ``plot("v","r", data={"v":..., "r":...)``
    can be interpreted as the first case which would do ``plot(v, r)``
    using the default line style and color.
    
    If not used with labeled data (i.e., without a data argument),
    an arbitrary number of *x*, *y*, *fmt* groups can be specified, as in::
    
        a.plot(x1, y1, 'g^', x2, y2, 'g-')
    
    Return value is a list of lines that were added.
    
    By default, each line is assigned a different style specified by a
    'style cycle'.  To change this behavior, you can edit the
    axes.prop_cycle rcParam.
    
    The following format string characters are accepted to control
    the line style or marker:
    
    ================    ===============================
    character           description
    ================    ===============================
    ``'-'``             solid line style
    ``'--'``            dashed line style
    ``'-.'``            dash-dot line style
    ``':'``             dotted line style
    ``'.'``             point marker
    ``','``             pixel marker
    ``'o'``             circle marker
    ``'v'``             triangle_down marker
    ``'^'``             triangle_up marker
    ``'<'``             triangle_left marker
    ``'>'``             triangle_right marker
    ``'1'``             tri_down marker
    ``'2'``             tri_up marker
    ``'3'``             tri_left marker
    ``'4'``             tri_right marker
    ``'s'``             square marker
    ``'p'``             pentagon marker
    ``'*'``             star marker
    ``'h'``             hexagon1 marker
    ``'H'``             hexagon2 marker
    ``'+'``             plus marker
    ``'x'``             x marker
    ``'D'``             diamond marker
    ``'d'``             thin_diamond marker
    ``'|'``             vline marker
    ``'_'``             hline marker
    ================    ===============================
    
    
    The following color abbreviations are supported:
    
    ==========  ========
    character   color
    ==========  ========
    'b'         blue
    'g'         green
    'r'         red
    'c'         cyan
    'm'         magenta
    'y'         yellow
    'k'         black
    'w'         white
    ==========  ========
    
    In addition, you can specify colors in many weird and
    wonderful ways, including full names (``'green'``), hex
    strings (``'#008000'``), RGB or RGBA tuples (``(0,1,0,1)``) or
    grayscale intensities as a string (``'0.8'``).  Of these, the
    string specifications can be used in place of a ``fmt`` group,
    but the tuple forms can be used only as ``kwargs``.
    
    Line styles and colors are combined in a single format string, as in
    ``'bo'`` for blue circles.
    
    The *kwargs* can be used to set line properties (any property that has
    a ``set_*`` method).  You can use this to set a line label (for auto
    legends), linewidth, anitialising, marker face color, etc.  Here is an
    example::
    
        plot([1,2,3], [1,2,3], 'go-', label='line 1', linewidth=2)
        plot([1,2,3], [1,4,9], 'rs',  label='line 2')
        axis([0, 4, 0, 10])
        legend()
    
    If you make multiple lines with one plot command, the kwargs
    apply to all those lines, e.g.::
    
        plot(x1, y1, x2, y2, antialiased=False)
    
    Neither line will be antialiased.
    
    You do not need to use format strings, which are just
    abbreviations.  All of the line properties can be controlled
    by keyword arguments.  For example, you can set the color,
    marker, linestyle, and markercolor with::
    
        plot(x, y, color='green', linestyle='dashed', marker='o',
             markerfacecolor='blue', markersize=12).
    
    See :class:`~matplotlib.lines.Line2D` for details.
    
    The kwargs are :class:`~matplotlib.lines.Line2D` properties:
    
      agg_filter: unknown
      alpha: float (0.0 transparent through 1.0 opaque) 
      animated: [True | False] 
      antialiased or aa: [True | False] 
      axes: an :class:`~matplotlib.axes.Axes` instance 
      clip_box: a :class:`matplotlib.transforms.Bbox` instance 
      clip_on: [True | False] 
      clip_path: [ (:class:`~matplotlib.path.Path`, :class:`~matplotlib.transforms.Transform`) | :class:`~matplotlib.patches.Patch` | None ] 
      color or c: any matplotlib color 
      contains: a callable function 
      dash_capstyle: ['butt' | 'round' | 'projecting'] 
      dash_joinstyle: ['miter' | 'round' | 'bevel'] 
      dashes: sequence of on/off ink in points 
      drawstyle: ['default' | 'steps' | 'steps-pre' | 'steps-mid' | 'steps-post'] 
      figure: a :class:`matplotlib.figure.Figure` instance 
      fillstyle: ['full' | 'left' | 'right' | 'bottom' | 'top' | 'none'] 
      gid: an id string 
      label: string or anything printable with '%s' conversion. 
      linestyle or ls: ['solid' | 'dashed', 'dashdot', 'dotted' | (offset, on-off-dash-seq) | ``'-'`` | ``'--'`` | ``'-.'`` | ``':'`` | ``'None'`` | ``' '`` | ``''``]
      linewidth or lw: float value in points 
      marker: :mod:`A valid marker style <matplotlib.markers>`
      markeredgecolor or mec: any matplotlib color 
      markeredgewidth or mew: float value in points 
      markerfacecolor or mfc: any matplotlib color 
      markerfacecoloralt or mfcalt: any matplotlib color 
      markersize or ms: float 
      markevery: [None | int | length-2 tuple of int | slice | list/array of int | float | length-2 tuple of float]
      path_effects: unknown
      picker: float distance in points or callable pick function ``fn(artist, event)`` 
      pickradius: float distance in points 
      rasterized: [True | False | None] 
      sketch_params: unknown
      snap: unknown
      solid_capstyle: ['butt' | 'round' |  'projecting'] 
      solid_joinstyle: ['miter' | 'round' | 'bevel'] 
      transform: a :class:`matplotlib.transforms.Transform` instance 
      url: a url string 
      visible: [True | False] 
      xdata: 1D array 
      ydata: 1D array 
      zorder: any number 
    
    kwargs *scalex* and *scaley*, if defined, are passed on to
    :meth:`~matplotlib.axes.Axes.autoscale_view` to determine
    whether the *x* and *y* axes are autoscaled; the default is
    *True*.
    
    .. note::
        In addition to the above described arguments, this function can take a
        **data** keyword argument. If such a **data** argument is given, the
        following arguments are replaced by **data[<arg>]**:
    
        * All arguments with the following names: 'x', 'y'.

Other useful functions (see below)
Try it out!

In [62]:

plt.plot([1,2,3],[1,2,3])
plt.title("A line")
plt.xlabel("X-axis")
plt.ylabel("Y-axis");

Example: Plotting a function¶

Let’s say you want to plot the function (in the math sense) $f (x) = 2x^2 + 3$ for $x$ in the range $[−3, 3]$. How would you do it?

In [63]:

# Define a Python function that computes the function value
def f(x):
    return 2*x**2 + 3

print('f(0):', f(0))
print('f(-3):', f(-3))
print('f(3):', f(3))

plt.plot([-3, 0, 3], [f(-3), f(0), f(3)]);

f(0): 3
f(-3): 21
f(3): 21

We can use range() and list comprehensions

In [64]:

xs = range(-3, 4)
plt.plot(xs, [f(x) for x in xs],'o-');

Need smaller intervals ⇒ use linspace or arange

In [65]:

print('np.linspace\n', np.linspace(2.0, 3.0, num=5))
print('np.arange\n', np.arange(2, 3, 0.5))

np.linspace
 [ 2.    2.25  2.5   2.75  3.  ]
np.arange
 [ 2.   2.5]

In [66]:

xs_1 = np.arange(-3,3,0.1)
plt.plot(xs_1, [f(x) for x in xs_1]);

In [67]:

xs_2 = np.linspace(-3,3,100)
plt.plot(xs_2, [f(x) for x in xs_2]);

Wouldn’t it be nice if we could just write

In [68]:

plt.plot(xs_2, 2*xs_2**2 + 3);

Oh, wait, we can!
What is going on? The magic here is that arange and linspace does not return a list like range, but a numpy.ndarray.
If we try it with a list, it doesn’t work

In [69]:

xs = range(-3,4)
plt.plot(xs, 2*xs**2 + 3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-69-c9b93f34ec9e> in <module>()
      1 xs = range(-3,4)
----> 2 plt.plot(xs, 2*xs**2 + 3)

TypeError: unsupported operand type(s) for ** or pow(): 'range' and 'int'

Example: Linear Regression¶

Given pairs $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}),\ldots, (x^{(N)}, y^{(N)})\}$, where $x^{(i)} \in \mathbb{R}^D,y^{(i)} \in \mathbb{R}$
Goal find $w$ such that $y^{(i)} \approx f(x^{(i)}) := w^T x^{(i)}$
Measure of (mean squared) error $$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^N (y^{(i)} - f(x^{(i)}))^2$$

Rewrite as matrices/vectors $$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{pmatrix}, \quad X = \begin{pmatrix} x^{(1)} \\ x^{(2)} \\ \vdots \\x^{(N)} \end{pmatrix}= \begin{pmatrix} x^{(1)}_1 & \cdots & x^{(1)}_{D} \\x_{1}^{(2)} & \cdots & x_{D}^{(2)} \\\vdots & \ddots & \vdots \\ x^{(N)}_{1} & \cdots &x^{(N)}_{D} \end{pmatrix}$$
Solution: $w_{ls} = X^{\dagger}y = (X^TX)^{-1}X^T y$

Simpler scalar case:¶

If each of the $x^{(i)}$ is just a scalar, i.e. it is in $\mathbb{R}$, the solutions simplifies to

$$\Rightarrow w= \frac{\sum_i x^{(i)} y^{(i)}}{\sum_i x^2} $$

Turn the math into python¶

The simplest case with scalar $x$s can be written as

In [70]:

def f(x):
    return 2 * x

# Training data (x,y) pairs)
X = np.arange(10)
y = [f(xi) for xi in X]

# plot training data
plt.plot(X,y, 'o')
plt.title("Training data")
plt.show() # causes the plot to be shown NOW
# linear regression in 1D -> SIMPLEST SCENARIO
w = ( sum([X[i]*y[i] for i in range(len(X))])/ float(sum([x**2 for x in X])) )
print("Learned w:", w)

Learned w: 2.0

How does the learned function look like?

In [71]:

plt.plot(X,y, 'o')
plt.plot(X, w*X, 'r-')
plt.title("Linear Regression")
plt.legend(["Training data", "Learned linear function"]);

The ndarray type¶

The numpy.ndarray type represents a multidimensional, homogeneous array of fixed-size items
You can think of this as a table, where each cell is of the same type and is indexed by a tuple of integer indices
Vectors (1-D) and matrices (2-D) are the most common examples, but higher-dimensional arrays are also often useful
For example, you can represent a video as a 4-D array (x, y, frame, channel)

A 1-D ndarray is similar to a list of numbers
However it makes it very easy to apply the same (mathematical) operation to all elements in the array
They also make it very easy to compute aggregates, like sums, means, products, sum-products, etc.

In [72]:

r = np.arange(1,4,1)
print('Element-wise operations')
print("r:", r)
print(3*r + 2)
print(np.sqrt(r))

print('\nOperations considering all the elements')
print(np.sum(r))
print(np.mean(r))
print(np.prod(r))

Element-wise operations
r: [1 2 3]
[ 5  8 11]
[ 1.          1.41421356  1.73205081]

Operations considering all the elements
6
2.0
6

ndarray objects have two important attributes:
The data type (dtype) of the objects in the cells
The shape of the array, describing the dimensionality and the size of the individual dimensions

In [73]:

a = np.zeros(shape = (3, 5), dtype = np.int64)
print("Array a:\n %r \n" % a)
print('Type of the variable:\n %r \n' % type(a))
print('Type of the objects in the cells:\n %r \n' % a.dtype)
print('Shape of the array:\n', a.shape)

Array a:
 array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]) 

Type of the variable:
 <class 'numpy.ndarray'> 

Type of the objects in the cells:
 dtype('int64') 

Shape of the array:
 (3, 5)

Accessing Elements¶

Access to the individual elements is done using the familiar [ ] syntax
IMPORTANT: The first element is the the one in position 0, not 1 (like Matlab)

In [74]:

a = np.zeros(shape = (3, 5), dtype = np.int64)
print('Element in position [0,0]:', a[0,0])
print(a)

Element in position [0,0]: 0
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]

In [75]:

a[0,0] = 1
a[2,3] = 5
print(a)

[[1 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 5 0]]

Creating arrays¶

Arrays can be created in various ways:
from (nested) list objects (using array) by creating an empty array and then assigning values (using empty/zeros)
using built-in functions for special arrays (ones, eye, rand, randn, . . . )

In [76]:

a = np.array([3, 2, 1, 2])
print(a)

[3 2 1 2]

In [77]:

b = np.array([2.0, 4.1, 3.14])
print(b)

[ 2.    4.1   3.14]

In [78]:

c = np.array([[3, 1, 2], [2, 3, 4]])
print(c)

[[3 1 2]
 [2 3 4]]

Note that numpy tries to guess the data type:

In [79]:

print(a.dtype)
print(b.dtype)

int64
float64

We can force a particular data type by providing a dtype

In [80]:

a = np.array([3,2,1,2], dtype=np.float64)
print(a.dtype)
print(a)

float64
[ 3.  2.  1.  2.]

In [81]:

b = np.array([3,2,1,2], dtype=np.uint8)
print(b.dtype)
print(b)

uint8
[3 2 1 2]

CAREFUL ABOUT OVER/UNDERFLOWS!

In [82]:

b[0] = 256
print(b)

[0 2 1 2]

Data types¶

Numpy provides a range of data types
floating point data types: float32, float64
integer data types: int64, int32, . . . , uint8
object data type: object – any Python object
Unless you are sure you need something else, use float64. This is the default data type in numpy.
Exceptions to this rule:
use int32, int64 when you need to store (large) integers (e.g. counts)
use objects when you need to store other Python objects (dicts, lists, etc.)

Creating special arrays¶

Numpy provides various functions for creating useful special arrays. The most common ones are:
zeros – create an array filled with zeros
ones – create an array filled with ones
empty – create an array, but don’t initialize it’s elements.
arange – similar to range
eye – identity matrix of a given size
random.rand, random.randn – random arrays (uniform, normal)

In [83]:

np.zeros((2,3))

Out[83]:

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [84]:

np.ones((2,3))

Out[84]:

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [85]:

np.empty((2,3))

Out[85]:

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [86]:

np.arange(5,10)

Out[86]:

array([5, 6, 7, 8, 9])

In [87]:

np.eye(2)

Out[87]:

array([[ 1.,  0.],
       [ 0.,  1.]])

In [88]:

np.random.rand(2,3)

Out[88]:

array([[ 0.95800881,  0.27233108,  0.54525544],
       [ 0.51234556,  0.17209455,  0.03634038]])

In [89]:

np.random.randn(2,3)

Out[89]:

array([[-0.50142349, -1.42633263, -0.39798825],
       [ 1.54177749,  0.01696024,  0.06331506]])

In [90]:

plt.hist(np.random.randn(10000));

In [91]:

plt.hist(np.random.rand(100000));

Dimensionality¶

Numpy arrays support the slice notation familiar from strings and lists
Can be used to index contiguous sub-arrays
(Very similar to the notation used in Matlab)

In [92]:

a = np.eye(3)
print(a)

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

In [93]:

a[0,:]

Out[93]:

array([ 1.,  0.,  0.])

which has exactly the same effect of writing

In [94]:

a[0]

Out[94]:

array([ 1.,  0.,  0.])

To get a set of elements
from position $i$
to position $j-1$

In [95]:

a[0:2,0:2]

Out[95]:

array([[ 1.,  0.],
       [ 0.,  1.]])

In [96]:

a[1:2,0:2]

Out[96]:

array([[ 0.,  1.]])

In [97]:

a[0:1,:]

Out[97]:

array([[ 1.,  0.,  0.]])

We can also assign new values to a subarray

In [98]:

print(a)

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

In [99]:

a[0:2,0:2] = 5
print(a)

[[ 5.  5.  0.]
 [ 5.  5.  0.]
 [ 0.  0.  1.]]

In [100]:

a[1:3,1:3] = np.ones((2, 2))
print(a)

[[ 5.  5.  0.]
 [ 5.  1.  1.]
 [ 0.  1.  1.]]

Accessing Subarrays – Boolean indexing¶

Comparison operatorions on arrays return boolean arrays of the same size

In [101]:

a = np.eye(2)

In [102]:

a == 1

Out[102]:

array([[ True, False],
       [False,  True]], dtype=bool)

In [103]:

a > 0

Out[103]:

array([[ True, False],
       [False,  True]], dtype=bool)

In [104]:

a <= 0

Out[104]:

array([[False,  True],
       [ True, False]], dtype=bool)

These boolean arrays can then be used for indexing the original array

In [105]:

a = np.eye(2)
print(a)

[[ 1.  0.]
 [ 0.  1.]]

In [106]:

a[a == 1]

Out[106]:

array([ 1.,  1.])

In [107]:

a[a == 1] = 2
print(a)

[[ 2.  0.]
 [ 0.  2.]]

Elementwise operations¶

Numpy contains many standard mathematical functions that operate elementwise on arrays
These are called universal functions (ufuncs)
Included are things like add, multiply, log, exp, sin, . . .
See http://docs.scipy.org/doc/numpy/reference/ufuncs.html
These are very fast (in constrast to doing the same by loops)

In [108]:

a = np.ones((2, 2))*2
print(a)

[[ 2.  2.]
 [ 2.  2.]]

In [109]:

np.add(a, a)

Out[109]:

array([[ 4.,  4.],
       [ 4.,  4.]])

In [110]:

a * a

Out[110]:

array([[ 4.,  4.],
       [ 4.,  4.]])

In [111]:

np.log(a)

Out[111]:

array([[ 0.69314718,  0.69314718],
       [ 0.69314718,  0.69314718]])

DON'T DO THIS, it is very slow (for large arrays):

In [112]:

result = np.empty(a.shape)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        result[i,j] = a[i,j]*2
        
print(result)

[[ 4.  4.]
 [ 4.  4.]]

Broadcasting¶

Broadcasting is how numpy handles operations between arrays of different, but compatible shapes
For example, you might want to add a column vector to all columns of a matrix

In [113]:

a = np.ones((2, 2))
b = 2 * np.ones((2, 1))
print("a:")
print(a)
print("b")
print(b)
print("a+b")
print(a + b)

a:
[[ 1.  1.]
 [ 1.  1.]]
b
[[ 2.]
 [ 2.]]
a+b
[[ 3.  3.]
 [ 3.  3.]]

In [114]:

a = np.ones((3, 2))
b = np.array([4, 5])
print("a:", a.shape)
print(a)
print("b:", b.shape)
print(b)

a: (3, 2)
[[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]
b: (2,)
[4 5]

In [115]:

print(a + b)

[[ 5.  6.]
 [ 5.  6.]
 [ 5.  6.]]

In [116]:

print(a * b)

[[ 4.  5.]
 [ 4.  5.]
 [ 4.  5.]]

Reductions¶

One of the most common operation one needs to do to an array is to sum its values along one of it’s dimensions.
Again, this is much faster than looping over the array by hand.

In [117]:

a = np.array([[1, 2], [3, 4]])
print(a)

[[1 2]
 [3 4]]

In [118]:

np.sum(a)

Out[118]:

In [119]:

np.sum(a, axis = 0)

Out[119]:

array([4, 6])

In [120]:

np.sum(a, 1)

Out[120]:

array([3, 7])

There are other reductions the behave the same way as sum, for example

In [121]:

print(a)

[[1 2]
 [3 4]]

In [122]:

np.prod(a)

Out[122]:

In [123]:

np.mean(a)

Out[123]:

2.5

In [124]:

np.mean(a, 0)

Out[124]:

array([ 2.,  3.])

In [125]:

np.cumsum(a)

Out[125]:

array([ 1,  3,  6, 10])

See http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#calculation

Vector/Matrix operations¶

Many algorithms in scientific computing can be cast in a form that makes use of only a few linear algebra primitves.
Probably the most common such primitives are the dot-product between vectors, matrix-vector products, as well as matrix-matrix products.
The product between a matrix $A$ of size $M \times N$ and a matrix $B$ of size $N \times K $, written as $\langle A, B \rangle$ or just $AB$ is defined as $$(AB)_{ij} :=\sum_k A_{ik} B_{kj}$$

Numpy uses a single functions for all these operations: dot, both for arrays and matrices:

In [126]:

a = np.ones(2) * 3
print("a")
print(a)

a
[ 3.  3.]

In [127]:

b = np.array([[1, 2]]).T
print(b)

[[1]
 [2]]

In [128]:

A = np.random.rand(2,2)
print(A)

[[ 0.07784416  0.39286506]
 [ 0.71671045  0.5990694 ]]

In [129]:

np.dot(a,A)

Out[129]:

array([ 2.38366385,  2.97580337])

In [130]:

np.dot(A,b)

Out[130]:

array([[ 0.86357428],
       [ 1.91484926]])

In [131]:

np.dot(A,A)

Out[131]:

array([[ 0.28763021,  0.26593569],
       [ 0.48515103,  0.64045464]])

Regression revisited¶

Numpy makes your life much easier

In [132]:

# Training data (x,y) pairs)
X = np.array(range(10))
y = 2 * X

# linear regression in 1D
w = X.T.dot(y) / X.T.dot(X)
print("Learned w:", w)

Learned w: 2.0

Linear Regression on noisy data¶

Use randn to add (Gaussian) noise $\mathcal{N}(0,1)$ to data
Real Life data almost always contains noise!
The normal distribution often is a good approximation to that, cf https://en.wikipedia.org/wiki/Gaussian_noise

In [133]:

X = np.array(range(10))
y = 2 * X + np.random.randn(10)
w = X.T.dot(y) / X.T.dot(X)

plt.plot(X,y, 'o')
plt.plot(X, w * X, 'r-')
plt.title("Linear Regression on noisy data")
plt.legend(["Noisy training data", "Learned linear function"])

Out[133]:

<matplotlib.legend.Legend at 0x1086e4208>

Compare with the old code
The numpy one is not only easier to read but also much (!) faster
Worth diving into numpy documentation and built-in functions

In [134]:

w1 = X.T.dot(y) / X.T.dot(X)
w2 = ( sum([X[i]*y[i] for i in range(len(X))])/ float(sum([x**2 for x in X])) )
print(w1, w2)

2.02851564299 2.02851564299

Scipy Overview¶

SciPy is large collection of (sub-)packages that contains a variety of functions that are useful for scientific computing
Impossible to cover all, check out the documentation and examples at http://wiki.scipy.org/Cookbook
In particular, there are functions for
- Special functions (scipy.special)
- Integration (scipy.integrate)
- Optimization (scipy.optimize)
- Interpolation (scipy.interpolate)
- Fourier Transforms (scipy.fftpack)
- Signal Processing (scipy.signal)
- Linear Algebra (scipy.linalg)
- Statistics (scipy.stats)
- Multi-dimensional image processing (scipy.ndimage)

Examples¶

Least Square Problem¶

Trying to solve something similar to what we have done before

In [135]:

import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt

c1, c2 = 5.0, 2.0
noise_p = 0.05
xi = 0.1*np.arange(1,11)
yi = c1*np.exp(-xi) + c2*xi
# Adding noise to the original points
zi = yi + noise_p * np.max(yi) * np.random.randn(len(yi))

In [136]:

# Matrix containing the input data
A = np.hstack([np.exp(-xi)[:, np.newaxis], xi[:,np.newaxis]])

# Solving the problem
c, resid, rank, sigma = linalg.lstsq(A, zi)

In [137]:

# See how the learned model behaves
xi2 = np.r_[0.1:1.0:100j]
yi2 = c[0]*np.exp(-xi2) + c[1]*xi2

plt.plot(xi, zi, 'x', xi, yi, '--', xi2, yi2)
plt.axis([0, 1.1, 3.0, 5.5])
plt.xlabel('$x_i$')
plt.legend(["Noisy training data", "Training data", "Learned function"])
plt.title('Data fitting with linalg.lstsq')
plt.show()