AstroPython Blog: arrays

Showing posts with label arrays. Show all posts

Thursday, October 16, 2014

Parallel computing with IPython and Python

I uploaded to github a quick tutorial on how to parallelize easy computing tasks. I have chosen embarrassingly parallel examples which illustrate some of the powerful features of IPython.parallel and the multiprocessing module.

Examples included:

Parallel function mapping to a list of arguments (multiprocessing module)
Parallel execution of array function (scatter/gather) + parallel execution of scripts
Easy parallel Monte Carlo (parallel magics)

Parallel computing with Python.

Wednesday, August 27, 2014

Distributed arrays for parallel applications

I came across recently a very promising module: DistArray. The idea behind DistArray is to

provide general multidimensional NumPy-like distributed arrays to Python. It intends to bring the strengths of NumPy to data-parallel high-performance computing.

Neat!

Some examples for easily creating distributed arrays are given in this IPython notebook.

Unfortunately I could not test DistArray so far because I am getting weird errors in my system, probably related to installation issues with my MPI installation.

Wednesday, May 8, 2013

Comparing execution speed of Python, NumPy, Matlab, Fortran, etc

This is an interesting comparison of the numerical performance of python, numpy, fortran, java and matlab in performing some array operations.

The post is outdated (it is from 2009). Still, it demonstrates the importance of vectorizing array operations. It also shows that lower level languages like fortran will almost always have the upper hand in execution speed (but development time can easily balance that in python).

It would be worth repeating this comparison with the newer versions of numpy, python, matlab and fortran compilers to see what has changed in terms of relative speedups among the different languages. It would also be nice to include a "parallel" case of these examples using some multiprocessing module in python in a multicore machine (e.g. multiprocessing, iPython parallel).

Friday, February 24, 2012

Plots with several histograms

Creating a plot with two histograms

Here is a method that you can use to plot two histograms in the same figure sharing the same X-axis, keeping some distance between the histograms:

 def twohists(x1,x2,xmin,xmax,x1leg='$x_1$',x2leg='$x_2$',xlabel='',sharey=False):  
      """  
 Script that plots two histograms of quantities x1 and x2.   
   
 Arguments:  
 - x1,x2: arrays with data to be plotted  
 - xmin,xmax: lower and upper range of plotted values, will be used to set a consistent x-range  
      for both histograms.  
 - x1leg, x2leg: legends for each histogram       
 - xlabel: self-explanatory.  
   
 Inspired by http://www.scipy.org/Cookbook/Matplotlib/Multiple_Subplots_with_One_Axis_Label.  
   
 Rodrigo Nemmen  
 v1 Dec. 2011  
 v1.1 Feb. 2012: Added sharey argument.  
      """  
   
      pylab.clf()  
      pylab.rcParams.update({'font.size': 15})  
      fig=pylab.figure()  
        
      a=fig.add_subplot(2,1,1)  
      if sharey==True:  
           b=fig.add_subplot(2,1,2, sharex=a, sharey=a)  
      else:  
           b=fig.add_subplot(2,1,2, sharex=a)  
        
      a.hist(x1,label=x1leg,color='b')  
      a.legend(loc='best',frameon=False)  
      a.set_xlim(xmin,xmax)  
        
      b.hist(x2,label=x2leg,color='r')  
      b.legend(loc='best',frameon=False)  
        
      pylab.setp(a.get_xticklabels(), visible=False)  
   
      b.set_xlabel(xlabel)  
      b.set_ylabel('Number',verticalalignment='bottom')  
      pylab.minorticks_on()  
      pylab.subplots_adjust(hspace=0.15)  
      pylab.draw()

... and here is a example script that uses the method above:

 """  
 Illustrates how to use the twohists method.  
 """  
 import nemmen  
 import scipy, pylab  
   
 # Generates a normal distribution  
 x1=scipy.random.standard_normal(100)  
   
 # Generates a uniform random distribution  
 x2=scipy.random.uniform(-3,3,100)  
   
 nemmen.twohists(x1,x2,-3,3,'Normal','Uniform')  
   
 pylab.show()

... to create the following plot:

Creating a plot with three histograms

I also wrote a recipe that makes a plot with three histograms:

 def threehists(x1,x2,x3,xmin,xmax,x1leg='$x_1$',x2leg='$x_2$',x3leg='$x_3$',xlabel='',sharey=False):  
      """  
 Script that plots three histograms of quantities x1, x2 and x3.   
   
 Arguments:  
 - x1,x2,x3: arrays with data to be plotted  
 - xmin,xmax: lower and upper range of plotted values, will be used to set a consistent x-range  
      for both histograms.  
 - x1leg, x2leg, x3leg: legends for each histogram       
 - xlabel: self-explanatory.  
 - sharey: sharing the Y-axis among the histograms?  
   
 Example:  
 x1=Lbol(AD), x2=Lbol(JD), x3=Lbol(EHF10)  
 >>> threehists(x1,x2,x3,38,44,'AD','JD','EHF10','$\log L_{\\rm bol}$ (erg s$^{-1}$)',sharey=True)  
   
 Inspired by http://www.scipy.org/Cookbook/Matplotlib/Multiple_Subplots_with_One_Axis_Label.  
   
 Rodrigo Nemmen  
 v1 Dec. 2011  
 v1.1 Feb. 2012:     Added sharey keyword.  
      """  
   
      pylab.clf()  
      pylab.rcParams.update({'font.size': 15})  
      fig=pylab.figure()  
        
      a=fig.add_subplot(3,1,1)  
      if sharey==True:  
           b=fig.add_subplot(3,1,2, sharex=a, sharey=a)  
           c=fig.add_subplot(3,1,3, sharex=a, sharey=a)  
      else:  
           b=fig.add_subplot(3,1,2, sharex=a)  
           c=fig.add_subplot(3,1,3, sharex=a)            
        
      a.hist(x1,label=x1leg,color='b')  
      a.legend(loc='best',frameon=False)  
      a.set_xlim(xmin,xmax)  
        
      b.hist(x2,label=x2leg,color='r')  
      b.legend(loc='best',frameon=False)  
   
      c.hist(x3,label=x3leg,color='y')  
      c.legend(loc='best',frameon=False)  
        
      pylab.setp(a.get_xticklabels(), visible=False)  
      pylab.setp(b.get_xticklabels(), visible=False)  
   
      c.set_xlabel(xlabel)  
      b.set_ylabel('Number')  
      pylab.minorticks_on()  
      pylab.subplots_adjust(hspace=0.15)  
      pylab.draw()

... and as before, here is a script that illustrates how to use the above method:

 """  
 Illustrates how to use the threehists method.  
 """  
 import nemmen  
 import scipy, pylab  
   
 # Generates a normal distribution  
 x1=scipy.random.standard_normal(100)  
   
 # Generates a uniform random distribution  
 x2=scipy.random.uniform(-3,3,100)  
   
 x3=scipy.random.standard_normal(1000)  
   
 nemmen.threehists(x1,x2,x3,-3,3,'Normal ($n=100$)','Uniform','Normal ($n=1000$)')  
   
 pylab.show()

... creating this plot:

Friday, December 16, 2011

Script illustrating how to do a linear regression and plot the confidence band of the fit

The script below illustrates how to carry out a simple linear regression of data stored in an ASCII file, plot the linear fit and the 2 sigma confidence band.

The script invokes the confband method described here to plot the confidence bands. The confband is assumed to be lying inside the "nemmen" module (not yet publicly available, sorry) but you can place it in any module you want.

I got the test data here to perform the fitting.

After running the script you should get the following plot:

where the best-fit line is displayed in green and the shaded area is in gray.

The script below is also available at Github Gist.

   
 import numpy, pylab, scipy  
 import nemmen  
   
 # Data taken from http://orion.math.iastate.edu/burkardt/data/regression/x01.txt.  
 # I removed the header from the file and left only the data in 'testdata.dat'.  
 xdata,ydata = numpy.loadtxt('testdata.dat',unpack=True,usecols=(1,2))  
 xdata=numpy.log10(xdata) # take the logs  
 ydata=numpy.log10(ydata)  
   
 # Linear fit  
 a, b, r, p, err = scipy.stats.linregress(xdata,ydata)  
   
 # Generates arrays with the fit  
 x=numpy.linspace(xdata.min(),xdata.max(),100)  
 y=a*x+b  
   
 # Calculates the 2 sigma confidence band contours for the fit  
 lcb,ucb,xcb=nemmen.confband(xdata,ydata,a,b,conf=0.95)  
   
 # Plots the fit and data  
 pylab.clf()  
 pylab.plot(xdata,ydata,'o')  
 pylab.plot(x,y)  
   
 # Plots the confidence band as shaded area  
 pylab.fill_between(xcb, lcb, ucb, alpha=0.3, facecolor='gray')  
   
 pylab.show()  
 pylab.draw()

Wednesday, December 14, 2011

Quick Python reference documents

These quick reference sheets will help you to quickly learn and practice data analysis, plotting and statistics with Python:

Python data analysis reference card (created by Fabricio Ferrari)

Python commands for people with previous IDL/Matlab experience

IPython quick reference card

Generating random numbers from an arbitrary probability distribution using the rejection method

I am pretty used to generating random numbers from a normal distribution. There are built-in methods to carry out that task in Python and many other data analysis softwares.

But what if you need to produce random numbers based on a non-standard probability distribution? In my case that challenge appeared and I needed to produce random numbers from a power-law (?!) probability density function. Astronomy can be pretty interesting.

The method below solves that problem. It is not very efficient since I did not try to optimize it. But it gets the job done.

Let me know if you have a more clever way of solving this problem in Python or if you use the method and find bugs.

 def randomvariate(pdf,n=1000,xmin=0,xmax=1):  
  """  
 Rejection method for random number generation  
 ===============================================  
 Uses the rejection method for generating random numbers derived from an arbitrary   
 probability distribution. For reference, see Bevington's book, page 84. Based on  
 rejection*.py.  
   
 Usage:  
 >>> randomvariate(P,N,xmin,xmax)  
  where  
  P : probability distribution function from which you want to generate random numbers  
  N : desired number of random values  
  xmin,xmax : range of random numbers desired  
    
 Returns:   
  the sequence (ran,ntrials) where  
   ran : array of shape N with the random variates that follow the input P  
   ntrials : number of trials the code needed to achieve N  
   
 Here is the algorithm:  
 - generate x' in the desired range  
 - generate y' between Pmin and Pmax (Pmax is the maximal value of your pdf)  
 - if y'<P(x') accept x', otherwise reject  
 - repeat until desired number is achieved  
   
 Rodrigo Nemmen  
 Nov. 2011  
  """  
  # Calculates the minimal and maximum values of the PDF in the desired  
  # interval. The rejection method needs these values in order to work  
  # properly.  
  x=numpy.linspace(xmin,xmax,1000)  
  y=pdf(x)  
  pmin=y.min()  
  pmax=y.max()  
   
  # Counters  
  naccept=0  
  ntrial=0  
   
  # Keeps generating numbers until we achieve the desired n  
  ran=[] # output list of random numbers  
  while naccept<n:  
  x=numpy.random.uniform(xmin,xmax) # x'  
  y=numpy.random.uniform(pmin,pmax) # y'  
   
  if y<pdf(x):  
   ran.append(x)  
   naccept=naccept+1  
  ntrial=ntrial+1  
    
  ran=numpy.asarray(ran)  
    
  return ran,ntrial

Update Oct. 23rd 2014: This code snippet is available (and updated) at github. Correction: pmin=0 (thanks PZ!)

Computing the scatter of data around linear fits

The Python method below computes the scatter of data around a given linear model.

Let me know if you find any bugs.

   
 def scatterfit(x,y,a=None,b=None):  
  """  
 Compute the mean deviation of the data about the linear model given if A,B  
 (y=ax+b) provided as arguments. Otherwise, compute the mean deviation about   
 the best-fit line.  
   
 x,y assumed to be Numpy arrays. a,b scalars.  
 Returns the float sd with the mean deviation.  
   
 Author: Rodrigo Nemmen  
  """  
   
  if a==None:   
  # Performs linear regression  
  a, b, r, p, err = scipy.stats.linregress(x,y)  
    
  # Std. deviation of an individual measurement (Bevington, eq. 6.15)  
  N=numpy.size(x)  
  sd=1./(N-2.)* numpy.sum((y-a*x-b)**2); sd=numpy.sqrt(sd)  
    
  return sd

Tuesday, August 30, 2011

Python code to create an ASCII file from arrays

Suppose you have the Numpy arrays X, Y and Z - which can contain strings, floats, integers etc - and you want to create an ASCII file in which each column corresponds to the elements of these arrays. The code snippet below allows you to do that.

In the example below, tmp.dat corresponds to the output file and X (strings), Y (floats) and Z (floats) are the arrays you want to export.

 file=open('tmp.dat','w') #create file  
 for x,y,z in zip(X,Y,Z):  
      file.write('%s\t%f\t%f\n'%(x,y,z))     #assume you separate columns by tabs  
 file.close()     #close file

Be sure to modify the %s and %f if you change the array data types.

Thanks to Tyrel Johnson for the trick.