Showing posts with label fit. Show all posts

Friday, February 3, 2012

Computing the chi-squared and reduced chi-squared of a model

Here are two Python functions for computing the chi-squared statistic of a model compared to some data. They are very useful for judging the goodness of fit of a model.

Source code for the function that returns the chi-squared:
 import numpy

 def chisqg(ydata, ymod, sd=None):
     """
 Returns the chi-square error statistic as the sum of squared errors between
 ydata(i) and ymod(i). If individual standard deviations (array sd) are supplied,
 then the chi-square error statistic is computed as the sum of squared errors
 divided by the standard deviations. Inspired by the IDL procedure linfit.pro.
 See http://en.wikipedia.org/wiki/Goodness_of_fit for reference.

 ydata, ymod and sd are assumed to be Numpy arrays.
 Returns the float chisq with the chi-square statistic.

 Rodrigo Nemmen
 http://goo.gl/8S1Oo
     """
     # Chi-square statistic (Bevington, eq. 6.9)
     if sd is None:
         chisq = numpy.sum((ydata - ymod)**2)
     else:
         chisq = numpy.sum(((ydata - ymod)/sd)**2)

     return chisq
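As a quick sanity check, here is a usage sketch with made-up data (the function body is reproduced so the snippet runs on its own): the weighted residuals come out to 1, 0, -1 and 1, so the chi-squared is 3.

```python
import numpy

def chisqg(ydata, ymod, sd=None):
    # Same statistic as above: sum of squared residuals,
    # optionally weighted by the standard deviations.
    if sd is None:
        return numpy.sum((ydata - ymod)**2)
    return numpy.sum(((ydata - ymod)/sd)**2)

ydata = numpy.array([1.1, 2.0, 2.9, 4.2])   # measurements (made up)
ymod  = numpy.array([1.0, 2.0, 3.0, 4.0])   # model predictions
sd    = numpy.array([0.1, 0.1, 0.1, 0.2])   # uncertainties

print(chisqg(ydata, ymod, sd))   # residuals/sd = 1, 0, -1, 1, so chisq ≈ 3.0
```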




Source code for the function that returns the reduced chi-squared of a model. You need to provide the number of free parameters of the model as input.

 import numpy

 def redchisqg(ydata, ymod, deg=2, sd=None):
     """
 Returns the reduced chi-square error statistic for an arbitrary model,
 chisq/nu, where nu is the number of degrees of freedom. If individual
 standard deviations (array sd) are supplied, then the chi-square error
 statistic is computed as the sum of squared errors divided by the standard
 deviations. See http://en.wikipedia.org/wiki/Goodness_of_fit for reference.

 ydata, ymod and sd are assumed to be Numpy arrays. deg is an integer.

 Usage:
 >>> chisq=redchisqg(ydata,ymod,n,sd)
 where
  ydata : data
  ymod : model evaluated at the same x points as ydata
  n : number of free parameters in the model
  sd : uncertainties in ydata

 Rodrigo Nemmen
 http://goo.gl/8S1Oo
     """
     # Chi-square statistic
     if sd is None:
         chisq = numpy.sum((ydata - ymod)**2)
     else:
         chisq = numpy.sum(((ydata - ymod)/sd)**2)

     # Number of degrees of freedom
     nu = ydata.size - 1 - deg

     return chisq/nu
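A usage sketch with made-up data (function body reproduced so the snippet runs on its own): a straight-line model has 2 free parameters, so for 5 data points the number of degrees of freedom is nu = 5 - 1 - 2 = 2, and a chi-squared of 3 gives a reduced chi-squared of 1.5.

```python
import numpy

def redchisqg(ydata, ymod, deg=2, sd=None):
    # Reduced chi-squared: chisq divided by nu = n - 1 - deg,
    # following the function above.
    if sd is None:
        chisq = numpy.sum((ydata - ymod)**2)
    else:
        chisq = numpy.sum(((ydata - ymod)/sd)**2)
    nu = ydata.size - 1 - deg
    return chisq/nu

ydata = numpy.array([1.0, 2.1, 2.9, 4.1, 5.0])  # measurements (made up)
ymod  = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])  # straight-line model values
sd    = numpy.array([0.1, 0.1, 0.1, 0.1, 0.1])  # uncertainties

# chisq = 1 + 1 + 1 = 3 and nu = 2, so the result is close to 1.5
print(redchisqg(ydata, ymod, deg=2, sd=sd))
```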

Friday, January 13, 2012

Patched BCES fit code in Fortran

I have been using the BCES fitting method of Akritas & Bershady (1996) to perform linear regression on data that have uncertainties in both X and Y (with bootstrapping). Unfortunately, the code made available by Akritas et al. has a few bugs that prevented me from compiling it with gfortran on my Mac.

I corrected those bugs in the Fortran code, and I am now able to compile and run the BCES routine without problems.

If you are interested in the patched BCES fortran routine, let me know and I will be glad to share it with you.

Friday, December 16, 2011

Script illustrating how to do a linear regression and plot the confidence band of the fit

The script below illustrates how to carry out a simple linear regression of data stored in an ASCII file, plot the linear fit and the 2 sigma confidence band.

The script invokes the confband method described here to plot the confidence bands. The confband function is assumed to live in the "nemmen" module (not yet publicly available, sorry), but you can place it in any module you want.

I got the test data here to perform the fitting.

After running the script you should get the following plot:
[Plot: data points with the best-fit line and shaded confidence band]
where the best-fit line is displayed in green and the shaded area is in gray.

The script below is also available as a GitHub Gist.

   
 import numpy, pylab
 import scipy.stats
 import nemmen

 # Data taken from http://orion.math.iastate.edu/burkardt/data/regression/x01.txt.
 # I removed the header from the file and left only the data in 'testdata.dat'.
 xdata, ydata = numpy.loadtxt('testdata.dat', unpack=True, usecols=(1,2))
 xdata = numpy.log10(xdata)   # take the logs
 ydata = numpy.log10(ydata)

 # Linear fit
 a, b, r, p, err = scipy.stats.linregress(xdata, ydata)

 # Generates arrays with the best-fit line
 x = numpy.linspace(xdata.min(), xdata.max(), 100)
 y = a*x + b

 # Calculates the 2 sigma confidence band contours for the fit
 lcb, ucb, xcb = nemmen.confband(xdata, ydata, a, b, conf=0.95)

 # Plots the data and the fit
 pylab.clf()
 pylab.plot(xdata, ydata, 'o')
 pylab.plot(x, y)

 # Plots the confidence band as a shaded area
 pylab.fill_between(xcb, lcb, ucb, alpha=0.3, facecolor='gray')

 pylab.show()

Wednesday, December 14, 2011

Calculating the prediction band of a linear regression model


The method below calculates the prediction band of an arbitrary linear regression model at a given confidence level in Python.

If you use it, let me know if you find any bugs.




 import numpy
 import scipy.stats

 def predband(xd, yd, a, b, conf=0.95, x=None):
     """
 Calculates the prediction band of the linear regression model at the desired
 confidence level.

 Clarification of the difference between confidence and prediction bands:
 "The 2sigma confidence interval is 95% sure to contain the best-fit regression
 line. This is not the same as saying it will contain 95% of the data points.
 The prediction bands are further from the best-fit line than the confidence
 bands, a lot further if you have many data points. The 95% prediction interval
 is the area in which you expect 95% of all data points to fall."
 (from http://graphpad.com/curvefit/linear_regression.htm)

 Arguments:
 - conf: desired confidence level, by default 0.95 (2 sigma)
 - xd,yd: data arrays
 - a,b: linear fit parameters as in y=ax+b
 - x: (optional) array with x values at which to calculate the band. If none is
   provided, will by default generate 100 points in the original x-range of the data.

 Usage:
 >>> lpb,upb,x=nemmen.predband(all.kp,all.lg,a,b,conf=0.95)
 calculates the prediction bands for the given input arrays
 >>> pylab.fill_between(x, lpb, upb, alpha=0.3, facecolor='gray')
 plots a shaded area containing the prediction band

 Returns:
 Sequence (lpb,upb,x) with the arrays holding the lower and upper prediction
 bands corresponding to the [input] x array.

 References:
 1. http://www.JerryDallal.com/LHSP/slr.htm, Introduction to Simple Linear
    Regression, Gerard E. Dallal, Ph.D.

 Rodrigo Nemmen
 v1 Dec. 2011
 v2 Jun. 2012: corrected bug in dy.
     """
     alpha = 1. - conf    # significance
     n = xd.size          # data sample size

     if x is None:
         x = numpy.linspace(xd.min(), xd.max(), 100)

     # Predicted values (best-fit model)
     y = a*x + b

     # Auxiliary definitions
     sd = scatterfit(xd, yd, a, b)       # scatter of data about the model
     sxd = numpy.sum((xd - xd.mean())**2)
     sx = (x - xd.mean())**2             # array

     # Quantile of Student's t distribution for p=1-alpha/2
     q = scipy.stats.t.ppf(1. - alpha/2., n - 2)

     # Prediction band
     dy = q*sd*numpy.sqrt(1. + 1./n + sx/sxd)
     upb = y + dy    # upper prediction band
     lpb = y - dy    # lower prediction band

     return lpb, upb, x
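A minimal self-contained sketch of how predband might be exercised (the data are made up, and the scatterfit call is inlined here as the usual residual standard deviation so the snippet runs on its own):

```python
import numpy
import scipy.stats

def predband(xd, yd, a, b, conf=0.95, x=None):
    # Same logic as above, with scatterfit inlined as the
    # standard deviation of the residuals about the line.
    alpha = 1. - conf
    n = xd.size
    if x is None:
        x = numpy.linspace(xd.min(), xd.max(), 100)
    y = a*x + b
    sd = numpy.sqrt(numpy.sum((yd - a*xd - b)**2)/(n - 2.))
    sxd = numpy.sum((xd - xd.mean())**2)
    sx = (x - xd.mean())**2
    q = scipy.stats.t.ppf(1. - alpha/2., n - 2)
    dy = q*sd*numpy.sqrt(1. + 1./n + sx/sxd)
    return y - dy, y + dy, x

xd = numpy.array([0., 1., 2., 3., 4.])
yd = numpy.array([0.1, 0.9, 2.2, 2.8, 4.1])   # made-up, roughly linear data
a, b, r, p, err = scipy.stats.linregress(xd, yd)
lpb, upb, x = predband(xd, yd, a, b, conf=0.95)

# The band brackets the best-fit line everywhere
print(numpy.all(lpb < a*x + b) and numpy.all(upb > a*x + b))   # → True
```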

Computing the scatter of data around linear fits

The Python method below computes the scatter of data around a given linear model.

Let me know if you find any bugs.



   
 import numpy
 import scipy.stats

 def scatterfit(x, y, a=None, b=None):
     """
 Compute the mean deviation of the data about the linear model y=ax+b if a and b
 are provided as arguments. Otherwise, compute the mean deviation about the
 best-fit line.

 x and y are assumed to be Numpy arrays; a and b are scalars.
 Returns the float sd with the mean deviation.

 Author: Rodrigo Nemmen
     """
     if a is None:
         # Performs linear regression
         a, b, r, p, err = scipy.stats.linregress(x, y)

     # Std. deviation of an individual measurement (Bevington, eq. 6.15)
     N = numpy.size(x)
     sd = numpy.sqrt(1./(N - 2.)*numpy.sum((y - a*x - b)**2))

     return sd
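As a quick sanity check with made-up data (the function body is reproduced so the snippet runs on its own): points lying exactly on a line should give zero scatter, while perturbed points give a positive value.

```python
import numpy
import scipy.stats

def scatterfit(x, y, a=None, b=None):
    # Same statistic as above: rms deviation of the data about the line
    # (Bevington, eq. 6.15).
    if a is None:
        a, b, r, p, err = scipy.stats.linregress(x, y)
    N = numpy.size(x)
    return numpy.sqrt(numpy.sum((y - a*x - b)**2)/(N - 2.))

x = numpy.array([0., 1., 2., 3.])
y = 2.*x + 1.                         # points exactly on the line y = 2x + 1
print(scatterfit(x, y, a=2., b=1.))   # → 0.0

y2 = y + numpy.array([0.1, -0.1, 0.1, -0.1])   # perturbed points
print(scatterfit(x, y2))   # positive scatter, via the internal linregress fit
```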

Calculating and plotting confidence bands for linear regression models

This method calculates the confidence band of an arbitrary linear regression model at a given confidence level in Python. Using this method you can get plots like this:
[Plot: linear fit in green with shaded 2sigma confidence band]
where the shaded area corresponds to the 2sigma confidence band of the linear fit shown in green. A script illustrating how to use the confband method below is available.

If you use it, let me know if you find any bugs.

Note: the scatterfit method called below is available here.
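Since the confband code itself is not reproduced here, the sketch below shows what such a confidence-band computation typically looks like: it has the same structure as predband from the post above, but without the "+1" term in the variance, because the band covers the regression line rather than individual new observations. The function body and data are illustrative assumptions, not the actual nemmen.confband.

```python
import numpy
import scipy.stats

def confband_sketch(xd, yd, a, b, conf=0.95, x=None):
    # Confidence band for the regression line: same structure as the
    # prediction band, minus the "+1" term for a new individual observation.
    alpha = 1. - conf
    n = xd.size
    if x is None:
        x = numpy.linspace(xd.min(), xd.max(), 100)
    y = a*x + b
    sd = numpy.sqrt(numpy.sum((yd - a*xd - b)**2)/(n - 2.))   # scatter about the line
    sxd = numpy.sum((xd - xd.mean())**2)
    sx = (x - xd.mean())**2
    q = scipy.stats.t.ppf(1. - alpha/2., n - 2)
    dy = q*sd*numpy.sqrt(1./n + sx/sxd)
    return y - dy, y + dy, x

xd = numpy.array([0., 1., 2., 3., 4.])
yd = numpy.array([0.1, 0.9, 2.2, 2.8, 4.1])   # made-up data
a, b, r, p, err = scipy.stats.linregress(xd, yd)
lcb, ucb, x = confband_sketch(xd, yd, a, b)

# The band has positive width and brackets the best-fit line
print(numpy.all((lcb < a*x + b) & (ucb > a*x + b)))   # → True
```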