Saturday, September 13, 2014

Simple txt2csv python script

Often I have text files with columns that are separated by spaces, tabs, or a mixture of both. These files can be read by a typical spreadsheet program, but this usually requires some extra mouse clicks to tell the program how to interpret the text file. CSV files (comma-separated values) are usually recognized much better by a spreadsheet program. Therefore I have written a simple python function/script to convert a text file to a csv file in which spaces and tabs are replaced by a separator (default set to semicolon). Multiple spaces are combined into a single separator (which is what I typically want). Leading and trailing spaces on a row are removed. It is amazing how little python code is needed for this (thanks to python's nice string handling). The magic python line csvline=s.join(line.split()) in the script below is based on a very useful item on Stack Overflow.


import sys

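def txt2csv(fnIn,fnOut,separator=';'):
    #open() raises IOError on failure, it never returns a false value
    try:
        fin=open(fnIn,'r')
    except IOError:
        print "Error opening %s"%(fnIn)
        sys.exit(-1)
    fout=open(fnOut,'w')
    for line in fin.readlines():
        s=separator
        #split() without arguments splits on any run of spaces/tabs
        #and drops leading and trailing whitespace
        csvline=s.join(line.split())
        fout.write(csvline+'\n')
    fin.close()
    fout.close()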

if __name__== "__main__":
    try:
        fnIn=sys.argv[1]
        fnOut=sys.argv[2]
    except IndexError:
        print "Usage: python %s fnIn fnOut"%(sys.argv[0])
        sys.exit(1) #stop here, fnIn/fnOut are not defined
        
    txt2csv(fnIn,fnOut)
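
The behaviour of the "magic line" is easiest to see on a single string. The snippet below is only a small illustration (written in Python 3 syntax, unlike the Python 2 script above); the helper name to_csv_line is made up for this example.

```python
def to_csv_line(line, separator=";"):
    # str.split() without arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and ignores leading/trailing whitespace.
    return separator.join(line.split())


# Mixed spaces and tabs, plus leading and trailing whitespace
print(to_csv_line("  1.0 \t 2.5    abc \n"))  # prints 1.0;2.5;abc
```

Note that this approach cannot represent empty columns: two columns separated by several spaces always end up as exactly two fields.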

Monday, April 7, 2014

guiqwt string based function plotter

For almost all plotting from python I use matplotlib, but recently I noticed an interesting alternative called guiqwt. Guiqwt seems to be faster and more tuned towards using plots in a UI environment. In the YouTube video Introduction to Wolfram Language a nice demo is shown where a GUI is generated automatically for a function, with sliders for the function parameters, so the user can easily explore the function behavior. If you often work with empirical functions to reproduce experimentally observed trends, it can be very nice to have such a quick visualization of the influence of the chosen function parameters. This gave me the idea to try to make a generic function plotter with python/guiqwt where the user supplies the function by typing a string. The parameters are then automatically recognized in the string and each is assigned a slider, so the user can explore the function behavior. This turned out to be a little more involved than anticipated but the result is acceptable:
To make life a little easier, the user has to write function parameters (and independent variables) starting with an underscore. This saves the effort of trying to recognize all possible mathematical functions in the string (e.g. no searching for tan, sin, exp, sqrt etc. required), which would have been nicer of course. All in all the code is about 400 lines (see below), a bit more than expected. But it was a nice exercise to get familiar with PyQt and guiqwt.

from guidata.qt.QtGui import QWidget, QVBoxLayout, QHBoxLayout,QPalette
from guidata.qt.QtGui import QPushButton,QSlider,QGridLayout,QLabel,QFont
from guidata.qt.QtGui import QCheckBox,QLineEdit,QComboBox,QSizePolicy
from guidata.qt.QtCore import SIGNAL
from guidata.qt.QtGui import QApplication
from guidata.qt.QtCore import Qt
from guiqwt.plot import CurveWidget
from guiqwt.builder import make
from guidata.configtools import get_icon

import numpy
from numpy import *
from functools import partial


class TFunctionParameter:
    """
    This class defines the properties of a function parameter.
    The properties are:
    -Name
    -Value
    -Minimum allowed parameter value
    -Maximum allowed parameter value
    """
    def __init__(self,name,value,minValue=None,maxValue=None,scientific=False):
        """
        Constructor for the TFunctionParameter class
        Arguments:
        name: name of the parameter (this is the string you use in the function definition!)
        value: (initial) value of the parameter
        minValue: if provided the minimum allowed value of this parameter. Otherwise default 0.1*value
        maxValue: if provided the maximum allowed value of this parameter. Otherwise default 10.0*value
        scientific: if True the value is displayed in scientific notation
        """
        self._name=name
        self._v=value
        self._scientific=scientific
        #compare with None so that 0 is accepted as an explicit limit
        if(minValue is not None):
            self._minV=minValue
        else:
            self._minV=0.1*self._v
        if(maxValue is not None):
            self._maxV=maxValue
        else:
            self._maxV=10.0*self._v
            
    def Scientific(self):
        return(self._scientific)    
    
    def v(self):
        return(self._v)
    
    def setValue(self,v):
        if(v<self._minV):
            self._v=self._minV
        elif(v>self._maxV):
            self._v=self._maxV
        else:
            self._v=v
    
    def getValue(self):
        return(self._v)
        
    def minValue(self):
        return(self._minV)
        
    def maxValue(self):
        return(self._maxV)
        
    def name(self):
        return(self._name)
        
    def checkV(self):
        """
        Checks if parameter value still between min-max. If not
        value is adjusted to fall in allowed range.
        """
        if(self._v>self._maxV): #above the allowed range: clip to maximum
            self._v=self._maxV
        elif(self._v<self._minV):
            self._v=self._minV
        
    def setMinValue(self,minV):
        """
        Adjust the minimum allowed value for this parameter. The parameter
        value is adjusted to fall in the new range. This function returns
        the current (possibly adjusted) parameter value.
        The new min value is not allowed to be bigger than the current
        maximum allowed value.
        """
        if(minV>self._maxV):
            self._minV=self._maxV
        else:
            self._minV=minV
        self.checkV()
        return(self._v)
        
    def setMaxValue(self,maxV):
        """
        Adjust the maximum allowed value for this parameter. The parameter
        value is adjusted to fall in the new range. This function returns
        the current (possibly adjusted) parameter value.
        The new max value is not allowed to be smaller than the current
        minimum allowed value
        """
        if(maxV<self._minV):
            self._maxV=self._minV
        else:
            self._maxV=maxV
        self.checkV()
        return(self._v)
        
    def setMinAndMAxValues(self,minV,maxV):
        """
        Adjust the minimum and maximum allowed values simultaneously.
        Use this function to specify a completely new range for this
        parameter
        
        This function returns
        the current (possibly adjusted) parameter value.
        """
        self._minV=minV
        self._maxV=maxV
        self.checkV()
        return(self._v)
    
class TFunctionUI(QWidget):
    """
    The idea of this class is to provide an automatic user interface
    to adjust all function parameter values and see the effect on the
    function behaviour.
    """
    def __init__(self,parent):
        """Constructor of the TFunctionUI class"""
        self._parameterDict={}
        QWidget.__init__(self, parent)
        self.setMinimumSize(520, 400)
        self.x = [0,1]
        self.xvec=[0,1]
        self.y = [0,1]
        self.func = self.calc
        self.title="f"
        self.autoScale=True
        #---guiqwt curve item attribute:
        self.curve_item = None
        #---
        
    def calc(self):
        """
        Based on function string calculate function
        """
        xpar=self._parameterDict[self._xName]
        xvec=numpy.r_[xpar.minValue():xpar.maxValue():-250j]
        evalString=self.funcString[:]
        for par in self.paramLst:
            if(par==self._xName):
                evalString=evalString.replace(par,"xvec")
            else:
                evalString=evalString.replace(par,"self._parameterDict['%s'].v()"%(par))
        
        self.xvec=xvec        
        f=eval(evalString)
        return(f)
        
    def addParameter(self,p):
        """
        Add a parameter.
        Arguments:
        p - The parameter to add. Must be an instance of TFunctionParameter.
        """
        self._parameterDict[p.name()]=p
        
    def defineX(self,x):
        """
        Define the value on the x-axis. 
        Arguments:
        x - the parameter to plot on x-axis. Must be a string and part of the parameter list
        """
        self._xName=x
        
    def parValue(self,p):
        """
        Returns the current value for parameter p
        """
        return(self._parameterDict[p.name()].v())
        
    def getParametersAndFunction(self,fs):
        """
        Extract the parameters from the function string
        fs: function string
        """
        math_symbols=['<','>','=','*','+','-','/','(',')']
        #Process string
        #find first occurrence of _ to indicate a parameter
        NParams=fs.count('_')
        self.paramLst=[] #Stores parameter names in the order as they appear in the equation
        self.funcString=fs[:]
        cntParams=0
        #~ print self.funcString
        for np in range(NParams):
            index=fs.index('_')
            #Now look for first math symbol to signify end of parameter name
            i=index+1
            while(i<len(fs) and (fs[i] not in math_symbols )):
                i+=1
            #Now the parameter name is from index upto i-1
            parName=fs[index:min(i,len(fs))] #include the underscore
            fs=fs[i:]
            if(parName in self.paramLst):
                continue #Do not count same parameter double
            
            p=TFunctionParameter(parName,1.0,-5.0,5.0,scientific=False)
            self.paramLst.append(parName)
            self.addParameter(p)
        
        
    def slideChange(self,key):
        """
        Process a slider change. Key indicates the parameter
        the slider belongs to
        """
        v=self.slideDict[key].value()
        float_v=v/1000.0
        p=self._parameterDict[key]
        p_value=p.minValue()+(p.maxValue()-p.minValue())*float_v
        p.setValue(p_value)
        if(p.Scientific()):
            self.valDict[key].setText("%5.4e"%(p_value))
        else:
            self.valDict[key].setText("%6.3f"%(p_value))
        self.process_data()
        
    def toggleAutoScale(self,v):
        self.autoScale=self.autoScaleCheckBox.checkState()
        self.process_data()
        
    def parMinChanged(self,key):
        """
        Process a change of a minimum parameter value
        """
        p=self._parameterDict[key]
        p.setMinValue(self.minLEDict[key].text().toFloat()[0])
        self.process_data()
        
    def parMaxChanged(self,key):
        """
        Process a change of a maximum parameter value
        """
        p=self._parameterDict[key]
        p.setMaxValue(self.maxLEDict[key].text().toFloat()[0])
        self.process_data()
        
    def xAxisChange(self,key):
        """
        The user has selected a different x-axis
        """
        #Re-enable the slider of the current x-axis parameter
        self.slideDict[self._xName].setEnabled(True)
        self._xName=str(key)
        self.slideDict[self._xName].setEnabled(False)
        self.plot.set_axis_title(self.plot.X_BOTTOM,self._xName)
        self.process_data()
        
    def extend_widget(self):
        func_string=str(self.eqLE.text())
        self.getParametersAndFunction(func_string)
        self.eqLE.setEnabled(False)
        self.processButton.setEnabled(False)
        self.title="f=%s"%(func_string)
                
        #Check if _x in parameter list
        if("_x" in self._parameterDict.keys()):
            self.defineX("_x")  #Make _x the x-axis parameter
        else:
            self.defineX(self._parameterDict.keys()[0]) #Make the first parameter the x-axis
        #Loop over function parameters and setup sliders
        cnt=0
        glayout=QGridLayout()
        self.slideDict={} #to store parameter value sliders
        self.minLEDict={} #to store minimum parameter value LineEdits
        self.maxLEDict={} #to store maximum parameter value LineEdits
        self.valDict={} #to store labels containing parameter values
        self.xCBDict={} #To store x-axis selection checkbox
        palette = QPalette()
        palette.setColor(QPalette.Foreground,Qt.blue)
        key_lst=self._parameterDict.keys()
        key_lst.sort()
        self.xCombo=QComboBox()
        sizePol=QSizePolicy(QSizePolicy.Preferred,QSizePolicy.Preferred)
        for key in key_lst:
            self.xCombo.addItem(key)
            p=self._parameterDict[key]
            nameLabel=QLabel(p.name()+': ')
            glayout.addWidget(nameLabel,cnt,0)
            if(p.Scientific()):
                valLabel=QLabel("%5.4e"%(p.v()))
                minLE=QLineEdit("%5.4e"%(p.minValue()))
                #~ minLE.sizeHint(10)
                maxLE=QLineEdit("%5.4e"%(p.maxValue()))
            else:
                valLabel=QLabel("%6.3f"%(p.v()))
                minLE=QLineEdit("%6.3f"%(p.minValue()))
                #~ minLE.sizeHint(10)
                maxLE=QLineEdit("%6.3f"%(p.maxValue()))
            self.connect(minLE,SIGNAL('editingFinished ()'),\
                partial(self.parMinChanged,key))
            self.connect(maxLE,SIGNAL('editingFinished ()'),\
                partial(self.parMaxChanged,key))
            minLE.setSizePolicy(sizePol)
            maxLE.setSizePolicy(sizePol)
            self.minLEDict[key]=minLE
            self.maxLEDict[key]=maxLE
            valLabel.setPalette(palette)
            self.valDict[key]=valLabel    
            glayout.addWidget(valLabel,cnt,1)
            glayout.addWidget(minLE,cnt,2)
            glayout.addWidget(maxLE,cnt,4)
            sld = QSlider(Qt.Horizontal)
            sld.setRange(0,1000)
            
            s_value=int(1000*(p.v()-p.minValue())/(p.maxValue()-p.minValue()))
            sld.setValue(s_value)
            if(p.name()==self._xName):
                sld.setEnabled(False)
            self.connect(sld, SIGNAL('valueChanged(int)'),\
                partial(self.slideChange,key))
            self.slideDict[key]=sld
            #Connect this slider to the parameter
            glayout.addWidget(sld,cnt,3)
            cnt+=1
            
        self.vlayout.addLayout(glayout)
        hl=QHBoxLayout()
        self.xCombo.setCurrentIndex(self.xCombo.findText(self._xName))
        self.autoScaleCheckBox=QCheckBox("Auto scale y-axis")
        self.autoScaleCheckBox.setCheckState(self.autoScale)
        self.autoScaleCheckBox.setTristate(False)
        self.connect(self.autoScaleCheckBox,SIGNAL('stateChanged(int)'),\
            self.toggleAutoScale)
        hl.addWidget(self.autoScaleCheckBox)
        hl2=QHBoxLayout()
        xlab=QLabel("X-axis: ")
        hl2.addStretch()
        hl2.addWidget(xlab)
        hl2.addWidget(self.xCombo)
        hl.addLayout(hl2)
        self.connect(self.xCombo,SIGNAL('currentIndexChanged(QString)'),\
            self.xAxisChange)
        self.vlayout.addLayout(hl)
        self.setLayout(self.vlayout)
        
        self.plot.set_axis_title(self.plot.Y_LEFT,self.title)
        self.plot.set_axis_title(self.plot.X_BOTTOM,self._xName)
        
        self.process_data()
        
    def setup_widget(self, title):
        #---Create the plot widget:
        self.curvewidget = CurveWidget(self)
        self.curvewidget.register_all_curve_tools()
        self.curve_item = make.curve([], [], color='b')
        self.curvewidget.plot.add_item(self.curve_item)
        self.curvewidget.plot.set_antialiasing(True)
        self.plot = self.curvewidget.get_plot()
        
        font = QFont()
        font.setPointSize( 16 )
        
        self.plot.set_axis_font("left", font)
        self.plot.set_axis_font("bottom", font)
        #---
        self.eqLE=QLineEdit()
        self.eqLE.setPlaceholderText(\
            "e.g. sin(_x**2/_a+_y**2/_b), just start parameters with one underscore!")
        self.eqLE.selectAll()
        self.processButton=QPushButton("Process")
        self.connect(self.processButton,SIGNAL('clicked()'),\
            self.extend_widget)
        hlayout=QHBoxLayout()
        hlayout.addWidget(self.eqLE)
        hlayout.addWidget(self.processButton)
        self.vlayout = QVBoxLayout()
        self.vlayout.addWidget(self.curvewidget)
        self.vlayout.addLayout(hlayout)
        self.setLayout(self.vlayout)
        
    def process_data(self):
        self.y = self.calc()
        if(self.autoScale):
            self.plot.set_axis_limits(self.plot.Y_LEFT,min(self.y),max(self.y))
        self.update_curve()
        
    def update_curve(self):
        #---Update curve
        self.curve_item.set_data(self.xvec, self.y)
        self.curve_item.plot().replot()
        

class TestWindow(QWidget):
    def __init__(self):
        QWidget.__init__(self)
        self.setWindowTitle("FunctionPlotter(guiqwt)")
        self.setWindowIcon(get_icon('guiqwt.svg'))
        hlayout = QHBoxLayout()
        self.setLayout(hlayout)
        
    def add_plot(self, title):
        self.widget = TFunctionUI(self)
        self.widget.setup_widget(title)
        self.layout().addWidget(self.widget)

if __name__ == "__main__":
    app = QApplication([])
    win = TestWindow()
    win.add_plot("")
    win.show()
    app.exec_()
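
A note on the parameter recognition in getParametersAndFunction: as far as I can tell, the scan only stops at the listed math symbols, so a space or a comma directly after a name becomes part of that name, and a name with a second underscore in it (e.g. _my_par) makes the underscore count too high. A regular expression is one possible alternative. The sketch below is not the method used in the listing above; the names PARAM_RE and find_parameters are invented for this example, and it is written in Python 3 syntax.

```python
import re

# A parameter starts with an underscore that is not in the middle of a
# word, followed by one or more letters, digits or underscores.
PARAM_RE = re.compile(r"(?<!\w)_\w+")


def find_parameters(func_string):
    """Return the parameter names in order of first appearance."""
    names = []
    for name in PARAM_RE.findall(func_string):
        if name not in names:  # do not count the same parameter twice
            names.append(name)
    return names


print(find_parameters("sin(_x**2/_a+_y**2/_b)"))
# prints ['_x', '_a', '_y', '_b']
```

The returned list could be used in place of self.paramLst; the rest of the class would stay the same.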

Monday, February 24, 2014

Locating the site-packages folder from within python

Nice way to find where the site-packages folder is located for your linux distribution from within python itself (found here)

import site
print site.getsitepackages()

Apparently this is not available in older python versions (site.getsitepackages() was added in Python 2.7).
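
If the function is missing, the sysconfig module offers a related lookup. The snippet below is a minimal sketch in Python 3 syntax; the helper name site_packages_dirs is made up, and falling back to the "purelib" path is my assumption of what is usually wanted, not something from the original tip.

```python
import site
import sysconfig


def site_packages_dirs():
    # site.getsitepackages() can be missing, for example in environments
    # created by older versions of virtualenv.
    if hasattr(site, "getsitepackages"):
        return site.getsitepackages()
    # Fallback: the install location for pure-Python packages.
    return [sysconfig.get_paths()["purelib"]]


print(site_packages_dirs())
```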

Thursday, November 28, 2013

spreadsheet like line plot with filled areas in python

A nice way to show whether a series of values falls within a range that varies from sample to sample is to make a line plot with a shaded area indicating the range.

Example:



A line plot is easily made in a typical spreadsheet program. Getting the correct region shaded in the plot by combining area and line plots was however much too cumbersome for me. Therefore I had a look if matplotlib also supports area plots and luckily it does: fill_between. The only difficulty I encountered was that matplotlib is mainly intended for making scatter plots, i.e. a data series with meaningful x and y coordinates. A typical spreadsheet line plot however has text labels for all points on the x-axis (as shown in the example). The easiest solution I could come up with was to simply number the samples in the data series as 1, 2, 3 etc. and then change the ticks on the x-axis manually with the xticks function. If there are too many samples in the data series and the x-axis gets too full and labels start to overlap, most spreadsheet programs simply drop some labels. The code below does the same. The code seems a bit long but most lines are actually used for reading in the data from a csv file.

#!/usr/bin/env python
import csv

from pylab import *


violet=(90.0/255.0,36.0/255.0,90.0/255.0)
red=(1.0,0.0,0.0)
green=(0.0,1.0,0.0)


def plotFancy(fn, label,figNum=None,ymin=0.0,ymax=500.0,maxticks=20):
    """
    plots directly from a csv file (with a header row!!)
    file layout:
    column
    1: strip name
    2: Exp
    3: Mod
    4: min
    5: max
    
    use label to name y-axis
    if figNum is supplied the graph will be plotted in the figure with
    that number (and cleared first)
    ymin and ymax determine the scale on the y-axis (i.e. ylim(ymin,ymax))
    maxticks gives the maximum number of ticks (labels) allowed on the x axis
    """
    f=open(fn,'rb')
    reader=csv.reader(f,delimiter=';')
    xlabel_lst=[]
    y_min_lst=[]
    y_max_lst=[]
    y_mod_lst=[]
    y_exp_lst=[]
    line_cnt=0
    for line_lst in reader:
        line_cnt+=1
        if(line_cnt==1):
            continue
        xlabel_lst.append(line_lst[0])
        y_exp_lst.append(float(line_lst[1]))
        y_mod_lst.append(float(line_lst[2]))
        y_min_lst.append(float(line_lst[3]))
        y_max_lst.append(float(line_lst[4]))
    cnt_lst= [i for i in range(len(y_mod_lst))]
    f.close()
    if(figNum is None):
        figure()
    else:
        figure(figNum)
        clf()
    fill_between(cnt_lst,y_min_lst,y_max_lst,facecolor=green,alpha=1.0)
    plot(cnt_lst,y_mod_lst,'bo',color=violet,label="%s mod"%(label),ms=12)
    plot(cnt_lst,y_exp_lst,'mv',color=red,label="%s exp"%(label),ms=12)
    ylim(ymin,ymax)
    ylabel("%s"%(label))
    grid(b=True)
    legend(loc=9)
    if(len(xlabel_lst)>maxticks):
        #too many labels: keep maxticks evenly spread labels, drop the rest
        delta=len(xlabel_lst)/float(maxticks-1.0)
        tick_num_lst=[]
        tick_text_lst=[]
        index=0
        for i in range(maxticks-1):
            index=int(i*delta)
            tick_num_lst.append(index)
            tick_text_lst.append(xlabel_lst[index])
        tick_num_lst.append(len(xlabel_lst)-1)
        tick_text_lst.append(xlabel_lst[len(xlabel_lst)-1])
        xticks(tick_num_lst,tick_text_lst,rotation=90)
        
    else:
        xticks(arange(len(xlabel_lst)),xlabel_lst,rotation=90)
    xlim(0,len(xlabel_lst))
    #show the figure last, so the ticks and limits set above are included
    show()
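
The label-dropping step can also be looked at separately from the plotting. The function below is a sketch that mirrors the tick selection in plotFancy without needing matplotlib; the name thin_ticks is made up for this example and it is written in Python 3 syntax.

```python
def thin_ticks(labels, maxticks=20):
    """Return (positions, texts) with at most maxticks entries."""
    n = len(labels)
    if n <= maxticks:
        # Few enough labels: keep them all
        return list(range(n)), list(labels)
    # Spread maxticks-1 ticks evenly over the series ...
    delta = n / float(maxticks - 1)
    positions = [int(i * delta) for i in range(maxticks - 1)]
    # ... and always keep the last sample as the final tick
    positions.append(n - 1)
    return positions, [labels[p] for p in positions]


labels = ["s%d" % i for i in range(100)]
print(thin_ticks(labels, 5))
# prints ([0, 25, 50, 75, 99], ['s0', 's25', 's50', 's75', 's99'])
```

The two returned lists are what would be passed to xticks(positions, texts, rotation=90).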

Saturday, September 7, 2013

Reading/writing a python dictionary to file

To save time building a large dictionary every time I run my program I googled "saving a python dictionary to file". Of the suggested solutions I liked the option to write to a csv file best. However, the posted code did not work for me because the value in the dictionary was a very big nested list of lists and not a simple string. This was easy to fix by calling eval on the value obtained from the csv reader. Of course I was not the first one to realize this.

Below for completeness my code:

import csv

def saveDict(fn,dict_rap):
    f=open(fn, "wb")
    w = csv.writer(f)
    for key, val in dict_rap.items():
        w.writerow([key, val])
    f.close()
    
def readDict(fn):
    f=open(fn,'rb')
    dict_rap={}
    
    for key, val in csv.reader(f):
        dict_rap[key]=eval(val)
    f.close()
    return(dict_rap)
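
One caveat: eval executes whatever expression is stored in the file, so it should only be used on files you wrote yourself. For values that are plain Python literals (nested lists, numbers, strings, None), ast.literal_eval from the standard library is a safer drop-in. The sketch below shows this variant in Python 3 syntax; the function names save_dict and read_dict and the example data are invented for this illustration.

```python
import ast
import csv
import os
import tempfile


def save_dict(fn, d):
    # newline="" is what the csv module expects for files in Python 3
    with open(fn, "w", newline="") as f:
        w = csv.writer(f)
        for key, val in d.items():
            w.writerow([key, repr(val)])


def read_dict(fn):
    result = {}
    with open(fn, newline="") as f:
        for key, val in csv.reader(f):
            # literal_eval only accepts Python literals, not arbitrary code
            result[key] = ast.literal_eval(val)
    return result


data = {"a": [[1, 2], [3.5, "x"]], "b": [[], [None, True]]}
path = os.path.join(tempfile.mkdtemp(), "dict.csv")
save_dict(path, data)
restored = read_dict(path)
print(restored == data)  # prints True
```

As with the original code, the keys come back as strings, so this round trip is only exact for dictionaries with string keys.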

Monday, September 2, 2013

Creating a Cython extension type for use with multiProcessing for function fitting

If you have to fit a complex function to a very big data set, it would be nice to be able to use all the cores your CPU has. Because the data set is very big, it should be efficient to simply split the data set over a number of cores and calculate the total error sum (sum of squared errors) in parts, in parallel. This sounds simple, but it took me some effort to do this in python/Cython on both linux and windows. After googling for a while, I decided that the multiprocessing module should work best for my specific situation (my code contains a lot of python, which makes it difficult to release the GIL temporarily). On linux I had things running relatively quickly, but on windows I could not get it to work. The difference is that windows has no fork(). On linux, fork() gives the child process a copy of everything in the parent. On windows, multiprocessing starts a fresh python interpreter instead, so you have to take care that all data is passed explicitly to the child process, and everything you pass must be picklable (read "Explicitly pass resources to child processes" in the multiprocessing documentation).

To try things out I started with a simple example:

#!/usr/bin/env python
from multiprocessing import Process,Queue
import sys,numpy,pylab

class TFitFunc:
    def __init__(self,X0,x,y,pid=1):
        self.a=X0[0]
        self.b=X0[1]
        self.c=X0[2]
        self.x=x[:]
        self.y=y[:]
        self.pid=pid
        
    def __call__(self,X):
        self.a=X[0]
        self.b=X[1]
        self.c=X[2]
        errsum=0
        for i in range(100):
            ymod=self.a*self.x**2+self.b*self.x+self.c
            errsum+=numpy.sum((ymod[:]-self.y[:])**2)
        
        return(errsum)
        
        
class TFitFuncComplex:
    def __init__(self,X0,x,y,pid=1):
        self.a=X0[0]
        self.b=X0[1]
        self.c=X0[2]
        self.x=x[:]
        self.y=y[:]
        self.pid=pid
        
    def __call__(self,X):
        self.a=X[0]
        self.b=X[1]
        self.c=X[2]
        errsum=0
        for i in range(100):
            ymod=self.a*self.x**2+self.b*self.x+self.c+\
                numpy.sin(numpy.sqrt(self.x))*numpy.cos(self.x+0.5)\
                /self.a*numpy.sqrt(self.b)
            errsum+=numpy.sum((ymod[:]-self.y[:])**2)
        
        return(errsum)        
        
def f(fitfunc,X,Q=None):
    errsum=fitfunc(X)
    print "%d: errsum:%e"%(fitfunc.pid,errsum)
    if(Q!=None):
        Q.put(errsum)
    return(errsum)
    
def main():
    a=1.0
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-10000000j]
    x2=numpy.r_[0:10:-10000000j]
    
    y1=a*x1**2+b*x1+c
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5)
    y2=a*x2**2+b*x2+c
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5)
    X0=[0.5,2.5,1.5]
    f1=TFitFunc(X0,x1,y1,1)
    f2=TFitFunc(X0,x2,y2,2)
    
    ps=[]
    for i in range(2):
        if(i==0):
            p=Process(target=f,args=(f1,X0))
        else:
            p=Process(target=f,args=(f2,X0))
        p.start()
        ps.append(p)
    for p in ps:
        p.join()
    
    
def main_complex():
    a=1.0
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-1000000j]
    x2=numpy.r_[0:10:-1000000j]
    
    y1=a*x1**2+b*x1+c
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5)
    y2=a*x2**2+b*x2+c
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5)
    X0=[0.5,2.5,1.5]
    f1=TFitFuncComplex(X0,x1,y1,1)
    f2=TFitFuncComplex(X0,x2,y2,2)
    
    ps=[]
    Qs=[]
    errSum=0.0
    for i in range(2):
        Qs.append(Queue())
        if(i==0):
            p=Process(target=f,args=(f1,X0,Qs[i]))
        else:
            p=Process(target=f,args=(f2,X0,Qs[i]))
        p.start()
        ps.append(p)
    for i in range(2):
        errSum+=Qs[i].get()
        ps[i].join()
    print "Total errsum: %e"%(errSum)
    
    
def main_single_complex():
    a=1.0
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-1000000j]
    x2=numpy.r_[0:10:-1000000j]
    
    y1=a*x1**2+b*x1+c
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5)
    y2=a*x2**2+b*x2+c
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5)
    X0=[0.5,2.5,1.5]
    f1=TFitFuncComplex(X0,x1,y1,1)
    f2=TFitFuncComplex(X0,x2,y2,2)
    errSum=f(f1,X0)
    errSum+=f(f2,X0)
    print "Total errsum: %e"%(errSum)

def main_single():
    a=1.0
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-10000000j]
    x2=numpy.r_[0:10:-10000000j]
    
    y1=a*x1**2+b*x1+c
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5)
    y2=a*x2**2+b*x2+c
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5)
    X0=[0.5,2.5,1.5]
    f1=TFitFunc(X0,x1,y1,1)
    f2=TFitFunc(X0,x2,y2,2)
    f(f1,X0)
    f(f2,X0)
    
if __name__=='__main__':
    main_single()
    #~ main()
    #~ main_complex()
    #~ main_single_complex()

This works fine on linux and windows. However, this is pure python. Normally, I use a lot of Cython code in extension types (aka cython classes). And this I could not get to work without some more research. Now first my solution. The first part shows the .pyx file with two classes, one normal python class with some cython code in it and one real extension type. The second part shows the script using these classes.

TFitFunctions.pyx
#!/usr/bin/env python
import numpy as np
cimport numpy as np

class TFitFunc:
    """
    Simple demonstration class to be used as fit function.
    By definition of the __call__ member function an object of this
    class is callable (functor). All additional data required 
    to calculate the error sum should be passed to the constructor.
    The function here is simply
    y=a*x**2+b*x+c
    """
    def __init__(self,X0,x,y,pid=1):
        """
        Constructor. 
        Arguments:
        X0: list of initial values for the three model parameters [a, b, c]
        x: array of x values
        y: array of y values (typically experimentally determined data points)
        pid: optional "process id"
        """
        self.a=X0[0]
        self.b=X0[1]
        self.c=X0[2]
        self.x=x[:]
        self.y=y[:]
        self._pid=pid
    
    def pid(self):
        return(self._pid)
        
    def __call__(self,X):
        """
        Make objects of this class callable. The argument is a list/array
        of model parameter values [a,b,c]
        The function returns the sum squared error
        """
        cdef double errsum
        cdef int i
        self.a=X[0] #could also have used X directly in the calculation below
        self.b=X[1]
        self.c=X[2]
        errsum=0
        for i in range(100): #do this a hundred times to waste some CPU time
            ymod=self.a*self.x**2+self.b*self.x+self.c #calculate model values
            errsum+=np.sum((ymod[:]-self.y[:])**2) # calculate summed square error
        errsum/=100.0 
        return(errsum)
        
        
cdef class TFitFuncComplex:
    """
    Simple demonstration class to be used as fit function.
    Very similar to TFitFunc but with a more complex (and time consuming)
    function. Another big difference is that now the class is defined
    as an extension type. 
    
    By definition of the __call__ member function an object of this
    class is callable (functor). All additional data required 
    to calculate the error sum should be passed to the constructor.
    The function here is simply
    y=a*x**2+b*x+c+sin(sqrt(x))*cos(x+0.5)/a*sqrt(b)
    """
    cdef double a,b,c   #in an extension type class member variables must be defined here
    cdef int _pid
    cdef np.ndarray x,y
    
    def __init__(self,X0,np.ndarray[double, ndim=1]x,np.ndarray[double, ndim=1]y,int pid=1):
        """
        Constructor. 
        Arguments:
        X0: list/array of initial values for the three model parameters [a, b, c]
        x: array of x values
        y: array of y values (typically experimentally determined data points)
        pid: optional "process id"
        """
        self.a=X0[0]
        self.b=X0[1]
        self.c=X0[2]
        self.x=x[:]
        self.y=y[:]
        self._pid=pid
        
    def pid(self):
        return(self._pid)
        
    def __call__(self,X):
        """
        Make objects of this class callable. The argument is a list/array
        of model parameter values [a,b,c]
        The function returns the sum squared error
        """
        cdef double errsum
        cdef int i
        self.a=X[0] #could also have used X directly in the calculation below
        self.b=X[1]
        self.c=X[2]
        errsum=0
        for i in xrange(100):#do this a hundred times to waste some CPU time
            ymod=self.a*self.x**2+self.b*self.x+self.c+\
                np.sin(np.sqrt(self.x))*np.cos(self.x+0.5)\
                /self.a*np.sqrt(self.b) #calculate model values
            errsum+=np.sum((ymod[:]-self.y[:])**2) # calculate summed square error
        errsum/=100.0 
        return(errsum)
        
   
    def __reduce__(self):
        """
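        Without this function the code will not run with multiprocessing
        on Windows.
        There the arguments of a Process are pickled to send them to the
        child process, and this extension type could not be pickled by
        default. __reduce__ tells pickle how to rebuild the object:
        call TFitFuncComplex with the arguments in the returned tuple.
        For a normal python class this is not required (see TFitFunc)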
        """
        return TFitFuncComplex, ([self.a,self.b,self.c],self.x,self.y,self._pid)
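
What __reduce__ does can be tried out without Cython or multiprocessing. The example below is a sketch in Python 3 syntax with a plain python class, FitState, which is a made-up stand-in for the extension type (a plain class would pickle fine without __reduce__; it is defined here only to show the mechanism).

```python
import pickle


class FitState(object):
    """Hypothetical stand-in for the extension type TFitFuncComplex."""

    def __init__(self, params, x, y, pid=1):
        self.a, self.b, self.c = params
        self.x = list(x)
        self.y = list(y)
        self._pid = pid

    def __reduce__(self):
        # pickle stores the callable and the argument tuple; on loading
        # it calls FitState(*args) to rebuild the object.
        return FitState, ([self.a, self.b, self.c], self.x, self.y, self._pid)


original = FitState([0.5, 2.5, 1.5], [0.0, 1.0], [3.0, 6.0], pid=2)
# This round trip is roughly what happens to Process arguments on Windows
clone = pickle.loads(pickle.dumps(original))
print(clone.a, clone.x, clone._pid)  # prints 0.5 [0.0, 1.0] 2
```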

calling script:

#!/usr/bin/env python
from multiprocessing import Process,Queue
import sys,numpy
from TFitFunctions import TFitFunc,TFitFuncComplex

"""
Demonstration of the use of Process with callable Python/Cython classes
Can be easily extended into a real multiProcessing fitting
setup for use with (e.g.) fmin

The basic idea is that you have a huge amount of data points
that must be evaluated for the calculation of the summed squared error.
These data points are independent by nature and thus ideal for
parallel processing.
"""
      
        
def f(fitfunc,X,Q=None):
    """
    Function to be passed to Process as target. The function
    will call fitfunc with X as argument and put the result in the
    Queue Q if one is passed as argument.
    """
    errsum=fitfunc(X)
    print "%d: errsum:%e"%(fitfunc.pid(),errsum)
    if Q is not None:
        Q.put(errsum)
    return(errsum)
    
def main():
    """
    Demonstration of use of TFitFunc
    """
    a=1.0 #model parameters
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-10000000j] #data points for first process (x) 
    x2=numpy.r_[0:10:-10000000j] #measurement points for second process (x)
    
    y1=a*x1**2+b*x1+c  #"measured" data for process 1
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5) #add some noise
    y2=a*x2**2+b*x2+c #"measured" data for process 2
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5) #add some noise
    X0=[0.5,2.5,1.5] #initial guess for model parameters
    f1=TFitFunc(X0,x1,y1,1) #Create fit function for process 1
    f2=TFitFunc(X0,x2,y2,2) #Create fit function for process 2
    
    ps=[] #to contain the process handles
    Qs=[]
    errSum=0.0
    for i in range(2):
        Qs.append(Queue())
        if(i==0):
            p=Process(target=f,args=(f1,X0,Qs[i])) #create process 1
        else:
            p=Process(target=f,args=(f2,X0,Qs[i])) #create process 2
        p.start() #start process
        ps.append(p) #add process "handle"
    
    for i in range(2):
        errSum+=Qs[i].get() #collect error sums from processes
        ps[i].join() #wait for process to finish
    print "Total errsum: %e"%(errSum)
    

    
def main_complex():
    """
    Same as main but now with TFitFuncComplex
    """
    a=1.0
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-1000000j]
    x2=numpy.r_[0:10:-1000000j]
    
    y1=a*x1**2+b*x1+c
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5)
    y2=a*x2**2+b*x2+c
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5)
    X0=[0.5,2.5,1.5]
    f1=TFitFuncComplex(X0,x1,y1,1)
    f2=TFitFuncComplex(X0,x2,y2,2)
    
    
    ps=[]
    Qs=[]
    errSum=0.0
    for i in range(2):
        Qs.append(Queue())
        if(i==0):
            p=Process(target=f,args=(f1,X0,Qs[i]))
        else:
            p=Process(target=f,args=(f2,X0,Qs[i]))
        p.start()
        ps.append(p)
    for i in range(2):
        errSum+=Qs[i].get()
        ps[i].join()
    print "Total errsum: %e"%(errSum)
    

    
def calcErrorSum(X,fitfuncs):
    """
    Calculate error sum for given model parameters (X)
    by using the functions in fitfuncs in (parallel) processes
    """
    ps=[]
    Qs=[]
    errSum=0.0
    for i in range(len(fitfuncs)):
        Qs.append(Queue())
        p=Process(target=f,args=(fitfuncs[i],X,Qs[i]))
        p.start()
        ps.append(p)
    for i in range(len(fitfuncs)): #collect from every process, not just the first two
        errSum+=Qs[i].get()
        ps[i].join()
    print "Total errsum: %e"%(errSum)
    return(errSum)
    
def main_complex2():
    """
    Same as main_complex but now with the data creation part separated
    from the function evaluation part. Please note that
    the function calcErrorSum can be used as a (multiProcessing)
    argument to (e.g.) fmin
    """
    a=1.0
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-1000000j]
    x2=numpy.r_[0:10:-1000000j]
    
    y1=a*x1**2+b*x1+c
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5)
    y2=a*x2**2+b*x2+c
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5)
    X0=[0.5,2.5,1.5]
    f1=TFitFuncComplex(X0,x1,y1,1)
    f2=TFitFuncComplex(X0,x2,y2,2)
    fitfuncs=[f1,f2]
    
    calcErrorSum(X0,fitfuncs)
    
        
    
def main_single_complex():
    """
    Same as main_complex but now serial evaluation (for time comparison purposes)
    """
    a=1.0
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-1000000j]
    x2=numpy.r_[0:10:-1000000j]
    
    y1=a*x1**2+b*x1+c
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5)
    y2=a*x2**2+b*x2+c
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5)
    X0=[0.5,2.5,1.5]
    f1=TFitFuncComplex(X0,x1,y1,1)
    f2=TFitFuncComplex(X0,x2,y2,2)
    errSum=f(f1,X0)
    errSum+=f(f2,X0)
    print "Total errsum: %e"%(errSum)

def main_single():
    """
    Same as main but now serial evaluation (for time comparison purposes)
    """
    a=1.0
    b=2.0
    c=3.0
    x1=numpy.r_[0:10:-10000000j]
    x2=numpy.r_[0:10:-10000000j]
    
    y1=a*x1**2+b*x1+c
    y1+=y1*0.35*(numpy.random.random(len(x1))-0.5)
    y2=a*x2**2+b*x2+c
    y2+=y2*0.35*(numpy.random.random(len(x2))-0.5)
    X0=[0.5,2.5,1.5]
    f1=TFitFunc(X0,x1,y1,1)
    f2=TFitFunc(X0,x2,y2,2)
    f(f1,X0)
    f(f2,X0)
    
if __name__=='__main__':
    #~ main_single()
    #~ main()
    #~ main_complex2()
    main_complex()
    #~ main_single_complex()

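The code in TFitFunctions.pyx shows that the trick for an extension type is to add the __reduce__ method to make it picklable. This is only needed on Windows, where multiprocessing has to pickle the arguments of Process in order to pass them to the newly started interpreter.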
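As a rough illustration of what __reduce__ does, here is a minimal sketch in plain Python. FitFuncSketch is a hypothetical stand-in for the extension type (a normal Python class would not actually need the method), and rebuild_from_reduce and pickle_roundtrip are helper names invented for this example. __reduce__ returns a callable plus the arguments for it, and unpickling simply calls that callable with those arguments.

```python
import pickle


class FitFuncSketch(object):
    """Hypothetical stand-in for the TFitFuncComplex extension type."""

    def __init__(self, X0, x, y, pid):
        self.a, self.b, self.c = X0
        self.x = list(x)
        self.y = list(y)
        self._pid = pid

    def pid(self):
        return self._pid

    def __reduce__(self):
        # Tell pickle how to rebuild the object: (callable, argument tuple).
        return FitFuncSketch, ([self.a, self.b, self.c], self.x, self.y, self._pid)


def rebuild_from_reduce(obj):
    """Do by hand what unpickling does with the result of __reduce__."""
    factory, args = obj.__reduce__()
    return factory(*args)


def pickle_roundtrip(obj):
    """The same thing through pickle itself, as multiprocessing would do it."""
    return pickle.loads(pickle.dumps(obj))


original = FitFuncSketch([0.5, 2.5, 1.5], [0.0, 1.0], [3.0, 6.0], 1)
clone = rebuild_from_reduce(original)
```

The clone is a new object carrying the same parameters, data and process id as the original, which is what the child process needs in order to evaluate the fit function.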

Sunday, May 26, 2013

cython wraparound problems

For performance reasons I was including the line

#cython: wraparound=False

in most of my Cython files. After an update to a newer version of Cython (0.19) my Cython scripts started to crash. It took me some time to find out that, combined with the wraparound=False line, you cannot use [-1] as the index for the last element anymore (the Cython documentation mentions this in its description of the wraparound directive).
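One way around this is to compute the last index explicitly instead of relying on negative indexing. The sketch below is plain Python (last_element is a hypothetical helper name), but the same non-negative index expression should also be safe in a Cython file compiled with wraparound=False.

```python
def last_element(values):
    """Return the last element using an explicit, non-negative index."""
    n = len(values)
    if n == 0:
        raise IndexError("empty sequence has no last element")
    # n - 1 is never negative here, so no wraparound handling is needed.
    return values[n - 1]


data = [1.0, 2.5, 4.0]
last = last_element(data)
```

For a typed NumPy buffer in Cython the equivalent would be along the lines of a[a.shape[0]-1] rather than a[-1].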