limscoder: python

Transactions

Database transactions are a convenient way to maintain consistent state during data processing functions. If an error occurs during processing, just roll back the transaction to avoid incomplete or incorrect data being stored.

Problem

I've worked on many problems where data processing involves retrieving a source file, performing some type of processing, and then writing to a destination file. These functions are tricky, because if a problem arises during the processing, you're left with an inconsistent, partially processed batch of files. This problem is especially pronounced if you're storing file metadata in a database. If you perform a rollback of your database transaction when an error occurs, then you've lost any updated metadata about the files that were processed correctly.

Solution

In an attempt to remedy this problem I've developed a somewhat naive implementation of a file transaction class that can be used to maintain consistent state during processing functions that involve many files. The transaction object keeps track of all files that have been created and all files that should be deleted. All files marked for deletion are deleted when a commit occurs. All files marked as created are removed when a rollback occurs. If a file needs to be moved, it is instead copied, the source file is marked for deletion, and the destination file is marked as being created.

Implementation


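import glob
import os
import shutil

# NOTE: 'exceptions' is not imported in this listing. It is assumed to be
# a project-local module that defines TransactionError (an Exception subclass).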

class Transaction(object):
    """
    Manages transactions for file storage.

    Assumes each file is only being operated on by one person at a time.

    If multiple users try to operate on the same file, then the last
    to access gets an exception.
    """

    lock_postfix = 't_lock'

    def __init__(self):
        self._level = 0

    def _get_lock_path(self, path):
        """Return lock file path."""

        if path.endswith('/'):
            end = len(path) - 1
            path = path[:end]

        return path + '.%s' % self.lock_postfix

    def _set_files(self):
        """Resets file lists."""

        self._files_added = set()
        self._files_removed = set()
        self._dirs_added = set()
        self._dirs_removed = set()
        self._locked_files = set()

        # Unlike the other types,
        # move operations
        # must be ordered!!
        self._files_moved = []

    def _check_level(self):
        """Raises exception if level is not 1 or above."""

        if self._level < 1:
            raise exceptions.TransactionError('Transaction not active.')

    def _rm(self, file_paths, dir_paths):
        """Remove all files."""

        for dir_path in dir_paths:
            if os.path.exists(dir_path):
                shutil.rmtree(dir_path)

        for file_path in file_paths:
            if os.path.exists(file_path):
                os.unlink(file_path)

    def _rev_moves(self):
        """Reverse moved files."""

        for move in reversed(self._files_moved):
            shutil.move(move[1], move[0])

    def _acquire_lock(self, path):
        """Attempt to lock a file."""

        # Make sure transaction is started
        self._check_level()

        if path not in self._locked_files:
            # Create lock file on file system
            lock_path = self._get_lock_path(path)
            if os.path.exists(lock_path):
                # Multi-user access is not allowed!
                raise exceptions.TransactionError('File is locked.')
            out_file = open(lock_path, 'w')
            out_file.write('\n')
            out_file.close()
            self._locked_files.add(path)

    def _release_lock(self, path):
        """Release a lock file."""

        lock_path = self._get_lock_path(path)
        if os.path.exists(lock_path):
            os.unlink(lock_path)
        self._locked_files.discard(path)

    def _release_locks(self):
        """Release all locks."""

        locked_paths = self._locked_files.copy()
        for path in locked_paths:
            self._release_lock(path)

    def copy_file(self, src_path, dest_path, remove_existing=False, directory=False):
        """Copy a file. Set remove_existing to True to move file."""

        if directory is True:
            shutil.copytree(src_path, dest_path, symlinks=True)
        else:
            shutil.copyfile(src_path, dest_path)
        self.add_file(dest_path, directory=directory)

        if remove_existing is True:
            self.remove_file(src_path, directory=directory)

    def add_file(self, path, directory=None):
        """Add a file to the transaction."""

        self._check_level()

        self._acquire_lock(path)

        if directory is None:
            directory = os.path.isdir(path)

        if directory is True:
            self._dirs_added.add(path)
        else:
            self._files_added.add(path)

    def remove_file(self, path, directory=None):
        """Remove a file from the transaction."""

        self._check_level()

        self._acquire_lock(path)

        if directory is None:
            directory = os.path.isdir(path)

        if directory is True:
            self._dirs_removed.add(path)
        else:
            self._files_removed.add(path)

    def move_file(self, src_path, dest_path):
        """Move a file from one location to another."""

        self._check_level()

        self._acquire_lock(src_path)
        self._acquire_lock(dest_path)

        shutil.move(src_path, dest_path)
        self._files_moved.append((src_path, dest_path))

    def begin(self):
        """Begin transaction."""

        if self._level == 0:
            self._set_files()

        self._level += 1

    def commit(self):
        """Removes all 'removed' files and dirs."""

        self._check_level()

        self._level -= 1
        if self._level == 0:
            self._rm(self._files_removed, self._dirs_removed)
            self._release_locks()

    def rollback(self):
        """Removes all 'added' files and dirs."""

        self._check_level()

        self._level -= 1
        if self._level == 0:
            self._rm(self._files_added, self._dirs_added)
            self._rev_moves()
            self._release_locks()

Example


def process():
    transaction = Transaction()
    transaction.begin()
    try:
        # Mark a file as created
        transaction.add_file(new_file)

        # Mark a file as deleted
        transaction.remove_file(delete_file)

        # Copy a file
        transaction.copy_file(src_file, dest_file)

        # Move a file
        transaction.move_file(mov_src_file, mov_dest_file)
        transaction.commit()
    except:
        transaction.rollback()
        raise
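
The begin/commit/rollback pattern above can also be wrapped in a context manager so that callers cannot forget the rollback. This is a minimal sketch, not part of the original class: `file_transaction` is a hypothetical helper, and it assumes only that the object passed in has `begin`, `commit`, and `rollback` methods like the `Transaction` class above. `RecordingTransaction` is a stand-in that exists only so the sketch runs by itself.

```python
import contextlib


@contextlib.contextmanager
def file_transaction(transaction):
    """Commit on success, roll back and re-raise on any exception.

    Hypothetical helper: works with any object that provides
    begin(), commit() and rollback().
    """
    transaction.begin()
    try:
        yield transaction
    except BaseException:
        transaction.rollback()
        raise
    else:
        transaction.commit()


class RecordingTransaction(object):
    """Stand-in for Transaction that only records which methods ran."""

    def __init__(self):
        self.calls = []

    def begin(self):
        self.calls.append('begin')

    def commit(self):
        self.calls.append('commit')

    def rollback(self):
        self.calls.append('rollback')


# Successful block: begin, then commit.
ok = RecordingTransaction()
with file_transaction(ok):
    pass

# Failing block: begin, then rollback, and the error still propagates.
failed = RecordingTransaction()
try:
    with file_transaction(failed):
        raise ValueError('processing failed')
except ValueError:
    pass
```

With the real class, the body of the `with` block would hold the `add_file`, `remove_file`, `copy_file`, and `move_file` calls from the example above.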

Limitations

The class only works for single-user environments. A lock file is created for every file added to a transaction. If a different transaction tries to acquire a lock for a file that is already locked, an exception is raised. Negotiating multi-user access would be quite tricky, especially in the case of deleted files, where the file no longer exists after the lock is released.
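
One gap in the locking above is that `_acquire_lock` checks `os.path.exists` and then opens the lock file as two separate steps, so two processes could both pass the check. A hedged sketch of one alternative is to let the operating system do the check and the creation in a single call, using `os.open` with the `O_CREAT | O_EXCL` flags. `acquire_lock_atomically` and `LockError` are hypothetical names, not part of the class above.

```python
import errno
import os
import tempfile


class LockError(Exception):
    """Hypothetical stand-in for exceptions.TransactionError."""


def acquire_lock_atomically(lock_path):
    """Create lock_path, failing if it already exists.

    O_CREAT | O_EXCL makes the existence check and the creation
    a single operation, so only one caller can succeed.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            raise LockError('File is locked.')
        raise
    os.close(fd)


# Demonstration in a scratch directory.
scratch_dir = tempfile.mkdtemp()
lock_path = os.path.join(scratch_dir, 'data.csv.t_lock')

acquire_lock_atomically(lock_path)       # first caller gets the lock
try:
    acquire_lock_atomically(lock_path)   # second caller is refused
    second_attempt_refused = False
except LockError:
    second_attempt_refused = True

os.unlink(lock_path)                     # releasing is just deleting the file
os.rmdir(scratch_dir)
```

This only narrows the race when acquiring a lock. It does not address the harder problem of negotiating access between several users.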

Thursday, April 22, 2010

Django + Dojo

When I work on HTML projects, I usually use the Dojo Toolkit for my JavaScript needs. Lately I've been spending some time playing around with the Python web framework Django. I did some internet searching and found Dojango, a project that integrates Django with Dojo. Dojango has features for automatically turning Django form fields into Dijits (Dojo UI widgets), but unfortunately Dojango uses Dojo's custom HTML attributes with Dojo's parseOnLoad option. I prefer to create Dijits programmatically so that my markup stays clean. I decided to develop a Django app to meet my needs.

Code is available from SVN. Instructions are below.

Instructions:

Settings

# Setup app in settings.py

# Required attributes:

# The URL to get dojo.js from
DOJO_URL = MEDIA_URL + 'js/dojo'

# Optional attributes:

# Set to True to add Dojo setup to template
DOJO_ENABLED = True

# Set Dojo theme
DOJO_THEME = 'tundra' 

# Set the value of djconfig
DOJO_DJCONFIG = {'isDebug': False, 'parseOnLoad': False, 
                 'modulePaths': {'app': MEDIA_URL + 'js/app'}}

# More on this later
DOJO_FORM_FUNCTION = None

# Attach middleware
MIDDLEWARE_CLASSES = (
    'dojo.middleware.DojoMiddleware',
    ...
)

The dojo object

The middleware attaches a Dojo object to each request. You can access the object from your views, and use it to set Dojo parameters.

# The dojo object has several useful attributes and methods.

# The path to dojo.js (from settings.DOJO_URL), read-only
request.dojo.src

# Theme
request.dojo.theme = 'soria'

# DjConfig
request.dojo.dj_config['isDebug'] = True

# Convenience method to set module paths in dj_config
request.dojo.set_module_path('custom', 'url_to_custom_module')

# Add stylesheets
request.dojo.append_stylesheet('url_to_custom_stylesheet')

# Require modules
request.dojo.append_module('module.to.require')

# Set function to addOnLoad
request.dojo.append_aol('function() {do_something();}')

# Set function to addOnLoad before other already set
request.dojo.prepend_aol('function() {do_something();}')

Forms

Django forms can easily be 'dijitized'.

from django import forms

from dojo import models as dojo

class Register(forms.Form):
    username = forms.RegexField(
        label='Choose a username (letters and numbers only)',
        min_length=2,
        max_length=16,
        regex=r'^[\w]{2,16}$',
        error_messages={
            'invalid': 'Username must be 16 characters or shorter, and can only'
                       ' contain letters, numbers and underscores.'
        }
    )

    # Use the dojo_field function to attach dijit
    # parameters to a Django form field.
    dojo.dojo_field(username, 'dijit.form.ValidationTextBox')



   """
   dojo_field arguments

   required
   ==========
    * field - Django field to attach dijit parameters to.
    * dojo_type - str, the qualified name of the dijit class to use

   keyword
   =========
    * attr_map - dict, Used to map Django field parameters to Dijit parameters.
                 Overrides the default values in dojo.models.default_attr_map.
                 The dict elements should be structured as follows:

                 Key == Django attribute name
                 Value == tuple with elements:
                     [0] == Dijit attribute name to map to
                     [1] == None, or callable to convert Django value to Dijit value

                 EXAMPLE:
                 {
                     'max_length': ('maxLength', None),
                     'regex': ('regExp', lambda a: '%s' % a.pattern)
                 }
    * dojo_attrs - dict, Attributes will be applied directly to dijit.
                   Key == dojo attribute name
                   Value == dojo attribute value
   """
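
To make the `attr_map` structure described above concrete, here is a rough sketch of how such a map could be applied. `apply_attr_map` is a hypothetical function written for illustration. It is not the app's actual implementation, and it takes a plain dict of Django field attributes instead of a real field object.

```python
import re


def apply_attr_map(django_attrs, attr_map):
    """Translate Django field attributes into Dijit attributes.

    attr_map maps a Django attribute name to a tuple of
    (Dijit attribute name, converter callable or None).
    """
    dijit_attrs = {}
    for django_name, (dijit_name, convert) in attr_map.items():
        if django_name not in django_attrs:
            continue
        value = django_attrs[django_name]
        if convert is not None:
            value = convert(value)
        dijit_attrs[dijit_name] = value
    return dijit_attrs


# Same shape as the EXAMPLE in the docstring above.
attr_map = {
    'max_length': ('maxLength', None),
    'regex': ('regExp', lambda a: '%s' % a.pattern),
}

# Illustrative values, mirroring the 'username' field.
django_attrs = {
    'max_length': 16,
    'regex': re.compile(r'^[\w]{2,16}$'),
    'label': 'Choose a username',  # not in attr_map, so it is skipped
}

dijit_attrs = apply_attr_map(django_attrs, attr_map)
```

The converter slot is what lets a compiled Python regex be handed to the Dijit as a plain pattern string.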

After creating a form, it must be instrumented to create the Javascript required to create the dijits.



def my_view(request):

    my_form = Register()

    # This call generates all the necessary Javascript code
    request.dojo.dojo_form(my_form)

    # By default, the function code generated
    # is a string to be added in-line.
    #
    # If you prefer to call a pre-defined JS function,
    # just set the request.dojo.form_function attribute.
    #
    # The value of the attribute should be a tuple where:
    # [0] == qualified Dojo module name where function exists
    # [1] == function name
    #
    # request.dojo.form_function can also be set automatically
    # by setting DOJO_FORM_FUNCTION in settings.py

Template

Include the following tags within the 'head' tag of your HTML template:


{% load dojo %}
{% dojo request.dojo %}

The Dojo app also includes a script for creating a Dojo build:


python manage.py dojo_build

Tuesday, November 24, 2009

Profiling and Optimizing Python Code

All programmers have heard the advice: "Don't prematurely optimize code." What exactly does that mean? It means that you shouldn't guess about what is causing your code to run slowly. Chances are you'll guess wrong. Instead of guessing, use profiling and benchmarking tools to quickly and accurately identify the performance bottlenecks in your scripts.

Every now and then I run into a piece of Python code that just doesn't run as fast as I would like. I use 3 different Python tools to find and fix Python performance problems.

cProfile:

cProfile is a module that is included in the Python Standard Library. It logs function calls and execution times. There is also a pure-python version named 'profile'.
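
For a quick look without any visualization tool, the standard library's `pstats` module can print a report straight from a `cProfile.Profile` object. The sketch below is only an illustration; `build_row` and `build_rows` are a made-up workload, not code from this post.

```python
import cProfile
import pstats


def build_row(val, cols):
    """Join cols copies of val with commas."""
    return ','.join([val] * cols)


def build_rows(val='foo', cols=40, rows=1000):
    """Made-up workload: build rows lines of comma separated values."""
    return '\n'.join([build_row(val, cols) for _ in range(rows)])


profiler = cProfile.Profile()
profiler.enable()
text = build_rows()
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats(5)
```

The printed table lists call counts, total time, and cumulative time per function, which is often enough to spot an obvious hot spot before reaching for a graphical viewer.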

KCachegrind:

KCachegrind was written to visualize the output generated by Callgrind (a C profiler), but the function call logs from cProfile can be converted to the KCachegrind format with this script. KCachegrind uses the KDE framework. On Linux boxes, KCachegrind is usually bundled in a package with other development tools written for KDE. The package is named 'kdesdk' or something similar.

timeit:

timeit is a module included in the Python Standard Library that is used for measuring the execution time of arbitrary pieces of code.

Here is the code that needs to be improved:


class LameStringBuilder(object):
    def __init__(self, cols=40, rows=10000):
        self.cols = cols
        self.rows = rows

    def build(self, val):
        built = ''
        for i in range(self.rows):
            built += self.build_row(val) + "\n"
        return built

    def build_row(self, val):
        built = ''
        for i in range(self.cols - 1):
            built += val + ','
        built += val
        return built

Here is the code to execute a profile test and format the results into something that KCachegrind can understand:


import cProfile

# Use this module to convert profile data
# into the KCacheGrind format.
#
# The module is available here:
# http://www.gnome.org/~johan/lsprofcalltree.py
import lsprofcalltree

import lame_string_builder

def test():
    """Code to profile."""
    l = lame_string_builder.LameStringBuilder(40, 10000)
    l.build('foo')

if __name__ == "__main__":
    # Profile function
    p = cProfile.Profile()
    p.run('test()')

    # Get profile log in KCacheGrind format
    # and dump to file.
    k = lsprofcalltree.KCacheGrind(p)
    out = open('lame_profile.kgrind', 'w')
    k.output(out)
    out.close()

After running the test, launch KCachegrind and open the profile file to view the results.

The 'Flat Profile' and 'Callee Map' are the 2 most useful displays for finding the bottlenecks in your code.

The 'Flat Profile' describes each function called in the test with the following columns:

Incl. - The total percentage of time spent within a function.

Self - The percentage of time spent within a function NOT including inner function calls.

Called - The total number of times the function was called during the test.

Function - The name of the function being described.

The 'Callee Map' represents the different functions called during the test. Each function is drawn as a rectangle. Inner function calls are drawn on top of their parent function's rectangle. The area used to draw each function is proportional to the execution time for each function call relative to the total execution time of the parent (also printed as a percentage in the graph).

In the example above it is easy to pick out where the bottlenecks are by looking at the 'Callee Map'. Most of the test time was spent within the 'LameStringBuilder.build_row' method, so it is the main bottleneck. Other significant sources of time include calls to 'LameStringBuilder.build' and the built-in function 'range'.

After identifying the bottlenecks, I created a better version of the 'LameStringBuilder' class:


import lame_string_builder

class LessLameStringBuilder(lame_string_builder.LameStringBuilder):
    def build(self, val):
        return "\n".join([self.build_row(val) for i in xrange(self.rows)])

    def build_row(self, val):
        return ",".join([val for i in xrange(self.cols)])

After making changes to the code, Python's built-in 'timeit' module can be used to simply and accurately measure the differences in execution speed between the old and the new versions:


import timeit

import lame_string_builder
import less_lame_string_builder

def test_lame(val, cols, rows):
    l = lame_string_builder.LameStringBuilder(cols, rows)
    l.build(val)

def test_less_lame(val, cols, rows):
    l = less_lame_string_builder.LessLameStringBuilder(cols, rows)
    l.build(val)

def test_generic(repeat, name, args):
    print "%s (x%i):" % (name, repeat)
    t = timeit.Timer("%s(%s)" % (name, args), "from __main__ import %s" % name)
    print t.timeit(repeat)

def test_all(repeat, val, cols, rows):
    args = "'%s', %i, %i" % (val, cols, rows)

    for name in ("test_lame", "test_less_lame"):
        test_generic(repeat, name, args)

if __name__ == "__main__":
    test_all(100, 'foo', 40, 10000)

The new code is significantly faster!

Monday, November 2, 2009

Role based security with Python

Most moderately complex enterprise applications require some sort of role-based security and access control. For example, an employee should be able to fill out her time card, but a manager should also be able to approve the time card. This post outlines how to create an 'access control list' (acl) for Python.

The acl is an object that can be queried to determine if a particular role has permission to access a resource. Each permission is also associated with a 'privilege'. For example, a manager may have both the 'read' and 'write' privileges for the weekly schedule, but an employee only has the 'read' privilege.


import acl

# Each user name should be associated with a role.
role = get_role(username)

# Query the acl to see if a role has access to a resource.
# params: role name, resource, privilege
# Use a name other than 'acl' so the module is not shadowed.
access_list = acl.Acl()
if not access_list.check_access(role, 'time_card', 'write'):
    # User doesn't have access to this resource!!
    pass

The Acl class supports role inheritance. If the 'manager' role inherits the 'employee' role, then managers will have all of the same permissions as employees.


# Build acl list.
# The instance gets its own name so that the 'acl' module
# stays accessible for acl.Role and acl.Resource below.
access_list = acl.Acl()

employee = acl.Role('employee')
resource = acl.Resource('time_card')
resource.set_privilege('write')
employee.set_resource(resource)
access_list.set_role(employee)

manager = acl.Role('manager')
manager.set_parent(employee)
resource = acl.Resource('time_card')
resource.set_privilege('approve')
manager.set_resource(resource)
access_list.set_role(manager)

if access_list.check_access('manager', 'time_card', 'write'):
    # YES!
    pass

if access_list.check_access('manager', 'time_card', 'approve'):
    # YES!
    pass

if access_list.check_access('employee', 'time_card', 'write'):
    # YES!
    pass

if access_list.check_access('employee', 'time_card', 'approve'):
    # NO!
    pass

I'm normally not a very big fan of XML, but in this particular case it is a good format to use if you prefer to store your acl in a configuration file. The Acl object's 'build_acl' method populates the object from an XML file with the following format:


<?xml version="1.0"?>
<!--
- This file configures the roles (user groups)
- and permissions for accessing the system.
-->
<config>
    <!--
    - Setup roles here.
    - Use the 'inheritFrom' tag to
    - inherit permissions from another role.
    -->
    <roleSet>   
        <role>
            <name>customer</name>
        </role>
        <role>
            <name>employee</name>
            <inheritFrom>customer</inheritFrom>
        </role>
        <role>
            <name>manager</name>
            <inheritFrom>employee</inheritFrom>
        </role>
    </roleSet>

    <!--
    - Set permissions for accessing application components here.
    - resource -> property being access controlled.
    - role -> group or user that can access resource.
    - privilege -> privilege that role can use with resource.
    -
    - Each permission tag can contain multiple
    - resources, roles, and privileges.
    -->
    <permissions>
        <permission>
            <resources>
                <resource>contact_details</resource>
                <resource>profile</resource>
            </resources>
            <roles>
                <role>customer</role>
            </roles>
            <privileges>
                <privilege>read</privilege>
                <privilege>write</privilege>
            </privileges>
        </permission>

        <permission>
            <resources>
                <resource>time_card</resource>
            </resources>

            <roles>
                <role>employee</role>
            </roles>
            <privileges>
                <privilege>read</privilege>
                <privilege>write</privilege>
            </privileges>
        </permission>
        
        <permission>
            <resources>
                <resource>time_card</resource>
            </resources>
            <roles>
                <role>manager</role>
            </roles>
            <privileges>
                <privilege>approve</privilege>
            </privileges>
        </permission>

    </permissions>
</config>
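
As a rough illustration of how the 'roleSet' section translates into an inheritance chain, the sketch below reads a cut-down copy of the configuration with `xml.etree.ElementTree` and resolves every role that a given role inherits from. It is independent of the acl.py module, and `read_parents` and `inherited_roles` are hypothetical helper names.

```python
import xml.etree.ElementTree as ElementTree

# Cut-down copy of the roleSet section shown above.
CONFIG = """<?xml version="1.0"?>
<config>
    <roleSet>
        <role>
            <name>customer</name>
        </role>
        <role>
            <name>employee</name>
            <inheritFrom>customer</inheritFrom>
        </role>
        <role>
            <name>manager</name>
            <inheritFrom>employee</inheritFrom>
        </role>
    </roleSet>
</config>
"""


def read_parents(xml_text):
    """Return {role name: [parent role names]} from the roleSet section."""
    root = ElementTree.fromstring(xml_text)
    parents = {}
    for role in root.findall('./roleSet/role'):
        name = role.findtext('name')
        parents[name] = [node.text for node in role.findall('inheritFrom')]
    return parents


def inherited_roles(role_name, parents):
    """Return the set of all roles that role_name inherits from."""
    found = set()
    pending = list(parents[role_name])
    while pending:
        parent = pending.pop()
        if parent not in found:
            found.add(parent)
            pending.extend(parents[parent])
    return found


parents = read_parents(CONFIG)
manager_inherits = inherited_roles('manager', parents)
```

Here a manager inherits from employee directly and from customer indirectly, which is why a manager ends up with the permissions granted to both.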

Here is the code for the acl.py module:


"""Role based security"""
from xml.dom.minidom import parse

class AccessError(Exception):
    pass

class Resource(object):
    """A Resource is an object that can be accessed by a Role."""
    def __init__(self, name=''):
        self.name = name
        self._privileges = {}

    def set_privilege(self, privilege, allowed=True):
        self._privileges[privilege] = allowed

    def has_access(self, privilege):
        if privilege in self._privileges:
            return self._privileges[privilege]
        return False
    
    def __str__(self):
        rpr = self.name + ': '
        for privilege, access in self._privileges.iteritems():
            rpr += "%s:%s " % (privilege, access)
        return rpr

class Role(object):
    def __init__(self, name=''):
        """An Acl role has access to resources with specific privileges."""
        self.name = name
        self._parents = {}
        self._resources = {}

    def set_parent(self, parent):
        self._parents[parent.name] = parent

    def set_resource(self, resource):
        self._resources[resource.name] = resource

    def has_access(self, attr_name, privilege):
        if attr_name in self._resources:
            if self._resources[attr_name].has_access(privilege):
                return True

        for parent in self._parents.values():
            if parent.has_access(attr_name, privilege):
                return True

        return False
    
    def __str__(self):
        rpr = self.name + ":\n"
        rpr += "parents:\n"
        for parent in self._parents.keys():
            rpr += "\t%s\n" % parent
        rpr += "resources:\n"
        for resource in self._resources.values():
            rpr += "\t%s\n" % resource
        return rpr

class Acl(object):
    """Manages roles and resources.
    
    Singleton class.
    """

    class __impl:
        """Implementation of the singleton interface"""
        
        def __init__(self):
            self._acl = {}

        def set_role(self, role):
            self._acl[role.name] = role
            
        def check_access(self, role_name, resource, privilege):
            """Check whether a role has access to a resource or not."""

            if role_name not in self._acl:
                raise AccessError('Role does not exist.')
            return self._acl[role_name].has_access(resource, privilege)

        def build_acl(self, file): 
            """Build acl from an XML file."""

            self._acl = {}
            roles_to_create = {}
            dom = parse(file)
            
            # Find roles to create
            roles_nodes = dom.getElementsByTagName('roleSet')
            for roles_node in roles_nodes:
                role_nodes = roles_node.getElementsByTagName('role')
                for role_node in role_nodes:
                    name_nodes = role_node.getElementsByTagName('name')
                    parent_nodes = role_node.getElementsByTagName('inheritFrom')
                    role_name = name_nodes[0].childNodes[0].data
                    roles_to_create[role_name] = []

                    # Find role parents
                    for parent_node in parent_nodes:
                        roles_to_create[role_name].append(parent_node.childNodes[0].data)

            # build inheritance chain
            for role, parents in roles_to_create.iteritems():
                self.set_role(self._create_role(role, roles_to_create))

            # assign permissions
            permissions = dom.getElementsByTagName('permissions')
            for permissions_node in permissions:
                permission_nodes = permissions_node.getElementsByTagName('permission')
                for permission_node in permission_nodes:
                    resource_nodes = permission_node.getElementsByTagName('resource')
                    role_nodes = permission_node.getElementsByTagName('role')
                    privilege_nodes = permission_node.getElementsByTagName('privilege')

                    for resource_node in resource_nodes:
                       resource = Resource()
                       resource.name = resource_node.childNodes[0].data
                       for privilege_node in privilege_nodes:
                           resource.set_privilege(privilege_node.childNodes[0].data)

                       for role_node in role_nodes:
                           try:
                               role = self._acl[role_node.childNodes[0].data]
                           except KeyError:
                               raise AccessError('Role in permission is not defined.')

                           role.set_resource(resource)

        def _create_role(self, role_name, roles_to_create):
            """Recursively create parent roles and then create child role."""
            
            if role_name in self._acl:
                role = self._acl[role_name]
            else:
                role = Role()
                role.name = role_name
                
            for parent_name in roles_to_create[role_name]:
                if parent_name in self._acl:
                    parent = self._acl[parent_name]
                else:
                    parent = self._create_role(parent_name, roles_to_create)
                    self.set_role(parent)
                role.set_parent(parent)
            return role
        
        def __str__(self):
            rpr = ''
            for role in self._acl.values():
                rpr += '----------\n'
                rpr += str(role)
            return rpr

    __instance = None

    def __init__(self):
        """ Create singleton instance """
        
        # Check whether an instance already exists.
        # If not, create it.
        if Acl.__instance is None:
            Acl.__instance = Acl.__impl()

        self.__dict__['_Acl__instance'] = Acl.__instance

    def __getattr__(self, attr):
        """ Delegate get access to implementation """

        return getattr(self.__instance, attr)

    def __setattr__(self, attr, val):
        """ Delegate set access to implementation """

        return setattr(self.__instance, attr, val)

Sunday, September 20, 2009

Facebook's Tornado

Today I got my first chance to play with Tornado, Facebook's Python-based web server and framework. The framework is single-threaded and asynchronous, similar to how the Twisted framework operates, and it is the technology behind FriendFeed, the feed service that Facebook acquired.

Python thread creation and maintenance have relatively high overhead, so asynchronous solutions can provide better performance and scalability than more traditional threaded server implementations. This is especially true for servers with high numbers of concurrent connections, and for long-polling and streaming connections that spend most of their time waiting around for a message to be published to a client. Tornado's benchmarks look impressive against threaded Python web servers, but I wonder why they didn't include Twisted, their closest competitor, in the benchmarking.

My first impression is that Tornado is much easier to use than Twisted, but also much less powerful. Twisted is a monolithic package that can be used with many different networking protocols, but Tornado is focused only on HTTP. Tornado provides built-in support for templating, authentication, and signed cookies. One really cool feature Tornado provides is semi-automatic XSRF protection, which should save time for developers who would otherwise need to manually implement countermeasures.

The documentation for Tornado is very slim. For example, I couldn't find any information about how to interact directly with Tornado's main event loop (IOLoop). Fortunately, Tornado's code base is small and readable, so reading the source will quickly get you up to speed.

While implementing a Tornado channel for the AmFast project, I ran into a problem that I have also encountered with Twisted. Both Tornado and Twisted use callbacks to complete a request. As an example, if a long-poll client is waiting for a message, you call RequestHandler.finish() to send a response back to the client when a message is published.
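
The callback style described above can be sketched without any framework. In this illustration, which is plain Python and not Tornado's or Twisted's API, `Topic`, `wait_for_message`, and `publish` are hypothetical names. Each waiting long-poll request registers a callback that plays the role of RequestHandler.finish().

```python
class Topic(object):
    """Holds callbacks for clients waiting on the next message."""

    def __init__(self):
        self._waiting = []

    def wait_for_message(self, callback):
        """Register a callback instead of blocking until a message arrives."""
        self._waiting.append(callback)

    def publish(self, message):
        """Complete every pending request with the new message."""
        waiting, self._waiting = self._waiting, []
        for callback in waiting:
            callback(message)


responses = []
topic = Topic()

# Two long-poll clients connect and wait.
topic.wait_for_message(lambda msg: responses.append(('client-1', msg)))
topic.wait_for_message(lambda msg: responses.append(('client-2', msg)))

# Nothing is sent until a message is published.
nothing_sent_yet = (responses == [])

topic.publish('hello')
```

Note that the list of waiting callbacks lives in the memory of a single process, so a publish call can only reach clients that are connected to that same process.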

The above method works well for a single server instance, but what about when you have multiple servers behind a proxy? A server process may publish a message for a client that is connected to an entirely different server process. The publishing process has no way to notify the subscribing process that a client has received a message.

This problem can be solved by polling a database table or a file that is accessible to all server processes, but that takes system resources, especially if you have many clients and your polling interval is short. One of my future goals is to figure out a more elegant way for the publishing server process to notify the subscribing server process when a client receives a message.

Saturday, August 1, 2009

AmFast adds support for Google App Engine and Django

I just released version 0.4.0 beta of AmFast, a Flash remoting package for Python. The new release includes support for Google App Engine and the Django framework.

Browse the code here.

Check out the live Google App Engine demo.

To get the code:

svn checkout https://amfast.googlecode.com/svn/tags/0.4.0b

Saturday, June 20, 2009

YAPT (Yet Another Python Tutorial)

The tutorial I created for the Python workshop is posted here. The tutorial is geared towards bioinformatics users, and includes molecular biology related exercises.

I created the tutorial with the Sphinx documentation package, which turned out to be the perfect tool for the job.

Wednesday, May 20, 2009

Real-time Flex messaging with AmFast

This example is very similar to the previous one, but we're going to use AmFast's HTTP streaming channel to implement real-time Flex messaging. Get the complete example code here.

First let's set up the Producer and Consumer in ActionScript.



// Configure the ChannelSet that
// messages will be sent and received over.
import mx.messaging.ChannelSet;

// To use HTTP streaming, the
// StreamingAMFChannel class must be used.
import mx.messaging.channels.StreamingAMFChannel;
var channelSet:ChannelSet = new ChannelSet();
var channel:StreamingAMFChannel = new StreamingAMFChannel("channel-name", "server-url");
channelSet.addChannel(channel);

// Setup a Consumer to receive messages.
import mx.messaging.Consumer;
var consumer:Consumer = new Consumer();

// The consumer's destination is the 'topic'
// name that the consumer will be subscribed to.
// The consumer will receive all messages
// published to the topic.
consumer.destination = "topic";

// Use the ChannelSet that was already created.
consumer.channelSet = channelSet;

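// This event listener will be called whenever
// the consumer receives a message from the server.
// newMsgHandler is a function defined elsewhere in the application.
import mx.messaging.events.MessageEvent;
consumer.addEventListener(MessageEvent.MESSAGE, newMsgHandler);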

// The consumer won't start receiving messages
// until it is subscribed.
consumer.subscribe();

// Setup a Producer to publish messages.
import mx.messaging.Producer;
var producer:Producer = new Producer();
producer.destination = "topic";
producer.channelSet = channelSet;

// Create an Async message and send it
// to all other clients subscribed to the topic.
import mx.messaging.messages.AsyncMessage;
var msg:AsyncMessage = new AsyncMessage();

// Set the message's body attribute to the
// object that is going to be published.
//
// In this case the object contains
// the X and Y coordinates of a Sprite.
msg.body = {'x': 10, 'y': 10};
producer.send(msg);

Next we'll set up the server.


# Create a ChannelSet to serve messages.
from amfast.remoting.channel import ChannelSet
from amfast.remoting.wsgi_channel import StreamingWsgiChannel
channel_set = ChannelSet()

# Each individual ChannelSet can use
# one or more Channels. When messages
# are published through the ChannelSet,
# they will be published to all subscribed clients,
# regardless of which Channel the clients
# are connected through.

# Create a HTTP streaming channel.
#
# When a client connects to a streaming channel,
# the HTTP connection remains open until the client
# disconnects. When messages are published,
# they are immediately dispatched to clients connected
# to HTTP streaming channels.
stream_channel = StreamingWsgiChannel('stream-channel')

# WsgiChannel objects are WSGI applications
# that can be served with any WSGI server.
# CherryPy is being used in this example.
import cherrypy
cherrypy.tree.graft(stream_channel, '/amf')

# Start the server.
# App() is your root controller class
# for non-AMF functions.
cherrypy.quickstart(App(), '/')

# That's it, our server is up and running.
# It's that simple.
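
To illustrate what "the HTTP connection remains open" means at the WSGI level, here is a minimal, generic sketch. It is not AmFast's implementation. StreamingApp, publish, and close are hypothetical names, it uses Python 3's queue module, and the single shared queue means it only models one connected client.

```python
import queue


class StreamingApp(object):
    """Hypothetical WSGI app that keeps one response open per request."""

    def __init__(self):
        self.messages = queue.Queue()

    def publish(self, body):
        """Hand a message to the open connection."""
        self.messages.put(body)

    def close(self):
        """End the stream; None is used as the stop sentinel."""
        self.messages.put(None)

    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        # Returning a generator lets the server send each chunk as it is
        # produced, instead of building the whole response up front.
        return self._stream()

    def _stream(self):
        while True:
            body = self.messages.get()  # blocks until a message arrives
            if body is None:
                break
            yield (body + '\n').encode('utf-8')


app = StreamingApp()
app.publish('x=10,y=10')
app.publish('x=12,y=15')
app.close()

# Call the app directly, the way a WSGI server would.
chunks = list(app({}, lambda status, headers: None))
print(chunks)
```

Note that the blocking get() call ties up whichever thread is serving the request for as long as the client stays connected.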

To test the example, download the full code from the repo. To run the example you'll first need to install AmFast and either CherryPy or Twisted.


# To serve the example with CherryPy
python cp_server.py

# To serve the example with Twisted
twistd -noy twisted_server.tac

Open two browser windows and browse to 'http://localhost:8000'. Click the 'Subscribe' button in both windows. In the 1st window, click on the 'Master' button, then click on the yellow circle and drag it around the screen. As the circle is dragged, its position will be updated in the 2nd window. Pretty cool stuff.

Friday, May 15, 2009

'Pushing' messages to Flex clients with AmFast

One of the coolest aspects of Flex development is the ability to 'push' messages from the server to clients. With the newest release of AmFast, you can finally utilize this capability with all of the goodness of Python. This will be the 1st of 3 posts describing how you can use the AmFast Flash remoting package to push messages to Flex clients from a Python server.

AmFast supports pushing messages to clients over HTTP using 3 different strategies: polling, long-polling, and streaming. Here is a good article describing the differences between those strategies. Go ahead and skip past the XML configuration sections, since you won't need any XML to configure AmFast :)

Streaming channels deliver messages in real time, but also consume the most server resources. Standard polling consumes the least amount of server resources, but is also the slowest of the three strategies. Long-polling sits in the middle and offers a nice compromise between latency and resource consumption.

AmFast has built-in support for using any of the 3 messaging strategies with either a multi-threaded WSGI server, or the Twisted web framework. One thing to think about when implementing a multi-threaded server is that any client using long-polling or streaming may consume a thread for a significant amount of time, even if all it's doing is waiting for a new message. I haven't done any tests, but I'm guessing that Twisted's single-threaded, asynchronous design can handle more concurrent connections than a multi-threaded implementation.
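The difference between standard polling and long-polling, and why long-polling ties up a thread, can be sketched with nothing but the standard library. This is an illustration under my own assumptions, not AmFast code. The Topic class and its publish and poll methods are hypothetical names.

```python
import threading


class Topic(object):
    """Hypothetical in-memory topic shared by request-handling threads."""

    def __init__(self):
        self._messages = []
        self._condition = threading.Condition()

    def publish(self, body):
        """Store a message and wake up every waiting poller."""
        with self._condition:
            self._messages.append(body)
            self._condition.notify_all()

    def poll(self, next_index, timeout=0.0):
        """Return the messages from next_index onward.

        timeout=0 is standard polling: return immediately, even if empty.
        timeout>0 is long-polling: the calling thread is held until a
        message arrives or the timeout expires.
        """
        with self._condition:
            self._condition.wait_for(
                lambda: len(self._messages) > next_index, timeout=timeout)
            return self._messages[next_index:]


topic = Topic()

# Standard poll: nothing has been published yet, so this returns at once.
print(topic.poll(0))

# Long-poll: another thread publishes shortly after we start waiting,
# and the poll returns as soon as the message arrives.
threading.Timer(0.1, topic.publish, args=['hi']).start()
print(topic.poll(0, timeout=5.0))
```

In a multi-threaded server, each long-polling client would be parked inside poll() on its own thread, which is the resource cost described above.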

On the client side, the ActionScript Consumer class is used to receive messages from a server, and the Producer class is used to send messages to a server. Any type of object can be pushed to clients, so the same technology can be used to build many different types of applications.

In the next post I'll explain the details of using AmFast's HTTP polling and HTTP long-polling channels to create a simple chat client. In the following post I'll show an example of using HTTP streaming to create an app where a master client can manipulate a graphical entity on the screen, and changes are replicated to all other subscribed clients.

Sunday, March 22, 2009

AmFast 0.2.2 Released!

The latest version of AmFast is out. Improvements include an ActionScript code generator, an example using the Twisted framework, and several bug fixes.

Wednesday, March 11, 2009

AmFast 0.2 Released!

Check out the latest version of AmFast. The 0.2 version implements AMF0, AMF3 and supports configurable remoting with NetConnection and RemoteObject.

It's fast, flexible, and easy to use.

Monday, March 2, 2009

AmFast Released!

I released the alpha version of AmFast today. AmFast is an AMF3 encoder/decoder Python extension.

Some unsophisticated testing shows that AmFast is roughly 18x quicker than PyAmf. 10,000 runs through the test_complex_encode_decode_dict() unit test takes ~105 seconds on my machine using PyAmf, and ~5.8 seconds using AmFast.
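For anyone who wants to repeat this kind of comparison, a rough timing harness might look like the sketch below. It is not the actual unit test. time_runs, speedup, and round_trip are hypothetical helpers, and a JSON round trip stands in for the AMF3 encode/decode calls, which you would swap in yourself.

```python
import json
import timeit


def time_runs(func, runs):
    """Return the total seconds taken to call func() `runs` times."""
    return timeit.timeit(func, number=runs)


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second measurement is than the first."""
    return slow_seconds / fast_seconds


# Stand-in workload: a JSON round trip of a nested dict. This is NOT the
# AMF3 test; replace round_trip() with the real encode/decode calls.
sample = {
    'name': 'example',
    'values': list(range(100)),
    'nested': {'a': [1, 2, 3]},
}


def round_trip():
    return json.loads(json.dumps(sample))


elapsed = time_runs(round_trip, 1000)
print('1000 round trips: %.3f seconds' % elapsed)

# The figures quoted above: ~105 seconds versus ~5.8 seconds.
print('speedup: %.1fx' % speedup(105, 5.8))
```

Timing both libraries with the same function and the same number of runs keeps the comparison fair, even if the harness itself is crude.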

If you want to try it out, you can download the package from PyPI, and take a look at the meager documentation.

AmFast supports all of the data types that PyAmf does, but there is not yet any functionality for implementing remoting, or working with AMF0. I'm hoping that AmFast will be able to integrate into PyAmf, so that you can optionally use either the existing pure Python AMF3 encoder/decoder, or AmFast within the PyAmf framework.

This was my first time working with the Python C API, and I think it's pretty cool. Besides having to remember to increment and decrement Python object reference counts in the right spots, it was pretty easy to learn. I'm not anywhere near a C expert, and I wrote this extension in my spare time over the course of a couple of weeks.

Thursday, February 19, 2009

Plater Released!

I released Plater 0.1 today.

Plater is a Python package that contains common LIMS classes that can be used to build a custom LIMS application.

Instead of starting your LIMS application from scratch, use the Plater package to represent common LIMS domain models, and piece them together in whatever configuration your application requires.

Take a look at the Plater project page for details on how to download/install/use Plater.