
Tuesday, August 19, 2008

A Little Command Line Love

One of the things I do in my "spare" time is work on building web applications that will (hopefully) earn some spare money on the side without too much maintenance on my part. Those who have read The Four Hour Work Week will recognize this as my "muse" business.

In working on these web apps, I needed a place to host them, so I went with WebFaction due to their excellent support for TurboGears. Since I'm on a shared hosting environment with WebFaction, my usual deployment method can't involve any scripts in /etc/init.d like I'd use at work. So I used to do the old nohup python start-appname.py >> output.log 2>&1 & to "daemonize" the process, then ps -furick446 to figure out which processes I needed to kill/restart when updating code. This was irritatingly verbose, so I figured I'd build the "userspace daemonizer" I'll describe below.

The requirements of my daemonizer were pretty straightforward:

  • I should be able to add/remove services from the command line with a minimum amount of typing

  • I should be able to start/stop/restart any service just by listing it by name (no more ps -furick446)

  • It should do "real" daemonization (fork, fork, dup stuff if you're familiar with it) rather than the nohup garbage I'd been using

  • It should perform some sort of verification or status check on processes to make sure that they're successfully started/stopped/running/etc.


The first (and most questionable) decision I made was to go with SQLAlchemy as my service database. I say this is questionable because I ended up with only one table and one client that uses the database, so a fully relational model is really overkill here. Anyway, here's the SQLAlchemy setup code I used. It's pretty simple, and represents kind of the "lowest common denominator" of SQLAlchemy usage. (I've also included all the imports required for the whole shebang at the top of the file.)

#!/usr/bin/env python2.5
import os, sys, signal, shlex, time
from optparse import OptionParser

from sqlalchemy import *
from sqlalchemy.orm import *

HOME = os.environ.get('HOME')
DBFILE = os.path.abspath(os.environ.get(
    'SERVER_FILE',
    os.path.join(HOME, 'server.sqlite')))

DBURI = 'sqlite:///' + DBFILE

engine = create_engine(DBURI)
metadata = MetaData(engine)
session = scoped_session(sessionmaker(bind=engine))

process = Table(
    'process', metadata,
    Column('name', String(255), primary_key=True),
    Column('pid', Integer),
    Column('command_line', String(255)),
    Column('working_directory', String(255)),
    Column('stdin', String(255), default='/dev/null'),
    Column('stdout', String(255), default='/dev/null'),
    Column('stderr', String(255), default='/dev/null'))

class Process(object):
    def __repr__(self):
        return '%s(PID %s, WD %s): %s < %s >> %s 2>> %s' % (
            self.name, self.pid, self.working_directory,
            self.command_line, self.stdin, self.stdout, self.stderr)

session.mapper(Process, process)

So, for all of you out there who wonder how to use SQLAlchemy outside of a web framework, there you go. The idea here is that every service has a row in the process table, and every service can optionally redirect its stdin/stdout/stderr to/from files. If a process is running, it will have a pid. You can also specify the startup directory of each process. That pretty much covers the data model.
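
If you want to see the model in action outside the tool itself, a quick sanity check might look like the following (the "webapp1" service and its command line are made up for illustration):

metadata.create_all()                # create the process table if it doesn't exist

p = Process(name='webapp1',          # hypothetical service
            command_line='python start-webapp1.py',
            working_directory=HOME)
session.commit()                     # session.mapper saves new instances automatically

print Process.query.get('webapp1')   # rendered via the __repr__ above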

My next task was to figure out how to invoke this from the command line. I decided to make all the commands of the form python server.py command [options] [service]. Actually, the only command that takes any options is the add command, so I definitely over-engineered here, but I wanted to be able to specify a global list of options shared by all commands to be used with an optparse option parser. So I built the following dictionary:

OPTPARSE_OPTIONS = {
    # note: each key must match the decorated function's argument name,
    # and each dest must match its key, so the parsed value can be
    # looked up by that name later
    'working_directory':(['-w', '--working-directory'],
                         dict(dest='working_directory', default=HOME,
                              help='Set working directory to DIR',
                              metavar='DIR')),
    'stdin':(['-i', '--stdin'],
             dict(dest='stdin', default='/dev/null',
                  help='Use FILE as stdin', metavar='FILE')),
    'stdout':(['-o', '--stdout'],
              dict(dest='stdout', default='/dev/null',
                   help='Use FILE as stdout', metavar='FILE')),
    'stderr':(['-e', '--stderr'],
              dict(dest='stderr', default='/dev/null',
                   help='Use FILE as stderr', metavar='FILE')),
}
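
Each value in that dictionary is just an (args, kwargs) pair waiting to be splatted into add_option. Here's a standalone sketch of how one entry gets used, separate from the Command machinery below (the 'input.txt' and 'myservice' arguments are invented):

parser = OptionParser('%prog [options] <service>')
args, kwargs = OPTPARSE_OPTIONS['stdin']
parser.add_option(*args, **kwargs)   # same as parser.add_option('-i', '--stdin', dest='stdin', ...)
opts, positional = parser.parse_args(['-i', 'input.txt', 'myservice'])
print opts.stdin      # 'input.txt'
print positional      # ['myservice']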

The next task was to specify the commands. I'm lazy and I like decorators, so I decided that new commands should be as easy as possible to write. My add command, for instance, is the most complex, and it looks like this:

@Command
def add(service, command, working_directory,
        stdin, stdout, stderr):
    p = Process(name=service,
                command_line=command,
                working_directory=working_directory,
                stdin=stdin,
                stdout=stdout,
                stderr=stderr)
    # session.mapper's save-on-init means p is already in the session
    session.commit()

That's pretty simple. Of course, as you might have guessed, the Command decorator isn't all that simple. Its responsibilities are as follows:

  • Add the function name to a command list so I can get a list of commands from the command line

  • Build an optparse parser based on the OPTPARSE_OPTIONS dict and the named arguments to the function

  • Wrap the function in a new function with a signature like function(args) so it can be called with sys.argv[2:]

  • In the wrapper function, use the optparse parser to initialize the argument list to the function and then call with a named argument dict


So with all that explanation, here's the code:

class Command(object):
    commands = {}

    def __init__(self, func):
        Command.commands[func.__name__] = self
        options = self.get_options(func)
        self.func = func
        self.optparse_options = []
        self.optparse_option_names = []
        self.positional_options = []
        for o in options:
            if o in OPTPARSE_OPTIONS:
                self.optparse_option_names.append(o)
                self.optparse_options.append(
                    OPTPARSE_OPTIONS[o])
            else:
                self.positional_options.append(o)
        positional_string = ' '.join(
            '<%s>' % o for o in self.positional_options)
        self.parser = OptionParser(
            '%%prog [options] %s' % positional_string,
            prog=func.__name__)
        for args, kwargs in self.optparse_options:
            self.parser.add_option(*args, **kwargs)

    def get_options(self, func):
        code = func.func_code
        return [ vn for vn in code.co_varnames[:code.co_argcount] ]

    @classmethod
    def run(klass, args):
        if args and args[0] in klass.commands:
            return klass.commands[args[0]](args[1:])
        else:
            print 'Unrecognized command'
            print 'Acceptable commands:'
            for name in klass.commands:
                print ' -', name

    def __call__(self, args):
        (opts, a) = self.parser.parse_args(args)
        if len(a) != len(self.positional_options):
            self.parser.error('Wrong number of arguments')
        o = {}
        for name in self.optparse_option_names:
            o[name] = getattr(opts, name, None)
        for name, value in zip(self.positional_options, a):
            o[name] = value
        return self.func(**o)

Once I've built the decorator, the commands are fairly straightforward (as the add command shows). Here are the "short" commands:

@Command
def initialize():
    try:
        metadata.drop_all()
    except:
        pass
    metadata.create_all()
    print 'Service database initialized'

@Command
def list():
    q = Process.query()
    if q.count():
        for p in Process.query():
            print p
    else:
        print 'No processes'

@Command
def remove(service):
    p = Process.query.get(service)
    session.delete(p)
    print 'Removed %s from the service list' % service
    session.commit()

@Command
def status(service):
    p = Process.query.get(service)
    print p

Starting and stopping processes is a little more complex, but not too bad. First off, I needed a way to determine whether a process was running. I didn't want to parse the ps -furick446 output, so instead I send Unix signal 0 to the PID (which does nothing to the receiving process). If that raises an exception, the process is either not running or not owned by me. So here's the is_running code:

def is_running(pid):
    # signal 0 doesn't touch the process; it just raises if the
    # process isn't running (or isn't ours)
    try:
        os.kill(pid, 0)
        return True
    except Exception:
        return False
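
As a quick sanity check of the signal-0 trick: the current process is always running and owned by you, while PID 1 (init) belongs to root, so checking it fails for an unprivileged user:

print is_running(os.getpid())   # True
print is_running(1)             # False for a non-root user (EPERM counts as "not mine")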

I also need a daemonizer function that will start, daemonize, and set the PID of a process object:

def daemonize(p):
    pid = os.fork()
    if pid: return # exit first parent
    os.chdir(p.working_directory)
    os.umask(0)
    os.setsid()
    pid = os.fork()
    if pid:
        # second parent: record the daemon's PID before exiting
        Process.query.get(p.name).pid = pid
        session.commit()
        sys.exit(1) # exit second parent
    # redirect stdin/stdout/stderr to the configured files
    si = open(p.stdin, 'r')
    so = open(p.stdout, 'a+')
    se = open(p.stderr, 'a+', 0)
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # replace this process with the actual service
    args = shlex.split(p.command_line.encode('utf-8'))
    os.execvp(args[0], args)

Once these are defined, I can start, stop, and restart processes fairly simply:

@Command
def start(service):
    print 'Starting %s ...' % service
    p = Process.query.get(service)
    if p.pid is not None:
        if is_running(p.pid):
            print '... %s already running with PID %s' % (
                service, p.pid)
            return
    p.pid = 0
    session.commit()
    daemonize(p)
    print '... started %s' % service

@Command
def stop(service):
    print 'Stopping %s ...' % service
    p = Process.query.get(service)
    if not p.pid or not is_running(p.pid):
        print '... service %s is already stopped' % service
        p.pid = None
        session.commit()
        return
    for retry in range(5):
        print '... sending SIGTERM to %s' % p.pid
        os.kill(p.pid, signal.SIGTERM)
        time.sleep(0.5)
        if not is_running(p.pid):
            break
    else:
        # for/else: the loop never broke out, so escalate to SIGKILL
        print '... sending SIGKILL to %s' % p.pid
        os.kill(p.pid, signal.SIGKILL)
        time.sleep(0.5)
        if is_running(p.pid):
            print '... process %s could not be killed' % p.pid
            return
    p.pid = None
    session.commit()
    print '... %s is stopped' % service

@Command
def restart(service):
    # stop and start are Command instances, so they take an
    # argv-style argument list rather than a bare service name
    stop([service])
    start([service])

Finally, I need to hook the script up and make sure it runs from the command line:

def main():
    Command.run(sys.argv[1:])

if __name__ == '__main__':
    main()
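
To get a feel for the whole flow, here's a hypothetical end-to-end session driving the commands the way main() would with sys.argv[1:] (the service name and command line are invented):

Command.run(['initialize'])
Command.run(['add', '-o', 'webapp1.log', '-e', 'webapp1.log',
             'webapp1', 'python start-webapp1.py'])
Command.run(['start', 'webapp1'])
Command.run(['status', 'webapp1'])
Command.run(['stop', 'webapp1'])

From the shell, the equivalent would be python server.py add -o webapp1.log -e webapp1.log webapp1 "python start-webapp1.py", and so on.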

And there you have it! A userspace daemonizer that lets you manage an arbitrary number of services. It is definitely overkill in many ways, but hopefully sharing the process of building it was as educational for you as it was for me.

Wednesday, August 13, 2008

Miruku - Migrations for SQLAlchemy

One of the painful things about working with any database-oriented project in production is that you can't just drop the database and re-create it every time you have a schema change. (Of course, you could do that, but your users might get a little miffed when their data disappears.) For this reason, Rails and Django have had support for migrating from one schema version to another. Well, now SQLAlchemy has an automatic migrations tool by the name of Miruku. I haven't tried it, and it's an extremely early version (0.1a7), but it looks promising. Have a look here, and let me know what you think.

Friday, July 11, 2008

Cascade Rules in SQLAlchemy

Last night at the PyAtl meeting, there was a question about how you define your cascade rules in SQLAlchemy mappers. I'll confess that it confused me at first, too, but here's all you need to know:

What's "cascading" in the mapper is session-based operations. This includes putting an object into the session (saving it), deleting an object from the session, etc. Generally, you don't care about all that stuff, because it Just Works most of the time, as long as you specify cascade="all" on your relation() properties in your mappers. What this means is "whatever session operation you do to the mapped class, do it to the related class as well".

One slightly confusing point is that there's another rule you'll often want to add to your cascade, and that's "delete-orphan". In fact, most of my 1:N relation()s look like:

mapper(ParentClass, parent, properties=dict(
    children=relation(ChildClass, backref='parent',
                      cascade='all,delete-orphan')))

The "delete-orphan" specifies that if you ever have a ChildClass instance that is "orphaned", that is, not connected to some ParentClass, go ahead and delete that ChildClass. You want to specify this whenever you don't want ChildClass instances hanging out with null ParentClass references. Note that even if you don't specify "delete-orphan", deletes on the ParentClass instance will still cascade to related ChildClass instances. An example is probably best. Say you have the following schema and mapper setup:

photo = Table(
    'photo', metadata,
    Column('id', Integer, primary_key=True))
tag = Table(
    'tag', metadata,
    Column('id', Integer, primary_key=True),
    Column('photo_id', None, ForeignKey('photo.id')),
    Column('tag', String(80)))

class Photo(object): pass

class Tag(object): pass

session.mapper(Photo, photo, properties=dict(
    tags=relation(Tag, backref='photo', cascade="all")))
session.mapper(Tag, tag)

I'll go ahead and create some photos and tags:

p1 = Photo(tags=[
    Tag(tag='foo'),
    Tag(tag='bar'),
    Tag(tag='baz') ])
p2 = Photo(tags=[
    Tag(tag='foo'),
    Tag(tag='bar'),
    Tag(tag='baz') ])
session.flush()
session.clear()

Now if I delete one of the photos, I'll delete the tags associated with it, as well:

>>> for t in Tag.query():
... print t.id, t.photo_id, t.tag
...
1 1 foo
2 1 bar
3 1 baz
4 2 foo
5 2 bar
6 2 baz
>>> session.delete(Photo.query.get(1))
>>> session.flush()
>>> for t in Tag.query():
... print t.id, t.photo_id, t.tag
...
4 2 foo
5 2 bar
6 2 baz

At this point, everything is the same whether I specify "delete-orphan" or not. The difference is in what happens when I just remove an item from a photo's "tags" collection:

>>> p2 = Photo.query.get(2)
>>> del p2.tags[0]
>>> session.flush()
>>> for t in Tag.query():
... print t.id, t.photo_id, t.tag
...
4 None foo
5 2 bar
6 2 baz

See how the "foo" tag is just hanging out there with no photo? That's what "delete-orphan" is designed to prevent. If we'd specified "delete-orphan", we'd have the following result:

>>> p2 = Photo.query.get(2)
>>> del p2.tags[0]
>>> session.flush()
>>> for t in Tag.query():
... print t.id, t.photo_id, t.tag
...
5 2 bar
6 2 baz

So there you go. If you don't mind orphans, then use cascade="all" and leave off the "delete-orphan". If you'd rather have them disappear when disconnected from their parent, use cascade="all,delete-orphan".

PyAtl: SQLAlchemy Theme Night

Well, last night was the Python Atlanta user group meeting (PyAtl). It had been a while since I'd been, and I'd forgotten how fun it can be. The theme was SQLAlchemy, and the speaker lineup was me, Brandon Craig Rhodes, and James Fowler.

The meeting started off with "shooting the breeze" as usual, and then moved into my presentation "Essential SQLAlchemy", which gives a 30 minute overview of the basics of SQLAlchemy. Here are the usual links to slides and the video:



After my talk, Brandon Craig Rhodes (who is, by the way, an incredibly lively presenter, using nothing but emacs!) gave a talk "SQLAlchemy Advanced Mappings" that focused on using the ORM layer in SQLAlchemy. It really was more of a mini-tutorial that took you through basic mappings all the way through relations, backrefs, and more. SQLAlchemy is an amazingly rich library, and it's hard to squeeze a talk into half an hour. Here's the video:



After Brandon, James Fowler did a "now for something completely different" kind of talk on wxPython, "WxPython Quick Bite", focusing on how you can make wxPython (designed to be event-driven and single-threaded) play nicely in a multi-threaded environment. Unfortunately the start of the video was cut off as I feverishly tried to download the other two videos to make room for James's talk. I'll post the video as soon as it gets uploaded.

I'd be remiss if I didn't thank O'Reilly for "sponsoring" the meetup with a giveaway of a number of books (including 9 copies of Essential SQLAlchemy, which I stuck around afterwards to sign). We also had a couple of copies of Beautiful Code, Beginning Development with Python Gaming, and Hackerteen to give away. A great time was had by all!

Friday, June 06, 2008

Essential SQLAlchemy Ships

Well, I finally got my hands on the first (that I know of) copy of my book Essential SQLAlchemy. You can get it from Amazon using the link below. (Disclaimer: this is my affiliate link. Hey, it's my blog after all, right? ;-) )


I think the book turned out well, although I was a little surprised at how thin it was. 200 or so pages seems like a lot more when you're writing (or reading) it. Many thanks go out to all who put up with me during the writing of the book, including my coworkers, editors, reviewers, and my amazing wife Nancy. (Not to mention my son Matthew, who was born during the proofreading phase of the book. I wisely delayed my proofreading until we all got home from the hospital.)

Anyway, here is a picture of me with the book at work taken by my coworker Jenny Walsh on her iPhone:

I'll try to post some more about SQLAlchemy at some point. It really is a fantastic library. Thanks go to Mike Bayer and all the other contributors for writing it!

Edit 7:16pm June 6, 2008
As I have been reminded by Noah, I will be presenting a talk on SQLAlchemy at the July 10th Python Atlanta user group meeting (PyAtl), followed by a book signing. I should have a few books available there to sell, or you can (obviously) bring your own. You can RSVP for the meeting at the PyAtl Meetup Site.

Thursday, February 14, 2008

PyAtl BloxAlchemy Talk

I gave a talk at the Atlanta Python user group last night on a project I've been working on called BloxAlchemy. BloxAlchemy is basically a SQLAlchemy-inspired library that targets a logical database known as LogicBlox instead of a SQL database. I've uploaded the slides here, if you're interested.

Update 2/16/08: The video of the talk is now available on Google video here. If you're going to watch the talk, I'd recommend keeping the slides opened in a separate window while watching, since I make a lot of references to code on the slides that isn't in the video.

Monday, January 07, 2008

Cascading DROP TABLE with SQLAlchemy

A little quirk that can get you if you're using SQLAlchemy to create and drop your database is that PostgreSQL won't let you drop a table that other tables refer to via FOREIGN KEY constraints. PostgreSQL does have a DROP TABLE ... CASCADE command that handles this (by also dropping whatever depends on the table, such as the referring foreign key constraints), but there's no easy way to get SQLAlchemy to emit it. It's not too hard to make it available, though.

SQLAlchemy has a nice, pluggable database dialect system that is fairly simple to extend. One part of this dialect system is the "SchemaDropper". So to cascade the DROP TABLE statements, I just created the following SchemaDropper (derived from the existing PGSchemaDropper) and installed it as the default PostgreSQL dialect schemadropper. (Most of the code is copied from the base SchemaDropper class in sqlalchemy.sql.compiler.)


from sqlalchemy.databases import postgres

class PGCascadeSchemaDropper(postgres.PGSchemaDropper):
    def visit_table(self, table):
        for column in table.columns:
            if column.default is not None:
                self.traverse_single(column.default)
        self.append("\nDROP TABLE " +
                    self.preparer.format_table(table) +
                    " CASCADE")
        self.execute()

postgres.dialect.schemadropper = PGCascadeSchemaDropper
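
With the dropper installed, an ordinary metadata.drop_all() picks up the cascading behavior automatically. A minimal sketch, assuming a PostgreSQL database (the DSN here is made up):

from sqlalchemy import create_engine, MetaData

engine = create_engine('postgres://user:pass@localhost/mydb')   # hypothetical DSN
metadata = MetaData(engine)
# ... Table definitions with ForeignKey references ...
metadata.drop_all()   # now emits DROP TABLE ... CASCADE for each table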


And that's it!