Aftermarket Pipes: python

Tuesday, September 06, 2011

Case Study: Python as Secret Weapon for C++ Windows Programming

One of my favorite features of Python is its interactive shell. If you want to try something, you type in the code and try it immediately. For someone whose first coding environment was the equally-immediate Applesoft Basic, this is just as natural. But if your introduction to programming was C, C++, or Java, the benefits might not be apparent, especially if you're trying to do exploratory coding in one of those languages.

So I'm going to walk through a recent experience as a case study.

The Problem

At work we develop a Windows program that talks to certain devices via serial cables. Those devices also come in wireless Bluetooth flavors, and we connect to them via a "virtual serial port". To the program running, it looks as if the Bluetooth device is plugged into a real serial port, because all of the wireless connectivity is abstracted away by Windows. These devices are unidirectional--they transmit data to the Windows program, which passively reads it.

If you power off and restart one of these wired devices, it will start chattering away at the Windows program with hardly a hiccup--our program never even sees a disconnect. However, we noticed that this didn't happen with the Bluetooth devices: powering down one of those requires the Windows app to reconnect. But the Windows app didn't even seem to get any notification that the device disconnected. So how do you solve this chicken and egg problem?

Research

It had been a while since I'd done any actual hardware serial programming, so I started with some documentation, and remembered that the RS-232 serial spec included a line called DCD, or Data Carrier Detect (also called RLSD, for Receive Line Signal Detect). Back in the dinosaur days, this signal meant that your modem was connected to the remote modem, and was able to start communicating back and forth.

Sure enough, a search brought up the right bit of Win32 API documentation, which told me how to detect an RLSD change on a physical serial port using the SetCommMask and WaitCommEvent calls. The question now became "does the Microsoft virtual serial port for Bluetooth support RLSD"?

Exploration

At this point I could have started up Visual Studio, created a scratch project, written a couple dozen lines of C++ code, compiled and linked, fixed the compile errors, compiled and linked again, run the program, fixed the inevitable errors that the compiler didn't catch, and then had my answer.

But I'm too impatient to wait for Visual Studio to start up, too lazy to write C++ when I don't have to, and I have the hubris to think I can come up with something better than the obvious solution. Programmers are funny like that.

So instead, I cranked up DreamPie.

Secret Weapon #1: DreamPie

DreamPie is, very simply, my favorite cross-platform interactive Python interpreter. It began life as a fork of Python's built-in IDLE command shell, and from there it's never looked back. It has excellent interactive completion for packages (so you can type "from sys import s" and get a list of "stdin, stdout, stderr").

Even better, it does completion when you're typing file paths in arbitrary strings. I use this a lot to get to modules I'm trying to test: "import os,sys; sys.path.append('c:/src/'" gives me a list of all the directories in "c:/src".

It also has a slick separation of (typed) input and (generated) output, and a neat "copy only code" feature that makes it perfect for "try this code interactively, and when it works the way I want it, yank it into the actual source file" exploration.

DreamPie works pretty much the same on both Linux and Windows systems. It's reputed to work well on Mac systems, too, but I don't use them for day-to-day development.

So where I'd normally crank up the Python command interpreter for interactive exploratory coding, I usually reach for DreamPie instead.

But what I needed to explore now was the Windows API as called from C++, not Python.

Secret Weapon #2: ctypes

ctypes is a "foreign function interface" (FFI) that's been part of Python since version 2.5. An FFI is just a way to call code that isn't written in your current programming language. In our case, the functions I wanted to call in order to test out serial port notification are in the kernel32.dll library, which is part of Windows. ctypes makes this really easy. Well, easy if you happen to have the Windows API documentation and all of the correct C header files handy, and if you know exactly what you're looking for:

>>> import ctypes
... file_mode = 0x80000000 # GENERIC_READ from <winnt.h>
... open_existing = 3 # from <winbase.h>
... buffer = ctypes.create_string_buffer(100)
... bytes_read = ctypes.c_ulong(0)
... hfile = ctypes.windll.kernel32.CreateFileW(r'\\.\COM17', file_mode, 0, None, open_existing, 0, None)
... ctypes.windll.kernel32.ReadFile(hfile, buffer, 100, ctypes.byref(bytes_read), None)
... buffer.value
0: b'\r\n052100746029\r\n'
>>>

Hooray. We can call the Win32 API functions to open the serial port and read from it, just like we would from C code.
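One caveat worth sketching: by default ctypes assumes every foreign function returns a C int, so on 64-bit Windows a pointer-sized HANDLE can be truncated unless you declare the prototype. The helper below is a minimal sketch of that declaration, not something I typed in the original session; the name declare_prototypes is made up for illustration, and c_uint32 stands in for the Win32 DWORD type.

```python
import ctypes

def declare_prototypes(kernel32):
    """Attach argument and return types to CreateFileW and ReadFile.

    `kernel32` is expected to be ctypes.windll.kernel32 (hypothetical helper).
    """
    kernel32.CreateFileW.restype = ctypes.c_void_p   # HANDLE is pointer-sized
    kernel32.CreateFileW.argtypes = [
        ctypes.c_wchar_p,   # lpFileName
        ctypes.c_uint32,    # dwDesiredAccess
        ctypes.c_uint32,    # dwShareMode
        ctypes.c_void_p,    # lpSecurityAttributes
        ctypes.c_uint32,    # dwCreationDisposition
        ctypes.c_uint32,    # dwFlagsAndAttributes
        ctypes.c_void_p,    # hTemplateFile
    ]
    kernel32.ReadFile.restype = ctypes.c_int         # BOOL
    kernel32.ReadFile.argtypes = [
        ctypes.c_void_p,                   # hFile
        ctypes.c_void_p,                   # lpBuffer
        ctypes.c_uint32,                   # nNumberOfBytesToRead
        ctypes.POINTER(ctypes.c_uint32),   # lpNumberOfBytesRead
        ctypes.c_void_p,                   # lpOverlapped
    ]
    return kernel32

# ctypes.windll only exists on Windows, so guard the lookup.
if hasattr(ctypes, "windll"):
    kernel32 = declare_prototypes(ctypes.windll.kernel32)
```

With the prototypes in place, the calls themselves look the same as in the session above.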

But... that's an awful lot of crap to remember and type. I had to know exactly the C code I wanted to write. I had to know the Windows API well enough to find the constants and the functions to call. I had to know the ctypes API well enough to wire up Python to the C return values via ctypes buffers.

What a chore. Did I mention I'm lazy?

ctypes is the universal adapter--it can connect Python code to anything. But if you're specifically looking to call the Windows API, there's an even better tool:

Secret Weapon #3: PyWin32

PyWin32 predates ctypes, but it has a similar goal: gluing Python to something else. In this case, something else is specifically the entire Win32 API. PyWin32 consists of about two dozen modules, for example, "win32print" for printing, or "win32gui" for window handling, which wrap a good portion of the Win32 API.

The documentation is rather Spartan, but if you know the Win32 API side, you can map those calls to the PyWin32 modules without too much pain. The 8-line, hard-to-remember ctypes example above turns into just four lines of simpler code using PyWin32:

>>> import win32file # for CreateFile
... import win32con # for constants
... hfile = win32file.CreateFileW(r'\\.\COM17',
...                               win32con.GENERIC_READ,
...                               0,
...                               None,
...                               win32con.OPEN_EXISTING,
...                               0,
...                               None)
... win32file.ReadFile(hfile, 50, None)
0: (0, b'\r\n052100746029\r\n')

The Final Secret Weapon

My actual exploratory DreamPie session to see if Windows' virtual Bluetooth serial port supported RLSD looked like this:

>>> import win32api, win32file, win32con
>>> hfile = win32file.CreateFileW(r'\\.\COM17', win32con.GENERIC_READ | win32con.GENERIC_WRITE, 0, None, win32con.OPEN_EXISTING, 0, None)
>>> win32file.GetCommMask(hfile)
0: 0
>>> win32file.SetCommMask(hfile, win32con.EV_RLSD)
Traceback (most recent call last):
File "", line 1, in
win32file.SetCommMask(hfile, win32con.EV_RLSD)
AttributeError: 'module' object has no attribute 'EV_RLSD'
>>> win32file.SetCommMask(hfile, win32file.EV_RLSD)
>>> win32file.GetCommMask(hfile)
1: 32
>>> win32file.EV_RLSD
2: 32
>>> win32file.WaitCommEvent(hfile)
3: (0, 32)
>>>

This is an actual copy of the DreamPie buffer from my test session, mistakes and all. This is what really happened when I tried to figure out if RLSD would work:

I typed up the code to open the serial port, which I knew should succeed, and it did.
I looked up the Win32 API call to get the "event mask", or the set of events that were being watched on the serial port handle, and saw that it was "GetCommMask". I blindly typed "win32file.GetCo", and lo and behold, DreamPie brought up a list of completions, which assured me that GetCommMask was there.
The Win32 API said that GetCommMask returned its result in a buffer passed into the call. Knowing that PyWin32 usually does a pretty good job of hiding return buffers, I decided to just try calling it with the input parameter, and got back zero. That made sense, if the serial port wasn't being monitored for events.
So I decided to push my luck: if GetCommMask worked, SetCommMask should work, too. A quick peek at the documentation, and... hrm. win32con didn't contain the "EV_RLSD" constant I was looking for to monitor the RLSD signal.
Well, I could have just typed the exact value (0x020) from the Windows docs... or I could just retype the line and use DreamPie's autocompletion to see if win32file has the constant. I typed "win32file.EV_", and I had my answer. Then a quick re-test of GetCommMask() showed that the value was set.
The API docs claimed that WaitCommEvent should wait for one of the masked events to occur, and then return which one occurred. But the documentation showed that it took another of those return buffers. Thinking that PyWin32 might help me here, too, I typed "win32file.WaitCommEvent(hfile)", and the call appeared to block.
So I powered down the device, and within a few seconds, I was rewarded with the return value from WaitCommEvent: (0, 32). Aha. This meant that PyWin32's WaitCommEvent wrapper reported a result code of 0 (for success), and that the event mask it handed back in place of the C API's return buffer contained 32, or EV_RLSD.
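To make a result like (0, 32) easier to read at a glance, the mask can be decoded into flag names. This is a small sketch rather than part of the original session: describe_comm_events is a hypothetical helper, and the table lists the EV_* values as given in the Win32 SetCommMask documentation.

```python
# Comm event flags as documented for the Win32 SetCommMask call.
EV_FLAGS = {
    0x0001: "EV_RXCHAR",
    0x0002: "EV_RXFLAG",
    0x0004: "EV_TXEMPTY",
    0x0008: "EV_CTS",
    0x0010: "EV_DSR",
    0x0020: "EV_RLSD",
    0x0040: "EV_BREAK",
    0x0080: "EV_ERR",
    0x0100: "EV_RING",
}

def describe_comm_events(mask):
    """Return the names of all EV_* flags set in mask, lowest bit first."""
    return [name for bit, name in sorted(EV_FLAGS.items()) if mask & bit]

rc, mask = (0, 32)   # the tuple WaitCommEvent returned in the session above
print(rc, describe_comm_events(mask))   # prints: 0 ['EV_RLSD']
```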

I included all the steps, including the mistakes, to show the last secret weapon: flexibility. Be willing to bounce back and forth between the documentation, the code you think should work, and the feedback you get both from the code under test and the tools you're using--and be willing to change your mental model based on that feedback.

In reality, this whole test took under five minutes from "Hmm... I wonder if I can use the DCD signal" to "Aha, looks like it works! Time to test it in C++." To be honest, I didn't even type out the whole ctypes version while testing--I started on it, realized that I'd have to look up and type all the constants by hand, then restarted DreamPie to jump over to PyWin32. Remembering that you can switch tools on the fly keeps you from getting stuck in ratholes that aren't directly related to the task at hand.

Flexibility is the key to fast and efficient exploratory coding. Using an interactive language like Python with a good set of support tools and libraries can be a secret weapon for speeding up exploratory coding--even when your target language is C++.

Wednesday, October 20, 2010

Switchpy

One of the consequences of the 2.x-to-3.x Python changeover is that I need to keep both versions around for a while on my Windows dev workstation.

Actually, strike that: I need to keep many versions around:

2.5.4, because that's the earliest version we support at work for some internal tools
2.6.6, because one particular internal tool jumped the gun and started using the "with" statement before we migrated to...
2.7, because that's what we're migrating those internal tools to (slowly)
3.1.2, because that's what we're targeting for new development
A "special" 3.1.2, which mimics the version we've modified for use in our embedded devices
The most recent 3.2 alpha, for testing
A 3.2 trunk install, for testing patches

Virtualenv doesn't exactly do what I want: you have to install it from within an already-installed version of Python, and it doesn't support Python 3 yet (although there is a fork that does). Plus it doesn't handle anything other than environment variables--it doesn't understand Windows' defaults.

Ned Batchelder wrote a neat script that does some of that, but again, it doesn't handle everything.

So starting from Ned's script, I came up with switchpy:

Supports Windows Python versions from 2.5 up to 3.2
Changes the local PATH environment in the current shell (via the same batchfile trick as mpath)
Updates the Registry-based associations (via code from Ned's script)
Pings Explorer so that if you run "python.exe" from the Start | Run command, it notices the update
Automatically reads installed official versions from the Registry, so you can say "switchpy 31" instead of "switchpy c:\python31"

So now, testing scripts in multiple versions of Python is as easy as:

C:\src\myscript>switchpy 25
Switching to Python at C:\Python25\...
Python is now C:\Python25\

C:\src\myscript>py.test
============================= test session starts =============================
python: platform win32 -- Python 2.5.4 -- pytest-1.3.0
test object 1: C:\src\myscript

myscript\tests\test_script.py ...

========================== 3 passed in 0.03 seconds ===========================

C:\src\myscript>switchpy 31
Switching to Python at C:\Python31\...
Python is now C:\Python31\

C:\src\myscript>py.test
============================= test session starts =============================
platform win32 -- Python 3.1.2 -- pytest-1.3.1
test object 1: C:\src\myscript

myscript\tests\test_script.py ...

========================== 3 passed in 0.03 seconds ===========================

For now, you can find switchpy in the same bitbucket repo as mpath; if I add any more scripts, I'll probably end up making it a more general repo.

Tuesday, February 16, 2010

Mpath: command-line path manipulation for Windows

I'm a command line geek. Windows' style of installing everything in its own directory makes it easier to clean up after uninstallation, but it makes for very long PATH environment variables. If I put every directory containing command line tools in the system path, it gets too long for Windows to handle. So I usually end up doing "PATH=%PATH%;c:\somethingelse\bin" just before I use it. That also makes for long paths over long runtimes, especially when you use it in batch files (since you end up with PATH=c:\somethingelse\bin;c:\somethingelse\bin;c:\somethingelse\bin;[rest of path] after multiple invocations).

So I wrote mpath. Mpath is a combination batch file and Python script that takes advantage of some quirks of the Windows command shell, to let a child process alter the environment of a parent command shell process (something that you typically can't do in win32, but mpath gets around it by creating a temporary batch file that gets executed in the parent process).
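The temporary-batch-file trick can be sketched roughly as follows. This is only an illustration of the idea under my own assumptions, not mpath's actual code: write_env_batch is a hypothetical name, and the wrapper batch file that would call the generated file is left out.

```python
import os
import tempfile

def write_env_batch(new_path, directory=None):
    """Write a one-line batch file that sets PATH; return the file's name.

    Hypothetical helper: the Python child process cannot change its parent's
    environment, but it can leave behind a batch file for the parent to run.
    """
    fd, name = tempfile.mkstemp(suffix=".bat", dir=directory, text=True)
    with os.fdopen(fd, "w") as f:
        # The quoted form of "set" keeps characters such as & in the value
        # from being interpreted by the command shell.
        f.write('@set "PATH=%s"\n' % new_path)
    return name

batch_file = write_env_batch("c:\\foo;C:\\")
print(batch_file)
```

Because a batch file invoked with "call" runs inside the current command shell rather than in a new process, the "set" line changes that shell's own PATH. The wrapper would then delete the temporary file.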

Syntax:

mpath pathname : prepends pathname to the current command shell's PATH, if it doesn't already exist.
mpath + pathname : appends pathname to the current command shell's PATH, if it doesn't already exist.
mpath - pathname : removes pathname from the current command shell's PATH, if it exists.
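The three operations above could be modeled with a function like the one below. It is a sketch of the behavior described here, not the real mpath source; edit_path and its "prepend"/"append"/"remove" arguments are names I made up for illustration.

```python
def edit_path(path, op, entry, sep=";"):
    """Return a new PATH string after applying op to entry.

    op is "prepend", "append", or "remove". Entries are compared
    case-insensitively, as Windows paths are.
    """
    if op not in ("prepend", "append", "remove"):
        raise ValueError("unknown operation: %r" % (op,))
    parts = [p for p in path.split(sep) if p]
    if op == "remove":
        # Drop every copy, so accidental duplicates get cleaned up too.
        parts = [p for p in parts if p.lower() != entry.lower()]
    elif not any(p.lower() == entry.lower() for p in parts):
        if op == "prepend":
            parts.insert(0, entry)
        else:
            parts.append(entry)
    return sep.join(parts)

print(edit_path("C:\\", "prepend", "c:\\foo"))                # c:\foo;C:\
print(edit_path("C:\\;c:\\foo;c:\\foo", "remove", "C:\\FOO")) # C:\
```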

A quick demo:

C:\> PATH=C:\

C:\> PATH
PATH=C:\

C:\> mpath c:\foo (prepend c:\foo to the path)

C:\> PATH
PATH=c:\foo;C:\

C:\> mpath - C:\FOO (take it off the path--note case insensitivity)

C:\> PATH
PATH=C:\

C:\> mpath + c:\foo (append c:\foo to the path)

C:\> PATH
PATH=C:\;c:\foo

C:\> mpath c:\foo (try to prepend it again--mpath knows it's already there)
c:\foo already in path.

C:\> PATH=%PATH%;c:\foo (silly user should have used mpath...)

C:\> PATH
PATH=C:\;c:\foo;c:\foo (now there are two copies!)

C:\> mpath - C:\FOO (but mpath takes care of that.)

C:\> PATH
PATH=C:\

~~I've tested Mpath with Windows XP running Python 2.5 and 2.6. I know it doesn't work on 3.x; I plan on fixing that at some point when I need it.~~

Update: mpath is now tested on 2.5, 2.6, 2.7, and 3.1.

Monday, December 07, 2009

Five Pycon 2010 Talks I Need to See

Following the example of Catherine Devlin and Carl Trachte, I thought I'd put together a list of the five Pycon talks I need to see in 2010. But I couldn't--I struggled to get below a dozen. So here are the top five I need to see, plus the ones I'll probably kick myself for not seeing because they're undoubtedly going to be scheduled in the same slots as the top five:

1. Import this, that, and the other thing: custom importers (Brett Cannon)
This is an easy choice, because I'm about to be implementing one of these for work. Would have been nicer if Pycon 2010 had been scheduled for September 2009, but I'll take what I can get.

2. Understanding the Python GIL (David Beazley)
Another easy choice. After reading lots of code and debugging thread issues in our embedded Python interpreter at work, I think I have a decent grasp of the GIL implementation. Given David's mindbending generators tutorial last year and his GIL presentation from ChiPy, I expect this talk to be rich in things I will be disturbed to have learned.

3. Powerful Pythonic Patterns (Alex Martelli)
Alex's talk last year, Abstractions as Leverage, was curiously satisfying. He didn't present any facts I hadn't already heard or read, but his presentation made some new connections for me (in a "My God, it's full of stars!" way).

4. Threading is Not a Model (Joe Gregorio)
In the last few years, I've begun to see pervasive threading as a placebo more than a solution. To paraphrase JWZ, some people, when confronted with a problem, think, "I know, I'll spin up a new thread." Now they have two problems. In reality, they've usually created an unknown number of problems, bounded only at the lower end by the number two. I'm really interested in seeing what Joe brings to the discussion beyond the usual "threads, select(), or fork()" question.

5. Turtles All The Way Down: Demystifying Deferreds, Decorators, and Declarations (Glyph Lefkowitz)
I have a long history of utter contempt for the practice of using syntactic sugar to "re-define the language in order to provide a more concise, natural style" for a given purpose. Glyph says he "will try to convince you that all of this wonderful magic isn't all that weird". Sounds like a challenge. If you're not continually questioning your own biases, you're heading for a mental rut, so I'm going to try to attend this with an open mind (and probably leave with a thoroughly-bitten tongue).

These are the ones I will move heaven, earth, and lunch plans to see. The others I really want to attend are:

How Are Large Applications Embedding Python? (Peter Shinners). Totally relevant for work, but probably more elementary than I'd want.
What Every Developer Should Know About Database Scalability (Jonathan Ellis). Totally irrelevant for my current work, but I've had to work in this area in the past, so it's somewhat interesting, and I'm curious about what's changed lately.
Optimizations and Micro-Optimizations in CPython (Larry Hastings). Pure geeky personal interest.
New *and* Improved: Coming changes to unittest, the standard library test framework (Michael Foord). I'm not quite a test-driven development zealot, but I'm about as close as you can get without applying for membership.
Python Metaprogramming (Nicolas Lara). More pure geeky goodness.
Eventlet: Asynchronous I/O with a Synchronous Interface (Donovan Preston). I can't quite decide whether this is applicable to work or not, and there's only one way to find out.
Seattle: A Python-based Platform for Easy Development and Deployment of Networked Systems and Applications (Ivan Beschastnikh). I was quite disappointed by last year's sandboxing talk (the description didn't really let on that it was all about PyPy), so I'm hoping I can pick up more from this one.
Tests and Testability (Ned Batchelder). Probably more elementary-level than I'd like, but might have some good discussion.
On the Subject of Source Code (Ian Bicking). Another blue-sky talk by Ian? Yes, please.
Python's Dusty Corners (Jack Diederich). I have a feeling this will be like Doug Hellmann's PyModule of The Week: 80% of it is "yeah, yeah, I knew that," and 20% is "oh, wow, how did I not know that?"

I wasn't terribly impressed by the tutorial list (other than the compiled Python one), so I'll probably pass on them, but the talks look even better than last year. See you in Atlanta!

Tuesday, June 03, 2008

Fun with itertools

Sometimes it's hard to shake old habits, especially when you've burned them into your brain as the "standard" way to do things. For example, I've been doing network programming with C and C++ for a very long time. One of the standard pieces of code I've written again and again is the "connect with backoff" pattern.

If a program needs a continuous network connection, and that connection is lost, it should try to reconnect. On the one hand, you want to reconnect as quickly as possible; on the other hand, you don't want to keep retrying (and failing) in a tight loop. So you use a "backoff" timer: after each attempt, you wait longer (up to a maximum limit).

As a C programmer, I would implement an algorithm that resembles this Python-like pseudocode:


import time

# After the first failure wait half a second before retrying;
# double this each time up to eight seconds.
backoff_times = [.5, 1, 2, 4, 8]
cur_backoff = None

while 1:
   try:
       # Try to connect
       connect()
   except ConnectionError:
       # Failed; update the backoff counter
       if cur_backoff is None:
           cur_backoff = 0
       else:
           cur_backoff = min(len(backoff_times)-1, cur_backoff+1)
       # Wait to retry
       time.sleep(backoff_times[cur_backoff])
   else:
       # Success; reset the backoff timer
       cur_backoff = None

But in Python the code to manage the current backoff timer looks out of place.

In a high level language, when the ratio of "code that says what I want" to "code that tells the language how to do what I want" gets too low, you're doing it wrong. It means that you're spending too many mental cycles on the "how," and not enough on the "what".

In this case, Python gives me a better way to tell it just "what" I want it to do: use an iterator.


import itertools
import time

def iter_pegged(seq):
   """Return an iterator that walks the sequence, and then 'pegs' on the last item."""
   return itertools.chain(seq, itertools.repeat(seq[-1]))

backoff_times = [.5, 1, 2, 4, 8]
cur_backoff = iter_pegged(backoff_times)

while 1:
   try:
       # Try to connect
       connect()
   except ConnectionError:
       # Wait to retry
       time.sleep(next(cur_backoff))
   else:
       # Success; reset the backoff timer
       cur_backoff = iter_pegged(backoff_times)

Other than the definition of iter_pegged, each line of code says only what it wants to do, not how it wants to do it.

And that's what coding in a high level language is all about, no?
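For a quick check of what iter_pegged actually yields, here is a small sketch that takes the first eight delays from the backoff schedule used above. It redefines iter_pegged so it runs on its own; using itertools.islice to cap the otherwise endless iterator is my choice for the demonstration.

```python
import itertools

def iter_pegged(seq):
    """Return an iterator that walks the sequence, then 'pegs' on the last item."""
    return itertools.chain(seq, itertools.repeat(seq[-1]))

# The iterator never ends, so slice off just the first eight delays.
delays = list(itertools.islice(iter_pegged([.5, 1, 2, 4, 8]), 8))
print(delays)   # prints: [0.5, 1, 2, 4, 8, 8, 8, 8]
```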