October | 2010 | A Curious Programmer

Archive for October, 2010

What Is Going On With Ironman Perl?

Posted in Perl, tagged broken feed on October 28, 2010| 4 Comments »

Herbert Breunung’s The Pearl Metaphor has been at the top for days. Did he manage to break it just by writing a post with a future date?


Why Micro-Benchmark?

Posted in Perl, tagged benchmarking, databases on October 25, 2010| 2 Comments »

I have had some feedback along the lines of why write a micro-benchmark such as Tokyo Cabinet vs Berkeley DB. Everyone knows that:

"premature optimization is the root of all evil"

and

“The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet.”

That kinda misses the point. Micro-benchmarking has nothing to do with optimization. Its purpose is to avoid premature pessimization caused by choosing the wrong technology.

I have a project to replace an existing system where the backing store is by far the biggest bottleneck. The new system needs to be significantly faster than the existing system and I have a lot of flexibility in choosing the backing store.

The two alternatives that immediately come to mind are:

1. create a nice abstraction layer that can easily switch between Perl DBI, Redis, Tokyo Cabinet and a bunch of other alternatives… or
2. micro-benchmark them with something similar to my read/write data profile.

Time pressure unfortunately compels me to choose #2. Maybe it’s time to take a second look at KiokuDB, for inspiration at least, even if I can’t use it directly.


Perl And Type Checking

Posted in Perl, tagged rant, typeful programming, types on October 20, 2010| 4 Comments »

(aka which language am I using again?)

Dave Rolsky has a new post on Perl 5 overloading. It’s fairly informative, but it contains this little gem (emphasis mine):

If you don’t care about defensive programming, then Perl 5’s overloading is perfect, and you can stop reading now. Also, please let me know so I can avoid working on code with you, thanks.

Defensive programming, for the purposes of this entry, can be defined as "checking sub/method arguments for sanity".

Blanket statements like this really get my gripe up. Let’s have a look at defensive argument checking in Perl taken to its illogical conclusion.

Defensive Primitive Type Checking

use strict;
use warnings;

use Carp;
use Scalar::Util 'looks_like_number';

sub can_legally_drink
{
    my $age = shift;
    croak "$age is not a number" unless looks_like_number($age);
    return $age >= 18;
}

print can_legally_drink('x'), "\n";

And fantastic news! can_legally_drink correctly detected that my argument isn’t a number.

x is not a number at age.pl line 10
        main::can_legally_drink('x') called at age.pl line 14

But hang on a minute. Not all integers are ages. Surely we want to check if a real age type was passed in.
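To make that concrete, here is a small sketch of what looks_like_number lets through. The sample values are my own illustration, not from the original script. Negative numbers and exponent notation both pass, so the check says very little about whether the value is a plausible age.

```perl
use strict;
use warnings;
use Scalar::Util 'looks_like_number';

# Illustrative inputs (assumed, not from the post): some pass the numeric
# check, some fail, and passing says nothing about being a sensible age.
my @candidates = (18, '18', '-5', '1e3', 'x', '0x10');

# Record the verdict for each candidate (18 and '18' share one hash key).
my %is_numeric = map { $_ => (looks_like_number($_) ? 1 : 0) } @candidates;

printf "%-6s %s\n", $_, ($is_numeric{$_} ? 'number' : 'not a number')
    for @candidates;
```

A value like -5 sails through, which is the gap the next section tries to close with a dedicated type.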

Checking For A ‘Real’ Type

My stripped down defensive age type might look something like this.

package age;

use strict;
use warnings;

use Carp;
use Scalar::Util qw(blessed looks_like_number);

sub isa_age
{
    my $arg = shift;
    # && binds tighter than return; the low-precedence "and" would not
    return blessed($arg) && $arg->isa('age');
}

sub new {
    my ($class, $years) = @_;
    croak "$years is not a number" unless looks_like_number($years);
    bless { years => $years }, $class;
}

sub years {
    return $_[0]->{'years'}
}

sub less_than {
    my ($self, $other) = @_;
    croak "$other is not an age" unless isa_age($other);
    return $self->years() < $other->years();
}
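One Perl detail matters for predicates like isa_age: the low-precedence `and` binds more loosely than `return`, so `return A and B` returns A alone, whereas `return A && B` returns the whole expression. The sketch below shows the difference; loose_check and tight_check are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub loose_check {
    my $arg = shift;
    no warnings;               # newer perls may warn about this exact pattern
    return ref($arg) and 0;    # parsed as (return ref($arg)) and 0
}

sub tight_check {
    my $arg = shift;
    return ref($arg) && 0;     # the whole && expression is returned
}

my $loose = loose_check([]);   # the "and 0" part is never evaluated
my $tight = tight_check([]);   # the && expression is evaluated in full
print "loose: $loose, tight: $tight\n";
```

With an array reference as input, the loose version hands back the result of ref() and ignores everything after `and`.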

And then my calling code can look like this:

package main;

use Carp;

sub can_legally_drink
{
    my $age = shift;
    # isa_age copes with plain scalars; calling ->isa on 18 dies with a less helpful error
    croak "$age is not an age" unless age::isa_age($age);
    return ! $age->less_than(age->new(18));
}

print can_legally_drink(age->new(18)), "\n";
print can_legally_drink(18), "\n";

And woohoo, the second line throws an error as I wanted.

Actually, I don’t write Perl like this. Dave, you probably want to avoid working on code with me, thanks.

Moose

To be fair, Rolsky is talking his own book. Moose has a bunch of stuff that handles all this type checking malarkey nicely and cleanly. If you’re building something big in Perl, you should take a look.

But if you really care about types (I mean defensive programming) that much, you could use a statically typed language instead, and then you even get fast code thrown in for free.


Tokyo Cabinet vs Berkeley DB

Posted in Perl, tagged benchmarking, berkeley db, tokyo cabinet on October 14, 2010| 4 Comments »

In response to my Berkeley DB benchmarking post, Pedro Melo points out that Tokyo Cabinet is faster and that JSON::XS is faster than Storable.

I couldn’t find an up-to-date Ubuntu package that included the TC Perl libraries, so I had to build everything from source. It was pretty straightforward though.

First we need to get the database handle.

use TokyoCabinet;

my $tc_file = "$ENV{HOME}/test.tc";
unlink $tc_file;
my $hdb = TokyoCabinet::HDB->new();

if(!$hdb->open($tc_file, $hdb->OWRITER | $hdb->OCREAT)){
    my $ecode = $hdb->ecode();
    printf STDERR ("open error: %s\n", $hdb->errmsg($ecode));
}

Presumably putasync is the fastest database put method.

my $ORDER_ID = 0;

sub store_record_tc
{
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json //= 0;
    $no_sync //= 0;
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";
    $db->putasync($key, $json ? encode_json($ref_record)
                              : Storable::freeze($ref_record));
}

I needed to amend store_record so that it can compare JSON and Storable too.

sub store_record
{
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json //= 0;
    $no_sync //= 0;
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";
    $db->db_put($key, $json ? encode_json($ref_record)
                            : Storable::freeze($ref_record));
    $db->db_sync() unless $no_sync;
}
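Both branches hand the database an opaque byte string, so whatever reads the value back has to use the matching decoder. Here is a minimal round-trip sketch using core modules only. JSON::PP stands in for the JSON encoder (an assumption on my part; it offers the same encode_json/decode_json interface), and the record fields are hypothetical.

```perl
use strict;
use warnings;
use Storable ();
use JSON::PP ();

# Hypothetical record; the real field names are not shown in the post.
my $record = { order_id => 1, symbol => 'ABC', qty => 100 };

# Serialize the same record both ways.
my $frozen = Storable::freeze($record);
my $json   = JSON::PP::encode_json($record);

# Each format needs its own decoder to get the structure back.
my $from_frozen = Storable::thaw($frozen);
my $from_json   = JSON::PP::decode_json($json);

printf "storable: %d bytes, json: %d bytes\n", length($frozen), length($json);
```

The two encodings also differ in size, which may matter for the backing store as much as raw encoding speed.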

The benchmarking code looks like this.

Benchmark::cmpthese(-1, {
    'json-only-50/50' => sub { json_only($db, $rec_50_50) },
    'freeze-only-50/50' => sub { freeze_only($db, $rec_50_50) },

    'freeze-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1) },
    'freeze-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1) },

    'json-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1, 1) },
    'json-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1, 1) },
});

And the results are as follows:

                           Rate freeze-no-sync-50/50 json-no-sync-50/50 freeze-no-sync-50/50-tc json-no-sync-50/50-tc freeze-only-50/50 json-only-50/50
freeze-no-sync-50/50     7791/s                   --                -9%                    -39%                  -47%              -59%            -81%
json-no-sync-50/50       8605/s                  10%                 --                    -33%                  -41%              -55%            -79%
freeze-no-sync-50/50-tc 12800/s                  64%                49%                      --                  -13%              -33%            -69%
json-no-sync-50/50-tc   14698/s                  89%                71%                     15%                    --              -23%            -64%
freeze-only-50/50       19166/s                 146%               123%                     50%                   30%                --            -54%
json-only-50/50         41353/s                 431%               381%                    223%                  181%              116%              --

Pedro was right. Tokyo Cabinet is significantly faster than Berkeley DB (roughly 64% faster with Storable and 71% faster with JSON), at least in this simple benchmark.

Edit: json and no_sync parameter switch has been fixed.
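The json_only and freeze_only baselines in the benchmark are not shown above. Presumably they serialize the record without touching a database, which is why they top the table. The sketch below is my guess at their shape, not the original code: the bodies, the sample record, and the use of JSON::PP (pure Perl, so the relative numbers will differ from JSON::XS) are all assumptions.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use Storable ();
use JSON::PP ();

# Assumed shape: same signature as in the cmpthese call, but the database
# handle is ignored so that only serialization cost is measured.
sub json_only   { my (undef, $rec) = @_; return JSON::PP::encode_json($rec) }
sub freeze_only { my (undef, $rec) = @_; return Storable::freeze($rec) }

# Hypothetical stand-in for $rec_50_50.
my $rec = { order_id => 0, side => 'buy', qty => 100, price => 12.5 };

# A fixed iteration count keeps the run short; 'none' suppresses the
# per-test output so only the comparison table is printed.
my $results = timethese(2000, {
    'json-only'   => sub { json_only(undef, $rec) },
    'freeze-only' => sub { freeze_only(undef, $rec) },
}, 'none');

cmpthese($results);
```

Measuring the serializers on their own like this gives an upper bound for each store_record variant, which makes it easier to see how much of the time goes to the database.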


Finding Useful Posts

Posted in Blogging, Emacs, Perl on October 7, 2010| 2 Comments »

One of the weaknesses of many blogs, including my own, is the difficulty of finding old, useful articles. There are a few ways to find articles written previously, such as the Archives, the categories and the tags. But to be honest, a lot of the stuff I write is only relevant (at best) at the time of publishing. And even I have difficulty finding my useful posts again.

There are several possible solutions, e.g. I could tag pages within delicious as curiousprogrammer/useful. For the moment, I’ve decided to keep a blog highlights page, and list the posts which are useful for me. Later on I might try a more comprehensive index.


Nested Maps in Perl

Posted in Perl, tagged functional programming, map, nested loops on October 4, 2010| 11 Comments »

I’m not really a fan of nested loops, so when I need to create a list of combinations based on two or three other lists, I really miss list comprehensions¹ such as those in Python.

l = [(x,y) for x in range(5) for y in range(5)]

Very elegant.

If I were using Lisp, I might use a nested mapcar.

(mapcar (lambda (out) (mapcar (lambda (in) (cons in out))
                              '(a b c d e)))
          '(1 2 3 4 5))

Perl map uses $_ so you don’t need to explicitly specify the name of the lambda parameter. How can I differentiate between the outer $_ and the inner $_?

my $outer;
my @x = map { $outer = $_;
              (map { $outer . $_ } qw(a b c d e)) }
            (1..5);
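A variation on the same idea, sketched here as one possible approach: declare the lexical inside the outer block so it does not leak into the enclosing scope, and build array-ref pairs to mirror the tuples in the Python comprehension. The variable names are my own.

```perl
use strict;
use warnings;

# Copy the outer $_ into a lexical scoped to the outer block, then build
# [x, y] pairs, mirroring the Python list comprehension over range(5).
my @pairs = map {
    my $x = $_;
    map { [ $x, $_ ] } 0 .. 4;
} 0 .. 4;

printf "%d pairs, first (%d,%d), last (%d,%d)\n",
    scalar(@pairs), @{ $pairs[0] }, @{ $pairs[-1] };
```

Because $x is declared with my inside the block, each pass of the outer map gets a fresh copy, and nothing is left lying around afterwards.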

Note: if you are trying to do something as simple as this, take a look at Set::CrossProduct instead.

1. It’s available on the CPAN of course

