Archive for October, 2010
What Is Going On With Ironman Perl?
Posted in Perl, tagged broken feed on October 28, 2010| 4 Comments »
Herbert Breunung’s The Pearl Metaphor has been at the top for days. Did he manage to break it just by writing a post with a future date?
Why Micro-Benchmark?
Posted in Perl, tagged benchmarking, databases on October 25, 2010| 2 Comments »
I have had some feedback along the lines of why write a micro-benchmark such as Tokyo Cabinet vs Berkeley DB. Everyone knows that:
and
That kinda misses the point. Micro-benchmarking is nothing to do with optimization. Its purpose is to avoid premature pessimization due to choosing the wrong technology.
I have a project to replace an existing system where the backing store is by far the biggest bottleneck. The new system needs to be significantly faster than the existing system and I have a lot of flexibility in choosing the backing store.
The two alternatives that immediately come to mind are:
- create a nice abstraction layer that can easily switch between Perl DBI, Redis, Tokyo Cabinet and a bunch of other alternatives… or
- micro-benchmark them with something similar to my read/write data profile.
Time pressure unfortunately compels me to choose #2. Maybe it’s time to take a second look at KiokuDB, for inspiration at least, even if I can’t use it directly.
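To make option #2 concrete, here is a minimal sketch of a profile-driven micro-benchmark. Everything in it is an assumption for illustration: the two "backends" are plain in-memory closures standing in for real stores, and make_hash_backend, make_array_backend and run_profile are hypothetical names. The point is only that the same read/write mix gets replayed against each candidate.

```perl
use strict;
use warnings;
use Benchmark ();

# Two hypothetical in-memory "backends", each a pair of put/get closures.
sub make_hash_backend {
    my %h;
    return { put => sub { $h{ $_[0] } = $_[1] }, get => sub { $h{ $_[0] } } };
}

sub make_array_backend {    # keys must be small non-negative integers
    my @a;
    return { put => sub { $a[ $_[0] ] = $_[1] }, get => sub { $a[ $_[0] ] } };
}

# Replay $ops operations; $write_pct percent of them are writes, the rest reads.
# Returns the number of writes performed.
sub run_profile {
    my ($backend, $ops, $write_pct) = @_;
    my $writes = 0;
    for my $i (1 .. $ops) {
        my $key = $i % 1000;
        if ($i % 100 < $write_pct) {
            $backend->{put}->($key, "value $i");
            $writes++;
        }
        else {
            $backend->{get}->($key);
        }
    }
    return $writes;
}

# 20 runs of a 30% write / 70% read profile against each backend.
Benchmark::cmpthese(20, {
    'hash-30w'  => sub { run_profile(make_hash_backend(),  10_000, 30) },
    'array-30w' => sub { run_profile(make_array_backend(), 10_000, 30) },
});
```

Swapping in a real store should only mean replacing the two closures, which is about as much abstraction layer as time pressure allows.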
Perl And Type Checking
Posted in Perl, tagged rant, typeful programming, types on October 20, 2010| 4 Comments »
(also known as: which language am I using again?)
Dave Rolsky has a new post on Perl 5 overloading. It’s fairly informative, but it contains this little gem (emphasis mine):
Defensive programming, for the purposes of this entry, can be defined as "checking sub/method arguments for sanity".
Blanket statements like this really get my gripe up. Let’s have a look at defensive argument checking in Perl taken to its illogical conclusion.
Defensive Primitive Type Checking
use strict;
use warnings;

use Carp;
use Scalar::Util 'looks_like_number';

sub can_legally_drink
{
    my $age = shift;
    croak "$age is not a number" unless looks_like_number($age);
    return $age >= 18;
}

print can_legally_drink('x'), "\n";
And fantastic news! can_legally_drink correctly detected that my argument isn’t a number.
x is not a number at age.pl line 10
main::can_legally_drink('x') called at age.pl line 14
But hang on a minute. Not all integers are ages. Surely we want to check if a real age type was passed in.
Checking For A ‘Real’ Type
My stripped down defensive age type might look something like this.
package age;

use Carp;
use Scalar::Util qw(blessed looks_like_number);

# True only for blessed objects that are (or inherit from) age.
sub isa_age
{
    my $arg = shift;
    # && rather than 'and': 'return X and Y' parses as '(return X) and Y'
    return ref($arg) && blessed($arg) && $arg->isa('age');
}

sub new
{
    my ($class, $years) = @_;
    croak "$years is not a number" unless looks_like_number($years);
    bless { years => $years }, $class;
}

sub years { return $_[0]->{'years'} }

sub less_than
{
    my ($self, $other) = @_;
    croak "$other is not an age" unless isa_age($other);
    return $self->years() < $other->years();
}
And then my calling code can look like this:
package main;

use Carp;

sub can_legally_drink
{
    my $age = shift;
    croak "$age is not an age" unless $age->isa('age');
    return ! $age->less_than(age->new(18));
}

print can_legally_drink(age->new(18)), "\n";
print can_legally_drink(18), "\n";
And woohoo, the second line throws an error as I wanted.
Actually, I don’t write Perl like this. Dave, you probably want to avoid working on code with me, thanks.
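For anyone who wants to watch both calls without the script dying on the second one, the check can be wrapped in eval. This is a minimal self-contained sketch, not the code above: the Age package is a cut-down stand-in, and is_age is a hypothetical helper that tests with blessed before calling isa.

```perl
use strict;
use warnings;

package Age;    # hypothetical cut-down stand-in for the age class
use Carp;
use Scalar::Util qw(looks_like_number);

sub new {
    my ($class, $years) = @_;
    croak "$years is not a number" unless looks_like_number($years);
    return bless { years => $years }, $class;
}

sub years { return $_[0]->{years} }

package main;
use Carp;
use Scalar::Util qw(blessed);

# True (1) only for blessed objects that are, or inherit from, Age.
sub is_age {
    my $arg = shift;
    return blessed($arg) && $arg->isa('Age') ? 1 : 0;
}

sub can_legally_drink {
    my $age = shift;
    croak "$age is not an age" unless is_age($age);
    return $age->years() >= 18 ? 1 : 0;
}

# Try a real Age object and a bare number, trapping the exception.
for my $arg (Age->new(21), 18) {
    my $result = eval { can_legally_drink($arg) };
    if (defined $result) { print "ok: $result\n" }
    else                 { print "error: $@" }
}
```

The first iteration prints a result, the second prints the trapped error message. Whether this much ceremony is worth it for one comparison is exactly the question.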
Moose
To be fair, Rolsky is talking his own book. Moose has a bunch of stuff that handles all this type-checking malarkey nicely and cleanly. If you’re building something big in Perl, you should take a look.
But if you really care about types, I mean defensive programming, that much, you could use a statically typed language instead, and then you even get fast code thrown in for free.
Tokyo Cabinet vs Berkeley DB
Posted in Perl, tagged benchmarking, berkeley db, tokyo cabinet on October 14, 2010| 4 Comments »
In response to my Berkeley DB benchmarking post, Pedro Melo points out that Tokyo Cabinet is faster and that JSON::XS is faster than Storable.
I couldn’t find an up-to-date Ubuntu package that included the TC Perl libraries, so I had to build everything from source. It was pretty straightforward though.
First we need to get the database handle.
my $tc_file = "$ENV{HOME}/test.tc";
unlink $tc_file;

my $hdb = TokyoCabinet::HDB->new();

if (!$hdb->open($tc_file, $hdb->OWRITER | $hdb->OCREAT)) {
    my $ecode = $hdb->ecode();
    printf STDERR ("open error: %s\n", $hdb->errmsg($ecode));
}
Presumably putasync is the fastest database put method.
my $ORDER_ID = 0;

sub store_record_tc
{
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json //= 0;
    $no_sync //= 0;
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";
    $db->putasync($key, $json ? encode_json($ref_record)
                              : Storable::freeze($ref_record));
}
I needed to amend store_record to compare JSON and Storable too.
sub store_record
{
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json //= 0;
    $no_sync //= 0;
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";
    $db->db_put($key, $json ? encode_json($ref_record)
                            : Storable::freeze($ref_record));
    $db->db_sync() unless $no_sync;
}
The benchmarking code looks like this.
Benchmark::cmpthese(-1, {
    'json-only-50/50'         => sub { json_only($db, $rec_50_50) },
    'freeze-only-50/50'       => sub { freeze_only($db, $rec_50_50) },
    'freeze-no-sync-50/50'    => sub { store_record($db, $rec_50_50, 1) },
    'freeze-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1) },
    'json-no-sync-50/50'      => sub { store_record($db, $rec_50_50, 1, 1) },
    'json-no-sync-50/50-tc'   => sub { store_record_tc($hdb, $rec_50_50, 1, 1) },
});
And the results are as follows:
                           Rate freeze-no-sync-50/50 json-no-sync-50/50 freeze-no-sync-50/50-tc json-no-sync-50/50-tc freeze-only-50/50 json-only-50/50
freeze-no-sync-50/50     7791/s                   --                -9%                    -39%                  -47%              -59%            -81%
json-no-sync-50/50       8605/s                  10%                 --                    -33%                  -41%              -55%            -79%
freeze-no-sync-50/50-tc 12800/s                  64%                49%                      --                  -13%              -33%            -69%
json-no-sync-50/50-tc   14698/s                  89%                71%                     15%                    --              -23%            -64%
freeze-only-50/50       19166/s                 146%               123%                     50%                   30%                --            -54%
json-only-50/50         41353/s                 431%               381%                    223%                  181%              116%              --
Pedro was right. Tokyo Cabinet is significantly faster than Berkeley DB, at least in this simple benchmark.
Edit: json and no_sync parameter switch has been fixed.
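The harness itself can be tried without either database installed. The following is a minimal sketch under stated assumptions: a plain Perl hash stands in for the database handle, the record is invented (the post's $rec_50_50 is not shown), store_record_mem is a hypothetical name, and the core module JSON::PP replaces JSON::XS. Because JSON::PP is pure Perl, the relative rates it produces will not resemble the table above.

```perl
use strict;
use warnings;
use Benchmark ();
use JSON::PP ();
use Storable ();

# Invented sample record, purely for illustration.
my $rec = { symbol => 'ABC', qty => 100, price => 12.5, side => 'buy' };

my %store;                  # in-memory stand-in for the database handle
my $order_id = 0;
my $json     = JSON::PP->new->canonical;

# Serialise the record as JSON or with Storable, then "put" it into the hash.
# Returns the key it was stored under.
sub store_record_mem {
    my ($db, $ref_record, $use_json) = @_;
    $ref_record->{order_id} = ++$order_id;
    my $key = "/order/$ref_record->{order_id}";
    $db->{$key} = $use_json ? $json->encode($ref_record)
                            : Storable::freeze($ref_record);
    return $key;
}

# A fixed iteration count keeps the run short; the post used -1 instead.
Benchmark::cmpthese(20_000, {
    'json-mem'   => sub { store_record_mem(\%store, $rec, 1) },
    'freeze-mem' => sub { store_record_mem(\%store, $rec, 0) },
});
```

Replacing the hash assignment with a real put call turns this back into the benchmark proper.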
Finding Useful Posts
Posted in Blogging, Emacs, Perl on October 7, 2010| 2 Comments »
One of the weaknesses of many blogs, including my own, is the difficulty of finding old, useful articles. There are a few ways to find articles written previously, such as the Archives, the categories and the tags. But to be honest, a lot of the stuff I write is only relevant (at best) at the time of publishing. And even I have difficulty finding my useful posts again.
There are several possible solutions, e.g. I could tag pages within delicious as curiousprogrammer/useful. For the moment, I’ve decided to keep a blog highlights page, and list the posts which are useful for me. Later on I might try a more comprehensive index.
Nested Maps in Perl
Posted in Perl, tagged functional programming, map, nested loops on October 4, 2010| 11 Comments »
I’m not really a fan of nested loops, so when I need to create a list of combinations based on two or three other lists, I really miss list comprehensions such as those in Python.
l = [(x,y) for x in range(5) for y in range(5)]
Very elegant.
If I was using Lisp, I might use a nested mapcar.
(mapcar (lambda (out)
          (mapcar (lambda (in) (cons in out))
                  '(a b c d e)))
        '(1 2 3 4 5))
Perl map uses $_ so you don’t need to explicitly specify the name of the lambda parameter. How can I differentiate between the outer $_ and the inner $_?
my $outer;
my @x = map {
    $outer = $_;
    (map { $outer . $_ } qw(a b c d e))
} (1..5);
Note: if you are trying to do something as simple as this, take a look at Set::CrossProduct instead.
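A variation on the same trick, for when tuples are wanted rather than concatenated strings, as in the Python example. This is a small sketch with made-up input lists; the only change is that the outer $_ is copied into a lexical declared inside the outer block, so nothing leaks into the enclosing scope.

```perl
use strict;
use warnings;

my @xs = (1 .. 3);
my @ys = qw(a b);

# Copy the outer $_ into a lexical so the inner block can still see it,
# then build one [x, y] array ref per combination.
my @pairs = map {
    my $x = $_;
    map { [ $x, $_ ] } @ys;
} @xs;

print join(' ', map { "($_->[0],$_->[1])" } @pairs), "\n";
# prints: (1,a) (1,b) (2,a) (2,b) (3,a) (3,b)
```

A third list just means one more level of nesting and one more lexical.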