Sikhote-Alin

Bridging the gap between SQL and NoSQL: A state of the art

May 31, 2011 by Zamith

Here is a state of the art report I wrote on SQL and NoSQL, and on a way to bring them closer. This is actually the theme of my master’s thesis, so you can expect some more posts on this topic in the future.

Hasta. 😉

Artigo-MI-STAR

Posted in !*

Running a Cassandra cluster with only one machine

May 9, 2011 by Zamith

I’ve noticed that the wiki has no guide for running a Cassandra cluster on your own PC, which is handy for small tests.

Therefore, here is how I’ve done it.

First off, you’ll need to create an alias for your network interface:

Mac OS
ifconfig en0 alias 192.168.0.2

Linux
ifconfig eth0:0 192.168.0.2

Here I’ve chosen the en0 (or eth0) interface, but you can use any interface and any IP address you like.

The first file you’ll have to edit is conf/cassandra.yaml:

Change the commit_log, data and saved_caches directories, so they don’t conflict with the ones from the previous “node”
Change the rpc_port (used for Thrift or Avro) to one that is free
Change the listen_address to the IP of your “fake” interface

Next open the conf/cassandra-env.sh file and change the JMX_PORT.

The last file to edit is bin/cassandra.in.sh, where you’ll need to change all the occurrences of $CASSANDRA_HOME to the path of the “node”. For example, if your bin directory is in /XXX/YYY/node2/bin, the path is /XXX/YYY/node2.

You can repeat this to create as many nodes as you want, and then just run them as usual, with bin/cassandra -f
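To keep track of what has to differ between nodes, it can help to write the per-node values down before editing the files. The following is only a sketch: node_settings is a hypothetical helper, and the base directory, the directory names, the 192.168.0.x addresses and the starting ports (9160 for Thrift, 8080 for JMX) are all assumptions to adapt to your own setup. It does not touch any Cassandra file.

```ruby
# Hypothetical helper: derives settings for local node number `index`
# (1-based) so that no two nodes share a directory, address or port.
# All paths, addresses and starting ports are assumptions.
def node_settings(index, base_dir, first_rpc_port = 9160, first_jmx_port = 8080)
  home = File.join(base_dir, "node#{index}")
  {
    "home"           => home,                          # replaces $CASSANDRA_HOME
    "listen_address" => "192.168.0.#{index}",          # the aliased interface
    "rpc_port"       => first_rpc_port + index - 1,    # conf/cassandra.yaml
    "jmx_port"       => first_jmx_port + index - 1,    # conf/cassandra-env.sh
    "commit_log"     => File.join(home, "commitlog"),
    "data"           => File.join(home, "data"),
    "saved_caches"   => File.join(home, "saved_caches")
  }
end

nodes = (1..3).map { |i| node_settings(i, "/tmp/cassandra") }
nodes.each { |settings| puts settings.inspect }
```

Each printed hash lists the values to put in one node's cassandra.yaml, cassandra-env.sh and cassandra.in.sh.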

Posted in !*

Inserting data with Thrift and Cassandra 0.7

May 1, 2011 by Zamith

A lot has changed from Cassandra 0.6 to 0.7, and sometimes it is hard to find examples of how things work. I’ll be posting how-tos on some of the most usual operations you might want to perform when using Cassandra, written in Java.

First off, you have to establish a connection to the server:

import org.apache.thrift.transport.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.cassandra.thrift.*;

TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
transport.open();

Here I’m using localhost and the default port for Cassandra, but this can be configured.

One difference from previous versions of Cassandra is that the connection can be bound to a keyspace, which is set like this:

client.set_keyspace("Keyspace");

With the connection established, all you need now is the data to insert. This data is passed to the server in the form of mutations (org.apache.cassandra.thrift.Mutation).

In this example, I’ll be adding a column to a row in a column family in the predefined keyspace.

List<Mutation> insertion_list = new ArrayList<Mutation>();

// Column(name, value, timestamp); getBytes("UTF8") can throw UnsupportedEncodingException
Column col_to_add = new Column(ByteBuffer.wrap("name".getBytes("UTF8")), ByteBuffer.wrap("value".getBytes("UTF8")), System.currentTimeMillis());

// Wrap the column in a mutation
Mutation mut = new Mutation();
mut.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(col_to_add));
insertion_list.add(mut);

// Column family name -> mutations for that column family
Map<String,List<Mutation>> columnFamilyValues = new HashMap<String,List<Mutation>>();
columnFamilyValues.put("columnFamily",insertion_list);

// Row key -> (column family name -> mutations)
Map<ByteBuffer,Map<String,List<Mutation>>> rowDefinition = new HashMap<ByteBuffer,Map<String,List<Mutation>>>();
rowDefinition.put(ByteBuffer.wrap("key".getBytes("UTF8")), columnFamilyValues);

client.batch_mutate(rowDefinition,ConsistencyLevel.ONE);

The code is pretty much self-explanatory, apart from some values that can be changed at will, such as the encoding of the strings (I’ve used UTF8) and the consistency level of the insertion (I’ve used ONE).
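The part that tends to confuse is the nesting of the argument to batch_mutate: row key, then column family name, then a list of mutations. As an illustration only, here is the same shape built from plain Ruby hashes. Nothing in it is part of the Thrift API; the hash keys merely mirror the names used in the Java code, and the timestamp in milliseconds is an assumption carried over from it.

```ruby
# Illustration only: plain hashes standing in for the Thrift types.
column   = { "name" => "name", "value" => "value",
             "timestamp" => (Time.now.to_f * 1000).to_i }
mutation = { "column_or_supercolumn" => { "column" => column } }

# Column family name -> list of mutations
column_family_values = { "columnFamily" => [mutation] }
# Row key -> (column family name -> list of mutations)
row_definition = { "key" => column_family_values }

# Walk the structure level by level: row, column family, mutation.
row_definition.each do |row_key, families|
  families.each do |family, mutations|
    mutations.each do |m|
      col = m["column_or_supercolumn"]["column"]
      puts "#{row_key} / #{family} / #{col["name"]} = #{col["value"]}"
    end
  end
end
```

Adding a second column to the same row means appending another mutation to the inner list; writing to another row means adding another key to the outer hash.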

As for the consistency levels, you should check out Cassandra’s wiki to better understand their usage.

Closing the connection to the server is as easy as:

transport.close();

Hope you find this useful. Next I’ll give an example of how to get data from the server, as soon as I have some time. 😉

Posted in Computer Science

Opening a new tab in the same directory and then some in Mac OS

February 18, 2011 by Zamith

If you use the Mac OS Terminal, you’ve probably felt the frustration of opening a new tab and having it start in the $HOME path, unlike the Linux terminals, which open it in the path you were in.

Well, I’ve written a script that kind of solves this problem, and adds some extra functionality that I find really helpful.

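#!/bin/bash

# Command to run in the new tab
COMMAND=$1

if [ -z "$1" ]
then
  # Default: cd into the current directory. The single quotes keep paths
  # with spaces intact (a path containing ' would still break).
  COMMAND="cd '$(pwd)'"
fi

/usr/bin/osascript 2>/dev/null <<EOF
activate application "Terminal"
tell application "System Events"
  keystroke "t" using {command down}
end tell
tell application "Terminal"
  activate
  do script "$COMMAND" in window 1
end tell
return
EOF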

First let’s take a look at the AppleScript part (the part between the two EOF markers). AppleScript code is very readable: it opens a new tab in Terminal, and then runs the code in the COMMAND variable in that newly opened tab.

Then, there is that little bit of bash code, which takes the string passed as an argument as the command to run or, if none is provided, changes the directory to the one you were in.

So, you can open a new tab by calling the script or, and this is very handy when writing other scripts, open a new tab and run code in that tab by calling the script with the command string as an argument.

Posted in Computer Science

Parsing strings from the datepicker

February 10, 2011 by Zamith

In my previous post I explained how to create a datepicker with dynamic internationalization. There is one catch though: different languages can represent the same date differently. For example, February 5th can be 2/5/2011 in the USA and 5/2/2011 in Portugal. The Rails default is the first.

This means that you’ll have to take this into account when using the string that you get from the form where you are using the datepicker. You have two options: change the way the dates are displayed, by altering the JavaScript files of all the languages, or parse the string as it gets to the controller.

The latter can be achieved with this piece of code:

todo.due_date = DateTime.strptime(params[:date],"%d/%m/%Y").to_time

Note that I’ve chosen that format for the string, but it can be any format that strptime understands.

There is one other problem though: the user can, maliciously or by mistake, insert an invalid date. We can strengthen our code to prevent this, by catching the exception thrown.

begin 
  todo.due_date =  DateTime.strptime(params[:date],"%d/%m/%Y").to_time
rescue ArgumentError
  flash[:error] = t("flash.invalid_date")
  redirect_to somewhere_in_the_app_path
  return
end

So, if the exception occurs, we set the flash error message (using the translation helper), redirect to the appropriate path, and then return, so that it does not complain of having multiple render or redirect calls.
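If the same parsing is needed in more than one action, the rescue can be moved into a small method that returns nil for bad input, leaving the flash message and redirect to the caller. This is a sketch outside Rails: parse_due_date is a hypothetical name, and it uses Date.strptime rather than DateTime so that the result is a plain date.

```ruby
require "date"

# Hypothetical helper: returns a Date for input in the given format,
# or nil when the string is missing or is not a valid date.
def parse_due_date(raw, format = "%d/%m/%Y")
  Date.strptime(raw.to_s, format)
rescue ArgumentError
  # Raised for malformed strings and for impossible dates such as 31/02
  nil
end

puts parse_due_date("05/02/2011").inspect   # a Date for 5 February 2011
puts parse_due_date("31/02/2011").inspect   # nil
puts parse_due_date(nil).inspect            # nil
```

In the controller, a nil result would then trigger the same flash message and redirect as in the rescue block above.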

Posted in Computer Science

jQuery-UI datepicker dynamic internationalization in Rails

February 10, 2011 by Zamith

The jQuery UI datepicker can be internationalized by choosing one of the languages in the regional array, like this:

$(selector).datepicker($.datepicker.regional['en-GB']);

As is easy to see, this changes the datepicker language to British English. For any language other than the default English to work, we need to include a JavaScript file that defines the strings to be shown.

We can either include all the languages (http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.8/i18n/jquery-ui-i18n.min.js), or just the ones we need, that can be found here.

So far so good. But what if we want to include only the file we need, according to the system’s locale?

It’s pretty simple, and it means users only download the file for the language in which they are viewing the site, not files they are never going to use.

First, you’ll have to create a helper that checks the current locale and includes the file accordingly, so it can be called from the views that use the datepicker.

def include_i18n_calendar_javascript
  content_for :head do
    javascript_include_tag case I18n.locale
      when :en then "jquery.ui.datepicker-en-GB.js"
      when :pt then "jquery.ui.datepicker-pt-BR.js"
      else raise ArgumentError, "Locale error"
    end
  end
end

As you can see, this has a case that chooses the file according to the locale and returns it to the javascript_include_tag helper, which generates the HTML for the inclusion of a JavaScript file. The content_for helper then places that HTML in the header.
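As the number of supported locales grows, the case can be swapped for a lookup table. The sketch below is plain Ruby with no Rails dependency; DATEPICKER_FILES, datepicker_file and the fallback option are hypothetical names, and only the two file names come from the helper above.

```ruby
# Locale => datepicker translation file (the two entries used above)
DATEPICKER_FILES = {
  en: "jquery.ui.datepicker-en-GB.js",
  pt: "jquery.ui.datepicker-pt-BR.js"
}.freeze

# Hypothetical helper: returns the file for a locale, or the fallback
# if one is given, and raises for an unknown locale otherwise.
def datepicker_file(locale, fallback: nil)
  DATEPICKER_FILES.fetch(locale.to_sym) do
    fallback || raise(ArgumentError, "No datepicker file for locale #{locale.inspect}")
  end
end

puts datepicker_file(:pt)
puts datepicker_file("en")
puts datepicker_file(:fr, fallback: DATEPICKER_FILES[:en])
```

Inside the Rails helper, the result of datepicker_file(I18n.locale) would be passed to javascript_include_tag in place of the case expression.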

Now you only have to call the helper in the view and add some javascript.

var counter = 0;
var locale = "";
for (var i in $.datepicker.regional) {
  if (counter == 1)
  { locale = i; break; }
  counter++;
}

Because regional is not exactly an array but a hash (or an associative array, in JavaScript terms), we have to iterate over its keys. The first key is the empty string, which holds the default definitions, so the one we want is the second; that is the reason for the break. That key is a string that identifies the definitions of the locale loaded by the included file. In the case of the first example, it would be “en-GB”.

Now, we just initialize the datepicker with this variable:

$.datepicker.setDefaults( $.datepicker.regional[ '' ] ); 
$( ".datepicker" ).datepicker($.datepicker.regional[locale]);

And that’s it. Now your datepickers are internationalized in a dynamic way.

PS: Of course you’ll need a textfield that has the HTML class datepicker (or any other you choose).

Posted in Computer Science

The tragedy of the cunning, ignoble fop, perhaps a dandy.

February 2, 2011 by Dr. E

The loquacious luminary, a spendthrift pederast, turned away, ruddy and taciturn, from his perennial lure. For his fleeting and frugal demise, without raptures, at the hands of a pleated and truly perfidious child, erupted into a fiery, bellicose uproar of a frenzied kind. Roaming the streets, scrutinizing hidden places in search of refuge at the heart of that instant, as ignoble as it was meager, I traded admonitions for insults, inquiring into the fact that ended in his unscathed, phlegmatic strangulation.

Posted in Art, Poetry

Manipulating nested hashes in Ruby

January 5, 2011 by Zamith

Lately I needed to work with some nested hashes in ruby. By nested hashes I mean hashes with hashes (or any other type, actually) in them, something like this:

nested_hash = {"first_key"=>{"second_key"=>"value"},"third_key"=>12}

That was when I found out that there are no built-in methods to do this, and therefore I had to come up with my own.

If you want to get all the values in a nested hash, you can do this:

def get_all_values_nested(nested_hash={})
  @all_values ||= []  # must exist before the first value is added
  nested_hash.each_pair do |k,v|
    case v
      # Integer also covers Fixnum on older Rubies
      when String, Integer then
        @all_values << v
      when Hash then get_all_values_nested(v)
      else raise ArgumentError, "Unhandled type #{v.class}"
    end
  end

  return @all_values
end

Obviously, you will have to run through all the key/value pairs, but this only gives you the keys in the first “layer”, and the values may be actual values or other hashes.

That’s why we need the case statement, in order to differentiate between values and other hashes (or Arrays…). If it is a hash, you just pass it to the function and do a recursive search, which stops when it hits a value, adding this value to the final array. In this example I’ve considered String and Fixnum (Integer in current Ruby versions) to be values; if any other type happens to be present, an exception will be raised.

Note that all_values is an instance variable. It cannot be a local variable, because its contents are shared by all the recursive calls. Another way of doing it would be to pass the variable to the method each time and update it at each return. I find it simpler and prettier like this.
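For comparison, here is a sketch of the alternative mentioned above, where the accumulator is passed along as a parameter instead of living in an instance variable. The method name all_values and the acc parameter are illustrative; the behaviour is otherwise meant to match the method above.

```ruby
# Collects every leaf value of a nested hash into acc.
# The default [] is created anew on each top-level call, so results
# from one call never leak into the next.
def all_values(nested_hash, acc = [])
  nested_hash.each_value do |v|
    case v
    when Hash then all_values(v, acc)          # recurse into inner hashes
    when String, Integer then acc << v         # leaf value
    else raise ArgumentError, "Unhandled type #{v.class}"
    end
  end
  acc
end

nested_hash = {"first_key"=>{"second_key"=>"value"},"third_key"=>12}
puts all_values(nested_hash).inspect
```

One practical difference: calling this version twice returns the same result both times, whereas the instance-variable version keeps appending to @all_values unless it is reset between calls.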

In the previous example you’ll get an array with all of the values, and that’s it. You may, however, want to know the path travelled to get to each value. It is actually not that hard to implement, by changing the code just a little bit.

def get_all_values_nested(nested_hash={})
  @all_values ||= {}  # path => value
  @path ||= []        # keys visited so far
  nested_hash.each_pair do |k,v|
    @path << k
    case v
      # Integer also covers Fixnum on older Rubies
      when String, Integer then
        @all_values.merge!({"#{@path.join(".")}" => "#{v}"})
        @path.pop
      when Hash then get_all_values_nested(v)
      else raise ArgumentError, "Unhandled type #{v.class}"
    end
  end
  @path.pop

  return @all_values
end

There are two main differences in this code. The first is that there is a new instance variable, called path, which is an array with all the keys that had to be “visited” to get to the value. The last key to be visited is the last key in the array, and it is popped each time a value is found, or when all the keys of a certain hash are exhausted.

Imagine you have the hash first presented as an example; the evolution of the path array would be:

["first_key"] , ["first_key","second_key"] – Here, the value is found, and therefore a pop occurs, leaving the array in the previous state:

["first_key"] – first_key does not have any more keys, so another pop happens, and so on:

[] , ["third_key"] , []

The other difference is that a hash is returned instead of an array. Because the code interpolates each value into a string, the number 12 comes back as "12". The hash has the following format (using the previous example):

{"first_key.second_key"=>"value" , "third_key"=>"12"}
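The same flattening can also be written without any instance variables, by passing the current path down as a parameter. This is a sketch rather than a drop-in replacement: flatten_paths is a hypothetical name, it accepts any non-hash object as a value, and it stores values unchanged instead of converting them to strings.

```ruby
# Returns a flat hash of "dotted.path" => value for every leaf.
# prefix holds the keys visited on the way down to this level.
def flatten_paths(nested_hash, prefix = [])
  nested_hash.each_with_object({}) do |(key, value), result|
    path = prefix + [key]
    if value.is_a?(Hash)
      result.merge!(flatten_paths(value, path))   # descend one level
    else
      result[path.join(".")] = value              # leaf: record its path
    end
  end
end

nested_hash = {"first_key"=>{"second_key"=>"value"},"third_key"=>12}
puts flatten_paths(nested_hash).inspect
```

Because the path is rebuilt with prefix + [key] at each level, there is nothing to pop afterwards; the caller's array is never modified.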

So, now you can get the values and know where they came from. The next thing you will probably want to do is change them and update the nested hash.

def set_value_from_path(nested_hash,path_to_value,newValue)
  path_array = path_to_value.split "."
  last_key = path_array.pop
  hash = create_hash_with_path(path_array,{last_key=>newValue})
  self.merge!(nested_hash,hash)
end

This method receives the path to the value in the format used in the get, and the new value to be inserted. It first transforms the string of the path into an array as the ones we’ve used before, then it makes the array and the value into a hash, using an auxiliary method, create_hash_with_path. It is a very simple method that gets a path array and a simple hash with the last key before the value, and the value.

def create_hash_with_path(path_array,hash)
  # Wrap the innermost hash in one level per key, from the last key outwards.
  # Starting from hash means an empty path_array (a top-level key)
  # returns hash itself.
  path_array.reverse.inject(hash) do |inner, key|
    {key => inner}
  end
end

Afterwards it merges the nested hash (the one from the first example) with the newly created hash. The only problem is that you cannot use the merge methods from the Hash class, because they are shallow: an inner hash is replaced as a whole instead of being merged key by key. You can write a simple method that does just that.

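def merge!(merge1,merge2)
  case merge1
    # A plain value is simply replaced by the new one
    # (Integer also covers Fixnum on older Rubies)
    when String, Integer then merge2
    # Hashes are merged key by key, recursing on keys present in both
    else merge1.merge!(merge2) {|key,old_value,new_value| self.merge!(old_value,new_value)} if merge1 && merge2
  end
end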

It just wraps the Hash class’s merge! to make it recursive, stopping when it reaches a value (String, or Fixnum, which is Integer in current Ruby versions) and then using the value in merge2, which is the new value.
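To see why the recursive merge matters, here is a self-contained sketch that contrasts it with the shallow Hash#merge. The names deep_merge and set_value are illustrative, not the methods above, and this version returns a new hash instead of modifying the original.

```ruby
# Recursive merge: hashes are merged key by key, anything else is replaced.
def deep_merge(old, new)
  if old.is_a?(Hash) && new.is_a?(Hash)
    old.merge(new) { |_key, a, b| deep_merge(a, b) }
  else
    new
  end
end

# Builds {"a" => {"b" => value}} from "a.b" and merges it into nested_hash.
def set_value(nested_hash, path, value)
  patch = path.split(".").reverse.inject(value) { |inner, key| { key => inner } }
  deep_merge(nested_hash, patch)
end

original = {"first_key"=>{"second_key"=>"value","other_key"=>1},"third_key"=>12}

shallow = original.merge({"first_key"=>{"second_key"=>"new"}})
deep    = set_value(original, "first_key.second_key", "new")

puts shallow.inspect   # "other_key" is lost: the inner hash was replaced whole
puts deep.inspect      # "other_key" survives next to the updated value
```

The shallow merge replaces the whole inner hash, which is exactly the behaviour the recursive version is written to avoid.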

So, there you go. Now you have a whole new arsenal of methods to deal with nested hashes. Keep in mind that you can change this code to be compatible with other kinds of values, like Symbols, and/or to accept other containers, such as Arrays.

Posted in Computer Science | Tagged programming, ruby

Foldr vs Foldl – A small survey with the help of GHC

December 1, 2010 by Marcelo Sousa

Recursion patterns are one of my favorite aspects of functional programming, but when our objective is checking how our functions behave in terms of performance instead of just writing beautiful functions, we need to be careful which pattern to use.
Using the list definition:

data [a] = [] | a:[a]

We can define simple functions such as sum:

sum :: Num a => [a] -> a
sum []    = 0
sum (h:t) = h + sum t

and:

 and :: [Bool] -> Bool
 and []    = True
 and (h:t) = h && and t

Functions of this kind are quite simple to define and, as we can see, they capture a structural recursion pattern over the data type. In Haskell this recursion pattern is captured by the folds, foldr and foldl. For more information about folds for other data types, check out here.

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f e []    = e
foldr f e (h:t) = f h (foldr f e t)

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f a []    = a
foldl f a (h:t) = foldl f (f a h) t
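The two folds differ in the direction in which they nest the operator, which is easy to miss with an associative operator like +. The sketch below transcribes both definitions into Ruby, purely as an illustration: Ruby is strict, so it shows the shape of the result and says nothing about laziness or memory use. The method names mirror the Haskell ones.

```ruby
# Right fold: f(h1, f(h2, ... f(hn, init)))
def foldr(list, init, &f)
  return init if list.empty?
  head, *tail = list
  f.call(head, foldr(tail, init, &f))
end

# Left fold: f(... f(f(acc, h1), h2) ..., hn)
def foldl(list, acc, &f)
  return acc if list.empty?
  head, *tail = list
  foldl(tail, f.call(acc, head), &f)
end

# Subtraction is not associative, so the nesting becomes visible.
right = foldr([1, 2, 3], 0) { |x, acc| x - acc }   # 1 - (2 - (3 - 0)) = 2
left  = foldl([1, 2, 3], 0) { |acc, x| acc - x }   # ((0 - 1) - 2) - 3 = -6

puts right
puts left
```

With + both calls would return 6, which is why sumr and suml compute the same value even though they build very different expressions.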

With folds we can now have new definitions for sum and and:

sumr = foldr (+) 0
suml = foldl (+) 0
andr = foldr (&&) True
andl = foldl (&&) True

Now the question is: which version of sum and and should we use in our real programs?!
To answer that question we are going to do some heap profiling with the help of GHC.
Let’s start by wrapping our functions in a Main module like:

module Main where

main = print $ sumr [1..2000000]

Now we compile with:
ghc --make -prof -auto-all Rev
The options -prof and -auto-all enable profiling. We can now run our main function with profiling:
./Rev +RTS -K100M -p -hc
2000001000000
-K100M increases the stack size, -p produces a time and allocation profile, and -hc produces a heap profile broken down by cost centre. For more info about RTS check here.
As we can see the result is correct. If we inspect our directory we should have a Rev.hp file, where hp stands for heap profile. If GHC is installed correctly we should be able to convert this file to a PostScript file with:
hp2ps Rev.hp
And now we can open Rev.ps to check out the graph.

heap profiling for sumr

Wow, more than 10 MB for this simple function! As we can see this is extremely inefficient. Let’s understand what is really going on: according to our definition of foldr we will produce something like:
1 + (2 + (3 + foldr (+) 0 [4..2000000])), so we have to keep building this huge expression until we reach the end of the list, which corresponds to the highest point in the graph, and only then do we start performing the reductions. Now, because foldl uses an accumulator, we shouldn’t see this. Let’s check:

heap profiling for suml

The suml version is even worse, more than 25 MB of memory! To understand why, let’s check what foldl is doing:
foldl (+) ((0+1)+2) [3..2000000]
So we are still creating this huge expression, because the accumulator is never evaluated along the way! What we need is seq, or more concretely the $! function:

seq :: a -> t -> t

What seq x y does is evaluate x (to weak head normal form) before returning y. Now taking a look at $!:

 ($!) :: (a -> b) -> a -> b
f $! x = x `seq` f x

Now we are able to define a new sum function:

sum = sum' 0
sum' acc  []   = acc
sum' acc (h:t) = (sum' $! acc + h) t

We are now forcing the evaluation of acc + h at each step. This version is equivalent to foldl' (import Data.List):

suml' = foldl' (+) 0

Checking out the graphic:

heap profiling for suml'

The improvement is quite astonishing! From more than 25 MB we are now down to about 1 KB.
Does this mean that we should always use foldl'? Check out the definition of and: it uses a lazy operator, so with foldr we can stop as soon as we find a False, whereas with foldl (or foldl') we will always traverse the whole list, which is quite inefficient. For the same reason, foldl doesn’t work on infinite lists.
So, as a conclusion, when to use foldr, foldl and foldl':
foldr – Partial results, infinite lists, lazy operators
foldl’ – Finite lists with strict operators
foldl – Otherwise (like reverse)
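The point about lazy operators can be seen by counting how many elements are visited. The Ruby sketch below is only an analogy, since Ruby is strict: inject plays the role of a left fold that must walk the whole list, and all? stands in for the short-circuiting behaviour that foldr (&&) gets from laziness. The counters are there just for the demonstration.

```ruby
bools = [true, false, true, true]

# Left fold: every element is visited, even after the result is settled.
visited_by_fold = 0
fold_result = bools.inject(true) do |acc, b|
  visited_by_fold += 1
  acc && b
end

# Short-circuiting traversal: stops at the first false.
visited_by_all = 0
all_result = bools.all? do |b|
  visited_by_all += 1
  b
end

puts "fold: #{fold_result}, visited #{visited_by_fold} elements"
puts "all?: #{all_result}, visited #{visited_by_all} elements"
```

Both compute the same answer, but the fold looks at all four elements while the short-circuiting version stops after the second.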

Posted in !*, Computer Science, Haskell | Tagged foldl, foldr, GHC, Haskell, heap profilling, optimisations, recursion patterns

Haskell Session Types with (Almost) No Class

November 24, 2010 by Marcelo Sousa

Unfortunately I think the project of a paper per day is a bit unrealistic since I also want to have a life outside the box. So I’ve decided to post as I read the papers, with the promise of writing about at least one paper a week.
Today’s paper is Haskell Session Types with (Almost) No Class. The idea behind session types is to encode the communication protocol in the types of the processes that communicate. In Haskell this means that if we have a server function and a client function, the types of both functions represent the protocol of communication. The pi-calculus theory behind it tells us which rules our protocols must obey: they must be dual. If the protocol of the server is Int ! End (send an Int and finish), the type of the client must be Int ? End (receive an Int and finish). In Haskell, duality is expressed with functional dependencies on multi-parameter classes. Multi-parameter classes are simply classes that take more than one parameter, such as:

class MyClass a b where ...

The duality is represented by functional dependencies, serving as example the dual class in the paper:

class Dual r s | r -> s, s -> r

What this means is that, given two protocols r and s, r determines s and s determines r: each protocol has exactly one dual, so the relation is a bijection. Let’s clarify this situation with a practical example:

class MyClass a b | a -> b, b -> a where
  x :: a -> b -> a

instance MyClass Int Char where
  x i c = i

instance MyClass Char Int where
  x c i = c

We can no longer define instances of MyClass with Char or Int, like MyClass Char String, MyClass String Int, etc.

Let’s now take a look at a safe echo server with one channel. (cabal install simple-sessions to install the library used)

module EchoServer where

import  Control.Concurrent.SimpleSession.Implicit
import  Control.Concurrent.SimpleSession.SessionTypes
import  Control.Concurrent

(>>>) :: IxMonad m => m i j a -> m j k b -> m i k b
k >>> m = k >>>= \_ -> m

server = enter >>> loop where
  loop = offer close ixdo
  ixdo = recv >>>=  \s -> io (putStrLn s) >>> zero >>> loop

client = enter >>> loop 0 where
  loop count   = io getLine >>>= (\s -> ixdo s count)
  ixdo s count = case s of
    "q" -> sel2 >>> send (show count ++ " lines sent") >>>
           zero >>> sel1 >>> close
    _   -> sel2 >>> send s >>> zero >>> loop (count + 1)

runPrintSession = do rv <- newRendezvous
                     forkIO (accept rv server)
                     request rv client

Without knowing anything about the types of the functions used or what indexed monads are (basically monads that carry pre-conditions and post-conditions, and only allow composition when the post-condition of the first is equivalent to the pre-condition of the second), we can still understand the flow of the functions and use this as a black-box API to start with. Functional dependencies help type inference, so we don’t even have to worry about giving types to functions.
The server function basically enters a loop that offers a choice between ending communication or receiving a string and printing it in the server’s world. Then it goes back to the loop. Let’s take a look at the type of the function server:

server :: Session (Cap e (Rec (Eps :&: (String :?: Var Z)))) () ()

Let’s focus on the important part, the specification of the protocol: Rec (Eps :&: (String :?: Var Z)). It states that we have a recursive protocol that either ends (Eps) or receives a String and then continues as Var Z, which refers back to the enclosing Rec block.
Assuming the dual of :&: is :+:, and the dual of :?: is :!:, we can infer that the protocol of the client is:

client :: Session (Cap e (Rec (Eps :+: ([Char] :!: Var Z)))) () ()
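The step from the server type to the client type is mechanical: swap offer with select and receive with send, and leave everything else alone. As an illustration only, here is that rule written as a Ruby function over protocols encoded as nested arrays. The encoding, the symbol names and the dual method are assumptions made for this sketch; the library does this at the type level, at compile time, through the Dual class.

```ruby
# Protocol terms as nested arrays (an encoding chosen for this sketch):
#   [:eps]              end of session        (Eps)
#   [:var, n]           jump back to a Rec    (Var)
#   [:rec, p]           recursive block       (Rec)
#   [:recv, type, p]    receive, then p       (:?:)
#   [:send, type, p]    send, then p          (:!:)
#   [:offer, p, q]      offer a choice        (:&:)
#   [:select, p, q]     make a choice         (:+:)
def dual(protocol)
  tag, *args = protocol
  case tag
  when :eps, :var then protocol
  when :rec       then [:rec, dual(args[0])]
  when :recv      then [:send, args[0], dual(args[1])]
  when :send      then [:recv, args[0], dual(args[1])]
  when :offer     then [:select, dual(args[0]), dual(args[1])]
  when :select    then [:offer, dual(args[0]), dual(args[1])]
  else raise ArgumentError, "Unknown protocol constructor #{tag.inspect}"
  end
end

# Rec (Eps :&: (String :?: Var Z))
server_protocol = [:rec, [:offer, [:eps], [:recv, String, [:var, 0]]]]
client_protocol = dual(server_protocol)

puts client_protocol.inspect
```

Taking the dual twice gives back the original protocol, which is the runtime counterpart of the two-way functional dependency r -> s, s -> r.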

The client receives a string from its world. If that string is "q", it selects option 2 on the server side (which makes the server receive a string and print it in its world) and sends (show count ++ " lines sent"). It then goes back to the beginning and selects option 1 to end communication. If the string is not "q", it sends the string given and goes back to the beginning.
The function runPrintSession wraps this communication by creating a channel, forking a thread in which the server accepts that channel, and requesting the same channel on the client side.
Now, does this work?!

[1 of 1] Compiling EchoServer       ( EchoServer.hs, interpreted )
Ok, modules loaded: EchoServer.
*EchoServer> runPrintSession
Hello server
Hello server
Im the client
Im the client
bye
bye
q
3 lines sent

It seems like it! What happens if both protocols don’t match?! Suppose the server now never closes but stays in an infinite loop:

server = enter >>> loop
  where loop = ixdo
        ixdo = recv >>>=  \s -> io (putStrLn s) >>> zero >>> loop

*EchoServer> :r
[1 of 1] Compiling EchoServer       ( EchoServer.hs, interpreted )

EchoServer.hs:20:0:
 Couldn't match expected type `Eps :+: ([Char] :!: Var Z)'
 against inferred type `String :!: s'
 When using functional dependencies to combine
 Dual (a :?: r) (a :!: s),
 arising from the dependency `r -> s'
 in the instance declaration at <no location info>
 Dual (String :?: Var Z) (Eps :+: ([Char] :!: Var Z)),
 arising from a use of `request'
 at EchoServer.hs:22:21-37
 When generalising the type(s) for `runPrintSession'
Failed, modules loaded: none.

We got a type error: the client expects the server’s protocol to be a choice between closing and receiving a string, but the server now only receives a string!
I find this pretty cool. There are implementations for other languages. Check it out here!

Posted in Computer Science, Haskell, Papers, Programming Languages | Tagged Haskell, Pi-Calculus, type sessions
