
Monday, December 13, 2010

Tackling nulls the functional way

Most programmers have suffered null pointers one way or another - usually as a segmentation fault and core dump on a development machine, or on a production box with the application in flames. A NullPointerException is the visible embarrassment of not having considered that something *could be* null.

Tracking down a null pointer ranges from loading a core dump in gdb and tracing the dereferenced pointer, to reading a stack trace that points to the exact location in source. However, this very ease of tracking nulls opens the door to ignoring them in practice: throwing in null checks becomes as common as throwing in one more div to fix IE's layout problems, and that is bad.

Problems with Null:
I hate having to chase nulls, because adding one more null check is rarely enough. The reason I am writing this post is that I have several problems with nulls:
  1. Each and every reference can be null in languages like Java. This covers everything: method parameters, return values, fields and so on. There is no precise way for a programmer to know that some method might return null or accept null parameters; you have to resort to the actual source code or the documentation to see whether it can possibly return null (and good luck with that). All of this adds extra work when you really want to be focusing on the real problem.
  2. A NullPointerException points to the eventual casualty, not the actual cause. The stack trace usually shows the code path where the damage surfaced, when what we are really interested in is the place where the damage was initiated.
  3. Null is very ambiguous. Is it an uninitialized value, the absence of a value, or an error indicator? The paradigm of null fits well in databases but not in a programming model.
  4. Nulls have major implications for code quality and complexity. For example, it is not unusual to see branches with null checks breeding like rabbits because an API "may" return null, which in turn results in extremely defensive code. This significantly taxes readability.
  5. Null makes Java's type system dumber when a method is overloaded and you want to call it with a null argument; writing code like methodDoingStuff((ActualType) null, otherArgs) to pick the right overload isn't exactly a pretty sight, and it invites subtle errors when the arguments are non-generic.
In many ways nulls are a necessary evil. Those of us who care about readability and safety can't ignore them, but neither should we let them erode safety and readability.

I have come to know several techniques for tackling nulls. First, there is the Null Object pattern, which is not quite as ridiculous as the name implies, but it is not practical in real-life software with hundreds of class hierarchies and thousands of classes, so I will not talk about it. Then there are languages like Haskell and Scala with library classes that treat nulls in, IMO, a better way: Haskell has Maybe and Scala has Option. After using Option in Scala for a while in a side project, I found that I was no longer fighting with nulls. I knew exactly when I had to decide that a value was really optional and that I must do alternate processing.

The central idea behind Haskell's Maybe and Scala's Option is a definitive agreement, enforced by the type system, on whether a value is allowed to be absent. I will talk about Scala's Option since I have worked with it, but the concept remains the same. I will also show how to implement and use Options in Java, since this is much more a functional way of thinking about nulls than a language feature, and it (almost) doesn't take Scala's neat language features to implement.

Treating nulls the better way:

The usual course of action when you are not sure what value to return from a method is to throw an exception, return a magic/sentinel value, or just return null. Most of the time it comes down to the last option, because you don't have to fear breaking everything (well, mostly), and everyone passes the buck this way.

We can do better with Scala's Option class. We can wrap any reference into Some or None and handle it with pattern matching or a "for comprehension". For example (a minimal sketch; findUser here is a hypothetical API that may have nothing to return):
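def findUser(id: Int): Option[String] =
  if (id == 42) Some("admin") else None

findUser(42) match { // pattern matching on the result
  case Some(name) => println("Found: " + name)
  case None => println("No such user")
}

for (name <- findUser(42)) // the "for comprehension" style
  println("Found: " + name)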


Some(x) represents a wrapper with x as the actual value; None represents the absence of a value. Some and None are subclasses of Option, and Option has all the interesting methods you can use: when the variable in question is empty, we can fall back to a default value, evaluate a function and return its computed value, filter, and so on.
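A few of those methods, as a minimal sketch:

val opt: Option[Int] = Some(3)
opt.getOrElse(0) // fall back to a default: 3
opt.map(_ * 2) // compute a new, still wrapped value: Some(6)
opt.filter(_ > 5) // None, since the predicate fails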

Options in Java:
Implementing Options in Java is a surprisingly trivial task, although the result is not as pleasant as Scala's Option. The implementation boils down to a wrapper class Option with two children: Some and None. None represents a null, but with a type (None<T>), and Some represents a non-null value.

To make Option interesting, we make it extend List so we can iterate over it to mimic a poor man's "for comprehension". We will also go as far as tagging both types with enums so we can do poor man's pattern matching with a switch. You can find an example implementation of Options in Java, with a test case demonstrating the use of Options.

Option opens up several doors for fixing the null situation: you now have a choice to compute a value, use a default value, or do arbitrary processing when you encounter an absent value.

How my null problems are solved with Options:
  1. By using Option for optional/nullable references, I have at least avoided the "everything could be null" problem in my code. When an API returns an Option, I don't have to wonder whether it can return null; the intention is clear.
  2. I am forced to handle the absence right at the point of using an API: do alternate processing there or use a default. No surprises.
  3. Option is a very clear way of saying that a variable may represent an absent value.
  4. Option doesn't solve this problem completely. For example, method signatures with the wrapper Option type can get really long (e.g. def method1(): Map[String, Option[List[Option[String]]]] = ...). However, compared to null checks, I would take the long method signature any day; the other benefits outweigh this limitation.
  5. Option[Integer] always means only Option[Integer], not something that might equally be an Option[String], Option[Character] or Option[Date] the way a bare null can be. The compiler can infer the exact method call from the generic type.
As good as the concept behind optional values is, it will not always save you from null. You will still have to deal with existing libraries that return nulls and cause all of these problems and more. Most of the time, however, the problematic null is in your own code.
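One thing you can do at such a library boundary is wrap the suspect value immediately; Option's factory method turns null into None. A one-line sketch:

val home: Option[String] = Option(System.getenv("HOME")) // getenv may return null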

Where to use Options:
Here are the common places where I think using Options makes the most sense:
  1. APIs: Make your API as specific and as readable as possible; all optional parameters and return values should be Options.
  2. Your domain model: You already have a fair understanding of which columns are nullable, so use Option for those fields (see the sketch after this list). It is not hard to integrate Options if you are using an ORM with an interceptable DB fetch; you can initialize fields to None when the database contains null, and so on.
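A minimal sketch of the idea (the Customer model and row-loading code here are hypothetical):

case class Customer(id: Long, name: String, email: Option[String]) // email column is nullable

// e.g. in an ORM fetch interceptor: Option(x) is None for null, Some(x) otherwise
def fromRow(id: Long, name: String, rawEmail: String): Customer =
  Customer(id, name, Option(rawEmail))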

In the interest of keeping this post relevant and on topic, I have completely avoided the heavy theoretical baggage (monads et al.) that is inevitable when theoretical functionalists (functional programmers) talk about Options. I really hope this post generates some interest in the topic. If you disagree or would like to share more, please leave a comment.

Sunday, March 07, 2010

Eclipse Refactoring for legacy code

It has been quite some time since I wrote anything on this blog; twitter probably spoiled me. If you are following me on twitter, you probably noticed the announcement of a small Eclipse plugin for automated refactoring of legacy code.


A few months ago, I wrote an LTK refactoring, mainly in Scala (yes, an Eclipse plugin in Scala), to forward static method calls in a Java class to an instance method in the same class. I am not really an expert at Scala (or functional programming, for that matter), so the code is more of a javaish Scala, but I am improving, and that's the best thing about Scala. This is also an attempt to prove how easy it is to write Eclipse plugins in Scala and how seamlessly it integrates with Java source. Scala is not only a better language, it is much more suitable for dealing with Eclipse APIs (you can define views and the like on extremely verbose interfaces such as the JDT AST).


Enough about Scala; the real purpose of the refactorings provided in this plugin is to ease development with legacy Java code. Most code generators (think JavaCC) produce ugly Java code which is not only non-unit-testable but a pain to comprehend, and which is often not usable from concurrent routines. Using this automated refactoring should make such code better. You can read more about the motivation behind the plugin in the wiki.


Performance heads who are worried about the method chains introduced by this refactoring are encouraged to run their own "sane" micro-benchmarks to be sure the JVM is really inlining the method calls it creates.


The plugin is currently in beta and I am open to new refactoring proposals. If you have any feedback, you are welcome to comment on this blog post or create a bug. The update site is here.

Tuesday, May 12, 2009

Scala v/s Java arrays

Here's a Java puzzler for the curious (and a good interview question too!). Given the array merge method below, what is the output of the following program?



import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Generics {

    static class A {
    }

    static class B extends A {
    }

    public static void main(String[] args) {
        A[] copy = merge(new B[] { new B() }, new A[] { new A() }, new B[1]);
        System.out.println(copy.length != 1);
    }

    static <Z> Z[] merge(Z[] arr1, Z[] arr2, Z[] store) {
        List<Z> list = new ArrayList<Z>();
        list.addAll(Arrays.asList(arr1));
        list.addAll(Arrays.asList(arr2));
        return list.toArray(store);
    }
}


If you didn't guess it already, the program above results in a runtime exception (java.lang.ArrayStoreException):


Exception in thread "main" java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at java.util.Arrays.copyOf(Unknown Source)
at java.util.ArrayList.toArray(Unknown Source)
at name.nirav.Generics.merge(Generics.java:23)
at name.nirav.Generics.main(Generics.java:16)


I am not a huge fan of generics in Java because we are left with whatever type safety we can get from a half-hearted implementation (and I'm not even criticizing). It is too much to expect the Java compiler to detect that the program above compromises type safety at the call site, mostly because of how arrays in Java are handled by the VM. Arrays are special mutable objects whose components are anonymous members accessed by index. An array itself isn't a distinct type; it assumes whatever type its components are. This is where the problem starts.

Java arrays are covariant: an array of component type T is also an array of component type S whenever T is a subclass of S. This introduces type issues such as the one above, where a syntactically valid program blows up at runtime, making Java's "statically typed, type-safe language" designation an irony. If arrays were regular objects, the compiler would report an error on code without type variance information.

Arrays are regular objects in Scala: each array is an instance of the scala.Array class. The code below is equivalent to the Java program above, with some syntactic differences, but unlike the Java code it does not compile. Scala arrays are non-variant, and the Scala compiler uses what is called "conservative approximation" to ensure type safety at compile time.


import java.util.{ ArrayList, Arrays }

object App extends Application {

  class A

  class B extends A

  def merge[T](arr1: Array[T], arr2: Array[T], store: Array[T]): Array[T] = {
    val list = new ArrayList[T]
    list.addAll(Arrays.asList(arr1: _*)) // :_* is for vararg conversion
    list.addAll(Arrays.asList(arr2: _*))
    list toArray store
  }

  merge(Array[B](new B), Array[A](new A), new Array[B](1)) // Error: type mismatch

}


The Scala compiler will report an error on the "merge" call, complaining about a type mismatch.

Not everyone likes to know about such details until they bite back with million-dollar bugs. Why are Java arrays covariant? Who needs more runtime checks?

Tuesday, May 05, 2009

Fork/Join Concurrency with Scala Actors

Have you ever wondered why there are no special frameworks addressing concurrency in Java-based applications? Considering Java's rich (NIH) ecosystem, I do wonder why I have to write the same old state management code whenever I introduce even a small amount of concurrency into a Java application.

The reason, I think, that it is almost impossible to treat concurrency as an aspect of an arbitrary application is the JVM's native support for shared-memory concurrency. As a result, every developer is forced to think in terms of threads sharing state with guarded blocks. If you have read or written a non-trivial piece of code using shared-memory concurrency primitives (mutexes, semaphores, etc.), you probably know that the resulting code is hard to visualize and test.

I have been reading about Scala's actor library and its share-nothing, message-passing abstraction built over the JVM's existing concurrency model. While it doesn't try to solve the fundamental problem, it lets you address concurrency in your application from a different perspective, one that is testable and easier to understand.

In the actor model, an actor is a forkable task which runs independently, something like a serializable and immutable object with its own private data and behavior. Each actor can send and receive (or react to) messages asynchronously. This is very similar to object-oriented programming, with objects responding to messages, but in a concurrent way. The abstraction can be applied seamlessly to an application of divide-and-conquer nature, making it concurrent with minimal effort compared to adapting it to Java's concurrency primitives.

To explain my point further, take a look at the classically trivial Producer/Consumer example in Java.

import java.util.Random;

public class Consumer extends Thread {
    private final Buffer buffer;

    public Consumer(Buffer buffer) {
        super("Consumer");
        this.buffer = buffer;
    }

    @Override
    public void run() {
        while (true) {
            System.out.println(buffer.next());
        }
    }
}

public class Producer extends Thread {
    private final Buffer buffer;

    public Producer(Buffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public void run() {
        Random random = new Random(System.nanoTime());
        while (true) {
            String num = Integer.toString(random.nextInt());
            System.out.println(getName() + "=putting: " + num);
            buffer.add(num + ": " + getName());
            try {
                sleep(400);
            } catch (InterruptedException e) {
            }
        }
    }
}

public class Buffer {
    private String string;
    private boolean ready = false;

    public synchronized String next() {
        while (!ready) { // re-check the condition after every wake-up
            try {
                wait();
            } catch (InterruptedException e) {
            }
        }
        ready = false;
        notifyAll(); // let a waiting producer proceed
        return string;
    }

    public synchronized void add(String string) {
        while (ready) {
            try {
                wait();
            } catch (InterruptedException e) {
            }
        }
        this.string = string;
        ready = true;
        notifyAll();
    }
}

public class Test {
    public static void main(String[] args) throws Throwable {
        Buffer buffer = new Buffer();
        new Consumer(buffer).start();
        Producer producer = new Producer(buffer);
        producer.start();
        producer.join();
    }
}


Take a look at the Buffer class: we used concurrency primitives there, since that is where state is manipulated. Note the while loops guarding each wait(); a condition must be re-checked after every wake-up, and each side must notify the other after changing the state, or the exchange deadlocks. We didn't need to declare the variable ready as volatile, since it is only read and written inside synchronized methods. Even a simple problem like this involves a fair bit of understanding of the underlying threading model, and there is no doubt this complexity compounds in non-trivial applications, e.g. a multi-phase concurrent incremental compiler, a SEDA-based server, etc.

Now take a look at the equivalent Producer/Consumer example in Scala.

import actors._
import actors.Actor._
import util.Random

case class SimpleMessage(num: Long)

class Producer(c: Consumer) extends Actor {
  val random = new Random(System nanoTime)

  def act = {
    loop {
      val num = produce
      println("Sending: " + num)
      c ! SimpleMessage(num) // asynchronous message passing
    }
  }

  def produce(): Long = {
    Thread sleep 400
    random.nextLong
  }
}

class Consumer() extends Actor {
  def act = {
    loop {
      receive { // blocks here
        case SimpleMessage(num) => println("Received: " + num)
      }
    }
  }
}

object PCTest {
  def main(args: Array[String]): Unit = {
    val c = new Consumer()
    val p = new Producer(c)
    c.start; p.start
  }
}

Even without comparing the amount of code, the Scala version above is much clearer about its functionality. In Scala, an actor can be mapped to a single native thread with 'receive' (similar to Thread#wait()), or 'receive' can be replaced with 'react', an event-based invocation which doesn't cost a blocked thread: the code within 'react' is executed by any non-blocked thread from a pre-created thread pool. Just a single change and your application is scalable!
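As a sketch, the Consumer above needs only its 'receive' swapped for 'react':

class Consumer() extends Actor {
  def act = {
    loop {
      react { // event-based: no thread is parked while waiting
        case SimpleMessage(num) => println("Received: " + num)
      }
    }
  }
}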

The Java example above can be equally trivialized with java.util.concurrent's BlockingQueue, but the important point to take away is that hand-written shared-memory concurrency code is inherently difficult and error-prone. With JDK 1.7 we will get a similar fork/join abstraction in Java itself (JSR 166y), which will add a new alternative to how we design and write concurrent applications.
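For comparison, here is a sketch of the BlockingQueue version (written in Scala for consistency, but it maps one-to-one onto Java):

import java.util.concurrent.ArrayBlockingQueue

object BQTest {
  def main(args: Array[String]): Unit = {
    val queue = new ArrayBlockingQueue[String](1) // capacity 1, like Buffer
    val producer = new Thread(new Runnable {
      def run() = while (true) { queue.put("num: " + System.nanoTime); Thread.sleep(400) } // put blocks when full
    })
    val consumer = new Thread(new Runnable {
      def run() = while (true) println(queue.take()) // take blocks when empty
    })
    producer.start(); consumer.start()
  }
}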

Scala borrowed actors from Erlang, and similar libraries exist for Java as well. If you are curious about the details of actor-based OO concurrency in Java, take a look at the thoughts Sebastian is sharing with his ConcurrentObjects library.

Tuesday, April 21, 2009

How Scala's pattern matching can replace Visitors

The primary motivation of the Visitor design pattern is to separate model traversal from operational logic: a visitable model takes responsibility for navigation while the behavior is defined by arbitrary visitors. In this post I will explain the problems associated with Visitors in general, and how Scala's pattern matching can eliminate them cleanly.

Consider a simplified insurance policy model as follows (in Java):


public abstract class PolicyElement {
    public abstract void accept(PolicyVisitor visitor);

    static class Quote extends PolicyElement {
        protected final Risk risk;

        public Quote(Risk risk) {
            this.risk = risk;
        }

        public void accept(PolicyVisitor visitor) {
            visitor.visit(this);
            risk.accept(visitor); // the model drives the traversal
        }
    }

    static class Risk extends PolicyElement {
        protected Coverage coverage;

        public Risk(Coverage coverage) {
            this.coverage = coverage;
        }

        public void accept(PolicyVisitor visitor) {
            visitor.visit(this);
            coverage.accept(visitor);
        }
    }

    static class Coverage extends PolicyElement {
        protected final Premium prem;

        public Coverage(Premium prem) {
            this.prem = prem;
        }

        public void accept(PolicyVisitor visitor) {
            visitor.visit(this);
            prem.accept(visitor);
        }
    }

    static class Premium extends PolicyElement {
        protected final double amt;

        public Premium(double amt) {
            this.amt = amt;
        }

        public void accept(PolicyVisitor visitor) {
            visitor.visit(this);
        }
    }
}

public interface PolicyVisitor {
    public void visit(Quote quote);

    public void visit(Risk risk);

    public void visit(Coverage cvrg);

    public void visit(Premium prem);
}

public class PolicyTest {
    static class PremiumCalcVisitor implements PolicyVisitor {
        private double totalPremium;

        @Override
        public void visit(Premium prem) {
            totalPremium = getTotalPremium() + prem.amt;
        }

        @Override
        public void visit(Coverage cvrg) {
        }

        @Override
        public void visit(Risk risk) {
        }

        @Override
        public void visit(Quote quote) {
        }

        public double getTotalPremium() {
            return totalPremium;
        }
    }

    public static void main(String[] args) {
        Quote quote1 = new Quote(new Risk(new Coverage(new Premium(10))));
        Quote quote2 = new Quote(new Risk(new Coverage(new Premium(30))));
        PremiumCalcVisitor visitor1 = new PremiumCalcVisitor();
        PremiumCalcVisitor visitor2 = new PremiumCalcVisitor();
        quote1.accept(visitor1);
        quote2.accept(visitor2);
        assert visitor1.getTotalPremium() + visitor2.getTotalPremium() == 40;
    }
}


(Generally, we would introduce one more abstract class to omit the empty implementations in visitors, but I have left it out for brevity.)

Now, the not-so-apparent problem here is that when the object model changes (which happens frequently in real life), we have to add one more method to the PolicyVisitor interface, touch every visitor implementation if the change is substantial, and have the new policy elements implement accept methods. This invasive nature couples the Visitor tightly to the model.

With pattern matching and views in Scala, you can write an alternative implementation which is precise as well as non-invasive, unlike visitors.

class PolicyElement
case class Quote(risks: Risk) extends PolicyElement
case class Risk(cvrg: Coverage) extends PolicyElement
case class Coverage(limit: Premium) extends PolicyElement
case class Premium(amt: Double) extends PolicyElement

object PremCalcTest {
  class PremCalculator(pol: PolicyElement) {
    def calcPrem: Double = calcPrem(pol)

    def calcPrem(policy: PolicyElement): Double = policy match {
      case Quote(risk)       => calcPrem(risk)
      case Risk(coverage)    => calcPrem(coverage)
      case Coverage(premium) => calcPrem(premium)
      case Premium(amt)      => amt
    }
  }

  implicit def calPremV(pol: PolicyElement) = new PremCalculator(pol)

  def main(string: Array[String]) {
    val risk1 = Risk(Coverage(Premium(10)))
    val risk2 = Risk(Coverage(Premium(30)))
    println(Quote(risk1).calcPrem + Quote(risk2).calcPrem)
  }
}

This code requires some explanation. What we have done here is label the domain classes with Scala's 'case' keyword. If you tag a class with 'case', it can be used for pattern matching in a switch-case like structure, as done in the method 'calcPrem'. You don't need to create members or setters/getters for them; the compiler creates them for you. A case class can also be instantiated without the 'new' keyword, so Risk(Coverage(Premium(0))) translates to new Risk(new Coverage(new Premium(0D))) in the equivalent Java code.
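A sketch of what the compiler generates for you:

val p = Premium(10) // no 'new': the compiler provides a factory
println(p.amt) // the constructor parameter becomes an accessor
p match { case Premium(amt) => println(amt) } // and an extractor for matching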

The code in the 'calcPrem' function can be thought of as something similar to instanceof checks for each possible case in Java, for example:


if (object instanceof Premium)
    return ((Premium) object).amt;


We have also silently added a method 'calcPrem' to the PolicyElement class. This is done through the implicitly defined function 'calPremV', which allows us to call 'calcPrem' on any PolicyElement without actually modifying the domain model code. This kind of lexically scoped class extension is known as a view in Scala, and is similar to Ruby's open classes, except that open classes are not scoped.
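Here is the same trick in isolation, as a minimal sketch (the names are hypothetical):

class RichPolicy(pol: PolicyElement) {
  def described = "policy element: " + pol // any new member we like
}
implicit def toRichPolicy(pol: PolicyElement) = new RichPolicy(pol)

Premium(10).described // compiles only where the view is in scope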

If the model changes, we just need to modify a single function and we are done. These language features free us from the coupling introduced by inheritance.
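For instance, if a hypothetical Discount element were added to the model, the match gains one case and nothing else changes:

case class Discount(amt: Double) extends PolicyElement

def calcPrem(policy: PolicyElement): Double = policy match {
  case Quote(risk)       => calcPrem(risk)
  case Risk(coverage)    => calcPrem(coverage)
  case Coverage(premium) => calcPrem(premium)
  case Premium(amt)      => amt
  case Discount(amt)     => -amt // the only change
}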

So it is easy to see that Scala's language features can be elegant and far more powerful than those of other languages (specifically Java), without sacrificing compiler checks or type safety.

Thursday, April 09, 2009

Scala: First impression

If you are curious about programming languages, you have probably heard of Scala, the 'statically typed' functional and object-oriented language. Scala is the new sun rising to balance 'the burn factor' between the functional and object-oriented schools of thought.

Unlike what this paper suggests [pdf], the reason I think Scala exists is that the functional and object-oriented camps are moving in opposite directions, which is not only inefficient, it means they can't leverage each other's achievements. If you read any functional v/s object-oriented comparison, every argument boils down to productivity, tooling and maintainability of code. While functional languages (Lisp, Haskell, Python etc.) offer excellent productivity compared to OO languages (Java, C++ etc.), object-oriented languages offer excellent tooling and are relatively maintainable. The reason OO languages have been so popular, I think, is that their central concept is easy to understand and easy to map to real life; for most people who never took a computer science course, OO is still easier to grasp than functional methods like list comprehensions, closures or higher-order functions, which are rooted in the formal systems of mathematics.

Scala tries to satisfy both groups by providing a grammar and a type system which integrate seamlessly with mainstream platforms (Java, .NET) while offering powerful functional abstractions otherwise found mostly in dynamic languages. What this means is that Java developers can write code in the same fashion they do in Java, using existing libraries and frameworks, with the added advantage of functional programming techniques wherever they feel these might be productive; functional programming enthusiasts get access to rich class libraries and powerful tooling (eventually).

If you take a look at the Scala language grammar [pdf], you will notice that what you can create with Scala is limited mostly by your creativity. Based on what I have learned so far, I find Scala much more refreshing than Java; Scala feels a lot more like a programming language of the 21st century! The Scala compiler itself is pluggable, so you can do a heck of a lot of stuff you can only dream of with javac or ecj. What is missing is tooling; the existing tooling is scrap, but that should improve with an active community.

Bill Venners of Artima has presented Scala wonderfully; take a look at the presentation [requires flash] on 'The Feel of Scala'.