Nifty Snippets: tip

Showing posts with label tip. Show all posts

Friday, 10 February 2012

`forEach` and runtime cost

Just a tiny one today, mostly to write this down somewhere:

As you all know, ECMAScript5 adds forEach to arrays, where you supply a function that gets called for each element in the array. There are lots of benefits to this, not least variable scoping on the index and value variables and a bit less typing, but don't all of those function calls add up to a significant runtime cost?

No, they don't.

I got curious about it (I have a tendency to micro-optimize, which I'm trying to break myself of), so I tested it on the slowest (desktop) browser currently in use: IE6. Specifically, IE6 running in an old Windows 2000 VM I have lying around. I tested the performance of looping through a 100-entry array both with and without calling a function. Without the function call, IE6 looped through the array ~4,500 times in a second (that's 450,000 loop iterations/second). With the function, it managed ~2,000 times a second (200,000 loop iterations/second). So while that's a big relative difference (56% slower!), in real terms it falls into fergedaboudit territory: 2.78 microseconds of overhead. You heard, me, microseconds. That's 0.00278 milliseconds.

Now, I'm not going to say there aren't places where it could matter. I am going to say that they're going to be extraordinarily rare, I'd wager I'll never run into a situation where it makes a difference and nor will you. Whatever else you're doing in the loop body is totally going to wash out the function calls. Really.

Happy coding!

Thursday, 16 June 2011

Copying table structure in SQL Server

Why didn't I know this? For so many years I've bemoaned the fact that SQL Server doesn't have MySQL's CREATE TABLE LIKE statement:

CREATE TABLE NewTable LIKE ExistingTable

Um, er, except it does:

SELECT TOP 0 INTO NewTable FROM ExistingTable

The INTO clause of SELECT...

"...creates a new table in the default filegroup and inserts the resulting rows from the query into it." (MSDN link)

And of course, if you use TOP 0, there are no rows — but the table structure still gets created. (Naturally, if you want to copy the data as well, just leave off the TOP 0.)
Sometimes, you think you know something doesn't exist, and it turns out you just haven't looked hard enough to find it...
Happy coding,

Sunday, 22 May 2011

Appending to an array

JavaScript's Array#concat function is handy and cool, but it creates a copy of both arrays. That is, given array a and array b, a.concat(b) will create a third array c with the contents of both of them, leaving a and b intact. Now, sometimes that's what you want, but other times you just want to append b to a without creating a copy of a.

While JavaScript doesn't have a dedicated append function, there's a surprisingly efficient and concise way to do it:

a.push.apply(a, b);

That calls the push method via apply, which accepts a context argument (what the this value should be during the call) and any array-like object providing the arguments to give the call. Since push allows you to push multiple elements with a single call, that works a treat. And somewhat unusually for JavaScript, it's probably the most efficient way to do it on all platforms (where normally this sort of thing is faster on some and slower on others)

Now, if you do this in code other people are going to have to read, I'd recommend a comment saying "Appends `b` to `a` in place" or some such, because otherwise it's pretty opaque to people who don't know JavaScript well. Or, of course you could always add an append method to arrays:

(function() {
    var push = Array.prototype.push;
  
    function Array_append(array) {
        push.apply(this, array);
        return this;
    }
  
    if (Object.defineProperty) {
        Object.defineProperty(Array.prototype, "append", {
            enumerable: false,
            value: Array_append
        });
    }
    else {
        Array.prototype.append = Array_append;
    }
})();

As always, if you're doing that on platforms that don't have Object.defineProperty (and there are still a lot in the wild), make sure you don't have any broken for..in loops lying around.

Sunday, 23 January 2011

A myth of arrays

When is an array not an array?

When it's a JavaScript array.

(Update: Here I'm talking about JavaScript's standard Array type. Many environments now also support new typed arrays, which are different.)

JavaScript objects are basically just key=value maps, and JavaScript arrays are nothing more than objects that have:

special handling for keys that are numeric strings
a special length property
and some functions they inherit from Array.prototype

That's it. Although we conventionally write array "indexes" as numbers, like all property names they are strings — a[0] is converted to a["0"] by the interpreter (although implementations are free to optimize this as long as the behavior remains as per the spec).

All of this is covered by Section 15.4 of the specification, which starts with this paragraph:

Array objects give special treatment to a certain class of property names. A property name P (in the form of a String value) is an array index if and only if ToString(ToUint32(P)) is equal to P and ToUint32(P) is not equal to 2^32−1. A property whose property name is an array index is also called an element. Every Array object has a length property whose value is always a nonnegative integer less than 2^32. The value of the length property is numerically greater than the name of every property whose name is an array index; whenever a property of an Array object is created or changed, other properties are adjusted as necessary to maintain this invariant. Specifically, whenever a property is added whose name is an array index, the length property is changed, if necessary, to be one more than the numeric value of that array index; and whenever the length property is changed, every property whose name is an array index whose value is not smaller than the new length is automatically deleted. This constraint applies only to own properties of an Array object and is unaffected by length or array index properties that may be inherited from its prototypes.

There are some consequences to this:

Barring implementation optimizations, accessing array elements by index is not a constant-time operation as it is in (say) C; like all maps, it will vary depending on the layout of the key storage, the size of the map, and the specific key you're looking up.
Barring implementation optimizations, using an array is no more efficient than using a plain object.
JavaScript arrays are inherently "sparse" (since the whole concept of a contiguous block of memory just doesn't exist for them anyway).
Arrays are inherently one-dimensional. A "2D" JavaScript array is merely an array in which each element is, in turn, an array, and it's important to remember that those sub-arrays may have different lengths (and you may even choose to intermix arrays and non-arrays as values of the outer array's elements).
Arrays can have non-index properties, which can be handy. For instance, this is perfectly valid:
```
var a = ["zero", "one", "two", "three"];
a.foo = "bar";
alert(a.length); // alerts "4"
alert(a.foo);    // alerts "bar"
```
This is one reason why it's important to understand for..in loops before using them on arrays (see Myths and realities of for..in).

None of which means that we shouldn't use arrays for ordered lists of things. We should. Doing so is useful semantically, and of course it's entirely possible that an implementation will optimize some of those things above. But it's important to remember what you're dealing with, and taking advantage of an array's non-array nature can be very useful sometimes.

Happy coding,

Thursday, 23 December 2010

Say what?

(Updated 2014/12/30)

Something that sometimes comes up in JavaScript work is the need to figure out what kind of thing you're dealing with — is it a string? a number? a Date? And JavaScript has various features to help you do that, but there are some "gotchas" to watch out for. In this post I'll talk about various strategies for figuring out what things are.

Basically, you have four tools you can use, each of which has its place:

typeof
instanceof
Object.prototype.toString
Learn to stop worrying and love the duck

Let's look at each of them:

typeof

typeof is fast but fairly limited. It basically tells you whether something is a primitive or an object, and if it's a primitive it tells you what kind of primitive it is, but for all objects other than functions, it just says "object". So:

console.log(typeof "testing"); // "string"
console.log(typeof {});        // "object"

But it's useful for, say, determining if something is a function:

if (typeof x === "function") {
    // Yes it is 
}

(Although in some environments, host-provided functions like alert give us "object" rather than "function".)

typeof's chief advantage is speed, although with modern JavaScript engines, speed isn't anywhere near the concern it was a few years ago.

instanceof

instanceof sort of picks up where typeof leaves off. It's good for checking whether something is a specific kind of object, e.g.:

var dt = new Date(...); 
console.log(dt instanceof Date); // true

Specifically, obj instanceof Func it looks at each object in obj's prototype chain and sees if any of them is the one that Func assigns when used to construct objects. In the example above, for instance, since dt's prototype is Date.prototype, dt instanceof Date is true. dt instanceof Object is also true, because dt's prototype's prototype is Object.prototype. If you'd set up a multi-level hierarchy of constructors and prototypes so that Cat derives from Mammal which derives from Animal, and used c = new Cat(), instanceof would be true for a c instanceof Cat, c instanceof Mammal, and c instanceof Animal (and c instanceof Object, normally).

You do have to be careful with instanceof in some edge cases, though, particularly if you're writing a library and so have less control / knowledge of the environment in which it will be running. The issue is that if you're working in a multiple-window environment (frames, iframes), you might receive a Date d (for instance) from another window, in which case d instanceof Date will be false — because d's prototype is the Date.prototype in the other window, not the Date.prototype in the window where your code is running. And in most cases you don't care, you just want to know whether it has all the Date stuff on it so you can use it.

Object.prototype.toString

Calling Object.prototype.toString is slower than typeof or instanceof, but more useful for differentiating what kind of object you have — but sadly, only if the object is one created by one of the built-in JavaScript constructor functions like Date or String, not our own constructor functions. The return value of the Object prototype's toString function is defined in the standard for all the built-in constructor functions (Array, Date, etc.): It returns "[object ___]" where ___ is the constructor function name. So:

var what = Object.prototype.toString;
console.log(what.call(new Date()));   // "[object Date]" 
console.log(what.call(function(){})); // "[object Function]" 
console.log(what.call([]));           // "[object Array]"

...etc. Note that this is very different from just calling toString on the object itself; the object probably has an overridden toString. That's why we explicitly use Object.prototype.toString above. (Remember, these things are just functions, not methods.) Object.prototype.toString works on primitives too, because it coerces its argument into an object before checking. So if you call it with a string primitive, you get "[object String]" (and "[object Number]" for numbers, etc.).

The chief advantage of this is that it doesn't care what window the object came from; referring back to our Date object d from another window, we'll get "[object Date]" regardless.

But sadly for our own constructor functions, all we get is "[object Object]":

var what = Object.prototype.toString;
console.log(what.call(new MyNiftyThing("foo"))); // "[object Object]"

Ah, well...

At one point I was tempted to wrap all of the above (plus support for my own object hierarchy) up into an uber-function that definitively figured out what the thing was. Then I thought: Enh, maybe I should just use the right tool in each given situation.

Speaking of which, our last tool:

Learn to stop worrying and love the duck

When reaching for instanceof or typeof or whatever, ask yourself: Do you really care? How 'bout looking to see if the object seems to have the things on it you need (feature detection) rather than worrying about what it is? This is usually called "duck typing," from the phrase "If it walks like a duck and talks like a duck..." Sometimes, obviously, you do care, but if you get in the habit of asking yourself the question, it's interesting how frequently the answer is "Actually, no, I don't care whether that's a Foo; I just care that it has X on it" or "...I just care that it works if I pass it into getElementById" (that latter case being, basically, it's a string or its toString does what you need). I find myself doing a lot less worrying about types and just getting on with the job these days. I've learned to love the duck. (And always remember, be kind to your web-footed friends; a duck may be somebody's mother.)

Happy coding!

Postscript: You may be wondering why I haven't mentioned the constructor property in any of the above. The answer is: Because it's not very useful, and there are common anti-patterns that mess it up. At some point I'll do an article on what's wrong with constructor...

Sunday, 7 November 2010

Myths and realities of for..in

Probably the second most enduring myth about JavaScript is that the for..in statement loops through the indexes of an array. It doesn't, and if you write code that assumes it does, then even if the code doesn't break in its nice cozy nest on your computer, it's likely to as soon as you expose it to the complexities of the outside world.

In this article, I'll explain the myth, the reality of what for..in actually does, and discuss various ways to loop through the contents of an array correctly (and efficiently).

The Myth

According to the myth, you can loop through the contents of an array like this (let's assume we have a display function that writes a line of text somewhere):

var stuff, index;
stuff = ['a', 'b', 'c'];
for (index in stuff) { // <== WRONG
    display("stuff[" + index + "] = " + stuff[index]);
}

And in a limited environment, that is indeed likely to show this output:

stuff[0] = a
stuff[1] = b
stuff[2] = c

But you can't rely on that, doing so will come back and bite you.

The Reality

for..in loops through the enumerable property names of an object, not the indexes of an array. This is much more powerful than something that just loops through array indexes. Here's an example:

var person, name;
person = {
    fname: "Joe",
    lname: "Bloggs",
    age:   47
};
for (name in person) {
    display("person[" + name + "] = " + person[name]);
}

There we have an object, person, which has three properties: fname, lname, and age. The loop displays each of these properties and its value, for instance:

person[lname] = Bloggs
person[fname] = Joe
person[age] = 47

Note that the properties are not necessarily listed in any particular order (they're not alphabetical, they're not in order of when or how they were added to the object, or in any other order you can rely on; it completely varies depending on the JavaScript implementation).

Now you may be thinking "Well, sure, so it does something different with objects than it does with arrays, so what?" But it doesn't. This is an important point and so I'm going to emphasize it:

Arrays in JavaScript are just objects (with one special feature), and array indexes are just property names. In fact, the name "array" is a bit misleading, because JavaScript arrays are not like arrays in most other languages. For one thing, JavaScript arrays are not contiguous blocks of memory. For another, the indexes into them are not offsets. In fact, array "indexes" aren't even numbers, they're strings. The only reason we get away with things like stuff[0] is that the 0 gets automatically converted into "0" for us. (Don't believe me? Read Section 11.2.1 ["Property Accessors"] of the spec.) So we don't really have array "indexes" at all, we just have property names that consist entirely of the digits 0-9 (without extraneous leading zeroes). But it's convenient to call those property names "indexes."

(Update: Here I'm talking about JavaScript's standard Array type. Many environments now also support new typed arrays, which are different.)

The one special feature of arrays is, of course, their length property, which behaves in a way to make arrays seem more like arrays in other languages (I won't go into the details here, not that they're complex; check out Section 15.4 ["Array Objects"] of the spec).

"Hey wait!" you say, "Speaking of length, if for..in really loops through the property names of the object, why don't we see length in the list?!" Good question. It's because for..in loops through the enumerable property names of an object, and length is not an enumerable property. Arrays (and other objects) are defined with lots of non-enumerable properties. (We'll come back to that in a moment.)

For now, let's consider what happens if we add another (enumerable) property to the array:

var stuff, index;
stuff = ['a', 'b', 'c'];
stuff.name = "Foo";
for (index in stuff) {
    display("stuff[" + index + "] = " + stuff[index]);
}

That's a perfectly legitimate and reasonable thing to do, we've added a name property to the array. Now what gets shown? Right! Four things:

stuff[name] = Foo
stuff[0] = a
stuff[1] = b
stuff[2] = c

Because for..in loops through all of the property names of the object, not just the "indexes."

And this is one of the places we get into differences in the order of property names. Some implementations will display the above, with the name property followed by the numeric properties; other implementations will do it the other way around, with name at the end. In fact, in theory, name could be in the middle. In fact, in theory even the numeric indexes might not be done in order (there's nothing in the spec saying that they have to be), but in practice I've never seen an implementation that did the numeric indexes out of order. Regardless, it's not guaranteeed.

Okay, but as long as we don't put other properties on our array, we can use for..in as though it did just the indexes, right? Wrong. That's where we get into the complexities of the real world, because for..in doesn't list only the object's own properties, it lists all of the properties of its prototype, and its prototype's prototype, etc. So that means not only not adding anything to your specific array instance, but not adding anything to Array.prototype either — and it turns out that it's really handy to add to Array.prototype sometimes, handy enough that lots of libraries, plug-ins, and other bits of code you'll find around do it.

Consider the Array#indexOf method. This handy method finds an element in an array and returns its index. But although some implementations have had it for years, others haven't, and it was only added to the specification as of ECMAScript 5th edition (e.g., at the end of 2009). So there are still implementations that don't have it.

But what if you want to write code that assumes it's there? You could write your own function in a procedural programming way and use that instead, passing the array and the target element in as parameters. Your function could then call the indexOf method if it's there, or do a manual search otherwise. But that not only introduces overhead (an extra function call), it also isn't very object-oriented — and JavaScript is object-oriented down to its bones. So the usual solution (and in fact the one suggested by Mozilla) is to just add it if it's missing:

if (!Array.prototype.indexOf) {
    Array.prototype.indexOf = function(searchElement, index) {
        // ...
    };
}

Now you can use indexOf on any array instance, since the instances inherit the function from the prototype. Augmenting types like this is very common in JavaScript circles.

But the thing is, it creates an enumerable property, and so if we do that, then even our first code snippet at the top starts failing:

var stuff, index;
stuff = ['a', 'b', 'c'];
for (index in stuff) { // <== WRONG
    display("stuff[" + index + "] = " + stuff[index]);
}

Outputs:

stuff[0] = a
stuff[1] = b
stuff[2] = c
stuff[indexOf] = function...

Note that the function (which is, after all, just another property like any other property) now shows up in our loop.

You may be wondering why we don't see all the other functions from the Array prototype in our loops (splice, join, etc.). The answer (you guessed it!) is that they're all defined by the spec as non-enumerable properties.

So can we add our function as a non-enumerable property? As of this writing, no, but it probably won't be long before we can. The new ECMAScript 5th edition specification defines a way for us to add properties to objects with the "enumerable" flag set to false, so they don't show up in for..in loops. But as of this writing, the mechanism has only been implemented and deployed in Google's V8 JavaScript engine (which is used in their Chromium and Chrome browsers). While the other implementers aren't far behind (both Mozilla and Microsoft have it in betas of their next releases), it's not there yet. And of course, once the mechanism is widely implemented, then everyone doing it the old way has to change their code to do it the new way.

So in short: Non-enumerable properties are not going to address this issue for you any time soon. And besides, it's not really an issue. Looping over property names (not indexes!) is what for..in is for!

Now someone out there is probably thinking "But hey, this doesn't only apply to arrays! What if someone adds something to Object.prototype, won't that show up in my for..in loops, even on just normal objects?" Yes it will, but fortunately, a de-facto standard has arisen in the JavaScript community: It's okay to augment the prototypes of specific types, but not okay to augment the Object prototype. Those few projects that have done so have been swiftly beaten into submission.

Options for Looping through Array Indexes

So okay, for..in is really cool and helpful for other things, but if we can't use it to loop over the indexes of an array, what should we do instead?

There are several options:

1. Use a boring old-fashioned loop:

Sometimes, the old ways are the best:

// Boring old fashioned loop (a classic!)
var stuff, index;
stuff = ['a', 'b', 'c'];
for (index = 0; index < stuff.length; ++index) {
    display("stuff[" + index + "] = " + stuff[index]);
}

Perfectly clear to anyone who's ever programmed in any language syntactically derived from B (so C, C++, D, C#, Java, JavaScript, PHP, and about 30 others). Straight-forward.

Now of course, there are some variations on that. For when the length of the array won't change, there's the cache-the-length variant:

// Boring old fashioned loop, caching length of unchanging array
var stuff, index, length;
stuff = ['a', 'b', 'c'];
for (index = 0, length = stuff.length; index < length; ++index) {
    display("stuff[" + index + "] = " + stuff[index]);
}

Or for when order doesn't matter, there's the count-backward version to take advantage of any "greater-than-or-equal-to-zero" optimization that the language implementation may have:

// Boring old fashioned loop, counting backward
var stuff, index;
stuff = ['a', 'b', 'c'];
for (index = stuff.length - 1; index >= 0; --index) {
    display("stuff[" + index + "] = " + stuff[index]);
}

Before anyone gets into any performance wars over the above, though, some observations:

Unless you actually see a performance problem, it doesn't matter what you use, use what makes sense in relation to the code you're writing. (Don't optimize prematurely.)
Which one you use won't matter from a performance perspective unless the array is huge, in which case you probably have other things to worry about first.
JavaScript is not C/C++/C#/Java/whatever. The source-level optimizations that make sense in C/C++/C#/Java/whatever do not necessarily make sense in JavaScript. Don't assume they do.
(This is a biggie.) The optimizations that improve performance on one implementation (say, in IE) may make performance worse in other implementations (say, in Firefox).

2. Using `for..in` — Correctly

But the old ways aren't always best. By their very nature, JavaScript's arrays are sparse — that is, an array can have a length of 10,000 but only have two entries in it:

stuff = [];
stuff[0] = "zero";
stuff[9999] = "nine thousand nine hundred and ninety-nine";
display(stuff.length); // shows 10000

That means that if you use a boring old-fashioned loop, you may work a lot harder than you have to, because you'll execute the body of your loop whether the array has an entry there or not:

var stuff, index;
stuff = [];
stuff[0] = "zero";
stuff[9999] = "nine thousand nine hundred and ninety-nine";
stuff.name = "foo";
for (index = 0; index < stuff.length; ++index) {
    display("stuff[" + index + "] = " + stuff[index]);
}

Although that loop correctly avoids outputting the name property, it will display 10,000 lines of output, with 9,998 of them ending with "= undefined" because the array doesn't have an entry there. Now, maybe that's what you want, maybe not (depends on what the array is for).

When you don't want to do that, you can use for..in — but use it correctly:

var stuff, index;
stuff = [];
stuff[0] = "zero";
stuff[9999] = "nine thousand nine hundred and ninety-nine";
stuff.name = "foo";
for (index in stuff) {
    if (stuff.hasOwnProperty(index)  &&    // These are explained
        /^0$|^[1-9]\d*$/.test(index) &&    // and then hidden
        index <= 4294967294                // away below
       ) {
        display("stuff[" + index + "] = " + stuff[index]);
    }
}

Now we only loop three times (once for each property, the indexes and name), and we only output two lines, because the if statement's condition only evaluates true for the kinds of property names we call "array indexes."

So what's that condition doing? Two things:

It uses the hasOwnProperty function that's built into all objects to tell us whether the property is on the object itself (returns true), rather than being inherited from the object's prototype (returns false). This deals with any enumerable properties that might have been added to Array.prototype.
Second, it determines if the property name looks like an "array index" as defined in the specification: The name is (by definition) a string, and we use a regular expression to prove that it's in the correct standardized base-10 form (either "0" on its own, or a non-zero digit followed by zero or more digits), then we check that it's in range (0 <= index <= 2^32-2 [which is 4,294,967,294]). If the property name passes those tests, it's an array index. You may be wondering about that magic number, 2^32 - 2. Why that number? It's because the spec says that the length property will be in the range 0 to 2^32 - 1 (inclusive), and as length is one higher than the highest array index, the highest array index is 2^32 - 2. (17 July 2013: Many thanks to RobG for pointing out that my test in earlier versions of this article wasn't quite right.)

Now, we're not going to want to type that every time this comes up, so here's a function we can use that applies the correct test:

function arrayHasOwnIndex(array, prop) {
    return array.hasOwnProperty(prop) && /^0$|^[1-9]\d*$/.test(prop) && prop <= 4294967294; // 2^32 - 2
}

Which we'd use in the loop like so:

for (index in stuff) {
    if (arrayHasOwnIndex(stuff, index)) {
        display("stuff[" + index + "] = " + stuff[index]);
    }
}

If you're into extending built-in prototypes, you could even adapt that to put on Array.prototype and then call it like stuff.hasOwnIndex(index), but beware of conflicts, and use Object.defineProperty to make it non-enumerable if you can (e.g., in ES5-enabled environments).

3. Use the New `forEach` Function

The new ECMAScript 5th edition specification adds a function to arrays called forEach which calls a callback for each element in the array, where an element is an entry in the array whose name is an array "index" according to our earlier definition. It also guarantees that it will call them in order (according to the numeric value of their property name), lowest to highest. The function is called only for entries that actually exist. So:

var stuff;
stuff = [];
stuff[0] = "zero";
stuff[9999] = "nine thousand nine hundred and ninety-nine";
stuff.name = "foo";
stuff.forEach(function(value, index) {
    display("stuff[" + index + "] = " + value);
});

Like our for..in example earlier, this displays two lines. (Unlike our earlier for..in example, it guarantees the order — although again, I've never seen an implementation of JavaScript where the array "indexes" weren't handled in ascending order by for..in, it's just that the spec doesn't guarantee it.)

(The new spec defines several other handy Array functions aside from just forEach, it's worth taking a look.)

So should you use an old-fashioned loop, for..in correctly, or the new forEach (or one of the other functions like it?)? It depends entirely on what you're doing. For me, most of the time, a boring old-fashioned loop does the trick unless I'm using a sparse array for something, in which case I use for..in. But it's totally up to you. The main take-away point is that a for..in used incorrectly will bite you.

Happy Coding!

Friday, 8 October 2010

CSS and showing my age

Okay. I can't be the only one. Surely some of you also have trouble remembering the order of the values in CSS shorthand properties, e.g.:

margin: 10px 20px 30px 40px;

I mean, I'm an engineering kind of guy. I'm used to coordinate systems like (top,left)-(bottom,right) or (x1,y1)-(x2,y2). CSS, however, uses neither. Instead, it's top right bottom left. I expect there's a good reason.

So showing my age, but I just could not keep that in my head. I kept thinking, "if only I understood why they'd done it, maybe I could remember it." But I've gone one better, and am showing my age ^ 2: I'll just remember "tribble".

That's right. Cute furry (and ultimately dangerous) things from Star Trek: The Original Series. "tribble" as in T R B L as in Top Right Bottom Left:

         +----------------- Top
         |    +------------ Right
         |    |    +------- Bottom
         |    |    |    +-- Left
         v    v    v    v
margin: 10px 20px 30px 40px;

Sorted.

I suppose I'd be remiss if I opened this topic but didn't mention that you can also specify one, two, or three values, but the mnemonic holds well enough basically, just remember to start with the T and you'll be okay. Specifically:

         +----------------- (All four)
         v
margin: 10px;

         +----------------- Top & Bottom
         |    +------------ Right & Left
         v    v
margin: 10px 20px;

         +----------------- Top
         |    +------------ Right & Left
         |    |    +------- Bottom
         v    v    v
margin: 10px 20px 30px;

Right, well, as there are no computers around here to reason into a conflicted meltdown, I guess I'll head off for a Saurian brandy and see if I can chat up an Orion slavegirl. I've always been partial to scantily-clad brunettes with green skin...

Thursday, 18 March 2010

Anonymouses Anonymous

2015 Update: Since this post was written in 2010, the JavaScript specification has been updated so that functions get assigned names more than they used to, by using the name of the property or variable they're being assigned to, if possible. I mostly haven't updated this post, but I have added the odd note where the new spec changes things.
But as of ES2015 (aka ES6), all of the following functions have names:

var foo = function() { };
var obj {
    foo: function() { }
};
obj.bar = function() { };

There isn't always a useful name to use, but when there is, as of ES2015, it's used.

In my JavaScript work, I see a huge number of anonymous functions. I'm not a fan. In today's post I'd like to show the problem and how I like to get around it.

The Short Version

The short version of this post would be: Anonymous functions don't have names. Functions having names is a Good Thing^(tm). So give them names.

But what fun would that be? Read on...

Anonymous Functions

First off, what do I mean by an anonymous function? Here's an example:

var a = ["Dave", "Maria", "Joe"];
a.sort(function(a, b) {
    return a.length - b.length;
});

You'll recall that the Array#sort function optionally accepts a function to call to compare two entries. The function I gave it above is anonymous -- it has no name. In contrast:

var a = ["Dave", "Maria", "Joe"];
a.sort(compareByLength);
function compareByLength(a, b) {
    return a.length - b.length;
}

Here, the function has a name.

But why would I want to use a named function in such a trivial example? I mean, we all know what it does, and I'm only going to use it once, so why not just define it inline as in the first example and leave it at that?

My main reason is that by giving the function a name, you help your tools help you. (I'd also dispute the "I'll only use it once" argument, but that's neither here nor there — as we'll see, you can reuse anonymous functions, and people frequently do.)

Granted the above is a trivial example, but you see anonymous functions in seriously non-trivial situations. For instance, you frequently see code like this (even this is a trivial example, but it's indicative of the larger structures people create, like the one currently used as an illustration on the main Wikipedia JavaScript article):

function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function() {
        return this.firstName + " " + this.lastName;
    },
    toString: function() {
        return this.getFullName();
    }
};
var p = new Person("John", "Doe");
var s = p.toString();

We're creating a constructor named Person and then assigning two anonymous functions to its prototype.

"Huh?" I hear you say, "Those aren't anonymous, they have names — getFullName and toString."

Actually, they don't. (2015 update: They do now! The engine assigns the name based on the property the "anonymous" function expression is being assigned to. Much of the following is therefore outdated.) They're bound to properties with names, but the functions themselves don't have names. Don't believe me? Paste that code into your favorite JavaScript environment with a debugger, put a breakpoint on the return statement inside the function bound to the getFullName property (and isn't that a mouthful), and run until you hit the breakpoint.

What function does your debugger say you're in? What does the call stack look like?

That's right, if it's like most debuggers, it says you're in a function called (?) (an anonymous function) called from a function (?) (another anonymous function). How helpful. Stupid debugger.

Except it isn't the debugger's fault. It's not that the debugger is too stupid to see what the function's name is, it that it doesn't have one. Suppose I change that code to this:

function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function() {
        return this.firstName + " " + this.lastName;
    },
    toString: function() {
        return this.getFullName();
    }
};
Person.prototype.foo = Person.prototype.toString;
var p = new Person("John", "Doe");
var s = p.foo();

Using your same breakpoint, run it again. Again the debugger tells you you're in function (?) called from function (?). So should it be saying that the outer function's name is foo? Or toString? If you said foo, let me ask the question a different way. Let's add this to the end:

function bar(context, func) {
    var a;
    a = [];
    a[42] = func;
    return a[7 * 6].call(context);
}
var s2 = bar(p, p.foo);

So now, what's the function's name? a[7 * 6]? a[42]? func? foo? toString? All of them? None?

None of them, of course. It doesn't matter how many different ways we refer to it, the function itself has no name, and the debugger wouldn't be able to figure out a name for it in any but the most trivial cases even if we wanted it to (which we don't; how confusing would that be?!).

Do we care?

So okay, they don't have names. Do we care? Well, I do. When an exception gets thrown and reported to me by the debugger, I want to know where it was thrown. I want to know what the call stack looks like. I want to look at my list of breakpoints and see something meaningful.

What do we do about it?

Your first thought, like mine, might be to do this:

function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
Person.prototype = {
    getFullName: function getFullName() { // <= PROBLEMATIC CHANGE HERE
        return this.firstName + " " + this.lastName;
    },
    toString: function toString() {       // <= AND HERE
        return this.getFullName();
    }
};

Note that I've now put function names after the function keyword; I haven't changed anything else. Seems reasonable, doesn't it?

It is, but it doesn't work — at least, not in most JavaScript implementations in the wild. Arguably it should work, but then again, arguably I should be slim, rich, and famous. Things aren't always as they should be. Those things above are called Named Function Expressions — they're function expressions (as opposed to function statements) and they have names. Internet Explorer (well, JScript in general), Safari, and several others have bugs related to them. For details, check out Juriy Zaytsev's article on the subject.

Okay, so we can't do that. How 'bout this:

function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
function getFullName() {
    return this.firstName + " " + this.lastName;
}
function toString() {
    return this.getFullName();
}
Person.prototype = {
    getFullName: getFullName,
    toString:    toString
};

Well, okay, that does work, but it still has some issues:

I seem to be throwing all sorts of things into the global namespace.
The function name toString doesn't tell me much, assuming I have several different objects that all support the toString function.
I have to repeat the function name three times, just to make things work.

So not ideal, but perhaps a step closer.

Mini Modules

So let's take the global name problem first: How do I avoid polluting the global namespace? The usual way is with the module pattern: Define a scoping function, define your bits and pieces within the scoping function, and only make public (export) the things you really need to be global (and in many applications of the module pattern, that can be nothing at all).

So the Person object isn't a module, but using a scoping function is absolutely how we can avoid global names, so let's see how that could work:

function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function getFullName() {
        return this.firstName + " " + this.lastName;
    }
    function toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: getFullName,
        toString:    toString
    };
})();

There, we define an anonymous function, define our stuff inside it, and call it right away. Now our getFullName and toString functions aren't globals anymore; the only way you can access them is via the properties on the Person prototype. And yet, they have proper names. Yay! (And yes, I'm aware of the irony of using an anonymous function — my scoping function — to avoid having anonymous functions. But wait! I could name the scoping function. But then I'm polluting the global namespace. Well, I can scope the scoping function. Oh, but...)

We'll revisit that structure in a bit, but let's move on to problem #2: toString is pretty generic. My call stack won't tell me which toString I'm dealing with — the one on Person? Place? Thing? So how do we deal with that?

Well, in a funny way, by exploiting the fact that the function name has nothing to do with the name of the property that refers to the function:

function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function Person_getFullName() {
        return this.firstName + " " + this.lastName;
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString:    Person_toString
    };
})();

There, now my call stack will show Person_toString and I'll know where I am. The property is still toString.

And a straw man in the back calls out: "But that means I'm typing 'Person' all over the place!" (An issue closely related to problem #3 from the earlier list.) To which my response is: Yes. Get over it, and configure your tools to help. Seriously. In today's world, our editors are very configurable with templates and triggers and macros and such, and to my mind the value (in a project of any size) of knowing what you're dealing with outweighs a bit of extra typing.

"But what about download size?!" someone who works on web apps calls out. And it's true, for the toString function we're saying "toString" three times instead of once, and "Person" twice (in relation to toString). Similarly, we've said "getFullName" two "extra" times (and there are its two "Person"s as well). In all, there's a fair bit of repetition in that solution.

Well, there is, but I very much doubt your final script's download size will be significantly impacted. Here's why: The names you use in this context are (I assert) a very small part of your overall script. Remember that when you call these functions, you do so through the (shorter) property name. Most of your script is probably logic. It's true we want to keep functions as small as feasible (more smaller functions is better than fewer larger functions, for maintenance and readability reasons), but my take is that this is really only needed for public functions on instances, not the private functions used for implementation of an instance.

"Private functions?" Yeah! Because having the scoping function makes it really easy to have private object functions. I've written about that aspect in some detail before, but just briefly:

function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function buildFullName(fname, lname) {
        return fname + " " + lname;
    }
    function Person_getFullName() {
        return buildFullName(this.firstName, this.lastName);
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString:    Person_toString
    };
})();

The buildFullName function is completely private, only the other functions in that scoping function have access to it. The way I called it there, it doesn't have access to the instance's properties, but you can either use it as an object-wide (rather than instance-wide) function (what we'd call a class function in class-based programming), or you can call it differently and it's a truly private instance function:

function Person(fname, lname) {
    this.firstName = fname;
    this.lastName = lname;
}
(function() {
    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }
    function Person_getFullName() {
        return buildFullName.call(this);
    }
    function Person_toString() {
        return this.getFullName();
    }
    Person.prototype = {
        getFullName: Person_getFullName,
        toString:    Person_toString
    };
})();

(Note how Person_getFullName now uses call to call it.)

My compromise on the repetition thing is to be explicit in my public functions, but not to worry about it in private implementation functions — because I can easily tell what object is involved from the other calls on the stack (since they're private, they can only be called from the public ones). Since in most implementations the private functions will substantially outnumber the public ones, we end up with not too much impact (and a lot of debugging/maintenance gain).

Help yourself to help your tools help you

To me, the Person structure we've been kicking around here is a bit clunky, but there are lots of things you can do to make it less so through helper functions. My usual declaration for Person would look like this (but with comments):

var Person = Class.define(function() {
    var pubs = {};

    pubs.initialize = Person_initialize;
    function Person_initialize(fname, lname) {
        this.firstName = fname;
        this.lastName = lname;
    }

    pubs.getFullName = Person_getFullName;
    function Person_getFullName() {
        return buildFullName.call(this);
    }

    pubs.toString = Person_toString;
    function Person_toString() {
        return this.getFullName();
    }

    function buildFullName() {
        return this.firstName + " " + this.lastName;
    }

    return pubs;
});

Yes, technically that's one line longer (not counting the vertical breaks I added for clarity), but to me it's clearer and there's little chance of my forgetting to export a public function if I do so right where the function is declared. My Class.define function creates a constructor for me (for various reasons) and then has that constructor call my initialize function. It automatically knows to call the anonymous function I hand it and use the resulting object's properties as the public members for the thing I'm defining, so I don't have to do that explicitly. For details, check out my article on efficient supercalls.

Conclusion

So that's my take on it, anyway: Help yourself, help your tools help you. Almost any time I find myself starting to write an anonymous function, I stop, write a function with a name, and use that instead — even if I'm writing a closure that will be used and then thrown away, too often I've run into exceptions or call stacks where I just couldn't figure out where I was when I used anonymous functions. Maybe the Array#sort is a bit too trivial to bother, or maybe not (hint: try it on Internet Explorer with this array: [10, "Maria", "Joe"] -- and don't you want to know which function threw that exception?).

Happy coding!

Thursday, 16 October 2008

Minimizing Download Times

Hello all,

That's right, first post since...wow, since April. And it's not even a post, it's sort of a link-post.

I've been doing some work helping build the Prototype user community (moderating the user discussion group, creating an unofficial wiki, that kind of thing) and as part of that I've been doing little mini-articles, much like the ones I expected to do here.

So if you're writing web applications or web pages and you're interested in minimizing the download times for your scripts, check out this article I posted over there: Tip - Minimizing Download Times

Saturday, 29 March 2008

What's in a name?

Micro-post today, folks:

You see a fair bit of confusion from newbies in mailing lists around the "name" attribute vs. the "id" attribute. (Edit 21 March 2010: Confusion not helped by a bug in Internet Explorer; more here.) For instance, I recently saw a "how do I do this" -style post asking how to deal with a form, with the sample being:

<form name='form1'>
<label><input type='checkbox' id='cb1'> Checkbox 1</label>
</form>

The poster wanted to know how to retrieve the form and submit it (along with some further information) via Prototype's Ajax.Updater. So he wanted to use the serialize() method Prototype adds to form elements when it extends them (e.g., when you retrieve the element using $()).

Consequently, the use of "id" and "name" attributes in his example was exactly backward: He wanted to be able to retrieve the form using $(), which is a general-purpose routine that retrieves elements by their unique ID, and then have the form fields submitted -- but the form field had no name.

Here's the mantra:

Elements have IDs; form fields have names.

IDs are unique in the document; form field names are not (necessarily).

Two further notes:

1. Form fields can also have IDs if you need to refer to their elements in your code (to enable/disable them, etc.), but in terms of sending in a form, fields have names.

2. You don't have to use an ID to get at the element for a form; you can use a name and then get at the form element via document.YourFormName in your JavaScript code. But nowadays we mostly look elements up by their unique IDs; and in the specific case of the question from the poster above, since he was going to want to use $(), he would want an ID.