Nifty Snippets: pitfall

Showing posts with label pitfall. Show all posts

Tuesday, 16 March 2010

...by any other name, would smell as sweet

Although it's not a new problem, lately I've been seeing so many people running into a specific issue with Internet Explorer that I thought it worth just jotting down what the problem is and how to get around it.

The problem, in brief, is conflation. Internet Explorer (v6 and v7) mixes together the namespaces of the id and name attributes, which should be completely distinct from one another. id, you'll recall, must be unique within the document; it uniquely identifies an element. name, on the other hand, is not for uniquely identifying things (well, not mostly). It's used for a couple of different things, such as giving form fields the names they'll have when submitted, or giving anchors a name. There's no requirement that name values be unique — in fact, when doing forms, there are lots of reasons for using the same name for multiple fields (radio buttons, for example).

Unfortunately, the Internet Explorer engine will find elements by name sometimes when it really shouldn't, specifically when you're using document.getElementById. So for instance, say you have a form in your document with a 'stuff' field:

<input type='text' name='stuff'>

and later in that same document you have a div with the id "stuff":

<div id='stuff'>...</div>

Consider this code:

var elm;
elm = document.getElementById('stuff');
if (elm) {
    alert("The element's tagName is " + elm.tagName);
}
else {
    alert("Couldn't find the element.");
}

In a browser that implements document.getElementById correctly, that should alert "The element's tagName is DIV". But on IE, it will alert "The element's tagName is INPUT" because it incorrectly finds the input field. The only way around this is to change the id or name of one element or the other.

Another related problem is that id values are not case-sensitive in IE6 or IE7, although of course the standard says they should be. So it'll also confuse an element with the id (or name) 'stuff' with one with the id (or name) 'Stuff'. (To be fair, surely you and I would as well?)

There's good news, though: Microsoft does document the behavior, and they've fixed it in IE8, so there's hope for the future.

Perhaps slightly OT, but lest people accuse me of Microsoft-bashing (er, I have been known...), let me just remind everyone who brought us XMLHttpRequest (the basis of Ajax) and innerHTML, both of which I use (directly or indirectly) in nearly every JavaScript-enabled browser project I do — and I bet you do too. So hey, they got many things wrong, but got some things very right as well. Also useful to remember — as we continue to bash IE6 with repeated shouts of "Die! Die!" — just how much amazingly better (faster, less crash-prone, more feature-rich) it was than its chief rival, back in the day.

Tuesday, 1 April 2008

You must remember 'this'

Of all the tech blogs in all the sites in all the worldwide web, you walk into mine...

If you hang out in JavaScript-oriented newsgroups like these for any length of time, you will eventually see some variation of this question:

Hey, why doesn't this work?
function MyWidget(name)
{
    this.name = name;
    this.element = null;
}
MyWidget.prototype.showName = function()
{
    alert('The name is ' + this.name);
}
MyWidget.prototype.hookElement = function(element)
{
    this.element = element;
    Event.observe(this.element, 'click', this.showName);
}
function test()
{
    var widget;

    widget = new MyWidget('Test Name');
    widget.hookElement(document.getElementById('testDiv'));
}
"testDiv" is a div in the document, and I know that I'm not calling the test() function before the DOM is loaded, so why is it when I click the div I get the message "The name is undefined"?!

(As always, I'm using some convenience syntax in the above for hooking up the event handler.)

The OP (original poster) might even follow on with:

I even tried changing the observe() line to this:
Event.observe(this.element, 'click', function() { this.showName(); });
because I heard somewhere that you have to do that, but that's even worse, it causes an error saying this.showName() isn't a function?!

The issue here is that the OP hasn't quite grokked "this" and its special role in the JavaScript world.

I talked a bit about 'this' over here, but I wanted to do a post focussing on the specific pitfall the OP above, like so many of us, has fallen into (forgetting 'this') and how you deal with it.

Let's look at what's wrong with this line first:

Event.observe(this.element, 'click', this.showName); // Wrong

JavaScript doesn't have methods (see link above), and so this.showName just returns a function reference with absolutely no connection to the instance the OP wanted to bind to the element. It's just a function. (Used properly, this is a powerful feature, but in this situation it's causing the OP some trouble.) Recall that showName is defined like this:

MyWidget.prototype.showName = function()
{
    alert('The name is ' + this.name);
}

Within the code, 'this' is determined not by where the event handler is set up, but by how the function gets called. Most likely, 'this' will be a reference to the element that was clicked ('testDiv'), because modern browsers use the element related to the event as 'this' within event handlers. Consequently, this.name is undefined unless the element in question happens to have a name attribute.

So to get the intended effect, you have to wrap the call to this.showName() so that 'this' is the MyWidget instance when the code gets executed -- you must remember 'this'. Which is probably what the OP heard about when he tried this:

Event.observe(this.element, 'click', function() {
    this.showName();
}); // Still wrong

This is getting closer, and in fact it would work if we were using a variable to reference the widget rather than 'this', but because it's 'this', we actually still have exactly the same problem we had before: When the event handler gets called, 'this' is the element, not the widget, and so there's no showName() function to call.

So how do we deal with this? Well, here's one approach I've seen to rewriting the hookElement function:

MyWidget.prototype.hookElement = function(element)
{
    var self;

    this.element = element;
    self = this;
    Event.observe(this.element, 'click', function() {
        self.showName();
    });
}

This works because we're no longer using 'this' within the event handler, we're using 'self' (the event handler has access to 'self' because it's a closure; more here). So that solution works. I can't say I like it, though. It just feels...hacky, I guess. But still, it works, and although it looks a bit funny the first time, if you're familiar with the idiom you read right past it thereafter. You just need to be sure the closure isn't unnecessarily preserving some other big amount of data from elsewhere in the function.

Personally, though, I prefer using a reusable "binding" function. Many JavaScript toolkits have these (such as Prototype's bind() and bindAsEventListener()), but it's not complicated:

function bind(f, obj)
{
    return function() {
        return f.apply(obj, arguments);
    };
}

This is a function factory: It creates functions that, when called, will call the given function with the given object set as 'this' (using JavaScript's convenient apply() function; insert your own "The fundamental things apply" joke here). Now we can rewrite the OP's hookElement function like so (changes from the original at the top in bold):

MyWidget.prototype.hookElement = function(element)
{
    this.element = element;
    Event.observe(this.element, 'click',
        bind(this.showName, this)
    );
}

You might be wondering why we have to specify 'this' twice. Remember that this.showName just returns a function reference, with nothing about the instance (we could replace this.showName in the above with MyWidget.prototype.showName if we liked). If we want bind() to know what instance we want to bind the function to, we have to specify it -- the this at the end.

And that's it! Now the event handler works as the OP expected it to.

Tuesday, 18 March 2008

The Horror of Implicit Globals

This code should cause an error, right?

function doSomething()
{
   var x;

   x = 10;
   y = 20;

   alert('x = ' + x);
   alert('y = ' + y);
}

Well, you'd think so, wouldn't you? (There's no declaration for y.) It doesn't, though. It also probably doesn't do what the author intended. Welcome to The Horror of Implicit Globals. The good news is that there's something we can do about it.

Update 2010/03/31: Please see the update at the end of the article, there's good news on this front.

So what did the code really do? Well, in a simple test, it would seem to do what the author intended, in that it shows two alerts:

x = 10
y = 20

But x and y are completely different; x is a local variable within the function, whereas y is created as a "global variable". Said "global variable" is not declared anywhere at all, it exists purely because it was assigned to by this function. Functions can create global variables willy-nilly with no advance declaration or indeed, with no declaration at all.

This is clearly a Bad Thing, so why would the language designers let it happen? Well, in some sense I think it's a side-effect. To see why we have the possibility of implicit globals, we have to delve into objects and properties.

If you've done any JavaScript stuff with objects, you know that you can assign a property to an object simply by, well, assigning it:

myobject.myprop = 5;

This sets the property myprop on the object myobject, creating a new property if the property didn't exist before. This is simple, it's convenient, and it's part of the power of JavaScript that objects can just acquire properties without any preamble.

But what does this have to do with implicit global variables? Well, you see, JavaScript doesn't have global variables. No, seriously. It has a global object (just the one), and that object has properties. Those properties are what we think of as global (or page-scope) variables. You might think that this is a distinction without a difference, but if it were, the code shown at the beginning of this post would fail because y isn't declared. It doesn't fail, because by assigning y a value, we created a property on the global object.

"Now hang on," you're saying, "there's nothing in that code referencing some kind of 'global object'." Ah, but there is. And that brings us to the scope chain.

I've mentioned the scope chain before. The scope chain is how JavaScript resolves unqualified references: In any given execution context (e.g., global code or function), there is a chain of objects that the JavaScript engine looks to when resolving an unqualified reference: It first checks to see if the top object has a property with the given name, and uses that property if it does; if not, it checks the next object in the chain, etc. The global object is always the last object in the chain. If the engine gets all the way down to the global object and we're assigning a value to the unqualified reference, the reference is assigned as a property of the global object. And since properties don't have to be declared in advance, voilá, implicit globals.

We can see this in practice. The JavaScript specification says simply that there is a global object and that it's always the root of the scope chain, but it doesn't say what it is -- which makes sense as JavaScript is a general-purpose language. In browser-based implementations, though, the global object has a name: window. Technically, the symbol window in browser-based JavaScript is a property of the global object that refers to the object itself. I mention this not only because it's useful to know, but because it lets us prove to ourselves that "global variables" are really properties of the global object. Let's add a line to our code from above:

function doSomething()
{
   var x;

   x = 10;
   y = 20;

   alert('x = ' + x);
   alert('y = ' + y);
   alert('window.y = ' + window.y);
}

This starts out by showing the same two alerts we had before:

x = 10
y = 20

...and then it shows a third:

window.y = 20

...because y and window.y both refer to the same property of the same object, the global object.

All of this holds true even if we do declare y at global scope:

var y;
function doSomething()
{
   var x;

   x = 10;
   y = 20;

   alert('x = ' + x);
   alert('y = ' + y);
   alert('window.y = ' + window.y);
}

This shows the same three alerts shown earlier. (There is a very slight technical difference between declared and undeclared globals that relates to the delete statement, but the distinction isn't important here, it's just a flag on the property.)

Now, some readers will be thinking "Cool! This means I don't have to have 'var' statements for my globals! I can save a bit of space on the downloaded script files!" Well, true, you could. It's an awfully bad idea, though. Not only would it be a bit unfriendly to anyone else trying to read your script (I suppose you could address that somewhat with comments that get stripped out before download), but it would also mean you couldn't use any lint tools, and you really, really want to. Because the ramification of all of this is that a simple typo, perhaps:

thigny = 10;

instead of

thingy = 10;

can create a new global variable in your page, introducing a bug that's awfully hard to find. Fortunately, though, people have created various tools to do lint checking on JavaScript, tools that can find that error for you before you even get to the testing stage, much less production.

And so there we are, the horror of implicit globals -- and the relief of knowing that we have a defense against them!

Update 2010/03/31: Great news for anyone who's been hit by the Horror of Implicit Globals. The new ECMAScript 5th edition specification is out, and it introduces "strict mode." One of the (several) things strict mode does is prevent the very thing this article warns about: Implicit globals. If we were using strict mode, the function at the beginning of this page would have a syntax error rather than a very subtle bug. Result!

Monday, 3 March 2008

Poor misunderstood 'var'

It seems most programmers coming to JavaScript from C, C++, Java, and the like equate the var statement with variable declaration statements in the languages they come from and use it the same way. And at some casual level that's reasonable; but it can lead you down a misleading path...

Consider this code:

function foo()
{
    var ar;

    // ...do some stuff, create an array in 'ar'...

    for (var index = 0; index < ar.length; ++index)
    {
        doSomethingWith(ar[index]);
    }
}

This is a common idiom, but a misleading one. You might think that index is only defined within the for loop (that's certainly the impression we're giving in the code). But it's not true: In fact, index is defined throughout the function -- within the loop, outside the loop, above the loop, and below the loop. The var statement defines a variable within the current scope (all of it, not just "from here on"), and unlike some other languages, in JavaScript blocks don't have any effect on scope; only functions introduce a new scope.

Consequently, the above function can be written as it is above, but also with the index declaration...

...at the top:

function foo()
{
    var ar;
    var index;

    // ...do some stuff, create an array in 'ar'...

    for (index = 0; index < ar.length; ++index)
    {
        doSomethingWith(ar[index]);
    }
}

...at the bottom:

function foo()
{
    var ar;

    // ...do some stuff, create an array in 'ar'...

    for (index = 0; index < ar.length; ++index)
    {
        doSomethingWith(ar[index]);
    }

    var index;
}

...anywhere in the middle:

function foo()
{
    var ar;

    // ...do some stuff, create an array in 'ar'...

    for (index = 0; index < ar.length; ++index)
    {
        var index;
        doSomethingWith(ar[index]);
    }
}

...or even all of them!

function foo()
{
    var ar;
    var index;

    // ...do some stuff, create an array in 'ar'...

    for (var index = 0; index < ar.length; ++index)
    {
        doSomethingWith(ar[index]);
    }

    var index;
}

We can get away with that last one because a var statement defining a variable that already exists in the current scope does not replace the variable (this is what keeps you from accidentally masking your function's arguments, or even the arguments array that's provided for you).

This seems like an odd way to define the var statement until you get into the plumbing of JavaScript and how it sets up calls to functions. You can get into some of that by reading my earlier post, Closures are not complicated, but the net effect of the plumbing is that all var statements are treated as though they were at the top of the function (if they have initializers, those become assignments and stay where they are).

So does that mean that the common idiom of declaring an indexer within the loop statement is "wrong"? Well, that's a matter of perspective, and ~~the older I get~~ the more experience I accumulate, the less I think in terms of absolutes like right and wrong. The language spec allows it, so in that sense it's not "wrong". In some ways, it's sort of a shorthand way of telling the next person reading the code that you're going to use it for the loop (and only for the loop, right?), so in that sense perhaps it's not "wrong".

But the further your code gets from expressing what's really happening, the easier it is for someone reading the code later (perhaps you!) to get the wrong end of the stick and introduce a problem. For example, suppose you have a 30-some-odd-line function and the loop appears in within the body of a conditional about two-thirds of the way down:

function foo(someArray)
{
    var thingy;
    var otherThingy;

    // ...20 lines of code...

    if (thingy > otherThingy)
    {
        for (var index = 0; index < someArray.length; ++index)
        {
            doSomethingWith(someArray[index]);
        }
    }

    // ...10 more lines of code...
}

Six months after you write this, Mike edits the function and needs to remember the index of something at the top so he can do something with it at the bottom; he declares an "index" variable, sets index at the top, and then uses it at the bottom, having missed the loop:

function foo(someArray)
{
    var thingy;
    var otherThingy;
    var index;

    index = findSomething(someArray);

    // ...20 lines of code...

    if (thingy > otherThingy)
    {
        for (var index = 0; index < someArray.length; ++index)
        {
            doSomethingWith(someArray[index]);
        }
    }

    // ...10 more lines of code...

    restoreSomething(someArray, index);
}

Mike's introduced a bug, an irritating, intermittent bug. Sometimes the restoreSomething call at the end fails for some reason; not always, mind, but sometimes. (Because index gets set by the loop, but only when thingy > otherThingy.)

Obviously, this bug could have been avoided if Mike had read through the entire function carefully before making his mods. Or if you'd chosen a different name for your index variable (in hopes of reducing the odds of Mike using it). Or it could have been caught by thorough unit tests that explore all conditions (and then Mike would have to go back and fix it).

But let's throw Mike a bone, eh? If we declare the variable in the text in the same place it's defined by the interpreter at runtime, we help him avoid making the mistake in the first place. And we like Mike, we don't want to trip him up...right?

Regardless of your decision about how to write your code, though, understanding what var is really doing can help you get that code doing what you want it to do.

Nifty Snippets