silnith, posts by tag: java - LiveJournal — LiveJournal

May. 14th, 2020

01:20 pm - Criticisms of C# and .NET: Dynamic Linking

Code is often distributed in reusable pieces called libraries. In C#, libraries are Windows .dll files, "Dynamic Link Library". (If you compile and run C# code on Linux or macOS, it will still produce and consume .dll files. Yeah, this was designed to be cross-platform. Sure it was.) Linking is the connecting of function calls in an executable to symbols in a library. In static linking, this is done at compile time and the code being called is fixed. In dynamic linking, this is deferred until the code is executed, so if a library is updated then code that links against it will automatically use the updated version for future executions.

In .NET Framework, the original runtime + standard libraries for C#, when a program is compiled, the compiler looks at the versions of all the libraries that the program uses, computes a cryptographic hash of each one, and embeds that information into the program. Then, when the program is run, it checks each library being loaded to make sure it exactly matches what the program was compiled against originally. If a library does not match exactly, the runtime aborts loading the library and in most cases the program stops running.

Look at the previous two paragraphs. In .NET Framework, libraries are distributed as "dynamic link libraries", but the runtime forces the library to exactly match what it was compiled against. This is the very definition of static linking, but it is misnamed as dynamic linking. When I first learned how .NET Framework worked, I was so stunned that I could do nothing but cover my face with my palm.

But it gets better! Developers immediately and correctly complained that this mechanism prevented them from updating the libraries used by an application. Microsoft's response was to invent a mechanism called "binding redirects". Binding redirects are written as entries in an XML file distributed alongside the executable. Inside the XML file, there are entries matching the signature that the .NET Framework linker expects to find for any particular library. This entry then specifies a new version number for the library that the .NET Framework linker should load instead of the one it was compiled against.

Read that again. .NET Framework uses static linking while calling it dynamic linking, and when people complained that that was not dynamic linking, it added an XML-configured replacement step in the static linking process. I cannot fathom words sufficient to express just how idiotic and backwards this whole process is.

By some miracle, when Microsoft undertook the effort to completely rewrite .NET, they did the sane thing and simply used dynamic linking as it has existed in the industry for longer than I have been alive. This rewrite was named .NET Core, and it is an immeasurably vast improvement over .NET Framework.

In Java, libraries are distributed as .jar files, which is also how applications are distributed. Libraries are discovered by the JVM scanning filesystem directories specified in an environment variable called CLASSPATH, very similar to how native executables locate native libraries on unix-style systems. In order to update a library, simply replace a .jar file with a newer version. Rational developers use a dependency management system such as Maven to manage their CLASSPATH automatically.
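As a rough illustration of the lookup described above, the JVM exposes the classpath it was started with (whether it came from CLASSPATH or a command-line option) through the standard java.class.path system property. The class name ClasspathDemo below is made up for this sketch.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; only the system property name is a real JVM feature.
class ClasspathDemo {

    /** Splits the JVM's effective classpath into its individual entries. */
    static List<String> entries() {
        final String classpath = System.getProperty("java.class.path", "");
        // The separator is ':' on unix-style systems and ';' on Windows.
        return Arrays.asList(classpath.split(Pattern.quote(File.pathSeparator)));
    }

    public static void main(String[] args) {
        // Each entry is a directory or .jar file the JVM searches for classes.
        for (String entry : entries()) {
            System.out.println(entry);
        }
    }
}
```

Replacing one of the listed .jar files with a newer version changes what gets loaded on the next run, with no recompilation of the application.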

12:21 pm - Criticisms of C# and .NET: Enumerations

For a while now, people have asked me why I do not want to work in .NET anymore. The list is long, and it's irritating having to pick out individual complaints to explain. Therefore, I plan to write a series of articles explaining exactly why C# and .NET are so inferior to Java, in excruciating technical detail.

For the first, let me explain the difference between enumeration types in the two languages.

C# enumerations are based on C++ enumerations, which were designed to be compatible with C. C++ enumerations are essentially an alias for a numeric primitive type such as int. Declaring an enum type consists of listing a sequence of constant names and optionally assigning a numeric value to each one. Any not assigned a numeric value take the value of the previous plus one, and the first takes the value of zero. However, there are no restrictions on the values assigned to any particular enumeration constant, so skipping values and duplicating values are perfectly legal and an often-used feature. In particular, if a library author wants to update the names of enumeration constants, new constants can be created and given the exact same values as the deprecated constants so compiled code will be identical. Another heavily-used example is the bitflag enumeration, where constants are assigned values that are powers of two so that they can be combined using bitwise operators. This is used for scenarios such as passing multiple options to a function in a single numeric primitive, saving stack and register space.

Java enumerations are fully-fledged objects. They can provide methods, implement interfaces, make use of polymorphism, and do anything else normal objects can do, with one exception: they are considered final, so they cannot be subclassed. In addition, the JVM generates an ordinal value for every declared enumeration constant, starting at zero and increasing by one for each declared name. These ordinal values are fixed once the class is compiled [footnote 1]; they cannot be reassigned or modified. They are guaranteed to be unique successive integer values starting with zero and ending at one less than the number of enumeration constants defined. If a new version of a library changes the ordering of the enumeration constants, the ordinal values will also be changed and the new class will not be compatible with the old class.
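To make the ordinal guarantees concrete, here is a minimal sketch. The Planet enum and the Labeled interface are invented for illustration; only ordinal(), name(), and values() come from the standard library.

```java
import java.util.Arrays;

class OrdinalDemo {
    public static void main(String[] args) {
        // Ordinals follow declaration order, starting at zero.
        for (Planet p : Planet.values()) {
            System.out.println(p.label());
        }
        // values() returns the constants in ordinal order.
        System.out.println(Arrays.toString(Planet.values()));
    }
}

/** Hypothetical interface, only here to show that an enum can implement one. */
interface Labeled {
    String label();
}

/** Hypothetical enum; the constant names are illustrative. */
enum Planet implements Labeled {
    MERCURY, VENUS, EARTH;

    @Override
    public String label() {
        // ordinal() and name() are inherited from java.lang.Enum.
        return ordinal() + ":" + name();
    }
}
```

Reordering the constants in the declaration would change every ordinal, which is exactly the compatibility hazard mentioned above.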

Old-school C programmers will instantly dislike Java enumerations for a variety of reasons. They are full objects instead of primitives, which takes more memory. They are passed by reference instead of value. It is not possible to combine them using bitwise operators. Method invocations go through the virtual table. Getting an ordinal value requires calling a method. Therefore, a lot of C and C++ programmers prefer C# enumerations over Java enumerations. But let's analyze these complaints in detail.

Full Objects

Java enumeration values are full objects instead of primitive numeric values. This may sound like a disadvantage, but in practice it really is not. The objects are created when the enum class is loaded by the classloader, and the objects remain static thereafter. All uses of enumeration constants in the code are just references to these static objects, so they are really just pointers once converted to machine code, and their size will be one native word for the architecture. On most machines this will be a 64-bit or 32-bit address, which is no less efficient than the numeric primitive that a C# enumeration constant would be. Modern architectures never load less than a native word at a time, so defining a type to be smaller than that does not make the resulting code any faster (and in some rare cases can actually make it slower). And since the constants refer to single instances, using enumerations causes no additional garbage and does not impact the garbage collector.
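A small sketch of what "references to static objects" means in practice: every route to a constant yields the same instance, so comparing with == is safe and nothing new is allocated. The Signal enum is a made-up example.

```java
class IdentityDemo {

    /** Looks a constant up by name; this returns the existing instance, not a copy. */
    static Signal lookup(String name) {
        return Signal.valueOf(name);
    }

    public static void main(String[] args) {
        Signal fromName = lookup("RED");
        // Reference comparison is enough because each constant exists exactly once.
        System.out.println(fromName == Signal.RED);
        // values() hands out a fresh array, but its elements are the same instances.
        System.out.println(Signal.values()[1] == Signal.GREEN);
    }
}

/** Hypothetical enum used only for this illustration. */
enum Signal { RED, GREEN }
```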

Bitwise Operations

C# enumerations can be combined using bitwise logical operators. Java enumerations cannot. On the surface this looks like an advantage for C#. But once you actually start using them, reality shapes up a little differently. In particular, modern development makes heavy use of standard libraries and standard data structures. C# code is filled with instances of Dictionary<Key, Value> implementing the interface IDictionary<Key, Value>. Java code is filled with instances of HashMap<K, V> implementing Map<K, V>. Good coders use the interfaces so that underlying types can be swapped out as needed. And in a truly epic fail, C# initially did not have any generics and then later provided reified generics, which meant the original interfaces were not compatible with the new generics forcing library authors to implement two different interfaces for each data structure. Because of this, all the standard data structures end up passing through the non-generic interface at some point, so any use of C# enumerations in standard data structures suffers the auto-boxing and auto-unboxing cost. [footnote 2] This makes the performance gains of using primitives wash away.

But the problem is even more insidious than that. With C++ enumerations and templates, one could make use of template specialization to provide optimized data structures for types using enumerations. Since C# does not have a mechanism for "generic specialization", this is not possible in C#. Enterprising coders (including myself) have nevertheless tried to write specialized maps and lists for enumeration types, only to find that the C# language specification actually forbids using the Enum type as a bound for any generic declaration. [footnote 3] A few people have ignored that rule and tried anyway, and found that it worked on the current implementation of C#. However, due to the language restriction, there is no guarantee that it will continue to work on future versions of C#, something that Microsoft emphasizes strongly.

Contrast this with Java. In Java, there are types EnumMap<K, V> and EnumSet<T> provided in the standard library. These are implementations of the Map<K, V> and Set<T> interfaces specifically written for enumeration types. The EnumSet<T> class stores the set values by taking the ordinal values of the enumeration constants and converting them to bitflags that it combines into numeric primitives. For any enumeration with 64 or fewer constants, it simply stores everything in one long variable. For enumerations with more than 64 constants, it stores everything in an array of long values, so there is no practical limit on the number of enumeration constants. The EnumMap<K, V> is even simpler: it stores map entries in an array with length equal to the number of enumeration constants. Since all enumerations are guaranteed to have generated ordinal values in the range [0, size), the range check for the array access can be optimized out and lookups end up being one type check (which can also be optimized out in many scenarios) and one array access. Contrast this with a C# data structure containing enumeration constants, which must generate the hash code, deal with hash collisions, and be prepared for bucket table resizing. In all scenarios, the Java enumeration structures are faster than C# structures containing enumerations, take less memory, and provide extensive opportunities for low-level optimization of the machine code by the JIT compiler.
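Here is a short sketch of both types in use. The Day enum and the countVisits and weekend helpers are names I made up for the example; EnumMap, EnumSet.range, and EnumSet.complementOf are the real standard-library calls.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class EnumCollectionsDemo {

    /** Counts occurrences per day; the map is backed by an array indexed by ordinal. */
    static Map<Day, Integer> countVisits(Day... visits) {
        Map<Day, Integer> counts = new EnumMap<>(Day.class);
        for (Day day : visits) {
            counts.merge(day, 1, Integer::sum);
        }
        return counts;
    }

    /** Builds the weekend as the complement of a contiguous range of constants. */
    static Set<Day> weekend() {
        EnumSet<Day> weekdays = EnumSet.range(Day.MON, Day.FRI);
        return EnumSet.complementOf(weekdays);
    }

    public static void main(String[] args) {
        // Keys iterate in declaration order regardless of insertion order.
        System.out.println(countVisits(Day.FRI, Day.MON, Day.MON));
        System.out.println(weekend());
    }
}

/** Hypothetical enum used only for this illustration. */
enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }
```

Both results are ordinary Map and Set instances, so callers never need to know that an enum-specific implementation sits behind the interface.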

Now think back to the popular usage scenario for C and C++ enumeration types, bitwise flags. As it turns out, thanks to the EnumSet<T> type, this exact scenario is fully supported in Java code. And the code ends up being even more clear. Here is some C++ code using enumerations.

enum foo {
    flag1 = 1 << 0,
    flag2 = 1 << 1,
    flag3 = 1 << 2
};

function_call("parameter", flag1 | flag2 | flag3);

The Java equivalent:

public enum Foo {
    Flag1,
    Flag2,
    Flag3;
}

obj.callMethod("parameter", EnumSet.of(Flag1, Flag2, Flag3));

When the code runs, both versions end up using a single numeric primitive to store the flags. But in the Java version, that container implements the Set<T> interface, and can be passed to any standard algorithm or existing code that accepts Set<T> or Collection<T>, and it will retain its optimal properties. Furthermore, the Java version does not need to worry about the 64-element limit (or 32-element limit on 32-bit architectures), nor does it require the programmer to list out every flag value by hand, which invites mistakes. And finally, checking for a specific value is far simpler:

C++:

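// Combined flags are plain integers, so they are stored in one.
unsigned my_foo = flag1 | flag2 | flag3;
// == binds tighter than &, so the mask test needs parentheses.
if ((my_foo & flag2) == flag2) {}
if ((my_foo & (flag1 | flag3)) == (flag1 | flag3)) {}

my_foo = my_foo & ~(flag2 | flag3);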

Java:

EnumSet<Foo> myFoo = EnumSet.allOf(Foo.class);
if (myFoo.contains(Flag2)) {}
if (myFoo.containsAll(EnumSet.of(Flag1, Flag3))) {}

myFoo.removeAll(EnumSet.of(Flag2, Flag3));

In the "remove all" case, the EnumSet<T> implementation will check if the parameter is also an EnumSet<T>, and if so it will use the bitwise logical operators to perform the set subtraction. Otherwise, it will iterate. The programmer can safely use the clearest code and not need to worry about the implementation details, and the generated code will still maintain its optimized behavior. Using the Java specialized types for enumerations always works correctly, never runs slower than not using them, and in all common cases is far more efficient. And at all points in the code standard interface types can be used without sacrificing anything.
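For anyone who wants to run the flag example, here is a self-contained sketch. The Option enum and the describe and clear helpers are hypothetical names; the point is that describe accepts any Set<Option> and the EnumSet passes through it unchanged.

```java
import java.util.EnumSet;
import java.util.Set;

class FlagDemo {

    /** Written against the Set interface; knows nothing about EnumSet. */
    static String describe(Set<Option> options) {
        return options.isEmpty() ? "none" : options.toString();
    }

    /** Returns a copy of flags with the given options removed (set subtraction). */
    static EnumSet<Option> clear(Set<Option> flags, Set<Option> toClear) {
        EnumSet<Option> result = EnumSet.noneOf(Option.class);
        result.addAll(flags);
        result.removeAll(toClear);
        return result;
    }

    public static void main(String[] args) {
        EnumSet<Option> flags = EnumSet.of(Option.VERBOSE, Option.FORCE, Option.DRY_RUN);
        System.out.println(describe(flags));
        System.out.println(describe(clear(flags, EnumSet.of(Option.FORCE, Option.DRY_RUN))));
        System.out.println(describe(clear(flags, EnumSet.allOf(Option.class))));
    }
}

/** Hypothetical option flags; no numeric values need to be assigned by hand. */
enum Option { VERBOSE, FORCE, DRY_RUN }
```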

Polymorphism

C# enumerations are simply primitive numeric types. There can be no added methods, no implemented interfaces, no polymorphism. Any additional behavior for the enumeration type must be provided by the users of that type, encouraging code copying and the breaking of encapsulation. Java enumerations are full objects and can provide methods, contain members, implement interfaces, expose static members, and more. The behavior of having non-contiguous values or duplicate values can be provided with a custom method; it is only the ordinal() method that cannot be modified. Classes are free to provide their own customOrdinal() method that returns anything at all. (But in practice virtually all uses of non-contiguous ordinals are better handled by the specialized data structures.)
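A sketch of the custom-value idea, assuming an enum that needs non-contiguous numeric codes. HttpStatus, code(), and fromCode() are hypothetical names standing in for the customOrdinal() method mentioned above.

```java
class StatusDemo {
    public static void main(String[] args) {
        for (HttpStatus status : HttpStatus.values()) {
            // ordinal() stays contiguous; code() carries the non-contiguous value.
            System.out.println(status.ordinal() + " -> " + status.code());
        }
        System.out.println(HttpStatus.fromCode(404));
    }
}

/** Hypothetical enum whose constants carry arbitrary numeric values. */
enum HttpStatus {
    OK(200),
    NOT_FOUND(404),
    SERVER_ERROR(500);

    private final int code;

    HttpStatus(int code) {
        this.code = code;
    }

    /** The custom numeric value, independent of the generated ordinal. */
    public int code() {
        return code;
    }

    /** Reverse lookup, kept inside the type instead of being copied into every caller. */
    public static HttpStatus fromCode(int code) {
        for (HttpStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown code: " + code);
    }
}
```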

And there is one feature that is quite powerful but a lot of people overlook. Java enumeration values can override methods per-constant.

public enum Foo {
    Value1,
    Value2 {
        @Override
        public String surpriseMe() {
            return "This is a surprise!";
        }
    },
    Value3;

    public String surpriseMe() {
        return "No.";
    }
}

Foo foo = ...;
System.out.println(foo.surpriseMe());

Calls to Foo.Value2.surpriseMe() will be polymorphic and return a different value than calls to Foo.Value1.surpriseMe(). Of course, this could also be achieved by declaring a String member that is initialized in the constructor, then passing different values to the constructors of different enumeration members. There is a lot of flexibility when using enumeration types in Java.
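The constructor-based alternative mentioned above might look like the following sketch. It reuses the Foo and surpriseMe names from the earlier listing; the field name and the demo class are my own.

```java
class SurpriseDemo {
    public static void main(String[] args) {
        for (Foo foo : Foo.values()) {
            System.out.println(foo + ": " + foo.surpriseMe());
        }
    }
}

/** Same observable behavior as the per-constant override, using data instead of subclasses. */
enum Foo {
    Value1("No."),
    Value2("This is a surprise!"),
    Value3("No.");

    // Set once per constant when the enum class is initialized.
    private final String surprise;

    Foo(String surprise) {
        this.surprise = surprise;
    }

    public String surpriseMe() {
        return surprise;
    }
}
```

The override form fits when constants differ in behavior; the constructor form is usually tidier when they differ only in data.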

Overall the advantages of Java enumerations are so great that I actively look for opportunities to use them. In my LR(1) parser generator, I was careful to keep the type of terminal symbols unrestricted specifically because parser generators involve lots of set computations. The only restriction I placed was that the type implement a marker interface, and the caller could provide a custom factory for instances of sets of that type. Because of this, the code seamlessly handles both enumeration types and non-enumeration types, and providing a custom factory that generates instances of EnumSet<T> provides a noticeable improvement in performance with no code or logic changes. Forgetting to provide that custom factory results in it using standard HashSet<T> instances and everything works normally, just not as fast.
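This is not the parser generator's actual code, just a minimal sketch of the pattern, assuming the factory is modeled as a Supplier<Set<T>>. The Terminal marker interface, the Token enum, and the union method are all hypothetical.

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

class SetFactoryDemo {

    /** Hypothetical marker interface for terminal symbols. */
    interface Terminal {}

    /** Hypothetical terminal type that happens to be an enum. */
    enum Token implements Terminal { NUMBER, PLUS, TIMES, LPAREN, RPAREN }

    /** A generic set computation that only ever asks the factory for new sets. */
    static <T extends Terminal> Set<T> union(Supplier<Set<T>> factory, Set<T> a, Set<T> b) {
        Set<T> result = factory.get();
        result.addAll(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Enum-aware factory: every set the algorithm creates is an EnumSet.
        Supplier<Set<Token>> fast = () -> EnumSet.noneOf(Token.class);
        // Default factory: works for any type, enum or not.
        Supplier<Set<Token>> fallback = HashSet::new;

        Set<Token> a = EnumSet.of(Token.NUMBER, Token.LPAREN);
        Set<Token> b = EnumSet.of(Token.LPAREN, Token.PLUS);

        // Same logic and same result; only the backing implementation differs.
        System.out.println(union(fast, a, b).equals(union(fallback, a, b)));
    }
}
```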

Footnotes

  1. I believe the ordinal values are actually generated when the class is loaded, not when it is compiled. The distinction may seem trivial but it does come into play in cases like serialization of enumeration types.
  2. This was true when I stepped through code using enumeration types in C# back when I (briefly) worked for Microsoft. It may have been fixed since then.
  3. This was true when I read the specification back when I worked for Microsoft. It may not be true for more current versions of C#.


Jun. 8th, 2018

01:02 am - C# and Java

My new job is entirely working with C# on the Azure platform. After my brief stint at Microsoft, I deliberately avoided adding C# to my resume because I did not want to work in it again. A couple co-workers from my previous job convinced me to join them at Starbucks, and since I like them I decided to go for it and give C# another try.

The new codebase is wildly better than anything I had to deal with at Microsoft. It uses standard design patterns, is modular, has dependency injection and normal logging and unit tests. Overall it is a significantly better experience than the prior one.

However, even though I've been able to find C# analogues for most of the things I've come to rely on in the Java ecosystem, there is a constant stream of little frustrations and gotchas that make me appreciate all the things that Java provided for me.

Here is an example. Today I tried to figure out how to inject a database connection into a handler in a standard C#-idiomatic way. I was looking for something akin to JNDI in an application container, where I set up the database connection in external configuration and the code merely accepts and uses a generic connection interface.

After a few hours of searching online, I managed to cobble together enough tiny clues and analyze enough bad code samples to produce roughly this code (edited to remove company-specific details).

using System;
using System.Data;
using System.Data.Common;
using System.Threading.Tasks;

namespace Org.Silnith
{
    public class DatabaseToucher
    {
        private readonly DbProviderFactory _dbProviderFactory;
        private readonly string _connectionString;
        
        public DatabaseToucher(DbProviderFactory dbProviderFactory, string connectionString)
        {
            _dbProviderFactory = dbProviderFactory;
            _connectionString = connectionString;
        }

        public async Task<int> Update(string transactionId)
        {
            using (var connection = _dbProviderFactory.CreateConnection())
            {
                if (connection == null)
                {
                    throw new Exception();
                }
                connection.ConnectionString = _connectionString;

                await connection.OpenAsync();
                try
                {
                    using (var command = connection.CreateCommand())
                    {
                        const string transactionIdParameterName = "@transactionId";
                        const string requestSentParameterName = "@requestSent";
                        var insertStatement = $"insert into transactions (transaction_id, request_sent) values ({transactionIdParameterName}, {requestSentParameterName})";

                        var transactionIdParameter = command.CreateParameter();
                        transactionIdParameter.ParameterName = transactionIdParameterName;

                        var requestSentParameter = command.CreateParameter();
                        requestSentParameter.ParameterName = requestSentParameterName;

                        command.CommandText = insertStatement;
                        command.Parameters.Add(transactionIdParameter);
                        command.Parameters.Add(requestSentParameter);

                        command.Prepare();

                        transactionIdParameter.Value = transactionId;
                        requestSentParameter.Value = false;

                        using (var transaction = connection.BeginTransaction(IsolationLevel.Serializable))
                        {
                            command.Transaction = transaction;

                            var rowsUpdated = await command.ExecuteNonQueryAsync();

                            transaction.Commit();

                            return rowsUpdated;
                        }
                    }
                }
                finally
                {
                    connection.Close();
                }
            }
        }

        public async Task<bool> Query(string transactionId)
        {
            using (var connection = _dbProviderFactory.CreateConnection())
            {
                if (connection == null)
                {
                    throw new Exception();
                }
                connection.ConnectionString = _connectionString;

                await connection.OpenAsync();
                try
                {
                    using (var command = connection.CreateCommand())
                    {
                        const string transactionIdParameterName = "@transactionId";
                        var selectStatement = $"select count(*) from transactions where transaction_id = {transactionIdParameterName}";

                        var transactionIdParameter = command.CreateParameter();
                        transactionIdParameter.ParameterName = transactionIdParameterName;

                        command.CommandText = selectStatement;
                        command.Parameters.Add(transactionIdParameter);

                        command.Prepare();

                        transactionIdParameter.Value = transactionId;

                        using (var transaction = connection.BeginTransaction(IsolationLevel.Serializable))
                        {
                            command.Transaction = transaction;

                            var count = (int) await command.ExecuteScalarAsync();

                            transaction.Commit();

                            return count > 0;
                        }
                    }
                }
                finally
                {
                    connection.Close();
                }
            }
        }
    }
}

When my co-worker asked me what about the .NET database connection API was making me grumble, I took a deep breath and produced a rant a good thirty seconds long, cut off only because I started a coughing fit and could not continue.

For reference, this is the Java code that I wrote tonight that provides the same functionality.

package org.silnith.temp;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.inject.Inject;
import javax.sql.DataSource;

public class DatabaseToucher {
    
    private final DataSource dataSource;
    
    @Inject
    public DatabaseToucher(final DataSource dataSource) {
        super();
        this.dataSource = dataSource;
    }

    public int insert(final String transactionId) throws SQLException {
        try (final Connection connection = dataSource.getConnection()) {
            connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            connection.setAutoCommit(false);
            try (final PreparedStatement insertStatement = connection.prepareStatement("insert into transactions (transaction_id, request_sent) values (?, ?)")) {
                insertStatement.setString(1, transactionId);
                insertStatement.setBoolean(2, false);
                
                final int rowsUpdated = insertStatement.executeUpdate();
                
                connection.commit();
                
                return rowsUpdated;
            }
        }
    }
    
    public boolean query(final String transactionId) throws SQLException {
        try (final Connection connection = dataSource.getConnection()) {
            try (final PreparedStatement selectStatement = connection.prepareStatement("select count(*) from transactions where transaction_id = ?")) {
                selectStatement.setString(1, transactionId);
                
                try (final ResultSet resultSet = selectStatement.executeQuery()) {
                    while (resultSet.next()) {
                        int count = resultSet.getInt(1);
                        
                        return count > 0;
                    }
                    throw new IllegalStateException("Cannot reach this point.");
                }
            }
        }
    }
    
}

Note that I added extra lines to the Java version in an attempt to make it as functionally-equivalent to the C# version as possible. The original was actually shorter.

So why does the C# version bother me so much? Let me count the ways.

  1. The DbProviderFactory object only provides a mechanism for instantiating database drivers. It does not encapsulate all the information needed to create a connection to a specific database. Therefore, you also need to inject the database connection string to every location where you want to use a connection. Why does every object need to know what the database connection string is? That's breaking the abstraction in a horrible way, and allowing a significant source of errors.
  2. The DbProviderFactory.CreateConnection method can return null. So inside of a using block, you also need to check the variable to see if it is null and handle the failure.
  3. I already created a connection, now I also have to set the connection string and then open the connection, too? I think of a database connection as a functional object, not a class that provides the functionality for connecting manually. I said give me a connection, not give me the means to build a connection. In Java, when I get a connection it is an open connection to the database, ready to execute queries and updates.
  4. The idea of having named parameters always sounds nice in theory, but once you try it in practice you quickly discover that you have to repeat yourself constantly. I can either type the parameter names twice, once when specifying them and again when using them in the SQL statement, or I can use string interpolation to re-use a constant with the names in them. Reality does not come out as clean as theory predicted.
  5. I create objects to represent the parameters in the statement. It sounds reasonable, but in practice it does not really add any additional descriptive power or improve the type safety at all. And why do I need to attach the parameters back onto the command from which I created them in the first place? I said, "Have this command create a parameter," so it is perfectly reasonable to expect the parameter object to be associated with the command when I receive it.
  6. On a similar note, why do I have to use the connection to create a command, use the same connection to create a transaction, and then associate the two myself? A connection is inherently stateful, the whole point of having a persistent connection is so it can maintain state. If I can create a command and not associate it with the transaction, does that mean I can run multiple transactions on a single connection? That truly defies logic, you would be maintaining multiple states for a single stateful object (the connection). So if it makes no sense to have a command not associated with the open transaction, then why are they not pre-associated? It is additional work for the programmer and an opportunity for errors.
  7. Oh, yay. Asynchronous method calls. So instead of the thread blocking and the operating system doing a context switch, instead the programming language runtime can emulate a context switch. You still need to save and restore the stack, and you still incur the same cache and buffer change costs. But now there are two context switching mechanisms instead of one, and the new mechanism breaks a lot of the assumptions that the operating system was built around, as well as making useless any specialized hardware the machine offers for context switching.
  8. Oh, and since we had to manually open the connection instead of having the open implicit in acquiring it, we are also responsible for closing it as well, and we cannot make use of a using block for it since the open was a simple method call, not an object retrieval.

Now contrast with the Java version.

  1. The DataSource is an abstraction for the database itself, not a piece of code used to communicate with the database. Acquiring a connection gets a functional connection, ready to be used. Any failure throws an exception, so the returned object is guaranteed to never be null.
  2. Creation of the SQL prepared statement parameters is implicit in creating the statement abstraction object itself. Parameters are positional, so no need to keep names in multiple locations synchronized.
  3. In Java there is no such thing as disposing of an object. If an object maintains resources that must be released, it is closed, which is what the try-with-resources block does. There is only one abstraction for releasing resources, not two. (Distinct close versus dispose.)
  4. Transactions are part of the state of a connection, not a distinct entity. It is not possible to have a statement outside of a transaction, or to have multiple open transactions associated with a connection. A transaction begins when the previous one ends.

The Java version is just so much cleaner, conceptually, than the C# version that it boggles my mind anybody could see the C# code as superior in any way to the Java code. I know many people do, and many of them are very smart and experienced people. I just cannot see the world in the way that they do. I really do not mean to belittle or denigrate those people. I am simply befuddled.


May. 6th, 2011

08:05 pm - Gird your loins, we're going technical.


Sometimes I think the true mark of intelligence (or, perhaps, wisdom) is knowing when to avoid complicated designs and requirements, and stick with elegant simplicity.

Current Mood: accomplished

Jan. 3rd, 2011

07:45 pm

Sample code I encountered in a professional setting. Why this code is bad.

To summarize, use the standard libraries! They are better than your code, even if you have thirty years of experience.

Aug. 5th, 2007

11:46 pm - It feels good!

It feels good to turn all the unit tests green. I have successfully migrated another class to a clean and sane structure, and eliminated all the references to the custom pooling and threading and synchronization. It was difficult to figure out why that one test was failing, until I finally figured out that the framework code was going through several layers of calls to superclass code that invoked callbacks in the subclass, up and down, back and forth, jumping all over. Once I tracked it all down and read through it, I realized it was doing something quite simple, and I just implemented it directly in a fraction of the code and no subroutine calls. The call() method is a bit long now, but it is verbosely documented and all the code is clear-cut procedural programming and one-purpose for loops.

This whole project is going to take forever. The codebase is organized as a bunch of “Managers” that are run for each public API, which in turn invoke the individual plugins that do all the calculating. I’m going through the Managers one by one, and converting each individually. They don’t really share much code, so that’s the only way to do it. And each one involves hours of head-scratching to figure out the obscure unintuitive (and completely undocumented) behavior. One a day is pretty good.

Current Mood: pleased

Aug. 3rd, 2007

01:40 am

In brighter news, I completed the first step in tearing apart and rebuilding our service framework. This week I checked in the first sections of code to be rewritten to invoke plugins directly and execute them by passing them to the java.util.concurrent.ExecutorService functionality. Given the horrendous performance we see under load and the highly erratic latencies that vaguely correspond to the latencies of our synchronized blocks, I fervently hope and strongly believe that my work will result in drastic performance improvements.

I mean, really now, who writes their own custom thread pooling implementation in Java?
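For comparison, handing work to the standard library takes very little code. This is only a sketch of the general approach, not our framework: PluginRunner and runAll are invented names, and each "plugin" is assumed to be a plain Callable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PluginRunner {

    /** Runs every plugin on a standard thread pool and collects results in task order. */
    static List<Integer> runAll(List<Callable<Integer>> plugins) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns futures in task order.
            for (Future<Integer> future : pool.invokeAll(plugins)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for plugins", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A plugin failed", e.getCause());
        } finally {
            // Lets the worker threads exit once the queued work is done.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> plugins = new ArrayList<>();
        plugins.add(() -> 6 * 7);
        plugins.add(() -> "plugin".length());
        System.out.println(runAll(plugins));
    }
}
```

No custom pooling, no hand-rolled synchronized blocks; the queueing and thread management live entirely inside java.util.concurrent.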

Current Mood: accomplished

Feb. 16th, 2006

08:38 pm - The dangers of cloning.

Cloning is a commonly known and selectively used tool for a variety of purposes. Most people believe it is simple and straightforward. But it is a sensitive issue that few truly understand, far fewer than engage in it. In summary, cloning is an activity that should only be entered into with careful thought, reflection, and a thorough understanding of the subtle issues involved.

Current Mood: geeky

12:57 am - A teasing tidbit about my job.

So, did you ever notice that every software engineer on the planet starts 80% of his or her sentences with the word, “so”? It is the “like” of the professional geek, the verbal pause, the softener that allows them to speak as though continuing a previous thought from a moment ago, be that moment yesterday or last week, rather than commit the heinous crime of actually jarring the world by smashing the comfortable silent separation of individuals with unsolicited conversation. Never mind that that conversation could be critical to doing their job; the rift between people cannot be broken so blatantly. It must be cushioned with a drawled “sooooooo”, be it in an Indian accent, Chinese accent, Southern accent, Canadian accent (eh?), or the clipped Northwestern accent of a true Seattle native. (Unfortunately for me, I speak with the pure dulcet tones of the Midwesterner, the invisible accent taught to media personalities for its correctness and easy comprehensibility. At least I learned to mumble like my father before me.)

Anyway, the point of this post was to brag for a moment before going to sleep. (Monty has already passed out at my feet waiting for me.) I hear so much about the horrors of coding companies, the inane policies and restrictive hurdles, the blockades to actually improving the codebase upon which people work. In my job, I just switched from working on the legacy C++ codebase to the new Java engine. In doing so, I had to read our coding standards and guidelines. Near the top of the list is this sweet nectar, “Refactor mercilessly.” I can just hear the programmers out there collectively gasping in interleaved astonishment, pleasure, and seething envy. But how does this play out in practice? Last month, when I was writing C++ code, I would take breaks from the task of the moment to go through files and switch variable initializations to the constructor syntax, reformat braces to the standardized layout, and add const correctness to object APIs. I spent a day just going through a deserialization system and removing all hard-coded data types from the code, replacing them with the appropriate typedef’d names to ensure future consistency in the face of changing header files. In the process I removed several lingering bugs caused by type mismatches and eliminated all explicit tying of generic types to concrete machine representations in our routines. (This is what I do to relax at work.)

Ahh, but I composed this post to brag about my current responsibilities, not my past ones! We are in the process of migrating and rewriting our engine in Java instead of C++, and simultaneously decoupling it from the overall system and pulling it out into its own service we can control and deploy at will. This began (for simplicity) with a 1-to-1 port of our C++ engine into Java. It worked fairly well, partially because we were already using smart pointers (some auto_ptr, but more often a reference-counting templatized pointer class developed in-house) and polymorphic object hierarchies. The heavy use of templates in some places was a little troublesome, but we are making good progress. There are some downsides to beginning with a 1-to-1 port and refactoring afterwards, such as when I encounter an abstract class that should be an interface (silly multiple-inheritance versus interfaces!) or when I find an entire hierarchy of objects and methods for doing deep copies that merely mimics the Cloneable mechanism in Java. It’ll take time to smoke out all such issues, but in the meantime I can take great pleasure in mercilessly hunting down all instances of a copy() method inherited from an ICopyable abstract class and replacing them with implementations of clone() that are typically about 20% the size. I can even replace the convoluted list- and map-copying helper routines with two lines of Java, which makes me appreciate the new Java 5.0 for loop and the Java 1.0 clone() method all the more. Deep copies are so easy when most members are immutable, and a little thought makes the mutable ones obvious. Soon that whole ICopyable class hierarchy will vanish, leaving one fewer of those abominable I-prefixed interface names!
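To make the clone() idea concrete, here is a minimal sketch. Basket and Line are invented names for illustration, not classes from our engine. The immutable members (a String, a map of Strings to Integers) are simply shared or shallow-copied, and the one list of mutable objects gets its two lines: allocate, then loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Basket implements Cloneable {

    /** Hypothetical mutable element type; a field-by-field clone is enough for it. */
    static final class Line implements Cloneable {
        final String sku; // String is immutable, so sharing it is safe
        int quantity;

        Line(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }

        @Override
        public Line clone() {
            try {
                return (Line) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Line is Cloneable
            }
        }
    }

    private final String owner; // immutable, shared between copies
    private List<Line> lines = new ArrayList<Line>();
    private Map<String, Integer> discounts = new HashMap<String, Integer>();

    Basket(String owner) {
        this.owner = owner;
    }

    void add(String sku, int quantity) {
        lines.add(new Line(sku, quantity));
    }

    void discount(String sku, int percent) {
        discounts.put(sku, percent);
    }

    Line line(int index) {
        return lines.get(index);
    }

    @Override
    public Basket clone() {
        try {
            Basket copy = (Basket) super.clone();
            // Mutable elements: a new list holding a clone of each element.
            copy.lines = new ArrayList<Line>(lines.size());
            for (Line line : lines) copy.lines.add(line.clone());
            // Immutable keys and values: copying the map itself is enough.
            copy.discounts = new HashMap<String, Integer>(discounts);
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: Basket is Cloneable
        }
    }

    public static void main(String[] args) {
        Basket original = new Basket("ann");
        original.add("apple", 1);
        Basket copy = original.clone();
        copy.line(0).quantity = 5;
        System.out.println(original.line(0).quantity + " " + copy.line(0).quantity); // prints 1 5
    }
}
```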

But in all this, one thought perpetually pesters me. Why the hell did Sun not include an Immutable interface or flag or even annotation in Java? Having a formalized way of declaring a class to be immutable would help immensely in so many situations, not least of which is the ability to write a single clone() method in Object that could function through reflection to do deep copying on any unknown descendant class. It would be so straightforward, too.
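Here is roughly what I have in mind, as a sketch only. The @Immutable annotation below is hypothetical (Java ships nothing like it), and so are Point, Shape, and Style. The copier assumes every mutable class has a no-argument constructor and no final instance fields, treats String plus anything marked @Immutable as safe to share, and ignores arrays, collections, boxed types, and cyclic object graphs entirely.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class ReflectiveCopy {

    /** Hypothetical marker: instances never change, so copies may share them. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Immutable {}

    @Immutable
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static class Style {
        String colour;
    }

    static class Shape {
        String label;
        Point origin; // immutable: shared by the copy
        Style style;  // mutable: copied recursively
        int sides;
    }

    /** Deep-copies an object graph, sharing Strings and @Immutable instances. */
    static <T> T deepCopy(T original) {
        if (original == null) return null;
        Class<?> type = original.getClass();
        if (type == String.class || type.isAnnotationPresent(Immutable.class)) {
            return original;
        }
        try {
            // Assumption: every mutable class has a no-argument constructor.
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            @SuppressWarnings("unchecked")
            T copy = (T) ctor.newInstance();
            // Walk up the hierarchy so inherited fields are copied too.
            for (Class<?> c = type; c != null; c = c.getSuperclass()) {
                for (Field field : c.getDeclaredFields()) {
                    if (Modifier.isStatic(field.getModifiers())) continue;
                    field.setAccessible(true);
                    Object value = field.get(original);
                    field.set(copy, field.getType().isPrimitive() ? value : deepCopy(value));
                }
            }
            return copy;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot copy " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Shape shape = new Shape();
        shape.label = "triangle";
        shape.origin = new Point(1, 2);
        shape.style = new Style();
        shape.style.colour = "red";
        shape.sides = 3;

        Shape copy = deepCopy(shape);
        System.out.println(copy.origin == shape.origin); // prints true
        System.out.println(copy.style == shape.style);   // prints false
    }
}
```

The point is how little the copier needs to know: one question per object, “are you immutable?”, decides between sharing and recursing.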

Current Mood: content
