silnith, posts by tag: code - LiveJournal

Jan. 11th, 2024

11:52 am - API Design

I would like to analyze what makes a programming API good versus bad. I feel like a lot of new APIs lack wisdom and foresight. A good programming API should guide the programmer away from making mistakes and towards using the API correctly and efficiently. But it seems that a lot of APIs are designed by people who only think of how they personally would use the API, while neglecting to consider how others may perceive it. Towards this end I have spent the past couple months getting back into my graduate school studies of computer graphics, in order to compare an API I consider to be one of the finest ever created, namely OpenGL 1.x, against an API I consider to be truly terrible, namely OpenGL 3.x.

I have implemented a simple screensaver using different versions of the OpenGL API. It consists of a bunch of solid-color squares that spin around an invisible axis. The point is not to make anything fancy, it is simply to exercise a theoretical 3D graphics pipeline in its purest form.

First, a few ground rules. The rendered output for all versions is absolutely identical. There is no variation in where or how computations are performed except where forced by fundamental changes in the OpenGL API itself. The most important consequence of this is that I purposely ignore the pervasive online advice and tutorials that all urge programmers to perform vertex transformations on the CPU; the original version 1.0 of the OpenGL API was designed so that all vertex transformations could be handled by whatever dedicated hardware was available, so I have maintained that discipline when rewriting my program for the newer versions of the API.

Second, because the original version 1.0 of the OpenGL API included a highly versatile "call list" feature that allowed pre-compiling sequences of state changes, transformations, and rendering commands and storing the result directly in the "server" high-speed graphics memory, I have used the transform feedback mechanism to emulate what a high-quality implementation of the original API would have been able to achieve. Not all implementations provided such sophisticated support, but the API was designed with the intention that such offerings could and would appear and so my rewrite stays consistent with that capability.

Finally, I have used the OpenGL API as it was designed to be used, not as it is most popularly used. The most prominent example of this is how in the 3.2 version I explicitly query the uniform offsets for the uniform block defined in the shaders, rather than specify the compatibility layout qualifier. The intent of allowing the OpenGL implementation to compile shaders rather than be forced to accept a pre-defined intermediate form is so that any OpenGL implementation is free to use whatever optimizations it prefers, including unique memory layout and padding. Therefore my program makes no assumptions about any of these details, it simply complies with whatever the OpenGL implementation provides. This applies to vertex attribute locations, uniform locations, and uniform block offsets. A much less significant example is how I allocate my call lists in the 1.1 version of the program, even though according to the specification it is not strictly necessary.

Here are the two versions of the screensaver, with all extraneous code removed to leave only the direct uses of OpenGL itself. (The GitHub repository contains the full source code.)

OpenGL 1.1 version

Global initialization.

glGetString(GL_VERSION);
glEnable(GL_DEPTH_TEST);
glPolygonOffset(-0.5, -2);
glEnable(GL_POLYGON_OFFSET_LINE);
glLoadIdentity();
gluLookAt(0, 50, 50,
    0, 0, 13,
    0, 0, 1);
wingDisplayList = glGenLists(1);
glNewList(wingDisplayList, GL_COMPILE);
glBegin(GL_QUADS);
glVertex2f(1, 1);
glVertex2f(-1, 1);
glVertex2f(-1, -1);
glVertex2f(1, -1);
glEnd();
glEndList();
rotatingWingDisplayLists = glGenLists(numWings);

// Executed when the window changes size.
glViewport(0, 0, width, height);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(-26.6, 26.6, -20, 20, 35, 105);
glMatrixMode(GL_MODELVIEW);

Per-frame changes.

glNewList(displayList, GL_COMPILE);
glPushMatrix();
glRotatef(angle, 0, 0, 1);
glTranslatef(radius, 0, 0);
glRotatef(-yaw, 0, 0, 1);
glRotatef(-pitch, 0, 1, 0);
glRotatef(roll, 1, 0, 0);
glCallList(wingDisplayList);
glPopMatrix();
glEndList();

Drawing code.

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
glPushMatrix();
for (auto wing : wings) {
    glTranslatef(0, 0, deltaZ);
    glRotatef(deltaAngle, 0, 0, 1);
    glColor3f(red, green, blue);
    glCallList(displayList);
}
glPopMatrix();
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
glPushMatrix();
for (auto wing : wings) {
    glTranslatef(0, 0, deltaZ);
    glRotatef(deltaAngle, 0, 0, 1);
    glColor3f(red, green, blue);
    glCallList(displayList);
}
glPopMatrix();
glFlush();

Analysis

I believe the most prominent feature of this code is how virtually every API call is a direct expression of a concept that is well-known to people who have studied 3D graphics. All graphics programmers understand what a depth test is, because it is an intrinsic part of a 3D pipeline. No matter what programming API or hardware one decides to use, the 3D graphics code will still make use of linear algebra to accomplish rotations and translations expressed as matrices. The data to render will always be represented as polygons specified by vertices. Really the only concept that might be foreign to some would be the concept of a "call list", but even that is a simple abstraction of "achieve the same result as if this sequence of other API calls had been made." (Some make the mistake of assuming that the implementation must issue an exact duplicate of the sequence of API calls, but the OpenGL implementation is only required to produce the same result as if that sequence had been called, which is why creating a call list uses the term "compilation".)

Some may look at the drawing code and complain that all of the transformations are "slow" because they change the state of the modelview matrix. This is a misunderstanding of the concept of a 3D graphics pipeline. The 3D pipeline is so named because state changes like vertex colors and modelview transformations are designed to be pipelined. The reason the drawing code iterates the list of wings twice is that changing the rendering mode from polygon fill to outline and back is much more likely to trigger a pipeline flush than simply modifying the transformation matrices, because it replaces one rasterization algorithm with another. At least, that is how 3D hardware worked before it was all scrapped and replaced with generic stream processors. Which leads us to the next version of the code.

OpenGL 3.2 version

Global initialization.

GLfloat const quadVertices[8]{ 1,1, -1,1, -1,-1, 1,-1, };
GLuint const quadIndices[4]{ 0,1,2,3 };
GLfloat const identity[16]{ 1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1, };

glGetIntegerv(GL_MAJOR_VERSION, &glMajorVersion);
glGetIntegerv(GL_MINOR_VERSION, &glMinorVersion);
glEnable(GL_DEPTH_TEST);
glPolygonOffset(0.5, 2);
glEnable(GL_POLYGON_OFFSET_FILL);
glGenBuffers(3 + 3 * numWings, bufferNames);
glGenVertexArrays(2, vertexArrayNames);
glBindBuffer(GL_ARRAY_BUFFER, originalVertexBuffer);
glBufferData(GL_ARRAY_BUFFER, sizeof(quadVertices), quadVertices, GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, wingIndexBuffer);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(quadIndices), quadIndices, GL_STATIC_DRAW);
GLuint feedbackVertexShader = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(feedbackVertexShader, 1, source, nullptr);
glCompileShader(feedbackVertexShader);
glGetShaderiv(feedbackVertexShader, GL_INFO_LOG_LENGTH, &logSize);
glGetShaderInfoLog(feedbackVertexShader, logSize, nullptr, log);
glGetShaderiv(feedbackVertexShader, GL_COMPILE_STATUS, &compilationSuccess);
GLuint feedbackProgram = glCreateProgram();
GLchar const* varyings[3]{ "gl_Position","varyingWingColor","varyingEdgeColor", };
glTransformFeedbackVaryings(feedbackProgram, 3, varyings, GL_SEPARATE_ATTRIBS);
glAttachShader(feedbackProgram, feedbackVertexShader);
glLinkProgram(feedbackProgram);
glDetachShader(feedbackProgram, feedbackVertexShader);
glGetProgramiv(feedbackProgram, GL_INFO_LOG_LENGTH, &logSize);
glGetProgramInfoLog(feedbackProgram, logSize, nullptr, log);
glGetProgramiv(feedbackProgram, GL_LINK_STATUS, &linkSuccess);
glDeleteShader(feedbackVertexShader);
glBindVertexArray(wingTransformVertexArray);
vertexAttribLocation = glGetAttribLocation(feedbackProgram, "vertex");
glEnableVertexAttribArray(vertexAttribLocation);
GLuint renderVertexShader = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(renderVertexShader, 1, source, nullptr);
glCompileShader(renderVertexShader);
glGetShaderiv(renderVertexShader, GL_INFO_LOG_LENGTH, &logSize);
glGetShaderInfoLog(renderVertexShader, logSize, nullptr, log);
glGetShaderiv(renderVertexShader, GL_COMPILE_STATUS, &compilationSuccess);
GLuint renderFragmentShader = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(renderFragmentShader, 1, source, nullptr);
glCompileShader(renderFragmentShader);
glGetShaderiv(renderFragmentShader, GL_INFO_LOG_LENGTH, &logSize);
glGetShaderInfoLog(renderFragmentShader, logSize, nullptr, log);
glGetShaderiv(renderFragmentShader, GL_COMPILE_STATUS, &compilationSuccess);
GLuint renderProgram = glCreateProgram();
glBindFragDataLocation(renderProgram, 0, "fragmentColor");
glAttachShader(renderProgram, renderVertexShader);
glAttachShader(renderProgram, renderFragmentShader);
glLinkProgram(renderProgram);
glDetachShader(renderProgram, renderFragmentShader);
glDetachShader(renderProgram, renderVertexShader);
glGetProgramiv(renderProgram, GL_INFO_LOG_LENGTH, &logSize);
glGetProgramInfoLog(renderProgram, logSize, nullptr, log);
glGetProgramiv(renderProgram, GL_LINK_STATUS, &linkSuccess);
glDeleteShader(renderVertexShader);
glDeleteShader(renderFragmentShader);
glBindVertexArray(renderVertexArray);
vertexAttribLocation = glGetAttribLocation(renderProgram, "vertex");
colorAttribLocation = glGetAttribLocation(renderProgram, "color");
glEnableVertexAttribArray(vertexAttribLocation);
glEnableVertexAttribArray(colorAttribLocation);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, wingIndexBuffer);
GLuint blockIndex = glGetUniformBlockIndex(renderProgram, "ModelViewProjection");
glUniformBlockBinding(renderProgram, blockIndex, modelViewProjectionBindingIndex);
GLchar const* names[3]{ "model", "view", "projection" };
glGetUniformIndices(renderProgram, 3, names, uniformIndices);
glGetActiveUniformsiv(renderProgram, 3, uniformIndices, GL_UNIFORM_OFFSET, uniformOffsets);
glGetActiveUniformBlockiv(renderProgram, blockIndex, GL_UNIFORM_BLOCK_DATA_SIZE, &modelViewProjectionUniformDataSize);
glBindBufferBase(GL_UNIFORM_BUFFER, modelViewProjectionBindingIndex, modelViewProjectionUniformBuffer);
glBindBuffer(GL_UNIFORM_BUFFER, modelViewProjectionUniformBuffer);
glBufferData(GL_UNIFORM_BUFFER, modelViewProjectionUniformDataSize, nullptr, GL_STATIC_DRAW);
glBufferSubData(GL_UNIFORM_BUFFER, uniformOffsets[0], sizeof(identity), identity);
GLfloat const view[16]{ /* implement your own gluLookAt */ };
glBufferSubData(GL_UNIFORM_BUFFER, uniformOffsets[1], sizeof(view), view);
for (auto wing : wings) {
    glBindBuffer(GL_ARRAY_BUFFER, wingVertexBuffer);
    glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * 4 * numVertices, nullptr, GL_STREAM_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, wingColorBuffer);
    glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * 3 * numVertices, nullptr, GL_STREAM_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, wingEdgeColorBuffer);
    glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * 3 * numVertices, nullptr, GL_STREAM_DRAW);
}

// Executed when the window changes size.
glViewport(0, 0, width, height);
GLfloat const projection[16]{ /* implement your own glOrtho */ };
glBindBuffer(GL_UNIFORM_BUFFER, modelViewProjectionUniformBuffer);
glBufferSubData(GL_UNIFORM_BUFFER, uniformOffsets[2], sizeof(projection), projection);
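
The view and projection placeholders above follow the standard formulas from the gluLookAt and glOrtho reference pages. As a sketch of what the programmer is now forced to re-implement, here they are in Java for testability (a C version is a direct transliteration); the matrices are column-major float[16], as glBufferSubData expects:

```java
// Sketch of the fixed-function matrices that OpenGL 3.x makes the programmer
// rebuild by hand.  Formulas follow the glOrtho and gluLookAt man pages.
public final class Matrices {
    // Orthographic projection, column-major.
    public static float[] ortho(float l, float r, float b, float t, float n, float f) {
        float[] m = new float[16];
        m[0] = 2 / (r - l);
        m[5] = 2 / (t - b);
        m[10] = -2 / (f - n);
        m[12] = -(r + l) / (r - l);
        m[13] = -(t + b) / (t - b);
        m[14] = -(f + n) / (f - n);
        m[15] = 1;
        return m;
    }

    // Viewing transformation, column-major.
    public static float[] lookAt(float ex, float ey, float ez,
                                 float cx, float cy, float cz,
                                 float ux, float uy, float uz) {
        // forward = normalize(center - eye)
        float[] fwd = normalize(cx - ex, cy - ey, cz - ez);
        // side = normalize(forward x up), then an orthogonal up
        float[] s = normalize3(cross(fwd, new float[]{ux, uy, uz}));
        float[] u = cross(s, fwd);
        float[] m = new float[16];
        m[0] = s[0];    m[4] = s[1];    m[8]  = s[2];
        m[1] = u[0];    m[5] = u[1];    m[9]  = u[2];
        m[2] = -fwd[0]; m[6] = -fwd[1]; m[10] = -fwd[2];
        // translation: move the eye to the origin
        m[12] = -(s[0] * ex + s[1] * ey + s[2] * ez);
        m[13] = -(u[0] * ex + u[1] * ey + u[2] * ez);
        m[14] = fwd[0] * ex + fwd[1] * ey + fwd[2] * ez;
        m[15] = 1;
        return m;
    }

    private static float[] cross(float[] a, float[] b) {
        return new float[]{
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        };
    }

    private static float[] normalize(float x, float y, float z) {
        float len = (float) Math.sqrt(x * x + y * y + z * z);
        return new float[]{x / len, y / len, z / len};
    }

    private static float[] normalize3(float[] v) {
        return normalize(v[0], v[1], v[2]);
    }
}
```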

Per-frame vertex shader for transform feedback.

#version 150

uniform vec2 radiusAngle;
uniform vec3 rollPitchYaw;
uniform vec3 color;
uniform vec3 edgeColor = vec3(1, 1, 1);

in vec4 vertex;

smooth out vec3 varyingWingColor;
smooth out vec3 varyingEdgeColor;

mat4 rotate(in float angle, in vec3 axis) {
    float c = cos(radians(angle));
    float s = sin(radians(angle));
    mat3 initial = outerProduct(axis, axis)
                   * (1 - c);
    mat3 c_part = mat3(c);
    mat3 s_part = mat3(0, axis.z, -axis.y,
                       -axis.z, 0, axis.x,
                       axis.y, -axis.x, 0)
                  * s;
    mat3 temp = initial + c_part + s_part;
    mat4 rotation = mat4(1.0);
    rotation[0].xyz = temp[0];
    rotation[1].xyz = temp[1];
    rotation[2].xyz = temp[2];
    return rotation;
}

mat4 translate(in vec3 move) {
    mat4 trans = mat4(1.0);
    trans[3].xyz = move;
    return trans;
}

const vec3 xAxis = vec3(1, 0, 0);
const vec3 yAxis = vec3(0, 1, 0);
const vec3 zAxis = vec3(0, 0, 1);

void main() {
    float radius = radiusAngle[0];
    float angle = radiusAngle[1];
    float roll = rollPitchYaw[0];
    float pitch = rollPitchYaw[1];
    float yaw = rollPitchYaw[2];

    varyingWingColor = color;
    varyingEdgeColor = edgeColor;

    mat4 wingTransformation = rotate(angle, zAxis)
                              * translate(vec3(radius, 0, 0))
                              * rotate(-yaw, zAxis)
                              * rotate(-pitch, yAxis)
                              * rotate(roll, xAxis);
    gl_Position = wingTransformation * vertex;
}

Per-frame changes.

GLint radiusAngleUniformLocation = glGetUniformLocation(transformProgram, "radiusAngle");
GLint rollPitchYawUniformLocation = glGetUniformLocation(transformProgram, "rollPitchYaw");
GLint colorUniformLocation = glGetUniformLocation(transformProgram, "color");
GLint edgeColorUniformLocation = glGetUniformLocation(transformProgram, "edgeColor");
GLuint vertexAttributeLocation = glGetAttribLocation(transformProgram, "vertex");

glUseProgram(transformProgram);
glUniform2f(radiusAngleUniformLocation, radius, angle);
glUniform3f(rollPitchYawUniformLocation, roll, pitch, yaw);
glUniform3f(colorUniformLocation, red, green, blue);
glUniform3f(edgeColorUniformLocation, edgeRed, edgeGreen, edgeBlue);
glBindVertexArray(wingTransformVertexArray);
glBindBuffer(GL_ARRAY_BUFFER, originalVertexBuffer);
glVertexAttribPointer(vertexAttributeLocation, 2, GL_FLOAT, GL_FALSE, 0, 0);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, wingVertexBuffer);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 1, wingColorBuffer);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 2, wingEdgeColorBuffer);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, numVertices);
glEndTransformFeedback();

Drawing vertex shader.

#version 150

uniform ModelViewProjection {
    mat4 model;
    mat4 view;
    mat4 projection;
};

uniform vec2 deltaZ = vec2(15, 0.5);

in vec4 vertex;
in vec4 color;

smooth out vec4 varyingColor;

mat4 rotate(in float angle, in vec3 axis) {
    float c = cos(radians(angle));
    float s = sin(radians(angle));
    mat3 initial = outerProduct(axis, axis)
                   * (1 - c);
    mat3 c_part = mat3(c);
    mat3 s_part = mat3(0, axis.z, -axis.y,
                       -axis.z, 0, axis.x,
                       axis.y, -axis.x, 0)
                  * s;
    mat3 temp = initial + c_part + s_part;
    mat4 rotation = mat4(1.0);
    rotation[0].xyz = temp[0];
    rotation[1].xyz = temp[1];
    rotation[2].xyz = temp[2];
    return rotation;
}

mat4 translate(in vec3 move) {
    mat4 trans = mat4(1.0);
    trans[3].xyz = move;
    return trans;
}

const vec3 xAxis = vec3(1, 0, 0);
const vec3 yAxis = vec3(0, 1, 0);
const vec3 zAxis = vec3(0, 0, 1);

void main() {
    float deltaAngle = deltaZ[0];
    float deltaZ = deltaZ[1];

    mat4 modelViewProjection = projection * view * model;

    varyingColor = color;
    gl_Position = modelViewProjection
                  * translate(vec3(0, 0, deltaZ))
                  * rotate(deltaAngle, zAxis)
                  * vertex;
}

Drawing fragment shader.

#version 150

smooth in vec4 varyingColor;

out vec4 fragmentColor;

void main() {
    fragmentColor = varyingColor;
}

Drawing code.

GLint deltaZUniformLocation = glGetUniformLocation(renderProgram, "deltaZ");
GLuint vertexAttribLocation = glGetAttribLocation(renderProgram, "vertex");
GLuint colorAttribLocation = glGetAttribLocation(renderProgram, "color");

glUseProgram(renderProgram);
glBindVertexArray(renderVertexArray);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
for (auto wing : wings) {
    glUniform2f(deltaZUniformLocation, deltaAngle, deltaZ);
    glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer);
    glVertexAttribPointer(vertexAttribLocation, 4, GL_FLOAT, GL_FALSE, 0, 0);
    glBindBuffer(GL_ARRAY_BUFFER, edgeColorBuffer);
    glVertexAttribPointer(colorAttribLocation, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glDrawElements(GL_LINE_LOOP, numIndices, GL_UNSIGNED_INT, 0);
    glBindBuffer(GL_ARRAY_BUFFER, colorBuffer);
    glVertexAttribPointer(colorAttribLocation, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glDrawElements(GL_TRIANGLE_FAN, numIndices, GL_UNSIGNED_INT, 0);
}
glFlush();

Analysis

What strikes me the most about this code is how little of it actually has to do with rendering 3D graphics. Instead, it is almost entirely devoted to allocating and managing memory buffers. What little actual 3D graphics and linear algebra remains is mostly in places where standard functionality that the API used to provide has been forced back onto the programmer. All of the matrix transformations must be re-implemented in every program, instead of one single, shared, optimized implementation being provided. And as I indicated previously, this has resulted in the overwhelming majority of educational material repeating the terrible mistake of moving these calculations away from the specialized hardware designed for efficient matrix computation and dumping them back on the general-purpose CPU with its slow shared memory.

But in my mind this is not actually the primary sin of version 3.2 of the OpenGL API. No, that sin is the sheer number of opportunities that the "updated" version of the API provides for the programmer to make terrible mistakes. As I pointed out, the majority of the API calls in this "newer" version of the API are for managing memory buffers. That means we now have to deal with all of the potential problems and pitfalls inherent in memory management. We have uninitialized memory, we have buffer overruns, we have data format mismatches, and we have to deal with all of these problems in an environment where we (by design) cannot even guarantee we can view the memory we are managing in order to debug our mistakes! Meanwhile, back in OpenGL 1.0 land, the API rigidly constrained the structure and format of data that we could pass in. When writing the OpenGL 1.1 version of the screensaver, the worst problem I had to deal with was polygons appearing in the wrong position on screen or possibly not at all. When writing the OpenGL 3.2 version, I had nasty crashes where the video driver would segfault, not even giving me a stack trace into my own program.

The other thing that strikes me about the 3.2 version is that, in exchange for all of the massive initialization done at the start, I had expected the per-frame code and drawing code to be significantly shorter than in the 1.1 version. But upon review, they are not much shorter at all, and looking at the actual calls, it feels like they should be more expensive to execute, rather than less.

Originally I had intended to go through each API call in both versions, one by one, and identify exactly what mistakes each call allowed the programmer to make as well as explain the significance and potential damage. But now that I see the full listing, I find myself overwhelmed by just how many hundreds of potential catastrophes I would need to explain in the OpenGL 3.2 version of the program. For those curious, I also have a version of the program that uses OpenGL 4.1, but it solves virtually none of the problems over the 3.2 version so I omitted it. I also implemented versions for OpenGL 2.1 and OpenGL 1.5, but everything interesting about those was overshadowed by the 3.2 version.

May. 19th, 2020

07:56 pm - Software Developer Job Interviews: Palindromes

Are you a software developer? Have you interviewed for a job and been given a problem such as, "Write code to identify if a string is a palindrome"?

You did it wrong.

But I’m quite smart! I know my code was correct!

You did it wrong.

But you have never seen my code!

You did it wrong. Everybody does it wrong. In my career I have not met even a single person that I am confident would get it right. I know several that might get it right, but I would not bet my life on it. Frankly, I’m not even convinced that I can do it correctly. But here we go.

You likely got the palindrome part of the question right. Here is a simple implementation in Java.


    public boolean isPalindrome(final List<CharSequence> characters) {
        final ListIterator<CharSequence> forward = characters.listIterator();
        final ListIterator<CharSequence> backward = characters.listIterator(characters.size());
        
        while (forward.nextIndex() < backward.previousIndex()) {
            final CharSequence next = forward.next();
            final CharSequence prev = backward.previous();
            
            if ( !next.equals(prev)) {
                return false;
            }
        }
        
        return true;
    }

Simple enough. Where you messed up is the part where you converted the string into a list of characters. The lazy or naïve simply treated each (in Java) 16-bit char as a character of the string. The clever may have handled surrogate pairs correctly. But how many considered the problem of combining characters? They are not surrogate pairs; they are distinct code points that follow another code point. In the word naïve used earlier, the ï can be written as the single code point U+00EF, but it can also be written as an ordinary Latin i (U+0069) followed by a combining diaeresis (U+0308).
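
The two spellings really are distinct at the char level, which is exactly what breaks the naïve solution. A small demonstration using java.text.Normalizer:

```java
import java.text.Normalizer;

public final class CombiningDemo {
    public static void main(final String[] args) {
        // The same grapheme, spelled two different ways.
        final String precomposed = "\u00EF";  // ï as a single code point
        final String decomposed = "i\u0308";  // i followed by combining diaeresis

        // Char-by-char comparison says they differ, and even disagree on length.
        System.out.println(precomposed.equals(decomposed));  // false
        System.out.println(precomposed.length());            // 1
        System.out.println(decomposed.length());             // 2

        // Normalizing both to NFC composes the pair back into U+00EF.
        final String a = Normalizer.normalize(precomposed, Normalizer.Form.NFC);
        final String b = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(a.equals(b));                     // true
    }
}
```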

Suddenly this topic got an order of magnitude more complex. This also brings me to my thesis, which is that most software developers think they are far more competent than they really are. They mistake intelligence for understanding, and then they think they can do things like write software to drive a car, or recommend sentences for criminal offenders, or identify copyright infringement. The software we write has real-world consequences, and it can make people suffer simply by a developer being ignorant of their ignorance.

Here is some code that is much closer to correct, though as indicated earlier there are likely many cases where it will fail.


    public List<CharSequence> breakIntoCharacters(final CharSequence text, final Locale locale) {
        final String normalized = Normalizer.normalize(text, Normalizer.Form.NFC);
        
        final BreakIterator characterBreaker = BreakIterator.getCharacterInstance(locale);
        characterBreaker.setText(normalized);
        
        final List<CharSequence> list = new ArrayList<>();
        int start = characterBreaker.first();
        int end = characterBreaker.next();
        while (end != BreakIterator.DONE) {
            final CharSequence subSequence = normalized.subSequence(start, end);
            list.add(subSequence);
            start = end;
            end = characterBreaker.next();
        }
        
        return list;
    }

By using the extensive Java facilities related to language support, iterating through the string to identify individual characters is done on our behalf in a way that can take language-specific logic into account. This code will correctly handle Unicode combining characters, though it still cannot deal with more complex combining that happens in some languages.
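
To tie the two methods together, here is a self-contained harness (the methods are made static for convenience, and the mixed-spelling test string is my own): the string ïaï, with the first ï precomposed and the last decomposed, is not a palindrome by raw chars but is one by graphemes.

```java
import java.text.BreakIterator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
import java.util.Locale;

public final class Palindromes {
    // Compare graphemes pairwise from both ends.
    public static boolean isPalindrome(final List<CharSequence> characters) {
        final ListIterator<CharSequence> forward = characters.listIterator();
        final ListIterator<CharSequence> backward = characters.listIterator(characters.size());
        while (forward.nextIndex() < backward.previousIndex()) {
            if (!forward.next().equals(backward.previous())) {
                return false;
            }
        }
        return true;
    }

    // Normalize, then split into user-perceived characters.
    public static List<CharSequence> breakIntoCharacters(final CharSequence text, final Locale locale) {
        final String normalized = Normalizer.normalize(text, Normalizer.Form.NFC);
        final BreakIterator characterBreaker = BreakIterator.getCharacterInstance(locale);
        characterBreaker.setText(normalized);
        final List<CharSequence> list = new ArrayList<>();
        int start = characterBreaker.first();
        int end = characterBreaker.next();
        while (end != BreakIterator.DONE) {
            list.add(normalized.subSequence(start, end));
            start = end;
            end = characterBreaker.next();
        }
        return list;
    }

    public static void main(final String[] args) {
        // "ïaï": first ï is U+00EF, last is U+0069 U+0308.
        final String mixed = "\u00EFa" + "i\u0308";
        System.out.println(isPalindrome(breakIntoCharacters(mixed, Locale.ENGLISH)));  // true
    }
}
```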

Footnote: terse explanations of the items listed above

drive a car
Driving in a city is far more of a social problem than a technical or physics problem. Proper driving involves watching other drivers to identify their intentions and anticipate their actions. I cannot count the number of potential collisions I have avoided because I was watching other drivers’ heads and eyes and hands, and took actions others would consider unusual or unnecessary because of what I saw was about to happen.
recommend sentences for criminal offenders
Negligent people simply applied machine learning without understanding what the underlying algorithms actually do. They extrapolate existing data, which means any existing biases or mistakes get perpetuated and normalized. Black people have suffered tremendously because of this.
identify copyright infringement
No algorithm can correctly identify copyright infringement, for a variety of reasons, not least of which is that there are copious exceptions where copying something is perfectly legal and even expected. If courts cannot even agree in many situations, how does a software developer think he can create an algorithm that will work? And critically, there are many cases where ownership information is not stored correctly (or at all), causing copyright owners to lose their own works to people with no legal right to them.

May. 14th, 2020

01:20 pm - Criticisms of C# and .NET: Dynamic Linking

Code is often distributed in reusable pieces called libraries. In C#, libraries are Windows .dll files, "Dynamic Link Library". (If you compile and run C# code on Linux or macOS, it will still produce and consume .dll files. Yeah, this was designed to be cross-platform. Sure it was.) Linking is the connecting of function calls in an executable to symbols in a library. In static linking, this is done at compile time and the code being called is fixed. In dynamic linking, it is deferred until the code is executed, so if a library is updated, then code that links against it will automatically use the updated version for future executions.

In .NET Framework, the original runtime + standard libraries for C#, when a program is compiled, the compiler looks at the versions of all the libraries that the program uses, computes a cryptographic hash of each one, and embeds that information into the program. Then, when the program is run, it checks each library being loaded to make sure it exactly matches what the program was compiled against originally. If a library does not match exactly, the runtime aborts loading the library and in most cases the program stops running.

Look at the previous two paragraphs. In .NET Framework, libraries are distributed as "dynamic link libraries", but the runtime forces the library to exactly match what it was compiled against. This is the very definition of static linking, but it is misnamed as dynamic linking. When I first learned how .NET Framework worked, I was so stunned that I could do nothing but cover my face with my palm.

But it gets better! Developers immediately and correctly complained that this mechanism prevented them from updating the libraries used by an application. Microsoft's response was to invent a mechanism called "binding redirects". Binding redirects are written as entries in an XML file distributed alongside the executable. Inside the XML file, there are entries matching the signature that the .NET Framework linker expects to find for any particular library. This entry then specifies a new version number for the library that the .NET Framework linker should load instead of the one it was compiled against.
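
In practice a binding redirect is a fragment in the application's .config file; a sketch of the shape (the assembly name, public key token, and version numbers here are purely illustrative):

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- Hypothetical library; the token identifies the signing key. -->
        <assemblyIdentity name="Some.Library"
                          publicKeyToken="0123456789abcdef"
                          culture="neutral" />
        <!-- Any compiled-against version in the range loads 2.0.0.0 instead. -->
        <bindingRedirect oldVersion="0.0.0.0-2.0.0.0"
                         newVersion="2.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```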

Read that again. .NET Framework uses static linking while calling it dynamic linking, and when people complained that that was not dynamic linking, it added an XML-configured replacement step in the static linking process. I cannot fathom words sufficient to express just how idiotic and backwards this whole process is.

By some miracle, when Microsoft undertook the effort to completely rewrite .NET, they did the sane thing and simply used dynamic linking as it has existed in the industry for longer than I have been alive. This rewrite was named .NET Core, and it is an immeasurably vast improvement over .NET Framework.

In Java, libraries are distributed as .jar files, which is also how applications are distributed. Libraries are discovered by the JVM scanning filesystem directories specified in an environment variable called CLASSPATH, very similar to how native executables locate native libraries on unix-style systems. In order to update a library, simply replace a .jar file with a newer version. Rational developers use a dependency management system such as Maven to manage their CLASSPATH automatically.

12:21 pm - Criticisms of C# and .NET: Enumerations

For a while now, people have asked me why I do not want to work in .NET anymore. The list is long, and it's irritating having to pick out individual complaints to explain. Therefore, I plan to write a series of articles explaining exactly why C# and .NET are so inferior to Java, in excruciating technical detail.

For the first, let me explain the difference between enumeration types in the two languages.

C# enumerations are based on C++ enumerations designed to be compatible with C. C++ enumerations are essentially an alias for a numeric primitive type such as int. Declaring an enum type consists of listing a sequence of constant names and optionally assigning a numeric value to each one. Any not assigned a numeric value take the value of the previous plus one, and the first takes the value of zero. However, there are no restrictions on the values assigned to any particular enumeration constant, so skipping values and duplicating values are perfectly legal and an often-used feature. In particular, if a library author wants to update the names of enumeration constants, new constants can be created and given the exact same values as the deprecated constants so compiled code will be identical. Another heavily-used example is the bitflag enumeration, where constants are assigned values that are powers of two so that they can be combined using logical operators. This is used for scenarios such as passing multiple options to a function in a single numeric primitive, saving stack and register space.

Java enumerations are fully-fledged objects. They can provide methods, implement interfaces, make use of polymorphism, and do anything else normal objects can do, with one exception: they are final, so they cannot be subclassed. In addition, the compiler assigns an ordinal value to every declared enumeration constant, starting at zero and increasing by one for each declared name. These ordinal values are fixed once the class is compiled [footnote 1]; they cannot be reassigned or modified. They are guaranteed to be unique successive integer values starting at zero and ending at one less than the number of enumeration constants defined. If a new version of a library changes the ordering of the enumeration constants, the ordinal values will also change and the new class will not be compatible with the old one.
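
A sketch of what "fully-fledged objects" means in practice (the Signal enum and Describable interface here are invented for illustration):

```java
// A minimal interface for the enum to implement.
interface Describable {
    String describe();
}

// Each constant is a singleton object: it can carry fields, take
// constructor arguments, and provide method implementations.
public enum Signal implements Describable {
    RED(false),
    YELLOW(false),
    GREEN(true);

    private final boolean go;

    Signal(final boolean go) {
        this.go = go;
    }

    @Override
    public String describe() {
        return name() + (go ? ": proceed" : ": stop");
    }
}
```

The ordinals follow declaration order: Signal.RED.ordinal() is 0, Signal.YELLOW.ordinal() is 1, and Signal.GREEN.ordinal() is 2, and reordering the constants in a later release would silently change them.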

Old-school C programmers will instantly dislike Java enumerations for a variety of reasons. They are full objects instead of primitives, which takes more memory. They are passed by reference instead of value. It is not possible to combine them using bitwise operators. Method invocations go through the virtual table. Getting an ordinal value requires calling a method. Therefore, a lot of C and C++ programmers prefer C# enumerations over Java enumerations. But let's analyze these complaints in detail.

Full Objects

Java enumeration values are full objects instead of primitive numeric values. This may sound like a disadvantage, but in practice it really is not. The objects are created when the enum class is loaded by the classloader, and the objects remain static thereafter. All uses of enumeration constants in the code are just references to these static objects, so they are really just pointers once converted to machine code, and their size will be one native word for the architecture. On most machines this will be a 64-bit or 32-bit address, which is no less efficient than the numeric primitive that a C# enumeration constant would be. Modern architectures never load less than a native word at a time, so defining a type to be smaller than that does not make the resulting code any faster (and in some rare cases can actually make it slower). And since the constants refer to single instances, using enumerations causes no additional garbage and does not impact the garbage collector.

Bitwise Operations

C# enumerations can be combined using bitwise logical operators. Java enumerations cannot. On the surface this looks like an advantage for C#. But once you actually start using them, reality shapes up a little differently. In particular, modern development makes heavy use of standard libraries and standard data structures. C# code is filled with instances of Dictionary<Key, Value> implementing the interface IDictionary<Key, Value>. Java code is filled with instances of HashMap<K, V> implementing Map<K, V>. Good coders use the interfaces so that underlying types can be swapped out as needed. And in a truly epic fail, C# initially did not have any generics and then later provided reified generics, which meant the original interfaces were not compatible with the new generics, forcing library authors to implement two different interfaces for each data structure. Because of this, all the standard data structures end up passing through the non-generic interface at some point, so any use of C# enumerations in standard data structures suffers the auto-boxing and auto-unboxing cost. [footnote 2] This washes away the performance gains of using primitives. But the problem is even more insidious than that. With C++ enumerations and templates, one could make use of template specialization to provide optimized data structures for enumeration types. Since C# does not have a mechanism for "generic specialization", this is not possible in C#. Enterprising coders (including myself) have nevertheless tried to write specialized maps and lists for enumeration types, only to find that the C# language specification actually forbids using the Enum type as a bound for any generic declaration. [footnote 3] A few people have ignored that rule and tried anyway, and found that it worked on the current implementation of C#. However, due to the language restriction, there is no guarantee that it will continue to work on future versions of C#, something that Microsoft emphasizes strongly.

Contrast this with Java. In Java, there are types EnumMap<K, V> and EnumSet<T> provided in the standard library. These are implementations of the Map<K, V> and Set<T> interfaces specifically written for enumeration types. The EnumSet<T> class stores the set values by taking the ordinal values of the enumeration constants and converting them to bit flags that it combines into numeric primitives. For any enumeration with 64 or fewer constants, it simply stores everything in one long variable. For enumerations with more than 64 constants, it stores everything in an array of long values, so there is no practical limit on the number of enumeration constants. The EnumMap<K, V> is even simpler: it stores map entries in an array with length equal to the number of enumeration constants. Since all enumerations are guaranteed to have generated ordinal values in the range [0, size), the range check for the array access can be optimized out, and lookups end up being one type check (which can also be optimized out in many scenarios) and one array access. Contrast this with a C# data structure containing enumeration constants, which must generate the hash code, deal with hash collisions, and be prepared for bucket table resizing. In all scenarios, the Java enumeration structures are faster than C# structures containing enumerations, take less memory, and provide extensive opportunities for low-level optimization of the machine code by the JIT compiler.
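
A short sketch of both types in use (the Permission enum is hypothetical); note that the caller only ever sees the standard Set<T> and Map<K, V> interfaces:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class EnumCollectionsDemo {
    // Hypothetical enum with a handful of constants.
    enum Permission { READ, WRITE, EXECUTE }

    public static void main(String[] args) {
        // Backed by a single long bit mask, since there are 64 or fewer constants.
        Set<Permission> granted = EnumSet.of(Permission.READ, Permission.WRITE);

        // Backed by an array indexed by ordinal; no hashing, no collisions, no resizing.
        Map<Permission, String> descriptions = new EnumMap<>(Permission.class);
        descriptions.put(Permission.READ, "may read");
        descriptions.put(Permission.WRITE, "may write");

        System.out.println(granted.contains(Permission.EXECUTE)); // prints false
        System.out.println(descriptions.get(Permission.WRITE));   // prints may write
    }
}
```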

Now think back to the popular usage scenario for C and C++ enumeration types: bitwise flags. As it turns out, thanks to the EnumSet<T> type, this exact scenario is fully supported in Java code. And the code ends up being even clearer. Here is some C++ code using enumerations.

enum foo {
    flag1 = 1 << 0,
    flag2 = 1 << 1,
    flag3 = 1 << 2
};

function_call("parameter", flag1 | flag2 | flag3);

The Java equivalent:

public enum Foo {
    Flag1,
    Flag2,
    Flag3;
}

obj.callMethod("parameter", EnumSet.of(Foo.Flag1, Foo.Flag2, Foo.Flag3));

When the code runs, both versions end up using a single numeric primitive to store the flags. But in the Java version, that container implements the Set<T> interface, and can be passed to any standard algorithm or existing code that accepts Set<T> or Collection<T>, and it will retain its optimal properties. Furthermore, the Java version does not need to worry about the 64-flag limit of a long (or the 32-flag limit of an int), nor does it require the programmer to studiously assign a distinct bit to every flag, a chore that invites mistakes. And finally, checking for a specific value is far simpler:

C++:

unsigned my_foo = flag1 | flag2 | flag3;
if ((my_foo & flag2) == flag2) {}
if ((my_foo & (flag1 | flag3)) == (flag1 | flag3)) {}

my_foo = my_foo & ~(flag2 | flag3);

Java:

EnumSet<Foo> myFoo = EnumSet.of(Foo.Flag1, Foo.Flag2, Foo.Flag3);
if (myFoo.contains(Foo.Flag2)) {}
if (myFoo.containsAll(EnumSet.of(Foo.Flag1, Foo.Flag3))) {}

myFoo.removeAll(EnumSet.of(Foo.Flag2, Foo.Flag3));

In the "remove all" case, the EnumSet<T> implementation will check if the parameter is also an EnumSet<T>, and if so it will use the bitwise logical operators to perform the set subtraction. Otherwise, it will iterate. The programmer can safely use the clearest code and not need to worry about the implementation details, and the generated code will still maintain its optimized behavior. Using the Java specialized types for enumerations always works correctly, never runs slower than not using them, and in all common cases is far more efficient. And at all points in the code standard interface types can be used without sacrificing anything.

Polymorphism

C# enumerations are simply primitive numeric types. There can be no added methods, no implemented interfaces, no polymorphism. Any additional behavior for the enumeration type must be provided by the users of that type, encouraging code copying and the breaking of encapsulation. Java enumerations are full objects and can provide methods, contain members, implement interfaces, expose static members, and more. The behavior of having non-contiguous values or duplicate values can be provided with a custom method; it is only the ordinal() method that cannot be modified. Classes are free to provide their own customOrdinal() method that returns anything at all. (But in practice, virtually all uses of non-contiguous ordinals are better handled by the specialized data structures.)
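
For example, an enumeration that needs non-contiguous values can simply carry them as an ordinary field. The names here (HttpStatus, code()) are hypothetical; the built-in ordinal() remains untouched underneath:

```java
public enum HttpStatus {
    OK(200),
    NOT_FOUND(404),
    SERVER_ERROR(500);

    private final int code;

    HttpStatus(final int code) {
        this.code = code;
    }

    // A "custom ordinal": non-contiguous values that ordinal() cannot express.
    public int code() {
        return code;
    }
}
```

Here HttpStatus.NOT_FOUND.code() returns 404, while HttpStatus.NOT_FOUND.ordinal() is still 1.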

And there is one feature that is quite powerful but a lot of people overlook. Java enumeration values can override methods per-constant.

public enum Foo {
    Value1,
    Value2 {
        @Override
        public String surpriseMe() {
            return "This is a surprise!";
        }
    },
    Value3;

    public String surpriseMe() {
        return "No.";
    }
}

Foo foo = ...;
System.out.println(foo.surpriseMe());

Calls to Foo.Value2.surpriseMe() will be polymorphic and return a different value than calls to Foo.Value1.surpriseMe(). Of course, this could also be achieved by declaring a String member that is initialized in the constructor, then passing different values to the constructors of different enumeration members. There is a lot of flexibility when using enumeration types in Java.
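
The constructor-based variant mentioned above would look something like this (Foo2 is a hypothetical name chosen to avoid clashing with the earlier Foo):

```java
public enum Foo2 {
    Value1("No."),
    Value2("This is a surprise!"),
    Value3("No.");

    private final String surprise;

    // Each constant passes its own value to the constructor.
    Foo2(final String surprise) {
        this.surprise = surprise;
    }

    public String surpriseMe() {
        return surprise;
    }
}
```

Per-constant method bodies are more flexible, since they can compute anything at call time; the constructor approach is simpler when the difference between constants is just data.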

Overall the advantages of Java enumerations are so great that I actively look for opportunities to use them. In my LR(1) parser generator, I was careful to keep the type of terminal symbols unrestricted specifically because parser generators involve lots of set computations. The only restriction I placed was that the type implement a marker interface, and the caller could provide a custom factory for instances of sets of that type. Because of this, the code seamlessly handles both enumeration types and non-enumeration types, and providing a custom factory that generates instances of EnumSet<T> provides a noticeable improvement in performance with no code or logic changes. Forgetting to provide that custom factory results in it using standard HashSet<T> instances and everything works normally, just not as fast.

Footnotes

  1. I believe the ordinal values are actually generated when the class is loaded, not when it is compiled. The distinction may seem trivial but it does come into play in cases like serialization of enumeration types.
  2. This was true when I stepped through code using enumeration types in C# back when I (briefly) worked for Microsoft. It may have been fixed since then.
  3. This was true when I read the specification back when I worked for Microsoft. It may not be true for more current versions of C#.


Jul. 29th, 2018

07:51 pm - How to inject a database connection in an ASP.NET Core application.

This is a follow-up to my last post, the one about database connections in C# and .NET Core. I figured out a way to make using a database connection in C# a tad less of an anti-pattern and a tad more like how Java does it. In particular, I found a slick and simple way to encapsulate the connection string so that components using the database connection do not need to know or care what the connection string is.

using System.Data;
using System.Data.Common;
using System.Data.SqlClient;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

namespace Foo
{
    class Startup
    {
        public IConfiguration Configuration { get; }

        public Startup(IConfiguration configuration)
        {
            Configuration = configuration;
        }

        public void ConfigureServices(IServiceCollection services)
        {
            services.AddSingleton<DbConnectionStringBuilder>(new SqlConnectionStringBuilder
                {
                    ConnectionString = Configuration.GetConnectionString("ImportantDatabase"),
                    Password = Configuration.GetSection("SecurePasswords").GetValue<string>("ImportantDatabasePassword")
                });
            services.AddTransient<IDbConnection>(provider =>
                {
                    var builder = provider.GetRequiredService<DbConnectionStringBuilder>();
                    var providerFactory = DbProviderFactories.GetFactory("System.Data.SqlClient");
                    var conn = providerFactory.CreateConnection();
                    conn.ConnectionString = builder.ConnectionString;
                    return conn;
                });
        }
    }
}

Then in your consuming code, all you need to do is retrieve an IDbConnection from the injection system.

using System;
using System.Data;
using Microsoft.Extensions.DependencyInjection;

namespace Foo.Bar
{
    class DAL
    {
        private readonly IServiceProvider _provider;

        public DAL(IServiceProvider provider)
        {
            _provider = provider;
        }

        public void SomethingImportant()
        {
            using (var dbConnection = _provider.GetRequiredService<IDbConnection>())
            {
                dbConnection.Open();

                using (var dbCommand = dbConnection.CreateCommand())
                {
                    // do stuff
                }
            }
        }
    }
}

I apologize if I got any class names or method names wrong. I am writing this from memory, because the code I wrote is at work and I am at home.

I split out the database password because I firmly believe that database connection strings should be standard configuration, and that they should not contain credentials. Store the secret credentials separately. And obviously the provider invariant name ("System.Data.SqlClient") should come from configuration instead of being hard-coded. This is just example code to demonstrate the technique.

Jun. 8th, 2018

01:02 am - C# and Java

My new job is entirely working with C# on the Azure platform. After my brief stint at Microsoft, I deliberately avoided adding C# to my resume because I did not want to work in it again. A couple co-workers from my previous job convinced me to join them at Starbucks, and since I like them I decided to go for it and give C# another try.

The new codebase is wildly better than anything I had to deal with at Microsoft. It uses standard design patterns, is modular, has dependency injection and normal logging and unit tests. Overall it is a significantly better experience than the prior one.

However, even though I've been able to find C# analogues for most of the things I've come to rely on in the Java ecosystem, there is a constant stream of little frustrations and gotchas that make me appreciate all the things that Java provided for me.

Here is an example. Today I tried to figure out how to inject a database connection into a handler in a standard C#-idiomatic way. I was looking for something akin to JNDI in an application container, where I set up the database connection in external configuration and the code merely accepts and uses a generic connection interface.

After a few hours of searching online, I managed to cobble together enough tiny clues and analyze enough bad code samples to produce roughly this code (edited to remove company-specific details).

using System;
using System.Data;
using System.Data.Common;
using System.Threading.Tasks;

namespace Org.Silnith
{
    public class DatabaseToucher
    {
        private readonly DbProviderFactory _dbProviderFactory;
        private readonly string _connectionString;
        
        public DatabaseToucher(DbProviderFactory dbProviderFactory, string connectionString)
        {
            _dbProviderFactory = dbProviderFactory;
            _connectionString = connectionString;
        }

        public async Task<int> Update(string transactionId)
        {
            using (var connection = _dbProviderFactory.CreateConnection())
            {
                if (connection == null)
                {
                    throw new Exception();
                }
                connection.ConnectionString = _connectionString;

                await connection.OpenAsync();
                try
                {
                    using (var command = connection.CreateCommand())
                    {
                        const string transactionIdParameterName = "@transactionId";
                        const string requestSentParameterName = "@requestSent";
                        var insertStatement = $"insert into transactions (transaction_id, request_sent) values ({transactionIdParameterName}, {requestSentParameterName})";

                        var transactionIdParameter = command.CreateParameter();
                        transactionIdParameter.ParameterName = transactionIdParameterName;

                        var requestSentParameter = command.CreateParameter();
                        requestSentParameter.ParameterName = requestSentParameterName;

                        command.CommandText = insertStatement;
                        command.Parameters.Add(transactionIdParameter);
                        command.Parameters.Add(requestSentParameter);

                        command.Prepare();

                        transactionIdParameter.Value = transactionId;
                        requestSentParameter.Value = false;

                        using (var transaction = connection.BeginTransaction(IsolationLevel.Serializable))
                        {
                            command.Transaction = transaction;

                            var rowsUpdated = await command.ExecuteNonQueryAsync();

                            transaction.Commit();

                            return rowsUpdated;
                        }
                    }
                }
                finally
                {
                    connection.Close();
                }
            }
        }

        public async Task<bool> Query(string transactionId)
        {
            using (var connection = _dbProviderFactory.CreateConnection())
            {
                if (connection == null)
                {
                    throw new Exception();
                }
                connection.ConnectionString = _connectionString;

                await connection.OpenAsync();
                try
                {
                    using (var command = connection.CreateCommand())
                    {
                        const string transactionIdParameterName = "@transactionId";
                        const string requestSentParameterName = "@requestSent";
                        var selectStatement = $"select count(*) from transactions where transaction_id = {transactionIdParameterName}";

                        var transactionIdParameter = command.CreateParameter();
                        transactionIdParameter.ParameterName = transactionIdParameterName;

                        command.CommandText = selectStatement;
                        command.Parameters.Add(transactionIdParameter);

                        command.Prepare();

                        transactionIdParameter.Value = transactionId;

                        using (var transaction = connection.BeginTransaction(IsolationLevel.Serializable))
                        {
                            command.Transaction = transaction;

                            var count = (int) await command.ExecuteScalarAsync();

                            transaction.Commit();

                            return count > 0;
                        }
                    }
                }
                finally
                {
                    connection.Close();
                }
            }
        }
    }
}

When my co-worker asked me what about the .NET database connection API was making me grumble, I took a deep breath and produced a rant a good thirty seconds long, cut off only because I started a coughing fit and could not continue.

For reference, this is the Java code that I wrote tonight that provides the same functionality.

package org.silnith.temp;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.inject.Inject;
import javax.sql.DataSource;

public class DatabaseToucher {
    
    private final DataSource dataSource;
    
    @Inject
    public DatabaseToucher(final DataSource dataSource) {
        super();
        this.dataSource = dataSource;
    }

    public int insert(final String transactionId) throws SQLException {
        try (final Connection connection = dataSource.getConnection()) {
            connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            connection.setAutoCommit(false);
            try (final PreparedStatement insertStatement = connection.prepareStatement("insert into transactions (transaction_id, request_sent) values (?, ?)")) {
                insertStatement.setString(1, transactionId);
                insertStatement.setBoolean(2, false);
                
                final int rowsUpdated = insertStatement.executeUpdate();
                
                connection.commit();
                
                return rowsUpdated;
            }
        }
    }
    
    public boolean query(final String transactionId) throws SQLException {
        try (final Connection connection = dataSource.getConnection()) {
            try (final PreparedStatement selectStatement = connection.prepareStatement("select count(*) from transactions where transaction_id = ?")) {
                selectStatement.setString(1, transactionId);
                
                try (final ResultSet resultSet = selectStatement.executeQuery()) {
                    while (resultSet.next()) {
                        int count = resultSet.getInt(1);
                        
                        return count > 0;
                    }
                    throw new IllegalStateException("Cannot reach this point.");
                }
            }
        }
    }
    
}

Note that I added extra lines to the Java version in an attempt to make it as functionally-equivalent to the C# version as possible. The original was actually shorter.

So why does the C# version bother me so much? Let me count the ways.

  1. The DbProviderFactory object only provides a mechanism for instantiating database drivers. It does not encapsulate all the information needed to create a connection to a specific database. Therefore, you also need to inject the database connection string to every location where you want to use a connection. Why does every object need to know what the database connection string is? That's breaking the abstraction in a horrible way, and allowing a significant source of errors.
  2. The DbProviderFactory.CreateConnection method can return null. So inside of a using block, you also need to check the variable to see if it is null and handle the failure.
  3. I already created a connection, now I also have to set the connection string and then open the connection, too? I think of a database connection as a functional object, not a class that provides the functionality for connecting manually. I said give me a connection, not give me the means to build a connection. In Java, when I get a connection it is an open connection to the database, ready to execute queries and updates.
  4. The idea of having named parameters always sounds nice in theory, but once you try it in practice you quickly discover that you have to repeat yourself constantly. I can either type the parameter names twice, once when specifying them and again when using them in the SQL statement, or I can use string interpolation to re-use a constant with the names in them. Reality does not come out as clean as theory predicted.
  5. I create objects to represent the parameters in the statement. It sounds reasonable, but in practice it does not really add any additional descriptive power or improve the type safety at all. And why do I need to attach the parameters back onto the command from which I created them in the first place? I said, Have this command create a parameter, it is perfectly reasonable to expect the parameter object to be associated with the command when I receive it.
  6. On a similar note, why do I have to use the connection to create a command, use the same connection to create a transaction, and then associate the two myself? A connection is inherently stateful; the whole point of having a persistent connection is so it can maintain state. If I can create a command and not associate it with the transaction, does that mean I can run multiple transactions on a single connection? That truly defies logic: you would be maintaining multiple states for a single stateful object (the connection). So if it makes no sense to have a command not associated with the open transaction, then why are they not pre-associated? It is additional work for the programmer and an opportunity for errors.
  7. Oh, yay. Asynchronous method calls. So instead of the thread blocking and the operating system doing a context switch, instead the programming language runtime can emulate a context switch. You still need to save and restore the stack, and you still incur the same cache and buffer change costs. But now there are two context switching mechanisms instead of one, and the new mechanism breaks a lot of the assumptions that the operating system was built around, as well as making useless any specialized hardware the machine offers for context switching.
  8. Oh, and since we had to manually open the connection instead of having the open implicit in acquiring it, we are also responsible for closing it as well, and we cannot make use of a using block for it since the open was a simple method call, not an object retrieval.

Now contrast with the Java version.

  1. The DataSource is an abstraction for the database itself, not a piece of code used to communicate with the database. Acquiring a connection gets a functional connection, ready to be used. Any failure throws an exception, so the returned object is guaranteed to never be null.
  2. Creation of the SQL prepared statement parameters is implicit in creating the statement abstraction object itself. Parameters are positional, so no need to keep names in multiple locations synchronized.
  3. In Java there is no such thing as disposing of an object. If an object maintains resources that must be released, it is closed, which is what the try-with-resources block does. There is only one abstraction for releasing resources, not two. (Distinct close versus dispose.)
  4. Transactions are part of the state of a connection, not a distinct entity. It is not possible to have a statement outside of a transaction, or to have multiple open transactions associated with a connection. A transaction begins when the previous one ends.

The Java version is just so much cleaner, conceptually, than the C# version that it boggles my mind that anybody could see the C# code as superior in any way to the Java code. I know many people do, and many of them are very smart and experienced people. I just cannot see the world in the way that they do. I really do not mean to belittle or denigrate those people. I am simply befuddled.


Apr. 18th, 2018

06:10 am - Using formal parsing methods on natural languages.

I've been thinking about this for years. I really love compilers, so much that I took the class on it multiple times in school. Possibly my favorite aspect of it is the formal theory built around parsing. The concept of an LR parser is beautiful, both elegant and effective. For computer languages, it is unambiguously the best way to parse them for compilation.

But for years I've been thinking about how the concept should be useful for parsing natural languages, too. The basic idea is that you build a grammar for the language, then take a sentence of the language and parse it according to that grammar. The parser first tokenizes the sentence into a string of tokens, then tokens are matched to grammar rules to identify what grammatical construct they represent. Based on that you build a syntax tree representing what the sentence (for lack of a better word) means. The beauty here is that the parser does not try to guess what syntactical construct is coming next. It simply looks at what is there, and matches that against all possible legal grammatical constructs and picks the one that matches.

The core problem, of course, is ambiguity. In a computer language, ambiguity is unacceptable and hence disallowed. A grammar must be unambiguous for a parser to be built that can recognize it. For natural languages, ambiguity is not just allowed, it is prevalent. A lot of effective communication relies on ambiguity, and it is required for a lot of artwork and humor. Therefore in order to parse a natural language, the parser must be able to handle ambiguity gracefully.

In an LR (shift-reduce) parser, ambiguity manifests in the form of shift-reduce conflicts and reduce-reduce conflicts. Basically, when applying a grammar, the parser finds more than one possible match in the grammar and does not know what action to take. In more precise terms, it finds multiple matching grammar rules and must decide (somehow) which one to apply. Again, in computer languages, ambiguity is not acceptable, and so no grammar is allowed where such a conflict can happen. In theory. In practice it is a lot of painful work to come up with completely unambiguous grammars, so most parser generators allow some ambiguous constructs and rules provided that all resulting conflicts are resolved somehow. A simple example would be: for all shift-reduce conflicts, shift.

What if, instead of statically choosing one grammar rule to use for parsing, we non-deterministically check all possible parses? In the same fashion as a deterministic finite state automaton versus a non-deterministic finite state automaton, we could create a non-deterministic LR parser. In the case of conflict, instead of only applying one rule, we apply all rules. If no grammatical rules match, then of course nothing is applied.

In this way we can do something much more analogous to what people do in their heads when reading: we keep track of multiple possible interpretations of what we are reading, and resolve down to one when we reach the end of a sentence. (Oftentimes a human will need to go back and re-read an ambiguous sentence to try a different parsing of it. A computer, with vastly more short-term memory, could do it concurrently.)

There is a piece missing from this. A key part of writing a parser is writing the tokenizer. Grammars are written in terms of streams of tokens. In a computer language, a sequence of characters can be identified and mapped to a type of token before the grammar rules are applied. That is why most languages have reserved words and rules about how numbers must be formatted and the like. In a natural language, different words representing different parts of speech can have the same spelling. For example, the word tastes could be a plural noun or a present-tense third-person singular verb. Leading could be an adjective or a noun. Lead could be a present-tense verb in any person except third-person singular, a noun, or an adjective. Since natural language grammar rules depend on these parts of speech, there is ambiguity in the part of the parser normally separated out as the lexer.

One natural solution to this would be to move the identification of part of speech out of the lexer and into the grammar itself. This would solve the ambiguity problem by reducing it to a known problem (an ambiguous grammar) and applying the solution we already have. A major drawback is that it would drastically increase the size of the grammar itself. (Insert mathematician joke here.)

Another solution would be to extend the non-determinism into the lexer. Instead of the lexer passing a single definitive token type to the parser, it could pass an ambiguous token that contains a set of all possible interpretations. The grammar parser would then need to try each of these possibilities. This would keep the grammar smaller (and theoretically more pure) but would complicate the implementation of both the parser and lexer, and may require storing more state at runtime.
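
The ambiguous-token idea can be sketched as follows. Everything here (the PartOfSpeech tags, the hard-coded dictionary) is hypothetical scaffolding, not a real lexer:

```java
import java.util.EnumSet;
import java.util.Set;

public class AmbiguousLexer {
    // Hypothetical part-of-speech tags.
    enum PartOfSpeech { NOUN, VERB, ADJECTIVE }

    // An ambiguous token: one spelling, a set of possible interpretations.
    record Token(String spelling, Set<PartOfSpeech> possibilities) { }

    static Token lex(String word) {
        // A real lexer would consult a database of all inflected forms;
        // this lookup is hard-coded purely for illustration.
        switch (word) {
            case "lead":
                return new Token(word,
                        EnumSet.of(PartOfSpeech.NOUN, PartOfSpeech.VERB, PartOfSpeech.ADJECTIVE));
            case "tastes":
                return new Token(word,
                        EnumSet.of(PartOfSpeech.NOUN, PartOfSpeech.VERB));
            default:
                return new Token(word, EnumSet.allOf(PartOfSpeech.class));
        }
    }

    public static void main(String[] args) {
        // The parser downstream forks once per possible interpretation.
        System.out.println(lex("lead").possibilities().size());   // prints 3
        System.out.println(lex("tastes").possibilities().size()); // prints 2
    }
}
```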

Non-deterministic algorithms are well-known for being problematic to implement in reality. That is why we have a formal method for converting an NFA into a DFA, so we can actually run it in a reasonable time using a reasonable amount of memory. Since at the time of writing I know of no formal method for converting a non-deterministic LR(n) parser into a deterministic LR(n) parser, we would need to fall back on holding multiple parser states in memory simultaneously. I believe this will not be a problem in practice. When it comes to compiling a computer program, most languages have grammatical rules akin to this: program := statement (semicolon statement)*. Natural languages have very few of these constructs and they are used infrequently. Most natural language sentences are relatively quite short. Even if applied to entire paragraphs at a time, the size of the input is minuscule compared to the size of input most compilers receive on a daily basis. Even exponential algorithms can be executed if the input size is small enough. The parser can prune non-matching states as it executes, so in most cases where the amount of ambiguity is small the actual runtime states kept in memory will also be small.
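
A drastically simplified sketch of that fork-and-prune execution follows. The toy sentence patterns and dictionary are invented for illustration; a real implementation would fork over full LR parser states, not flat patterns:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ForkAndPrune {
    // Toy grammar: a "sentence" is either NOUN VERB or ADJECTIVE NOUN VERB.
    static final List<List<String>> PATTERNS = List.of(
            List.of("NOUN", "VERB"),
            List.of("ADJECTIVE", "NOUN", "VERB"));

    // Toy ambiguous dictionary mapping each spelling to its possible parts of speech.
    static final Map<String, Set<String>> DICTIONARY = Map.of(
            "lead", Set.of("NOUN", "VERB", "ADJECTIVE"),
            "tastes", Set.of("NOUN", "VERB"),
            "poisons", Set.of("NOUN", "VERB"));

    // A live hypothesis: which pattern we are matching, and how far along we are.
    record State(List<String> pattern, int position) { }

    static int countParses(List<String> sentence) {
        List<State> live = new ArrayList<>();
        for (List<String> pattern : PATTERNS) {
            live.add(new State(pattern, 0));
        }
        for (String word : sentence) {
            List<State> next = new ArrayList<>();
            for (State state : live) {
                if (state.position() < state.pattern().size()
                        && DICTIONARY.get(word).contains(state.pattern().get(state.position()))) {
                    next.add(new State(state.pattern(), state.position() + 1));
                }
            }
            live = next; // pruning is implicit: dead hypotheses are simply not carried forward
        }
        int parses = 0;
        for (State state : live) {
            if (state.position() == state.pattern().size()) {
                parses++;
            }
        }
        return parses;
    }

    public static void main(String[] args) {
        // "lead poisons" parses only as NOUN VERB.
        System.out.println(countParses(List.of("lead", "poisons")));          // prints 1
        // "lead lead poisons" parses only as ADJECTIVE NOUN VERB.
        System.out.println(countParses(List.of("lead", "lead", "poisons"))); // prints 1
    }
}
```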

For the lexer-based approach to assigning parts of speech to each word, even though the size of the problem may at first seem daunting, I believe it is not an issue. We can imagine a database of every word in a natural language that contains mappings of every verb tense conjugation or noun declension. In practice this would be slightly larger than a normal dictionary. In practical terms this is small enough to be considered a trivial problem for any standard database. For efficiently looking up a word by one of its spelling variants, imagine this database as a table where each row is a word and each column is its conjugation or declension. Now imagine creating a database index for each column. At this point I can wave my hands and invoke standard database theory and claim that the issue is solved. In practice this will involve tries or B-trees or some such, since it is a known problem with known solutions I do not need to go into detail.

I'll write more later. Right now Whiny Cat is whining at me to go back to bed, so I'll go nap for a few more hours.

Current Mood: sleepy

May. 6th, 2011

08:05 pm - Gird your loins, we're going technical.

( Gird your loins, we’re going technical. )

Sometimes I think the true mark of intelligence (or, perhaps, wisdom) is knowing when to avoid complicated designs and requirements, and stick with elegant simplicity.

Current Mood: accomplished

Apr. 1st, 2011

01:17 am - life and comics

I love Maria Bamford. I want to meet her on an online dating website. That would rule. Is she in my age range?

Spent the past two days at work writing a servlet to expose our Spring configurations. I’m quite proud of it. I made an interface so beans can pretty-print their configuration by filling in a DocumentFragment. I know a lot of coders shudder at the DOM, but I prefer using the “right” tools for the job. And as it turns out, it is good I did, as I discovered that some of our configuration strings were of the form http://blahblah:8080/abcd123<#234 or some such nonsense. By using a programmatic interface instead of string concatenation, all the encoding and escaping was handled cleanly with no thought on my part. I appreciate not having to track down subtle bugs later on. I wish more of my colleagues shared my preferences.

I’d totally do Maria Bamford. She’s cute.

Current Location: home
Current Mood: accomplished
Current Music: Futurama

Jan. 3rd, 2011

07:45 pm

( Sample code I encountered in a professional setting. ) ( Why this code is bad. )

To summarize, use the standard libraries! They are better than your code, even if you have thirty years of experience.
