
Modularity.

I've been turning this over in my head for a while now, and it still seems like there ought to be some clean and simple solution that I'm just not seeing. The gist of the question is: loading or inclusion of modules in a C program, either at compile time or at run time. How do you do it cleanly?

The gritty detail goes something like this:

It's pretty common for a program to grow to the point where optionally including various modules (often written by third parties) becomes quite an important feature. Perfect examples of this would be Exim and Apache, but it can just as easily show up anywhere complex site-specific behavior is useful.

Often the problem of site-specific behavior is solved with scripting. This has a couple of problems: scripts are often slow to execute, and the scripting languages / engines that drive them are a pain to write correctly. Scripts are still very useful, especially when a fairly limited range of functionality is being exposed. They can usually be read on startup, "compiled" into a rapidly executable form within the running program, and then used on demand without incurring much of a time penalty.

The problem of nasty scripting languages can be escaped by making the "scripts" executable as separate processes which communicate with the main process over some IPC mechanism such as pipes. With a good data format, this allows the modules to be written in just about any language. You still have to write crufty parsing code for transferring data back and forth, and it's still rather annoying to try to expose vast sets of functionality, but at least the constraints on the user are somewhat eased. (Along with the learning curve.)

Pipes and modules-as-programs is an extremely handy idea when working with data filters, since it allows you to abstract the work away from the engine for maintaining / displaying the data. This allows the filters to be chained together for automated operations without even having the "main program" involved. For server applications and the like, they're not so handy, since they increase load on the process table and take up file descriptors. (No, really, this *does* become an issue eventually.)

This leads straight into the idea of the module actually being part of the program, running within the same process, and so on. The first version of this I saw involved each module being a single function, all with the same prototype, one per file. The files were then compiled into .o files, stripped, and tossed in a directory. The main program read this directory, memmapped the contents of each file, and then used those memmapped locations as function pointers.

On one hand, that was a *really* hacked up and disgusting way of doing it. (Especially if you consider that it depended on executable data segments.) On the other hand, it was easy to add new modules without recompiling the main program, and they were *fast*. Everything the modules needed had to be passed in the parameters, but in most cases that's just plain good design anyway. (Only *most* cases, though. I'm a firm believer in the rare but existent validity of global variables.) The main pesky limitation was having only one function per file, since often it would be desirable for a single module to embody a variety of related functions.

For compile-time inclusion of modules, the most common structure I've seen goes like this: Each module has its own file (usually all the modules sit together in a single directory), containing a ton of functions of similar prototype, and a setup/initialization function. These files make specific includes to give them access to various variables and functions in the main program. Somewhere in the main program is a jump table which during program execution contains all the specs (like function pointers) for all the functions from loaded modules. Somewhere during the startup of the program, a function is called which sets up this table by calling the setup functions for all the modules which have been compiled in.

Nice system, allows the modules a lot of access to program internals (not always an advantage, but sometimes essential), but has two huge downsides: You cannot add modules without recompiling the whole thing, and you have to go edit the table setup function (or a data table it relies on) every time you add a module. (Not being able to monitor the data flow between the program and the module very easily could also be considered a downside, but I'm not worrying about that in this instance, since anything where that really matters ought to be using a more IPC-like model anyway.)

What I would really like, and have been utterly failing to come up with, would be a mix of those last two options. Post-compile addition of modules, many functions per module, but fairly deep access to program internals, and not relying on that silly memmap behavior unless someone can show that it's portable/standard. I just don't see a sensible way of doing it. I'd even settle for just compile-time addition that didn't involve any edits to the main code.

Suggestions?