All the main files

Back in the day there was a main.c file and all was well. Inside of it there was int main() in some shape or form and that was also great. All one had to do to work on a project was make and run the program. And all was well until THE MAINS MULTIPLIED!

Multiplied? That's right. Modern applications are like a disease. Like viruses... Uhh... Not quite, please scratch that, but they are often distributed with multiple independently running processes. Often (eventually) not on the same host machine. Production environments have a myriad of different methods to deal with starting them up and shutting them down and replacing them. Think Docker, Kubernetes, or Coolify or others.

Yet development environments rarely receive equivalent care, and the solutions vary from just running:

a & b & c & d

in the terminal to some variations of "first start up the development web interface and then toggle individual services from there".

Exhibit A

If I have multiple applications in the same repository, I usually want to start more than one of them at the same time. And sometimes a specific selection of them. Why? Because startup speed matters when developing anything. Imagine a setup like this:

- website_about_spiders
- website_about_cows
- etc...

Of course it is best to allow automatic reloading and keep the applications and services small and fast. This will get us 90% there. As the repository grows, and I can afford to have really large repositories these days, it can become weighty to start all components.

An alternative is to allow starting only some of the applications. Why not have a script that starts all the applications matching a specific naming scheme?

It also offers a reason to organize the applications in a specific way. Applications and services that are usually started together can be placed in the same directory.

Exhibit B

Now the opposite issue with a single main entry point. Imagine that I have a web server. Most examples would recommend building it something like this (given a JS web server):

// main.js
import express from 'express'

const app = express()

app.get('/spiders', ...) // One endpoint
app.get('/cows', ...)    // Second endpoint

app.listen(3000)

It makes sense, especially for an example. The catch, though, is that it sets the expectation that all the endpoints must be loaded at once and are directly dependent on each other.

The problem with it is that, for example, any tests that are written to test a single endpoint will need to start up the entire application. If it is a tiny application then it is fine, but applications grow like dragons. At a modest sign of success they become hungry for more code!

And therein goes the ease and speed of writing a new endpoint and testing it in isolation...

So what if... What if it was not the case? Or what if I used PHP? It lets us have endpoints in isolation.

Changing the way of thinking about servers

PHP based Apache servers were rather nice in one way. Each endpoint was separate. The issue is that there is still "a server" that needs to run the entire thing and separating it from the application or applications completely creates an external dependency that is more complex to test and reason about.

I think this is partly behind the popularity of NodeJS and Python varieties of servers. It's easier to think of the server as part of the "code" and also easier to have a single entry point for starting up the entire application. That is also the bad side: the application becomes a natural monolith forcibly tied together with that server.

To avoid it, it's possible to think of each endpoint or small group of endpoints as an independent unit that may be started up on its own or together with some other endpoints during development and testing.

As I appreciate simplicity in Forever 200 OK, I went with a variation that is a little bit of both.

Ground rules

First I made up some ground rules to keep life simple:

  1. EVERYTHING must be runnable on a local development machine.
  2. I have a single repository, but not a single application.
  3. Any "main" or a process entry point, something that can be executed directly from the operating system, MUST be executable independent of any other entry point. This serves a great purpose as it allows us to be rather certain that none of the applications affect each other. I can still have shared portions of code, but these belong to the libraries or the lib directory only.
  4. From ANY directory of the project, I'd like to be able to start all mains under it. This allows for really slicing down the size of the processes if I so desire.

To remain more faithful to the truth, in my case I merge the processes on startup as I use NodeJS only. It means that I still have a single main file, but I can start any sub-main or entry point independently. There is more below though before I get to that.

Let's mix the setup from Exhibit A and Exhibit B:

- website
   - spiders_endpoint
   - cows_endpoint
- etc...

If I agree that the startup.js file is the entry point and have a script that allows me to "start" everything in a given folder, I could place one startup.js file in the spiders_endpoint folder and another in the cows_endpoint folder.

Of course it doesn't make sense to create a separate process, if they should in general run together. So the typical endpoint might look slightly different and I'd also need a singleton of the application server or a few of them.

Example

If I have a helper to lazy initialize the server similar to this:

// helper.js
import express from "express";

let app;

export async function server() {
  return new Promise((resolve, _reject) => {
    if (app) {
      resolve(app);
      return;
    }

    // Assign to the module-level variable; a local `const app`
    // would shadow it and break the check above
    app = express();

    // Note: in reality it's better to
    // resolve with more things as they'll be
    // occasionally required by endpoints. Eg:
    // custom router(s), to be able to modify order of endpoints
    // separate http server
    // and more
    app.listen(3000, () => resolve(app));
  });
}

then startup.js files might look like this:

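// spiders_endpoint/startup.js
// Assumption: helper.js lives one directory up; adjust the path to the real layout
import { server } from "../helper.js";

// It's useful to export as default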
// Note: In newer NodeJS versions the function could
// be skipped as await is possible at root level
export default async function startup() {
  // Acquiring the server here instead of starting it
  // at the end of the script calling all startup.js files
  // makes it possible to have non-server startup.js files
  // that might communicate with other services in a different way.
  // Such as queues, custom protocols over TCP or UDP etc
  const app = await server()

  // Yes, an expressjs server can be started before adding endpoints!
  // No, it does not work with all http server libraries
  app.get('/spiders', ...) 
}

Please keep in mind that the above is just an example and one possible solution of many to isolate more parts of the overarching application/service infrastructure for development purposes. Whether it is possible or not depends on the capabilities of the server used. Some servers and languages make it impossible for endpoints to be added AFTER the server is started. In those cases having a proxy may be a simpler option for a similar setup, but I digress.
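
The script that calls all the startup.js files is not shown here, so the following is only a minimal sketch of what it could look like, using nothing but the NodeJS standard library. The file name start-all.js and the function names findStartupFiles and startAll are made up for illustration, and it assumes every startup.js default-exports a function as in the example above.

```javascript
// start-all.js (hypothetical name, not part of the original setup)
import { readdir } from "node:fs/promises";
import path from "node:path";
import { pathToFileURL } from "node:url";

// Recursively collect every file named startup.js under rootDir.
async function findStartupFiles(rootDir) {
  const found = [];
  const entries = await readdir(rootDir, { withFileTypes: true });
  for (const entry of entries) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      // Assumption: dependencies never contain entry points we care about
      if (entry.name === "node_modules") continue;
      found.push(...(await findStartupFiles(fullPath)));
    } else if (entry.name === "startup.js") {
      found.push(fullPath);
    }
  }
  return found.sort();
}

// Import each startup.js and call its default export, if it has one.
// All entry points are started concurrently in the same process.
async function startAll(rootDir) {
  const files = await findStartupFiles(rootDir);
  await Promise.all(
    files.map(async (file) => {
      const mod = await import(pathToFileURL(file).href);
      if (typeof mod.default === "function") await mod.default();
    })
  );
  return files;
}
```

Calling startAll("website") would then start both endpoints, while startAll("website/spiders_endpoint") would start only the spiders one. Wiring the directory up to process.argv is left out of the sketch.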

The best way to ensure simple architecture is to avoid direct dependencies between the parts of the infrastructure and even parts of the server or servers that are in the infrastructure.

In the above example, the endpoints do not need any shared data and even if they did, they should request it, rather than have it passed to them.

Basically - having a global "context" is a bad idea. A better option is lazy initialization of the server on first demand as it:

  • avoids "always" starting an HTTP server, even when the part of the application being started does not need one
  • still allows sharing the HTTP server resource

With NodeJS each of these endpoints starts up instantaneously, so the entire server can be up and running very fast.

Important!

  1. None of the main files can hog a permanent loop. It's best to use (async () => { ... code here ... })() without an await to allow all the files to start up independently. An alternative is to use a separate thread, but coroutines work just as well in most cases.
  2. Do NOT give in to the temptation to parametrize the startup() functions. Each one is an entry point to the application. If parameters need to be passed, it's better to use process.argv, configuration files or environment variables. As an exception, I do like to allow an optional test database pool to be passed into the startup function. This lets me start up the application naturally from within the code for E2E testing without polluting the local development database.
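
Both points above can be illustrated with a small non-server entry point. This is a sketch under assumptions, not the real code: createPoolFromEnv is a hypothetical stand-in for a real database pool, POLL_INTERVAL_MS and DATABASE_NAME are made-up environment variable names, and the returned stop function is an addition that only exists to make the loop easy to shut down in tests.

```javascript
// queue_worker/startup.js (hypothetical non-server entry point)
import { setTimeout as sleep } from "node:timers/promises";

// Stand-in for real work, so the sketch stays self-contained
const processed = [];

// Hypothetical placeholder for creating a real database pool
function createPoolFromEnv() {
  return { name: process.env.DATABASE_NAME ?? "development" };
}

export default async function startup(testPool) {
  // The only accepted parameter is the optional test pool;
  // everything else is read from the environment
  const pool = testPool ?? createPoolFromEnv();
  const intervalMs = Number(process.env.POLL_INTERVAL_MS ?? 1000);
  let running = true;

  // The loop runs as a coroutine and is deliberately NOT awaited,
  // so startup() returns at once and other entry points can start
  (async () => {
    while (running) {
      await sleep(intervalMs);
      if (running) processed.push(pool.name);
    }
  })();

  // Returning a stop handle is an assumption made for testability
  return () => {
    running = false;
  };
}
```

An E2E test could call startup(testPool) directly, let it run for a moment, and then call the returned function to end the loop.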

Summary

I promise, it is a better way to build any larger application. At least in the case of server style applications. It also resembles the Unix philosophy in that each part of the system or program can focus only on what it needs to do. Of course there may be some shared parts, but shared libraries are also used in *nix based systems.

It also makes the development simple, because it can all be started at once as a single application. At the same time, each part can be started on its own and used as a microservice without any great changes.

  • Heidi (Founder)