Friday, February 25, 2005

One person's side-effect is another's parameter...

(Originally published on: Tue 10 of Aug, 2004)
I've been puzzling for some time over the role of side-effects in the design and development of software systems. There are a number of different flavors of side-effects in common usage, but they are most often encountered in the form of global variables.

Most competent programmers abhor global variables by tradition and bitter experience. However, I've been pondering lately whether globals are terrible by nature or whether their problems are an artifact of implementation.

So, what are the attributes of a global anyway? Are all globals bad or just some kinds?

Let's see...

There are static constants. What's wrong with them? Not much, except that global constants declared in separate packages can conflict, either causing compile-time errors or masking each other and causing unexpected runtime errors.

So, namespace is an issue. Specifically, the conflict between the desire to have a global namespace full of well-known entities and the desire to assemble it incrementally by composition based on input from an arbitrary set of packages that have no particular knowledge of each other.

How about global variables?

There are a number of problems typically associated with global variables, most of which are variations of one basic problem. Because the variable is global, its value can be set or read at any time from anywhere within the global scope of the system. This manifests itself as a problem with:
  • Initialization
  • Finalization (memory leakage)
  • Lost and unexpected changes (race conditions)
  • Hidden dependencies (inability to compile or run extracted code fragments)

It is interesting to observe that both of these problems, namespace conflicts and uncontrolled access to shared state, are addressed at length in other contexts. Namespace normalization and integration are widely tackled in the areas of schema integration and metadata management. The issue of race conditions surrounding access to a global is tackled by both concurrent programming groups and database engineers under the general banner of transactional or monitored access to shared resources.
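As a rough illustration of what "monitored access" could look like when scaled down to a single global, here is a minimal Python sketch. The names `MonitoredGlobal`, `counter` and `worker` are made up for this example; the only real API used is `threading.Lock` from the standard library. It is a sketch of the general idea, not a claim about how any particular database or concurrency library does it.

```python
import threading


class MonitoredGlobal:
    """A shared value whose reads and updates all go through one lock."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def update(self, fn):
        # The read-modify-write step is atomic with respect to other callers,
        # so concurrent updates cannot be lost.
        with self._lock:
            self._value = fn(self._value)
            return self._value


counter = MonitoredGlobal(0)  # hypothetical module-level "global"


def worker(n):
    for _ in range(n):
        counter.update(lambda v: v + 1)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.get())  # 4000: every increment went through the lock
```

The variable is still global, but the only way to change it is through one narrow, explicit operation, which is most of what a transaction or monitor provides.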

Why the techniques used in these other application areas haven't been rolled back in lightweight form to improve the usefulness and safety of globals is a question I'll come back to another day. For now, it's interesting to consider the last variant problem: hidden dependencies.

Hidden dependencies commonly happen when one block of code sets a global variable under certain circumstances and another block of code reads and acts on the values of the variable in specific ways. If the first block of code changes its behavior, the second block of code may break in surprising ways. In fact, a block of code can break itself if it both reads and sets the global in an inconsistent manner during its execution.

In theoretical terms, what's happening in the first case is that the range of one function and the domain of a second are interlocked, but there is no explicitly declared definition of the domain or range of either, so it is extremely difficult to disentangle them. When considered strictly from an API perspective, both functions really have an imaginary parameter as part of their signature: the first has a hidden output, i, and the second has a hidden input, i.
(r1, i) = f1(a, b)
r2 = f2(a, b, i)
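The two signatures can be read as the first function handing back a hidden value i along with its result, and the second function taking i as an extra input. A small Python sketch of both forms follows, assuming a made-up global called `mode` and made-up function bodies; only the shape of the signatures comes from the text.

```python
# Implicit version: the coupling travels through a module-level variable.
mode = None  # hypothetical global, standing in for "i"


def f1_implicit(a, b):
    global mode
    mode = "big" if a + b > 10 else "small"  # hidden output
    return a + b


def f2_implicit(a, b):
    # Hidden input: the result depends on whatever f1_implicit last stored.
    return a * b if mode == "big" else a - b


# Explicit version: the same coupling, declared in the signatures.
def f1(a, b):
    i = "big" if a + b > 10 else "small"
    return a + b, i  # the hidden output becomes part of the return value


def f2(a, b, i):
    return a * b if i == "big" else a - b  # the hidden input becomes a parameter


r1, i = f1(7, 5)
r2 = f2(2, 3, i)
print(r1, i, r2)  # 12 big 6
```

In the explicit form, f2 can be extracted and run on its own, because everything it depends on arrives through its parameter list.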

This can also occur with local variables, when a single block of code first changes and then refers to a variable during subsequently executed lines of code. (In this case, the function's range is folded back so that its domain interlocks with it and the function has an implicit self-dependency.) In practice, this is seldom considered to be a problem unless complex procedural logic like recursion is encountered, or if the state of the variable persists across invocations of the function.
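The case where state persists across invocations is easy to show in Python with a closure. This is only a sketch; `make_labeler`, `next_label` and `next_label_explicit` are hypothetical names invented for the example.

```python
def make_labeler():
    calls = 0  # local to make_labeler, but it outlives each call to next_label

    def next_label(prefix):
        nonlocal calls
        calls += 1  # the function feeds its own next invocation
        return f"{prefix}-{calls}"

    return next_label


next_label = make_labeler()
first = next_label("job")
second = next_label("job")
print(first, second)  # job-1 job-2: same argument, different results


# The same behavior with the self-dependency written into the signature.
def next_label_explicit(prefix, calls):
    calls += 1
    return f"{prefix}-{calls}", calls


label, calls = next_label_explicit("job", 0)
label2, calls = next_label_explicit("job", calls)
print(label, label2)  # job-1 job-2
```

Nothing about the logic changes between the two versions; the second one simply admits that the call count is both an input and an output.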

So this leads to what I've been pondering about side-effects...

Something is an acceptable behavior if it is explicit, has a contractual definition, and is introspectable. Otherwise, it is generally considered to be a side-effect. All of these characteristics of global variables that I've been talking about are well-mannered behaviors when they occur in other forms (databases, shared resources, etc.) because in those forms, the behaviors are complex enough, and occur frequently enough, that programmers are willing to spend the effort to codify contractual and introspectable definitions around them.

Plain old global variables, however, are generally used in situations where simplicity and speed of development are desirable, and their implications are frequently not well thought out.

So that kind of leads me to the question: are there other ways to provide the benefits of explicit contractual relationships around globals without having to force a heavyweight, design-intensive experience on the developer? Possibly by embedding the analysis in code refactoring tools rather than in runtime mechanisms?
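To make the tooling idea a little more concrete, here is a rough sketch of the sort of analysis a refactoring tool might run: it parses some source with Python's standard `ast` module and reports, for each top-level function, which module-level names it reads and writes. The function `implicit_parameters` and the sample `SOURCE` are invented for this illustration, and the analysis is deliberately simplistic (it ignores nested functions, imports, attribute access and keyword-only arguments).

```python
import ast

# Hypothetical code under analysis.
SOURCE = '''
mode = "small"

def f1(a, b):
    global mode
    mode = "big" if a + b > 10 else "small"
    return a + b

def f2(a, b):
    return a * b if mode == "big" else a - b
'''


def implicit_parameters(source):
    """Map each top-level function to the module-level names it reads and writes."""
    tree = ast.parse(source)

    # Names assigned at module level are the candidate globals.
    module_names = set()
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    module_names.add(target.id)

    report = {}
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        names = [n for n in ast.walk(node) if isinstance(n, ast.Name)]
        declared = set()
        for sub in ast.walk(node):
            if isinstance(sub, ast.Global):
                declared.update(sub.names)
        # Parameters, and names assigned without a `global` statement, are local.
        local_names = {arg.arg for arg in node.args.args}
        local_names |= {
            n.id for n in names
            if isinstance(n.ctx, ast.Store) and n.id not in declared
        }
        reads, writes = set(), set()
        for n in names:
            if n.id not in module_names or n.id in local_names:
                continue
            if isinstance(n.ctx, ast.Store):
                writes.add(n.id)
            else:
                reads.add(n.id)
        report[node.name] = {"reads": sorted(reads), "writes": sorted(writes)}
    return report


report = implicit_parameters(SOURCE)
for name, usage in report.items():
    print(name, usage)
```

For the sample source, the report shows that f1 writes `mode` and f2 reads it, which is exactly the imaginary parameter that appears in neither signature. The developer keeps writing plain globals, and the contract is recovered by the tool rather than enforced at runtime.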
