This article is about necessary and unavoidable code redundancy and discusses a model of code redundancy that helps to understand why source code generators do what they do, why they are needed at all.
The code you write in Java, or for that matter in any other language, is redundant. Not by the definition that says (per Wikipedia page https://en.wikipedia.org/wiki/Redundant_code):
In computer programming, redundant code is source code or compiled code in a computer program that is unnecessary, such as…
Your code may also be redundant this way, but that is a different kind of story than I want to talk here and now. If it is, then fix it, and improve your coding skills. But this probably is not the case because you are a good programmer. The redundancy that is certainly in your code is not necessarily unnecessary. There are different sources of redundancy and some redundancies are necessary, others are unnecessary but unavoidable.
The actual definition of redundancy we need, in this case, is more like the information theory definition of redundancy (per the Wikipedia page https://en.wikipedia.org/wiki/Redundancy_(information_theory))
In Information theory, redundancy measures the fractional difference between the entropy H(X) of an ensemble X, and its maximum possible value log(|A_X|)
UPPPS… DO NOT STOP READING!!!
This is a very precise, but highly unusable definition for us. Luckily the page continues and says:
Informally, it is the amount of wasted “space” used to transmit certain data. Data compression is a way to reduce or eliminate unwanted redundancy.
In other words, some information encoded in some form is redundant if it can be compressed.
For example, downloading and zipping the text of the classical English novel Moby Dick will shrink its size down to 40% of the original text. Doing the same with the source code of Apache Commons Lang we get 20%. It is definitely NOT because of this “code in a computer program that is unnecessary”. This is some other “necessary” redundancy. English and other languages are redundant, programming languages are redundant and this is the way it is.
If we analyze this kind of redundancy we can see that there are six levels of redundancy. What I will write here about the six layers is not well-known or well-established theory. Feel free to challenge it.
This model and categorization are useful to establish a way of thinking about code generation when to generate code, why to generate code. After all, I came up with this model when I was thinking about the Java::Geci framework and I was thinking about why I invested a year of hobby time into this when there are so many other code generation tools. This redundancy model kind of gives the correct reason that I was only feeling before.
Levels of Redundancy
Then the next question is if these (English and programming language) are the only reasons for redundancy. The answer is that we can identify six different levels of redundancy including those already mentioned.
This is the redundancy of the English language or just any other natural language. This redundancy is natural and we got used to it. The redundancy evolved with the language and it was needed to help the understanding a noisy environment. We do not want to eliminate this redundancy, because if we did we may end up reading some binary code. For most of us, this is not really appealing. This is how human and programmer brain works.
The programming language is also redundant. It is even more redundant than the natural language it is built on. The extra redundancy is because the number of keywords is very limited. That makes the compression ration from 60% percent up to 80% in the case of Java. Other languages, like Perl, are denser and alas they are less readable. However, this is also a redundancy that we do not want to fight. Decreasing the redundancy coming from the programming language redundancy certainly would decrease readability and thus maintainability.
There is another source of redundancy that is already independent of the language. This is the code structure redundancy. For example when we have a method that has one argument then the code fragments that call this method should also use one argument. If the method changes for more arguments then all the places that call the method also have to change. This is a redundancy that comes from the program structure and this is not only something that we do not want to avoid, but it is also not possible to avoid without losing information and that way code structure.
3 Domain induced
We talk about domain induced redundancy when the business domain can be described in a clear and concise manner but the programming language does not support such a description. A good example can be a compiler. This example is in a technical domain that most programmers are familiar with. A context-free syntax grammar can be written in a clear and nice form using BNF format. If we create the parser in Java it certainly will be longer. Since the BNF form and the Java code mean the same and the Java code is significantly longer we can be sure that the Java code is redundant from the information theory point of view. That is the reason why we have tools for this example domain, like ANTLR, Yacc and Lex and a few other tools.
Another example is the Fluent API. A fluent API can be programmed implementing several interfaces that guide the programmer through the possible sequences of chained method calls. It is a long and hard to maintain way to code a fluent API. The same time a fluent API grammar can be neatly described with a regular expression because fluent APIs are described by finite-state grammars. The regular expression listing the methods describing alternatives, sequences, optional calls, and repetitions is more readable and shorter and less redundant than the Java implementation of the same. That is the reason why we have tools like Java::Geci Fluent API generators that convert a regular expression of method calls to fluent API implementation.
This is an area where decreasing the redundancy can be desirable and may result in easier to maintain and more readable code.
4 Language evolution
Language evolution redundancy is similar to the domain induced redundancy but it is independent of the actual programming domain. The source of this redundancy is a weakness of the programming language. For example, Java does not automatically provide getters and setters for fields. If you look at C# or Swift, they do. If we need them in Java, we have to write the code for it. It is boilerplate code and it is a weakness in the language. Also, in Java, there is no declarative way to define
hashCode() methods. There may be a later version of Java that will provide something for that issue. Looking at past versions of Java it was certainly more redundant to create an anonymous class than writing a lambda expression. Java evolved and this was introduced into the language.
Language evolution is always a sensitive issue. Some languages run fast and introduce new features. Other languages, like Java, are more relaxed or, we can say conservative. As Brian Goetz wrote in response to a tweet that was urging new features:
“It depends. Would you rather get the wrong feature sooner, but have to live with it forever?”
@BrianGoetz Replying to @joonaslehtinen and @java 10:43 PM · Sep 16, 2019
The major difference between domain induced redundancy and language evolution caused redundancy is that while it is impossible to address all programming domains in a general-purpose programming language, the language evolution will certainly eliminate the redundancy enforced by language shortages. While the language evolves we have code generators in the IDEs and in programs like Lombok that address these issues.
5 Programmer induced
This kind of redundancy correlates with the classical meaning of code redundancy. This is when the programmer cannot generate good enough code and there are unnecessary and excessive code structures or even copy-paste code in the program. The typical example is the before mentioned “Legend of the sub-par developer”. In this case, code generation can be a compromise but it is usually a bad choice. On a high level, from the project manager point of view, it may be okay. They care about the cost of the developers and they may decide to hire only cheaper developers. On the programmer level, on the other hand, this is not acceptable. If you have the choice to generate code or write better code you have to choose the latter. You must learn and develop yourself so that you can develop better code.
… or takeaway.
When I first started to write about the Java::Geci framework, somebody commented “why another code generation tool”? And the question is certainly valid. There are many tools like that as mentioned in the article.
However, if we look at the code redundancy categorization then what we can see is that Java::Geci can be used to manage the Domain Induced redundancy and perhaps the Language Evolution caused redundancy. In the case of the latter, there are many concurrent programs, and Java::Geci cannot compete, for example with the ease of use of the IDE built-in code generation.
There are many generators that address some specific domains and manage the extra redundancy using code generation. Java::Geci is the only one to my knowledge that provides a general framework that makes the domain-specific code generator creation simple.
To recognize that the real use case is for domain-specific generators the above redundancy model helps a lot.