Supporting Java 8

Although Java 13 has already been released, a lot of production installations still run on Java 8. As a professional, I still develop Java 8 code quite often, and I have to be happy that it is not Java 6. On the other hand, as an open-source developer, I am at liberty to develop my Java code using Java 11, 12 or even 13 if that pleases me. And it does.

At the same time, I want my code to be used. Tools like License3j or Java::Geci are essentially libraries, and releasing them as Java 11 compatible byte code cuts off all the Java 8 based applications that might otherwise use them.

I want the libraries to be available from Java 8.

One solution is to keep two parallel branches in the Git repo and have a Java 11+ and a Java 8 version of the code. This is what I did for the Java::Geci 1.2.0 release. It is cumbersome, error-prone and a lot of work. I had this code only because my son, who is also a Java developer starting his career, volunteered.

(No, I did not pressure him. He speaks and writes better English than I do, and he regularly reviews these articles, fixing my broken language. If he has a different opinion about the pressure, he is free to insert any note here before the closing parenthesis, and I will not delete or modify it. NOTE: )

Anything above between the NOTE: and ) is his opinion.

The other possibility is to use Jabel.

In this article, I will write about how I used Jabel in the project Java::Geci. The documentation of Jabel is short but complete, and it really works as described for simpler projects. For example, I only had to add a few lines to the pom.xml in the case of the License3j project. For more complex projects that were developed over a year without any thought of Java 8 compatibility, it is a bit more complicated.

About Jabel

Jabel is an open-source project available from https://github.com/bsideup/jabel. If you have Java 9+ project source, you can configure Jabel to be part of the compilation process. It is an annotation processor that hooks into the compilation process and tricks the compiler into accepting Java 9+ language features as if they were available for Java 8. The compiler will work and generate Java 8 byte code. Jabel does not interfere with the byte code generation, so this is as genuine as it can be, fresh and warm out of the Java compiler. It only instructs the compiler not to freak out on Java 9+ features when compiling the code.

How it works and why it can work are well described on the project’s GitHub page. What I wrote above may not even be precise.

Backport issues

When creating Java code using Java 9+ features targeting a Java 8 JVM it is not only the byte code version that we should care about. The code executed using the Java 8 JVM will use the Java 8 version of the JDK and in case we happen to use some classes or methods that are not available there then the code will not run. Therefore we have two tasks:

  • Configure the build to use Jabel to produce Java 8 byte-code
  • Eliminate the JDK calls that are not available in Java 8.

Configure Build

I will not describe here how to configure Jabel to be part of the build using Maven. It is documented on the site and is straightforward.

In the case of Java::Geci I wanted something different. I wanted a Maven project that can be used to create Java 8 as well as Java 11 targets. I wanted this because I wanted Java::Geci to support JPMS just as before and also to create state-of-the-art byte code (class nesting instead of bridge methods for example) for those projects that run on Java 11 or later.

As the first step, I created a profile named JVM8. Jabel is configured to run only when this profile is active.

This profile also sets the release as

<release>8</release>

so the very first time the compiler was freaking out when it saw the module-info.java files. Fortunately, I can exclude files in the POM file in the JVM8 profile. I also excluded javax0/geci/log/LoggerJDK9.java and I will talk about that later.

I also tried to use Maven to automatically configure the version number to have the -JVM8 postfix when it runs with the JVM8 profile, but it was not possible. Maven is a versatile tool that can do many things, and in the case of a simpler project that should be the approach. In the case of Java::Geci I could not do that because Java::Geci is a multi-module project.

Multi-module projects refer to each other. At the very least, each child module references the parent module. The version of a child module may differ from the version of the parent module. This is logical, since their evolution and development are not necessarily tied together. Usually, however, they are. In a project like Java::Geci, which has seven child modules, each with the very same version number as the parent, the child modules can inherit all the parameters, dependencies, compiler options and so on from the parent, but not the version. A child cannot inherit the version because it does not know which parent version to inherit it from. It is a catch-22.

The Java::Geci development goes around this problem using the Jamal preprocessor maintaining the eight pom.xml files. Whenever there is a change in the build configuration it has to be edited in one of the pom.xml.jam files or in one of the included *.jim files and then the command line mvn -f genpom.xml clean will regenerate all the new pom.xml files. This also saves some repetitive code as the preprocessed Jamal files are not so verbose as the corresponding XML files. The price for this is that the macros used have to be maintained.

Java::Geci has a version.jim file that contains the version of the project as a macro. When targeting a Java 8 release then the version in this file has to be changed to x.y.z-JVM8 and the command mvn -f genpom.xml clean has to be executed. Unfortunately, this is a manual step that I may forget. I may also forget to remove the -JVM8 postfix after the Java 8 target was created.

To mitigate the risk of this human error I developed a unit test that checks that the version number is coherent with the compilation profile. It identifies the compilation profile by reading the /javax0/geci/compilation.properties file. This is a resource file in the project, filtered by Maven, and it contains

projectVersion=${project.version}
profile=${profile}

When the test runs the properties are replaced by the actual values as defined in the project. project.version is the project version. The property profile is defined in the two profiles (default and JVM8) to be the name of the profile.

If the version and the profile do not match the test fails. Following the philosophy of Java::Geci, the test does not just order the programmer to fix the “bug” when the test itself can also fix the bug. It modifies the version.jim file so that it contains the correct version. It does not, however, run the pom file generating Jamal macros.
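The coherence check described above could be sketched roughly as follows. This is only a minimal illustration, not the actual Java::Geci test: the class and method names are hypothetical, and it assumes that the filtered properties use the keys projectVersion and profile shown above and that the two profiles are named default and JVM8.

```java
import java.util.Properties;

/** Hypothetical helper; not the actual Java::Geci test code. */
final class VersionProfileCheck {
    private static final String SUFFIX = "-JVM8";

    /** Returns the version string that a build with the given profile should carry. */
    static String expectedVersion(String version, String profile) {
        // Strip an existing suffix first to get the bare x.y.z version.
        String base = version.endsWith(SUFFIX)
                ? version.substring(0, version.length() - SUFFIX.length())
                : version;
        // Only the JVM8 profile gets the postfix; every other profile gets the bare version.
        return "JVM8".equals(profile) ? base + SUFFIX : base;
    }

    /** True when the version and the profile in the filtered properties agree. */
    static boolean isCoherent(Properties filtered) {
        String version = filtered.getProperty("projectVersion");
        String profile = filtered.getProperty("profile");
        return version != null && version.equals(expectedVersion(version, profile));
    }
}
```

A test built on this would load the properties resource, fail when isCoherent returns false, and could use expectedVersion to compute the value to write back into version.jim.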

As a result of this I will get release files with version x.y.z and also x.y.z-JVM8 after the second build with some manual editing work.

Eliminate Java 9+ JDK calls

Simple calls

This is a simple task at first sight. You must not use methods that are not in the Java 8 JDK. We could do anything using Java 8, so it is a task that is certainly possible.

For example every

" ".repeat(tab)

has to be eliminated. To do that I created a class JVM8Tools that contains static methods. For example:

public static String space(int n){
    final StringBuilder sb = new StringBuilder(/*20 spaces*/"                    ");
    while( sb.length() < n){
        sb.append(sb);
    }
    return sb.substring(0,n); // substring() already returns a String
}

is defined there and using this method I can write

space(tab)

instead of the invocation of String::repeat method. This part was easy.

Mimicking getNestHost

What was a bit more difficult was implementing the getNestHost() method. There is no such thing in Java 8, but the selector expressions included in the Tools module of Java::Geci let you use expressions like

Selector.compile("nestHost -> (!null & simpleName ~ /^Map/)").match(Map.Entry.class)

to check that the class Entry is declared inside Map, which it trivially is. It makes sense to use this expression even in a Java 8 environment if someone chooses to do so, and I did not want to perform an amputation by dropping this feature from Java::Geci. It had to be implemented.

The implementation checks the actual run-time and in case the method is there in the JDK then it calls that via reflection. In other cases, it mimics the functionality using the name of the class and trying to find the $ character that separates the inner and the enclosing class name. This may lead to false results in the extremely rare case when there are multiple instances of the same class structures loaded using different class loaders. I think that a tool, like Java::Geci can live with it, it barely happens while executing unit tests.
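The approach described above might look something like the following sketch. It is not the code from Java::Geci; the class and method names are hypothetical. It assumes that everything before the first $ in the binary class name is the name of the outermost class, which is a convention of the compiler and not a guarantee, since $ is also a legal character in ordinary class names.

```java
import java.lang.reflect.Method;

/** Hypothetical helper; the real Java::Geci implementation may differ. */
final class NestHostSketch {

    /** Uses Class#getNestHost when the running JDK has it, otherwise falls back to the name. */
    static Class<?> nestHostOf(Class<?> klass) {
        try {
            // Present on Java 11 and later; looked up reflectively so that this compiles for Java 8.
            Method getNestHost = Class.class.getMethod("getNestHost");
            return (Class<?>) getNestHost.invoke(klass);
        } catch (ReflectiveOperationException e) {
            // Java 8 run-time: the method does not exist.
            return nestHostByName(klass);
        }
    }

    /** Name-based approximation: the class named by the part before the first '$'. */
    static Class<?> nestHostByName(Class<?> klass) {
        String name = klass.getName();
        int dollar = name.indexOf('$');
        if (dollar < 0) {
            return klass; // a top-level class is its own nest host
        }
        try {
            // Load through the same class loader without initializing the class.
            return Class.forName(name.substring(0, dollar), false, klass.getClassLoader());
        } catch (ClassNotFoundException e) {
            return klass; // give up and treat the class as its own host
        }
    }
}
```

Caching the looked-up Method in a static field would be one way to reduce the cost of the reflective call.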

There is also a speed penalty for calling the method Class#getNestHost reflectively. I decided to fix it only if there is real demand.

Logging support

The last issue was logging. Java 9 introduced a logging facade that libraries are strongly recommended to use. Logging is a long-standing problem in the Java environment. The problem is not that there is no logging framework. Quite the opposite. There are too many. There is Apache Commons Logging, Log4j, Logback, and the JDK built-in java.util.logging. A standalone application can select the logging framework it uses, but in case a library uses a different one, it is difficult if not impossible to funnel the different log messages into the same stream.

Java 9 thus introduced a new facade that a library can use to send out its logs, and applications can channel the output through the facade to whatever logging framework they want. Java::Geci uses this facade and provides a logging API for the generators through it. In the case of the JVM8 environment this is not possible. In that case Java::Geci channels the log messages into the standard Java logger. To do that there is a new interface LoggerJDK implemented by two classes, LoggerJVM8 and LoggerJDK9. The source code for the latter is excluded from the compilation in case the target is Java 8.

The actual logger tries to get the javax0.geci.log.LoggerJDK9#factory via reflection. If it is there, then it is possible to use the Java 9 logging. If it is not there, then the logger falls back to the factory javax0.geci.log.LoggerJVM8#factory. That way only the logger factory is called via reflection, which happens only once for every logger. Logging itself is streamlined and uses the target logging without any reflection, thus without speed impediment.
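The factory selection could be sketched along these lines. All names here are hypothetical and the shape of the factory (a public static method named factory that returns a Function from logger name to logger) is an assumption made only for the illustration; the real LoggerJDK, LoggerJDK9 and LoggerJVM8 types in Java::Geci may look different.

```java
import java.util.function.Function;
import java.util.logging.Logger;

/** Minimal logging interface for the sketch (hypothetical). */
interface SketchLogger {
    void info(String message);
}

final class SketchLoggerFactory {

    /** Looks up the preferred factory reflectively once; falls back to java.util.logging. */
    @SuppressWarnings("unchecked")
    static Function<String, SketchLogger> select(String preferredClassName) {
        try {
            Class<?> preferred = Class.forName(preferredClassName);
            // Assumed shape: a public static method factory() that returns the Function.
            return (Function<String, SketchLogger>) preferred.getMethod("factory").invoke(null);
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // The preferred class was excluded from the build or cannot be used here.
            return name -> {
                Logger jul = Logger.getLogger(name);
                return jul::info; // direct call from here on, no reflection per message
            };
        }
    }
}
```

Reflection is used only inside select; the logger it returns calls the underlying framework directly.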

Takeaway

It is possible to support Java 8 in most library projects without unacceptable compromise. We can create two different binaries from the same source that support the two different versions in such a way that the version supporting Java 9 and later does not “suffer” from the old byte code. There are certain compromises. You must avoid calling Java 9+ APIs and, in case there is an absolute need, you have to provide a fall-back, which you can select with reflection-based run-time detection.

Creating a Video Tutorial

I usually write technical articles here. This article is an exception. I do not know if this is a checked exception or not though. I do not even know if this really is an exception or rather an error or just something throwable. (I am just fooling around with the different Java exception types only because I am such a fun guy and also because this is a Java blog, so it SHOULD, as defined in rfc2119, have some words about Java.)

This article is about how I create video tutorials. I have created a few. Not many. The implication of the amount is that what I tell you here is not the ultima ratio. I am almost sure that in many things I am wrong and I am open to criticism. Just be polite: a few people actually read this blog, including the comments.

I created screen video recording as product documentation when I was running my own company ten years ago. I also created some as a training for my current employer, EPAM, and also for this blog and for PACKT. (Yes, this part of the article is a commercial, please go and subscribe and learn Java 9 new features from me listening to Java 9 New Features Deep Dive [Video].)

Length

The length of a video should be 5 to 10 minutes. The shorter the better. I was worried at first about not being able to fill these time frames. But it is easy. I usually struggle with the opposite. Sometimes I can not make the video as short as I would like to.

Presentation

Many times I create a presentation to highlight what I will talk about during the demonstration. This is important. These visuals help the audience get the content and understand what they can expect in the coming five or ten minutes. In other cases, the presentation itself is the main attraction. I usually use Microsoft PowerPoint simply because that is what I have the most experience with and it is available both on Windows and OSX.

Screen Recording

I use OSX and iShowU Instant. I record video in HD format these days and I also use an external monitor attached to my mbp. The recording control is on the built-in display of the mbp, which is a bit higher resolution than HD and the recorded scene is running on the external screen.

I record applications maximized and if possible set to full screen. There is no reason to show the little “minimize, maximize, close” icons or the application frame. This is equally true on OSX, Linux or Windows.

When you do something on the screen do not explain it while doing it. Explain it before and then do it. The reason for that is that this way the keyboard and mouse noise is separated from the talk and can be muted. Also, when you type silently you have the option later while editing the video to speed up the typing. The audience gets bored seeing how the typed letters come up one after the other. You can simply speed it up for a longer typing ten times even. They will see that this is sped up, but that is not a problem unless you want to demonstrate the speed of something.

Voice Recording

I live in a little, peaceful Swiss dorf (village). The road is near and the airplanes landing at Kloten (ZRH) fly just above the house, so the voice recording environment is not ideal, but around 10pm it is acceptable. In my former (Budapest) location, I could not record without noise. So the first thing is that you need a very quiet environment. Perhaps this is the most costly investment, but it also serves other purposes: it boosts your sleep and calms your nerves. Peace is invaluable, world peace… you know.

When you consider the noise, do not only rely on your ears. I have a neighbor who is a professional drum player. Switzerland has strict noise regulation and these guys living here mean it: he is using some special drum set that suppresses the sound a lot. I am 52 and it means my hearing started to slowly decay. I would not have noticed that he is playing the drum sometimes till 11pm (which is strictly illegal, you can do the noisy activity until something like 8pm) unless I started recording. The microphone was recording it and I could hear it in the headset attached to the mic.

I also realized through the headset that the table and the chair are a huge source of noise. PACKT supports content creators (at least they supported me) with some PDFs that give very practical technical advice, and the chair was mentioned there. The table was not. Do not lean on the table when recording. Better yet, do not even touch it.

The second important thing is the microphone. I tried to use the built-in mic of my MacBook Pro, which is exceptionally good for things like Skype, ad-hoc recording, recording a meeting, but not sufficient for tutorial recording. I bought an external microphone for 28CHF but it was not good enough. It was noisy. The one that I finally found is sufficient is a Zoom H2n recorder that also works as a USB microphone.

It stands on my desk on a tripod. I usually put a pillow between the mic and the notebook, so the noise of the vent is damped, and I also moved the external HDD under the table. The pillow idea came from one of the PACKT materials and it is a great one: it works and it is simple. The HDD now stands on the floor on a cork base (originally it was some IKEA cooking thing), which is put on top of a thick cloth folded multiple times. Even though its noise is almost inaudible, I disconnect it when I do the recording. That also prevents a backup from firing off while recording and taking the CPU away from the screen recorder, which itself is not a CPU hog to my surprise, but that is a different story. Here we talk about the noise (sic) recording. Btw: while recording, also disable the network, unless you want to demonstrate something that needs it. You do not want to record notification popups.

While I talk I attach a BOSE Q25 headset directly to the mic and through that I hear my own voice amplified. Because you hear your own voice from inside through your bones when you talk it sounds totally different when you listen to the recording. With the headset, the voice leaving my mouth is amplified and with active noise cancellation, I hear myself more from outside only through the microphone. It helps me to articulate better and also to recognize when my tongue twists.

Talking

I had to realize that I have to talk slowly. I mean really slowly. And as far as I know, most people who record voice are in the same shoes. When you record something, slow down your talk, and when you feel that it is ridiculously slow, it is probably just okay.

When your tongue twists or you just realize that you made a mistake in a sentence: do not correct the part like you would do in a live presentation. Stop. Take a breath. Think. Wait 5 seconds or more. Take your time and restart from the start of the last, erroneous sentence. The 5 seconds helps you to think about where to restart from, but this is also something easy to notice on the waveform when the recorded video is edited. If there is a pause in the voice it probably is something to cut off. I also hit the table with my palm, which makes a noise overloading the microphone and is a clearly visible peak on the waveform. You can also clap your hands or use a whistle. May seem ridiculous first.

Recording face

You may want to record your face while you talk about some slides. This is good for the audience, as it makes your presentation more personal. I use an external webcam for this purpose. Although iShowU Instant can put the video input on the recorded screen as a picture in the picture, I decided to record the video input separately. On OSX I can simultaneously record the screen using iShowU and the video input using Photo Booth. That way both inputs will have the same audio recorded from the same microphone. This helps to put the two videos in sync when editing, and then one of the audio tracks (presumably the one from the presentation, as it is the one less sensitive to slipped audio) can be deleted.

This way it is also possible to put the PIP face at different locations although I do not recommend that you move it a lot around. But it can many times be removed from the screen. For example, if you record slides as well as demo code then you can show your talking head on the slides and hide it when showing code demo on the screen.

When you talk you have to face the camera. It is difficult because you want to talk about a slide that is not in the camera. It is on a screen that is just at the side of the camera. The bad news is that the audience will see that you are not looking into their eyes (which is the camera). You HAVE TO look into the camera.

I was told to look at the slide, read it, then look into the camera, say the text again and cut the first part off during the editing phase. It did not work. What worked was that I built a teleprompter from a cardboard box, picture glass, and black paint. I also bought, for something like $5, a teleprompter application that runs on my iPad and is reflected from the glass, which is set at 45 degrees in front of the webcam. It all stands on a tripod on the table.

Video Background

I use a curtain behind my chair to have an ambient background. There is nothing wrong with a room in the background, but it may cause some problems.

A clock on the wall will show that you recorded the video in several steps. It will jump back and forth and it is distracting for the audience. It is also bad when some background items, chairs, tables, etc. jump between different cuts of the video.

Video Editing

To edit the video I use iMovie. This comes free with the operating system on a mac and has enough functionalities to edit a technical video. Sometimes I feel I lack some features, which are available in professional video editing software products but later I realize that I do not need them.

I value the Ken Burns cropping functionality very much. This was originally invented to show still pictures in a dynamic, moving way in a movie. When doing screen capture I can use this functionality to move the focus to the area of the screen (usually showing the IDE when programming Java) that is important from the demo point of view.

Takeaway

There are many ways of doing tutorial videos, and I cannot tell what will fit your personality, topic, and audience. I wrote down my experience and I hope you can find something useful in it for you.

A New Era for Determining Equivalence in Java?

A few months ago I read a blog post titled “A New Era for Determining Equivalence in Java?” and it was very much in line with what I was developing at that time in my current liebling side project, Java::Geci. I recommend that you pause reading here, read the original article and then return, even though I know that after such advice a sizable percentage of readers will not come back. The article is about how to implement equals() and hashCode() properly in Java, with some food for thought about how it should be, or rather how it should have been. In this article, I will detail these for those who do not read the original article, and I will also add my thoughts: partly how using Java::Geci addresses the problems and, towards the end of my article, how recursive data structures should be handled in equals() and in hashCode(). (Note that the very day I was reading the article I was also polishing the mapper generator to handle recursive data structures. It resonated very much with the problems I was actually fixing.)

Whether you came back, or did not go away at all to read the original article and the referenced JDK letter of Liam Miller-Cushon titled “Equivalence”, here is a short summary, from my point of view, of the most important statements and lessons of those articles:

  • Generating equals() and hashCode() is cumbersome manually.
  • There is support in the JDK since Java 7, but still the code for the methods is there and has to be maintained.
  • IDEs can generate code for these methods, but regenerating them is still not an automated process and executing the regeneration manually is a human-error prone maintenance process. (a.k.a. you forget to run the generator)

The JDK letter from Liam Miller-Cushon titled “Equivalence” lists the typical errors in the implementation of equals() and hashCode(). It is worth reiterating these in a bit more detail. (Some text is quoted verbatim.)

  • “Overriding Object.equals(), but not hashCode(). (The contract for Object.hashCode states that if two objects are equal, then calling the hashCode() method on each of the two objects must produce the same result. Implementing equals() but not hashCode() makes that unlikely to be the case.)” This is a rookie mistake and you may say that you will never commit that. Yes, if you are a senior as a programmer but not yet a senior in your mental capabilities e.g.: forgetting where your dental prostheses are then you will never forget to create hashCode() whenever you create equals(). Note, however, that this is a very short and temporal period in life. Numerous juniors also form the codebase and the lacking hashCode() may always lurk in the deep dark corners of the haystack of the Java code and we have to use all economically viable measures to avoid the non-existence of them.
  • “Equals implementations that unconditionally recurse.” This is a common mistake and even seniors many times ignore this possible error. This is hardly ever a problem because the data structures we use are usually not recursive. When they are recursive the careless recursive implementation of the equals() or hashCode() methods may result in an infinite loop, stack overflow, and other inconvenient things. I will talk about this topic towards the end of the article.
  • “Comparing mismatched pairs of fields or getters, e.g. a == that.a && b == that.a.” This is a topical typing error and it remains unnoticed very easily, like topical -> typical.
  • Equals implementations that throw a NullPointerException when given a null argument. (They should return false instead.)
  • Equals implementations that throw a ClassCastException when given an argument of the wrong type. (They should return false instead.)
  • Implementing equals() by delegating to hashCode(). (Hashes collide frequently, so this will lead to false positives.)
  • Considering state in hashCode() that is not tested in the corresponding equals() method. (Objects that are equal must have the same hashCode().)
  • equals() and hashCode() implementations that use reference equality or hashCode() for array members. (They likely intended value equality and hashCode().)
  • Other bugs (which are out of scope for the proposal): usage errors like comparing two statically different types, or non-local errors with definitions (e.g. overriding equals and changing semantics, breaking substitutability)
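As an illustration of what the list above asks for, here is a minimal hand-written sketch of an equals() and hashCode() pair that avoids the listed mistakes. The class is hypothetical and exists only for this example; it is not generated by any tool mentioned here.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical value class used only to illustrate the checklist. */
final class EquivalenceSample {
    private final String name;
    private final int[] values;

    EquivalenceSample(String name, int[] values) {
        this.name = name;
        this.values = values;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // Returns false for null and for a wrong type instead of throwing.
        if (o == null || getClass() != o.getClass()) return false;
        EquivalenceSample that = (EquivalenceSample) o;
        // Every field is compared with its own counterpart; the array is compared by value.
        return Objects.equals(name, that.name) && Arrays.equals(values, that.values);
    }

    @Override
    public int hashCode() {
        // Uses exactly the state tested in equals(); the array is hashed by content.
        return 31 * Objects.hashCode(name) + Arrays.hashCode(values);
    }
}
```

Even this small class needs care in three places (the type check, the pairing of fields and the array handling), which is the kind of routine detail that is easy to get wrong when the methods are maintained by hand.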

What can we do to avoid these errors? One possibility is to enhance the language, as the mentioned proposal suggests, so that the methods hashCode() and equals() can be described in a declarative way and the actual implementation, which is routine and cumbersome, is done by the compiler. This is a bright future, but we have to wait for it. Java is not famous for incorporating ideas promptly. When something is implemented, it is maintained for eternity in a backward-compatible manner. Therefore the choice is either to implement it fast, possibly in the wrong way, and live with it forever, or to wait until the industry is absolutely sure how it has to be implemented in the language and then, and only then, implement it. Java follows the second way of development.

This is a shortage in the language that comes from language evolution as I described in the article Your Code is Redundant…. A temporal shortage that will be fixed later but as for now, we have to handle this shortage.

One answer to such shortage is code generation and that is where Java::Geci comes into the picture.

Java::Geci is a code generation framework that is very well suited to creating code generators that help reduce code redundancy for domain-specific problems. The code generators run during unit test execution, which may seem a bit late, as the code has already been compiled by then. This is handled by making the code-generating “test” fail if it generated any code; when the compilation and the tests are executed a second time, the test no longer fails.
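The "fail once, pass the second time" behavior can be illustrated with a small sketch of the idiom. This is deliberately not the Java::Geci API; the class and method names are hypothetical and the sketch only shows the principle of rewriting a file when the generated content differs and reporting that a change happened.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical illustration of the idiom; this is not the Java::Geci API. */
final class RegenerateSketch {

    /** Writes the generated text only when it differs; returns true if the file changed. */
    static boolean writeIfChanged(Path target, String generated) {
        byte[] fresh = generated.getBytes(StandardCharsets.UTF_8);
        try {
            if (Files.exists(target) && Arrays.equals(Files.readAllBytes(target), fresh)) {
                return false; // up to date: the "test" can pass
            }
            Files.write(target, fresh);
            return true; // code was (re)generated: the "test" should fail this one time
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A unit test would assert that writeIfChanged returns false. The first run after a change rewrites the source and fails, and the next compile-and-test cycle finds nothing to change and passes.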

Side note: This way of working may also be very familiar to any software developer: let’s run it again, it may work!

In the case of programming language evolution shortages Java::Geci is just as good, from the technical point of view. There is no technical difference between code generation for domain-specific reasons and code generation for language evolution shortage reasons. In the case of language evolution issues, however, it is likely that you will find other code generation tools that also solve the issue. To generate equals() and hashCode() you can use the integrated development environment. There can be nothing simpler than selecting a menu in the IDE and clicking “generate equals and hashCode”.

This solves all but one of the above problems, assuming that the generated code is well-behaved. The one remaining problem is that whenever the code is updated, the IDE will not run the code generator again to update the generated code. This is where IDEs can hardly compete with Java::Geci. It takes more steps to set up the Java::Geci framework than just clicking a few menu items. You need the test dependency, you have to create a unit test method and you have to annotate the class that needs the generator or, as an alternative, you have to insert an editor-fold block into the code that will contain the generated code. However, after that, you can forget the generator and you do not need to worry about any of the developers in your team forgetting to regenerate the equals() or hashCode() method.

Takeaway

  • Having the proper equals() and hashCode() methods for a class is not as simple as it seems. Writing them manually is hardly ever the best approach.

  • Use some tool that generates them and ensure that the generated code and the code generation do not exhibit any of the above common mistakes.

  • If you just need it Q&D then use the IDE menu and generate the methods. On the other hand, if you have a larger codebase, with many developers working on it and it is possible that the code generation may need re-execution then use a tool that automates the execution of the code generation. Example: Java::Geci.

  • Use the newest possible version of the tools, like Java, so that you do not lag behind available technology.

Java Record

JEP 359 (https://openjdk.java.net/jeps/359) outlines a new Java feature that may/will be implemented in some future version of Java. The JEP suggests having a new type of “class”: record. The sample in the JEP reads as follows:

record Range(int lo, int hi) {
  public Range {
    if (lo > hi)  /* referring here to the implicit constructor parameters */
      throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
  }
}

Essentially a record will be a class that intends to have only final fields that are set in the constructor. The JEP as of today also allows any other members that a class has, but essentially a record is a record: pure data and perhaps no functionality at its core. The description of a record is short and to the point, and it eliminates a lot of the boilerplate that we would need to code such a class in Java 13 or earlier, or in whichever version precedes the implementation of records. The above code written in conventional Java looks like the following:

public class Range {

    final int lo;
    final int hi;

    public Range(int lo, int hi) {
        if (lo > hi)  /* referring here to the explicit constructor parameters */
            throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
        this.lo = lo;
        this.hi = hi;
    }
}

Considering my Java::Geci code generation project this was something that was screaming for a code generator to bridge the gap between today and the day when the new feature will be available on all production platforms.

Thus I started to think about how to develop this generator and I faced a few issues. The Java::Geci framework can only convert a compilable project to another compilable project. It cannot work like some other code generators that convert incomplete source code, which cannot be compiled without the modifications of the code generator, into a complete version. This is because Java::Geci works during the test phase. To get to the test phase the code has to compile first. This is a well-known trade-off and was a design decision. In most of the cases when Java::Geci is useful this is easy to cope with. On the other hand, we gain the advantage that the generators do not need configuration management like reading and interpreting property or XML files. They only provide an API, and the code invoking them from a test configures the generators through it. The biggest advantage is that you can even provide call-backs in the form of method references, lambdas or object instances that are invoked by the generators, so that these generators can have a totally open structure in some aspects of their working.

Why is this important in this case? The record generation is fairly simple and does not need any complex configuration; as a matter of fact, it does not need any configuration at all. On the other hand, the compilable -> compilable restriction affects it. If you start to create a record using, say, Java 8 and Java::Geci, then your manual code will look something like this:

@Geci("record")
public class Range {

    final int lo;
    final int hi;
}

This does not compile, because at the time of the first compilation, before the code generation starts, the class has only the default constructor, which does not initialize the fields. Therefore the fields cannot be final:

@Geci("record")
public class Range {

    int lo;
    int hi;
}

Running the generator we will get

package javax0.geci.tests.record;

import javax0.geci.annotations.Geci;

@Geci("record")
public final class Range {
    final  int  lo;
    final  int  hi;

    //<editor-fold id="record">
    public Range(final int lo, final int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    public int getLo() {
        return lo;
    }

    public int getHi() {
        return hi;
    }

    @Override
    public int hashCode() {
        return java.util.Objects.hash(lo, hi);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Range that = (Range) o;
        return java.util.Objects.equals(that.lo, lo) && java.util.Objects.equals(that.hi, hi);
    }
    //</editor-fold>
}

What this generator actually does is the following:

  • it generates the constructor
  • converts the class and the fields to final as it is a requirement by the JEP
  • generates the getters for the fields
  • generates the equals() and hashCode() methods for the class
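To see what the generated members buy us, here is a minimal, self-contained sketch. The Range class is adapted from the generated output above (the package, the @Geci annotation and the public modifiers are dropped so that it compiles on its own without the Java::Geci dependency). The RangeDemo class is a hypothetical name used only for this illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Adapted from the generated class above; no Java::Geci dependency needed.
final class Range {
    final int lo;
    final int hi;

    Range(final int lo, final int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    int getLo() {
        return lo;
    }

    int getHi() {
        return hi;
    }

    @Override
    public int hashCode() {
        return Objects.hash(lo, hi);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Range that = (Range) o;
        return Objects.equals(that.lo, lo) && Objects.equals(that.hi, hi);
    }
}

// Hypothetical demo class, only for illustration.
class RangeDemo {
    public static void main(String[] args) {
        Set<Range> ranges = new HashSet<>();
        ranges.add(new Range(1, 5));
        ranges.add(new Range(1, 5)); // equal to the first one, so the set ignores it
        ranges.add(new Range(2, 7));
        System.out.println(ranges.size()); // prints 2
    }
}
```

Because equals() and hashCode() are generated from the field list, two Range objects with the same bounds behave as the same value in hash-based collections.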

If the class has a void method that has the same (though case insensitive) name as the class, for example:

    public void Range(double hi, long lo) {
        if (lo > hi)  /* referring here to the implicit constructor parameters */
            throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
    }

then the generator will

  • invoke that method from the generated constructor,
  • modify the argument list of the method to match the current list of fields.

The result will be:

    public void Range(final int lo, final int hi) {
        if (lo > hi)  /* referring here to the implicit constructor parameters */
            throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
    }

    //<editor-fold id="record">
    public Range(final int lo, final int hi) {
        Range(lo, hi);
        this.lo = lo;
        this.hi = hi;
    }

Note that this generation approach tries to behave as closely as possible to the actual record as proposed in the JEP and generates code that can be converted to the new syntax as soon as it is available. This is the reason why the validator method has to have the same name as the class. When converting to a real record, all that has to be done is to remove the void keyword, which converts the method into a constructor, remove the argument list, as it will be implicit as defined in the JEP, and remove all the generated code between the editor folds (which were also generated automatically when the generator was first executed).
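The following sketch puts the validator method and the generated constructor together in one compilable Java 8 class and shows that the check really runs. The Range class follows the listings above with the editor-fold comments and the getters left out. ValidatorDemo is a hypothetical name used only here.

```java
// The hand-written validator plus the generated constructor, as in the listings above.
final class Range {
    final int lo;
    final int hi;

    // Validator: a void method that has the same name as the class.
    // This is legal Java, although unusual.
    public void Range(final int lo, final int hi) {
        if (lo > hi)
            throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
    }

    public Range(final int lo, final int hi) {
        Range(lo, hi); // invokes the void method above, not the constructor
        this.lo = lo;
        this.hi = hi;
    }
}

// Hypothetical demo class, only for illustration.
class ValidatorDemo {
    public static void main(String[] args) {
        Range ok = new Range(1, 5);
        System.out.println(ok.lo + ".." + ok.hi); // prints 1..5
        try {
            new Range(5, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected " + e.getMessage()); // prints rejected (5,1)
        }
    }
}
```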

The modification of the manually entered code is a new feature of Java::Geci that was triggered by the need of the Record generator and was developed to overcome the shortcomings of the compilable -> compilable restriction. How a generator can use this feature, which will be available in the next 1.3.0 release of Java::Geci, will be detailed in a subsequent article.

Takeaway

The takeaway of this article is that you can use Java records with Java 8, 9, … even before they become available.

Handling repeated code automatically

In this article I will describe how you can use Java::Geci generator Repeated to overcome the Java language shortage that generics cannot be primitive. The example is a suggested extension of the Apache Commons Lang library.

Introduction

When you copy-paste code you do something wrong. At least that is the perception. You have to create your code structure more generalized so that you can use different parameters instead of similar code many times.

This is not always the case. Sometimes you have to repeat some code because the language you use does not (yet) support the functionality that would be required for the problem.

This is too abstract. Let’s have a look at a specific example and how we can manage it using the Repeated source generator, which runs inside the Java::Geci framework.

The problem

The class org.apache.commons.lang3.Functions in the Apache Commons Lang library defines an inner interface FailableFunction. This is a generic interface defined as

    @FunctionalInterface
    public interface FailableFunction<I, O, T extends Throwable> {
        /**
         * Apply the function.
         * @param pInput the input for the function
         * @return the result of the function
         * @throws T if the function fails
         */
        O apply(I pInput) throws T;
    }

This is essentially the same as Function<I,O>, which converts an I to an O but since the interface is failable, it can also throw an exception of type T.
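A short sketch may help to see what "failable" means in practice. The interface is repeated here only to keep the example self-contained; the FailableDemo class and its LENGTH constant are hypothetical names and not part of Commons Lang.

```java
import java.io.IOException;

// Same shape as the interface quoted above, repeated so the sketch compiles on its own.
@FunctionalInterface
interface FailableFunction<I, O, T extends Throwable> {
    O apply(I pInput) throws T;
}

// Hypothetical demo class, not part of Commons Lang.
class FailableDemo {
    // The lambda may throw a checked exception.
    // A java.util.function.Function<String, Integer> could not declare that.
    static final FailableFunction<String, Integer, IOException> LENGTH = s -> {
        if (s == null) {
            throw new IOException("no input");
        }
        return s.length();
    };

    public static void main(String[] args) throws IOException {
        System.out.println(LENGTH.apply("abc")); // prints 3
    }
}
```

The caller has to handle or declare IOException, exactly as it would for an ordinary method that declares it.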

The new need is to have

    public interface Failable<I>Function<O, T extends Throwable> 

interfaces for each <I> primitive type. The problem is that generics cannot be primitive (yet) in Java, and thus we need separate interfaces for each primitive type, as

    @FunctionalInterface
    public interface FailableCharFunction<O, T extends Throwable> {
        O apply(char pInput) throws T;
    }
    @FunctionalInterface
    public interface FailableByteFunction<O, T extends Throwable> {
        O apply(byte pInput) throws T;
    }
    @FunctionalInterface
    public interface FailableShortFunction<O, T extends Throwable> {
        O apply(short pInput) throws T;
    }
    @FunctionalInterface
    public interface FailableIntFunction<O, T extends Throwable> {
        O apply(int pInput) throws T;
    }
... and so on ...

These are a lot of very similar interfaces that could easily be described by a template and then be generated by some code generation tool.

Template handling using Java::Geci

The Java::Geci framework comes with many off-the-shelf generators. One of them is the powerful Repeated generator, which is exactly for this purpose. If there is a code that has to be repeated with possible parameters then you can define a template, the values and Repeated will generate the code resolving the template parameters.

Adding dependency to the POM

The first thing we have to do is to add the Java::Geci dependencies to the pom.xml file. Since Apache Commons Lang is still Java 8 based we have to use the Java 8 backport of Java::Geci 1.2.0:

    <dependency>
      <groupId>com.javax1.geci</groupId>
      <artifactId>javageci-core</artifactId>
      <version>1.2.0</version>
      <scope>test</scope>
    </dependency>

Note that the scope of the dependency is test. The generator Repeated can conveniently be used without any Geci annotations that remain in the byte code and thus are compile-time dependencies. As a matter of fact, all of the generators can be used without annotations thus without any compile dependencies that would be an extra dependency for the production. In the case of Repeated this is even easy to do.

Unit test to run the generator

The second thing we have to do is to create a unit test that will execute the generator. Java::Geci generators run during the unit test phase, so they can access the already compiled code using reflection as well as the actual source code. In case there is any code generated that is different from what was already there in the source file the test will fail and the build process should be executed again. Since generators are (should be) idempotent the test should not fail the second time.

In my experience, this workflow unfortunately has an effect on developer behavior. Run the test, it fails, run it again! It is a bad cycle. Sometimes I catch myself re-executing the unit tests when it was not a code generator that failed. However, this is how Java::Geci works.

There are articles about the Java::Geci workflow, so I will not repeat here the overall architecture and how its workflow goes.

The unit test will be the following:

    @Test
    void generatePrimitiveFailables() throws Exception {
        final Geci geci = new Geci();
        Assertions.assertFalse(geci.source(Source.maven().mainSource())
                .only("Functions")
                .register(Repeated.builder()
                    .values("char,byte,short,int,long,float,double,boolean")
                    .selector("repeated")
                    .define((ctx, s) -> ctx.segment().param("Value", CaseTools.ucase(s)))
                    .build())
                .generate(),
            geci.failed());
    }

The calls source(), register() and only() configure the framework. This configuration tells the framework to use the source files that are in the main Java source directory of the project and to use only the files named "Functions". The call to register() registers the Repeated generator instance right before we call generate(), which starts the code generation.

The generator instance itself is created using the built-in builder that lets us configure the generator. In this case, the call to values() defines the comma-separated list of values with which we want to repeat the template (defined later in the code in a comment). The call to selector() defines the identifier for this repeated code. A single source file may contain several templates. Each template can be processed with a different list of values and the result will be inserted into different output segments in the source file. In this case there is only one such code generation template; still, it has to be identified with a name, and this name also has to be used in the editor-fold section, which is the placeholder for the generated code.

The actual use of the name of the generator has two effects. One is that it identifies the editor fold segment and the template. The other one is that the framework will see the editor-fold segment with this identifier and it will recognize that this source file needs the attention of this generator. The other possibility would be to add the @Repeated or @Geci("repeated") annotation to the class.

If the identifier were something else and not repeated then the source code would not be touched by the generator Repeated or we would need another segment identified as repeated, which would not actually be used other than trigger the generator.

The call to define() defines a BiConsumer that gets a context reference and an actual value. In this case, the BiConsumer calculates the capitalized value and puts it into the actual segment parameter set associated with the name Value. The actual value is associated with the name value by default and the BiConsumer passed to the method define() can define and register other parameters. In this case, it will add new values as

value       Value

char    --> Char
byte    --> Byte
short   --> Short
int     --> Int
long    --> Long
float   --> Float
double  --> Double
boolean --> Boolean

Source Code

The third thing is to prepare the template and the output segment in the source file.

The output segment preparation is extremely simple. It is only an editor fold:

    //<editor-fold id="repeated">
    //</editor-fold>

The generated code will automatically be inserted between the two lines and the editors (Eclipse, IntelliJ or NetBeans) will allow you to close the fold. You do not want to edit this code: it is generated.

The template will look like the following:

    /* TEMPLATE repeated
    @FunctionalInterface
    public interface Failable{{Value}}Function<O, T extends Throwable> {
        O apply({{value}} pInput) throws T;
    }
    */

The code generator finds the start of the template by looking for lines that match the /* TEMPLATE name format and collects the consecutive lines till the end of the comment.

The template uses the mustache template placeholder format, namely the name of the values enclosed between double braces. Double braces are rare in Java.
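To make the placeholder resolution tangible, here is a minimal sketch of what expanding such a template amounts to. It is not the implementation of the Repeated generator: TemplateSketch, ucase() and expand() are hypothetical names, and the sketch uses plain string replacement instead of a real mustache engine.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a template expansion; not the Repeated implementation.
class TemplateSketch {
    static final String TEMPLATE =
        "@FunctionalInterface\n"
        + "public interface Failable{{Value}}Function<O, T extends Throwable> {\n"
        + "    O apply({{value}} pInput) throws T;\n"
        + "}\n";

    // Upper-case the first character: "int" becomes "Int".
    static String ucase(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Resolve both placeholders for a single value.
    static String expand(String value) {
        return TEMPLATE
            .replace("{{Value}}", ucase(value))
            .replace("{{value}}", value);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("char", "byte", "short", "int");
        StringBuilder out = new StringBuilder();
        for (String value : values) {
            out.append(expand(value));
        }
        System.out.print(out); // prints one interface declaration per value
    }
}
```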

When we run the unit test it will generate the code that I already listed at the start of the article. (And after that it will fail of course: source code was modified, compile it again.)

Summary and Takeaway

The most important takeaway and WARNING: source code generation is a tool that aims to amend shortages of the programming language. Do not use code generations to amend a shortage that is not of the language but rather your experience, skill or knowledge about the language. The easy way to code generation is not an excuse to generate unnecessarily redundant code.

Another takeaway is that it is extremely easy to use this generator in Java. The functionality is comparable to the C preprocessor, which Java does not have, and for good reason. Use it when it is needed. Even though the setup of the dependencies and the unit test may be a small overhead, later the maintainability usually pays this cost back.

Your code is redundant, live with it!

This article is about necessary and unavoidable code redundancy and discusses a model of code redundancy that helps to understand why source code generators do what they do, why they are needed at all.

Intro

The code you write in Java, or for that matter in any other language, is redundant. Not by the definition that says (per Wikipedia page https://en.wikipedia.org/wiki/Redundant_code):

In computer programming, redundant code is source code or compiled code in a computer program that is unnecessary, such as…

Your code may also be redundant this way, but that is a different kind of story than the one I want to talk about here and now. If it is, then fix it, and improve your coding skills. But this probably is not the case because you are a good programmer. The redundancy that is certainly in your code is not necessarily unnecessary. There are different sources of redundancy and some redundancies are necessary, others are unnecessary but unavoidable.

The actual definition of redundancy we need, in this case, is more like the information theory definition of redundancy (per the Wikipedia page https://en.wikipedia.org/wiki/Redundancy_(information_theory))

In Information theory, redundancy measures the fractional difference between the entropy H(X) of an ensemble X, and its maximum possible value log(|A_X|)

UPPPS… DO NOT STOP READING!!!

This is a very precise, but highly unusable definition for us. Luckily the page continues and says:

Informally, it is the amount of wasted “space” used to transmit certain data. Data compression is a way to reduce or eliminate unwanted redundancy.

In other words, some information encoded in some form is redundant if it can be compressed.

For example, downloading and zipping the text of the classical English novel Moby Dick will shrink its size down to 40% of the original text. Doing the same with the source code of Apache Commons Lang we get 20%. It is definitely NOT because of this “code in a computer program that is unnecessary”. This is some other “necessary” redundancy. English and other languages are redundant, programming languages are redundant and this is the way it is.
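If you want to try this kind of measurement yourself, the following sketch compresses a string in memory with java.util.zip.Deflater and reports the compressed size relative to the original. It is only an illustration: RedundancySketch and compressionRatio() are hypothetical names, the sample text is artificial, and the 40% and 20% figures above come from zipping the real files, not from this code.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper to estimate redundancy through compression.
class RedundancySketch {
    // Compressed size divided by original size; a smaller value means more redundancy.
    // The text must not be empty.
    static double compressionRatio(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        long compressed = 0;
        while (!deflater.finished()) {
            compressed += deflater.deflate(buffer); // count the bytes, discard the data
        }
        deflater.end();
        return (double) compressed / input.length;
    }

    public static void main(String[] args) {
        // Artificial, highly repetitive "source code".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("public int getValue() { return value; }\n");
        }
        System.out.println("ratio: " + compressionRatio(sb.toString()));
    }
}
```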

If we analyze this kind of redundancy we can see that there are six levels of redundancy. What I will write here about the six layers is not well-known or well-established theory. Feel free to challenge it.

This model and categorization are useful to establish a way of thinking about code generation when to generate code, why to generate code. After all, I came up with this model when I was thinking about the Java::Geci framework and I was thinking about why I invested a year of hobby time into this when there are so many other code generation tools. This redundancy model kind of gives the correct reason that I was only feeling before.

Levels of Redundancy

Then the next question is if these (English and programming language) are the only reasons for redundancy. The answer is that we can identify six different levels of redundancy including those already mentioned.

0 Natural

This is the redundancy of English or just any other natural language. This redundancy is natural and we got used to it. The redundancy evolved with the language and it was needed to help understanding in a noisy environment. We do not want to eliminate this redundancy, because if we did we might end up reading some binary code. For most of us, this is not really appealing. This is how the human and the programmer brain works.

1 Language

The programming language is also redundant. It is even more redundant than the natural language it is built on. The extra redundancy is because the number of keywords is very limited. That raises the compression ratio from 60% up to 80% in the case of Java. Other languages, like Perl, are denser and, alas, they are less readable. However, this is also a redundancy that we do not want to fight. Decreasing the redundancy that comes from the programming language would certainly decrease readability and thus maintainability.

2 Structural

There is another source of redundancy that is already independent of the language. This is the code structure redundancy. For example when we have a method that has one argument then the code fragments that call this method should also use one argument. If the method changes for more arguments then all the places that call the method also have to change. This is a redundancy that comes from the program structure and this is not only something that we do not want to avoid, but it is also not possible to avoid without losing information and that way code structure.

3 Domain induced

We talk about domain induced redundancy when the business domain can be described in a clear and concise manner but the programming language does not support such a description. A good example can be a compiler. This example is in a technical domain that most programmers are familiar with. A context-free syntax grammar can be written in a clear and nice form using BNF format. If we create the parser in Java it certainly will be longer. Since the BNF form and the Java code mean the same and the Java code is significantly longer we can be sure that the Java code is redundant from the information theory point of view. That is the reason why we have tools for this example domain, like ANTLR, Yacc and Lex and a few other tools.

Another example is the fluent API. A fluent API can be programmed by implementing several interfaces that guide the programmer through the possible sequences of chained method calls. It is a long and hard-to-maintain way to code a fluent API. At the same time, a fluent API grammar can be neatly described with a regular expression, because fluent APIs are described by finite-state grammars. The regular expression listing the methods and describing alternatives, sequences, optional calls, and repetitions is more readable, shorter and less redundant than the Java implementation of the same. That is the reason why we have tools like the Java::Geci Fluent API generators that convert a regular expression of method calls to a fluent API implementation.
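As an illustration of how much Java is needed even for a tiny grammar, here is a hand-written sketch of a fluent API for the call sequence "from to+ send". All names (Mail, NeedsFrom, NeedsTo, CanSend) are hypothetical, and this is not code generated by Java::Geci.

```java
// Hypothetical fluent API for the regular expression over method names: from to+ send
interface NeedsFrom {
    NeedsTo from(String sender);
}

interface NeedsTo {
    CanSend to(String recipient);
}

// After at least one to() we may either add another recipient or send.
interface CanSend extends NeedsTo {
    String send();
}

class Mail implements NeedsFrom, CanSend {
    private final StringBuilder text = new StringBuilder();

    // Entry point: the only thing the caller can do first is call from().
    static NeedsFrom mail() {
        return new Mail();
    }

    @Override
    public NeedsTo from(String sender) {
        text.append("from ").append(sender);
        return this;
    }

    @Override
    public CanSend to(String recipient) {
        text.append(" to ").append(recipient);
        return this;
    }

    @Override
    public String send() {
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(Mail.mail().from("a").to("b").to("c").send()); // prints from a to b to c
        // Mail.mail().from("a").send() would not compile: NeedsTo has no send() method.
    }
}
```

Three interfaces and a class encode what the five-character expression "from to+ send" already says, which is the domain induced redundancy described above.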

This is an area where decreasing the redundancy can be desirable and may result in easier to maintain and more readable code.

4 Language evolution

Language evolution redundancy is similar to the domain induced redundancy but it is independent of the actual programming domain. The source of this redundancy is a weakness of the programming language. For example, Java does not automatically provide getters and setters for fields. If you look at C# or Swift, they do. If we need them in Java, we have to write the code for it. It is boilerplate code and it is a weakness in the language. Also, in Java, there is no declarative way to define equals() and hashCode() methods. There may be a later version of Java that will provide something for that issue. Looking at past versions of Java it was certainly more redundant to create an anonymous class than writing a lambda expression. Java evolved and this was introduced into the language.
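The anonymous class versus lambda case can be shown side by side. The class and constant names below are made up for the illustration; both comparators do the same thing.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class comparing the two notations.
class LambdaVsAnonymous {
    // Before Java 8: an anonymous class, several lines to say "compare by length".
    static final Comparator<String> BY_LENGTH_OLD = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    };

    // Since Java 8: the same comparator as a lambda expression.
    static final Comparator<String> BY_LENGTH_NEW =
        (a, b) -> Integer.compare(a.length(), b.length());

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ccc", "a", "bb");
        words.sort(BY_LENGTH_OLD);
        System.out.println(words); // prints [a, bb, ccc]

        List<String> more = Arrays.asList("ccc", "a", "bb");
        more.sort(BY_LENGTH_NEW);
        System.out.println(more); // prints [a, bb, ccc]
    }
}
```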

Language evolution is always a sensitive issue. Some languages run fast and introduce new features. Other languages, like Java, are more relaxed or, we can say conservative. As Brian Goetz wrote in response to a tweet that was urging new features:

“It depends. Would you rather get the wrong feature sooner, but have to live with it forever?”

@BrianGoetz Replying to @joonaslehtinen and @java 10:43 PM · Sep 16, 2019

The major difference between domain induced redundancy and language evolution caused redundancy is that while it is impossible to address all programming domains in a general-purpose programming language, the language evolution will certainly eliminate the redundancy enforced by language shortages. While the language evolves we have code generators in the IDEs and in programs like Lombok that address these issues.

5 Programmer induced

This kind of redundancy correlates with the classical meaning of code redundancy. This is when the programmer cannot generate good enough code and there are unnecessary and excessive code structures or even copy-paste code in the program. The typical example is the before mentioned “Legend of the sub-par developer”. In this case, code generation can be a compromise but it is usually a bad choice. On a high level, from the project manager point of view, it may be okay. They care about the cost of the developers and they may decide to hire only cheaper developers. On the programmer level, on the other hand, this is not acceptable. If you have the choice to generate code or write better code you have to choose the latter. You must learn and develop yourself so that you can develop better code.

Outro

… or takeaway.

When I first started to write about the Java::Geci framework, somebody commented “why another code generation tool”? And the question is certainly valid. There are many tools like that as mentioned in the article.

However, if we look at the code redundancy categorization then what we can see is that Java::Geci can be used to manage the Domain Induced redundancy and perhaps the Language Evolution caused redundancy. In the case of the latter, there are many competing programs, and Java::Geci cannot compete, for example, with the ease of use of the IDE built-in code generation.

There are many generators that address some specific domains and manage the extra redundancy using code generation. Java::Geci is the only one to my knowledge that provides a general framework that makes the domain-specific code generator creation simple.

To recognize that the real use case is for domain-specific generators the above redundancy model helps a lot.

Test coverage decreased and it is good (short, read)

The synchronicity concept of Carl Gustav Jung says that events are “meaningful coincidences” if they occur with no causal relationship yet seem to be meaningfully related. Such a thing happened to me recently related to some pull requests. I was working on a FOSS project and I created a pull request that was refused by the CI server with the reason that a pull request that decreases the test coverage level cannot be merged. I knew why the code coverage percentage decreased and I knew that this not only was not bad but was actually good. I could convince the maintainers to skip this condition in this case. A few days later a junior developer told me that his pull request was refused in a totally unrelated project for the same reason. He explained to the lead developers why it was OK to decrease the code coverage, but in the end, they asked him to create some new tests. He is junior. The same thing happening on two consecutive days made me feel that this may be meaningful and perhaps worth writing about.

But how can that happen that the code coverage decreases and it is good?

Assume that you have a simple program, that has 100 LOC (lines of code). 50 LOC are covered by tests and the other 50 LOC are not. The code coverage is 50%.

You modify the code and refactor a method, which was originally 20 LOC, 100% covered by tests, and the result is 10 LOC, 100% covered by the original tests. It is just that the old code was badly designed and redundant (level 5, Programmer induced redundancy). Now the coverage is 100% * 40/90 = 44.44%.
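The arithmetic can be checked with a few lines of code. CoverageSketch and coverage() are hypothetical names used only for this calculation.

```java
import java.util.Locale;

// Hypothetical helper for the coverage arithmetic of the example.
class CoverageSketch {
    // Coverage in percent: covered lines divided by all lines.
    static double coverage(int coveredLines, int totalLines) {
        return 100.0 * coveredLines / totalLines;
    }

    public static void main(String[] args) {
        // Before the refactoring: 100 LOC, 50 of them covered.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", coverage(50, 100))); // prints 50.00%

        // After: a fully covered 20-line method shrinks to 10 lines,
        // so both the covered and the total line counts lose 10 lines.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", coverage(50 - 10, 100 - 10))); // prints 44.44%
    }
}
```

The uncovered 50 lines did not change at all; the percentage dropped only because the covered part became smaller.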

Is this a problem? The sheer number 44.44% by itself is actually a problem, just as well as the 50% before the refactoring was a problem. However, the fact that the code was made simpler and shorter and because of that the coverage decreased is definitely not a problem.

Should you delete this rule from the CI server build process, namely that a pull request must not decrease the relative test code coverage? Certainly not. There are many more cases when a lazy or just not careful enough developer misses some tests than the case that I described above. The decreasing coverage is a good indicator that the pull request may not be of superb quality. There are exceptions though and those have to be handled.

Should you command a junior in case of such a pull request to write some more tests that increase the coverage although totally unrelated to the actual change? I do not know. I certainly would not do that. I would accept the pull request making an exception and then I would ask the junior to create some more tests if that is needed. But these are totally unrelated. On second thought though, it may be a good idea to refuse the pull request. After all, juniors have to be educated not only about coding and programming but also about how the real-life with real-jerk seniors works.

Tools to keep JavaDoc up-to-date

There are many projects where the documentation is not up-to-date. It is easy to forget to change the documentation after the code was changed. The reason is fairly understandable. There is a change in the code, then debug, then hopefully a change in the tests (or the other way around, in the reverse order, if you are more TDD), and then the joy of a new functioning version and the happiness about the new release make you forget to perform the cumbersome task of updating the documentation.

In this article, I will show an example, how to ease the process and ensure that the documentation is at least more up-to-date.

The tool

The tool I use in this article is Java::Geci, which is a code generation framework. The original design aim of Java::Geci is to provide a framework in which it is extremely easy to write code generators that inject code into already existing Java source code or generate new Java source files. Hence the name: GEnerate Code Inline or GEnerate Code, Inject.

What does a code generation support tool do when we talk about documentation?

On the highest level of the framework, the source code is just a text file. Documentation, like JavaDoc, is text. Documentation in the source directory structure, like markdown files, is text. Copying and transforming parts of the text to other location is a special form of code generation. This is exactly what we will do.

Two uses for documentation

There are several ways Java::Geci supports documentation. I will describe one of these in this article.

The way is to locate some lines in the unit tests and copy the content after possible transformation into the JavaDoc. I will demonstrate this using a sample from the apache.commons.lang project current master version after release 3.9. This project is fairly well documented although there is room for improvement. This improvement has to be performed with as little human effort as possible. (Not because we are lazy, but rather because the human effort is error-prone.)

It is important to understand that Java::Geci is not a preprocessing tool. The code gets into the actual source code and it gets updated. Java::Geci does not eliminate the redundancy of copy-paste code and text. It manages it and ensures that the code remains copied and created over and over again whenever something inducing change in the result happens.

How Java::Geci works in general

If you have already heard about Java::Geci you can skip this chapter. For the others here is the brief structure of the framework.

Java::Geci generates code when the unit tests run. Java::Geci actually runs as one or more unit tests. There is a fluent API to configure the framework. This essentially means that a unit test that runs generators is one single assertion statement that creates a new Geci object, calls the configuration methods and then calls generate(). This method, generate(), returns true when it has generated something. If all the code it generated is exactly the same as what was already in the source files then it returns false. Wrapping the call in Assertions.assertFalse will fail the test in case there was any change in the source code. In that case, just run the compilation and the tests again.

The framework collects all the files that were configured to be collected and invokes the configured and registered code generators. The code generators work with abstract Source and Segment objects that represent the source files and the lines in the source files that may be overwritten by generated code. When all the generators have finished their work the framework collects all segments, inserts them into Source objects and if any of them significantly changed then it updates the file.

Finally, the framework returns to the unit test code that started it. The return value is true if there was any source code file updated and false otherwise.
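The "update only when something changed, and report it" idea can be sketched in a few lines. This is not Java::Geci code: WriteIfChanged and update() are hypothetical names, and the real framework works with Source and Segment objects and a configurable comparison rather than with whole-file strings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating "write only when changed"; not Java::Geci code.
class WriteIfChanged {
    // Returns true when the file had to be written, false when it was already up to date.
    static boolean update(Path file, String generated) throws IOException {
        if (Files.exists(file)) {
            String current = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (current.equals(generated)) {
                return false; // nothing changed, leave the file alone
            }
        }
        Files.write(file, generated.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("geci-sketch", ".java");
        System.out.println(update(file, "class A {}\n")); // prints true, the file was written
        System.out.println(update(file, "class A {}\n")); // prints false, already up to date
        Files.delete(file);
    }
}
```

A test that asserts the return value to be false fails on the first run and passes on the second, which mirrors the run, fail, run again cycle of the framework.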

Examples into JavaDoc

The JavaDoc example is to automatically include examples into the documentation of the method org.apache.commons.lang3.ClassUtils.getAbbreviatedName() in the Apache Commons Lang3 library. The documentation currently in the master branch is:

/**
 * <p>Gets the abbreviated class name from a {@code String}.</p>
 *
 * <p>The string passed in is assumed to be a class name - it is not checked.</p>
 *
 * <p>The abbreviation algorithm will shorten the class name, usually without
 * significant loss of meaning.</p>
 * <p>The abbreviated class name will always include the complete package hierarchy.
 * If enough space is available, rightmost sub-packages will be displayed in full
 * length.</p>
 *
 * <table><caption>Examples</caption>
 * <tr><td>className</td><td>len</td><td>return</td></tr>
 * <tr><td>null</td><td>1</td><td>""</td></tr>
 * <tr><td>"java.lang.String"</td><td>5</td><td>"j.l.String"</td></tr>
 * <tr><td>"java.lang.String"</td><td>15</td><td>"j.lang.String"</td></tr>
 * <tr><td>"java.lang.String"</td><td>30</td><td>"java.lang.String"</td></tr>
 * </table>
 * @param className the className to get the abbreviated name for, may be {@code null}
 * @param len the desired length of the abbreviated name
 * @return the abbreviated name or an empty string
 * @throws IllegalArgumentException if len <= 0
 * @since 3.4
 */

The problem we want to solve is to automate the maintenance of the examples. To do that with Java::Geci we have to do four things:

  1. Add Java::Geci as a dependency to the project
  2. Create a unit test that runs the framework
  3. Mark the part in the unit test, which is the source of the information
  4. Replace the manually copied examples text with a Java::Geci `Segment` so that Java::Geci will automatically copy the text from the test there

Dependency

Java::Geci is in the Maven Central repository. The current release is 1.2.0. It has to be added to the project as a test dependency. There is no dependency for the final LANG library just as there is no dependency on JUnit or anything else used for the development. There are two explicit dependencies that have to be added:

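    <dependency>
      <groupId>com.javax0.geci</groupId>
      <artifactId>javageci-docugen</artifactId>
      <version>1.2.0</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.javax0.geci</groupId>
      <artifactId>javageci-core</artifactId>
      <version>1.2.0</version>
      <scope>test</scope>
    </dependency>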

The artifact javageci-docugen contains the document handling generators. The artifact javageci-core contains the core generators. This artifact also brings in the javageci-engine and javageci-api artifacts. The engine is the framework itself, the API is, well, the API.

Unit test

The second change is a new file, org.apache.commons.lang3.docugen.UpdateJavaDocTest. This file is a simple and very conventional unit test:

/*
 * Licensed to the Apache Software Foundation (ASF) ...
 */
package org.apache.commons.lang3.docugen;

import *;

public class UpdateJavaDocTest {

    @Test
    void testUpdateJavaDocFromUnitTests() throws Exception {
        final Geci geci = new Geci();
        int i = 0;
        Assertions.assertFalse(geci.source(Source.maven())
                .register(SnippetCollector.builder().files("\\.java$").phase(i++).build())
                .register(SnippetAppender.builder().files("\\.java$").phase(i++).build())
                .register(SnippetRegex.builder().files("\\.java$").phase(i++).build())
                .register(SnippetTrim.builder().files("\\.java$").phase(i++).build())
                .register(SnippetNumberer.builder().files("\\.java$").phase(i++).build())
                .register(SnipetLineSkipper.builder().files("\\.java$").phase(i++).build())
                .register(MarkdownCodeInserter.builder().files("\\.java$").phase(i++).build())
                .splitHelper("java", new MarkdownSegmentSplitHelper())
                .comparator((orig, gen) -> !orig.equals(gen))
                .generate(),
            geci.failed());
    }

}

What we can see here is a huge Assertions.assertFalse call. First, we create a new Geci object and then we tell it where the source files are. Without getting into the details, there are many ways for the user to specify where the sources are. In this example, we just say that the source files are where they usually are when we use Maven as a build tool.

The next thing we do is register the different generators. Generators, especially code generators, usually run independently, and thus the framework does not guarantee the execution order. In this case, these generators, as we will see later, very much depend on the actions of each other. It is important to have them executed in the correct order. The framework lets us achieve this via phases. The generators are asked how many phases they need, and in each phase they are also queried whether they need to be invoked. Each generator object is created using a builder pattern, and through the builder each is told which phase it should run in. When a generator is configured to run in phase i (calling .phase(i)), it tells the framework that it needs enough phases to reach phase i and that it is inactive in all the earlier phases. This way the configuration guarantees that the generators run in the following order:

  1. SnippetCollector
  2. SnippetAppender
  3. SnippetRegex
  4. SnippetTrim
  5. SnippetNumberer
  6. SnipetLineSkipper
  7. MarkdownCodeInserter
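The phase mechanism can be illustrated with a minimal sketch. The classes below are hypothetical and are not the Java::Geci API; the sketch also assumes that phases are counted from zero, the way the `i++` counter in the test suggests. Each step reports how many phases it needs and whether it is active in a given phase, and the runner simply loops over the phases.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-framework for illustration; this is not the Java::Geci API. */
final class PhaseDemo {

    /** A named step that is active in exactly one phase (0-based in this sketch). */
    static final class Step {
        final String name;
        final int phase;

        Step(String name, int phase) {
            this.name = name;
            this.phase = phase;
        }

        /** A step configured for phase i needs phases 0..i to exist. */
        int phasesNeeded() {
            return phase + 1;
        }

        boolean activeIn(int currentPhase) {
            return currentPhase == phase;
        }
    }

    /** Runs all phases in turn and returns the step names in execution order. */
    static List<String> run(List<Step> steps) {
        int phases = steps.stream().mapToInt(Step::phasesNeeded).max().orElse(0);
        List<String> order = new ArrayList<>();
        for (int p = 0; p < phases; p++) {
            for (Step step : steps) {
                if (step.activeIn(p)) {
                    order.add(step.name);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // The registration order is deliberately scrambled; the phases decide the order.
        List<Step> steps = List.of(
            new Step("MarkdownCodeInserter", 2),
            new Step("SnippetCollector", 0),
            new Step("SnippetRegex", 1));
        System.out.println(run(steps)); // [SnippetCollector, SnippetRegex, MarkdownCodeInserter]
    }
}
```

The point of the design is that the order of the `register` calls does not matter; only the configured phase numbers do.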

Technically all these are generators, but they do not “generate” code. The SnippetCollector collects the snippets from the source files. SnippetAppender can append multiple snippets together, when some sample code needs the text from different parts of the program. SnippetRegex can modify the snippets before they are used, applying regular expressions and replaceAll functionality (we will see that in this example). SnippetTrim can remove the leading tabs and spaces from the start of the lines. This is important when the code is deeply indented. In that case, simply importing the snippet into the documentation could easily push the actual characters off the printable area on the right side. SnippetNumberer can number snippet lines in case we have some code where the documentation refers to certain lines. SnipetLineSkipper can skip certain lines from the code. For example, you can configure it so that the import statements will be skipped.
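To make the trimming step concrete, here is a small sketch of one way to do it: remove the indentation that all non-blank lines share, so that the relative indentation survives. The class and method names are hypothetical, and the actual SnippetTrim may work differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of snippet trimming; not the SnippetTrim implementation. */
final class TrimDemo {

    /** Removes the leading whitespace that is common to all non-blank lines. */
    static List<String> trimCommonIndent(List<String> lines) {
        int indent = lines.stream()
            .filter(line -> !line.trim().isEmpty())
            .mapToInt(TrimDemo::leadingWhitespace)
            .min()
            .orElse(0);
        return lines.stream()
            // Blank lines shorter than the common indent simply become empty.
            .map(line -> line.length() >= indent ? line.substring(indent) : line.trim())
            .collect(Collectors.toList());
    }

    /** Counts the spaces and tabs at the start of the line. */
    private static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> snippet = Arrays.asList(
            "        if (ready) {",
            "            start();",
            "        }");
        trimCommonIndent(snippet).forEach(System.out::println);
    }
}
```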

Finally, the real “generator” that may alter the source code is MarkdownCodeInserter. It was created to insert the snippets into the Markdown-formatted files, but it works just as well for Java source files when the text needs to be inserted into a JavaDoc part.

The last two configuration calls before generate() tell the framework to use the MarkdownSegmentSplitHelper and to compare the original lines and those that were created after the code generation using a simple equals. SegmentSplitHelper objects help the framework to find the segments in the source code. In Java files, the segments are usually and by default between

//<editor-fold id="...">

and

//</editor-fold>

lines. This helps to separate the manual and the generated code. The editor-fold is also collapsible in all advanced editors so you can focus on the manually created code.

In this case, however, we insert into segments that are inside JavaDoc comments. These JavaDoc comments are more like Markdown than Java in the sense that they may contain some markup but are also HTML friendly. Very specifically, they may contain XML comments that will not appear in the output document. The segment in this case, as defined by the MarkdownSegmentSplitHelper object, is between

<!-- snip snipName parameters ... -->

and

<!-- end snip -->

lines.
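The idea of a segment can be sketched as follows: everything between the start and the end marker belongs to the generator and gets replaced, while the markers themselves and the text around them stay untouched. This is a hypothetical illustration, not the code of MarkdownSegmentSplitHelper or MarkdownCodeInserter, and it assumes for simplicity that each marker sits on a single line of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of segment replacement; not Java::Geci code. */
final class SegmentDemo {

    private static final Pattern END = Pattern.compile("<!--\\s*end\\s+snip\\s*-->");

    /** Returns a copy of the lines in which the body of the named segment is replaced. */
    static List<String> replaceSegment(List<String> lines, String name, List<String> body) {
        Pattern start = Pattern.compile("<!--\\s*snip\\s+" + Pattern.quote(name) + "\\b.*-->");
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            if (inside) {
                if (END.matcher(line).find()) {
                    inside = false;
                    out.add(line);
                }
                // Lines of the old segment body are dropped.
            } else {
                out.add(line);
                if (start.matcher(line).find()) {
                    inside = true;
                    out.addAll(body);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
            "* manual text",
            "<!-- snip demo -->",
            "* stale generated text",
            "<!-- end snip -->",
            "* more manual text");
        replaceSegment(doc, "demo", Arrays.asList("* fresh generated text"))
            .forEach(System.out::println);
    }
}
```

Because the markers survive, running the replacement again with the same body leaves the file unchanged, which is what lets a test detect whether anything was out of date.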

The comparator has to be specified for a very specific reason. The framework has two comparators built-in. One is the default comparator that compares the lines one by one and character by character. This is used for all file types except Java. In the case of Java, there is a special comparator used, which recognizes when only a comment was changed or when the code was only reformatted. In this case, we are changing the content of the comment in a Java file, so we need to tell the framework to use the simple comparator, or else it will not realize we updated anything. (It took 30 minutes to debug why it was not updating the files at first.)
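The difference between the two kinds of comparison can be shown with a small sketch. The comment-ignoring comparator below is only a crude, hypothetical stand-in for the Java-aware one built into the framework; the plain one mirrors the lambda used in the test, which returns true when the two versions differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

/** Hypothetical illustration of the two comparison strategies; not Java::Geci code. */
final class ComparatorDemo {

    /** Plain comparison: reports a change whenever any line differs. */
    static final BiPredicate<List<String>, List<String>> PLAIN =
        (orig, gen) -> !orig.equals(gen);

    /** Crude stand-in for a Java-aware comparison: comment lines and indentation are ignored. */
    static final BiPredicate<List<String>, List<String>> IGNORING_COMMENTS =
        (orig, gen) -> !codeOnly(orig).equals(codeOnly(gen));

    private static List<String> codeOnly(List<String> lines) {
        return lines.stream()
            .map(String::trim)
            .filter(line -> !(line.isEmpty()
                || line.startsWith("/*")
                || line.startsWith("*")
                || line.startsWith("//")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> orig = Arrays.asList("/**", " * old example", " */", "void m() {}");
        List<String> gen = Arrays.asList("/**", " * new example", " */", "void m() {}");
        System.out.println(PLAIN.test(orig, gen));             // true
        System.out.println(IGNORING_COMMENTS.test(orig, gen)); // false
    }
}
```

With the comment-ignoring comparison the updated JavaDoc looks unchanged, so nothing would be written back to the file.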

The final call is to generate() that starts the whole process.

Mark the code

The unit test code that documents this method is org.apache.commons.lang3.ClassUtilsTest.test_getAbbreviatedName_Class(). This should look like the following:

@Test
public void test_getAbbreviatedName_Class() {
    // snippet test_getAbbreviatedName_Class
    assertEquals("", ClassUtils.getAbbreviatedName((Class<?>) null, 1));
    assertEquals("j.l.String", ClassUtils.getAbbreviatedName(String.class, 1));
    assertEquals("j.l.String", ClassUtils.getAbbreviatedName(String.class, 5));
    assertEquals("j.lang.String", ClassUtils.getAbbreviatedName(String.class, 13));
    assertEquals("j.lang.String", ClassUtils.getAbbreviatedName(String.class, 15));
    assertEquals("java.lang.String", ClassUtils.getAbbreviatedName(String.class, 20));
    // end snippet
}

I will not present the original here, because the only difference is that the two lines, snippet ... and end snippet, were inserted. These are the triggers for the SnippetCollector to collect the lines between them and store them in the “snippet store” (nothing mysterious, practically a big hash map).
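The collection step can be pictured with a few lines of code. This is a hypothetical sketch of the idea, not the SnippetCollector source: it scans the lines, opens a snippet at a `// snippet NAME` line, closes it at `// end snippet`, and stores what is in between in a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of snippet collection; not the SnippetCollector implementation. */
final class CollectorDemo {

    private static final Pattern START = Pattern.compile("//\\s*snippet\\s+(\\w+)");
    private static final Pattern END = Pattern.compile("//\\s*end\\s+snippet");

    /** Builds the "snippet store": snippet name mapped to the lines between the markers. */
    static Map<String, List<String>> collect(List<String> source) {
        Map<String, List<String>> store = new HashMap<>();
        List<String> current = null;
        for (String line : source) {
            Matcher start = START.matcher(line);
            if (END.matcher(line).find()) {
                current = null;
            } else if (start.find()) {
                current = new ArrayList<>();
                store.put(start.group(1), current);
            } else if (current != null) {
                current.add(line);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "@Test",
            "public void sample() {",
            "    // snippet sample",
            "    assertEquals(4, 2 + 2);",
            "    // end snippet",
            "}");
        System.out.println(collect(source).get("sample"));
    }
}
```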

Define a segment

The really interesting part is how the JavaDoc is modified. At the start of the article, I already presented the whole code as it is today. The new version is:

/**
* Gets the abbreviated class name from a {@code String}.
*
* The string passed in is assumed to be a class name - it is not checked.
*
* The abbreviation algorithm will shorten the class name, usually without
* significant loss of meaning.
* The abbreviated class name will always include the complete package hierarchy.
* If enough space is available, rightmost sub-packages will be displayed in full
* length.
*
*
*
* you can write manually anything here, the code generator will update it when you start it up
*
<table><caption>Examples</caption>
<tbody>
<tr>
<td>className</td>
<td>len</td>
<td>return</td>
<!-- snip test_getAbbreviatedName_Class regex="
replace='/~s*assertEquals~((.*?)~s*,~s*ClassUtils~.getAbbreviatedName~((.*?)~s*,~s*(~d+)~)~);/*
</tr><tr>
<td>{@code $2}</td>
<td>$3</td>
<td>{@code $1}</td>
</tr>
/' escape='~'" --><!-- end snip -->
</tbody>
</table>
* @param className the className to get the abbreviated name for, may be {@code null}
* @param len the desired length of the abbreviated name
* @return the abbreviated name or an empty string
* @throws IllegalArgumentException if len <= 0
* @since 3.4
*/

The important part is the segment that starts with the <!-- snip comment and ends with <!-- end snip --> (lines 15…20 when the listing is numbered; you see, sometimes it is important to number the snippet lines). The <!-- snip line signals the segment start. The name of the segment is test_getAbbreviatedName_Class and when nothing else is defined, it will also be used as the name of the snippet to insert. However, before the snippet gets inserted it is transformed by the SnippetRegex generator. It will replace every match of the regular expression

\s*assertEquals\((.*?)\s*,\s*ClassUtils\.getAbbreviatedName\((.*?)\s*,\s*(\d+)\)\);

with the string

*
<tr>
<td>{@code $2}</td>
<td>$3</td>
<td>{@code $1}</td>
</tr>

Since these regular expressions are inside a string that is also inside a string, we would need \\\\ instead of a single \. That would make our regular expressions look awful. Therefore the generator SnippetRegex can be configured to use some other character of our choice, which is less prone to the fence phenomenon. In this example, we use the tilde character, and it usually works. The final result when we run it is:

<!-- snip test_getAbbreviatedName_Class regex="
replace='/~s*assertEquals~((.*?)~s*,~s*ClassUtils~.getAbbreviatedName~((.*?)~s*,~s*(~d+)~)~);/*
<tr>
<td>{@code $2}</td>
<td>$3</td>
<td>{@code $1}</td>
</tr>
/' escape='~'" -->
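*
<tr>
<td>{@code (Class<?>) null}</td>
<td>1</td>
<td>{@code ""}</td>
</tr>
*
<tr>
<td>{@code String.class}</td>
<td>1</td>
<td>{@code "j.l.String"}</td>
</tr>
*
<tr>
<td>{@code String.class}</td>
<td>5</td>
<td>{@code "j.l.String"}</td>
</tr>
*
<tr>
<td>{@code String.class}</td>
<td>13</td>
<td>{@code "j.lang.String"}</td>
</tr>
*
<tr>
<td>{@code String.class}</td>
<td>15</td>
<td>{@code "j.lang.String"}</td>
</tr>
*
<tr>
<td>{@code String.class}</td>
<td>20</td>
<td>{@code "java.lang.String"}</td>
</tr>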
<!-- end snip -->
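The transformation itself is plain java.util.regex work and can be tried in isolation. The sketch below is a hypothetical stand-in for what SnippetRegex does with its replace and escape parameters: it turns the escape character into a backslash and then calls String.replaceAll. The replacement is written on one line here for brevity. Note that the Java string literal of the pattern needs no backslashes at all, which is the whole point of the tilde.

```java
/** Hypothetical illustration of the escape-character trick; not the SnippetRegex implementation. */
final class RegexDemo {

    /** Converts the escape character to a backslash, then applies replaceAll to the line. */
    static String transform(String line, String regex, String replacement, char escape) {
        String realRegex = regex.replace(escape, '\\');
        return line.replaceAll(realRegex, replacement);
    }

    public static void main(String[] args) {
        String regex =
            "~s*assertEquals~((.*?)~s*,~s*ClassUtils~.getAbbreviatedName~((.*?)~s*,~s*(~d+)~)~);";
        String replacement = "<tr><td>{@code $2}</td><td>$3</td><td>{@code $1}</td></tr>";
        String line =
            "    assertEquals(\"j.l.String\", ClassUtils.getAbbreviatedName(String.class, 5));";
        // $1 is the expected value, $2 the class argument and $3 the length.
        System.out.println(transform(line, regex, replacement, '~'));
    }
}
```

The tilde works only as long as the pattern does not need a literal tilde, which is why the escape character is configurable.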

Summary / Takeaway

Document updating can be automated. At first, it is a bit cumbersome. Instead of copying and reformatting the text, the developer has to set up a new unit test, mark the snippet, mark the segment, and craft the transformation using regular expressions. However, once it is done, every update is automatic. It is not possible to forget to update the documentation after the unit tests changed.

This is the same approach that we follow when we create unit tests. At first, it is a bit cumbersome to create unit tests instead of just debugging and running the code in an ad-hoc way to see if it really behaves as we expected, looking at the debugger. However, once it is done, every update is automatically checked. It is not possible to forget to check an old functionality when the code affecting it changes.

In my opinion, documentation maintenance should be as automated as testing. Generally: anything that can be automated in software development should be automated to save effort and to reduce errors.

ValueTypes video, must see for juniors

JAX-TV uploaded the recording of the talk I presented this year in Mainz at W-JAX.

The topic is Java ValueTypes. The talk, however, starts with the topic of how the CPU feels the time. What are the ratios between the different time intervals when we read something from the CPU cache, memory, SSD or even over the network? This knowledge is needed to understand the importance of value types.

The talk uses the same metaphor (the CPU as a bureaucrat and the clock as the heartbeat), which I also used in my book. It is very graphic and easy to remember, therefore I think this is a good listen for every junior. Even perhaps for some of the seniors.

Java Projects: Book Review

This article is about the book

Java Projects Second Edition, by Peter Verhas

that I wrote last year. The aim of such an article is usually to boost the sales of the book. It is no different in this case, but since this is a book that I wrote, and I am the person who is writing the review, it would be extremely awkward to praise the book. So I will not, although I like this book a lot. I think loving your own product, at least at the time when it is ready, is a must. You may think about it differently later, like I do about the previous edition of the same book, which could have been better. But then again, that is why there is a second edition, in addition to the fact that Java developed in the meantime and it became Java 11 from Java 9. But back to the previous thought: you have to love your product when it is finished, otherwise you can just throw it away. If you do not like it, no one else will. What is also important is that you have to love your work while you are working on it. And I did, and I enjoyed creating this book.

Thus now I will write about the book, what it is, and what I intended it to be. Later in the article, I will also talk about how I was working on the book, some technicalities, and some secrets. (They are not that much of a secret if I publish them here, are they?) But before those, have the URLs here, where you can buy my book at PACKT, or Amazon, etc.

Intended Audience and Content of the Book

In agreement with the publisher, I wanted to write a book for those who want to learn Java but already have some programming experience. I did not want to write about the simple notion of variables, loops, conditional constructs. I wanted to write a book that teaches you Java and a bit of programming. I wanted a book that any PHP, Python, C#, C, C++, Go, etc. programmer fresh out of uni can read to learn some Java programming and decide if it is for them or not. I wanted to dedicate the last chapter to non-Java programming topics, like what can happen later in your career if you start as a programmer. You can remain a programmer, or become an architect, project manager, devops engineer. There are many possibilities based upon opportunities and interest. This intention was met with less agreement from the publisher side, but they accepted that my hands are the ones that hit the keyboard and we got to a compromise. So the last chapter is also about some technical topics, like Java agent, polyglot programming, annotation processing, DSL, SDLC and so on.

Content of the book

The book has ten chapters in a bit more than 500 pages.

  • Chapter ONE

is how you get started. To start you need to install the Java environment and you have to get familiar with the command line tools. This is a bit cumbersome and in the case of Java, it is more complex than it is with other languages. I have some friends who started to learn Java using this book and struggled with it (not because of the book, but because of the complexity of the task). When you start learning Java you have to be patient at this point and you must have a strong belief that it will work.

  • Chapter TWO

is about the supporting tools and about the basic language elements. Even though the book is for those who can already program in some programming language, the text has to describe the basic elements of the language like variables, classes, methods, types, expressions, loops and so on. You can see how complex Java is from the fact that we are already in the second chapter and we are only starting with the language.

  • Chapter THREE

is where we start programming something more complex than a “Hello, World”. The program is a sorting program and we implement not only the simple bubble sort but also quick sort. Along the way, we also touch on topics like generics, TDD, unit tests, Java modules. These are advanced topics that were originally planned for later chapters, but I wanted to explain less of the language and more of programming along with the language.

  • Chapter FOUR

is a new program, and brings us new topics. In this chapter, we (I imagine the reader and I) develop the game Mastermind. The user, sitting in front of the computer “hides” the pins and the program finds out what is hidden. The same chapter talks about collections, dependency injection, and integration tests.

  • Chapter FIVE

is the one I am most proud of. It is about concurrent programming. Many books use an example that scales well. You run it on one processor and it runs. You run it on two processors and it runs twice as fast. In real life, tasks are usually not that independent. So I decided to make the Mastermind game concurrent. This needed some refactoring. Honestly: I did not realize that before I started to write chapter 5, and chapter 4 was already finished. I decided not to rewrite chapter 4 (although that would have been the smaller amount of work); rather, I detailed in the chapter the coding decisions and how the code has to be refactored. This is only a part of a chapter that is already about a very complex topic, so do not expect a full-blown refactoring tutorial. If you need a good book about refactoring then read Martin Fowler’s Refactoring book.

In addition to that, the chapter details most of the concurrent programming tools: wait, notify, locks, queues. The chapter concludes with an introduction to microbenchmarking that shows how much faster parallel programs run on many CPUs.

  • Chapter SIX

is about creating a simple web interface for the program. Because the main topic of the book is Java and not HTML, CSS and JavaScript, the front-end is fairly simple. On the other hand, the chapter focuses on IP, TCP, DNS, HTTP, and even HTTP/2. Then it goes on detailing the C/S architecture, mentions JavaServer Pages (a must is a must) and then we develop the code writing a servlet running with Jetty.

  • Chapter SEVEN

uses a new program and here we develop a REST service using Spring MVC, servlet filters, audit logging with AOP and we even discuss how dynamic proxies work.

  • Chapter EIGHT

extends the program and touches subjects like annotations, reflection, functional programming and scripting in Java.

  • Chapter NINE

is the last coding chapter. Here we create an “accounting” application using a reactive interface. It is a somewhat awkward example, but at the time I could not find anything better. Nevertheless, the principles of reactive programming and how to use the new reactive interfaces in Java are described in this chapter.

  • Chapter TEN

is the last chapter and as such it is the densest. It talks about topics that all developers should know about but hardly any developer will use. You will probably never create a Java agent or an annotation interface. But you should know what they are and that is why they are described here. There are also a few words about polyglot programming, which will be more and more prevalent. The majority of the chapter is about how programming in an enterprise setting works.

Motivation

My motivation was to create a programming book that will outlast the current version of Java. A book that teaches whoever reads it a bit of programming and helps them start to become a better programmer. Maybe my frustration meeting a lot of job interview candidates who had no clue in some of the very essential areas but who still thought they were senior developers was also a motivation factor.

Technicalities

At the start, I teased that I will tell you some secrets. Here they are.

Packt wanted me to write the book using Microsoft Word or using an online WordPress based WYSIWYG editor. WordPress has a markup editing possibility, but this was switched off. I asked that they switch it on, but I was refused. So I decided to use Microsoft Word when I created the first edition of the book. The result was disastrous. The code samples copied from the actual source were reformatted during the editing process somewhere in the hands of the editors. Some of the formatting changes made the code hard to read. Some of the changes were simply wrong, like removing all the spaces between the word int and the variable name n, resulting in intn.

When I started the second edition I decided to hack the system. By that time I was practicing a bit with Python and I created the Pyama project that can fetch code fragments from the source directories and insert them into Markdown files, overwriting the old versions. I also created a script that converted the special WordPress flavor HTML into Markdown and back. The first edition of my book was converted by Packt into this WordPress format.

When I opened a chapter with the WYSIWYG editor I pressed F12 to get to the debug mode and I used “edit HTML” on the WYSIWYG form to copy the HTML and paste it into a text file. I converted the input HTML to Markdown and I worked on the Markdown version. I like to work in a way that I edit the markup and at the same time, I can see the rendered page. When a chapter was ready I converted it back to HTML and I used the same debug mode to paste the code back. It worked. Packt did not know it.

Summary

I believe that I wrote a book that can be used professionally to learn programming and also a bit of Java 11. As I wrote at the start of the first chapter:

It is like going through a path in a forest. You can focus on the gravel of the road but it is pointless. Instead, you can enjoy the view, the trees, the birds, and the environment around you, which is more enjoyable. This book is similar as I won’t be focusing only on the language. From time to time, I will cover topics that are close to the road and will give you some overview and directions on where you can go further after you finish this book. I will not only teach you the language but also talk a bit about algorithms, object-oriented programming principles, tools that surround Java development, and how professionals work. This will be mixed with the coding examples that we will follow.