Category Archives: Java

Java hexadecimal floating point literal

How I met hexadecimal floating point numbers

I was developing a new functionality into Java::Geci to make it less prone to code reformatting. The current release of the code will overwrite an otherwise identical code if it was reformatted. It is annoying since it is fairly easy to press the reformatting key shortcut and many projects even require that developers set their editor to automatically format the code upon save. In those cases Java::Geci cannot be used because as soon as the code is reformatted the generator thinks that the code it generates is not the same as the one already in the source file, updates it and signals the change of the code failing the unit tests.

The solution I was crafting compares the Java source files first converting them to a list of lexical elements. That way you can even reformat the code inserting new-lines, spaces, etc. so long as long the code remains the same. To do that I needed a simplified lexical analyzer for Java. Writing a lexical analyzer is not a big deal, I created several for different reasons since I first read the Dragon Book in 1987. The only thing I really needed is the precise definition of what are the string, character, number literals, the keywords and so on. In short: what is the definition of the Java language on the lexical level and how is it processed. Fortunately, there is a precise definition for that, the Java Language Specification, which is not only precise but also readable and has examples. So I started to read the corresponding chapters.

To my bewilderment, I could see there that there is a possibility in the Java language to express a floating point in hexadecimal. Strange, is it? Since I have not ever seen it, first I thought that this was something new introduced in Java 12 but my investigation showed that it probably was introduced in Java 1.5 That was the very first Java version I really liked but not because of hexadecimal floating points. So this was how I met this beast in the standard face to face. I started to wonder if this beast can be found at all in the wild or is it only something that can be seen captive in the confinements of the text of the JLS. So…

I put up a vote on Twitter


As you can see nine decent humans answered the question, mostly saying that they have had no idea about this feature.

Probably hexadecimal floating points are the least known and used feature of the Java language right after lambdas and streams (just kidding… hexadecimal floating points are important, no?)

Even though I did some scientific study in the past, I cannot see any use of hexadecimal floating point literals.


What is a floating point number?

We will get to hexadecimal floating point numbers, but to understand that we have to know first what a floating point number, generally is.

Floating point numbers have a mantissa and exponent. The mantissa has an integer and a fractional part, like iii.ffff. The exponent is an integer number. For example, 31.415926E-1 is a floating point number and an approximation for the ratio of the diameter and the circumference of a circle.

Java internally stores the float numbers on 32 bit and double number on 64 bit. The actual bits are used according to the IEEE 754 standard.

That way the bits store a sign on a single bit, then the exponent on 8 or 11 bits and finally the mantissa on 23 or 52 bits for 32- or 64-bit float/double respectively. The mantissa is a fractional number with a value between 1 and 2. This could be represented with a bit stream, where the first bit means 1, the second means 1/2 and so on. However, because the number is always stored normalized and therefore the number is always between [1 and 2) the first bit is always 1. There is no need to store it. Tus the mantissa is stored so that the most significant bit means 1/2, the next 1/22 and so on but when we need the value we add to it 1.

The mantissa is unsigned (hence we have a separate signum bit). The exponent is also unsigned, but the actual number of bitshifts are calculated subtracting 127 or 1023 from the value to get a signed number. It specifies how many bits the mantissa should virtually be shifted to the left or right. Thus when we write 31.415926E-1f then the exponent will NOT be -1. That is the decimal format of the number.

The actual value is 01000000010010010000111111011010. Breaking it down:

  • 0 sign, the number is positive. So far so good.
  • 10000000 128, which means we have to shift the mantissa one bit left (multiply the value by two)
  • 10010010000111111011010 is 4788186/2^23+1 \approx 1.570796251296997. The hex representation of this bit stream is 0x490FDA

And here comes the

Hexadecimal floating point literal

We can write the same number in Java as 0x0.C90FDAP2f. This is the hexadecimal floating point representation of the same number.

The mantissa 0xC9aFDA should be familiar to the hexadecimal representation of the number above 0x490FDA. The difference is that the first character is C instead of 4. That is the extra one bit, which is always 1 and is not stored in the binary representation. C is 1100 while the original 4 is 0100. The exponent is the signed decimal representation of the actual bitshifts needed to push the number to the proper position.

The format of the literal is not trivial. First of all, you HAVE TO use the exponent part and the character for the exponent is p or P. This is a major difference from the decimal representation. (UPDATE: If the exponent was optional you could not tell if, for example, 0.55 is a decimal floating point or a hexadecimal floating point. A hexadecimal number can, by accident, contain only decimal characters and still be hexadecimal.)

After a little bit of thinking it becomes obvious that the exponent cannot be denoted using the conventional e or E since that character is a legitimate hexadecimal digit and it would be ambiguous in case of numbers like 0x2e3. Would this be a hexadecimal integer or 2\times 2^3. It is an integer because we use p and not e.

The reason why the exponent part is mandatory I can only guess. Because developers got used to decimal floating point numbers with e or E as exponent it would be very easy to misread 0xC90F.0e+3 as a single floating point number, even though in case of hexadecimal floating point p is required instead of e. If the exponent were not mandatory this example would be a legit sum of a floating point number and an integer. The same time it looks like a single number, and that would not be good.

The other interesting thing is that the exponent is decimal. This is also because some hexadecimal digits were already in use for other purposes. The float and double suffix. In case you want to denote that a literal is a float, you can append the f or F to the end. If you want to denote that this literal is double then you can append d or D to the end. This is the default, so appending D is optional. If the exponent were hexadecimal we would not know if 0x32.1P1f is a float literal or a double and having a lot of magnitudes different value. This way, that that exponent is decimal it is a float number (32+1/2)\times 2^1.

Java and IEEE 754

Java implemented the IEEE 754 standard strictly until Java 1.2 This standard defines not only the format of the numbers when stored in memory but also defines rules how calculations should be executed. After the Java release 1.2 (including 1.2) the standard was released to make the implementations more liberal allowing to use more bits to store intermediate results. This was and it still is available on the Intel CPU platforms and it is used heavily in numeric calculations in other languages like FORTRAN. This was a logical step to allow the implementations to use of this higher precision.

The same time to preserve backward compatibility the strictfp modifier was added to the language. When this modifier is used on a class, interface or method the floating point calculations in those codes will strictly follow the IEEE 754 standard.

Takeaway

  • There are hexadecimal floating point literals in Java. Remember it and also what strictfp is because somebody may ask you about it on a Java interview. No practical use in enterprise programming.
  • Do not use them unless it makes the code more readable. I can barely imagine any situation where this would be the case. So, simply put: do not use them just because you can.
  • Follow me on Twitter @verhas to get notification about new articles.

I think that is it, nothing more. By the time this article is published, I will probably be swimming across the lake of Zürich along with ten thousand people. This is a big event here.

Oh… and yes: if you have ever used hexadecimal floating point literals in Java to make it more readable, please share the knowledge in the comments. I dare say in the name of the readers: we are interested.

UPDATE: Joseph Darcy, (Engineer, OpenJDK developer at Oracle, marathoner, fast walker, occasional photographer, lots of other things.) provided feedback on Twitter. I copied his reply to here as it is absolutely valuable and adds value to this article for the benefit of the reader:

The mapping between decimal strings and particular settings of binary floating-point values is often non-obvious. Hexadecimal floating-point literals provide a straightforward text to binary fp mapping when needed, such as in tests. See https://blogs.oracle.com/darcy/hexadecimal-floating-point-literals

Advertisements

Inject-able only in test?

This article is about some thoughts of test design and testability. Some questions that we discussed with my son, who is a junior Java developer and currently is employed and studies at EPAM Hungary (the same company but a different subsidiary where I work). All the things in this article are good old knowledge, but still, you may find something interesting in it. If you are a junior then because of that. If you are a senior then you can get some ideas on how to explain these things. If neither: sorry.

Introduction to the problem

The task they had was some roulette program or some other game simulation code, they had to write. The output of the code was the amount of simulated money lost or won. The simulation used a random number generator. This generator caused a headache when it came to testing. (Yes, you are right: the very basis of the problem was lack of TDD.) The code behaved randomly. Sometimes the simulated player was winning the game, other times it was losing.

Make it testable: inject mock

How to make this code testable?

The answer should be fairly obvious: mock the random number generator. Make the use of the source of randomness injected and inject a different non-random source during tests. Randomness is not important during testing and there is no need to test the randomness. We have to believe that the random number generator is good (it is not, it is never good, perhaps good enough, but that is a totally different story) and was tested by its own developers.

Learning #1: Do not test the functionality of your dependency.

We can have a field of type Supplier initialized to something like () -> rnd() lambda and in case of test it is overwritten using a setter.

Is testable good?

Now we changed the structure of the class. We opened a new entry to inject a random number generator. Is this okay?

There is no general yes or no answer to that. It depends on the requirements. Programmers like to make their code configurable and more general than they are absolutely needed by the current requirements. The reason that… well… I guess, it is because many times in the past programmers experienced that requirements have changed (no kidding!) and in case the code was prepared for the change then the coding work was easier. This is fair enough reasoning but there are essential flaws in it. The programmers do not know what kind of future requirements may come. Usually, nobody really knows, and everybody has some idea about it.

Programmers usually have the least knowledge. How would they know the future? Business analysts know a bit better, and at the end of the chain, the users and customers know it the best. However, even they do not know the business environment out of their control that may require new features of the program.

Another flaw is that developing of a future requirement now has extra costs that the developers a lot of times do not comprehend.

Practice shows that the result of such ‘ahead of time’ thinking is usually complex code and flexibility that’s hardly ever needed. There is even an acronym for that: YAGNI, “You Aren’t Gonna Need It”.

So, is implementing that injectability feature a YAGNI? Not at all.

First of all: a code has many different uses. Executing it is only one. An equally important one is the maintenance of the code. If the code cannot be tested, it cannot be reliably used. If the code cannot be tested, it cannot be reliably refactored, extended: maintained.

A functionality that is only needed for testing is like a roof bridge on a house. You do not use it yourself while you live in the house, but without them, it would be hard and expensive to check the chimneys. Nobody questions the need for those roof bridges. They are needed, they are ugly and still, they are there. Without them, the house is not testable.

Learning #2: Testable code usually has better structure.

But that is not the only reason. Generally, when you create a code testable the final structure will usually be more useable as well. That is, probably, because testing is mimicking the use of the code and designing the code testable will drive your thinking towards the usability to be on the first place and implementation to be only on the second place. And, to be honest: nobody really cares about implementation. Usability is the goal, implementation is only the tool to get there.

Responsibility

Okay, we got that far: testability is good. But then there is a question about responsibility.

The source of randomness should be hard-wired into the code. The code and the developer of the code are responsible for the randomness. Not because this developer implemented it, but this developer selected the random number generator library. Selecting the underlying libraries is an important task and it has to be done responsibly. If we open a door to alter this selection of implementation for randomness then we lose control over something that is our responsibility. Or don’t we?

Yes and no. If you open the API and provide a possibility to inject a dependency then you are not inherently responsible for the functioning of the injected functionality. Still, the users (your customers) will come to you asking for help and support.

“There is a bug!” they complain. Is it because of your code or something in the special injected implementation the user selected?

You essentially have three choices:

  1. You may examine the bugs in each of those cases and tell them when the error is not your bug and help them select a better (or just the default) implementation of the function. It will cost you precious time either paid or unpaid.
  2. The same time you can also exclude the issue and say: you will not even examine any bug that cannot be reproduced using the standard, default implementation.
  3. You technically prevent the use of the feature that is there only for the testability.

The first approach needs good sales support or else you will end up spending your personal time fixing customers problem instead of spending your paid customer time. Not professional.

The second approach is professional, but customers do not like it.

The third is a technical solution to drive users from #1 to #2.

Learning #3: Think ahead about users’ expectations.

Whichever solution you choose the important thing is to do it consciously and not just by accident. Know what your users/customer may come up with and be prepared.

Prevent production injecting

When you open the possibility to inject the randomness generator into the code how do you close that door for the production environment if you really must?

The first solution, which I prefer, is not to open it wide in the first place. Use it via the initialized field holding the lambda expression (or some other way) that makes it injectable, but do not implement injection support. Let the field be private (but not final, because that may cause other problems in this situation) and apply a bit of reflection in the test to alter the content of the private field.

Another solution is to provide a package private setter, or even better an extra constructor to alter/initialize the value of the field and throw an exception if it is used in the production environment. You can check that many different ways:

  • Invoke `Class.forName()` for a test class that is not on the classpath in the production environment.
  • Use `StackWalker` and check that the caller is test code.

Why do I prefer the first solution?

Learning #4: Do not use a fancy technical solution just because you can. Boring is usually better.

First of all, because this is the simplest and puts all testing code into the test. The setter or the special constructor in the application code is essentially testing code and the byte codes for them are there in the production code. Test code should be in test classes, production code should be in production classes.

The second reason is that designing functionality that is deliberately different in the production and in the test environment is just against the basic principles of testing. Testing should mimic the production environment as much as economically feasible. How would you know that the code will work properly in the production environment when the test environment is different? You hope. There are many environmental factors already that may alter the behavior in the production environment and let bug manifest there only and silently remaining dormant in the test environment. We do not need extra such things to make our testing even riskier.

Summary

There are many more aspects of programming and testing. This article was addressing only a small and specific segment that came up in a discussion. The key learnings also listed in the article:

  • Test the system under test (SUT) and not the dependencies. Be careful, you may think you are testing the SUT when actually you are testing the functionality of some dependencies. Use stupid and simple mocks.
  • Follow TDD. Write the test before and mixed with the functionality development. If you don’t because just you don’t, then at least think about the tests before and while you write the code. Testable code is usually better (not just for the test).
  • Think about how fellow programmers will use your code. Imagine how a mediocre programmer will use your API and produce the interfaces of your code not only for the geniuses like you, who understand your intentions even better than you.
  • Do not go for a fancy solution when you are a junior just because you can. Use a boring and simple solution. You will know when you are a senior: when you no longer want to use the fancy solution over the boring one.

Converting objects to Map and back

In large enterprise applications sometimes we need to convert data objects to and from Map. Usually, it is an intermediate step to a special serialization. If it is possible to use something standard then it is better to use that, but many times the architecture envisioned by some lead architect, the rigid environment or some similar reason does not make it possible to use JOOQ, Hibernate, Jackson, JAX or something like that. In such a situation, as it happened to me a few years ago, we have to convert the objects to some proprietary format being string or binary and the first step towards that direction is to convert the object to a Map.

Eventually, the conversion is more complex than just

Map myMap =  (Map)myObject;

because these objects are almost never are maps by their own. What we really need in the conversion is to have a Map where each entry corresponds to a field in the “MyObject” class. The key in the entry is the name of the field, and the value is the actual value of the field possibly converted to a Map itself.

One solution is to use reflection and reflectively read the fields of the object and create the map from it. The other approach is to create a toMap() method in the class that needs to be converted to a Map that simply adds each field to the returned map using the name of the field. This is somewhat faster than the reflection-based solution and the code is much simpler.

When I was facing this problem in a real application a few years ago I was so frustrated writing the primitive but numerous toMap() methods for each data object that I created a simple reflection-based tool that to do it just for any class we wanted. Did it solve the problem? No.

This was a professional environment where not only the functionality matters but also the quality of the code and the quality of my code, judged by my fellow programmers, was not matching. They argued that the reflection-based solution is complex and in case it becomes part of the code base then the later joining average developers will not be able to maintain it. Well, I had to admit that they were correct. In a different situation, I would have said that the developer has to learn reflection and programming in Java on a level that is needed by the code. In this case, however, we were not speaking about a specific person, but rather somebody who comes and joins the team in the future, possibly sometime when we have already left the project. This person was assumed to be an average developer, which seemed to be reasonable as we did not know anything about this person. In that sense, the quality of the code was not good, because it was too complex. The quorum of the developer team decided that maintaining the numerous manually crafted toMap() method was going to be cheaper than finding senior and experienced developers in the future.

To be honest, I was a bit reluctant to accept their decision but I accepted it even though I had the possibility to overrule it based simply on my position in the team. I tend to accept the decisions of the team even if I do not agree with that, but only if I can live with those decisions. If a decision is dangerous, terrible and threatens the future of the project then we have to keep discussing the details until we get to an agreement.

Years later I started to create Java::Geci as a side project that you can download from http://github.com/verhas/javageci

Java::Geci is a code generation tool that runs during the test phase of the Java development life cycle. Code generation in Java::Geci is a “test”. It runs the code generation and in case all the generated code stays put then the test was successful. In case anything in the code base changed in a way that causes the code generator to generate different code than before and thus the source code changes then the test fails. When a test fails you have to fix the bug and run build, including tests again. In this case, the test generates the new, by now fixed code, therefore all you have to do is only to run the build again.

When developing the framework I created some simple generators to generate equals() and hashCode(), setters and getters, a delegator generator and finally I could not resist but I created a general purpose toMap() generator. This generator generates code that converts the object to Map just as we discussed before and also the fromMap() that I did not mention before, but fairly obviously also needed.

Java::Geci generators are classes that implement the Generator interface. The Mapper generator does that extending the abstract class AbstractJavaGenerator. This lets the generator to throw any exception easing the life of the generator developer, and also it already looks up the Java class, which was generated from the currently processed source. The generator has access to the actual Class object via the parameter klass and the same time to the source code via the parameter source, which represents the source code and provides methods to create Java code to be inserted into it.

The third parameter global is something like a map holding the configuration parameters that the source code annotation @Geci defines.

package javax0.geci.mapper;

import ...

public class Mapper extends AbstractJavaGenerator {

...

    @Override
    public void process(Source source, Class<?> klass, CompoundParams global)
                                                             throws Exception {
        final var gid = global.get("id");
        var segment = source.open(gid);
        generateToMap(source, klass, global);
        generateFromMap(source, klass, global);

        final var factory = global.get("factory", "new {{class}}()");
        final var placeHolders = Map.of(
                "mnemonic", mnemonic(),
                "generatedBy", generatedAnnotation.getCanonicalName(),
                "class", klass.getSimpleName(),
                "factory", factory,
                "Map", "java.util.Map",
                "HashMap", "java.util.HashMap"
        );
        final var rawContent = segment.getContent();
        try {
            segment.setContent(Format.format(rawContent, placeHolders));
        } catch (BadSyntax badSyntax) {
            throw new IOException(badSyntax);
        }
    }

The generator itself only calls the two methods generateToMap() and generateFromMap(), which generate, as the names imply the toMap() and fromMap() methods into the class.

Both methods use the source generating support provided by the Segment class and they also use the templating provided by Jamal. It is also to note that the fields are collected calling the reflection tools method getAllFieldsSorted() which returns all the field the class has in a definitive order, that does not depend on the actual JVM vendor or version.

    private void generateToMap(Source source, Class<?> klass, CompoundParams global) throws Exception {
        final var fields = GeciReflectionTools.getAllFieldsSorted(klass);
        final var gid = global.get("id");
        var segment = source.open(gid);
        segment.write_r(getResourceString("tomap.jam"));
        for (final var field : fields) {
            final var local = GeciReflectionTools.getParameters(field, mnemonic());
            final var params = new CompoundParams(local, global);
            final var filter = params.get("filter", DEFAULTS);
            if (Selector.compile(filter).match(field)) {
                final var name = field.getName();
                if (hasToMap(field.getType())) {
                    segment.write("map.put(\"%s\", %s == null ? null : %s.toMap0(cache));", field2MapKey(name), name, name);
                } else {
                    segment.write("map.put(\"%s\",%s);", field2MapKey(name), name);
                }
            }
        }
        segment.write("return map;")
                ._l("}\n\n");
    }

The code selects only the fields that are denoted by the filter expression.

Extending abstract classes with abstract classes in Java

The example issue

When I was creating the Java::Geci abstract class AbstractFieldsGenerator and AbstractFilteredFieldsGenerator I faced a not too complex design issue. I would like to emphasize that this issue and the design may seem obvious for some of you, but during my recent conversation with a junior developer (my son, Mihály specifically, who also reviews my articles because his English is way better than mine) I realized that this topic may still be of value.

Anyway. I had these two classes, fields and filtered fields generator. The second class extends the first one

abstract class AbstractFilteredFieldsGenerator
                  extends AbstractFieldsGenerator {...

adding extra functionality and the same time it should provide the same signature for concrete implementation. What does it mean?

These generators help to generate code for a specific class using reflection. Therefore the input information they work on is a Class object. The fields generator class has an abstract method process(), which is invoked for every field. It is invoked from an implemented method that loops over the fields and does the invocation separately for each. When a concrete class extends AbstractFieldsGenerator and thus implements this abstract method then it will be called. When the same concrete class is changed so that it extends AbstractFilteredFieldsGenerator then the concrete method will be invoked only for the filtered method. I wanted a design so that the ONLY change that was needed in the concrete class is to change the name.

Diff between the two versions of the concrete class

Abstract class problem definition

The same problem described in a more abstract way: There are two abstract classes A and F so that F extends A and F provides some extra functionality. Both declare the abstract method m() that a concrete class should implement. When the concrete class C declaration is changed from C extends A to C extends F then the invocation of the method m() should change, but there should be no other change in the class C. The method m() is invoked from method p() defined in class A. How to design F?

What is the problem with this?

Extending A can be done in two significantly different ways:

  • F overrides m() making it concrete implementing the extra functionality in m() and calls a new abstract method, say mx()
  • F overrides the method p() with a version that provides the extra functionality (filtering in the example above) and calls the still abstract method m()

The first approach does not fulfill the requirement that the signature to be implemented by the concrete class C should remain the same. The second approach throws the already implemented functionality of A to the garbage and reimplements it a bit different way. In practice this is possible, but it definitely is going to be some copy/paste programming. This is problematic, let me not explain why.

The root of the problem

In engineering when we face a problem like that, it usually means that the problem or the structure is not well described and the solution is somewhere in a totally different area. In other words, there are some assumptions driving our way of thinking that are false. In this case, the problem is that we assume that the abstract classes provide ONE extension “API” to extend them. Note that the API is not only something that you can invoke. In the case of an abstract class, the API is what you implement when you extend the abstract class. Just as libraries may provide different APIs for different ways to be used (Java 9 HTTP client can send() and also sendAsync()) abstract (and for the matter of fact also non-abstract) classes can also provide different ways to be extended for different purposes.

There is no way to code F reaching our design goal without modifying A. We need a version of A that provides different API to create a concrete implementation and another, not necessarily disjunct/orthogonal one to create a still abstract extension.

The difference between the APIs in this case is that the concrete implementation aims to be at the end of a call-chain while the abstract extension wants to hook on the last but one element of the chain. The implementation of A has to provide API to be hooked on the last but one element of the call-chain. This is already the solution.

Solution

We implement the method ma() in the class F and we want p() to call our ma() instead of directly calling m(). Modifying A we can do that. We define ma() in A and we call ma() from p(). The version of ma() implemented in A should call m() without further ado to provide the original “API” for concrete implementations of A. The implementation of ma() in F contains the extra functionality (filtering in the example) and then it calls m(). That way any concrete class can extend either A or F and can implement m() with exactly the same signature. We also avoided copy/paste coding with the exception that calling m() is a code that is the same in the two versions of ma().

If we want the class F extendable with more abstract classes then the F::ma implementation should not directly call m() but rather a new mf() that calls m(). That way a new abstract class can override mf() giving again new functionality and invoke the abstract m().

Takeaway

  1. Programming abstract classes is complex and sometimes it is difficult to have a clear overview of who is calling who and which implementation. You can overcome this challenge if you realize that it may be a complex matter. Document, visualize, discuss whatever way may help you.
  2. When you cannot solve a problem (in the example, how to code F) you should challenge the environment (the class A we implicitly assumed to be unchangeable by the wording of the question: “How to implement F?”).
  3. Avoid copy/paste programming. (Pasta contains a lot of CH and makes your code fat, the arteries get clogged and finally, the heart of your application will stop beating.)
  4. Although not detailed in this article, be aware that the deeper the hierarchy of abstraction is the more difficult it is to have a clear overview of who calls whom (see also point number 1).

Reflection selector expression

Java::Geci is a code generator that runs during unit test time. If the generated code fits the actual version of the source code then the test does not fail. If there is a need for any modification then the tests modify the source code and fail. For example, there is a new field that needs a setter and getter then the accessor generator will generate the new setter and getter and then it fails. If there is no new field then the generated code is just the one that is already there, no reason to touch the source code: the test that started the generator finishes successfully.

Because Java::Geci generators run as tests, which is run-time and because they need access to the Java code structures for which they generate code Java reflection is key for these generators.

To help the code generators to perform their tasks there are a lot of support methods in the javageci-tools module.

com.javax0.geci
javageci-tools
1.1.1

In this article, I will write one class in this module: Selector that can help you select a field, method or class based on a logical expression.

Introduction

The class javax0.geci.tools.reflection.Selector is a bit like the regular expression class Pattern. You can create an instance invoking the static method compile(String expression). On the instance, you can invoke match(Object x) where the x object can be either a Field a Method or a Class or something that can be cast of any of those (Let’s call these CFoMs). The method match() will return true if x fits the expression that was compiled.

Selector expression

The expression is a Java String. It can be as simple as true that will match any CFoM. Similarly false will not match anything. So far trivial. There are other conditions that the expression can contain. public, private volatile and so on can be used to match a CFoM that has any of those modifiers. If you use something like volatile on a CFoM that cannot be volatile (class or method) then you will get IllegalArgumentException.

For classes you can have the following conditions:

  • interface when the class is interface
  • primitive when it is a primitive type
  • annotation when it is an annotation
  • anonymous
  • array
  • enum
  • member
  • local

Perhaps you may look up what a member class is and what a local class is. It is never too late to learn a bit of Java. I did not know it was possible to query that a class is a local class in reflection until I developed this tool.

These conditions are simple words. You can also use pattern matching. If you write extends ~ /regex/ it will match only classes that extend a class that has a name matching the regular expression regex. You can also match the name, simpleName and canonicalName against a regular expression. In case our CFoM x is a method or field then the return type is checked, except in case of name because they also have a name.

Conditions

There are many conditions that can be used, here I list only a subset. The detailed documentation that contains all the words is at https://github.com/verhas/javageci/blob/master/FILTER_EXPRESSIONS.md

Here is an appetizer though:

protected, package, static, public, final, synthetic,
synchronized, native, strict, default, vararg, implements,
overrides, void, transient, volatile, abstract

Expression Structure

Checking one single thing would not be too helpful. And also calling the argument of the method compile() to be an “expression” suggests that there is more.

You can combine the conditions to full logical expression. You can create a selector Selector.compile("final | volatile") to match all fields that are kind of thread safe being either final or volatile or both (which is not possible in Java, but the selector expression would not mind). You can also say Selector.compile("public &amp; final &amp; static") to match only those fields that are public, final and static. Or you can Selector.compile("!public &amp; final &amp; static") to match the final and static fields that are private, protected or package private, also as “not public”. You can also apply parenthesis and with those, you can build up fairly complex logical expressions.

Use

The usage can be any application that heavily relies on reflection. In Java::Geci the expression can be used in the filter parameter of any generator that generates some code for the methods or for the fields of a class. In that case, the filter can select which fields or methods need code generation. For example, the default value for the filter in case of the accessor generator is true: generate setters and getter for all the fields. If you need only setters and getters for the private fields you can specify filter="private". If you want to exclude also final fields you can write `filter=”!final & private”. In that case, you will not get a getter for the final fields. (Setters are not generated for final fields by default and at all. The generator is clever.)

Using streams it is extremely easy to write expressions, like

Arrays.stream(TestSelector.class.getDeclaredFields())
.filter(Selector.compile("private & primitive")::match)
.collect(Collectors.toSet());

that will return the set of the fields that are private and primitive. Be aware that in that case, you have some selector compilation overhead (only once for the stream, though) and in some cases, the performance may not be acceptable.

Experiment and see if it suits your needs.

I just forgot to add: You can also extend the selector during run-time calling the selector(String,Function) and/or selectorRe(String,Function) methods.

Generating setters and getters using Java::Geci

In the article , we created very simple hello-world generators to introduce the framework and how to generate generators generally. In this article, we will look at the accessor generator, which is defined in the core module of Java::Geci and which is a commercial grade and not a demo-only generator. Even though the generator is commercial grade, using the services of the framework it has simple code so that it can be represented in an article.

What does an accessor generator

Accessors are setters and getters. When a class has many fields and we want to help encapsulation we declare these fields to be private and create setters and getters, a pair for each field that can set the value for the field (the setter) and can get the value of the field (the getter). Note that contrary to what many juniors think creating setters and getters is not encapsulation by itself, but it may be a tool to do proper encapsulation. And the same time note that it also may NOT be a tool for proper encapsulation. You can read more about it in “Joshua Bloch: Effective Java 3rd Edition” Item 16.

Read it with a bit of caution though. The book says that it was updated for Java 9. That version of Java contains the module system. The chapter Item 16 does not mention it and even this edition still says to use private members with setters and getters for public classes, which in case of Java 9 may also mean classes in packages that the module does not export.

Many developers argue that setters and getters are inherently evil and a sign of bad design. Don’t make a mistake! They do not advocate to use the raw fields directly. That would even be worse. They argue that you should program with a more object-oriented mindset. In my opinion, they are right and still in my professional practice I have to use a lot of classes maintaining legacy applications using legacy frameworks containing setters, getters, which are needed by the programming tools around the application. Theory is one thing and real life is another. Different integrated development environments and many other tools like generate setters and getters for us unless we forget to execute them when a new field was added.

A setter is a method that has an argument of the same type as the field and returns void. (A.k.a. does not return any value.) The name of the setter, by convention, is set and the name of the field with the first letter capitalized. For the field businessOwner the setter is usually setBusinessOwner. The setter sets the value of the field to that of the argument of the setter.

The getter is also a method which does not have any argument but returns the argument value and hence it has the same return type as the type of the field. The name of the getter, by convention, is get and again the name of the field capitalized. That way the getter will be getBusinessOwner.

In case of boolean or Boolean type fiels the getter may have the is prefix, so isBusinessOwner could also be a valid name in case the field is some boolean type.

An accessor generates setter and getter for all the fields it has to.

How to generate accessors

The accessor generator has to generate code for some of the fields of the class. This generator is the ideal candidate for a filtered field generator in Java::Geci. A filtered field generator extends the AbstractFilteredFieldsGenerator class and its process() method is invoked once for each filtered field. The method also gets the Field as a third parameter in addition to the usual Source and CompoundParams parameter that we already saw in the article a few weeks ago.

The class AbstractFilteredFieldsGenerator uses the configuration parameter filter to filter the fields. That way the selection of which field to take into account is the same for each generator that extends this class and the generators should not care about field filtering: it is done for them.

The major part of the code of the generator is the following:

public class Accessor extends AbstractFilteredFieldsGenerator {

    ...

    @Override
    public void process(Source source, Class<?> klass, 
                        CompoundParams params, 
                        Field field) throws Exception {
        final var id = params.get("id");
        source.init(id);
        var isFinal = Modifier.isFinal(field.getModifiers());
        var name = field.getName();
        var fieldType = GeciReflectionTools.typeAsString(field);
        var access = check(params.get("access", "public"));
        var ucName = cap(name);
        var setter = params.get("setter", "set" + ucName);
        var getter = params.get("getter", "get" + ucName);
        var only = params.get("only");
        try (var segment = source.safeOpen(id)) {
            if (!isFinal && !"getter".equals(only)) {
                writeSetter(name, setter, fieldType, access, segment);
            }
            if (!"setter".equals(only)) {
                writeGetter(name, getter, fieldType, access, segment);
            }
        }
    }
}

The code at the place of the ellipsis contains some more methods, which we will look at later. The first call is to get the parameter id. This is a special parameter and in case it is not defined then default params.get("id") returns is the mnemonic of the generator. This is the only parameter that has such a global default value.

The call to source.init(id) ensures that the segment will be treated as “touched” even if the generator does not write anything to that segment. It may happen in some cases and when writing a generator it never hurts calling source.init(id) for any segment that the generator intends to write into.

The code looks at the actual field to check if the field is final. If the field is final then it has to get the value by the time the object is created and after that, no setter can modify it. In this case, only a getter will be created for the field.

The next thing the setter/getter generator needs is the name of the field and also the string representation of the type of the field. The static utility method GeciReflectionTools.typeAsString() is a convenience tool in the framework that provides just that.

The optional configuration parameter access will get into the variable of the same name and it will be used in case the access modifier of the setter and the getter needs to be different from public. The default is public and this is defined as the second argument to the method params.get(). The method check() is part of the generator. It checks that the modifier is correct and prevents in most cases generation of syntax errored code (e.g.: creating setters and getter with access modifier pritected). We will look at that method in a while.

The next thing is the name of the getter and the setter. By default is set/get+ capitalized name of the field, but it can also be defined by the configuration parameter setter and getter. That way you can have isBusinessOwner if that is an absolute need.

The last configuration parameter is the key only. If the code specifies only='setter' or only='getter' then only the setter or only the getter will be generated.

The segment the generator wants to write into is opened in the head of the try-with-resources block and then calls local writeSetter and writeGetter methods. There are two different methods to open a segment from a source object. One is calling open(id), the other one if safeOpen(id). The first method will try to open the segment and if the segment with the name is not defined in the class source file then the method will return null. The generator can check the nullity and it has the possibility to use a different segment name if it is programmed so. On the other hand safeOpen() throws a GeciException if the segment cannot be opened. This is the safer version to avoid later null pointer exceptions in the generator. Not nice.

Note that the setter is only written if the field is not final and if the only configuration key was NOT configured to be getter (only).

Let’s have a look at these two methods. After all, these are the real core methods of the generators that do actually generate code.

    private static void writeGetter(String name, String getterName,
                                    String type, String access, Segment segment) {
        segment.write_r(access + " " + type + " " + getterName + "(){")
                .write("return " + name + ";")
                .write_l("}")
                .newline();
    }

    private static void writeSetter(String name, String setterName,
                                    String type, String access, Segment segment) {
        segment.write_r(access + " void " + setterName + "(" +
                type + " " + name + "){")
                .write("this." + name + " = " + name + ";")
                .write_l("}")
                .newline();
    }

The methods get the name of the field, the name of the accessor, the type of the field as a string, the access modifier string and the Segment the code has to be written into. The code generators do not write directly into the source files. The segment object provided by the framework is used to send the generated code and the framework inserts the written lines into the source code if that is needed.

The write(), write_l() and write_r() methods of the segment can be used to write code. They work very much like String.format if there are more than one parameters, but they also care about the proper tabulating. When the code invokes write_r() then the segment will remember that the lines following it have to be tabulated four spaces to the right more. When the code calls write_l() then the segment knows that the tabulation has to be decreased by four characters (even for the actual written line). They also handle multi-line strings so that they all will be properly tabulated.

Generated code should also be readable.

The final non-trivial method is the access modifier check.

    private static final Set<String> accessModifiers =
            Set.of("public", "private", "protected", "package");
...

    private String check(final String access) {
        if (!access.endsWith("!") && !accessModifiers.contains(access)) {
            throw new GeciException("'"+access+"' is not a valid access modifier");
        }
        final String modifiedAccess;
        if( access.endsWith("!")){
            modifiedAccess = access.substring(0,access.length()-1);
        }else {
            modifiedAccess = access;
        }
        if( modifiedAccess.equals("package")){
            return "";
        }
        return modifiedAccess;
    }

The purpose of this check is to protect the programmer from mistyping the access modifier. It checks that the access modifier is either private (I do not see a real use case for this one though), protected, public or package. The last one is converted to an empty string, as the package protected access is the default for class methods. The same time using the empty string in the configuration to denote package private access is not really readable.

That way if the code is configured pritected including a typo the code generator will throw an exception and refuses to generate code that is known to contain syntax error. On the other hand, the access modifier can also be more complex. In some rare cases, the program may need synchronized getters and setters. We do not try to figure out automatically anything like that checking if the field is volatile or such, because these are border cases. However, the generator provides a possibility to overcome the limited syntax checking and that way just to provide any string as access modifier. If the access modifier string ends with an exclamation mark then it means the programmer using the generator takes full responsibility for the correctness of the access modifier and the generator will use it as it is (without the exclamation mark of course).

What is left are the methods mnemonic and cap:

    private static String cap(String s) {
        return s.substring(0, 1).toUpperCase() + s.substring(1);
    }

    @Override
    public String mnemonic() {
        return "accessor";
    }

The method mnemonic() is used by the framework to identify the sources that need the service of this generator and also to use it as a default value for the configuration parameter id. All generators should provide this. The other one is cap that capitalizes a string. I will not explain how it works.

Sample use

@Geci("accessor filter='private | protected'")
public class Contained1 {

    public void callMe() {

    }

    private final String apple = "";
    @Geci("accessors only='setter'")
    private int birnen;

    int packge;

    @Geci("accessor access='package' getter='isTrue'")
    protected boolean truth;
    @Geci("accessor filter='false'")
    protected int not_this;

    public Map<String,Set<Map<Integer,Boolean>>> doNothingReally(int a, Map b, Set<Set> set){
        return null;
    }

    //<editor-fold id="accessor" desc="setters">

    //</editor-fold>

}

The class is annotated with the Geci annotation. The parameters is accessor filter='private | protected' that defines the name of the generator to be used on this source file and configures the filter. It says that we need setters and getters for the fields that are private and protected. The logical expression should be read: “filter the field is it is private or protected”.

Some of the fields are also annotated. birnen will get only a setter, truth setter and getter will be package protected and the getter will be named isTrue(). The field not_this will not get a setter or getter because the filter expression is overridden in the field annotation and it says: false that will never be true, which is needed to be processed by the generator.

The field apple is not annotated and will be processed according to the class level configuration. It is private therefore it will get accessor and because it is final it will get only a getter.

The code between the

    //<editor-fold id="accessor" desc="setters">

    //</editor-fold>

will contain the generated code. (You have to run the code to see it, I did not copy it here.)

Summary

In this article, we looked at a generator, which is a real life, commercial grade generator in the Java::Geci framework. Walking through the code we discussed how the code works, but also some other, more general aspects of writing code generators. The next step is to start a project using Java::Geci as a test dependency, use the accessor generator instead of the IDE code generator (which lets you forget to re-execute the setter getter generation) and later, perhaps you can create your own generators for even more complex tasks than just setters and getters.

Box old objects to be autoclosable

Since Java 7 we can use try-with-resources and have any object automatically closed that implements the Autocloseable interface. If the resource is Autocloseable. Some of the classes need some wrap-up but are not Autocloseable. These are mainly old classes in some legacy framework that still get in our way to make us trip up. Nobody is using Struts any more, but still, there are enough old frameworks that are there lurking in the dark and with which we have to live. I recently had that experience and I was so motivated that I created a simple AutoCloser class.

We may have a legacy class (in the example this is a mocking inner class of the testing class)

    public class NotAutoclosable {
        public NotAutoclosable() {
            opened = true;
        }

        public void dispose() {
            opened = false;
        }
    }

which is not auto-closeable as the name also implies. It does not implement the Autocloseable interface and it does not have a close() method. It has to be disposed calling the aptly named method dispose(). (The boolean field opened is used to check later in the unit test to assert the correct functioning of the AutoCloser class.)

The use of the class looks as follows:

    @Test
    void test() {
        final NotAutoclosable notAu;
        try (final var s = AutoCloser.useResource(new NotAutoclosable())
                .closeWith(sp -> sp.get().dispose())) {
            Assertions.assertTrue(opened);
        }
        Assertions.assertFalse(opened);
    }

We create the resource using the constructor of the inner class and we also define a Consumer that will “close” the resource. This consumer will get the same Supplier that is stored in the variable s.

Side note: this functional argument has to be a consumer and cannot be a Runnable using the variable s because that variable is not initialized when the lambda expression is evaluated as a lambda expression. When it is going to be used it will already be defined but that is too late for the Java compiler, it does not trust the programmer that much and usually, it does it with good reason.

The AutoCloser class is the following:

public class AutoCloser<T> {

    private final T resource;

    private AutoCloser(T resource) {
        this.resource = resource;
    }

    public static <T> AutoCloser<T> useResource(T resource) {
        return new AutoCloser<>(resource);
    }

    public AutoClosableSupplier closeWith(Consumer<Supplier<T>> closer){
        return new AutoClosableSupplier(closer);
    }

    public class AutoClosableSupplier implements Supplier<T>, AutoCloseable {
        private final Consumer<Supplier<T>> closer;

        private AutoClosableSupplier(Consumer<Supplier<T>> closer) {
            this.closer = closer;
        }

        @Override
        public T get() {
            return resource;
        }

        @Override
        public void close() {
            closer.accept(this);
        }

    }
}

The inner AutoClosableSupplier class is used because we do not want the programmer accidentally forget to specify the lambda that will finally close the resource.

This is nothing really serious. It is just a programming style that moves the closing of the resource close to the opening of the resource a bit like the deferred statement in the Go language.