Monthly Archives: March 2014

Java 8 default methods: what can and can not do?

What default method is

With the release of Java 8 you can modify interfaces adding new methods so that the interface remains compatible with the classes that implement the interface. This is very important in case you develop a library that is going to be used by several programmers from Kiev to New York. Until the dawn of Java 8 if you published an interface in a library you could not add a new method without risking that some application implementing the interface will break with the new version of the interface.

With Java 8 is this fear gone? No.

Adding a default method to an interface may render some class unusable.

Let’s see first the fine points of the default method.

In Java 8 a method can be implemented in an interface. (Static methods can also be implemented in an interface as of Java8, but that is another story.) The method implemented in an interface is called default method and is denoted by the keyword default as a modifier. When a class implements an interface it may, but does not need to implement a method implemented already in the interface. The class inherits the default implementation. This is why you may not need touch a class when an interface it implements changes.

Multiple inheritance?

The things start to get complicated when a concrete class implements more than one (say two) interfaces and the interfaces implement the same default method. Which default method will the class inherit? The answer is none. In such a case the class has to implement the method itself (directly or by inheritance from a higher class).

This is also true when only one of the interfaces implement the default method and the other one only declares it as abstract. Java 8 tries to be disciplined and avoid “implicit” things. If the methods are declared in more than one interfaces then no default implementation is inherited, you get a compile time error.

However you can not get a compile time error if you have your class already compiled. This way Java 8 is not consistent. It has its reason, which I do not want to detail here or get into debate for various reasons (e.g.: the release is out, debate time is long over and was never on this platform).

  • Say you have two interfaces, and a class implementing the two interfaces.
  • One of the interfaces implement a default method m().
  • You compile all the interfaces and the class.
  • You change the interface not containing the method m() to declare it as an abstract method.
  • Compile the modified interface only.
  • Run the class.

default method multiple inheritance
In this case the class runs. You can not compile it again with the modified interfaces, but if it was compiled with the older version: it still runs. Now

  • modify the interface having the abstract method m() and create a default implementation.
  • Compile the modified interface.
  • Run the class: failure.

When there are two interfaces providing default implementation for the same method the method can not be invoked in the implementing class unless implemented by the class (again: either directly or inherited from another class).
invalid multiple inharitance of default methods
The class is compatible. It can be loaded with the new interface. It can even start execution so long as long there is no invocation to the method having default implementation in both interfaces.

Sample code

directory structure of test

To demonstrate the above I created a test directory for the class C.java and three subdirectories for the interfaces in files I1.java and I2.java. The root directory of the test contains the source code for the class C in file C.java. The directory base contains the interface version that is good for execution and compilation. I1 contains the method m() with default implementation. The interface I2 does not contain any method for now.

The class contains a main method so we can execute it in our test. It tests if there is any command line argument so we can easily execute it with and without invoking the method m().

~/github/test$ cat C.java 
public class C implements I1, I2 {
  public static void main(String[] args) {
    C c = new C();
    if( args.length == 0 ){
      c.m();
    }
  }
}
~/github/test$ cat base/I1.java 
public interface I1 {
  default void m(){
    System.out.println("hello interface 1");
  }	
}
~/github/test$ cat base/I2.java 
public interface I2 {
}

We can compile and run the class using the command lines:

~/github/test$ javac -cp .:base C.java
~/github/test$ java -cp .:base C
hello interface 1

The directory compatible contains a version of the interface I2 that declares the method m() abstract, and for technical reasons it contains I1.java unaltered.

~/github/test$ cat compatible/I2.java 

public interface I2 {
  void m();
}

This can not be used to compile the class C:

~/github/test$ javac -cp .:compatible C.java 
C.java:1: error: C is not abstract and does not override abstract method m() in I2
public class C implements I1, I2 {
       ^
1 error

The error message is very precise. Even though we have the C.class from the previous compilation and if we compile the interfaces in the directory compatible we will have two interfaces that can still be used to run the class:

~/github/test$ javac compatible/I*.java
~/github/test$ java -cp .:compatible C
hello interface 1

The third directory, wrong contains a version of I2 that also defines the method m():

~/github/test$ cat wrong/I2.java 
public interface I2 {
  default void m(){
    System.out.println("hello interface 2");
  }
}

We should not even bother to compile it. Even though the method is double defined the class still can be executed so long as long it does not invoke the method, but it fails as soon as we try to invoke the method m(). This is what we use the command line argument for:

~/github/test$ javac wrong/*.java
~/github/test$ java -cp .:wrong C
Exception in thread "main" java.lang.IncompatibleClassChangeError: Conflicting default methods: I1.m I2.m
	at C.m(C.java)
	at C.main(C.java:5)
~/github/test$ java -cp .:wrong C x
~/github/test$

Conclusion

When you start to move your library to Java 8 and you modify your interfaces adding default implementations, you probably will not have problems. At least that is what Java 8 library developers hope adding functional methods to collections. Applications using your library are still relying on Java 7 libraries that do not have default methods. When different libraries are used and modified, there is a slight chance of conflict. What to do to avoid it?

Design your library APIs as before. Do not go easy relying on the possibility of default methods. They are last resort. Choose names wisely to avoid collision with other interfaces. We will learn how Java programming will develop using this feature.

Advertisements

Documenting API using Concordion

Concordion is an open source tool for writing automated acceptance tests in Java.” It is a handy little tool, simple to use and even the source code of the tool is good style. You describe the tests using HTML with special markups and when you run your special unit tests using the ConcordionRunner it processes the HTML and replaces the special tags with the actual values fetched from the tests. In the end you get an HTML colored with red and green spots where some tests failed or succeeded respectively. This is a result good for the eyes, BAs and easy to spot any error. This way it is similar to fitnesse and GreenPepper.

Even though the tool was designed for automated acceptance test — which one could argue is a non-sense term — I wanted to use it to document API.

The usual way to document API is JavaDoc. JavaDoc includes the signature of the methods, and comments. The comments are supposed to depict the way the method should be used. This is fairly good approach but has some shortages:

  • The comments become outdated. Developers change the way the method is to be used, but forget to update the JavaDoc.
  • A method many times should be used in different ways together with other objects and methods.

For this reason many developers believe that unit tests are the real documentation of an API. Two approaches with good and bad aspects. How could we leverage the best of the both world?

I decided to write a small library that can be included into Concordion fixtures using delegation and which can read the source code of the fixture or just any other source code, cut off some lines from the code and return them as string. Referencing the method Concordion output can include (presumably preformatted) Java code. This way the resulting HTML will contain living documentation including actual code without manual copy pasting that decreases the danger of the documentation getting outdated.

The project is available from GitHub and also from Sonatype repo. The documentation of the project was also created this way.

Object Interning

Java stores the string constants appearing in the source code in a pool. In other words when you have a code like

String a = "I am a string";
String b = "I am a string";

the variables a and b will hold the same value. Not simply two strings that are equal but rather the very same string. In Java words a == b will be true. However this works only for Strings and small integer and long values. Other objects are not interned thus if you create two objects that hold exactly the same values they are usually not the same. They may and probably be equal but not the same objects. This may be a nuisance some time. Probably when you fetch some object from some persistence store. If you happen to fetch the same object more than one time you probably would like to get the same object instead of two copies. In other words I may also say that you only want to have one single copy in memory of a single object in the persistence. Some persistence layers do this for you. For example JPA implementations follow this pattern. In other cases you may need to perform caching yourself.

In this example I will describe a simple intern pool implementation that can also be viewed on the stackoverflow topics. In this article I also explain the details and the considerations that led to the solution depicted there (and here as well). This article contains more detailed tutorial information than the original discussion.

Object pool

Interning needs an object pool. When you have an object and you want to intern that object you essentially look in the object pool to see if there is already an object equal to the one in hand. In case there is one we will use the one already there. If there is no object equal to the actual one then we put the actual object into the pool and then use this one.

There are two major issues we have to face during implementation:

  • Garbage Collection
  • Multi-thread environment

When an object is not needed anymore it has to be removed from the pool. The removal can be done by the application but that would be a totally outdated and old approach. One of the main advantage of Java over C++ is the garbage collection. We can let GC collect these objects. To do that we should not have strong references in the object pool to the pooled objects.

Reference

If you know what soft, weak and phantom references, just jump to the next section.

You may noticed that I did not simply say “references” but I said “strong references”. If you have learned that GC collects objects when there are no references to the object then it was not absolutely correct. The fact is that it is a strong reference that is needed for the GC to treat an object untouchable. To be even more precise the strong reference should be reachable travelling along other strong references from local variables, static fields and similar ubiquitous locations. In other word: the (strong) references that point point from one dead object to another does not count, they together will be removed and collected.

So if these are strong references, then presumably there are not so strong references you may think. You are right. There is a class named java.lang.ref.Reference and there are three other classes that extend it. The classes are

  1. PhantomReference
  2. WeakReference and
  3. SoftReference

in the same package. If you read the documentation you may suspect that what we need is the weak one. Phantom is out of question for use to use in the pool, because phantom references can not be used to get access to the object. Soft reference is an overkill. If there are no strong references to the object then there is no point to keep it in the pool. If it comes again from some source, we will intern it again. It will certainly be a different instance but nobody will notice it since there is no reference to the previous one.

Weak references are the ones that can be use to get access to the object but does not alter the behavior of the GC.

WeakHashMap

Weak reference is not the class we have to use directly. There is a class named WeakHashMap that refers to the key objects using soft references. This is actually what we need. When we intern an object and want to see if it is already in the pool we search all the objects to see if there is any equal to the actual one. A map is just the thing that implements this search capability. Holding the keys in weak references will just let the GC collect the key object when nobody needs it.

We can search so far, which is good. Using a map we also have to get some value. In this case we just want to get the same object, so we have to put the object into the map when it is not there. However putting there the object itself would ruin what we gained keeping only weak references for the same object as a key. We have to create and put a weak reference to the object as a key.

WeakPool

After that explanation here is the code. It just says if there is an object equal to the actual one then get(actualObject) should return it. If there is none, get(actualObject) will return null. The method put(newObject) will put a new object into the pool and if there was any equal to the new one, it will overwrite the place of the old one with the new.

public class WeakPool<T> {
  private final WeakHashMap<T, WeakReference<T>> pool = new WeakHashMap<T, WeakReference<T>>();
  public T get(T object){
      final T res;
      WeakReference<T> ref = pool.get(object);
      if (ref != null) {
          res = ref.get();
      }else{
          res = null;
      }
      return res;
  }
  public void put(T object){
      pool.put(object, new WeakReference<T>(object));
  }
}

InternPool

The final solution to the problem is an intern pool, that is very easy to implement using the already available WeakPool. The InternPool has a weak pool inside, and there is one single synchronized method in it intern(T object).

public class InternPool<T> {
  private final WeakPool<T> pool = new WeakPool<T>();
  public synchronized T intern(T object) {
    T res = pool.get(object);
    if (res == null) {
        pool.put(object);
        res = object;
    }
    return res;
  }
}

The method tries to get the object from the pool and if it is not there then puts it there and then returns it. If there is a matching object already there then it returns the one already in the pool.

Multi-thread

The method has to be synchronized to ensure that the checking and the insertion of the new object is atomic. Without the synchronization it may happen that two threads check two equal instances in the pool, both of them find that there is no matching object in it and then they insert their version into the pool. One of them, the one putting its object later will be the winner overwriting the already there object but the looser also thinks that it owns the genuine single object. Synchronization solves this problem.

Racing with the Garbage Collector

Even though the different threads of the java application using the pool can not get into trouble using the pool at the same time we still should look at it if there is any interference with the garbage collector thread.

It may happen that the reference gets back null when the weak reference get method is called. This happens when the key object is reclaimed by the garbage collector but the weak hash map in the weak poll implementation still did not delete the entry. Even if the weak map implementation checks the existence of the key whenever the map is queried it may happen. The garbage collector can kick in between the call of get() to the weak hash map and to the call of get() to the weak reference returned. The hash map returned a reference to an object that existed by the time it returned but, since the reference is weak it was deleted until the execution of our java application got to the next statement.

In this situation the WeakPool implementation returns null. No problem. InternPool does not suffer from this also.

If you look at the other codes in the before mentioned stackoverflow topics, you can see a code:

public class InternPool<T> {

    private WeakHashMap<T, WeakReference<T>> pool = 
        new WeakHashMap<T, WeakReference<T>>();

    public synchronized T intern(T object) {
        T res = null;
        // (The loop is needed to deal with race
        // conditions where the GC runs while we are
        // accessing the 'pool' map or the 'ref' object.)
        do {
            WeakReference<T> ref = pool.get(object);
            if (ref == null) {
                ref = new WeakReference<T>(object);
                pool.put(object, ref);
                res = object;
            } else {
                res = ref.get();
            }
        } while (res == null);
        return res;
    }
}

In this code the author created an infinite loop to handle this situation. Not too appealing, but it works. It is not likely that the loop will be executed infinite amount of time. Likely not more than twice. The construct is hard to understand, complicated. The morale: single responsibility principle. Focus on simple things, decompose your application to simple components.

Conclusion

Even though Java does interning only for String and some of the objects that primitive types are boxed to it is possible and sometimes desirable to do interning. In that case the interning is not automatic, the application has to explicitly perform it. The two simple classes listed here can be used to do that using copy paste into your code base or you can

        <dependency>
          <groupId>com.javax0</groupId>
          <artifactId>intern</artifactId>
          <version>1.0.0</version>
        </dependency>

import the library as dependency from the maven central plugin. The library is minimal containing only these two classes and is available under the Apache license. The source code for the library is on GitHub.

Poll

After we managed to have a pool, now lets to have a poll! Please answer the following questions, honestly:

Logging or debugging

Debugging is lame. You should debug log.

If your code is structured you do not need debug logging.

These are two opinions from the two ends of the line. I am, as usually, standing in the middle, and I will tell you why.

First of all, there is no principal difference between debugging versus logging. They are just two different implementations of the same thing: observation of your execution engine state in time dimension.

Issue with debugging

When you debug you step your program forward in time and at any point the execution stops you can examine the value of any variable. The shortage is that you can not step back in time. At some points you realize that you would just like to see what the value of a certain variable was just before some method was called, some object was created or whatsoever happened in the system. What you actually do in such a situation is to restart the code and hoping it behaves deterministic try to catch the execution at the earlier stage that you are interested in. And this is another shortage of debugging. You can not effectively debug a code that does not behave deterministic. And trust me: most bugs behave non deterministic.
Debug versus Logging

Issue with logging

With logs the major issue is different. It is not the time but rather the breadth of states, variables that you can look at is the problem. You insert log statements into your code dumping the values of variables into a log file at a certain point of execution. When you examine the log file you can scroll back and forth. However if you did not print out the value of a certain variable at a certain execution point, there is no way to get it from the log file. The solution is the same as with debugging: execute the code again, this time extended with the new log statements. If, however, you have enough information in your log files, then you will just get enough information to track down a bug even if that is not deterministic. Only ‘if you have’ …

Solution: logging all the states all the times?

The ideal solution would be to dump all variables into a possibly binary log file at each state of the execution and examine the content of the file afterwards. The examination would essentially look like a debugger, except that the change of the variables comes from the recorded log file instead of from on the fly calculation. It would be like a playback of a recorded execution and as such you could replay it several times. I do not know if there is any tool like that for the JVM.

You just can not define what is “each state” effectively in a multi thread execution environment like the JVM is. This is one of the issues. The other thing is that if you’d start dumping the JVM memory after each command (forgetting the issues of multi-thread) it would require enormous amount of bandwidth and disk space.

Dreaming about the ideal solution not deliverable is sort of no use. What is the solution that can practically be executed?

Practical approach

You can debug when it is appropriate. Full stop. You just did that so far, keep doing that. I tend to use log statements even when I debug some code and if the environment allows it I do it on the fly. When I find the root cause of the issue I am hunting I review the log statements and I delete them. They did the job while debugging, they are not needed anymore. At least that was my practice unit I found myself writing log statements that I have already created before. Why? Because fixing one bug does not mean that I have fixed all of them. There is nothing like all bugs fixed. But the log items littered the log file and that just increased the work to find the needed information hunting the next bug. In other words the log file is full of noise and that is why I deleted these items the first place. But for the same reason I could also delete the unit tests that already pass. It would save a lot of time during compilation, wouldn’t it? We do not do that.

Summary in one sentence? Log and debug the way it fits you and the issue you are hunting.