Monthly Archives: September 2013

Java memory leak

Java has garbage collector and therefore there is no such things as memory leak.

WRONG

This is wrong on many different levels. Although this is true that there is a garbage collector (GC) that collects the memory chunks that are not used anymore this is still not the stone of the philosophers. GC offloads a huge amount of error prone job from the programmers shoulder, but does not solve every problem related to memory allocation. To make the things a bit worse there are constructs in the Java environment that may “trick” the GC to keep some allocated memory as allocated even though our program is not using it any more. After 20 years of programming of C and 7 years of Java (some overlapping) I can state that Java is far better in this aspect than C or C++. Still there is some room for improvement. Until those improvements become reality programmers better know the nuts and bolts of memory handling and the usual pitfalls not to fall into the traps. But first thing first.

What is a memory leak?

Memory leak is the repetitive allocation of memory without consequential release of it when no longer used, leading to the consumption of ever increasing memory limited by external measures not controlled by the program possibly rendering the execution to a degraded state.

In good old C programming time we talked about memory leak when the program was loosing reference to an allocated memory segment and did not release it. In such a situation the program has no means to get a grab to any handle or pointer to that memory segment to call the run-time function free and as such the memory segment remains allocated, it can not be reused by the program and this way it is totally wasted. The memory is reclaimed by the OS when the process exits, though.

This is a very typical memory leak, but the definition I gave above is wider than that. It may happen that the code still has a pointer to the allocated memory but it does not release the memory and at the same time it does not uses it anymore. A programmer may build up a linked list hooking up all memory segments allocated calling malloc still never calling free has the same result. Since the result is the same, it is not really interesting if there is a possibility to get access to the memory pointer which is needed to release it or not if we do not release it anyway. It only affects the way to fix the bug, but in either case bugfixing needs code modification.

If we look at Java and the GC you can see that this is nearly impossible to produce the classical memory leak where the program looses all references to the allocated memory and thus looses the possibility to release the memory. In that case the GC recognizes the loose of all references to the allocated memory and does the release process. As a matter of fact, that is the standard way to release the memory in Java: just loose all references to an object and GC will collect it. There are no garbage cans, no selective bins. Just throw it away and they will collect it. This is the very reason why many programmers believe that there is no memory leak when programming in Java. From the practical point of view this is close to correct: there are much less hassle hunting memory leaks when programming in Java than it is when programming C, C++ or any other language that does not have a garbage collector.

This is the point where we reach to the question: how can memory leak happen in Java?

Thread and ThreadLocal storage is a very good candidate for memory leak. You can get a memory leaking applications in five easy steps: (List was composed by Daniel Pryden in a stackoverflow post.)

  1. The application creates a long-running thread (or use a thread pool to leak even faster).
  2. The thread loads a class via an (optionally custom) ClassLoader.
  3. The class allocates a large chunk of memory (e.g. new byte[1000000]), stores a strong reference to it in a static field, and then stores a reference to itself in a ThreadLocal. Allocating the extra memory is optional (leaking the Class instance is enough), but it will make the leak work that much faster.
  4. The thread clears all references to the custom class or the ClassLoader it was loaded from.
  5. Repeat.

Since you have no reference to the class and the loader of it you can not get access to the thread local storage and thus you can not get access to the allocated memory (unless you are desperate enough to use reflection). Still the thread local storage has reference and does not allow GC to collect the memory. Thread local storage is not weak. (Btw: why isn’t is weak?)

If you have never experienced anything like that you may think that this is an extremely artificial scenario composed by an evil brain. The truth is that the pattern was created by nature (well, programmers, but not with the intention to create memory leak) and was distilled to the above simple form debugging applications running in Tomcat. Those are very common in the Java word. Redeploying applications without restarting the Tomcat instance many times caused slow degradation of memory because of exactly the above pattern and there are not too much Tomcat can do against it. The applications should be careful using thread local.

You should also be careful when storing large data referenced by static variables. Better avoid static variables if ever you can and better rely on containers you program runs in. They are more flexible than the Java class loader hierarchy. If you store large amount of data in a Map or Set why not to use the weak version of the map or set? If you do not have the key, will you even need the value attached to it?

And now the hash maps and sets. If you use objects as keys that does not implement, or implement the methods equals() and hashCode wrong then calling put() will throw your data into a sink hole. You will never be able to recover it from the hash set/map and what is worse you will get duplicates (or better multiplicates) just as many times you put an object into the structure. You just throw your memory into a sinkhole.

There are numerous examples of possible memory leaks in Java. Even though they are magnitude less frequent than they are in C, or C++. Usually it is better to have a GC than not having.

Advertisement

Something you did not know about the ternary operator

Recently I had a little surprise. It started me compiling some Java class and it did not work the way I expected. This is not the surprise itself: this is just the way I live. Develop, debug, release cycles. So the development cycle started. I was looking to find a bug in my code and as I tightened the bug boundary it came down to a ternary operator. The following sample demonstrates the situation:

public class Bug{
    public static Number q(Number in) {
        return in instanceof Long ? 1L : 1.0;
    }
}

It was supposed to return a Long or a Double value of one, depending on the type of the argument. However the method returned a 1.0 Double.

I looked at the code generated by javac using the javap disassembler and I saw that the case is really the one I explained above:

public static java.lang.Number q(java.lang.Number);
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0       
         1: instanceof    #2                  // class java/lang/Long
         4: ifeq          11
         7: dconst_1      
         8: goto          12
        11: dconst_1      
        12: invokestatic  #3                  // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
        15: areturn       

No matter what type the argument has, the code loads a constant Double on the operational stack executing the bytecode (dconst_1). This code is not good. But it does not mean that there is a bug in the javac compiler. This is not a bug, this is a feature.

The next thing I consulted was the Java language specification. You can read it from the URL http://docs.oracle.com/javase/specs/jls/se7/html/index.html and it talks about the ternary operator:

Otherwise, if the second and third operands have types that are convertible (§5.1.8) to numeric types, then there are several cases:

Otherwise, binary numeric promotion (§5.6.2) is applied to the operand types, and the type of the conditional expression is the promoted type of the second and third operands.

In our case numeric promotion happens according to §5.6.2 :

If either operand is of type double, the other is converted to double.

It explains why this code above is useless. in situations like that use the good old ‘if’ ‘then’ ‘else’ construct.

Update

Recently Lukas Eder posted a similar article on his JOOQ blog worth following. This article is brief and highlights a different effect of the same issue.

Don’t write boilerplate, use scriapt

A good programmer never writes boilerplate code. Instead he/she recognizes the repetitive patterns after writing some, or even before it gets written, and creates some abstract code that serves the purpose, creates a new class, a new method and instead of copy paste calls the method and/or uses instances of the new class.

In modern languages this is possible to a wide extent. When I started Java I was missing the good old C preprocessor a lot. But this craving passed away, I learned the language and I know much better how to use it proficiently without repeating myself. I do not usually write boilerplate code because I can avoid that and I deliberately want to avoid that because writing boilerplate code is boring as such it is source of error. Since I have not been writing boilerplate code for a long time I got recently annoyed when I had to create a huge Java enum, of a few hundred codes that were reflecting the business domain record names.

These record names appeared in XML files traveling in JMS and in JSON format traveling over https and had to appear many times at different locations in out Java code. When we started the project we realized that using the strings as key in Maps is error prone: a changing one 1etter in the name of a key can cause bugs hard to find. If we maintain a huge enum that has a symbolic name for each of the keys and the key associated to it, then any typo is identified by the Eclipse IDE signaling the syntax error. (Did you recognize that in the previous sentence I wrote ‘1etter’ instead of ‘letter’? Many do not, and this is the problem.)

Maintaining the enum is a bit tedious. It looks something like this:

package com.javax0.scriapt.sample;

public enum DomainEnum {
	FIX_4_2("FIX.4.2"), A9("9"), A35("35"), A49_PHLX("49=PHLX"), A56_PERS(
			"56=PERS"), A20071123_05_30_00_000("20071123-05:30:00.000"), ATOMNOCCC9990900(
			"ATOMNOCCC9990900"), PHLX_EQUITY_TESTING("PHLX EQUITY TESTING"), DEUT(
			"DEUT"), DE("DE"), FF("FF"), DK("DK"), KK("KK"), ;
	final String name;

	DomainEnum(final String s) {
		name = s;
	}
}

Even though this is not the real example you can see that the keys can not be used as identifiers, not too meaningful unless you know the business domain well (which may not be the case this time because this is a made up example). The allocation of the identifiers for each key is simple and algorithmic:

  1. If the key starts with a non-alpha character prepend it with the letter ‘A’.
  2. Replace any non alphanumeric character in the key with ‘_’ underscore.

A junior can be assigned to the task to maintain this file, but even then this is unreadable and tedious and for these reasons: error prone.

There was also many boilerplate code written defining the classes for the business domain, mainly different type of field (usually Strings) with setters and getters and converters that convert the XML and the JSON files to domain model objects and the other way around (marshaling).

To omit the setters and the getters you can use groovy that does a good job in things like these, and there are also various solutions to solve the marshaling problem especially when this is such a wide spread and ubiquitous format as JSON and XML. Unfortunately groovy is out of question when the programming language of the project is Java and full stop. Management decided and that is it.

General marshaling is based on reflection and run time structure parsing of the JSON/XML as well as the Java structure. The JSON and the XML can not be parsed before but the Java structure is there during compile time. If something can be done during compile time then it should not be done using tools that operate during run time without good reason. I do not argue against the current marshaling frameworks: having a mature framework, that just does the job and there is no performance bottleneck in the system can be a good reason.

However in our case, as an experiment we decided to eliminate the boilerplate code writing a JavaScript that generates the enumeration. Code to generate the domain objects and the various marshalers can be written in a similar way. The script can be executed manually, and whenever there is some modification in the script containing the list of the fields and also the Java generating code it has to be executed. This is a manual process, even though much less effort than maintaining the source files manually. However I wanted to eliminate this manual process as well totally.

The first thing that came to my mind was to write a maven plugin, but the second thought was to create an annotation processor instead. If an annotation processor is used to execute the script, then we are totally independent of the build tool. It can be maven, ant, graddle, buildr or even make. All we need is some annotation on some class that triggers the execution of the annotation processor.

I created an annotation processor before, the fluflu fluent API generator and thus had some experience. This time the annotation processor was even simpler. It is so simple that it should not take more than ten minutes for an experienced Java programmer to understand. Here it is.

If a class is annotated by the annotation CompileScript it will trigger the execution of the script. The script can be JavaScript or any other script that can be invoked using the JSR223 standard. It can be python, groovy or even my little child ScriptBasic for Java. The only requirement is that the interpreter should be available on the classpath during compile time. In case of JavaScript this is guaranteed when you use Java version 6 or later, in the form of the Rhino interpreter.

The actual trigger class as you can see in the example repo on github scriapt samples is simple as this:

package com.javax0.scriapt.sample;

import com.javax0.scriapt.CompileScript;

@CompileScript(value="compilescripts/generateEnums.js")
public class EnumGeneratorTriggerClass {
}

It does nothing, it is only to trigger the script execution during the compilation phase.

And then comes the big question: what else can we use it for? We have a general purpose weapon to generate java code freely during compilation phase. The possibilities are endless.

What would You use it for?

JavaDoc the Elephant

Clean code says that you should not write inline comments, and you also should not write java doc for things that are trivial. I, myself also get mad seeing comments like

    /**
     * Calculates a/b
     * @param a the nominator
     * @param b the denominator
     * @return the calculated value
     */
    double div(final double a, final double b) {
...

This is absolutely pointless. However on the other end there are examples when non-commenting gets to the extreme. The following is an imagined situation.


  • Joe: We need to transport branches. Did you prepare the library for the purpose?
  • Aliz: Yes, as agreed. Here you have the code, the compiled version is in the repository. Have a good day.
  • Joe: Wait! Wait! Where is the documentation?
  • Aliz: Documentation? You have the code. This is self explanatory. We follow clean code. There is no reason to repeat in English that is best described by the code.
  • Joe: Hm… Ok. We will see.
  • Aliz: You are experienced programmers, it should not be a problem to understand our code.

Joe then starts to read the code and this is what he sees:

This is an elephant.

“Great! Aliz delivered us an elephant to transport branches. We need a tamer, harness. Not a big cost to produce, still a bit inconvenient. But on the other hand we can also use this as a shower. When the elephant is in good mood it is playful and sprinkles water on people, which is fun and can also serve as a kind of cleaning process in certain situations, so we can transport branches cleaned.”

So Joe, and his troop start to use the transport appliance, and at the same time complains that Aliz and her team deliver the harness and the tamer in the next release.

  • Joe: So you delivered the next release of the library to transport branches.
  • Aliz: Yes, and we listened to your complaints and you do not need any tamer anymore neither have you the burden to fix a harness on the appliance.
  • Joe: And how can we use the new version?
  • Aliz: Just as the previous one. Just put on the branches and have them transported. There may be some small differences in the usage, but you, as senior programmer will easily solve that. Read the code!

Joe then starts to read the code and this is what he sees:

This is a van.

“We do not need a tamer, but we need a driver, which is certainly more common these days and is a simpler task to provide. Although this has to be a truck driver and not a system driver, it is still manageable. But how can we use this thing to clean the branches?”

Since Joe and his team already have the tamer, and the harness, and since they already have a well tamed elephant, they feel reluctant to use the new version. Also the sprinkler is fun, though that is not the main purpose but still: it makes them like the version 1.0.0 of the transport appliance as opposed to the version 1.1.0

Time passes and after a few month Aliz sees that the team of Joe still reports bugs (no matter how often the elephant is cleaned) of the version 1.0.0 and had not started to use the second version. What is the problem?


The problem is that there was no documentation clearly defining what is delivered as guaranteed function and what is delivered by “accident”, internal methods and packages that in Java world can not be totally hidden. If the elephant were declared as a transport appliance and the method to sprinkle water were documented clearly as an internal functionality then the team of Joe would not have used that. Every responsible developer knows that non-supported methods, classes, packages should not be used from outside of the library or else there are consequences: typically version lock in.

For example ORACLE (and formerly SUN) engineers clearly documented that the packages sun.* are for internal use and no real developer dared to use those classes in production code. Or am I wrong?