Java Deep

Pure Java, what else

Outsourcing, Do It Right

Most of the time outsourcing is a nightmare. Companies outsource the activities that are not their core activity but are nevertheless needed for the business, to get the job done as cheaply as possible. They look at it as a necessary evil, something they would rather not know all about: forget the details and have it done. Many times IT is outsourced with this mindset and it causes disaster. The problem does not lie in the fact that IT itself is outsourced. It is the mindset. IT can be outsourced, and in many cases it is rational to do so. The problem is that management many times does not realize that only the T (technology) can be outsourced, but not the I (information).

When you outsource the Information Technology that runs your business, outsource only the Technology but never the Information!


Note that in the expression "information technology" the word "information" is an adjective of "technology"; therefore when we speak about IT outsourcing we speak about outsourcing the technology only, and not the handling of the information.

If you are a manufacturer, IT is not your core business. If you are an insurance company or a bank, IT is not your core business. If you are doing anything except IT, IT is probably not your core business. You have some system that does the bookkeeping, resource planning, customer relationship management and so on. This is the technology. Your business can run fine even if you do not own the knowledge that runs these systems. Does it help you to be a better manufacturer, bank or whatever you are if you own the information technology? Probably not. It does not give you a business advantage. If your bookkeeping is as good as your competitor's, then that is a good enough foundation to compete on other grounds.

The information, however, is important to own: how you run your business, what makes you a better manufacturer or bank than the others. That you should not outsource. How you produce, what the best business processes in your company are, how you can adapt to market changes: these are core business. If you do not care about that and you outsource these core activities to a software technology company, they most probably will not provide the best-fitting solution for you; costs may increase, profitability may lower, and your general competitiveness may weaken. This task needs business knowledge, and companies can outsource it only to companies that have the knowledge and skills. Some software companies do, but in that case these companies are not only software companies but rather consulting-and-software companies. Sometimes you cannot tell where the border is. You can hire a consulting company, or you can hire experts as employees. The difference is paperwork, sometimes taxation. You need the knowledge and you should get it to serve your business goals.

Having all that said, let’s focus on the technology outsourcing assuming we know where business information ends and technology starts.

The major reason to outsource IT is to save cost. The outsourcing company can do it cheaper. I have seen many times that huge companies on the decline cut off the IT department, formed a new company, sold it and contracted this company for IT outsourcing. The next Monday everybody was sitting at the same desk, doing the same work. How could it be cheaper, other than through some nasty taxation tricks, fewer benefits from the new company to the employees and similar effects? In some cases such IT companies, cut off from the body of a huge corporation, can survive, though I have rarely seen it. They inherit the company culture of the corporation, which I do not say is wrong, but it may just not be the best and most competitive for a small IT company. They may stay alive for many years, and I know one that has been alive for more than a decade and is a success story. Whenever you see such success stories remember that

To have a success story you need success and story. One of them is optional.

Really successful IT companies, as I see it, start small and grow big. They learn, as they go, how to do IT in a professional way. Focusing on the core and working for multiple clients gives the real advantage. At the end of the day the solution can be cheaper: the resources are used more effectively, with better distribution and better skill matching.

If your problem needs 7.5 people, you cannot hire that amount. People are not really effective when chopped in half. You should hire at least eight people and find some occupation for the "half" person. An IT company can find that occupation more effectively. It can also easily allocate 15 people, each half time, if that is more effective. If somebody does not fit the task you need not fire the person; the outsourcing company will find the person that fits the task best, and a task for the person that fits his or her skills. These are the possibilities when outsourcing is done right.

If the people of the IT company work only for you and they never submerge into the culture of the mother company, then you lose the advantage that comes from the knowledge gathered. If the outsourcing company just hires people and sells them out right away, they are simply slave traders. That is not the way. If outsourcing is "Let's get others to do the work that does not mean much to us as cheaply as possible.", then it is wrong. If it does not mean much to "us", we should not do it. We should not outsource it. It should rather be eliminated. If it cannot be eliminated, then it does mean much.

Just as much as it means for the outsourcing company. Your business depends on it.

Why do we mock?

I do Java interviews. During the interviews I ask technical questions to which I know the answer. You may think this is boring. To be honest: sometimes it is. But sometimes it is interesting to learn what misconceptions there are. I happened to ask during an interview what you can read in the title: "Why do we mock?". The answer was interesting. I cannot quote it word for word, and I would not like to anyway, for editorial, ethical and many other reasons. I would also like to stress that a misconception does not qualify the person as clever or stupid or anything. It is just a misconception that comes from someone's personal experience. Here is what he or she was saying:

We use mocks so that we can write tests for units that use heavy services that may not be available when we run the test. It is also important to mock so that the tests can run fast even when the services would make the testing slow. It may also happen in an enterprise environment that some of the services are not available when we develop, and therefore testing would be impossible if we did not use mocks.

Strictly speaking, the above statements are true. I would not argue about why you or anybody else uses mocks. But as a Java professional I would argue about what the primary goal of using mocks is.

We use mocks to separate the modules from each other during testing so that we can tell from the test results which module is faulty and which passed the tests.

This is what unit tests were invented for in the first place. There are side effects, like those mentioned above. There are other side effects as well. For example, unit tests are great documentation. If formulated well, they explain better than any JavaDoc how the unit works and what interfaces it needs and provides. Especially since JavaDoc tends to become outdated, while JUnit tests fail during the build if they get outdated.

Another side effect is that you write testable code if you craft the unit tests first, and this generally improves your coding style.

Saying it simply: unit testing is testing units. Units bare, without their dependencies. And this can be done with mocks. The other things are side effects. We like them, but they are not the main reason to mock when we unit test.
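
To illustrate that primary goal, here is a minimal sketch (the PriceCalculator and RateService names are made up, and Mockito is assumed as the mocking library): the mock isolates the unit under test, so a red test blames exactly one module.

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class PriceCalculatorTest {

	// hypothetical collaborator, not from the article
	public interface RateService {
		double rateFor(String country);
	}

	// hypothetical unit under test
	public static class PriceCalculator {
		private final RateService rates;
		public PriceCalculator(RateService rates) { this.rates = rates; }
		public double gross(double net, String country) {
			return net * (1 + rates.rateFor(country));
		}
	}

	@Test
	public void calculatesGrossPriceWithoutTheRealRateService() {
		// the mock replaces the real service: if this test fails,
		// the fault is in PriceCalculator, not in RateService
		RateService rates = mock(RateService.class);
		when(rates.rateFor("CH")).thenReturn(0.08);
		assertEquals(108.0, new PriceCalculator(rates).gross(100.0, "CH"), 0.0001);
	}
}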

Unit test life?

You cannot program without testing. You write the unit tests first and then you write your code. (Well, I know you don't, but let's focus on best practice.) When there is an error in the code, first you write a new unit test that demonstrates the bug and then you fix it. After the unit test runs fine, the same bug should never ever happen again without being immediately signaled by the unit test.
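
As a sketch of that practice (the bug, the method and the issue number are all made up): first a test that reproduces the reported defect and fails, then the fix; afterwards the test guards against regression.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class RoundingTest {

	// hypothetical method under test; the buggy version was "return (int) d;"
	static int roundToInt(double d) {
		return (int) Math.floor(d + 0.5d);
	}

	@Test
	public void bug4711_halfIsRoundedUp() {
		// this test failed before the fix and will fail again if the bug returns
		assertEquals(3, roundToInt(2.5d));
	}
}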

Later the integration and user tests come. They test the application, and in case there is some error the developer fixes the code. If possible and feasible, we create new unit tests to cover the case that was not discovered earlier, so that the same bug will not slip through to the more expensive integration and user tests. In other cases unit testing is not possible, or would just be too cumbersome and not worth the cost, because the bug is strongly related to integration or user experience.

This is a working practice that was developed for software creation during the last twenty or so years.

Real life is, however, not that simple. There may be no time to create the new unit tests after a bug was discovered during integration or user tests. You fix the bug, test the functionality of the application manually, and omit the new unit test (or non-unit but still automated test), because creating it is too expensive, would take too long, and the project constraints are tight. The bug you fix is serious, high level, a show-stopper, and these are already the last few days of the testing period. So you focus on doing the right thing: getting the job done, fixing the bug.

  • Have you started to fix the bug we reported yesterday?
  • No. We had no time to deal with it. We had to attend to higher level issues.
  • Higher level? What are you talking about? No fix can be more important than this!
  • Then why did you report it as a cosmetic in the first place?
  • Cosmetic????? COSMIC!!!!

Sometimes even drop-down lists do not prevent erroneous user input.

Later, over a calm weekend perhaps, you start to think about the case. How come a serious bug is only discovered at the end of the testing period? Isn't there a bug in the testing process?

Unless the bug was introduced during the recent bug fixes (in which case there may also be some issue with how the developers fix bugs), there is a bug in the testing process. The tests should cover all significant functionality of the application, and the test cases that assert the correct behavior should be ordered by severity. The test cases that are supposed to discover severe issues should be executed sooner, and less important, cosmetic issues should be tested later. If this was not the case: it is a bug.

How do you fix this bug? Move the test case that was executed late to its proper position. It will not fix the bug manifestation that has already happened, the very same way a bug fix does not remedy the money lost through program malfunction. The fix will just prevent more damage. During the next release period (well, yes, think about good old waterfall) the regression testing will discover the same bug sooner.

But why do we not create a unit test for the testing process? Why don't we have unit tests for the processes of the corporation? How could we generalize the idea and practice of unit testing over all aspects of life?

Some Sentences about Java

There is nothing new in this article. I just collected some trivial statements which may not be trivial for some junior programmers. Boring old stuff.

If you happen to know all of these things, you know more about Java than the average housewife. I do not know if there is a point in knowing all of them. You can be a fairly good Java programmer if you do not know some of these features. However, a lot of new information in this article probably indicates you have room to develop.

There are 4 different protection types

in Java (not three). These are private, package private, protected and public. If you do not specify any protection modifier when you define an element in a class, it will be package private (and not public and not protected).
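
A compact illustration (the class and field names are made up):

package a;

public class Visibility {
	private int a;   // visible only inside this compilation unit
	int b;           // package private: visible to classes in package a
	protected int c; // package a, plus subclasses in any package
	public int d;    // visible everywhere
}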


On the other hand, if you do not specify a protection modifier in front of a method declaration in an interface, it will be public. You may explicitly specify it to be public, but it has no effect in Java, and SONAR will not like you doing so.

My opinion about Java allowing you to optionally write public in front of a method in an interface is that this is a technology mistake.

Similarly, you can write final in front of a field in an interface, or even static. This may imply that they could be non-static or non-final: not true. Fields of an interface are final and static. Always.
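
A minimal illustration of these implicit modifiers (the interface name is made up):

package a;

public interface Config {
	int TIMEOUT = 10; // implicitly public static final

	void configure(); // implicitly public, no modifier needed
}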

Protected and package private are not the same

Package private (or default) protection lets other classes of the same package access the method or field. Protected methods and fields can be used from classes in the same package (so far the same as package private) and, in addition, from other classes that extend the class containing the protected field or method.

Protected is transitive

If there are three packages a, b and c, each containing a class named A, B and C respectively, so that B extends A and C extends B, then the class C can access the protected fields and methods of A.

package a;

public class A {
	protected void a() {

	}
}
package b;

import a.A;

public class B extends A {
	protected void b() {
		a();
	}
}
package c;

import b.B;

public class C extends B {
	protected void c() {
		a(); // a() is protected in A: accessible because C extends B extends A
	}
}

Interface can not define protected methods

Many think that you can also define protected methods in an interface. When programming, the compiler makes it obvious quickly and brutally: you cannot. By the way, this is why I think that allowing the public keyword in an interface is a technology mistake: it makes people think that it could also be something else.

Private is the new public

If you want to declare a protected method in an interface, you probably did not understand encapsulation.

Private is not that private

Private variables and methods are visible inside the compilation unit. If that sounds too cryptic: in the same Java file (almost). This is a bit more than "in the class where they are defined". They can also be seen from classes and interfaces that are in the same compilation unit. Inner and nested classes can see the private fields and methods of the class enclosing them. However, enclosing classes can also see the private methods and fields of the classes they enclose, down to any depth.

package a;

class Private {
	private class PrivateInPrivate {
		private Object object;
	}

	Object m() {
		// the enclosing class reaches the private field of the nested class
		return new PrivateInPrivate().object;
	}
}

This latter is not widely known. As a matter of fact it is rarely useful.

Private is class level, not object level

If you can access a variable or method, you can access it no matter which object it belongs to. If this.a is accessible, then another.a is also accessible, assuming that another is an instance of the same class. Objects that are instances of the same class can fool around with each other's variables and methods. It rarely makes sense to have such code, though. A real-life exception is equals() (as generated by Eclipse; note the accesses to other.object):

package a;

public class PrivateIsClass {
	private Object object;

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		PrivateIsClass other = (PrivateIsClass) obj;
		if (object == null) {
			if (other.object != null)
				return false;
		} else if (!object.equals(other.object))
			return false;
		return true;
	}
}

Static classes may have many instances

Protection is not object level. It is class level.

Classes that are not supposed to have any instances are usually called utility classes. They contain only static fields and static methods, and the only constructor is private and not invoked from any of the static methods of the class. In Java 8 you can have such beasts implemented in interfaces, since Java 8 interfaces can have static methods in them. I am not convinced that we should use that feature instead of utility classes. I am not absolutely convinced that we should use utility classes at all.
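
A typical utility class, as a minimal sketch (the name and the method are made up):

public final class MathUtil {

	private MathUtil() {
		// the private constructor is never invoked: no instances can exist
		throw new UnsupportedOperationException("utility class");
	}

	public static int clamp(int value, int min, int max) {
		return Math.max(min, Math.min(max, value));
	}
}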

Static classes are always inside another class (or interface). They are nested classes. They are static, and just as static methods cannot access the instance methods and fields of the class, a static nested class cannot access the instance methods and fields of the embedding class. That is because nested classes do not have a reference (a pointer, if you like) to an instance of the embedding class. Inner classes, as opposed to nested classes, are non-static and cannot be created without an instance of the embedding class. Each instance of an inner class has a reference to exactly one instance of the embedding class, and thus an inner class can access the instance methods and fields of the embedding class.

Because of this you cannot create an inner class without an instance of the surrounding class. You need not specify it, though, if it is the current object, a.k.a. this. In that case you can write new, which is, in this case, just a short form of this.new. In a static environment, for example in a static method, you have to specify which instance of the enclosing class the inner class should be created with. See staticMethod() below:

package a;

class Nesting {
	static class Nested {}
	class Inner {}
	void method(){
		Inner inner = new Inner(); // short for this.new Inner()
	}
	static void staticMethod(){
		Inner inner = new Nesting().new Inner(); // an enclosing instance must be given
	}
}

Anonymous classes can access only final variables

The variable has to be effectively final

When an anonymous class is defined inside a method, it can access local variables if they are final. But saying only that is vague. They have to be declared final and they also have to be effectively final. This is what is relaxed a bit in Java 8: you need not declare such variables final, but they still have to be effectively final.
Java 8 does not require final, only effectively final

Why do you need to declare something final when it has to be checked to be that anyway? Like method arguments: they also have to be final. You say that this is not a requirement of Java? Well, you are right. It is a requirement of programming in good style.
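
A minimal sketch of the Java 8 relaxation (the names are made up): the captured variable is not declared final, only effectively final.

import java.util.function.Supplier;

public class EffectivelyFinal {
	public static void main(String[] args) {
		int count = 42; // not declared final, but never reassigned: effectively final

		Supplier<String> s = new Supplier<String>() {
			@Override
			public String get() {
				return "count is " + count; // legal in Java 8 without the final keyword
			}
		};
		System.out.println(s.get());

		// count++; // uncommenting this would turn the capture above into a compile error
	}
}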

Do all business logic on the client using JavaScript?

A look back at history

The first applications were running on "the" computer. There was nothing like "client" and "server". There was the computer, with punch card input and printer output. Later the mainframe came and the clients were terminals. Looking back, we can say that this was a two-tier architecture with lightweight clients. This was practically the same with the minicomputers (PDP, VAX and their clones) until the PC was introduced.

The PC architecture introduced the two-tier paradigm with a fat client and a file server or database server. Three-tier architecture was supposed to replace the two-tier architecture with a not-so-fat client, a server supporting the business logic, and storage on the third tier. Then came the web and the browser, and the client was replaced by a web page. The thin client was invented; at least the terminology was born at that time. During the last 20 years web technology evolved, and with HTML5 we have something on the client side that is very powerful, comparable in functionality and capability to the thick client of the early PC era. If we consider the memory and CPU power we currently have, an HTML5-capable browser easily outperforms any early PC fat client.

Time to think about what is implemented in which tier. As we have seen, functionality, storage and code were dangling back and forth between the client and the various server components. The optimal distribution of functionality was determined by the distribution and availability of CPU power in the different tiers, the storage capabilities, the network latency and the bandwidth. As technology evolved and changed the ratios between these, it also changed where we put parts of the functionality.

Today

Today clients are so powerful that we are tempted to put everything on the client. It is not only the top-notch client that is so powerful; we can also expect good CPU power, memory and bandwidth on the average client. Why not put all functions on the client? Could we do all business logic on the client? Almost. There are some features, not mentioned above, that a client does not implement and is not likely to implement in the near future with the current architectures. Features like:

  • persistence
  • search
  • transactions
  • trust

Let us look at these features.

Persistence

You cannot store data reliably on the client. Backup, archive and audit logging are functions that naturally live on a server. If any of these is ever implemented on a client machine, that machine is operated as a server in some way. Doing it on a client is, if not impossible, probably expensive. Individual clients cannot simply be backed up, and they would have to communicate with each other to see the same state of the data, the state that reflects the actual state of the modeled world.

Search

Search is also something not likely to be implemented on the client. Search needs the data to search in. In some cases the data set can be copied to the client and thus the search can be implemented there, but in most cases the client works only with a subset of the data, therefore search is implemented on the server.

Transaction

Transactions are tied to data. If you sell airplane tickets you may not want a system that asks all the other client terminals whether seat A in row 13 is still free on a certain flight. That would be total noise, like a room full of people. Perhaps something like the old stock trading rooms may resemble that.

Trust

Clients are owned by the person having physical access to the machine. Steve Halls' talking moose said: "Never trust a computer you can not lift." When it comes to physical security, it is crucial the other way around: never trust a computer somebody else can lift. In an application you cannot trust any communication that comes from a client unless you have established some trust: password, card access, whatever. But maintaining that trust is up to the server, which some bad guy cannot nick.

Where does this lead?

Single-page applications contain static HTML, CSS and JavaScript. The server communicates with the client using Ajax, REST and JSON. The users identify themselves using some authentication, probably OAuth. After that, the client application displays whatever the functionality of the application needs, and the server provides only data functionality. All the server should do is CRUD, with access control. Whenever a client wants to access some data, the server has to check whether the said user has the right to read and/or write the data. Other than that, the server need not know anything about the business functionality.

Persistence services like MySQL or MongoDB (to name one SQL and one NoSQL) provide REST interfaces, and the interfaces obviously require authentication. However, the authorization scheme of these interfaces is very weak. The usual approach is: if you authenticate, you are authorized. There is no document- or record-level access control, which I expect will come in the future. First we will have applications that work as front-ends to these persistence applications, check the authorization and, based on that, let the operation be performed or deny it. However, this solution is not optimal from the performance point of view. The logical place to implement such logic is where the data is: in the database.

If and when databases support such a record-level authorization scheme, we can write all our code, or almost all of it, in JavaScript running on the client. Applications will be developed in JavaScript. Java will only remain for enterprise integration and connection handling to legacy systems. Java will be what Cobol is today. Will it?

When null checking miserably fails

Disclaimer

Before going on I have to state that the techniques described in this article serve no practical purpose when we program Java. It is like a crossword or a puzzle. It may train your brain in logical thinking, and may develop your Java language knowledge or even your thinking skills. It is like a trick a magician performs. At the end you realize that nothing is what it looks like. Never use in real-life programming the tricks that you may need to solve this mind twister.

The Problem

I recently read an article that described the debugging case when

if (trouble != null && !trouble.isEmpty()) {
	System.out.println("fine here: " + trouble);
} else {
	System.out.println("not so fine here: " + trouble);
}

was printing out

fine here: null

The actual bug was that the string contained "null", a.k.a. the characters 'n', 'u', 'l' and 'l'. This may happen in real life, especially when you concatenate strings without checking the nullity of a variable.
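
A one-liner reproduces that real-life bug (a sketch; the variable names are made up):

	String s = null;
	String trouble = "fine here: " + s; // concatenation converts null to the four characters "null"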

Then I started to think about other, similarly strange code and debug situations. Could I make it so that the variable is not only the "null" string containing these characters but really null? Does that seem crazy? Have a look at the code:

package com.javax0.blog.nullisnotnull;

public class NullIsNotNull {

	public static void troubled(){
		String trouble = new String("hallo");
		Object z = trouble != null && !trouble.toString().isEmpty() ? 
                                                          trouble.toString() : "";
		if (z == null) {
			System.out.println("z is really " + z + "?");
		}
	}
}

Will it ever print out the

z is really null?

question. The fact is that you can create a Java class containing a public static void main() so that, starting that class as a Java application, the sentence will be printed when main() invokes the method troubled(). In other words: I really invoke the method troubled(), and the solution is not that main() itself prints the sentence.

In this case the variable z is not only printed as “null” but it really is null.

Hints

The solution should not involve

  • reflection
  • byte code manipulation
  • calling JNI
  • special class loaders
  • java agent
  • annotation processor

These tools are too heavy. You do not need such armory for this purpose.

Hint #1

If I change the code so that the variable z is String it does not even compile:

This is what I see in Eclipse

If it confused you even more, then sorry. Read on!

Hint #2

In the Java language String is an identifier and not a keyword. The Java Language Specification section 3.9 may give more information on the significance of this.

Hint #3

The method toString() in class Object has a return type java.lang.String. You may want to read my article about the difference between the name, simple name and canonical name of a class. It may shed some light and increase the hit count of the article.

Hint #4

To use a class declared in the same package you need not import it.

Solution

The solution is to create a class named String in the same package. In that case the compiler will use this class instead of java.lang.String. The ternary operator in the code is a simple magician's trick, something to divert your attention from the important point. The major point is that String is not java.lang.String in the code above. If you still cannot work out how to create the trick class, here it is in all its glory:

package com.javax0.blog.nullisnotnull;

class String {
	private java.lang.String jString;
	private boolean first = true;

	public String(java.lang.String s) {
		jString = s;
	}

	public boolean isEmpty() {
		return jString.isEmpty();
	}

	@Override
	public java.lang.String toString() {
		if( first ){
			first = false;
			return jString; // the first call behaves normally
		}
		return null; // every later call returns a real null
	}

	public static void main(java.lang.String[] args) {
		NullIsNotNull.troubled();
	}
}

Do not use immutable in your API

Why should an API define methods that accept Guava immutable collection types? It should not. The intent of the author of such an API is clear: she wants to declare, and be safe, that the method does not modify the collection that the caller passes. The problem is that it forces the caller to use Guava immutable collections, and the caller cannot just pass a hash or tree map, a hash set or whatever type matches the actual collection. I know that it is not a big deal to convert a map or set to its immutable counterpart, but let me, as a user of a library, have the freedom to decide whether I want to do that.

It does not mean that I do not need a guarantee. When an API documents that a collection the caller passes will not be altered, I expect the implementation (a.k.a. the library) not to alter it. I expect it to work just like any other documented feature. However, the implementation is the internal responsibility of the library and has to be well coded during development time. It simply should not call any of the mutating methods. To assert that, the development-time tools have to be used: unit tests, code analysis.

If the library wants to be on the safe side, it can use the Guava library to "copy" the value passed as an argument from a collection interface to the appropriate immutable implementation. Or it can use the JDK's built-in Collections.unmodifiableXXX methods. Whether it does so or not is an implementation detail. The API, the declaration of the classes, methods and arguments, is the "interface". The API user should only face the interface and not the implementation details.

Therefore I say: do not use immutable implementations in the method parameter list. Use them in the implementation only.
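
A minimal sketch of that advice (the class and method names are made up): the API accepts the plain Map interface, and the defensive, unmodifiable copy stays an implementation detail.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ReportService {

	private final Map<String, Integer> counters;

	// the caller may pass a HashMap, a TreeMap or an ImmutableMap: the API does not care
	public ReportService(Map<String, Integer> counters) {
		// defensive copy, invisible to the caller
		this.counters = Collections.unmodifiableMap(new HashMap<String, Integer>(counters));
	}

	public int counter(String name) {
		Integer value = counters.get(name);
		return value == null ? 0 : value;
	}
}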

Why to use String

Recently I was tutoring juniors during a training session. One of the tasks was to write a class that can dwarwle maps based on some string key. The result one of the juniors created contained the following method:

	void dwarwle(HashMap<String,Dwarwable> mapToDwarwle, String dwarwleKey){
		for( final Entry<String, Dwarwable> entry : mapToDwarwle.entrySet()){
			dwarwle(entry.getKey(),entry.getValue(),dwarwleKey);
		}
	}

The code was generally ok. The method that dwarwles an individual dwarwable entry, using the actual key it is assigned to in the hash map and the dwarwle key, is factored out to a separate method. It is so simple that I do not list it here. The variable names are also meaningful, as long as you know what dwarwling actually is. The method is short and readable, but the argument list expects a HashMap instead of a Map. Why do we want to restrict the caller to use a HashMap? What if the caller has a TreeMap, and for a good reason? Do we want a separate method that can dwarwle a TreeMap? Certainly not.

Expect the interface, pass the implementation.
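
Following that principle, the fix is a one-word change in the parameter type, sketched here:

	void dwarwle(Map<String, Dwarwable> mapToDwarwle, String dwarwleKey) {
		for (final Entry<String, Dwarwable> entry : mapToDwarwle.entrySet()) {
			dwarwle(entry.getKey(), entry.getValue(), dwarwleKey);
		}
	}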

The junior changed the code, replacing HashMap with Map, but after five minutes or so this clever lady raised her hand and had the following question:

“If we changed HashMap to Map, why did not we change String to CharSequence?”

It is not so easy to answer a question like that when it comes out of the blue. The first thing that came to my mind is that the reason is that we usually do it that way, and that is why. But that is not a real argument; at least I would not accept anything like that, and I also expect my students not to accept such an answer. It would be very dictatorial anyway.

The real answer is that the parameter is used as a key in a map, and the key of a map should be immutable (or at least its equals() and hashCode() results should be resilient to mutation). CharSequence is an interface, and an interface in Java (unfortunately) cannot guarantee immutability. Only an implementation can. String is a good, widely known and well-tested implementation of this interface and therefore a good choice. There is a good discussion about it on stackoverflow.

In this special case we expect the implementation because we need something immutable and we "cannot" trust the caller to pass a character sequence implementation that is immutable. Or: we can, but it has a price. If a StringBuilder is passed and modified afterwards, our dwarwling library may not work, and a blame war may start. When we design an API and a library we should think not only about the possible use but also about the actual, average use.

A library is as good as it is used and not as it could be used.

This can also be applied to other products, not only libraries, but that may lead too far (physics and weapons).

Autoboxing

Autoboxing has been clear to all Java developers since Java 1.5. Well, I may be too optimistic. At least all developers are supposed to be ok with autoboxing. After all, there is a good tutorial about it on the page of ORACLE.

Autoboxing is the phenomenon where the Java compiler automatically generates code creating an object from a primitive type when it is needed. For example, you can write:

Integer a = 42;

and it will automatically generate JVM code that puts the int value 42 into an Integer object. This is so nice of the compiler that after a while we programmers just tend to forget about the complexity behind it, and from time to time we run against the wall.

For example, we have double.class and Double.class. Both of them are objects (each class is itself an object, in permgen, or just on the heap in post-permgen versions of the JVM). Both of these objects are of type Class. What is more: since Java 1.5 both of them are of type Class&lt;Double&gt;.

If two objects have the same type, they also have to be assignment compatible, aren't they? It seems to be an obvious statement. If you have an object O a and an object O b, then you can assign a = b.

Looking at the code, however, we may realize that we were oblivious rather than it being obvious:

public class TypeFun {
    public static void main(String[] args) {
        // public static final Class<Double>   TYPE = (Class<Double>)Class.getPrimitiveClass("double");
        System.out.println("Double.TYPE == double.class: " + (Double.TYPE == double.class));
        System.out.println("Double.TYPE == Double.class: " + (Double.TYPE == Double.class));
        System.out.println("double.class.isAssignableFrom(Double.class): " + (double.class.isAssignableFrom(Double.class)));
        System.out.println("Double.class.isAssignableFrom(double.class): " + (Double.class.isAssignableFrom(double.class)));
    }
}

resulting in

Double.TYPE == double.class: true
Double.TYPE == Double.class: false
double.class.isAssignableFrom(Double.class): false
Double.class.isAssignableFrom(double.class): false

This means that the primitive pair of Double is double.class (not surprising), even though one cannot be assigned from the other. We can look at the source of at least one of them. The source of the class Double is in rt.jar and it is open source. There you can see that:

public static final Class<Double>	TYPE = (Class<Double>) Class.getPrimitiveClass("double");

Why does it use that weird Class.getPrimitiveClass("double") instead of double.class? After all, that is the primitive pair of the type Double.

The answer is not trivial, and you can dig deep into the details of Java and the JVM. Since double is not a class, there is nothing like double.class in reality. You can still use this literal in the Java source code though, and this is where the Java language, the compiler and the run-time are strongly bound together. The compiler knows that the class Double defines a field named TYPE denoting the primitive type. Whenever the compiler sees double.class in the source code, it generates JVM code equivalent to Double.TYPE. (Give it a try and then use javap to decode the generated code!) For this very reason the developers of rt.jar could not write

public static final Class<Double>	TYPE = double.class;

into the source of the class Double. It would compile to the equivalent code:

public static final Class<Double>	TYPE = TYPE;

How does autoboxing work then? The source

Double b = (double)1.0;

results in

         0: dconst_1      
         1: invokestatic  #2                  // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
         4: astore_1 

however if we replace the two ‘d’ letters:

double b = (Double)1.0;

then we get

         0: dconst_1      
         1: invokestatic  #2                  // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
         4: invokevirtual #3                  // Method java/lang/Double.doubleValue:()D
         7: dstore_1    

, which indeed explains a lot of things. The instances of double.class and Double.class are not assignment compatible. Autoboxing solves this. Java 1.4 was a long time ago and we have, luckily, forgotten it.

Your homework: reread what happens related to autoboxing when you have overloaded methods, one with an argument of the wrapper class type and one with the corresponding primitive type.
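
For a taste of that homework, a minimal sketch (the method names are made up): with an int argument, widening to long wins over boxing to Integer, because overload resolution tries widening before autoboxing.

public class OverloadFun {

	static void f(long x) { System.out.println("f(long)"); }
	static void f(Integer x) { System.out.println("f(Integer)"); }

	public static void main(String[] args) {
		int i = 1;
		f(i);                  // prints "f(long)": widening beats autoboxing
		f(Integer.valueOf(1)); // prints "f(Integer)": exact match
	}
}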

A classloading mystery solved

Facing a good old problem

I was struggling with some class loading issue on an application server. The libraries were defined as maven dependencies and therefore packaged into the WAR and EAR files. Some of them were also installed into the application server, unfortunately in different versions. When we started the application we faced various exceptions related to these types of problems. There is a good IBM article about these exceptions if you want to dig deeper.

Even though we knew that the error was caused by some doubly defined libraries on the classpath, it took more than two hours to investigate which version we really needed and which JAR to remove.

The same topic, by accident, at the JUG the same week

A few days later we participated in the Do you really get Classloaders? session of the Java Users' Society in Zürich. Simon Maple delivered an extremely good intro about class loaders and went into very deep details from the very start. It was an eye-opening session for many. I also have to note that Simon works for ZeroTurnaround and he evangelizes for JRebel. In such a situation a tutorial session is usually biased towards the product that is the bread of the tutor. In this case my opinion is that Simon was an absolute gentleman and kept an appropriate, ethical balance.

Creating a tool to solve the mystery

just to create another one

A week later I had some time for hobby programming, which I had not had for a couple of weeks, and I decided to create a little tool that lists all the classes and JAR files that are on the classpath, so that finding duplicates becomes easier. I tried to rely on the fact that class loaders are usually instances of URLClassLoader, and thus the method getURLs() can be invoked to get all the directory names and JAR files.
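
A minimal sketch of the idea (the class name is made up; it relies, as I did, on the class loaders being URLClassLoader instances, which the application class loader is on Java 8 and earlier):

import java.net.URL;
import java.net.URLClassLoader;

public class ClasspathLister {
	public static void main(String[] args) {
		ClassLoader cl = ClasspathLister.class.getClassLoader();
		// walk the class loader chain up to the bootstrap loader
		while (cl != null) {
			if (cl instanceof URLClassLoader) {
				for (URL url : ((URLClassLoader) cl).getURLs()) {
					System.out.println(cl + " -> " + url);
				}
			}
			cl = cl.getParent();
		}
	}
}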

Unit testing in such a situation can be very tricky, since the functionality is strongly tied to class loader behavior. To be pragmatic, I decided to just do some manual testing started from JUnit, as long as the code is experimental. First of all I wanted to see whether the concept was worth developing further. I was planning to execute the test and look at the log statements reporting that there were no duplicate classes, and then to execute the same run a second time, adding some redundant dependencies to the classpath. I was using JUnit 4.10. The version is important in this case.

I executed the unit test from the command line and saw that there were no duplicate classes, and I was happy. After that I executed the same test from Eclipse and, surprise: I got 21 classes redundantly defined!

12:41:51.670 DEBUG c.j.c.ClassCollector - There are 21 redundantly defined classes.
12:41:51.670 DEBUG c.j.c.ClassCollector - Class org/hamcrest/internal/SelfDescribingValue.class is defined 2 times:
12:41:51.671 DEBUG c.j.c.ClassCollector -   sun.misc.Launcher$AppClassLoader@7ea987ac:file:/Users/verhasp/.m2/repository/junit/junit/4.10/junit-4.10.jar
12:41:51.671 DEBUG c.j.c.ClassCollector -   sun.misc.Launcher$AppClassLoader@7ea987ac:file:/Users/verhasp/.m2/repository/org/hamcrest/hamcrest-core/1.1/hamcrest-core-1.1.jar
...

Googling a bit, I could easily discover that JUnit 4.10 has an extra dependency, as shown by maven:

$ mvn dependency:tree
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building clalotils 1.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ clalotils ---
[INFO] com.verhas:clalotils:jar:1.0.0-SNAPSHOT
[INFO] +- junit:junit:jar:4.10:test
[INFO] |  \- org.hamcrest:hamcrest-core:jar:1.1:test
[INFO] +- org.slf4j:slf4j-api:jar:1.7.7:compile
[INFO] \- ch.qos.logback:logback-classic:jar:1.1.2:compile
[INFO]    \- ch.qos.logback:logback-core:jar:1.1.2:compile
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.642s
[INFO] Finished at: Wed Sep 03 12:44:18 CEST 2014
[INFO] Final Memory: 13M/220M
[INFO] ------------------------------------------------------------------------

This is actually fixed in 4.11, so if I change the dependency to JUnit 4.11 I do not face the issue. Ok, half of the mystery is solved. But why does the maven command line execution not report the doubly defined classes?

Extending the logging, logging more and more, I could spot a line:

12:46:19.433 DEBUG c.j.c.ClassCollector - Loading from the jar file /Users/verhasp/github/clalotils/target/surefire/surefirebooter235846110768631567.jar

What is in this file? Let’s unzip it:

$ ls -l /Users/verhasp/github/clalotils/target/surefire/surefirebooter235846110768631567.jar
ls: /Users/verhasp/github/clalotils/target/surefire/surefirebooter235846110768631567.jar: No such file or directory

The file does not exist! Seemingly maven creates this JAR file and deletes it when the execution of the test has finished. Googling again, I found the solution.

Java loads the classes from the classpath. The classpath can be defined on the command line, but there are other sources from which the application class loaders fetch files. One such source is the manifest file of a JAR. The manifest file of a JAR can define what other JAR files are needed to execute the classes in the JAR. Maven creates a JAR file that contains nothing but a manifest file defining the JARs and directories that make up the classpath. These JARs and directories are NOT returned by the method getURLs(), therefore the (first version of) my little tool did not find the duplicates.

For demonstration purposes I was quick enough to make a copy of the file while the mvn test command was running, and got the following output:

$ unzip /Users/verhasp/github/clalotils/target/surefire/surefirebooter5550254534465369201\ copy.jar 
Archive:  /Users/verhasp/github/clalotils/target/surefire/surefirebooter5550254534465369201 copy.jar
  inflating: META-INF/MANIFEST.MF    
$ cat META-INF/MANIFEST.MF 
Manifest-Version: 1.0
Class-Path: file:/Users/verhasp/.m2/repository/org/apache/maven/surefi
 re/surefire-booter/2.8/surefire-booter-2.8.jar file:/Users/verhasp/.m
 2/repository/org/apache/maven/surefire/surefire-api/2.8/surefire-api-
 2.8.jar file:/Users/verhasp/github/clalotils/target/test-classes/ fil
 e:/Users/verhasp/github/clalotils/target/classes/ file:/Users/verhasp
 /.m2/repository/junit/junit/4.10/junit-4.10.jar file:/Users/verhasp/.
 m2/repository/org/hamcrest/hamcrest-core/1.1/hamcrest-core-1.1.jar fi
 le:/Users/verhasp/.m2/repository/org/slf4j/slf4j-api/1.7.7/slf4j-api-
 1.7.7.jar file:/Users/verhasp/.m2/repository/ch/qos/logback/logback-c
 lassic/1.1.2/logback-classic-1.1.2.jar file:/Users/verhasp/.m2/reposi
 tory/ch/qos/logback/logback-core/1.1.2/logback-core-1.1.2.jar
Main-Class: org.apache.maven.surefire.booter.ForkedBooter

$ 

It really is nothing else than a manifest file defining the classpath. But why does maven do it? Sonatype people, some of whom I know personally, are clever people. They do not do such a thing just for nothing. The reason for creating a temporary JAR file to start the tests is that the length of the command line is limited on some operating systems, and the classpath may exceed that limit. Even though Java (since Java 6) itself resolves wildcard characters in the classpath, it is not an option for maven. The JAR files are in different directories in the maven repo, each having a long name. Wildcard resolution is not recursive (there is a good reason for that), and even if it were, you just would not like to have your whole local repo on the classpath.

Conclusion

  • Do not use JUnit 4.10! Use something older or newer, or be prepared for surprises.
  • Understand what a classloader is, how it works and what it does.
  • Use an operating system that has a huge limit on the maximum length of a command line.
    Or just live with the limitation.

Something else? Your ideas?
