Category Archives: Java

New Regex Features in Java 9

I recently received my complimentary copy of the book “Java 9 Regular Expressions” from Anubhava Srivastava published by Packt. The book is a good tutorial and introduction to anyone who wants to learn what regular expressions are and start from scratch. Those who know how to use regex the book may still be interesting to reiterate the knowledge and to deepen into a more complex features like zero length assertions, back references and alikes.

In this article I will focus on the regular expression features that are specific to Java 9 and were not available in earlier version of the JDK. There is not many, though.

Java 9 Regular Expression Module

The JDK in Java 9 is split up into modules. One could rightfully expect that there is a new module for the regular expression handling packages and classes. Actually there is none. The module java.base is the default module on which all other modules depend on by default and thus the classes of the exported packages are always available in Java applications. The regular expression package java.util.regex is exported by this module. This makes the development a bit simpler: there is no need to explicitly ‘require’ a module if we want to use regular expressions in our code. It seems that regular expressions are so essential to Java that it got included in the base module.

Regular Expression Classes

The package java.util.regex contains the classes

  • MatchResult
  • Matcher
  • Pattern and
  • PatternSyntaxException

The only class that has changed API is Matcher.

Changes in class Matcher

The class Matcher adds five new methods. Four of those are overloaded version of already existing methods. These are:

  • appendReplacement
  • appendTail​
  • replaceAll​
  • replaceFirst​
  • results​

The first four exists in earlier versions and there is only change in the types of the arguments (after all that is what overloading means).

appendReplacement/Tail

In case of appendReplacement and appendTail the only difference is that the argument can also be a StringBuilder and not only StringBuffer. Considering that StringBuilder introduced in Java 1.5 something like 13 years ago nobody should say that this is an inconsiderate act.

It is interesting though how the currently online version of the API JDK documents the behaviour of appendReplacement for StringBuilder argument. The older, StringBuffer argumented method explicitly documents that the replacement string may contain named references that will be replaced by the corresponding group. The StringBuilder argumented version misses this. The documentation seems like copy/paste and then edited. The text replaces “buffer” to “builder” and alike and the text documenting the named reference feature is deleted.

I tried the functionality using Java 9 build160 and the outcome is the same for these two method versions. This should not be a surprise since the source code of the two methods is the same, a simple copy/paste in the JDK with the exception of the argument type.

Seems that you can use

    @Test
    public void testAppendReplacement() {

        Pattern p = Pattern.compile("cat(?<plural>z?s?)");
        //Pattern p = Pattern.compile("cat(z?s?)");
        Matcher m = p.matcher("one catz two cats in the yard");
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(sb, "dog${plural}");
            //m.appendReplacement(sb, "dog$001");
        }
        m.appendTail(sb);
        String result = sb.toString();
        assertEquals("one dogz two dogs in the yard", result);
    }

both the commented lines or the line above each. The documentation, however speaks only about the numbered references.

replaceAll/First

This is also an “old” method that replaces matched groups with some new strings. The only difference between the old version and the new is how the replacement string is provided. In the old version the string was given as a String calculated before the method was invoked. In the new version the string is provided as a Function<MatchResult,String>. This function is invoked for each match result and the replacement string can be calculated on the fly.

Knowing that the class Function was introduced only 3 years ago in Java 8 the new use of it in regular expressions may be a little slap-dash. Or, perhaps … may be we should see this as a hint that ten years from now, when the class Fuction will be 13 years old, we will still have Java 9?

Lets dig a bit deeper into these two methods. (Actually only to replaceAll because replaceFirst is the same except that it replaces only the first matched group.) I tried to create some not absolutely intricate examples when such a use could be valuable.

The first sample is from the JDK documentation:

    @Test
    public void demoReplaceAllFunction() {
        Pattern pattern = Pattern.compile("dog");
        Matcher matcher = pattern.matcher("zzzdogzzzdogzzz");
        String result = matcher.replaceAll(mr -> mr.group().toUpperCase());
        assertEquals("zzzDOGzzzDOGzzz", result);
    }

It is not too complex and shows the functionality. The use of a lambda expression is absolutely adequate. I can not imagine a simpler way to uppercase the constant string literal “dog”. Perhaps only writing “DOG”. Okay I am just kidding. But really this example is too simple. It is okay for the documentation where anything more complex would distract the reader from the functionality of the documented method. Really: do not expect less intricate examples in a JavaDoc. It describes how to use the API and not why the API was created an designed that way.

But here and now we will look at some more complex examples. We want to replace in a string the # characters with the numbers 1, 2, 3 and so on. The string contains numbered items and in case we insert a new one into the string we do not want to renumber manually. Sometimes we group two items, in which case we write ## and then we just want to skip a serial number for the next #. Since we have a unit test the code describes the functionality better than I can put it into words:

    @Test
    public void countSampleReplaceAllFunction() {
        AtomicInteger counter = new AtomicInteger(0);
        Pattern pattern = Pattern.compile("#+");
        Matcher matcher = pattern.matcher("# first item\n" +
                "# second item\n" +
                "## third and fourth\n" +
                "## item 5 and 6\n" +
                "# item 7");
        String result = matcher.replaceAll(mr -> "" + counter.addAndGet(mr.group().length()));
        assertEquals("1 first item\n" +
                "2 second item\n" +
                "4 third and fourth\n" +
                "6 item 5 and 6\n" +
                "7 item 7", result);
    }

The lambda expression passed to replaceAll gets the counter and calculates the next value. If we used one # then it increases it by 1 if we used two, then it adds two to the counter and so on. Because a lambda expression can not change the value of a variable in the surrounding environment (the variable has to be effectively final) the counter can not be an int or Integer variable. We need an object that holds an int value and can be changed. AtomicInteger is exactly that even if we do not use the atomic feature of it.

The next example goes even further and does some mathematical calculation. It replaces any floating point formatted number in the string to the sine value of it. That way it corrects our sentence since sin(pi) is not even close to pi, which can not be precisely expressed here. It is rather close to zero:

    @Test
    public void calculateSampleReplaceAllFunction() {
        Pattern pattern = Pattern.compile("\\d+(?:\\.\\d+)?(?:[Ee][+-]?\\d{1,2})?");
        Matcher matcher = pattern.matcher("The sin(pi) is 3.1415926");
        String result = matcher.replaceAll(mr -> "" + (Math.sin(Double.parseDouble(mr.group()))));
        assertEquals("The sin(pi) is 5.3589793170057245E-8", result);
    }

We will also play around a bit with this calculation for the demonstration of the last method in our list, which is a brand new one in the Matcher class.

Stream results()

The new method results() returns a stream of the matching results. To be more precise it returns a Stream of MatchResult objects. In the example below we use it to collect any floating point formatted number from the string and print their sine value comma separated:

    @Test
    public void resultsTest() {
        Pattern pattern = Pattern.compile("\\d+(?:\\.\\d+)?(?:[Ee][+-]?\\d{1,2})?");
        Matcher matcher = pattern.matcher("Pi is around 3.1415926 and not 3.2 even in Indiana");
        String result = String.join(",",
                matcher
                        .results()
                        .map(mr -> "" + (Math.sin(Double.parseDouble(mr.group()))))
                        .collect(Collectors.toList()));
        assertEquals("5.3589793170057245E-8,-0.058374143427580086", result);
    }

Summary

The new regular expression methods introduced in the Java 9 JDK are not essentially different from what was already available. They are neat and handy and in some situation they may ease programming. There is nothing that could have not been introduced in earlier version. This is just the way of Java to make such changes to the JDK slow and well thought. After all that is why we love Java, don’t we?

The whole code copy paste from the IDE can be found and downloaded from the following gist

Advertisements

Hacking the IntegerCache in Java 9

Five years ago I published an article in Hungarian about how to alter the IntegerCahe in the JDK. Doing that is essentially hacking the Java run-time and there is no practical advantage unless while you develop the hacking code you get a better understanding how reflection works and how the Integer class is implemented.

The Integer class has a private nested class named IntegerCache that contains Integer objects for the int values -127 to 128. When the code has to box an int to Integer and the value is within this range then the run-time uses the cache instead of creating new Integer object. This is done for speed optimization reasons bearing in mind that the int values in programs are many times in this range (think about array indexing).

The side effect of this is that many times using the identity operator to compare two Integer objects may work so long as long the value is in the range. This is typically during unit test. In operational mode, when some of the values get bigger than 128 the code fails.

Hacking the IntegerCache using reflection may also lead to mysterious side effects and note that this is something that has its effect on the whole JVM. If a servlet redefines the small Integer cached values then all other servlets running in the same Tomcat under the same JVM will suffer.

There are other articles about it on the net from Lukas Eder

Add Some Entropy to Your JVM

and on the excellent blog site Sitepoint:

https://www.sitepoint.com/10-things-you-didnt-know-about-java/

Now that I started to play around with Java 9 early access version it came to my mind if I can do the same hacking with the new version of Java. Before starting that let’s refresh what we did with Java 8.

Lukas in his article displays a sample code, I copy here:

import java.lang.reflect.Field;
import java.util.Random;
 
public class Entropy {
  public static void main(String[] args) 
  throws Exception {
 
    // Extract the IntegerCache through reflection
    Class<?> clazz = Class.forName(
      "java.lang.Integer$IntegerCache");
    Field field = clazz.getDeclaredField("cache");
    field.setAccessible(true);
    Integer[] cache = (Integer[]) field.get(clazz);
 
    // Rewrite the Integer cache
    for (int i = 0; i < cache.length; i++) {
      cache[i] = new Integer(
        new Random().nextInt(cache.length));
    }
 
    // Prove randomness
    for (int i = 0; i < 10; i++) {
      System.out.println((Integer) i);
    }
  }
}

The code gets access to the IntegerCache via reflection and then fills the cache with random values. Naughty.

We can try to execute the same code in Java 9. Do not expect much fun though. Java 9 is more serious when somebody tries to violate it.

Exception in thread "main" java.lang.reflect.InaccessibleObjectException:
  Unable to make field static final java.lang.Integer[]
  java.lang.Integer$IntegerCache.cache
  accessible: module java.base does not "opens java.lang" to unnamed module @1bc6a36e

We get an exception that did not exist in Java 8. It says that object is not accessible because the module java.base, which is the part of the run-time of the JDK that is automatically imported by each Java program does not ‘opens’ (sic) the module to the unnamed module. It is thrown from the line where we try to set the field accessible.

The object we could easily access in Java 8 is not accessible any more, because the module system protects it. A code can only access fields, methods, and other things using reflection only if the class is in the same module, or if the module opens the package for reflective access for the world or for the module that needs the access. This is done in the module-info.java module definition file, like

module myModule {
    exports com.javax0.module.demo;
    opens com.javax0.module.demo;
}

The module java.base does not open itself for reflective access for us and especially not for the unnamed module, which is the code that we run. If we create a module for our code and name it then the error message will contain the name of that module.

Can we open the module programmatically? There is an addOpens method in the java.lang.reflect.Module module. Will it work?

Bad news for the hackers that it will not. It can only open a package in a module to another module if that package is already opened for the module calling this method. That way modules can pass on to other modules the right that they already have to access some packages in a reflective way but can not open things that are not open.

But the same time it is a good news. Java 9 is not so easily hackable like Java 8 was. At least this little hole is closed. It seems that Java starts to be professional grade and not to be a toy. Soon the time will come when we can migrate serious programs from RPG and COBOL to Java. (Sorry for the joke.)

UPDATE

After the article was republished on DZONE Peter Runge pointed out that the module system in this case still can be neglected using sun.misc.Unsafe class. Based on his suggestion the whole hack is here:

public class IntegerHack {

    public static void main(String[] args)
            throws Exception {
        // Extract the IntegerCache through reflection
        Class usf = Class.forName("sun.misc.Unsafe");
        Field unsafeField = usf.getDeclaredField("theUnsafe");
        unsafeField.setAccessible(true);
        sun.misc.Unsafe unsafe = (sun.misc.Unsafe)unsafeField.get(null);
        Class<?> clazz = Class.forName("java.lang.Integer$IntegerCache");
        Field field = clazz.getDeclaredField("cache");
        Integer[] cache = (Integer[])unsafe.getObject(unsafe.staticFieldBase(field), unsafe.staticFieldOffset(field));
        // Rewrite the Integer cache
        for (int i = 0; i < cache.length; i++) {
            cache[i] = new Integer(
                    new Random().nextInt(cache.length));
        }

        // Prove randomness
        for (int i = 0; i < 10; i++) {
            System.out.println((Integer) i);
        }
    }
}

Final volatile

I was writing my book over the weekend and I was looking for some simple example that could demonstrate the real need of volatile modifier in multi-thread code. Years ago when I last time demonstrated the multi-thread capability Java was still 32-bit, or at least there was 32-bit Java available. On 32 bits you could concurrently increment long variables and because the lower and upper 32 bits were handled in different processor shift there was a chance that two threads garbled some way the non-volatile variable. Now with Java 9 this is not the case. Now Java is 64-bit and I had to demonstrate the need for a volatile on 64-bit before anyone comes up the stupid idea that it was only needed for 32-bit. (I could tell stories, but I try to keep it a professional blog. Not with much success, but still.)

I was searching stackoverflow and found this page that contains many meaningless, or less than usable answer (which clearly demonstrates that the topic is not simple) but it also contains a sample from Jed Wesley-Smith that inspired my demonstrating code for the book:

package packt.java9.by.example.thread;

public class VolatileDemonstration implements Runnable {
    private Object o = null;
    private static final Object NON_NULL = new Object();
    @Override
    public void run() {
        while( o == null );
        System.out.println("o is not null");
    }
    public static void main(String[] args)
                           throws InterruptedException {
        VolatileDemonstration me = new VolatileDemonstration();
        new Thread(me).start();
        Thread.sleep(1000);
        me.o = NON_NULL;
    }
}

This code will never finish, unless you convert the field o volatile. We also need the 1000ms sleep to allow the JIT to optimize the code of the method run() after which it never reads the variable o ever again. The JIT assumes intra-thread semantics and takes the liberty to optimize the code that way. (Java Language Specification 17.4.7)

But what happens if you have a field that you can not convert to volatile? What? Can’t you just write the keyword volatile in front of the type Object? Perhaps I was giving too much hint in the title of the article…

A final field can not be volatile. Of course: a final can not change, there is no point to re-read it from the main memory and waste CPU cycles for the synchronization of any change of it between the CPU caches. But that is not true.

Final variables can be changed once.

This is something that novice Java developers tend to forget. When an object is created any final field has the zero value. In case of an object this value is null. The field has to get its final value until the end of the initialization process, that is until the end of the execution of the constructor (any constructor). Look at the following code:

package packt.java9.by.example.thread;

public class VolatileDemonstration implements Runnable {
    private final Object o;
    private static final Object NON_NULL = new Object();
    @Override
    public void run() {
        while( o == null );
        System.out.println("o is not null");
    }

    public VolatileDemonstration() throws InterruptedException {
        new Thread(this).start();
        Thread.sleep(1000);
        this.o = NON_NULL;
    }
}

The constructor starts the new thread, sleeps and then sets the field that can not be volatile. What is the solution?

What solution? There is no solution! This is a demonstration code. Just don’t write code that does things like this: that is the solution. OK?

Takeaway

What can we learn from this? Not all of the followings can be directly implied from the above, but they are all related to the phenomenon. I could write a longer article leading to any of the followings but it would have only abused your patience.

Juniors

  • Final fields can be changed once. It is not true that they are not changing never (sic).
  • A thread may read the value of a final field once and it may not read it ever again. If the JVM runs for years the thread may keep the value in the thread context in some registry or CPU cache for years as long as it likes.
  • Never let this escape from the constructor.
  • Among other more trivial things the “never let this escape from the constructor” also means not to pass it as argument to a method that can be overridden or not under the control of the programmer, who is responsible for the current class.
  • Write well behaving code or else you will suffer the slings and arrows of your outrageous program.

Seniors

  • See the takeaways for juniors and teach them.
  • You have a nice brain twister code for education.
  • Java is not a perfect language allowing such constructs. But do not tell juniors. When they realize it they are already seniors and then it is just too late.
  • The solution is a liquid mixture in which the minor component is uniformly distributed within the major component.