Split a File as Stream

Last week I showed that the method splitAsStream in the class Pattern (new, @since 1.8) reads from the character sequence only as much as the stream needs. It does not run ahead with the pattern matching, creating all the possible elements up front and then returning them as a stream. This behavior is the true nature of streams, and it is the way it has to be to support high performance applications.

In this article, as I promised last week, I will show a practical application of splitAsStream where it really makes sense to process the stream and not just split the whole string into an array and work on that.

The application, as you may have guessed from the title of the article, is splitting up a file along some tokens. A file can be represented as a CharSequence as long as it is not longer than 2GB. The limit comes from the fact that the length of a CharSequence is an int value, which is 32-bit in Java, while the file length is a long, which is 64-bit. Reading from a file is much slower than reading from a string that is already in memory, so it makes sense to use the laziness of stream handling. All we need is a character sequence implementation that is backed by a file. If we have that, we can write a program like the following:

    public static void main(String[] args) throws FileNotFoundException {
        Pattern p = Pattern.compile("[,\\.\\-;]");
        final CharSequence splitIt = 
            new FileAsCharSequence(
                   new File("path_to_source\\SplitFileAsStream.java"));
        p.splitAsStream(splitIt).forEach(System.out::println);
    }

This code does not read any part of the file that is not needed yet, assuming that the implementation of FileAsCharSequence does not read the file greedily. The class FileAsCharSequence can be implemented like this:

package com.epam.training.regex;

import java.io.*;
import java.nio.charset.StandardCharsets;

public class FileAsCharSequence implements CharSequence {
    private final int length;
    private final StringBuilder buffer = new StringBuilder();
    private final InputStream input;

    public FileAsCharSequence(File file) throws FileNotFoundException {
        if (file.length() > (long) Integer.MAX_VALUE) {
            throw new IllegalArgumentException("File is too long to handle as character sequence");
        }
        // The file size in bytes is used as the number of characters. This is
        // correct only if every character is stored on one byte (e.g. ASCII).
        this.length = (int) file.length();
        this.input = new FileInputStream(file);
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        ensureFilled(index + 1);
        return buffer.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // end is exclusive: the first 'end' characters are enough
        ensureFilled(end);
        return buffer.subSequence(start, end);
    }

    // Reads just enough from the file so that the buffer holds 'size' characters.
    private void ensureFilled(int size) {
        if (buffer.length() < size) {
            buffer.ensureCapacity(size);
            final byte[] bytes = new byte[size - buffer.length()];
            try {
                // read() may return fewer bytes than requested, so keep reading
                int offset = 0;
                while (offset < bytes.length) {
                    int n = input.read(bytes, offset, bytes.length - offset);
                    if (n < 0) {
                        throw new IllegalArgumentException("File ended unexpectedly");
                    }
                    offset += n;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            buffer.append(new String(bytes, StandardCharsets.UTF_8));
        }
    }
}

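This implementation reads only as many bytes from the file as the latest call to charAt or subSequence needs. Note that it uses the file size in bytes as the length of the sequence. It is therefore correct only if every character is encoded on a single byte, as in a plain ASCII file. Multibyte UTF-8 characters would make the length wrong, and a character could even be split between two reads.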

If you are interested, you can improve this code to keep in memory only the bytes that are really needed and drop the ones already returned to the stream. To know which bytes are no longer needed, take a hint from the previous article: splitAsStream never touches any character whose index is smaller than the first (start) argument of the last call to subSequence. However, if the code throws characters away and fails when anyone wants to access one that was already discarded, then it will not truly implement the CharSequence interface. It may still work well with splitAsStream, so long as the implementation of splitAsStream does not change and start needing characters it has already passed. (Well, I am not sure, but this may also happen if we use some complex regular expression as the splitting pattern.)
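Here is a minimal sketch of that idea. It assumes the characters come from a java.io.Reader and that the total length is known up front. ForgetfulCharSequence and ForgetfulDemo are hypothetical names. The sketch relies on the behavior observed with this simple pattern, not on any documented guarantee of splitAsStream. It also knowingly breaks the CharSequence contract for discarded indices.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class: keeps only the characters from the start of the last
// subSequence() call onward and fails fast on access to discarded indices.
class ForgetfulCharSequence implements CharSequence {
    private final Reader reader;
    private final int length;                      // total length, known in advance
    private final StringBuilder window = new StringBuilder();
    private int offset = 0;                        // absolute index of window.charAt(0)

    ForgetfulCharSequence(Reader reader, int length) {
        this.reader = reader;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        checkNotDiscarded(index);
        fill(index + 1);
        return window.charAt(index - offset);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        checkNotDiscarded(start);
        fill(end);
        String result = window.substring(start - offset, end - offset);
        // Assumption: the caller never looks before 'start' again
        window.delete(0, start - offset);
        offset = start;
        return result;
    }

    // Number of characters currently held in memory.
    int buffered() {
        return window.length();
    }

    private void checkNotDiscarded(int index) {
        if (index < offset) {
            throw new IllegalStateException("Character " + index + " was already discarded");
        }
    }

    // Reads one character at a time until the window reaches absolute index 'size'.
    private void fill(int size) {
        try {
            while (offset + window.length() < size) {
                int c = reader.read();
                if (c < 0) {
                    throw new IllegalStateException("Input ended unexpectedly");
                }
                window.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class ForgetfulDemo {
    static List<String> split(String text) {
        CharSequence seq = new ForgetfulCharSequence(new StringReader(text), text.length());
        return Pattern.compile("[,\\.\\-;]").splitAsStream(seq).collect(Collectors.toList());
    }
}
```

With a Reader over an ASCII file and the file size as the length, the same class could stream a file while holding only the current piece in memory. If a more complex pattern ever looks back before the discarded boundary, the sketch throws IllegalStateException instead of returning wrong data.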

Happy coding!


Split as stream

I am preparing a regular expression tutorial update for the company I work for. The original tutorial was created in 2012, and Java has changed a wee bit since then. There have been new Java releases. Regular expression handling is still not perfect in Java (NB: it still uses a non-deterministic FSA), but there are some new features. I wrote about some of those in a previous post focusing on the new Java 9 methods. This time, however, I have to look at all the features that are new since 2012.

splitAsStream since 1.8

This way I found splitAsStream in the java.util.regex.Pattern class. It is almost the same as the method split except that what we get back is not an array of String objects but a stream. The simplest implementation would be something like

public Stream<String> splitAsStream(final CharSequence input) {
    // eager: split() matches the whole input before the stream is even created
    return Arrays.stream(split(input));
}

I have seen many such implementations when a library tried to keep pace with the new winds and support streams. Nothing is simpler than converting the array or list returned by some already existing functionality to a stream.

The solution, however, is sub-par: it loses the essence of streams, which is doing only as much work as needed. And this "doing only as much work as needed" should happen while the stream is processed, not while the developer converts a method returning an array or collection into one returning a stream. Streams deliver the results in a lean way, just in time. You see how many expressions we have for being lazy.

The JDK implementation leverages the performance advantages of streams. If you look at the source code, you can see immediately that the implementation is slightly more complex than the simple solution above. Lacking the time to study the implementation, and perhaps lacking the interest, I used another approach to demonstrate that the implementation respects stream laziness.
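The JDK source is not reproduced here. As a rough idea of what lazy splitting involves, the following sketch wraps Matcher.find() in a Spliterator so that each element costs at most one more find() call. LazySplit is a hypothetical helper, not the JDK code. Unlike Pattern.split and splitAsStream, it keeps leading and trailing empty strings.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LazySplit {
    // Each tryAdvance() runs at most one more find(), so the input is
    // scanned only as far as the consumer actually pulls elements.
    static Stream<String> lazySplit(Pattern pattern, CharSequence input) {
        Matcher matcher = pattern.matcher(input);
        return StreamSupport.stream(
                new Spliterators.AbstractSpliterator<String>(Long.MAX_VALUE, Spliterator.ORDERED) {
                    private int current = 0;       // start of the next element
                    private boolean done = false;  // the tail has been emitted

                    @Override
                    public boolean tryAdvance(Consumer<? super String> action) {
                        if (done) {
                            return false;
                        }
                        if (matcher.find()) {
                            action.accept(input.subSequence(current, matcher.start()).toString());
                            current = matcher.end();
                        } else {
                            // no more separators: emit the rest of the input
                            action.accept(input.subSequence(current, input.length()).toString());
                            done = true;
                        }
                        return true;
                    }
                }, false);
    }

    static List<String> splitToList(String regex, String text) {
        return lazySplit(Pattern.compile(regex), text).collect(Collectors.toList());
    }
}
```

A short-circuiting operation such as findFirst() stops after the first find(), which is the kind of laziness the JDK method also provides.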

The argument to the method is a CharSequence and not a String. CharSequence is an interface implemented by String, but we can also implement it ourselves. To get a feeling for how lazy the stream implementation is in this case, I created an implementation of CharSequence that prints out each method call for debugging.

class MyCharSequence implements CharSequence {

    private String me;

    MyCharSequence(String me) {
        this.me = me;
    }

    @Override
    public int length() {
        System.out.println("MCS.length()=" + me.length());
        return me.length();
    }

    @Override
    public char charAt(int index) {
        System.out.println("MCS.charAt(" + index + ")=" + me.charAt(index));
        return me.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        System.out.println("MCS.subSequence(" + start + "," + end + ")="
                                              + me.subSequence(start, end));
        return me.subSequence(start, end);
    }
}

Having this class at hand, I could execute the following simple main method:

public static void main(String[] args) {
    Pattern p = Pattern.compile("[,\\.\\-;]");
    final CharSequence splitIt =
              new MyCharSequence("one.two-three,four;five;");
    p.splitAsStream(splitIt).forEach(System.out::println);
}

The output shows that the implementation is really lazy:

MCS.length()=24
MCS.length()=24
MCS.length()=24
MCS.charAt(0)=o
MCS.charAt(1)=n
MCS.charAt(2)=e
MCS.charAt(3)=.
MCS.subSequence(0,3)=one
one
MCS.length()=24
MCS.charAt(4)=t
MCS.charAt(5)=w
MCS.charAt(6)=o
MCS.charAt(7)=-
MCS.subSequence(4,7)=two
two
MCS.length()=24
MCS.charAt(8)=t
MCS.charAt(9)=h
MCS.charAt(10)=r
MCS.charAt(11)=e
MCS.charAt(12)=e
MCS.charAt(13)=,
MCS.subSequence(8,13)=three
three
MCS.length()=24
MCS.charAt(14)=f
MCS.charAt(15)=o
MCS.charAt(16)=u
MCS.charAt(17)=r
MCS.charAt(18)=;
MCS.subSequence(14,18)=four
four
MCS.length()=24
MCS.charAt(19)=f
MCS.charAt(20)=i
MCS.charAt(21)=v
MCS.charAt(22)=e
MCS.charAt(23)=;
MCS.subSequence(19,23)=five
five
MCS.length()=24

The implementation goes ahead, and when it finds the first element for the stream, it returns it. We can process the string "one", and it processes further characters only when we come back for more elements. Why does it have to call the method length three times at the start? I have no idea. Perhaps it wants to be very sure that the length of the sequence does not magically change.
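The same laziness can be checked without reading a trace. The following sketch uses a hypothetical CountingCharSequence that silently records the highest index read. It takes only the first element with findFirst() and reports how far the matcher had to look. No exact index is promised here. The point is only that it stays well before the end of the sequence.

```java
import java.util.regex.Pattern;

// Hypothetical helper: wraps a String and records the highest index
// passed to charAt() or reached by subSequence().
class CountingCharSequence implements CharSequence {
    private final String text;
    private int highestIndexRead = -1;

    CountingCharSequence(String text) {
        this.text = text;
    }

    @Override
    public int length() {
        return text.length();
    }

    @Override
    public char charAt(int index) {
        highestIndexRead = Math.max(highestIndexRead, index);
        return text.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        highestIndexRead = Math.max(highestIndexRead, end - 1);
        return text.subSequence(start, end);
    }

    @Override
    public String toString() {
        return text;
    }

    int highestIndexRead() {
        return highestIndexRead;
    }
}

class LazinessCheck {
    // Takes only the first element and reports how far the sequence was read.
    static int readUpToForFirstElement(String text) {
        CountingCharSequence seq = new CountingCharSequence(text);
        Pattern.compile("[,\\.\\-;]").splitAsStream(seq).findFirst();
        return seq.highestIndexRead();
    }
}
```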

Moral

This is a good example of how a library has to be extended to support streams. It is not a problem if the library just converts the collection or array to a stream in the first version. But if analysis shows that the performance pays back the investment, then real stream laziness should be implemented.

Side note

The implementation of CharSequence may be mutable, but the processing requires that it remain constant; otherwise the result is undefined. I can confirm that.

Next week I will show a possible use of the splitAsStream that makes use of the feature that it does not read further in the character sequence than it is needed.

noException in stream operation

This article is about some simple coding practice. Nothing really fancy. It is also discussed on StackOverflow.

You have just refactored a huge and complex loop into a more readable stream expression, forgetting that some of the method calls throw exceptions. The method containing this code throws this exception; it is declared in the method header. You do not want to deal with this exception on this level, because it is taken care of on higher levels of the call stack. And you get that annoying error in the code, like a splinter under the nail.

Say you want to convert strings to IP addresses.

private static final String[] allowed = {"127.0.0.1", "::1"};

...

Arrays.stream(allowed)
      .map(InetAddress::getByName)
      .collect(Collectors.toSet());

The problem is that getByName(String host) throws UnknownHostException. This is not a RuntimeException, so it is a checked exception. But the method map() needs a Function as an argument, and Function.apply does not declare any checked exception. We need a version of getByName that does not throw a checked exception (or we need to use a different language that is more lame with exceptions).

Arrays.stream(allowed)
      .map(s -> {
          try {
              return InetAddress.getByName(s);
          } catch (UnknownHostException e) {
              throw new RuntimeException(e);
          }
      })
      .collect(Collectors.toSet());

This is just uglier and messier than the original loop was. Could this try/catch thing be put into a utility class, so that we call some lame static method that wraps the actual call? Kind of, yes. Import the following method statically:

    public interface ExceptionalSupplier<T> {
        T apply() throws Exception;
    }
...
    public static <T> T lame(ExceptionalSupplier<T> z) {
        try {
            return z.apply();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

After the import you can write

Arrays.stream(allowed)
      .map(s -> lame(() -> InetAddress.getByName(s)))
      .collect(Collectors.toSet());

The catch is that you cannot just lame( ... ) the call. You have to convert it to an exceptional supplier: a functional interface that looks like Supplier but allows exceptions.

Still not ideal. (Well, it is Java, so what did you expect?) Okay. It is Java, but it still can be made better. What if instead of converting the expression through a supplier to an expression that is not throwing the exception we could convert the “Function” that throws the exception into one that is not throwing the exception. We need a method that accepts an exceptional function and returns a normal function. That way we can save the () -> noise in our code. Readability rulez.

    public interface ExceptionalFunction<T, R> {
        R apply(T r) throws Exception;
    }
...
    public static <T, R> Function<T, R> lame(ExceptionalFunction<T, R> f) {
        return (T r) -> {
            try {
                return f.apply(r);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

With that utility the “final” expression will be

Collection<InetAddress> allowedAddresses =
        Arrays.stream(allowed)
              .map(lame(InetAddress::getByName))
              .collect(Collectors.toSet());

The actual utility class in the GIST defines a WrapperException extending RuntimeException so that you can catch the exception somewhere in the method, like

public void myMethod() throws IOException {
    try {
        // ... do whatever we do here ...
    } catch (RuntimeExceptionWrapper.WrapperException we) {
        throw (IOException) we.getCause();
    }
}

That way the method throws the original checked exception, while any other RuntimeException thrown anywhere inside still propagates uncaught.
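The GIST itself is not reproduced here. The following is a minimal self-contained sketch of the same idea. It assumes that WrapperException simply carries the checked exception as its cause, and LameDemo, readConfig and readAll are hypothetical names. Unchecked exceptions pass through unwrapped, so the catch block only ever sees wrapped checked exceptions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class LameDemo {
    // Assumed shape of the wrapper: it only carries the checked exception.
    static class WrapperException extends RuntimeException {
        WrapperException(Exception cause) {
            super(cause);
        }
    }

    @FunctionalInterface
    interface ExceptionalFunction<T, R> {
        R apply(T t) throws Exception;
    }

    static <T, R> Function<T, R> lame(ExceptionalFunction<T, R> f) {
        return t -> {
            try {
                return f.apply(t);
            } catch (RuntimeException e) {
                throw e;                        // unchecked: pass through unchanged
            } catch (Exception e) {
                throw new WrapperException(e);  // checked: wrap
            }
        };
    }

    // Literal IP addresses are parsed without any DNS lookup.
    static Set<InetAddress> toAddresses(String... hosts) throws UnknownHostException {
        try {
            return Arrays.stream(hosts)
                    .map(lame(InetAddress::getByName))
                    .collect(Collectors.toSet());
        } catch (WrapperException we) {
            throw (UnknownHostException) we.getCause();
        }
    }

    // Hypothetical stand-in operation that always fails with a checked exception.
    static String readConfig(String name) throws IOException {
        throw new IOException("cannot read " + name);
    }

    static List<String> readAll(String... names) throws IOException {
        try {
            return Arrays.stream(names)
                    .map(lame(LameDemo::readConfig))
                    .collect(Collectors.toList());
        } catch (WrapperException we) {
            throw (IOException) we.getCause();
        }
    }
}
```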

This is just a simple, nice little trick that helps you keep up with Java, which is backward compatible. The alternative would be starting development in some other language that is modern and clutter-free and lets you focus more on the functionality you need to code instead of on coding techniques.

New Regex Features in Java 9

I recently received my complimentary copy of the book "Java 9 Regular Expressions" by Anubhava Srivastava, published by Packt. The book is a good tutorial and introduction for anyone who wants to learn what regular expressions are, starting from scratch. For those who already know how to use regex, the book may still be interesting to refresh their knowledge and to dig into more complex features like zero-length assertions, back references and the like.

In this article I will focus on the regular expression features that are specific to Java 9 and were not available in earlier versions of the JDK. There are not many, though.

Java 9 Regular Expression Module

The JDK in Java 9 is split up into modules. One could rightfully expect a new module for the regular expression handling packages and classes. Actually, there is none. The module java.base is the default module on which all other modules depend, so the classes of its exported packages are always available in Java applications. The regular expression package java.util.regex is exported by this module. This makes development a bit simpler: there is no need to explicitly 'require' a module if we want to use regular expressions in our code. It seems that regular expressions are so essential to Java that they were included in the base module.

Regular Expression Classes

The package java.util.regex contains the classes

  • MatchResult
  • Matcher
  • Pattern and
  • PatternSyntaxException

The only class whose API has changed is Matcher.

Changes in class Matcher

The class Matcher adds five new methods. Four of those are overloaded versions of already existing methods. These are:

  • appendReplacement
  • appendTail
  • replaceAll
  • replaceFirst
  • results

The first four exist in earlier versions, and the only change is in the types of the arguments (after all, that is what overloading means).

appendReplacement/Tail

In the case of appendReplacement and appendTail, the only difference is that the argument can also be a StringBuilder and not only a StringBuffer. StringBuilder was introduced in Java 1.5, something like 13 years ago, so nobody should call this a rash act.

It is interesting, though, how the JDK API documentation currently online describes the behavior of appendReplacement with a StringBuilder argument. The older method, taking a StringBuffer argument, explicitly documents that the replacement string may contain named references that will be replaced by the corresponding group. The version taking a StringBuilder misses this. The documentation looks as if it had been copy/pasted and then edited: "buffer" is replaced by "builder" and the like, and the text documenting the named reference feature is deleted.

I tried the functionality using Java 9 build 160, and the outcome is the same for these two method versions. This should not be a surprise, since the source code of the two methods is the same: a simple copy/paste in the JDK, except for the argument type.

It seems that you can use

    @Test
    public void testAppendReplacement() {

        Pattern p = Pattern.compile("cat(?<plural>z?s?)");
        //Pattern p = Pattern.compile("cat(z?s?)");
        Matcher m = p.matcher("one catz two cats in the yard");
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(sb, "dog${plural}");
            //m.appendReplacement(sb, "dog$001");
        }
        m.appendTail(sb);
        String result = sb.toString();
        assertEquals("one dogz two dogs in the yard", result);
    }

either the commented lines or the line above each of them. The documentation, however, speaks only about the numbered references.

replaceAll/First

This is also an "old" method, one that replaces matches with new strings. The only difference between the old version and the new one is how the replacement string is provided. In the old version the string was given as a String calculated before the method was invoked. In the new version the string is provided by a Function<MatchResult,String>. This function is invoked for each match result, so the replacement string can be calculated on the fly.

The interface Function was introduced only 3 years ago, in Java 8, so its new use in regular expressions may look a little slap-dash. Or perhaps we should see this as a hint that ten years from now, when Function is 13 years old, we will still have Java 9?

Let's dig a bit deeper into these two methods. (Actually only into replaceAll, because replaceFirst is the same except that it replaces only the first match.) I tried to create some not absolutely intricate examples where such a use could be valuable.
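For completeness, here is a small sketch of the replaceFirst counterpart, using the same input as the JDK documentation sample below. ReplaceFirstDemo, upperFirstDog and priceTag are hypothetical names. It also shows one detail worth knowing for both methods: the string returned by the function is still interpreted as a replacement string. A literal '$' or '\' therefore has to be quoted with Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceFirstDemo {
    // Only the first match is handed to the function (Java 9+).
    static String upperFirstDog(String text) {
        Matcher matcher = Pattern.compile("dog").matcher(text);
        return matcher.replaceFirst(mr -> mr.group().toUpperCase());
    }

    // "$5" would be read as a reference to group 5; quoting makes it literal.
    static String priceTag(String text) {
        Matcher matcher = Pattern.compile("#").matcher(text);
        return matcher.replaceAll(mr -> Matcher.quoteReplacement("$5"));
    }
}
```

Here upperFirstDog("zzzdogzzzdogzzz") gives "zzzDOGzzzdogzzz", and priceTag("costs #") gives "costs $5".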

The first sample is from the JDK documentation:

    @Test
    public void demoReplaceAllFunction() {
        Pattern pattern = Pattern.compile("dog");
        Matcher matcher = pattern.matcher("zzzdogzzzdogzzz");
        String result = matcher.replaceAll(mr -> mr.group().toUpperCase());
        assertEquals("zzzDOGzzzDOGzzz", result);
    }

It is not too complex, and it shows the functionality. The use of a lambda expression is absolutely adequate. I cannot imagine a simpler way to uppercase the constant string literal "dog". Perhaps only writing "DOG". Okay, I am just kidding. But really, this example is too simple. It is okay for the documentation, where anything more complex would distract the reader from the functionality of the documented method. Really: do not expect more intricate examples in a JavaDoc. It describes how to use the API, not why the API was created and designed that way.

But here and now we will look at some more complex examples. We want to replace in a string the # characters with the numbers 1, 2, 3 and so on. The string contains numbered items and in case we insert a new one into the string we do not want to renumber manually. Sometimes we group two items, in which case we write ## and then we just want to skip a serial number for the next #. Since we have a unit test the code describes the functionality better than I can put it into words:

    @Test
    public void countSampleReplaceAllFunction() {
        AtomicInteger counter = new AtomicInteger(0);
        Pattern pattern = Pattern.compile("#+");
        Matcher matcher = pattern.matcher("# first item\n" +
                "# second item\n" +
                "## third and fourth\n" +
                "## item 5 and 6\n" +
                "# item 7");
        String result = matcher.replaceAll(mr -> "" + counter.addAndGet(mr.group().length()));
        assertEquals("1 first item\n" +
                "2 second item\n" +
                "4 third and fourth\n" +
                "6 item 5 and 6\n" +
                "7 item 7", result);
    }

The lambda expression passed to replaceAll gets the counter and calculates the next value. If we used one #, it increases the counter by 1. If we used two, it adds two, and so on. A lambda expression cannot change the value of a local variable in the surrounding environment (the variable has to be effectively final), so the counter cannot be an int or Integer variable. We need an object that holds an int value and can be changed. AtomicInteger is exactly that, even if we do not use its atomic feature.

The next example goes even further and does some mathematical calculation. It replaces any floating point formatted number in the string with its sine. That way it corrects our sentence, since sin(pi) is not even close to pi (which cannot be expressed precisely here anyway); it is rather close to zero:

    @Test
    public void calculateSampleReplaceAllFunction() {
        Pattern pattern = Pattern.compile("\\d+(?:\\.\\d+)?(?:[Ee][+-]?\\d{1,2})?");
        Matcher matcher = pattern.matcher("The sin(pi) is 3.1415926");
        String result = matcher.replaceAll(mr -> "" + (Math.sin(Double.parseDouble(mr.group()))));
        assertEquals("The sin(pi) is 5.3589793170057245E-8", result);
    }

We will also play around a bit with this calculation for the demonstration of the last method in our list, which is a brand new one in the Matcher class.

Stream results()

The new method results() returns a stream of the matching results. To be more precise it returns a Stream of MatchResult objects. In the example below we use it to collect any floating point formatted number from the string and print their sine value comma separated:

    @Test
    public void resultsTest() {
        Pattern pattern = Pattern.compile("\\d+(?:\\.\\d+)?(?:[Ee][+-]?\\d{1,2})?");
        Matcher matcher = pattern.matcher("Pi is around 3.1415926 and not 3.2 even in Indiana");
        String result = String.join(",",
                matcher
                        .results()
                        .map(mr -> "" + (Math.sin(Double.parseDouble(mr.group()))))
                        .collect(Collectors.toList()));
        assertEquals("5.3589793170057245E-8,-0.058374143427580086", result);
    }
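Each MatchResult in the stream also carries the position of the match, and the stream supports any terminal operation. The following sketch collects the start positions of the numbers in a string and counts them. ResultsDemo and its methods are hypothetical names, and the pattern is simplified to plain digit runs.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ResultsDemo {
    // Start index of every run of digits (Java 9+).
    static List<Integer> startsOfNumbers(String text) {
        return Pattern.compile("\\d+").matcher(text)
                .results()
                .map(mr -> mr.start())
                .collect(Collectors.toList());
    }

    // Number of digit runs, without building any intermediate list.
    static long countNumbers(String text) {
        return Pattern.compile("\\d+").matcher(text).results().count();
    }
}
```

For "a1b22c333", startsOfNumbers returns [1, 3, 6] and countNumbers returns 3.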

Summary

The new regular expression methods introduced in the Java 9 JDK are not essentially different from what was already available. They are neat and handy, and in some situations they may ease programming. There is nothing that could not have been introduced in an earlier version. This is just the way of Java: changes to the JDK are made slowly and thoughtfully. After all, that is why we love Java, don't we?

The whole code, copy-pasted from the IDE, can be found and downloaded from the following gist

What is private in Java 9?

When doing interviews I find that most candidates do not know what the private modifier in Java really means. They know enough about it for everyday coding, but their knowledge is far from complete. It is not a problem. Knowing enough is, well… enough. But it is still interesting to know some of the inner workings of Java. In some rare cases it may shed light on some details. If nothing else, then it is entertaining .orElse(whyDoYouReadIt)


By the way: mentioning interviews is a good opportunity to write rants, even if the statements and implications related to my person are, in my view, false. After all, my person is not important. Setting aside the fact that it criticizes me, I find that article interesting, and its conclusions about interviews are important and totally in line with my opinion.

This article is to describe some of the Java facts hopefully in a bit more readable way than reading the language standard.

So what is private?

private is an access modifier in Java. If a class has a private member (a method, a field, an inner or nested class, or a nested interface), it can only be used by code that is in the same class. The interesting question is: what happens when the private method is in more than one class? How can it be in more than one class? Suppose a class contains another class, and there is a private method inside the inner/nested class. Then that method is inside the inner/nested class and also inside the top level class.

Can a private method inside an enclosed class be called from the outer class? Can code inside an enclosed class call a private method in the outer class? The answer is yes in both cases. The sample code

package javax0.package1;

class TopLevelClass {

  void topMethod(){
    NestedClass nc = new NestedClass();
    nc.method();
  }
  
  private int z;

  interface NestedInterface {
    default void method(){
      TopLevelClass tlc = new TopLevelClass();
      tlc.z++;
    }
  }

  static class NestedClass {
    private int k;

    private void method() {
      TopLevelClass tlc = new TopLevelClass();
      k = tlc.z;
    }
  }
}

clearly shows this situation: the nested class NestedClass and the nested interface NestedInterface both contain code that accesses the private field z of the outer class. Similarly, the top level class code can call the private method inside the nested class. It does not matter here that this sample code does not actually do anything reasonable.

If we compile this single source file we get three class files:

  1. TopLevelClass$NestedClass.class
  2. TopLevelClass$NestedInterface.class
  3. TopLevelClass.class

That is because the JVM does not know anything about nested and top level classes. For the JVM a class is just a class; a top level class, if you insist. That is mainly because the Java language 1.0 did not have nested and inner classes, and the JVM was designed accordingly. When inner and nested classes were introduced in Java 1.1, only the compiler was modified, not the JVM. Inner and nested classes therefore remained a language feature that the JVM does not handle directly.
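The $ in the generated file names is visible from Java code as well. The following is a small sketch with hypothetical classes Outer and BinaryNames, mirroring the structure of the sample above. Class.getName() returns the binary name, which is the same name the class file gets.

```java
// Hypothetical classes mirroring the structure of the sample above.
class Outer {
    static class Nested {
    }

    interface NestedInterface {
    }
}

class BinaryNames {
    // The binary names use '$', just like the generated .class files.
    static String nestedName() {
        return Outer.Nested.class.getName();
    }

    static String nestedInterfaceName() {
        return Outer.NestedInterface.class.getName();
    }
}
```

In the default package these return "Outer$Nested" and "Outer$NestedInterface"; in a named package the package prefix comes first.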

How can the top level class access a private method in another class that was nested in the source code but, once compiled, is just another "top level" class? They are on the same level. If the accessibility were changed to public, then other classes could also access it, but they cannot. The compiler does not allow any code in other classes to access the private method. Even if we used some trick to get past the compiler, the generated class file would make the JVM throw an exception. Private in Java is private.

What really happens is that the compiler generates special getter and setter methods to get access to the field z.

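Such a bridge method is created for every private field or method that is accessed from a different class inside the same top level class. If the private whatever is not accessed from another class, then the method is not generated. If the field is only read, then only the getter is generated; if it is only set from outside, then only the setter is generated. (This describes javac up to Java 10. Since Java 11, nest-based access control (JEP 181) lets the JVM permit private access between classes of the same nest, and javac no longer generates these synthetic accessor methods.)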

It is also a common misconception that a private field (or whatever) is accessible only from within the same object. That is the usual way we use these members when we program. But if the code has a reference to another instance of the same type, then through that reference it can access the private fields of the other object just as well as "its own" fields. Is this a rare case? You may think so because you rarely program it. In reality it is extremely frequent, but the IDE usually generates that code for us, which is why some developers do not think about it. Without this it would hardly be possible to code the equals(Object other) method of a class.
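Here is a small sketch of that everyday case, using a hypothetical value class Point. Its equals method reads the private fields of another instance, which private access permits because the code is inside the same class.

```java
import java.util.Objects;

// Hypothetical value class: equals() reads the private fields of
// another instance, which is allowed inside the same class.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Point)) {
            return false;
        }
        Point that = (Point) other;
        return x == that.x && y == that.y;   // that.x is private to Point, not to 'this'
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```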

What about Java 9?

So far there is nothing specific to Java 9 in this article and these days every Java article should be about Java 9 (or 10 already?).

If we look at access control generally then we have to talk about JPMS, and there are many great articles about that. codeFx has a good list of articles about it. Stephen Colebourne has nice articles.

Soon you will be able even to buy books about Java module systems from different publishers. I am in a lucky position that I can already read one in draft from Packt as a reviewer and I love it. But JPMS does not change “private” on this level. Still there will be nested classes and inner classes and bridge methods exactly the same way as before.

The little difference is that Java 9 now has private methods inside interfaces. This means that we should now be prepared to find synthetic bridge methods not only in inner and nested classes, but also in interfaces.
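Here is a minimal sketch of the Java 9 feature itself, with hypothetical names Greeter and GreeterDemo. A private interface method holds code shared by two default methods without becoming part of the interface's public API.

```java
// Java 9+: a private interface method shared by default methods.
interface Greeter {
    String name();

    default String hello() {
        return greet("Hello");
    }

    default String bye() {
        return greet("Bye");
    }

    // Not visible to implementing classes or to callers of the interface.
    private String greet(String word) {
        return word + ", " + name() + "!";
    }
}

class GreeterDemo {
    static String helloTo(String who) {
        Greeter g = () -> who;   // name() is the only abstract method
        return g.hello();
    }
}
```

GreeterDemo.helloTo("Duke") returns "Hello, Duke!".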

Takeaway …

Sometimes the simplest things are not as simple as they seem. After all the whole IT technology, science, engineering is nothing else but a bunch of zeroes and ones. It is just that we have a lot of them. Really a lot. If there was something new to you in this article then it should tell you that there are areas in the Java language and in the JVM that you may be interested to examine a bit more. For example:

  • What is the difference between a nested and an inner class?
  • Can you have a nested interface inside a class and similarly can you have an inner interface inside a class?
  • What about classes or interfaces inside an interface? Can you have an inner class in an interface? How about a nested class?
  • Can you write code using reflection that lists all the methods a class has? Will it list the synthetic methods? What modifiers will they have?
  • When you compile an inner class it will have the compiled name Outer$Inner.class, which is a legitimate name. But what happens if there is an Outer$Inner.java source file? Figure it out!
  • The generated synthetic methods also have legitimate names. What happens if you define a method with that name? Is what you see defined by the Java specification, or is it implementation specific?
  • How deep can you nest inner and nested classes and/or interfaces? Can a nested class contain an inner class? Can an inner class contain a nested class?
  • What is your guess, why there is no symbolic name in the JDK for the synthetic modifier? Why can the actual modifier value be the same as the value for volatile fields?
  • Can you have a static field, class or method inside a nested class?

The answers to those questions are not practical knowledge, I know. I have never seen any code or project where knowing that an inner class cannot have a static field gave any advantage. On the other hand, thinking about these questions and finding the answers may give you some joy, like solving crosswords if that is your taste. The knowledge may also be useful, aiding your understanding of the technology in ways we do not recognize. In some situations one person just finds a bug faster than another because she "feels" the technology. At such moments you cannot tell what whispered the solution into your ear, but something did: knowledge like the above. It will only do so, though, if you love to dig into those fine bits of the technology.

Last a trick question, even less practical than those above just for entertainment, if you like:

Puzzle

We know that it is not possible to have a static field inside an inner (not nested) class. Is it still possible to have a compiled class file generated by the Java compiler from an inner class that has a static method?

Process Handling in Java 9

Managing operating system processes in Java has always been a daunting task. The reason is the poor tooling and the poor API available. To be honest, that is not without reason: Java was not meant for this purpose. If you wanted to manage OS processes, you had the shell, Perl scripts, whatever you wanted. For larger applications facing more complex tasks, you were supposed to program the solution in C or C++.

When you really had to manage processes from Java, you had to create operating system dependent code. It was possible: you could query system properties (such as os.name) or environment variables and implement different behavior depending on the operating system. This approach works up to and including Java 8, but it has several drawbacks: testing costs more and development is more complex. As Java became more and more mature and widespread, the demand for this type of application arose. For example, the question https://stackoverflow.com/questions/6356340/killing-a-process-using-java, posted on StackOverflow in 2011, had more than a hundred thousand views. Some applications, and thus some developers, need a solution to this problem, one that is really a solution and not a workaround.
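
To make the pre-Java 9 situation concrete, here is a minimal sketch of such OS-dependent code. The class `LegacyKiller` and its methods are hypothetical. The sketch assumes that `taskkill` is available on Windows and `kill` on every other system, and it still needs the pid from somewhere else, which before Java 9 usually meant parsing the output of yet another external command.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical pre-Java 9 helper: the code has to know which OS it runs on
// and which external command can kill a process there.
class LegacyKiller {

    // Builds the OS-specific command line that kills the process with the given pid.
    static List<String> killCommandFor(long pid, String osName) {
        String pidText = Long.toString(pid);
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            // taskkill ships with Windows; /F forces the termination
            return Arrays.asList("taskkill", "/F", "/PID", pidText);
        }
        // assumption: every other system is Unix-like and has kill; -9 sends SIGKILL
        return Arrays.asList("kill", "-9", pidText);
    }

    // Runs the external command and returns its exit code (0 usually means success).
    static int kill(long pid) throws IOException, InterruptedException {
        List<String> command = killCommandFor(pid, System.getProperty("os.name"));
        Process killer = new ProcessBuilder(command).inheritIO().start();
        return killer.waitFor();
    }
}
```

Every branch here is a separate code path to test on a separate operating system, which is exactly the cost described above.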

In this case, providing an API in the JDK is a solution. It will not make process handling OS-independent: operating systems differ, and process handling is an area very much tied to the OS. The system-dependent part of the code, however, moves into the JDK runtime, where the Java development team tests it, instead of every application testing it separately. This eases the testing burden on the application side. In addition, development becomes cheaper, as the API is already there and we do not need to program it separately for BSD, OS X, Linux and Windows, not to mention OpenVMS. Finally, the application may run faster. Again, an example: if we needed the list of running processes, we had to start an external process that dumped the list of processes to its standard output. The output of this process had to be captured and parsed as a string. With the advent of Java 9 we have a simple call for that, implemented by invoking the appropriate operating system call. It needs neither the execution of a separate process nor the parsing of string output for information that was already there, just not available in Java.
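
The difference can be sketched like this. `legacyPids()` starts `ps -e -o pid=` (so it assumes a POSIX `ps` on the PATH and does not work on Windows) and parses its text output, while `runningPids()` uses the Java 9 `ProcessHandle.allProcesses()` call. The class name `ProcessLister` is made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class ProcessLister {

    // Before Java 9: start an external "ps" process and parse its text output.
    // Assumes a POSIX ps on the PATH; the list also contains the ps process itself.
    static List<Long> legacyPids() throws IOException, InterruptedException {
        Process ps = new ProcessBuilder("ps", "-e", "-o", "pid=").start();
        List<Long> pids = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(ps.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    pids.add(Long.parseLong(line));
                }
            }
        }
        ps.waitFor();
        return pids;
    }

    // Java 9+: one call, no extra process, no string parsing.
    static List<Long> runningPids() {
        return ProcessHandle.allProcesses()
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }
}
```
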
For all the details of process handling in Java 9, you can read the documentation currently available at http://download.java.net/java/jdk9/docs/api/overview-summary.html, or, soon, the book Mastering Java 9 from Packt (https://www.packtpub.com/application-development/mastering-java-9), in which I wrote the chapter about process handling. In this article I will talk about why we need the new interface ProcessHandle. The reason may not be evident to developers who are less experienced with operating system processes and how the operating system works.

ProcessHandle

In short, an instance of ProcessHandle represents an operating system process. All operating systems identify alive processes using PIDs, a TLA (three-letter acronym) abbreviating Process Identifier. These are small (or not so small) integer numbers. An operating system could use something else, like names or some cryptic strings, but none does. There would be no benefit, and as it happens all of them use numbers to identify processes.

When we program in an OO manner, we abstract the problem so that our model better explains what we model. There is a rule, however, that we should not make our model more abstract than the problem itself: that only introduces unnecessary complexity to the application, increasing its cost. In this case it seems obvious (or rather oblivious) to use an int to identify a process. If the operating system does not make it more abstract, why should we? Just because in Java everything is an object? (Btw: not true.)

The reason we do it anyway is that there is no one-to-one match between PIDs and ProcessHandle instances. Let’s re-read the first two sentences of this section:

“… ProcessHandle represents an operating system process. All operating systems identify alive processes using PIDs …”

There is that little word “alive” in the second sentence, and believe me, it makes a difference. Being alive is very different from being dead, although we do not have a firsthand direct comparison. A ProcessHandle instance may keep a reference to a process that has already been wiped off from memory. Imagine that you look at the list of processes on Linux by issuing the ‘ps -ef’ command, and you see that Tomcat is eating the CPU and consuming ever-increasing memory, most likely because the application you deployed has a looping bug. You decide to kill the process, so you look at the displayed pid and issue the command ‘kill -9 666’, if the pid happens to be 666. By that time the process has eaten up all the memory it could get from the OS, and because you did not configure any swap on the machine, the JVM has disappeared without a trace. The kill command will complain that there is no process with the given pid. It may also happen that the operating system has already started a totally different process that happens to have the same pid. Has it ever happened? Now you shake your head, because it has never happened in your practice. On Linux the default limit for pids is 32768. When will that ever wrap around? Actually, not that long, but usually not so soon that the pid gets reused between issuing the ‘ps’ and ‘kill’ commands. And what happens if a small embedded system sets /proc/sys/kernel/pid_max smaller? Say, much smaller, like 16, which fits into four bits? It may not be a big problem when you issue the command interactively, because you are there, and if the system crashes you can restart the process or the whole system if needed. You can take corrective action if you made a “mistake”. Java applications are not that intelligent, and they should not have the chance, even on an embedded system, to kill a process they did not mean to.

process handling based on pid

To handle that situation, Java has the interface ProcessHandle. Instead of pids we have ProcessHandles. If we need the ProcessHandle of the currently running process (the JVM), we can call the static method ProcessHandle::current (note that I used the nice Java 8 method reference notation). You can get the pid of the current process by calling pid() on that ProcessHandle instance, but after a while you will stop doing it. Wanting the pid of a process is just an old habit. You do not need it when you have the handle.
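
A minimal sketch of getting the handle of the running JVM and of turning a raw pid back into a handle; the class name `CurrentProcessDemo` is hypothetical. `ProcessHandle.of(pid)` returns an Optional, which is empty when no process with that pid is visible. The JDK implementation may use extra information, such as the process start time, to avoid confusing a handle with a later process that reuses the same pid; the sketch treats that as an implementation detail and does not rely on it.

```java
import java.util.Optional;

class CurrentProcessDemo {

    // The handle of the running JVM; pid() is still there, e.g. for log messages.
    static String describeSelf() {
        ProcessHandle self = ProcessHandle.current();
        return "JVM pid=" + self.pid() + " alive=" + self.isAlive();
    }

    // Turns a raw pid into a handle. The Optional is empty if no process
    // with this pid is visible at the moment of the call.
    static Optional<ProcessHandle> lookup(long pid) {
        return ProcessHandle.of(pid);
    }
}
```
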

When you have a process handle, say processHandle, you can get a Stream<ProcessHandle> by calling processHandle.children(). This lists the immediate child processes. If you want a “transitive closure”, that is, not only the children but also the children of the children and so on, you have to call processHandle.descendants(). But what if you are really greedy and want to get a hand(le) on all processes? Then you should call the static method ProcessHandle::allProcesses.
Streams are famous for being populated lazily, creating the next element only when needed. For a process list that would lead to interesting results, because processes may start and stop while the stream is being consumed. Therefore, in this case the data set backing the stream of processes is a snapshot created when children(), descendants() or allProcesses() is called.
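
The difference between children() and descendants() is easiest to see by walking the tree. In this sketch (class and method names hypothetical) `render` calls children() recursively and produces an indented tree, while `countDescendants` gets the same set flattened by descendants(). Because each call works on a snapshot, a process that starts or stops during the walk may or may not show up.

```java
class ProcessTree {

    // Renders the handle and, recursively, its children() as an indented tree,
    // one "pid command" line per process.
    static String render(ProcessHandle root) {
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    private static void render(ProcessHandle handle, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(handle.pid())
           .append(' ')
           .append(handle.info().command().orElse("<unknown>"))
           .append('\n');
        // children() returns only the immediate children, as a snapshot
        handle.children().forEach(child -> render(child, depth + 1, out));
    }

    // descendants() flattens children, grandchildren and so on into one stream.
    static long countDescendants(ProcessHandle root) {
        return root.descendants().count();
    }
}
```

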
Now we have a handle to a process. What can we do with it?

We can call processHandle.destroy(), and we can also call processHandle.destroyForcibly(). That is what everybody wanted, as the cited StackOverflow question shows. We can also check whether the process the handle refers to is still alive by calling processHandle.isAlive(). You can also get the handle of the parent process by calling processHandle.parent(). Note that not all processes have a parent process. One of them never had one, and any other process may become an orphan when its parent process has terminated. For this reason, the return type of this method is Optional<ProcessHandle>. Java 9 has new features in the Optional class as well, but that is a different story; here we focus on the processes.
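
Because parent() returns an Optional, walking up towards the root of the process tree is a loop that stops at the first empty result. A sketch with a hypothetical `Ancestry` helper; the `seen` set is a defensive addition against following a cycle, not something the API requires.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class Ancestry {

    // Collects the pids of the parent, grandparent, ... of the given process,
    // nearest first, until parent() returns an empty Optional.
    static List<Long> ancestorPids(ProcessHandle handle) {
        List<Long> pids = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        seen.add(handle.pid());
        Optional<ProcessHandle> parent = handle.parent();
        // stop on an empty Optional, or if a pid shows up a second time
        while (parent.isPresent() && seen.add(parent.get().pid())) {
            ProcessHandle p = parent.get();
            pids.add(p.pid());
            parent = p.parent();
        }
        return pids;
    }
}
```
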

If the process is still alive and we want to wait for its termination, we can do it in a modern, asynchronous way. We can get a CompletableFuture from the process handle by calling processHandle.onExit(); it completes when the process terminates. Java 9 has new features in the CompletableFuture class as well, but that is a different story; here we focus on the processes. Do I repeat myself?
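
Putting destroy(), destroyForcibly() and onExit() together gives the classic terminate-then-kill pattern. This is a sketch with a hypothetical `Terminator` class and a caller-chosen grace period. On Unix-like systems destroy() and destroyForcibly() typically map to SIGTERM and SIGKILL, but the javadoc leaves the exact behaviour to the implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class Terminator {

    // Asks the process to stop, waits up to graceMillis, then kills it forcibly.
    // Returns true once the process has exited, false if it is still running
    // after the forcible attempt. Passing ProcessHandle.current() throws
    // IllegalStateException.
    static boolean terminate(ProcessHandle handle, long graceMillis)
            throws InterruptedException, ExecutionException {
        // onExit() completes asynchronously when the process terminates
        CompletableFuture<ProcessHandle> exited = handle.onExit();
        if (!handle.isAlive()) {
            return true;
        }
        handle.destroy(); // normal termination request
        if (waitFor(exited, graceMillis)) {
            return true;
        }
        handle.destroyForcibly(); // the process had its chance
        return waitFor(exited, graceMillis);
    }

    // Waits for the future; false means it did not complete in time.
    private static boolean waitFor(CompletableFuture<ProcessHandle> exited, long millis)
            throws InterruptedException, ExecutionException {
        try {
            exited.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        }
    }
}
```

For a child started with ProcessBuilder, pass its handle, for example `Terminator.terminate(new ProcessBuilder("sleep", "30").start().toHandle(), 2000)` on a system that has a `sleep` command.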

There is an interface inside the interface ProcessHandle called Info. We can get an instance of it from the process handle by calling processHandle.info(). Through this instance we can access the arguments as an optional string array, the command line as an optional string, the command as an optional string, and the user the process belongs to, also as an optional string. We can also get information about when the process was started and about its total CPU usage, in the form of an optional Instant and an optional Duration. These classes were introduced in Java 8, and Java 9 has new features … Okay, it starts to be boring.
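
A sketch that prints every Info attribute (the class name `InfoDemo` is hypothetical). All accessors return an Optional, since the operating system or the access rights may not let the JDK see a value, so each one needs a fallback; here it is simply "n/a".

```java
import java.time.Duration;
import java.time.Instant;

class InfoDemo {

    // One "name: value" line per ProcessHandle.Info attribute.
    static String describe(ProcessHandle handle) {
        ProcessHandle.Info info = handle.info();
        return "pid: " + handle.pid()
                + "\ncommand: " + info.command().orElse("n/a")
                + "\narguments: " + info.arguments().map(args -> String.join(" ", args)).orElse("n/a")
                + "\ncommand line: " + info.commandLine().orElse("n/a")
                + "\nuser: " + info.user().orElse("n/a")
                + "\nstarted: " + info.startInstant().map(Instant::toString).orElse("n/a")
                + "\nCPU time: " + info.totalCpuDuration().map(Duration::toString).orElse("n/a");
    }
}
```
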

Summary

What can we do with all these features? In the book I mentioned, I created a simple process-controlling application. I had to create a similar one in Perl around 2006. It starts processes as described in a configuration file and restarts any of them that fails. But this is only one example. There are other scenarios where process handling can be handy. Say you want to fill in forms and convert them to PDF. To do that, you start some word processor with command-line parameters that make it do the conversion. The tasks are queued and started one after the other, and to keep performance reasonable you convert at most a configurable n documents in n processes at the same time. If a process takes too long, you kill it, send a message about it to the person who submitted the request to your conversion server, and schedule it to run during the night or some other less busy period.
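
The kill-if-it-takes-too-long part of such a conversion server could look like the following sketch. `TimedRunner` and `Outcome` are hypothetical names; queueing, notifying the requester and rescheduling are left out, and the output of the started program is simply discarded. It kills the descendants first, on the assumption that a program launched this way may start helper processes of its own.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

class TimedRunner {

    enum Outcome { FINISHED, KILLED }

    // Starts the command and kills it (and its descendants) if it does not
    // finish within the timeout.
    static Outcome runWithTimeout(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+
                .start();
        if (process.waitFor(timeout, unit)) {
            return Outcome.FINISHED;
        }
        // Took too long: take the snapshot of descendants while the parent is still alive
        process.descendants().forEach(ProcessHandle::destroyForcibly);
        process.destroyForcibly();
        process.waitFor();
        return Outcome.KILLED;
    }
}
```
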

We can develop such programs in Java without external shell, Python or Perl scripts, which simply makes the project simpler and cheaper.

Who needs Java modules after all?

Oleg Selajev asked on Twitter:


Jigsaw questions for 1000. I as an X want JPMS modules. What is X if it’s not a platform developer?

My answer is that X is a human being (minus platform developers, because that was a condition). We all need a module system to have safer code, resulting in more reliable systems, resulting in better business performance, resulting in a better economy, resulting in human happiness. Perhaps I went a bit too far with the conclusions, but the point is that a module system is needed by everyone in the industry, whether they are aware of it or not. You will get it. First of all, we should start with the ob(li)vious answer to the question.

Nicolai Parlog said: Every library developer whose types are not all public.

Very true. As a library developer I want to design my libraries so that I separate the API. I want to separate the public interface from the implementation. This is what we programmers call “encapsulation”. It is soooo good to encapsulate. We love to do it! We do it because it is hilarious! We love it!
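
With JPMS this separation is declared in the module descriptor. A minimal sketch, assuming a library with the made-up module name `com.example.mylib`, an API package `com.example.mylib.api` and an implementation package `com.example.mylib.internal`: only the exported package is accessible to other modules, while the internal one stays hidden unless someone overrides this on the command line (for example with `--add-exports`).

```java
// module-info.java of a hypothetical library; all names are made up.
module com.example.mylib {
    // the public API: accessible to every module that reads com.example.mylib
    exports com.example.mylib.api;

    // com.example.mylib.internal is deliberately not exported (nor opened),
    // so other modules can neither compile against it nor reach it reflectively
}
```
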

On second thought, though, we do it because it is a tool to create reliable, bug-free (he he he) software.

Encapsulating the internal state and the implementation helps the developers who use my code to write better code. It is a bit like raising children. As a parent, I disallow certain things that they would otherwise do: eat lots of chocolate, stay up late, and so on. And this is for their own good, even though they do not see or understand it at the moment. Later, of course, when they grow up and become parents themselves, they will understand and do the same. It is not much different with library developers and library-using programmers, except, perhaps, that programmers never grow up.

Similarly, I as a library developer need JPMS for the sake of the developers who are going to use my code. My library will not be better or worse just because I encapsulate. (Side note: it will be better, but not because of a smaller number of bugs in it.) One could easily conclude that I, as a library developer, want JPMS the least. Who cares if you, dear programmer, shoot off your testicles using my library? It is your responsibility to call only the public API and not some frequently modified internal class or method. Or is it?

Not really. It is also my responsibility to create a library that is easy to use and hard to misuse. This is what we call usability, and this is where the JPMS module system comes into the picture. Without JPMS, all I can do is document which packages are public and which are implementation-specific. The contract between me and the users of the library is that they will not use the private parts of my library, and in return for this good behaviour I will keep the public part of the library stable, so that they do not need to change their code from release to release. (Btw., has anyone ever realized how literally bloody this name, JPMS, is? What the fly? PMS, really? Not to mention all the cramps currently related to it: nomen est omen. It was not a lucky baptism.)

Let’s get back on track: why do we need a module system for that? Developers are disciplined (he he he) people, and they do not want to harm themselves. They should not and will not use the internal code of the library. That is not good for them in the long run, and they are well aware of that. The catch is the long-run thingy. In the long run we are all dead. There will be a point during development, typically a few days before the release date, when some of the internal APIs of a library just seem too tempting not to use. In some weird way those internal calls are exactly what you need. You know that you are not supposed to use them, and there is a good, well-mannered solution, but it needs more time to develop, and with the release date approaching you do not have the time to follow that pattern. Not to mention feeling proud of the “I can do that” and “how well I know these tools” thoughts, instead of feeling ashamed of tampering with the parts of the library that are private.

That is where the Java Module System comes into the picture. You will not be able to take shortcuts. You will sigh, remembering the good old days when Java was open to the whole world, whatever there was on the class path (not to mention FORTRAN programming, am I right, or just the contrary, I am right?), but you will follow the rules, because it just will not work otherwise.
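
The rules are visible at run time as well. This sketch (the `ExportCheck` class is hypothetical) asks a module whether it exports a package to everybody. For java.base, java.lang is exported, while jdk.internal.misc is an internal package that is not exported unconditionally.

```java
class ExportCheck {

    // True only if the module containing anyClassInModule exports the package
    // unconditionally, i.e. the package is part of that module's public API.
    static boolean isExportedToEveryone(Class<?> anyClassInModule, String packageName) {
        Module module = anyClassInModule.getModule(); // java.lang.Module, Java 9+
        return module.isExported(packageName);
    }
}
```

For example, `isExportedToEveryone(Object.class, "java.lang")` asks the java.base module about its most public package.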

You may think that you are not susceptible to such vanity as using the internal parts of a library. Here is a test: did you notice that I used the expression “internal APIs of a library”? If not, then feel ashamed, but don’t admit it. No need. The Java Module System will help you forget things that do not exist, like internal APIs. Nonsense. An API is public. There is no such thing as an internal API. The resulting code will be better, easier to maintain, less sensitive to library upgrades and thus, at the bottom line, cheaper.

In the long run, when we are all dead, our offspring will create better code, and module-level encapsulation will be an obvious thing, just like world peace will be by then.

So I need the Java Module System, you need it, and everybody else needs it, for a better world and for the sake of world peace.

Disclaimer: The author of the article no speaks English as naive language 😉