Inspired by Actual Events: Java 6

Wednesday, February 18, 2015

Determining File Types in Java

Programmatically determining the type of a file can be surprisingly tricky and there have been many content-based file identification approaches proposed and implemented. There are several implementations available in Java for detecting file types and most of them are largely or solely based on files' extensions. This post looks at some of the most commonly available implementations of file type detection in Java.

Several approaches to identifying file types in Java are demonstrated in this post. Each approach is briefly described, illustrated with a code listing, and then associated with output that demonstrates how different common files are typed based on extensions. Some of the approaches are configurable, but all examples shown here use "default" mappings as provided out-of-the-box unless otherwise stated.

About the Examples

The screen snapshots shown in this post are of each listed code snippet run against certain subject files created to test the different implementations of file type detection in Java. Before covering these approaches and demonstrating the type each approach detects, I list the files under test along with each file's name, extension, and actual type.

File Name	File Extension	File Type	Type Matches Extension Convention?
actualXml.xml	xml	XML	Yes
blogPostPDF		PDF	No
blogPost.pdf	pdf	PDF	Yes
blogPost.gif	gif	GIF	Yes
blogPost.jpg	jpg	JPEG	Yes
blogPost.png	png	PNG	Yes
blogPostPDF.txt	txt	PDF	No
blogPostPDF.xml	xml	PDF	No
blogPostPNG.gif	gif	PNG	No
blogPostPNG.jpg	jpg	PNG	No
dustin.txt	txt	Text	Yes
dustin.xml	xml	Text	No
dustin		Text	No

Files.probeContentType(Path) [JDK 7]

Java SE 7 introduced the highly utilitarian Files class and that class's Javadoc succinctly describes its use: "This class consists exclusively of static methods that operate on files, directories, or other types of files" and, "in most cases, the methods defined here will delegate to the associated file system provider to perform the file operations."

The java.nio.file.Files class provides the method probeContentType(Path) that "probes the content type of a file" through use of "the installed FileTypeDetector implementations" (the Javadoc also notes that "a given invocation of the Java virtual machine maintains a system-wide list of file type detectors").

/**
 * Identify file type of file with provided path and name
 * using JDK 7's Files.probeContentType(Path).
 *
 * @param fileName Name of file whose type is desired.
 * @return String representing identified type of file with provided name.
 */
public String identifyFileTypeUsingFilesProbeContentType(final String fileName)
{
   String fileType = "Undetermined";
   final File file = new File(fileName);
   try
   {
      fileType = Files.probeContentType(file.toPath());
   }
   catch (IOException ioException)
   {
      out.println(
           "ERROR: Unable to determine file type for " + fileName
              + " due to exception " + ioException);
   }
   return fileType;
}

When the above Files.probeContentType(Path)-based approach is executed against the set of files previously defined, the output appears as shown in the next screen snapshot.

The screen snapshot indicates that the default behavior for Files.probeContentType(Path) on my JVM seems to be tightly coupled to the file extension. The files with no extensions show "null" for file type and the other listed file types match the files' extensions rather than their actual content. For example, all three files with names starting with "dustin" are really the same single-sentence text file, but Files.probeContentType(Path) states that they are each a different type and the listed types are tightly correlated with the different file extensions for essentially the same text file.
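Because Files.probeContentType(Path) consults the installed FileTypeDetector implementations, one way to make it look at content is to supply a detector of your own. The following is only a minimal sketch of that idea: the class name PdfSignatureDetector is hypothetical, it recognizes nothing but the "%PDF-" signature that PDF files begin with, and it was not part of the tests shown in this post. To be picked up by Files.probeContentType(Path), such a class would need to be public and listed in a META-INF/services/java.nio.file.spi.FileTypeDetector file on the class path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;

/**
 * Hypothetical detector (not part of the JDK) that reports "application/pdf"
 * when a file starts with the PDF signature "%PDF-", whatever its extension.
 * Declare it public in its own source file before registering it as a service.
 */
class PdfSignatureDetector extends FileTypeDetector
{
   private static final byte[] PDF_SIGNATURE =
      "%PDF-".getBytes(StandardCharsets.US_ASCII);

   @Override
   public String probeContentType(final Path path) throws IOException
   {
      if (!Files.isRegularFile(path))
      {
         return null;
      }
      final byte[] header = new byte[PDF_SIGNATURE.length];
      int total = 0;
      try (InputStream in = Files.newInputStream(path))
      {
         // Keep reading until the header is full or the file ends.
         while (total < header.length)
         {
            final int read = in.read(header, total, header.length - total);
            if (read == -1)
            {
               break;
            }
            total += read;
         }
      }
      // Returning null means "not recognized" so other detectors can be tried.
      return total == header.length && Arrays.equals(header, PDF_SIGNATURE)
         ? "application/pdf"
         : null;
   }
}
```

The detector can also be called directly, as in new PdfSignatureDetector().probeContentType(path), which is convenient for trying it out before registering it.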

MimetypesFileTypeMap.getContentType(String) [JDK 6]

The class MimetypesFileTypeMap was introduced with Java SE 6 to provide "data typing of files via their file extension" using "the .mime.types format." The class's Javadoc explains where in a given system the class looks for MIME types file entries. My example uses the ones that come out-of-the-box with my JDK 8 installation. The next code listing demonstrates use of javax.activation.MimetypesFileTypeMap.

/**
 * Identify file type of file with provided name using
 * JDK 6's MimetypesFileTypeMap.
 *
 * See Javadoc documentation for MimetypesFileTypeMap class
 * (http://docs.oracle.com/javase/8/docs/api/javax/activation/MimetypesFileTypeMap.html)
 * for details on how to configure mapping of file types or extensions.
 */
public String identifyFileTypeUsingMimetypesFileTypeMap(final String fileName)
{
   final MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
   return fileTypeMap.getContentType(fileName);
}

The next screen snapshot demonstrates the output from running this example against the set of test files.

This output indicates that the MimetypesFileTypeMap approach returns the MIME type of application/octet-stream for several files including the XML files and the text files without a .txt suffix. We see also that, like the previously discussed approach, this approach in some cases uses the file's extension to determine the file type and so incorrectly reports the file's actual file type when that type is different than what its extension conventionally implies.

URLConnection.getContentType()

I will be covering three methods in URLConnection that support file type detection. The first is URLConnection.getContentType(), a method that "returns the value of the content-type header field." Use of this instance method is demonstrated in the next code listing and the output from running that code against the common test files is shown after the code listing.

/**
 * Identify file type of file with provided path and name
 * using JDK's URLConnection.getContentType().
 *
 * @param fileName Name of file whose type is desired.
 * @return Type of file for which name was provided.
 */
public String identifyFileTypeUsingUrlConnectionGetContentType(final String fileName)
{
   String fileType = "Undetermined";
   try
   {
      final URL url = new URL("file://" + fileName);
      final URLConnection connection = url.openConnection();
      fileType = connection.getContentType();
   }
   catch (MalformedURLException badUrlEx)
   {
      out.println("ERROR: Bad URL - " + badUrlEx);
   }
   catch (IOException ioEx)
   {
      out.println("Cannot access URLConnection - " + ioEx);
   }
   return fileType;
}

The file detection approach using URLConnection.getContentType() is highly coupled to files' extensions rather than the actual file type. When there is no extension, the String returned is "content/unknown."
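The listing above builds its URL by prepending "file://" to the file name, which assumes an absolute path with no characters that need escaping. A hedged alternative is to let java.nio.file.Path produce the URL. The class and method names below (FileUrlContentType, contentTypeOf) are hypothetical, and this sketch was not run against the test files in this post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Paths;

/**
 * Sketch: ask URLConnection for the content type of a local file, building
 * the file: URL from a Path rather than by string concatenation.
 */
class FileUrlContentType
{
   static String contentTypeOf(final String fileName) throws IOException
   {
      // toAbsolutePath() resolves relative names; toUri() escapes characters
      // such as spaces that are not legal in a URL.
      final URL url = Paths.get(fileName).toAbsolutePath().toUri().toURL();
      final URLConnection connection = url.openConnection();
      // Opening (and then closing) the stream releases the underlying file
      // handle; an IOException is thrown here if the file does not exist.
      try (InputStream in = connection.getInputStream())
      {
         return connection.getContentType();
      }
   }

   public static void main(final String[] arguments) throws IOException
   {
      for (final String fileName : arguments)
      {
         System.out.println(fileName + " -> " + contentTypeOf(fileName));
      }
   }
}
```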

URLConnection.guessContentTypeFromName(String)

The second file detection approach provided by URLConnection that I'll cover here is its method guessContentTypeFromName(String). Use of this static method is demonstrated in the next code listing and associated output screen snapshot.

/**
 * Identify file type of file with provided path and name
 * using JDK's URLConnection.guessContentTypeFromName(String).
 *
 * @param fileName Name of file whose type is desired.
 * @return Type of file for which name was provided.
 */
public String identifyFileTypeUsingUrlConnectionGuessContentTypeFromName(final String fileName)
{
   return URLConnection.guessContentTypeFromName(fileName);
}

URLConnection's guessContentTypeFromName(String) approach to file detection shows "null" for files without file extensions and otherwise returns file type String representations that closely mirror the files' extensions. These results are very similar to those provided by the Files.probeContentType(Path) approach shown earlier with the one notable difference being that URLConnection's guessContentTypeFromName(String) approach identifies files with .xml extension as being of file type "application/xml" while Files.probeContentType(Path) identifies these same files' types as "text/xml".
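Since guessContentTypeFromName(String) receives only a name, it cannot inspect content at all, and the named file does not even need to exist. The short sketch below illustrates this with made-up file names; the class name NameOnlyGuess is hypothetical.

```java
import java.net.URLConnection;

/**
 * Sketch: guessContentTypeFromName(String) works purely from the name, so
 * it returns an answer for files that are not present on disk.
 */
class NameOnlyGuess
{
   static String guess(final String fileName)
   {
      return URLConnection.guessContentTypeFromName(fileName);
   }

   public static void main(final String[] arguments)
   {
      // Neither of these (made-up) files needs to exist.
      System.out.println("noSuchFile.gif -> " + guess("noSuchFile.gif"));
      System.out.println("noSuchFile     -> " + guess("noSuchFile"));
   }
}
```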

URLConnection.guessContentTypeFromStream(InputStream)

The third approach I cover that is provided by URLConnection for file type detection is via the class's static method guessContentTypeFromStream(InputStream). A code listing employing this approach and associated output in a screen snapshot are shown next.

/**
 * Identify file type of file with provided path and name
 * using JDK's URLConnection.guessContentTypeFromStream(InputStream).
 *
 * @param fileName Name of file whose type is desired.
 * @return Type of file for which name was provided.
 */
public String identifyFileTypeUsingUrlConnectionGuessContentTypeFromStream(final String fileName)
{
   String fileType;
   try
   {
      fileType = URLConnection.guessContentTypeFromStream(new FileInputStream(new File(fileName)));
   }
   catch (IOException ex)
   {
      out.println("ERROR: Unable to process file type for " + fileName + " - " + ex);
      fileType = "null";
   }
   return fileType;
}

All the file types are null! The reason for this appears to be explained by the Javadoc for the InputStream parameter of the URLConnection.guessContentTypeFromStream(InputStream) method: "an input stream that supports marks." It turns out that the instances of FileInputStream in my examples do not support marks (their calls to markSupported() all return false).
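A possible remedy, sketched here rather than run against the test files above, is to wrap the FileInputStream in a BufferedInputStream, which does support marks. The class and method names (MarkableStreamGuess, guessFromContent) are hypothetical. Note that the JDK recognizes only a limited set of leading byte patterns (GIF and PNG among them), so a null result is still possible for other types.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

/**
 * Sketch: give URLConnection.guessContentTypeFromStream(InputStream) a
 * stream that supports mark/reset so it can peek at the leading bytes.
 */
class MarkableStreamGuess
{
   static String guessFromContent(final String fileName) throws IOException
   {
      // BufferedInputStream.markSupported() returns true, unlike the
      // FileInputStream it wraps; try-with-resources closes both streams.
      try (InputStream in = new BufferedInputStream(new FileInputStream(fileName)))
      {
         return URLConnection.guessContentTypeFromStream(in);
      }
   }

   public static void main(final String[] arguments) throws IOException
   {
      for (final String fileName : arguments)
      {
         System.out.println(fileName + " -> " + guessFromContent(fileName));
      }
   }
}
```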

Apache Tika

All of the examples of file detection covered in this post so far have been approaches provided by the JDK. There are third-party libraries that can also be used to detect file types in Java. One example is Apache Tika, a "content analysis toolkit" that "detects and extracts metadata and text from over a thousand different file types." In this post, I look at using Tika's facade class and its detect(String) method to detect file types. The instance method call is the same in the three examples I show, but the results are different because each instance of the Tika facade class is instantiated with a different Detector.

The instantiation of Tika instances with different Detectors is shown in the next code listing.

/** Instance of Tika facade class with default configuration. */
private final Tika defaultTika = new Tika();

/** Instance of Tika facade class with MimeTypes detector. */
private final Tika mimeTika = new Tika(new MimeTypes());

/** Instance of Tika facade class with Type detector. */
private final Tika typeTika = new Tika(new TypeDetector());

With these three instances of Tika instantiated with their respective Detectors, we can call the detect(String) method on each instance for the set of test files. The code for this is shown next.

/**
 * Identify file type of file with provided name using
 * Tika's default configuration.
 *
 * @param fileName Name of file for which file type is desired.
 * @return Type of file for which file name was provided.
 */
public String identifyFileTypeUsingDefaultTika(final String fileName)
{
   return defaultTika.detect(fileName);
}

/**
 * Identify file type of file with provided name using
 * Tika with a MimeTypes detector.
 *
 * @param fileName Name of file for which file type is desired.
 * @return Type of file for which file name was provided.
 */
public String identifyFileTypeUsingMimeTypesTika(final String fileName)
{
   return mimeTika.detect(fileName);
}

/**
 * Identify file type of file with provided name using
 * Tika with a TypeDetector.
 *
 * @param fileName Name of file for which file type is desired.
 * @return Type of file for which file name was provided.
 */
public String identifyFileTypeUsingTypeDetectorTika(final String fileName)
{
   return typeTika.detect(fileName);
}

When the three above Tika detection examples are executed against the same set of files used in the previous examples, the output appears as shown in the next screen snapshot.

We can see from the output that the default Tika detector reports file types similarly to some of the other approaches shown earlier in this post (very tightly tied to the file's extension). The other two demonstrated detectors state that the file type is application/octet-stream in most cases. Because I called the overloaded version of detect(...) that accepts a String, the file type detection is "based on known file name extensions."

If the overloaded detect(File) method is used instead of detect(String), the identified file type results are much better than the previous Tika examples and the previous JDK examples. In fact, the "fake" extensions don't fool the detectors as much and the default Tika detector is especially good in my examples at identifying the appropriate file type even when the extension is not the normal one associated with that file type. The code for using Tika.detect(File) and the associated output are shown next.

   /**
    * Identify file type of file with provided name using
    * Tika's default configuration.
    *
    * @param fileName Name of file for which file type is desired.
    * @return Type of file for which file name was provided.
    */
   public String identifyFileTypeUsingDefaultTikaForFile(final String fileName)
   {
      String fileType;
      try
      {
         final File file = new File(fileName);
         fileType = defaultTika.detect(file);
      }
      catch (IOException ioEx)
      {
         out.println("Unable to detect type of file " + fileName + " - " + ioEx);
         fileType = "Unknown";
      }
      return fileType;
   }

   /**
    * Identify file type of file with provided name using
    * Tika with a MimeTypes detector.
    *
    * @param fileName Name of file for which file type is desired.
    * @return Type of file for which file name was provided.
    */
   public String identifyFileTypeUsingMimeTypesTikaForFile(final String fileName)
   {
      String fileType;
      try
      {
         final File file = new File(fileName);
         fileType = mimeTika.detect(file);
      }
      catch (IOException ioEx)
      {
         out.println("Unable to detect type of file " + fileName + " - " + ioEx);
         fileType = "Unknown";
      }
      return fileType;
   }

   /**
    * Identify file type of file with provided name using
    * Tika with a TypeDetector.
    *
    * @param fileName Name of file for which file type is desired.
    * @return Type of file for which file name was provided.
    */
   public String identifyFileTypeUsingTypeDetectorTikaForFile(final String fileName)
   {
      String fileType;
      try
      {
         final File file = new File(fileName);
         fileType = typeTika.detect(file);
      }
      catch (IOException ioEx)
      {
         out.println("Unable to detect type of file " + fileName + " - " + ioEx);
         fileType = "Unknown";
      }
      return fileType;
   }

Caveats and Customization

File type detection is not a trivial feat to pull off. The Java approaches for file detection demonstrated in this post provide basic approaches to file detection that are often highly dependent on a file name's extension. If files are named with conventional extensions that are recognized by the file detection approach, these approaches are typically sufficient. However, if unconventional file type extensions are used or the extensions are for files with types other than that conventionally associated with that extension, most of these approaches to file detection break down without customization. Fortunately, most of these approaches provide the ability to customize the mapping of file extensions to file types. The Tika approach using Tika.detect(File) was generally the most accurate in the examples shown in this post when the extensions were not the conventional ones for the particular file types.

Conclusion

There are numerous mechanisms available for simple file type detection in Java. This post reviewed some of the standard JDK approaches for file detection and some examples of using Tika for file detection.

Monday, January 26, 2015

Reading Large Lines Slower in JDK 7 and JDK 8

I recently ran into a case where a particular task (LineContainsRegExp) in an Apache Ant build file ran considerably slower in JDK 7 and JDK 8 than it did in JDK 6 for extremely long character lines. Based on a simple example adapted from the Java code used by the LineContainsRegExp task, I was able to determine that the slowness has nothing to do with the regular expression, but rather has to do with reading characters from a file. The remainder of the post demonstrates this.

For my simple test, I first wrote a small Java class to write out a file that includes a line with as many characters as specified on the command line. The simple class, FileMaker, is shown next:

FileMaker.java

import static java.lang.System.out;

import java.io.FileWriter;

/**
 * Writes a file with a line that contains the number of characters provided.
 */
public class FileMaker
{
   /**
    * Create a file with a line that has the number of characters specified.
    *
    * @param arguments Command-line arguments where the first argument is the
    *    name of the file to be written and the second argument is the number
    *   of characters to be written on a single line in the output file.
    */
   public static void main(final String[] arguments)
   {
      if (arguments.length > 1)
      {
         final String fileName = arguments[0];
         final int maxRowSize = Integer.parseInt(arguments[1]);
         try
         {
            final FileWriter fileWriter = new FileWriter(fileName);
            for (int count = 0; count < maxRowSize; count++)
            {
               fileWriter.write('.');
            }
            fileWriter.flush();
            // Close explicitly; try-with-resources would not compile with JDK 6.
            fileWriter.close();
         }
         catch (Exception ex)
         {
            out.println("ERROR: Cannot write file '" + fileName + "': " + ex.toString());
         }
      }
      else
      {
         out.println("USAGE: java FileMaker <fileName> <maxRowSize>");
         System.exit(-1);
      }
   }
}

The above Java class exists solely to generate a file with a line that has as many characters as specified (actually one more than specified when the \n is counted). The next class actually demonstrates the difference in runtime behavior between Java 6 and Java 7. The code for this Main class is adapted from Ant classes that help perform the file reading functionality used by LineContainsRegExp without the regular expression matching. In other words, the regular expression support is not included in my example, but this class executes much more quickly for very large lines when run in Java 6 than when run in Java 7 or Java 8.

Main.java

import static java.lang.System.out;

import java.io.IOException;
import java.io.FileReader;
import java.io.Reader;
import java.util.concurrent.TimeUnit;

/**
 * Adapted from and intended to represent the basic character reading from file
 * used by the Apache Ant class org.apache.tools.ant.filters.LineContainsRegExp.
 */
public class Main
{
   private Reader in;
   private String line;

   public Main(final String nameOfFile)
   {
      if (nameOfFile == null || nameOfFile.isEmpty())
      {
         throw new IllegalArgumentException("ERROR: No file name provided.");
      }
      try
      {
         in = new FileReader(nameOfFile);
      }
      catch (Exception ex)
      {
         out.println("ERROR: " + ex.toString());
         System.exit(-1);
      }
   }
   

   /**
    * Read a line of characters through '\n' or end of stream and return that
    * line of characters with '\n'; adapted from readLine() method of Apache Ant
    * class org.apache.tools.ant.filters.BaseFilterReader.
    */
   protected final String readLine() throws IOException
   {
      int ch = in.read();

      if (ch == -1)
      {
         return null;
      }
        
      final StringBuilder line = new StringBuilder();

      while (ch != -1)
      {
         line.append ((char) ch);
         if (ch == '\n')
         {
            break;
         }
         ch = in.read();
      }

      return line.toString();
   }

   /**
    * Provides the next character in the stream; adapted from the method read()
    * in the Apache Ant class org.apache.tools.ant.filters.LineContainsRegExp.
    */
   public int read() throws IOException
   {
      int ch = -1;
 
      if (line != null)
      {
         ch = line.charAt(0);
         if (line.length() == 1)
         {
            line = null;
         }
         else
         {
            line = line.substring(1);
         }
      }
      else
      {
         for (line = readLine(); line != null; line = readLine())
         {
            if (line != null)
            {
               return read();
            }
         }
      }
      return ch;
   }

   /**
    * Process provided file and read characters from that file and display
    * those characters on standard output.
    *
    * @param arguments Command-line arguments; expect one argument which is the
    *    name of the file from which characters should be read.
    */
   public static void main(final String[] arguments) throws Exception
   {
      if (arguments.length > 0)
      {
         final long startTime = System.currentTimeMillis();
         out.println("Processing file '" + arguments[0] + "'...");
         final Main instance = new Main(arguments[0]);
         int characterInt = 0;
         int totalCharacters = 0;
         while (characterInt != -1)
         {
            characterInt = instance.read();
            totalCharacters++;
         }
         final long endTime = System.currentTimeMillis();
         out.println(
              "Elapsed Time of "
            + TimeUnit.MILLISECONDS.toSeconds(endTime - startTime)
            + " seconds for " + totalCharacters + " characters.");
      }
      else
      {
         out.println("ERROR: No file name provided.");
      }
   }
}

The runtime performance difference when comparing Java 6 to Java 7 or Java 8 is more pronounced as the lines get larger in terms of number of characters. The next screen snapshot demonstrates running the example in Java 6 (indicated by "jdk1.6" being part of path name of java launcher) and then in Java 8 (no explicit path provided because Java 8 is my default JRE) against a freshly generated file called dustin.txt that includes a line with 1 million (plus one) characters.

Although a Java 7 example is not shown in the screen snapshot above, my tests have shown that Java 7 has similar slowness to Java 8 in terms of processing very long lines. Also, I have seen this in Windows and RedHat Linux JVMs. As the example indicates, the Java 6 version, even for a million characters in a line, reads the file in what rounds to 0 seconds. When the same compiled-for-Java-6 class file is executed with Java 8, the average length of time to handle the 1 million characters is over 150 seconds (2 1/2 minutes). This same slowness applies when the class is executed in Java 7 and also exists even when the class is compiled with JDK 7 or JDK 8.

Java 7 and Java 8 get disproportionately slower at reading file characters as the number of characters on a line increases. When I raise the 1 million character line to 10 million characters as shown in the next screen snapshot, Java 6 still reads the line very quickly (still rounded to 0 seconds), but Java 8 requires over 5 hours to complete the task! That is more than 100 times the elapsed time for 10 times the characters.

I don't know why Java 7 and Java 8 read a very long line from a file so much slower than Java 6 does. I hope that someone else can explain this. While I have several ideas for working around the issue, I would like to understand why Java 7 and Java 8 read lines with very large number of characters so much slower than Java 6. Here are the observations that can be made based on my testing:

The issue appears to be a runtime issue (JRE) rather than a JDK issue because even the file-reading class compiled with JDK 6 runs significantly slower in JRE 7 and JRE 8.
Both the Windows 8 and RedHat Linux JRE environments consistently indicated that the file reading is dramatically slower for very large lines in Java 7 and in Java 8 than in Java 6.
Processing time for reading very long lines grows much faster than linearly with the number of characters in the line in Java 7 and Java 8 (over 150 seconds for 1 million characters versus over 5 hours for 10 million).
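One plausible explanation, which the tests above do not confirm, is a change to String itself: beginning with JDK 7 Update 6, String.substring copies the selected characters into a new array instead of sharing the original backing array. The read() method above calls line.substring(1) once per character, so on those JREs consuming an n-character line copies on the order of n*n/2 characters, which fits the roughly hundredfold slowdown for a tenfold longer line. The sketch below shows one possible workaround that tracks a position in the line instead of shrinking the String; the class name IndexedLineReader is hypothetical and this is not Ant's code.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Sketch of a workaround: serve characters from the current line by index
 * rather than by repeatedly calling substring(1) on it.
 */
class IndexedLineReader
{
   private final Reader in;
   private String line;
   private int position;

   IndexedLineReader(final Reader reader)
   {
      this.in = reader;
   }

   /** Reads through '\n' or end of stream; returns null at end of stream. */
   private String readLine() throws IOException
   {
      int ch = in.read();
      if (ch == -1)
      {
         return null;
      }
      final StringBuilder builder = new StringBuilder();
      while (ch != -1)
      {
         builder.append((char) ch);
         if (ch == '\n')
         {
            break;
         }
         ch = in.read();
      }
      return builder.toString();
   }

   /** Returns the next character, or -1 once the stream is exhausted. */
   public int read() throws IOException
   {
      if (line == null || position >= line.length())
      {
         line = readLine();
         position = 0;
         if (line == null)
         {
            return -1;
         }
      }
      // No copying here: only the index advances.
      return line.charAt(position++);
   }

   public static void main(final String[] arguments) throws IOException
   {
      final IndexedLineReader reader =
         new IndexedLineReader(new FileReader(arguments[0]));
      int totalCharacters = 0;
      while (reader.read() != -1)
      {
         totalCharacters++;
      }
      System.out.println("Read " + totalCharacters + " characters.");
   }
}
```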

Thursday, August 16, 2012

Recent Java News - Mid-August 2012

Oracle has released significant announcements regarding Java in recent weeks. I summarize some of these in this post.

Java SE 6 End of Life Extended to 2013

Henrik Stahl announced in the post Java 6 End of Public Updates extended to February 2013 that "the Oracle JDK 6 End of Public Updates will be extended through February, 2013." As Stahl states in that post, this is additional time beyond the previously announced extension from July 2012 to November 2012. The Oracle Java SE Support Roadmap has been updated to reflect this new extension.

Java SE 7 Update 6 Released

The 14 August 2012 Oracle Press Release titled Oracle Releases New Java Updates - Java SE 7 Update 6, JavaFX 2.2 and JavaFX Scene Builder 1.0 announces the release of Java SE 7 Update 6 bundled with JavaFX 2.2.

The Java SE 7 Update 6 Release Notes list the significant new features of this release: JDK and JRE Support for Mac OS X, JDK for Linux on ARM, JavaFX SDK and JavaFX Runtime included in JDK 7u6 and JRE 7u6, Java Access Bridge included in JRE 7u6, Alternative Hash Function, and Changes to Security Warning Dialog Box for Trusted Signed and Self Signed Applications.

JavaOne 2012 Schedule Builder Goes Live

Tori Wieldt's post JavaOne Schedule Builder Live announces that Schedule Builder is now live and available for registered JavaOne 2012 attendees.

Oracle and Fortress

Although not technically Java-related, I found it interesting to read that Oracle is throwing in the towel on development of the Fortran-inspired programming language Fortress.

Monday, March 5, 2012

NetBeans 7.1's Internal Compiler and JDK 6 Respecting Return Type for Method Overloading

I was first exposed to Java after several years of C++ experience and so it seemed natural when I learned that Java does not allow method overloading based on return type. The Defining Methods section of the Classes and Objects lesson in the Java Language Tutorial states, "The compiler does not consider return type when differentiating methods, so you cannot declare two methods with the same signature even if they have a different return type." Indeed, as Vinit Joglekar has pointed out, "It is an accepted fact that Java does not support return-type-based method overloading." The StackOverflow thread Java - why no return type based method overloading? explains why this is the case in Java. Given this, I was surprised when a colleague showed me a code snippet with two overloaded methods with the same runtime signature that compiled in JDK 6 as long as the return types differed.

The following class compiles successfully with JDK 6, but not with JDK 7.

Compiles in JDK 6 But Not in JDK 7

package examples.dustin;

import java.util.Collection;

/**
 * Simple example that breaks in Java SE 7, but not in Java SE 6.
 * 
 * @author Dustin
 */
public class Main
{
   public static String[] collectionToArray(final Collection<String> strings)
   {
      return new String[] { "five" };
   }

   public static int[] collectionToArray(final Collection<Integer> integers)
   {
      return new int[] { 5 };
   }
   
   /**
    * Main function.
    * 
    * @param arguments The command line arguments; none expected.
    */
   public static void main(String[] arguments)
   {
   }
}

As described in Angelika Langer's What Is Method Overloading?, the above code should not compile. In Java SE 7, it does not. In NetBeans 7.1, it's a mixed bag.

As the screen snapshot below demonstrates, NetBeans 7.1 builds the source code above fine as shown in the Output Window when the version of Java associated with the project is Java SE 6. However, the NetBeans editor shows the red squiggly lines indicating compiler error. The next image shows what the error message is.

Although NetBeans 7.1 is able to build the code shown above when it's part of a project associated with Java SE 6 (Update 31 in this case), the code editor still reports the error shown above. This is because NetBeans uses a different version of the Java compiler internally than the one explicitly associated with the project being edited. If I change the version of Java associated with the NetBeans project for the source code above, it will no longer build in NetBeans. This is shown next.

There are a couple of interesting things about this bug. First, the fact that this code compiles fine in Java SE 6 but is rejected by the Java SE 7 compiler, where the bug has been addressed, means that it is possible for code working in Java SE 6 to not work when the code base is moved to Java SE 7. I downloaded the latest version of JDK 6 available (Java SE 6 Update 31) and confirmed the original code shown above still builds in Java SE 6. It does not build in Java SE 7.

There are other versions of the code above that do not build in Java SE 6 or in Java SE 7. For example, if the code above is changed so that the methods return the same type, the code doesn't build even in Java SE 6. Similarly, if the Collection parameters to the two overloaded methods include a "raw" Collection (no parameterized type), it won't compile in Java SE 6 either. Of course, even if the return types are different, if the same Collection parameterized types are passed to both overloaded methods, even Java SE 6 won't compile this. These three situations are depicted in the following three screen snapshots.

The code that builds in Java SE 6 but not in Java SE 7 needs to have overloaded methods that differ in both return types and in terms of the parameterized types of the collections that make up their method parameters. It doesn't matter if a given return type matches or is related to the parameterized type of the method's parameter as long as they differ. If the return types are the same, Java SE 6 detects a compiler error. Java SE 6 also detects the error if the erased parameters boil down to the same collection after erasure and the return types are not different.
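To make the erasure point concrete, the sketch below gives the two methods distinct names (one possible fix, not necessarily the one my colleague chose) and uses reflection to show that both declare the same erased parameter type, java.util.Collection. The names ErasureSketch, stringsToArray, integersToArray, and erasedParameterType are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;

/**
 * Sketch: after type erasure, Collection<String> and Collection<Integer>
 * parameters are both plain Collection, so overloads that differ only in
 * those type arguments clash. Distinct method names avoid the clash.
 */
class ErasureSketch
{
   static String[] stringsToArray(final Collection<String> strings)
   {
      return strings.toArray(new String[0]);
   }

   static int[] integersToArray(final Collection<Integer> integers)
   {
      final int[] result = new int[integers.size()];
      int index = 0;
      for (final Integer value : integers)
      {
         result[index++] = value;
      }
      return result;
   }

   /** Looks up a method by name using only the erased parameter type. */
   static Class<?> erasedParameterType(final String methodName)
      throws NoSuchMethodException
   {
      return ErasureSketch.class
         .getDeclaredMethod(methodName, Collection.class)
         .getParameterTypes()[0];
   }

   public static void main(final String[] arguments) throws Exception
   {
      // Both lookups succeed with Collection.class as the parameter type.
      System.out.println(erasedParameterType("stringsToArray"));
      System.out.println(erasedParameterType("integersToArray"));
      System.out.println(Arrays.toString(stringsToArray(Arrays.asList("five"))));
      System.out.println(Arrays.toString(integersToArray(Arrays.asList(5))));
   }
}
```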

A second interesting thing about this bug is how it's handled in NetBeans. Because NetBeans uses its own internal compiler that does not necessarily match the version of the compiler that the developer has associated with the IDE project, you can run into situations like this where the code actually builds in the IDE, but the IDE's functionality such as code editors and project browsers indicates that the code is broken.

Because NetBeans 7.1 uses its own internal Java compiler for the code editor, one might wonder if this means Java 7 features could be sneaked in and would work in the IDE but then would not build when attempted from the command line or when explicitly built in the IDE. The next screen snapshot demonstrates why that is not the case. In that snapshot, a Java 7 specific feature is in the code and NetBeans 7.1 properly warns that this is not compatible with the Java 1.6 source setting.

Bug 6182950 (methods clash algorithm should not depend on return type) has addressed the issue in JDK 7, but not in JDK 6. A related bug is Bug 6730568 ("Type erasure affects return types + type parameters"). Three additional references that provide significantly more background detail are two StackOverflow threads (Differing behaviour between Java 5 & 6 when overloading generic methods and What is the concept of erasure in generics in java?) and the Java Tutorial entry on Type Erasure.

The colleague who showed me this issue realized its existence because NetBeans 7.1 reported the "name clash ... have the same erasure" error even when he was working with Java SE 6 code. This discovery was "accidental" due to the newer version of NetBeans using a Java SE 7 compiler internally, but he welcomed the opportunity to fix the issue now rather than when he migrates to Java SE 7.

I found this issue worth posting a blog post on because it provides a warning about a bug that may already be in some Java SE 6 code bases but will be made all too evident when the code base is moved to Java SE 7. I also posted this because I think it's important to be aware that modern versions of NetBeans use an internal compiler that may be of a different version than the compiler the developer has explicitly associated with his or her NetBeans project.

Wednesday, February 22, 2012

A Plethora of Java Developments in February 2012

There are several sites (Java.net, JavaWorld, JavaLobby/DZone, Java reddit, and Java Code Geeks) that I like to browse for the latest Java news. These sites are great and bring the best from around the web from the wider Java community. A nice complement to these sites is Oracle Technology Network's Java page. There are several Java stories available on the OTN Java page that originate from Oracle and its employees that are of interest to Java developers. I briefly summarize and link to a subset of these in this blog post.

New Java Language and Java VM Specifications

Alex Buckley's post JLS7 and JVMS7 online announces the availability of new versions of the Java Language Specification and of the Java Virtual Machine Specification. Besides announcing the availability of these new specifications associated explicitly with Java SE 7, the post also provides some interesting background regarding the history of these two specifications. For example, Buckley states, "Only a major Java SE release can change the Java language and JVM." I also find it interesting that these specifications no longer have names based on their edition (was Third Edition for JLS and Second Edition for JVMS). Instead, these two specifications are now named for the edition of Java SE they are associated with. To me, that's much clearer. You may wonder why this wasn't done in the first place and Buckley explains that, "Historically, the JLS and JVMS pre-date the Java Community Process so there was no Java SE platform to which they could be tied." The specifications are available in HTML or PDF format and it is anticipated that they will be published in printed book format in the future.

Java SE 6 End of Life Extended

Henrik Stahl uses the post Updated Java 6 EOL date to announce that the JDK6 "EOL date has been extended from July 2012 to November 2012, to allow some more time for the transition to JDK 7." He also highlights portions of the updated EOL policy. The Oracle Java SE Support Roadmap (AKA "Java SE EOL Policy") was updated on 15 February 2012 with this new EOL date.

New Java Updates

The Java SE News blog contains posts regarding newly available Java updates. The titles of the posts say it all: Java 7 Update 3 and Java 6 Update 31 have released!, 7u4 Developer Preview is now Available, and 6u32 Developer Preview is now Available.

JSR 354: Money and Currency API

The JCP Program Office blog features a post announcing JSR 354: Money and Currency API. This JSR proposal describes deficiencies with the already available java.util.Currency class that will be addressed by the JSR. The "proposed Specification" section states:

This JSR will provide a money and currency API for Java, targeted at all users of currencies and monetary amounts in Java. The API will provide support for standard ISO-4217 and custom currencies, and a representation of a monetary amount. It will support currency arithmetic, even across different currencies, and will support foreign currency exchange. Additionally, implementation details surrounding serialization and thread safety are to be considered.

It sounds like there is some optimism about this making it into Java SE 8.

JavaFX 2 Developer Community

Nicolas Lorain writes in JavaFX 2 and the developer community that "JavaFX 2 was only released in October 2011, but there's already a thriving developer community kicking the tires of the new kid on the block." He adds, "There's no denying that we've pretty much started from scratch with JavaFX 2." Lorain then provides evidence of the growing JavaFX 2 community: an increasing number of discussion threads on the JavaFX 2.0 and Later forum, the developer community contributing roughly 20% of the bug reports related to JavaFX, an "increasing number of people interested in JavaFX are following me" (@javafx4you), and the number of community blog posts on JavaFX (he references JavaFX Links of the Week). Lorain concludes, "pretty much all the [metrics] I've seen show that JavaFX is growing in terms of popularity."

Incidentally, one of the co-authors of Pro JavaFX 2: A Definitive Guide to Rich Clients with Java Technology has provided some details about that book which will soon be in print and is already available in electronic format.

Conclusion

The Java development community seems more lively and more energetic in recent months (especially since JavaOne 2011) than it has been for years. After years of seeming stagnation, Java-related developments appear to be coming at us more quickly again. It is nice to have so many online forums to get information about these developments.

Saturday, February 5, 2011

Generating XML Schema with schemagen and Groovy

I have previously blogged on several utilitarian tools that are provided with the Java SE 6 HotSpot SDK such as jstack, javap, and so forth. In this post, I focus on another tool in the same $JAVA_HOME/bin (or %JAVA_HOME%\bin) directory: schemagen. Although schemagen is typically used in conjunction with web services and/or JAXB, it can be useful in other contexts as well. Specifically, it can be used as an easy way to create a starting point XML Schema Definition (XSD) for someone who is more comfortable with Java than with XML Schema.

We'll begin with a simple Java class called Person to demonstrate the utility of schemagen. This is shown in the next code listing.

package dustin.examples;

public class Person
{
   private String lastName;

   private String firstName;

   private char middleInitial;

   private String identifier;

   /**
    * No-arguments constructor required for 'schemagen' to create XSD from
    * this Java class.  Without this "no-arg default constructor," this error
    * message will be displayed when 'schemagen' is attempted against it:
    *
    *      error: dustin.examples.Person does not have a no-arg default
    *      constructor.
    */
   public Person() {}

   public Person(final String newLastName, final String newFirstName)
   {
      this.lastName = newLastName;
      this.firstName = newFirstName;
   }

   public Person(
      final String newLastName,
      final String newFirstName,
      final char newMiddleInitial)
   {
      this.lastName = newLastName;
      this.firstName = newFirstName;
      this.middleInitial = newMiddleInitial;
   }

   public String getLastName()
   {
      return this.lastName;
   }

   public void setLastName(final String newLastName)
   {
      this.lastName = newLastName;
   }

   public String getFirstName()
   {
      return this.firstName;
   }

   public void setFirstName(final String newFirstName)
   {
      this.firstName = newFirstName;
   }

   public char getMiddleInitial()
   {
      return this.middleInitial;
   }
}

The class above is very simple, but is adequate for the first example of employing schemagen. As the comment on the no-arguments constructor in the above code states, a constructor without arguments (sometimes called a "default constructor") must be available in the class. Because other constructors are in this class, a no-args constructor must be explicitly specified. I also intentionally provided get/set (accessor/mutator) methods for some of the fields, only an accessor for one of the fields, and neither for another field to demonstrate that schemagen requires both get and set methods for a field if the generated schema is to include that field.
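The accessor/mutator pairing can be checked with a little reflection. The following is only an illustrative sketch: it is not how schemagen itself is implemented, and the SamplePerson class and readWriteProperties helper are hypothetical names that mirror the accessor layout of Person. It lists the properties that have both a getter and a setter, which are the ones expected to show up in the schema.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

final class AccessorPairSketch {

    /** Hypothetical stand-in with the same accessor layout as the Person class. */
    public static class SamplePerson {
        private String lastName;
        private String firstName;
        private char middleInitial;
        private String identifier;   // neither getter nor setter

        public String getLastName() { return lastName; }
        public void setLastName(String newLastName) { lastName = newLastName; }
        public String getFirstName() { return firstName; }
        public void setFirstName(String newFirstName) { firstName = newFirstName; }
        public char getMiddleInitial() { return middleInitial; }   // getter only
    }

    /** Names of properties that have both a public getXxx() and a public setXxx(value). */
    static Set<String> readWriteProperties(Class<?> type) {
        Set<String> readable = new TreeSet<>();
        Set<String> writable = new TreeSet<>();
        for (Method method : type.getMethods()) {
            String name = method.getName();
            if (method.getDeclaringClass() == Object.class || name.length() <= 3) {
                continue;   // skips getClass() and names too short to be accessors
            }
            String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            if (name.startsWith("get") && method.getParameterCount() == 0) {
                readable.add(property);
            } else if (name.startsWith("set") && method.getParameterCount() == 1) {
                writable.add(property);
            }
        }
        readable.retainAll(writable);   // keep only properties with both halves
        return readable;
    }

    public static void main(String[] args) {
        System.out.println(readWriteProperties(SamplePerson.class)); // [firstName, lastName]
    }
}
```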

The next screen snapshot demonstrates the simplest use of schemagen, in which the XML schema file (.xsd) is generated with the default name of schema1.xsd (there is no current way to control this directly with schemagen) and is placed in the same directory from which the schemagen command is run (the output location can be dictated with the -d option).

The generated XSD is shown next.

schema1.xsd

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema version="1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string" minOccurs="0"/>
      <xs:element name="lastName" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

This is pretty convenient, but is even easier with Groovy. Suppose that one wanted to generate an XSD using schemagen and did not care about or need the original Java class. The following very simple Groovy class could be used. Very little effort is required to write this, but its compiled .class file can be used with schemagen.

package dustin.examples;

public class Person2
{
   String lastName;

   String firstName;

   char middleInitial;

   String identifier;
}

When the above Groovy class is compiled with groovyc, its resulting Person2.class file can be viewed through another useful tool (javap) located in the same directory as schemagen. This is shown in the next screen snapshot. The most important observation is that get/set methods have been automatically generated by Groovy.

When the groovyc-generated .class file is run through schemagen, the XSD is generated and is shown next.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema version="1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="person2">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string" minOccurs="0"/>
      <xs:element name="identifier" type="xs:string" minOccurs="0"/>
      <xs:element name="lastName" type="xs:string" minOccurs="0"/>
      <xs:element name="middleInitial" type="xs:unsignedShort"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Because I did not explicitly state that Groovy's automatic get/set methods should not be applied, all attributes are represented in the XML. Very little Groovy, but XSD nonetheless.

It is interesting to see what happens when the attributes of the Groovy class are untyped. The next Groovy class listing does not explicitly type the class attributes.

package dustin.examples;

public class Person2
{
   def lastName;

   def firstName;

   def middleInitial;

   def identifier;
}

When schemagen is run against the above class with untyped attributes, the output XSD looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema version="1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="person2">
    <xs:sequence>
      <xs:element name="firstName" type="xs:anyType" minOccurs="0"/>
      <xs:element name="identifier" type="xs:anyType" minOccurs="0"/>
      <xs:element name="lastName" type="xs:anyType" minOccurs="0"/>
      <xs:element name="middleInitial" type="xs:anyType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Not surprisingly, the Groovy class with the untyped attributes leads to an XSD with elements of anyType. It is remarkably easy to generate Schema with schemagen from a Groovy class, but what if I don't want an attribute of the class to be part of the generated schema? Explicitly specifying an attribute as private communicates to Groovy to not automatically generate get/set methods and hence schemagen will not generate XSD elements for those attributes. The next Groovy class shows two attributes explicitly defined as private and the resultant XSD from running schemagen against the compiled Groovy class is then shown.

package dustin.examples;

public class Person2
{
   String lastName;

   String firstName;

   /** private modifier prevents auto Groovy set/get methods */
   private String middleInitial;

   /** private modifier prevents auto Groovy set/get methods */
   private String identifier;
}

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema version="1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="person2">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string" minOccurs="0"/>
      <xs:element name="lastName" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Groovy makes it really easy to generate an XSD. The Groovy code required to do so is barely more than a list of attributes and their data types.

Conclusion

The schemagen tool is highly useful and is most commonly used in conjunction with web services and with JAXB, but I have also found several instances where I needed to create a "quick and dirty" XSD file for other purposes. Taking advantage of Groovy's automatically generated set/get methods and other Groovy conciseness makes it really easy to generate a simple XSD.

Monday, January 31, 2011

Groovy, JMX, and the Attach API

One of the exciting new features of the (then Sun) HotSpot Java SE 6 release was support for the Attach API. The Attach API is "a Sun Microsystems extension that provides a mechanism to attach to a Java virtual machine." Tools such as JConsole and VisualVM "use the Attach API to attach to a target virtual machine and load its tool agent into that virtual machine." Custom Java and Groovy clients can likewise use the Attach API to monitor and manage JVMs.

There are several online resources that demonstrate Java client code that uses the Attach API. These include Daniel Fuchs's code listing for a JVMRuntimeClient, the "Setting up Monitoring and Management Programmatically" section of "Monitoring and Management Using JMX Technology," the Core Java Technology Tech Tip called "The Attach API," and the Javadoc API documentation for the class com.sun.tools.attach.VirtualMachine. These examples generally demonstrate using VirtualMachine.attach(String) to attach to the virtual machine based on its process ID in String form. This is generally followed by loading the appropriate agent with VirtualMachine.loadAgent(String), where the String parameter represents the path to the JAR file containing the agent. The VirtualMachine.detach() method can be called to detach from the previously attached JVM.
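The attach, load agent, and detach sequence described above can be sketched in Java roughly as follows. This is a minimal sketch rather than a copy of any of the referenced examples. The class and helper names are my own, and it assumes the target JDK ships lib/management-agent.jar under its java.home, which may not hold for newer JDKs.

```java
import com.sun.tools.attach.VirtualMachine;
import java.io.File;

final class AttachSketch {
    /** Agent property under which the local JMX connector address is published. */
    static final String CONNECTOR_ADDRESS =
        "com.sun.management.jmxremote.localConnectorAddress";

    /** Assumed location of the management agent JAR beneath the target's java.home. */
    static String managementAgentPath(String javaHome) {
        return javaHome + File.separator + "lib" + File.separator + "management-agent.jar";
    }

    /** Attaches to the JVM with the given pid and returns its local JMX connector address. */
    static String localConnectorAddress(String pid) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            String address = vm.getAgentProperties().getProperty(CONNECTOR_ADDRESS);
            if (address == null) {
                // No connector yet: load the management agent, then look again.
                String javaHome = vm.getSystemProperties().getProperty("java.home");
                vm.loadAgent(managementAgentPath(javaHome));
                address = vm.getAgentProperties().getProperty(CONNECTOR_ADDRESS);
            }
            return address;
        } finally {
            vm.detach();   // always release the attachment
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(localConnectorAddress(args[0]));
    }
}
```

Placing detach() in a finally block keeps the client from holding on to the target JVM if loading the agent fails.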

All of the previously mentioned examples demonstrate use of the Attach API from Java clients. In this post, I demonstrate use of the Attach API via Groovy. The three code listings that follow present three pieces of Groovy code that work together as a single script. The main script is embodied in the Groovy file getJvmDetails.groovy and is a simple script file that calls the other two Groovy script files (attachToVirtualMachine.groovy and displayMxBeanDerivedInfo.groovy) to attach to the virtual machine and to display details regarding that virtual machine via its MXBeans.

getJvmDetails.groovy

#!/usr/bin/env groovy
// getJvmDetails.groovy
//
// Main script for extracting JVM details via Attach API and JMX.
// Accepts single parameter which is the process ID (pid) of the Java application
// whose JVM is to be connected to.
//
import static attachToVirtualMachine.retrieveServerConnection
import static displayMxBeanDerivedInfo.*

def serverConnection = attachToVirtualMachine.retrieveServerConnection(args[0])

displayMxBeanDerivedInfo.displayThreadInfo(serverConnection)
displayMxBeanDerivedInfo.displayOperatingSystemInfo(serverConnection)
displayMxBeanDerivedInfo.displayRuntimeInfo(serverConnection)
displayMxBeanDerivedInfo.displayMemoryInfo(serverConnection)

attachToVirtualMachine.groovy

// attachToVirtualMachine.groovy
//
// Provide an MBeanServerConnection acquired via the Attach API.

import javax.management.MBeanServerConnection
import javax.management.remote.JMXConnector
import javax.management.remote.JMXConnectorFactory
import javax.management.remote.JMXServiceURL

import com.sun.tools.attach.VirtualMachine


/**
 * Provide an MBeanServerConnection based on the provided process ID (pid).
 *
 * @param pid Process ID of Java process for which MBeanServerConnection is
 *    desired.
 * @return MBeanServerConnection connecting to Java process identified by pid.
 */
def static MBeanServerConnection retrieveServerConnection(String pid)
{
   println "Get JMX Connector for pid ${pid}!"
   def connectorAddressStr = "com.sun.management.jmxremote.localConnectorAddress"
   def jmxUrl = retrieveUrlForPid(pid, connectorAddressStr)
   def jmxConnector = JMXConnectorFactory.connect(jmxUrl)
   return jmxConnector.getMBeanServerConnection()
}


/**
 * Provide JMX URL for attaching to the provided process ID (pid).
 *
 * @param pid Process ID for which JMX URL is needed to connect.
 * @param connectorAddressStr String for connecting.
 * @return JMX URL for communicating with Java process identified by pid.
 */
def static JMXServiceURL retrieveUrlForPid(String pid, String connectorAddressStr)
{
   // Attach to the target application's virtual machine
   def vm = VirtualMachine.attach(pid)

   // Obtain Connector Address
   def connectorAddress =
      vm.getAgentProperties().getProperty(connectorAddressStr)

   // Load Agent if no connector address is available
   if (connectorAddress == null)
   {
      def agent = vm.getSystemProperties().getProperty("java.home") +
          File.separator + "lib" + File.separator + "management-agent.jar"
      vm.loadAgent(agent)

      // agent is started, get the connector address
      connectorAddress =
         vm.getAgentProperties().getProperty(connectorAddressStr)
   }

   return new JMXServiceURL(connectorAddress);
}

displayMxBeanDerivedInfo.groovy

// displayMxBeanDerivedInfo.groovy
//
// Display details regarding attached virtual machine and associated MXBeans.

import java.lang.management.ManagementFactory
import java.lang.management.MemoryMXBean
import java.lang.management.OperatingSystemMXBean
import java.lang.management.RuntimeMXBean
import java.lang.management.ThreadMXBean
import javax.management.MBeanServerConnection

/**
 * Display thread information based on ThreadMXBean associated with the provided
 * MBeanServerConnection.
 *
 * @param server MBeanServerConnection to use for obtaining thread information
 *    via the ThreadMXBean.
 */
def static void displayThreadInfo(MBeanServerConnection server)
{
   def remoteThreadBean = ManagementFactory.newPlatformMXBeanProxy(
                             server,
                             ManagementFactory.THREAD_MXBEAN_NAME,
                             ThreadMXBean.class);

   println "Deadlocked Threads: ${remoteThreadBean.findDeadlockedThreads()}"
   println "Monitor Deadlocked Threads: ${remoteThreadBean.findMonitorDeadlockedThreads()}"
   println "Thread IDs: ${Arrays.toString(remoteThreadBean.getAllThreadIds())}"
   def threads = remoteThreadBean.dumpAllThreads(true, true);
   threads.each
   {
      println "\t${it.getThreadName()} (${it.getThreadId()}): ${it.getThreadState()}"
   }
}


/**
 * Display operating system information based on OperatingSystemMXBean associated
 * with the provided MBeanServerConnection.
 *
 * @param server MBeanServerConnection to use for obtaining operating system
 *    information via the OperatingSystemMXBean.
 */
def static void displayOperatingSystemInfo(MBeanServerConnection server)
{
   def osMxBean = ManagementFactory.newPlatformMXBeanProxy(
                     server,
                     ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME,
                     OperatingSystemMXBean.class)
   println "Architecture: ${osMxBean.getArch()}"
   println "Number of Processors: ${osMxBean.getAvailableProcessors()}"
   println "Name: ${osMxBean.getName()}"
   println "Version: ${osMxBean.getVersion()}"
   println "System Load Average: ${osMxBean.getSystemLoadAverage()}"
}


/**
 * Display runtime information based on RuntimeMXBean associated with
 * the provided MBeanServerConnection.
 *
 * @param server MBeanServerConnection to use for obtaining runtime information
 *    via the RuntimeMXBean.
 */
def static void displayRuntimeInfo(MBeanServerConnection server)
{
   def remoteRuntime = ManagementFactory.newPlatformMXBeanProxy(
                          server,
                          ManagementFactory.RUNTIME_MXBEAN_NAME,
                          RuntimeMXBean.class);

   println "Target Virtual Machine: ${remoteRuntime.getName()}"
   println "Uptime: ${remoteRuntime.getUptime()}"
   println "Classpath: ${remoteRuntime.getClassPath()}"
   println "Arguments: ${remoteRuntime.getInputArguments()}"
}


/**
 * Display memory information based on MemoryMXBean associated with
 * the provided MBeanServerConnection.
 *
 * @param server MBeanServerConnection to use for obtaining memory information
 *    via the MemoryMXBean.
 */
def static void displayMemoryInfo(MBeanServerConnection server)
{
   def memoryMxBean = ManagementFactory.newPlatformMXBeanProxy(
                         server,
                         ManagementFactory.MEMORY_MXBEAN_NAME,
                         MemoryMXBean.class);
   println "HEAP Memory: ${memoryMxBean.getHeapMemoryUsage()}"
   println "Non-HEAP Memory: ${memoryMxBean.getNonHeapMemoryUsage()}"
}

The three Groovy code listings above together form a script that uses the Attach API to connect to an executing JVM without a host or port specified, based solely on the provided process ID. The examples demonstrate use of several of the available MXBeans built into the virtual machine. Because it's Groovy, the code is somewhat more concise than its Java equivalent, especially because no checked exceptions must be explicitly handled and there is no need for explicit classes.
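For comparison, here is a rough Java sketch of the MXBean proxy calls. It is an illustration under assumptions rather than a port of the full script: the class and method names are hypothetical, and to stay self-contained it queries the current JVM's own platform MBean server instead of a connection obtained through the Attach API. Note the checked IOException that Groovy lets the script ignore.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;

final class MxBeanProxySketch {

    /** Number of live threads reported by the ThreadMXBean behind the connection. */
    static int threadCount(MBeanServerConnection server) throws IOException {
        ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
            server, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
        return threads.getThreadCount();
    }

    /** Name of the virtual machine reported by the RuntimeMXBean behind the connection. */
    static String vmName(MBeanServerConnection server) throws IOException {
        RuntimeMXBean runtime = ManagementFactory.newPlatformMXBeanProxy(
            server, ManagementFactory.RUNTIME_MXBEAN_NAME, RuntimeMXBean.class);
        return runtime.getName();
    }

    public static void main(String[] args) throws IOException {
        // The local platform MBeanServer is itself an MBeanServerConnection,
        // so the same proxy calls work in-process as they do remotely.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println("Target Virtual Machine: " + vmName(local));
        System.out.println("Live threads: " + threadCount(local));
    }
}
```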

Much more could be done with the information provided via the Attach API and the MXBeans. For example, the Groovy script could be adjusted to persist some of the gathered details to build reports, Java mail could be used to alert individuals when memory constraints or other issues requiring notice occurred, and nearly anything else that can be done in Java could be added to these client scripts to make it easier to monitor and manage Java applications.

Running with the Attach API

The main implementation class of the Attach API, VirtualMachine, is located in the ${JAVA_HOME}/lib/tools.jar or %JAVA_HOME%\lib\tools.jar JAR file included with the HotSpot SDK distribution. This file typically needs to be explicitly placed on the classpath of the Java client that uses the Attach API unless it is otherwise placed in a directory that is part of that inherent classpath. This is typically not required when using Groovy because it's normally already in Groovy's classpath. I briefly demonstrated this in the post Viewing Groovy Application's Classpath.

Conclusion

The Attach API makes it easier for the Java (or Groovy) developer to write clients that can communicate with, manage, and monitor Java processes. The Attach API provides the same benefits to the developer of custom JMX clients that JConsole and VisualVM leverage.

Monday, January 10, 2011

HotSpot JVM Options Displayed: -XX:+PrintFlagsInitial and -XX:+PrintFlagsFinal

Inspecting HotSpot JVM Options is a great post for those wishing to understand better the options provided by Oracle's (formerly Sun's) HotSpot Java Virtual Machine. In this thorough post, Zahid Qureshi discusses how to use the option -XX:+PrintFlagsFinal in conjunction with -XX:+UnlockDiagnosticVMOptions to "dump out every JVM option and its value." Zahid goes further than this and runs these flags against the HotSpot JVM in both client (his client output here) and server mode (his server output here), compares/diffs the options each uses (his diff results here), and analyzes some of the differences. In doing so, Zahid also demonstrates the "super option" -XX:+AggressiveOpts.

Before reading Zahid's post, I had never seen or read about the -XX:+PrintFlagsFinal option. After downloading the latest SDK (Java SE 6 Update 23), I was able to use the option.

Neither -version nor -XX:+UnlockDiagnosticVMOptions is required to use the -XX:+PrintFlagsFinal option, but there are advantages to specifying both. The advantage of specifying -version is that only the version information is printed after the options rather than the longer Java application launcher usage. The advantage of unlocking diagnostic VM options with -XX:+UnlockDiagnosticVMOptions is that diagnostic VM options are included in the output. This is shown in the next screen snapshot.

The -XX:+PrintFlagsFinal (emphasis on "Final") option displays what options HotSpot ended up using for running Java code while -XX:+PrintFlagsInitial (emphasis on "Initial") displays what options were provided to HotSpot initially, before HotSpot has made its own tweaks. Comparing the results of -XX:+PrintFlagsFinal to -XX:+PrintFlagsInitial can obviously be helpful in understanding optimizations that HotSpot has made. More details on these options are available in Re: Bleeding Edge JVM Options and SDN Bug 6914622.

I haven't seen any formal explanation regarding these options (Zahid refers to one of them being "hidden away in the JVM source code" - an advantage of open source!). From the output generated, however, it is possible to gain a good idea of what is displayed. The output of running java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version is text with a header row stating [Global flags] followed by numerous rows with one option and its metadata per row. Each row of the output represents a particular global option and has four columns.

The first column appears to reflect the data type of the option (intx, uintx, uint64_t, bool, double, ccstr, ccstrlist). The second column is the name of the flag and the third column is the value, if any, that the flag is set to. The fourth column appears to indicate the type of flag and has values such as {product}, {pd product}, {C1 product} for client or {C2 product} for server, {C1 pd product} for client or {C2 pd product} for server, {product rw}, {diagnostic} (only if -XX:+UnlockDiagnosticVMOptions was specified), {experimental}, and {manageable}. See Eugene Kuleshov's The most complete list of -XX options for Java 6 JVM for a brief description of most of these categories as well as a listing of most of these options themselves.
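The four-column layout lends itself to simple parsing when the output needs to be compared or filtered. The following is a minimal sketch with hypothetical names. It assumes that an = or := token separates the flag name from its value, so adjust the pattern if your JVM's output differs. The row in main is made up to show the shape and is not copied from real output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlagRowSketch {
    // Columns: data type, flag name, optional value, {kind of flag}.
    private static final Pattern ROW = Pattern.compile(
        "^\\s*(\\S+)\\s+(\\S+)\\s+:?=\\s*(.*?)\\s*\\{([^}]*)\\}.*$");

    final String type;
    final String name;
    final String value;   // empty when the flag has no value
    final String kind;    // e.g. "product" or "pd product"

    private FlagRowSketch(String type, String name, String value, String kind) {
        this.type = type;
        this.name = name;
        this.value = value;
        this.kind = kind;
    }

    /** Parses one output row; returns null for lines that are not flag rows. */
    static FlagRowSketch parse(String line) {
        Matcher matcher = ROW.matcher(line);
        if (!matcher.matches()) {
            return null;   // e.g. the "[Global flags]" header row
        }
        return new FlagRowSketch(
            matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4));
    }

    public static void main(String[] args) {
        // Illustrative row only; the value is invented for the example.
        FlagRowSketch row = parse("    uintx MaxHeapSize      := 1073741824      {product}");
        System.out.println(row.type + " | " + row.name + " | " + row.value + " | " + row.kind);
    }
}
```

Rows parsed this way from -XX:+PrintFlagsInitial and -XX:+PrintFlagsFinal runs could be keyed by flag name and compared to find the values HotSpot changed.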

The plethora of virtual machine options offered makes it possible to understand and control the virtual machine's behavior more granularly. Although a small subset of the virtual machine options for HotSpot are available online, it is always nice to be able to list them explicitly when needed. The option -XX:+PrintFlagsFinal does just that.

Tuesday, January 19, 2010

Reproducing "too many constants" Problem in Java

In my previous blog post, I blogged on the "code too large" problem and reproduced that error message. In this post, I look at the very similar "too many constants" error message (not the same thing as the question too many constants?) and demonstrate reproducing it by having too many methods in a generated Java class.

With a few small adaptations, I can adjust the Groovy script that I used to generate a Java class to reproduce the "code too large" error to instead generate a Java class to reproduce the "too many constants" error. Here is the revised script.

generateJavaClassWithManyMethods.groovy


#!/usr/bin/env groovy

import javax.tools.ToolProvider

println "You're running the script ${System.getProperty('script.name')}"
if (args.length < 2)
{
   println "Usage: generateJavaClassWithManyMethods packageName className baseDir #methods"
   System.exit(-1)
}

// No use of "def" makes the variable available to entire script including the
// defined methods ("global" variables)

packageName = args[0]
packagePieces = packageName.tokenize(".")  // Get directory names
def fileName = args[1].endsWith(".java") ? args[1] : args[1] + ".java"
def baseDirectory = args.length > 2 ? args[2] : System.getProperty("user.dir")
numberOfMethods = args.length > 3 ? Integer.valueOf(args[3]) : 10

NEW_LINE = System.getProperty("line.separator")

// The setting up of the indentations shows off Groovy's easy feature for
// multiplying Strings and Groovy's tie of an overloaded * operator for Strings
// to the 'multiply' method.  In other words, the "multiply" and "*" used here
// are really the same thing.
SINGLE_INDENT = '   '
DOUBLE_INDENT = SINGLE_INDENT.multiply(2)
TRIPLE_INDENT = SINGLE_INDENT * 3

def outputDirectoryName = createDirectories(baseDirectory)
def generatedJavaFile = generateJavaClass(outputDirectoryName, fileName)
compileJavaClass(generatedJavaFile)


/**
 * Generate the Java class and write its source code to the output directory
 * provided and with the file name provided.  The generated class's name is
 * derived from the provided file name.
 *
 * @param outDirName Name of directory to which to write Java source.
 * @param fileName Name of file to be written to output directory (should include
 *    the .java extension).
 * @return Fully qualified file name of source file.
 */
def String generateJavaClass(outDirName, fileName)
{
   def className = fileName.substring(0,fileName.size()-5)
   outputFileName = outDirName.toString() + File.separator + fileName
   outputFile = new File(outputFileName)
   outputFile.write "package ${packageName};${NEW_LINE.multiply(2)}"  
   outputFile << "public class ${className}${NEW_LINE}"  
   outputFile << "{${NEW_LINE}"
   outputFile << "${SINGLE_INDENT}public static void main(final String[] arguments)"
   outputFile << "${NEW_LINE}${SINGLE_INDENT}{${NEW_LINE}"
   outputFile << DOUBLE_INDENT << 'final String someString = "Dustin";' << NEW_LINE
   outputFile << "${SINGLE_INDENT}}${NEW_LINE}"
   outputFile << buildManyMethods()
   outputFile << "}"
   return outputFileName
}


/**
 * Compile the provided Java source code file name.
 *
 * @param fileName Name of Java file to be compiled.
 */
def void compileJavaClass(fileName)
{
   // Use the Java SE 6 Compiler API (JSR 199)
   // http://java.sun.com/mailers/techtips/corejava/2007/tt0307.html#1
   compiler = ToolProvider.getSystemJavaCompiler()
   
   // The use of nulls in the call to JavaCompiler.run indicate use of defaults
   // of System.in, System.out, and System.err. 
   int compilationResult = compiler.run(null, null, null, fileName)
   if (compilationResult == 0)
   {
      println "${fileName} compiled successfully"
   }
   else
   {
      println "${fileName} compilation failed"
   }
}


/**
 * Create directories to which generated files will be written.
 *
 * @param baseDir The base directory used in which subdirectories for Java
 *    source packages will be generated.
 */
def String createDirectories(baseDir)
{
   def outDirName = new StringBuilder(baseDir)
   for (pkgDir in packagePieces)
   {
      outDirName << File.separator << pkgDir
   }
   outputDirectory = new File(outDirName.toString())
   if (outputDirectory.exists() && outputDirectory.isDirectory())
   {
      println "Directory ${outDirName} already exists."
   }
   else
   {
      isDirectoryCreated = outputDirectory.mkdirs()  // Use mkdirs in case multiple
      println "Directory ${outDirName} ${isDirectoryCreated ? 'is' : 'not'} created."
   }
   return outDirName.toString()
}


/**
 * Generate the source code for the many empty methods of the generated Java class.
 */
def String buildManyMethods()
{
   def str = new StringBuilder() << NEW_LINE
   for (i in 0..numberOfMethods)
   {
      str << SINGLE_INDENT << "private void doMethod${i}(){}" << NEW_LINE
   }
   return str
}

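When the above script is run with a parameter of 5 for the number of methods, the following Java code is generated. Because the script's 0..numberOfMethods range is inclusive, six methods (doMethod0 through doMethod5) are produced.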


package dustin.examples;

public class LotsOfMethods
{
   public static void main(final String[] arguments)
   {
      final String someString = "Dustin";
   }

   private void doMethod0(){}
   private void doMethod1(){}
   private void doMethod2(){}
   private void doMethod3(){}
   private void doMethod4(){}
   private void doMethod5(){}
}
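
For readers more comfortable with Java than Groovy, the method-generation step can be sketched roughly as follows. This is only an illustration and not part of the original script: the class name ManyMethodsSourceSketch, the countMethods helper, and the values chosen for NEW_LINE and SINGLE_INDENT are assumptions made for this sketch. The loop uses "<=" to mirror the inclusive Groovy range.

```java
public class ManyMethodsSourceSketch {
    // Assumed values; the Groovy script defines its own NEW_LINE and SINGLE_INDENT.
    private static final String NEW_LINE = System.lineSeparator();
    private static final String SINGLE_INDENT = "   ";

    /** Builds the empty private methods doMethod0 through doMethod(numberOfMethods). */
    static String buildManyMethods(int numberOfMethods) {
        StringBuilder str = new StringBuilder(NEW_LINE);
        // "<=" mirrors Groovy's inclusive range 0..numberOfMethods,
        // so the result holds numberOfMethods + 1 methods.
        for (int i = 0; i <= numberOfMethods; i++) {
            str.append(SINGLE_INDENT)
               .append("private void doMethod").append(i).append("(){}")
               .append(NEW_LINE);
        }
        return str.toString();
    }

    /** Hypothetical helper: counts the method declarations in the generated text. */
    static int countMethods(String generated) {
        String marker = "private void doMethod";
        int count = 0;
        int from = 0;
        while ((from = generated.indexOf(marker, from)) != -1) {
            count++;
            from += marker.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String methods = buildManyMethods(5);
        System.out.print(methods);
        // Prints "Methods generated: 6" because the upper bound is inclusive.
        System.out.println("Methods generated: " + countMethods(methods));
    }
}
```

Passing 5 yields six method declarations, which matches the generated class shown above.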

When I turn the number of generated methods up to 65000, I run out of heap space as shown in the next screen snapshot.

The next screen snapshot shows the output of running the script again, but this time with 512 MB maximum heap space specified for the JVM.

What happens when we try to compile a class with too many methods? The next screen snapshot shows the result of attempting just such a compilation.

The "too many constants" error message is shown with a pointer at the class keyword in the class definition. The class has too many methods to compile.
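
The Groovy script only checks the integer returned by JavaCompiler.run. If the error text and its location are wanted programmatically, the same Compiler API can collect diagnostics instead. The following is a minimal sketch under several assumptions: the class name CompileDiagnosticsSketch and the StringSource, generateSource, compile, and hasErrors helpers are hypothetical names introduced here, the generated class is placed in the default package for brevity, and java.nio.file (Java 7 or later) is used for the temporary output directory. The class file format stores the constant pool count in a 16-bit field, which is the likely reason a large enough method count fails, but I have not verified the exact message or position this sketch would print for such a class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class CompileDiagnosticsSketch {

    /** Hypothetical helper: keeps Java source in memory instead of in a file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String simpleClassName, String code) {
            super(URI.create("string:///" + simpleClassName + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Generates a default-package class with methodCount empty private methods. */
    static String generateSource(int methodCount) {
        StringBuilder src = new StringBuilder("public class LotsOfMethods {\n");
        for (int i = 0; i < methodCount; i++) {
            src.append("   private void doMethod").append(i).append("(){}\n");
        }
        return src.append("}\n").toString();
    }

    /** Compiles the source and returns every diagnostic the compiler reported. */
    static List<Diagnostic<? extends JavaFileObject>> compile(String simpleClassName, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; a JDK is required.");
        }
        try {
            // Class files go to a throwaway directory rather than the working directory.
            Path outputDir = Files.createTempDirectory("lots-of-methods");
            DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
            JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, diagnostics,
                Arrays.asList("-d", outputDir.toString()),
                null,
                Collections.singletonList(new StringSource(simpleClassName, source)));
            task.call();
            return diagnostics.getDiagnostics();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if any collected diagnostic is an error. */
    static boolean hasErrors(List<Diagnostic<? extends JavaFileObject>> diagnostics) {
        for (Diagnostic<? extends JavaFileObject> d : diagnostics) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int methodCount = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (Diagnostic<? extends JavaFileObject> d
                : compile("LotsOfMethods", generateSource(methodCount))) {
            System.out.println(d.getKind() + " at line " + d.getLineNumber()
                + ": " + d.getMessage(null));
        }
    }
}
```

Running the sketch with a small count should print nothing, since no diagnostics are reported. Passing a much larger count as the first argument is one way to look for the error described above, although a larger heap may be needed just as it was for the Groovy script.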

When I run javap -c -private dustin.examples.LotsOfMethods (-c to disassemble the code, -private to display the many private methods, and dustin.examples.LotsOfMethods is the name of the generated Java class), I see output like the following (only the beginning and end are shown instead of all 60,000+ methods).


Compiled from "LotsOfMethods.java"
public class dustin.examples.LotsOfMethods extends java.lang.Object{
public dustin.examples.LotsOfMethods();
  Code:
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: return

public static void main(java.lang.String[]);
  Code:
   0: return

private void doMethod0();
  Code:
   0: return

private void doMethod1();
  Code:
   0: return

private void doMethod2();
  Code:
   0: return

private void doMethod3();
  Code:
   0: return

private void doMethod4();
  Code:
   0: return

private void doMethod5();
  Code:
   0: return

private void doMethod6();
  Code:
   0: return

private void doMethod7();
  Code:
   0: return

private void doMethod8();
  Code:
   0: return

. . .

. . .

. . .

private void doMethod64992();
  Code:
   0: return

private void doMethod64993();
  Code:
   0: return

private void doMethod64994();
  Code:
   0: return

private void doMethod64995();
  Code:
   0: return

private void doMethod64996();
  Code:
   0: return

private void doMethod64997();
  Code:
   0: return

private void doMethod64998();
  Code:
   0: return

private void doMethod64999();
  Code:
   0: return

private void doMethod65000();
  Code:
   0: return

}

Conclusion

As with the last blog post, this post used Groovy and the Java Compiler API to intentionally reproduce an error that we hope to not see very often.

Additional Reference

Error Writing File: too many constants
