How to use JavaCC with JRuby

12 Nov 2010

If you've done language hacking with Java you're probably familiar with the parser generator JavaCC. You can find a JavaCC grammar for just about anything; there are a bunch of them listed on the JavaCC site. With the parsers generated from these grammars you can do all sorts of nifty language processing: checking Java code for problems, optimizing inefficient CSS, minifying JavaScript, and so on.

I'm doing mostly Ruby these days, but all those JavaCC grammars are still accessible and useful through the magic of JRuby. With JRuby I can write a Ruby script that loads up a JavaCC-generated parser and rips right through whatever data I need to manage. Here's how.

Let's use the Java grammar as an example. Download this Java grammar and build it into a jar file - basically, you'll do this:

cd JavaParser
javac src/japa/parser/*.java src/japa/parser/ast/*.java \
src/japa/parser/ast/expr/*.java \
src/japa/parser/ast/visitor/*.java \
src/japa/parser/ast/body/*.java \
src/japa/parser/ast/type/*.java
jar -cvf grammar.jar -C src/ japa/

Or, if you're in a hurry, just download grammar.jar, which already has all that stuff in it. Now, install JRuby if you don't already have it somewhere on your system. rvm is probably the easiest route, or you can just download the latest binary and untar it somewhere on your computer. Finally, add a little test source file to the current directory. Call it Hello.java, to match the public class it contains, and put this code in it:

public class Hello {
  public void hi() {
    System.out.println("Hello world!");
  }
}

With that setup in place, the nicest way to explore JavaCC and JRuby is to use JRuby's interactive interpreter, jirb:

$ jirb
>> RUBY_VERSION
=> "1.8.7"

Great, we're in. Let's try to use that JavaParser class:

>> JavaParser
NameError: uninitialized constant JavaParser
  from (irb):2

Oops, we need to require java and the grammar jar first:

>> require 'java'
=> true
>> require 'grammar'
=> true
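
The second require works because JRuby can require a .jar file, which puts its classes on the classpath. If you'd rather get a clear error when the jar is missing or the script is run under plain Ruby, something like this sketch might help. The helper names jruby? and load_grammar are made up for this example, and it assumes grammar.jar sits in the current directory unless you pass another path.

```ruby
# True when running under JRuby (RUBY_PLATFORM is "java" there).
def jruby?
  RUBY_PLATFORM == 'java'
end

# Hypothetical helper: loads Java support plus the grammar jar,
# failing early with a readable message if something is missing.
def load_grammar(jar_path = 'grammar.jar')
  raise 'JavaCC parsers are Java classes, so this script needs JRuby' unless jruby?

  full_path = File.expand_path(jar_path)
  raise LoadError, "cannot find #{full_path}" unless File.exist?(full_path)

  require 'java'
  require full_path # requiring a .jar adds its classes to the classpath
  true
end
```

Call load_grammar once at the top of a script, before any java_import lines.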

Now we'll import JavaParser to save some typing:

>> java_import 'japa.parser.JavaParser'
=> Java::JapaParser::JavaParser
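
The value echoed by java_import shows how JRuby names the Ruby constant for a Java class: the package segments are capitalized and run together into one module name under Java. Here's a small sketch of that naming pattern, inferred only from the irb output in this post. The function name is hypothetical, and it assumes a package-qualified class name.

```ruby
# Builds the constant name that jirb echoes for a package-qualified
# Java class, based on the pattern visible in the transcripts here.
def jruby_constant_name(java_class_name)
  parts = java_class_name.split('.')
  klass = parts.pop
  # Upcase the first letter of each package segment and join them
  package_module = parts.map { |seg| seg[0, 1].upcase + seg[1..-1] }.join
  "Java::#{package_module}::#{klass}"
end

puts jruby_constant_name('japa.parser.JavaParser') # Java::JapaParser::JavaParser
puts jruby_constant_name('java.io.File')           # Java::JavaIo::File
```

That's handy when an irb echo like #<Java::JapaParserAst::CompilationUnit ...> shows up and you want to know which Java class you're looking at.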

OK, let's load up that file. First we'll create a Java File object:

>> f = java.io.File.new("Hello.java")
=> #<Java::JavaIo::File:0x743c86e9>

Now we parse the file contents!

>> root = JavaParser.parse(f)                
=> #<Java::JapaParserAst::CompilationUnit:0x442982d8>

We now have a reference to the root of the abstract syntax tree (AST) that the parser has built from that source file. What can we do with it? Well, we can show the name of the class:

>> root.getTypes.get(0).getName
=> "Hello"

We can also do something a little more interesting - we can use a Visitor implementation that comes with this grammar to visit each node of the AST and print out the source:

>> java_import 'japa.parser.ast.visitor.DumpVisitor' 
=> Java::JapaParserAstVisitor::DumpVisitor
>> d = DumpVisitor.new
=> #<Java::JapaParserAstVisitor::DumpVisitor:0x962e703>
>> d.visit(root, nil)       
=> nil
>> d.getSource
=> "public class Hello {
public void hi() {
System.out.println("Hello world!");
}
}"
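
You don't need a Visitor for simple questions; you can also walk the tree by hand from Ruby. The sketch below collects method names. It assumes this grammar's AST classes expose getTypes, getMembers, and getName, and that a method node's JRuby class name ends in MethodDeclaration. Check those names against the japa.parser source before relying on them. The function name method_names is made up for this example.

```ruby
# Collects "Type#method" strings from a parsed compilation unit.
# Assumed AST API: getTypes, getMembers, getName (verify in the grammar).
def method_names(compilation_unit)
  names = []
  # A Java null comes back as nil in JRuby, so fall back to an empty list
  (compilation_unit.getTypes || []).each do |type|
    (type.getMembers || []).each do |member|
      # Keep only method declarations; skip fields, constructors, etc.
      next unless member.class.name.to_s.split('::').last == 'MethodDeclaration'
      names << "#{type.getName}##{member.getName}"
    end
  end
  names
end
```

In jirb you'd call method_names(root) once root holds the value returned by JavaParser.parse.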

We can also just use the tokenizer (i.e., the JavaParserTokenManager) if that's all we need. Here's a little program to do that - put this in a file called tokenize.rb:

require 'java'
require 'grammar'

java_import 'japa.parser.JavaParserTokenManager'
java_import 'japa.parser.JavaCharStream'
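java_import 'java.io.FileReader'

file_reader = FileReader.new("Hello.java")
jcs = JavaCharStream.new(file_reader)
jptm = JavaParserTokenManager.new(jcs)

# The end-of-file token has an empty image, so stop when we see one
while (t = jptm.next_token).image.size != 0
  puts t.image
end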

When you run it with jruby tokenize.rb you'll see this:

$ jruby tokenize.rb 
[ etc etc] 

This gives us the ability to use any JavaCC grammar's tokenizer to lex any data file. Very handy!
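
Since every JavaCC token manager has a getNextToken method and reserves token kind 0 for end of file, the loop can be wrapped up so it works with any grammar's tokenizer. Here's a rough sketch; each_token, token_counts, and java_token_manager are names invented for this example, and the last one only works under JRuby with grammar.jar on the load path.

```ruby
# Yields each token from a JavaCC-style token manager until EOF.
# JavaCC uses token kind 0 for EOF.
def each_token(token_manager)
  return enum_for(:each_token, token_manager) unless block_given?

  loop do
    token = token_manager.getNextToken
    break if token.kind == 0
    yield token
  end
end

# Counts how often each token image appears.
def token_counts(token_manager)
  counts = Hash.new(0)
  each_token(token_manager) { |token| counts[token.image] += 1 }
  counts
end

# JRuby-only wiring for the Java grammar used in this post.
def java_token_manager(path)
  require 'java'
  require 'grammar'
  reader = java.io.FileReader.new(path)
  stream = Java::JapaParser::JavaCharStream.new(reader)
  Java::JapaParser::JavaParserTokenManager.new(stream)
end
```

Under JRuby, token_counts(java_token_manager(path)) should give a hash of token images and their counts for the file at path. Checking kind rather than the image length also avoids depending on what the EOF token's image happens to be.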

There's a lot more we can do with JRuby and JavaCC, but this should give you a feel for the possibilities. Enjoy!

Check out my JavaCC book for a much deeper dive into JavaCC, JJTree, and all that.