If you've done language hacking with Java you're probably familiar with the parser generator JavaCC. You can find a JavaCC grammar for just about anything; there are a bunch of them listed on the JavaCC site. With the parsers generated from these grammars you can do all sorts of nifty language processing - checking Java code for problems, optimizing inefficient CSS, minifying JavaScript, and so on.
I'm doing mostly Ruby these days, but all those JavaCC grammars are still accessible and useful through the magic of JRuby. With JRuby I can write a Ruby script that loads up a JavaCC-generated parser and rips right through whatever data I need to manage. Here's how.
Let's use the Java grammar as an example. Download this Java grammar and build it into a jar file - basically, you'll do this:
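The exact commands aren't reproduced here, so here's a rough sketch of the usual sequence, written as a small Ruby helper to keep everything in one language. It assumes a JJTree grammar file named Java.jjt (a hypothetical name), that jjtree, javacc, javac and jar are all on your PATH, and that the generated sources land in the current directory with no package directories. Adjust to match the grammar you actually downloaded.

```ruby
# Returns the shell commands that typically turn a JJTree grammar into a jar.
# "Java.jjt" and "grammar.jar" are placeholder names, not taken from the grammar itself.
def build_commands(grammar = "Java.jjt", jar = "grammar.jar")
  # By default jjtree writes its output next to the input with a .jj suffix.
  jj_file = grammar.sub(/\.jjt\z/, ".jj")
  [
    "jjtree #{grammar}",     # expand the tree-building annotations into a plain JavaCC grammar
    "javacc #{jj_file}",     # generate the parser and token manager sources
    "javac *.java",          # compile everything that was generated
    "jar cf #{jar} *.class"  # bundle the classes into a jar
  ]
end
```

To actually run the steps you could do something like `build_commands.each { |cmd| system(cmd) or abort("failed: #{cmd}") }`.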
Or, if you're in a hurry, just download grammar.jar which has all that stuff in it. Now, install JRuby if you don't already have it somewhere on your system - rvm is probably the best path for this, or you can just download the latest binary and untar it somewhere on your computer. Finally, add a little test source file to the current directory - call it Hello.java and put this code in it:
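The original listing isn't shown here, and any small class will do. As a stand-in, this sketch writes a hypothetical minimal Hello.java from Ruby; the class body is my own placeholder, not the file from the original walkthrough.

```ruby
# Writes a tiny placeholder Java class to <dir>/Hello.java and returns its path.
# The contents are illustrative only; any valid Java source works for parsing.
def write_hello(dir = ".")
  path = File.join(dir, "Hello.java")
  File.write(path, <<~JAVA)
    public class Hello {
      public static void main(String[] args) {
        System.out.println("Hello, world");
      }
    }
  JAVA
  path
end
```

Calling `write_hello` with no arguments drops the file into the current directory, which is where the rest of the walkthrough expects it.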
With that setup in place, the nicest way to explore JavaCC and JRuby is to use JRuby's interactive interpreter, jirb:
Great, we're in. Let's try to use that JavaParser class:
Oops, we need to import grammar and java as well:
Now we'll import JavaParser to save some typing:
OK, let's load up that Hello.java file. First we'll create a Java File object:
Now we parse the file contents!
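Pulled together, the steps so far look roughly like the sketch below. A few things here are assumptions on my part: that JavaParser lives in the default package (hence Java::JavaParser; use java_import with the full package name otherwise), and that the grammar's start production is called CompilationUnit and returns the root node. The Reader constructor is something JavaCC generates for every parser.

```ruby
# JRuby only: loads the grammar jar and parses one Java source file.
# Returns whatever the start production returns (assumed to be the AST root).
def parse_java_file(path, jar: "grammar.jar")
  raise NotImplementedError, "run this under JRuby" unless RUBY_ENGINE == "jruby"

  require "java"
  require File.expand_path(jar)  # requiring a .jar adds it to JRuby's classpath

  # Assumption: the parser class is in the default package.
  parser_class = Java::JavaParser

  file = java.io.File.new(path)
  parser = parser_class.new(java.io.FileReader.new(file))

  # Assumption: the start production is named CompilationUnit.
  parser.CompilationUnit()
end
```

In jirb you'd type these lines one at a time instead; wrapping them in a method just makes the sequence easy to reuse, as in `root = parse_java_file("Hello.java")`.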
We now have a reference to the root of the abstract syntax tree (AST) that the parser has built from that source file. What can we do with it? Well, we can show the name of the class:
We can also do something a little more interesting - we can use a Visitor implementation that comes with this grammar to visit each node of the AST and print out the source:
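The grammar's own visitor class isn't named here, so as an alternative sketch, here's a plain Ruby tree walk. It leans only on jjtGetNumChildren and jjtGetChild, which JJTree puts on its Node interface; I'm assuming the nodes in this grammar follow that interface, as JJTree-built ASTs normally do.

```ruby
# Depth-first walk over a JJTree AST, yielding each node with its depth.
# Works on any object that responds to jjtGetNumChildren and jjtGetChild.
def walk(node, depth = 0, &block)
  return enum_for(:walk, node, depth) unless block

  yield node, depth
  node.jjtGetNumChildren.times do |i|
    walk(node.jjtGetChild(i), depth + 1, &block)
  end
end
```

With the root node from the parser in hand, `walk(root) { |node, depth| puts "#{'  ' * depth}#{node}" }` prints an indented outline of the tree. This shows the tree's shape rather than reproducing the source, so it's a complement to the bundled visitor, not a replacement.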
We can also just use the tokenizer (i.e., the JavaParserTokenManager) if that's all we need. Here's a little program to do that - put this in a file called tokenize.rb:
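The original tokenize.rb isn't reproduced here, so what follows is a sketch of how such a script might look. getNextToken, the kind and image fields, and kind 0 meaning end-of-file are standard in JavaCC-generated code. The assumptions are that the classes sit in the default package, that this grammar uses JavaCharStream (grammars without Java unicode escape handling use SimpleCharStream instead), and that JRuby exposes the token's public fields as reader methods.

```ruby
# Yields tokens from a JavaCC token manager until the EOF token (kind 0) shows up.
def each_token(token_manager)
  return enum_for(:each_token, token_manager) unless block_given?

  loop do
    token = token_manager.getNextToken
    break if token.kind == 0  # JavaCC reserves kind 0 for EOF
    yield token
  end
end

# JRuby only: prints the kind and text of every token in a source file.
def tokenize_file(path, jar: "grammar.jar")
  raise NotImplementedError, "run this under JRuby" unless RUBY_ENGINE == "jruby"

  require "java"
  require File.expand_path(jar)

  # Assumption: default package, and JavaCharStream as the character stream class.
  stream = Java::JavaCharStream.new(java.io.FileReader.new(path))
  manager = Java::JavaParserTokenManager.new(stream)
  each_token(manager) { |token| puts "#{token.kind}: #{token.image}" }
end
```

Adding a final line such as `tokenize_file("Hello.java")` turns this into a runnable script.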
When you run it with jruby tokenize.rb you'll see this:
This gives us the ability to use any JavaCC grammar's tokenizer to lex any data file. Very handy!
There's a lot more we can do with JRuby and JavaCC, but this should give you a feel for the possibilities. Enjoy!
Check out my JavaCC book for a much deeper dive into JavaCC, JJTree, and all that.