Jython/PyDev JavaCC grammar notes

27 Sep 2006

I came across the Jython JavaCC grammar recently - actually, the grammar in the PyDev project, which I understand is a tweaked version of the Jython grammar. At any rate, I ran JJDoc on it, which produced some nice HTML output here.

Note that if you've got a JJTree grammar (e.g., python.jjt), you first need to run JJTree on it to produce a JavaCC grammar (e.g., python.jj). Then you can run JJDoc on that JavaCC grammar. If you run JJDoc on a JJTree grammar, you'll get parsing errors, since JJDoc doesn't recognize JJTree directives.
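To make the two-step order concrete, here's a small Java sketch that builds (and optionally runs) the commands. It assumes the JavaCC distribution's `jjtree` and `jjdoc` launcher scripts are on the PATH of a Unix-like shell, and that the grammar sits in the current directory; the class name `JJDocPipeline` and the `--run` flag are made up for illustration. The one thing it relies on from JJTree is its default naming: the output file is named after the input, with `.jjt` replaced by `.jj`.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper: JJTree first, then JJDoc on JJTree's output.
// Assumes `jjtree` and `jjdoc` are on the PATH and the grammar is in the
// current working directory.
final class JJDocPipeline {

    // JJTree names its JavaCC output after the input file with a .jj
    // suffix, so python.jjt becomes python.jj.
    static String jjFileFor(String jjtFile) {
        if (!jjtFile.endsWith(".jjt")) {
            throw new IllegalArgumentException("expected a .jjt file: " + jjtFile);
        }
        return jjtFile.substring(0, jjtFile.length() - 1); // drop the trailing 't'
    }

    static List<String> jjtreeCommand(String jjtFile) {
        return List.of("jjtree", jjtFile);
    }

    static List<String> jjdocCommand(String jjFile) {
        return List.of("jjdoc", jjFile);
    }

    // Runs one command with its output streamed to this console and fails
    // loudly on a non-zero exit status.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        int status = p.waitFor();
        if (status != 0) {
            throw new IOException(String.join(" ", command) + " exited with " + status);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String jjt = args.length > 0 ? args[0] : "python.jjt";
        String jj = jjFileFor(jjt);
        // Step 1: JJTree turns the JJTree grammar into a plain JavaCC grammar.
        // Step 2: JJDoc documents that plain grammar; pointing JJDoc at the
        // .jjt directly fails because it doesn't understand JJTree directives.
        List<List<String>> steps = List.of(jjtreeCommand(jjt), jjdocCommand(jj));
        boolean execute = args.length > 1 && args[1].equals("--run");
        for (List<String> step : steps) {
            System.out.println(String.join(" ", step));
            if (execute) {
                run(step);
            }
        }
    }
}
```

With Java 11 or later, `java JJDocPipeline.java python.jjt` just prints `jjtree python.jjt` and `jjdoc python.jj`; adding `--run` actually executes them in that order.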

The grammar itself is interesting since so much of it is devoted to handling Python's significant whitespace correctly. For example, it turns on the COMMON_TOKEN_ACTION option and uses the resulting CommonTokenAction hook to add a newline token before the EOF token. The tokenizer uses quite a few lexical states - eighteen in addition to DEFAULT, to be exact.
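In JavaCC, setting `COMMON_TOKEN_ACTION = true` makes the generated token manager call a `void CommonTokenAction(Token t)` method, which you declare in `TOKEN_MGR_DECLS`, on each token it produces. The sketch below doesn't use JavaCC at all; it's a plain Java approximation of the effect described above, with a made-up `Kind`/`Tok` token model standing in for JavaCC's generated `Token` class. The rule that a synthetic NEWLINE is added only when the previous token isn't already a NEWLINE (and not for an empty file) is my assumption, not necessarily what the PyDev grammar does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Plain-Java approximation of "add a newline before EOF"; the token model
// here is hypothetical, not JavaCC's generated Token class. Needs Java 16+.
final class NewlineBeforeEof {

    enum Kind { NAME, NEWLINE, EOF }

    record Tok(Kind kind, String image) {}

    // Wraps a raw token stream. When EOF arrives and the last token handed
    // out wasn't a NEWLINE, it queues a synthetic NEWLINE ahead of the EOF,
    // so the parser always sees the final statement terminated.
    static final class Filter implements Iterator<Tok> {
        private final Iterator<Tok> raw;
        private final Deque<Tok> pending = new ArrayDeque<>();
        private Tok last; // most recently returned token, or null at start

        Filter(Iterator<Tok> raw) {
            this.raw = raw;
        }

        @Override
        public boolean hasNext() {
            return !pending.isEmpty() || raw.hasNext();
        }

        @Override
        public Tok next() {
            if (pending.isEmpty()) {
                Tok t = raw.next(); // throws NoSuchElementException when exhausted
                if (t.kind() == Kind.EOF && last != null && last.kind() != Kind.NEWLINE) {
                    pending.add(new Tok(Kind.NEWLINE, "\n")); // synthetic token
                }
                pending.add(t);
            }
            last = pending.poll();
            return last;
        }
    }

    static List<Tok> filter(List<Tok> tokens) {
        List<Tok> out = new ArrayList<>();
        new Filter(tokens.iterator()).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // Source text "pass" with no trailing newline: NAME followed by EOF.
        List<Tok> raw = List.of(new Tok(Kind.NAME, "pass"), new Tok(Kind.EOF, ""));
        for (Tok t : filter(raw)) {
            System.out.println(t.kind()); // prints NAME, NEWLINE, EOF
        }
    }
}
```

Doing this in the tokenizer rather than the grammar keeps the parser rules simple: every statement can require a terminating NEWLINE without special-casing the last line of a file.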

Keep an eye out for my upcoming JavaCC book for more of this sort of thing. I'm working on examples of nested syntactic lookahead at the moment. Great stuff!