Paul Cager has been improving JavaCC again - this time he reduced the amount of object allocation done by a JavaCC-generated lexer.
This began with a nicely detailed bug filed by s_fuhrm, which showed that a new StringBuffer was being created for every token parsed, when we could really just reuse one StringBuffer and clear it out after each match. The change that Paul implemented is especially nice in that it also eliminates an if statement (a null check), so that's an extra performance boost.
The only gotcha is that if you've been using the image variable in your lexical actions, you'll start getting different results. For example, suppose you had a lexical specification like this:
With JavaCC 4.0, the image buffer was never reused, and with input data of b12 a b42 you'd get output like this:
In other words, the image object that lastB is referencing would stick around. With this change in place, image (like the Matrix) is reloaded and you'll get this:
One solution is to use matchedToken.image instead - or you could just call toString on the image reference to get a String copy of its current contents. You can see an example of this on page 59 of Generating Parsers with JavaCC.
Finally, if you want to give your grammar a whirl with this change, I've posted a new javacc.jar built from the latest code in CVS here. Enjoy!