Paul Cager has been improving JavaCC again - this time he reduced the amount of object allocation done by a JavaCC-generated lexer.
This began with a nicely detailed bug filed by s_fuhrm, which showed that a new StringBuffer is created for every token that's parsed, when we could really just reuse one StringBuffer and clear it out after each match. The change that Paul implemented is especially nice in that it also eliminates an if statement (a null check), so that's an extra performance boost.
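To make the idea concrete, here's a rough sketch in plain Java of the two allocation strategies. This is not the generated token manager code; the class name, the method names, and the whitespace splitting are all made up for illustration. It only shows "new buffer per token" next to "one buffer, cleared after each match".

```java
import java.util.ArrayList;
import java.util.List;

public class BufferReuseSketch {

    // Hypothetical "before": a fresh StringBuffer is allocated for every token.
    static List<String> scanAllocating(String input) {
        List<String> images = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            StringBuffer image = new StringBuffer(); // one allocation per token
            image.append(word);
            images.add(image.toString());
        }
        return images;
    }

    // Hypothetical "after": one StringBuffer is reused and cleared after each match.
    static List<String> scanReusing(String input) {
        List<String> images = new ArrayList<>();
        StringBuffer image = new StringBuffer();     // allocated once
        for (String word : input.trim().split("\\s+")) {
            image.append(word);
            images.add(image.toString());            // copy the text out
            image.setLength(0);                      // clear for the next token
        }
        return images;
    }

    public static void main(String[] args) {
        System.out.println(scanAllocating("b12 a b42")); // [b12, a, b42]
        System.out.println(scanReusing("b12 a b42"));    // [b12, a, b42]
    }
}
```

Both versions produce the same token text; the second just does it with a single buffer.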
The only gotcha is that if you've been using the image variable in your lexical actions, you'll start getting different results. For example, suppose you had a lexical specification like this:
With JavaCC 4.0, the image object would never be reused, and with input data of b12 a b42 you'd get output like this:
In other words, the image object that lastB is referencing would stick around. With this change in place, image (like the Matrix) is reloaded and you'll get this:
One solution is to use matchedToken.image instead, or you could just call toString() on the image reference to get a copy of the String. You can see an example of this on page 59 of Generating Parsers with JavaCC. Finally, if you want to give your grammar a whirl with this change, I've posted a new javacc.jar built from the latest code in CVS here. Enjoy!
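If you'd like to see the image gotcha outside of a grammar, here's a small plain-Java sketch of it. It is an assumption-laden stand-in, not JavaCC output: the class and method names are invented, the "lexer" is just a loop over the words in b12 a b42, and lastB is modeled as a local variable. It compares holding on to the shared buffer with taking a copy via toString().

```java
public class ImageAliasSketch {

    // Returns what a hypothetical action on the "a" token would see in lastB.
    // copyImage = false keeps a reference to the shared buffer;
    // copyImage = true stores image.toString(), i.e. an independent String.
    static String lastBSeenAtA(boolean copyImage) {
        StringBuffer image = new StringBuffer(); // stand-in for the reused image buffer
        CharSequence lastB = null;
        String seen = null;
        for (String word : "b12 a b42".split(" ")) {
            image.setLength(0);                  // buffer is cleared and refilled per token
            image.append(word);
            if (word.startsWith("b")) {
                lastB = copyImage ? image.toString() : image;
            } else {
                seen = String.valueOf(lastB);    // read lastB while matching "a"
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(lastBSeenAtA(false)); // a   (lastB aliases the reused buffer)
        System.out.println(lastBSeenAtA(true));  // b12 (lastB holds its own copy)
    }
}
```

The copy costs one String per saved token, but only for the tokens you actually want to keep.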