Parsing binary data with JavaCC

18 Apr 2007

A question came up on the JavaCC user's list about parsing binary data with JavaCC. In response I posted a little example grammar that parses the header section of a DOOM map data file (i.e., a WAD file). There's really not much to it; here's the lexical spec:

TOKEN : {
  <IWAD : "IWAD">
  | <PWAD : "PWAD">
  | <LONG : (["\u0000"-"\u00FF"]){4}>
}
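One thing the LONG token quietly relies on: every input byte has to reach the token manager as exactly one char in the range \u0000-\u00FF. The post doesn't show how the input stream was set up, so the following is only a sketch of one way to get that mapping, assuming the bytes are decoded as ISO-8859-1. The class name ByteToCharDemo and the sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class ByteToCharDemo {

    // Decode raw bytes so that each byte becomes exactly one char
    // with the same numeric value (0x00-0xFF). ISO-8859-1 does this.
    static String bytesToChars(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical sample containing bytes above 0x7F.
        byte[] raw = { (byte) 0x80, (byte) 0xFF, 0x00, 0x41 };

        String latin1 = bytesToChars(raw);
        // For comparison: a UTF-8 decode replaces the invalid bytes 0x80 and 0xFF.
        String utf8 = new String(raw, StandardCharsets.UTF_8);

        System.out.println(latin1.length());        // 4
        System.out.println((int) latin1.charAt(1)); // 255
        System.out.println(utf8.equals(latin1));    // false
    }
}
```

With a multi-byte encoding such as UTF-8, bytes above 0x7F would not survive the trip, and the four-character LONG token would no longer line up with four bytes of the file.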

And the syntactic spec:

void Header() : {
  Token lumpCount=null;
  Token offSet=null;
} {
  (<IWAD> | <PWAD>)
  lumpCount=<LONG> 
    { System.out.println("Lumps in this file: " + littleEndianFourByteStringToInt(lumpCount.image)); } 
  offSet=<LONG>
    { System.out.println("Byte offset of directory: " + littleEndianFourByteStringToInt(offSet.image)); }
}
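For comparison, here is roughly what the Header production recognizes, written as plain Java over the first 12 bytes of a file: a four-character tag followed by two little-endian four-byte integers. This is a hand-written sketch, not JavaCC output; the class WadHeaderSketch, its field names, and the sample bytes are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class WadHeaderSketch {
    final String magic;   // "IWAD" or "PWAD"
    final int lumpCount;  // first LONG in the grammar
    final int offset;     // second LONG in the grammar

    WadHeaderSketch(String magic, int lumpCount, int offset) {
        this.magic = magic;
        this.lumpCount = lumpCount;
        this.offset = offset;
    }

    // Reads the same three fields the grammar matches: tag, LONG, LONG.
    static WadHeaderSketch read(byte[] header) {
        if (header.length < 12) {
            throw new IllegalArgumentException("header needs at least 12 bytes");
        }
        String magic = new String(header, 0, 4, StandardCharsets.US_ASCII);
        if (!magic.equals("IWAD") && !magic.equals("PWAD")) {
            throw new IllegalArgumentException("unexpected tag: " + magic);
        }
        // View bytes 4..11 as two little-endian 32-bit integers.
        ByteBuffer buf = ByteBuffer.wrap(header, 4, 8).order(ByteOrder.LITTLE_ENDIAN);
        int lumpCount = buf.getInt();
        int offset = buf.getInt();
        return new WadHeaderSketch(magic, lumpCount, offset);
    }

    public static void main(String[] args) {
        // Made-up header: tag "PWAD", 11 lumps, offset 12.
        byte[] header = { 'P', 'W', 'A', 'D', 0x0B, 0, 0, 0, 0x0C, 0, 0, 0 };
        WadHeaderSketch h = read(header);
        System.out.println(h.magic + " " + h.lumpCount + " " + h.offset); // PWAD 11 12
    }
}
```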

Here's the utility function that decodes those four-character little-endian token images into Java ints:

  static int littleEndianFourByteStringToInt(String s) {
   int accum = 0;
   for ( int shiftBy=0,counter=0; shiftBy < 32; shiftBy+=8,counter++ ) {
    char c = s.charAt(counter);
    int byteValue = (c & 0xFF) << shiftBy;
    accum |= byteValue;
   }
   return accum;
  }
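As a quick sanity check, the sketch below runs the same decoding loop on a hand-built four-character string and cross-checks it against java.nio.ByteBuffer in little-endian order. The class name LittleEndianDemo, the helper viaByteBuffer, and the sample values are made up for illustration; the cross-check assumes each char holds one byte value, as the LONG token guarantees.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class LittleEndianDemo {

    // Same decoding as the utility function above:
    // character i supplies bits 8*i through 8*i+7 of the result.
    static int littleEndianFourByteStringToInt(String s) {
        int accum = 0;
        for (int shiftBy = 0, counter = 0; shiftBy < 32; shiftBy += 8, counter++) {
            accum |= (s.charAt(counter) & 0xFF) << shiftBy;
        }
        return accum;
    }

    // Cross-check with the standard library: turn the chars back into
    // bytes and read them as one little-endian 32-bit int.
    static int viaByteBuffer(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // Hypothetical token image: bytes 0x34 0x12 0x00 0x00, i.e. 0x1234.
        String sample = new String(new char[] { 0x34, 0x12, 0x00, 0x00 });
        System.out.println(littleEndianFourByteStringToInt(sample)); // 4660
        System.out.println(viaByteBuffer(sample));                   // 4660

        // All bits set decodes to -1, since Java ints are signed.
        String allOnes = new String(new char[] { 0xFF, 0xFF, 0xFF, 0xFF });
        System.out.println(littleEndianFourByteStringToInt(allOnes)); // -1
    }
}
```

Note that values with the top bit set come back negative because Java's int is signed; widening with `& 0xFFFFFFFFL` into a long is one way to treat such a value as unsigned.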

Most of the above bit shifting stuff is based on this helpful page, and some notes on the DOOM map file format are here. Hm. This sort of thing may be a good late addition to my JavaCC book.