raydac/java-binary-block-parser

License Apache 2.0 Maven central Codacy Badge Java 6.0+ Android 2.0+ PayPal donation Yandex.Money donation

Introduction

Java has some built-in features for parsing binary data (for instance ByteBuffer), but I wanted to work with separate bits and describe binary structures in a strong DSL (domain-specific language). I was very impressed by the Python Struct package, so I decided to make something similar. So JBBP was born.
p.s.
For instance I have been very actively using the framework in the ZX-Poly emulator to parse snapshot files and save results.
Use cases

Change log

  • 2.0.0 (20-nov-2019)

    • removed DslBinCustom annotation, use @Bin annotation instead
    • renamed attributes of @Bin annotation to their correct form
    • reworked object mapping system, removed hacks to instantiate classes, now only mapping to objects allowed, support of private fields mapping is removed
    • minimal JDK version now 1.8+
    • minimal Android API now 3.0+
    • added support of getters and setters into mapping
    • added Object newInstance(Class) method support of mapped classes to generate local class member instances
    • added generating of makeFIELD() method for structure types in Java class converter
    • refactoring
  • 1.4.1 (20-aug-2018)

    • fixed incompatibility in tokenizer regex syntax for Android SDK #23
    • added DslBinCustom annotation to provide way to mark custom type fields for JBBPDslBuilder
    • fixed NPE in JBBPDslBuilder for not-provided outBitNumber attribute in annotated field marked as type BIT or BIT_ARRAY #20
    • naming of fields has been made more tolerant, now it is allowed to have field names with names similar to data types
    • improved check of field names in JBBPDslBuilder #21
  • 1.4.0 (29-jul-2018)

    • added type val which allows to create virtual field with calculated value, can play role of variable in scripts
    • val and var have been added into reserved words and can't be used as field names
    • added field outByteOrder attribute to Bin annotation, it affects logic of JBBPOut#Bin for output of annotated objects which fields should be saved with different byte order
    • removed deprecated method JBBPFinderException#getNameOrPath
    • added auxiliary class to build JBBP script
    • added flag JBBPParser#FLAG_NEGATIVE_EXPRESSION_RESULT_AS_ZERO to recognize negative expression result as zero
    • improved Java 6 class source generator to process FLAG_SKIP_REMAINING_FIELDS_IF_EOF for structure fields
    • added stable automatic module name igormaznitsa.jbbp into manifest file
    • added support of float, double and string Java types, as floatj, doublej and stringj
  • 1.3.0 (02-sep-2017)

  • 1.2.1 (28-JUL-2016)

Full changelog

Maven dependency

The framework is published in Maven Central and can easily be added as a dependency:

<dependency>
  <groupId>com.igormaznitsa</groupId>
  <artifactId>jbbp</artifactId>
  <version>2.0.0</version>
</dependency>

The precompiled library jar, javadoc and sources can also be downloaded directly from Maven Central.

Hello world

The framework is very easy to use because it has only two main classes: com.igormaznitsa.jbbp.JBBPParser (for data parsing) and com.igormaznitsa.jbbp.io.JBBPOut (for binary block writing). Both work over the low-level IO classes com.igormaznitsa.jbbp.io.JBBPBitInputStream and com.igormaznitsa.jbbp.io.JBBPBitOutputStream, which are the core of the framework.

The easiest case below shows how to parse a byte array into bits.

  byte [] parsedBits = JBBPParser.prepare("bit:1 [_];").parse(new byte[]{1,2,3,4,5}).
          findFieldForType(JBBPFieldArrayBit.class).getArray();

Of course, looking up parsed fields in the result is not always comfortable, so you can map the parsed data to class fields instead.

class Parsed {@Bin(type = BinType.BIT_ARRAY)byte[] parsed;}
Parsed parsedBits = JBBPParser.prepare("bit:1 [_] parsed;").parse(new byte[]{1,2,3,4,5}).mapTo(new Parsed());
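To make the meaning of the "bit:1 [_]" script more concrete, here is a minimal plain-Java sketch of the same idea without JBBP: every input byte is split into eight single-bit values. It assumes the bits are taken least significant bit first (LSB0 order); the class and method names are hypothetical and are not part of the framework.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of JBBP.
final class BitSplitSketch {

  // Splits every byte into eight 0/1 values, least significant bit first.
  static byte[] toBits(byte[] data) {
    byte[] bits = new byte[data.length * 8];
    for (int i = 0; i < data.length; i++) {
      for (int bit = 0; bit < 8; bit++) {
        bits[i * 8 + bit] = (byte) ((data[i] >>> bit) & 1);
      }
    }
    return bits;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(toBits(new byte[] {1, 2, 3, 4, 5})));
  }
}
```

Five input bytes produce an array of 40 elements, one per bit.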

Relative speed of different approaches in parsing

Initially the framework was created to provide a comfortable way to parse data; it was not designed for high speed. Since version 1.3.0, however, it can generate Java class sources from JBBP parsers, which increases parsing speed dramatically (keep in mind that JBBP generates Java class sources but does not compile them automatically). I ran some microbenchmarks of all parsing approaches to show the relative performance of each one (chart: JMH results).
The chart shows three standard ways to parse data with JBBP:

  • Dynamic - parsing into internal structures through interpretation of a script written in the DSL. It is not a very fast way, but you can generate parsers on the fly, even from dynamically formed strings.
  • Dynamic + map to class - parsing into internal structures through interpretation of a script and mapping the parsed data directly into class instances. This way is very slow (because it uses reflection to fill fields) and is recommended only if comfortable parsing is much preferred over speed.
  • Static class - parsing with Java sources generated from a JBBP parser. It is the fastest way because the Java compiler and JIT can make optimizations. This approach can be used in high-load systems. It is possible to compile generated Java sources on the fly; you can take a look at the auxiliary class which I use in tests.

Generate sources from JBBP scripts

Since version 1.3.0, the framework can convert JBBP scripts into sources (the generated sources still need the JBBP framework to work). For instance, you can use the simple snippet below to generate Java classes from a JBBP script. Potentially it can generate many classes, but usually it is only one class.

  JBBPParser parser = JBBPParser.prepare("byte a; byte b; byte c;");
  List<ResultSrcItem> generated = parser.convertToSrc(TargetSources.JAVA,"com.test.jbbp.gen.SomeClazz");
  for(ResultSrcItem i : generated) {
     for(Map.Entry<String,String> j :i.getResult().entrySet()) {
        System.out.println("Class file name "+j.getKey());                
        System.out.println("Class file content "+j.getValue());                
     }
  }

There are also special plugins for Maven and Gradle that generate sources from JBBP scripts during the source generation phase.
In Maven you just add the following plugin execution:

<plugin>
  <groupId>com.igormaznitsa</groupId>
  <artifactId>jbbp-maven-plugin</artifactId>
  <version>2.0.0</version>
  <executions>
    <execution>
      <id>gen-jbbp-src</id>
      <goals>
        <goal>generate</goal>
      </goals>
    </execution>
  </executions>
</plugin>

By default the Maven plugin looks for files with the jbbp extension in the src/jbbp folder of the project (this can be changed in the options) and produces the resulting Java classes in the target/generated-sources/jbbp folder. I use this approach in the ZX-Poly emulator.

More complex example with features added as of 1.1.0

The example shows how to parse a byte written in non-standard MSB0 order (Java has LSB0 bit order) into bit fields, print their values and pack the fields back.

class Flags {
      @Bin(outOrder = 1, name = "f1", type = BinType.BIT, outBitNumber = JBBPBitNumber.BITS_1, comment = "It's flag one") byte flag1;
      @Bin(outOrder = 2, name = "f2", type = BinType.BIT, outBitNumber = JBBPBitNumber.BITS_2, comment = "It's second flag") byte flag2;
      @Bin(outOrder = 3, name = "f3", type = BinType.BIT, outBitNumber = JBBPBitNumber.BITS_1, comment = "It's 3th flag") byte flag3;
      @Bin(outOrder = 4, name = "f4", type = BinType.BIT, outBitNumber = JBBPBitNumber.BITS_4, comment = "It's 4th flag") byte flag4;
    }

    final int data = 0b10101010;
    Flags parsed = JBBPParser.prepare("bit:1 f1; bit:2 f2; bit:1 f3; bit:4 f4;", JBBPBitOrder.MSB0).parse(new byte[]{(byte)data}).mapTo(new Flags());
    assertEquals(1,parsed.flag1);
    assertEquals(2,parsed.flag2);
    assertEquals(0,parsed.flag3);
    assertEquals(5,parsed.flag4);

    System.out.println(new JBBPTextWriter().Bin(parsed).Close().toString());

    assertEquals(data, JBBPOut.BeginBin(JBBPBitOrder.MSB0).Bin(parsed).End().toByteArray()[0] & 0xFF);

The example prints the text below to the console:

;--------------------------------------------------------------------------------
; START : Flags
;--------------------------------------------------------------------------------
    01; f1, It's flag one
    02; f2, It's second flag
    00; f3, It's 3th flag
    05; f4, It's 4th flag
;--------------------------------------------------------------------------------
; END : Flags
;--------------------------------------------------------------------------------
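The values 1, 2, 0 and 5 in the example can be checked by hand. The sketch below is plain Java without JBBP and rests on one assumption: that MSB0 order behaves like mirroring the bits of each byte and then reading the fields from the low end. The class and method names are hypothetical.

```java
// Hypothetical illustration, not part of JBBP.
final class Msb0Sketch {

  // Mirrors the eight bits of a byte, so bit 7 becomes bit 0 and so on.
  static int reverseBits(int value) {
    return Integer.reverse(value & 0xFF) >>> 24;
  }

  // Reads four fields (1, 2, 1 and 4 bits wide) from the low end of the mirrored byte.
  static int[] unpack(int data) {
    int bits = reverseBits(data);
    int f1 = bits & 0x1;
    int f2 = (bits >>> 1) & 0x3;
    int f3 = (bits >>> 3) & 0x1;
    int f4 = (bits >>> 4) & 0xF;
    return new int[] {f1, f2, f3, f4};
  }

  // Packs the four fields back in the same order and mirrors the result again.
  static int pack(int[] f) {
    int bits = (f[0] & 0x1) | ((f[1] & 0x3) << 1) | ((f[2] & 0x1) << 3) | ((f[3] & 0xF) << 4);
    return reverseBits(bits);
  }

  public static void main(String[] args) {
    int[] fields = unpack(0b10101010);
    System.out.println(fields[0] + " " + fields[1] + " " + fields[2] + " " + fields[3]);
    System.out.println(Integer.toBinaryString(pack(fields)));
  }
}
```

Under that assumption, 0b10101010 is mirrored to 0b01010101, and reading 1, 2, 1 and 4 bits from the low end gives the same four values that the assertions in the example expect.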

Fields

Every field can have a case-insensitive name, which must not contain '.' (reserved for links to structure field values) or '#' (reserved for internal use). A field name must not start with a digit or with the characters '$' or '_'.

int someNamedField;
byte field1;
byte field2;
byte field3;

JBBP field format, types and examples

Primitive types

The framework supports the full set of Java numeric primitives, plus extra types like ubyte and bit.

Complex types

The framework supports arrays and structures. Keep in mind that an expression can refer only to values of fields defined before it.

Custom types

It is possible to define processors for your own custom data types; for instance, take a look at the case of processing three-byte unsigned integer types.

Float and Double types

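Since 1.4.0 (see the change log above) Java float, double and string values are supported as the floatj, doublej and stringj types. In earlier versions the parser does not support Java float and double types out of the box, but they can be implemented through a custom type processor. There is a written example and test, and the code can be copied.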

Variable fields

If the structure of some data is variable, you can declare the field with the var type and read the data manually with a custom JBBPVarFieldProcessor instance.

    final JBBPParser parser = JBBPParser.prepare("short k; var; int;");
    final JBBPIntCounter counter = new JBBPIntCounter();
    final JBBPFieldStruct struct = parser.parse(new byte[]{9, 8, 33, 1, 2, 3, 4}, new JBBPVarFieldProcessor() {

      public JBBPAbstractArrayField<? extends JBBPAbstractField> readVarArray(final JBBPBitInputStream inStream, final int arraySize, final JBBPNamedFieldInfo fieldName, final int extraValue, final JBBPByteOrder byteOrder, final JBBPNamedNumericFieldMap numericFieldMap) throws IOException {
        fail("Must not be called");
        return null;
      }

      public JBBPAbstractField readVarField(final JBBPBitInputStream inStream, final JBBPNamedFieldInfo fieldName, final int extraValue, final JBBPByteOrder byteOrder, final JBBPNamedNumericFieldMap numericFieldMap) throws IOException {
        final int value = inStream.readByte();
        return new JBBPFieldByte(fieldName, (byte) value);
      }
    }, null);

NB! Some programmers try to use a single parser for complex data; this is a mistake. In such cases it is much better to have several simple parsers working with the same JBBPBitInputStream instance. This keeps decision points at the Java level and makes the solution simpler.
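The idea of keeping decision points in Java can be sketched without JBBP at all. The plain-Java example below invents a tiny format for illustration only: one "kind" byte followed by either a 16-bit or a 32-bit big-endian value. With JBBP, each branch would instead hand the shared stream to a different small parser. All names and the format itself are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration, not part of JBBP.
final class VariableLayoutSketch {

  // Reads a kind byte, then decides in Java which layout follows.
  static long readValue(byte[] input) {
    ByteBuffer in = ByteBuffer.wrap(input); // big-endian by default
    int kind = in.get() & 0xFF;
    switch (kind) {
      case 1:
        return in.getShort(); // layout A: 16-bit signed value
      case 2:
        return in.getInt();   // layout B: 32-bit signed value
      default:
        throw new IllegalArgumentException("Unknown kind: " + kind);
    }
  }

  public static void main(String[] args) {
    System.out.println(readValue(new byte[] {1, 0, 5}));
    System.out.println(readValue(new byte[] {2, 0, 0, 1, 0}));
  }
}
```

The branching stays ordinary Java code, and each branch reads only a simple fixed layout.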

Special types

Special types perform actions such as skipping data in the input stream.

Byte order

Every multi-byte type can be read with a different byte order.
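As a reminder of what byte order changes, the plain-Java sketch below reads the same four bytes as a big-endian and as a little-endian int using java.nio.ByteBuffer. It does not use JBBP, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration, not part of JBBP.
final class ByteOrderSketch {

  // Interprets the first four bytes of the array as an int in the given byte order.
  static int readInt(byte[] data, ByteOrder order) {
    return ByteBuffer.wrap(data).order(order).getInt();
  }

  public static void main(String[] args) {
    byte[] data = {1, 2, 3, 4};
    System.out.println(Integer.toHexString(readInt(data, ByteOrder.BIG_ENDIAN)));
    System.out.println(Integer.toHexString(readInt(data, ByteOrder.LITTLE_ENDIAN)));
  }
}
```

The bytes 01 02 03 04 give 0x01020304 in big-endian order and 0x04030201 in little-endian order.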

Expressions

Expressions are used to calculate array lengths. They allow brackets and integer operators that work like the corresponding Java operators:

  • Arithmetic operators: +, -, *, /, %
  • Bit operators: &, |, ^, ~
  • Shift operators: <<, >>, >>>
  • Brackets: (, )

Inside an expression you can use integer numbers and named field values, referenced by name (for fields from the same structure) or by path. Keep in mind that you can't use array fields or fields placed inside structure arrays.

int field1;
struct1 {
   int field2;
}
byte [field1+struct1.field2] data;
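To show what the script above asks the parser to do, here is a minimal hand-written equivalent in plain Java: read both int fields, evaluate the length expression, then read that many bytes. It does not use JBBP, it assumes big-endian int values, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration, not part of JBBP.
final class ArrayLengthSketch {

  // Layout: int field1; struct1 { int field2; } byte[field1 + struct1.field2] data;
  static byte[] readData(byte[] input) {
    ByteBuffer in = ByteBuffer.wrap(input); // big-endian by default
    int field1 = in.getInt();
    int field2 = in.getInt();               // the only field of struct1
    byte[] data = new byte[field1 + field2]; // length comes from the expression
    in.get(data);
    return data;
  }

  public static void main(String[] args) {
    byte[] input = {0, 0, 0, 1, 0, 0, 0, 2, 10, 11, 12, 99};
    System.out.println(readData(input).length);
  }
}
```

Both fields are read before the array, which mirrors the rule that an expression can use only fields defined before it.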

Commentaries

You can use comments inside a parser script. The parser supports only one comment format: all text after '//' until the end of the line is treated as a comment.

int;
// hello commentaries
byte field;

Expression macros

Inside an expression you can use field names and field paths. You can also use the special macro '$$', which represents the current input stream byte counter. All names starting with '$' are recognized by the parser as special user-defined variables, and the parser requests their values from a special user-defined provider. If the array size consists only of the '_' symbol, the field or structure has no defined size and the whole stream is read.

How to get result of parsing

The result of parsing is an instance of the com.igormaznitsa.jbbp.model.JBBPFieldStruct class, which represents the invisible root structure of the parsed data. You can use its methods to find the desired fields by name, path or class. All fields are successors of the com.igormaznitsa.jbbp.model.JBBPAbstractField class. For more comfort, it is easier to use mapping to classes, where the mapper automatically places values into the fields of a Java class.

Example

The example below shows how to parse a PNG file with the JBBP parser (the example is taken from the tests).

final InputStream pngStream = getResourceAsInputStream("picture.png");
    try {

      final JBBPParser pngParser = JBBPParser.prepare(
              "long header;"
              + "// chunks\n"
              + "chunk [_]{"
              + "   int length; "
              + "   int type; "
              + "   byte[length] data; "
              + "   int crc;"
              + "}"
      );

      final JBBPFieldStruct result = pngParser.parse(pngStream);

      assertEquals(0x89504E470D0A1A0AL,result.findFieldForNameAndType("header",JBBPFieldLong.class).getAsLong());

      final JBBPFieldArrayStruct chunks = result.findFieldForNameAndType("chunk", JBBPFieldArrayStruct.class);


      final String [] chunkNames = new String[]{"IHDR","gAMA","bKGD","pHYs","tIME","tEXt","IDAT","IEND"};
      final int [] chunkSizes = new int[]{0x0D, 0x04, 0x06, 0x09, 0x07, 0x19, 0x0E5F, 0x00};

      assertEquals(chunkNames.length,chunks.size());

      for(int i=0;i<chunks.size();i++){
        assertChunk(chunkNames[i], chunkSizes[i], (JBBPFieldStruct)chunks.getElementAt(i));
      }
    }
    finally {
      closeResource(pngStream);
    }
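For comparison, the same PNG layout (an 8-byte header followed by chunks of length, type, data and CRC) can be walked by hand in plain Java. The sketch below is not the JBBP way and does not verify CRCs; it only lists chunk types. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of JBBP.
final class PngChunkSketch {

  static final long PNG_SIGNATURE = 0x89504E470D0A1A0AL;

  // Returns the four-character type of every chunk, in file order.
  static List<String> chunkTypes(byte[] png) {
    ByteBuffer in = ByteBuffer.wrap(png); // big-endian by default, as PNG requires
    if (in.getLong() != PNG_SIGNATURE) {
      throw new IllegalArgumentException("Not a PNG signature");
    }
    List<String> types = new ArrayList<>();
    while (in.hasRemaining()) {
      int length = in.getInt();            // int length;
      byte[] type = new byte[4];
      in.get(type);                        // int type; kept as raw bytes here
      in.position(in.position() + length); // byte[length] data; skipped
      in.getInt();                         // int crc; read but not verified
      types.add(new String(type, StandardCharsets.US_ASCII));
    }
    return types;
  }
}
```

The loop over remaining bytes plays the role of the "chunk [_]" array, and every statement inside it corresponds to one field of the structure in the script.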

It is also possible to map the parsed packet to class fields:

final JBBPParser pngParser = JBBPParser.prepare(
              "long header;"
              + "chunk [_]{"
              + "   int length; "
              + "   int type; "
              + "   byte[length] data; "
              + "   int crc;"
              + "}"
      );

      class Chunk {
        @Bin int length;
        @Bin int type;
        @Bin byte [] data;
        @Bin int crc;
      }

      @Bin  
      class Png {
        long header;
        Chunk [] chunk;
        public Object newInstance(Class<?> klazz){
          return klazz == Chunk.class ? new Chunk() : null;
        }
      }

      final Png png = pngParser.parse(pngStream).mapTo(new Png());

The example from the tests shows how to parse a TCP frame wrapped in a network frame:

final JBBPParser tcpParser = JBBPParser.prepare(
              "skip:34; // skip bytes till the frame\n"
              + "ushort SourcePort;"
              + "ushort DestinationPort;"
              + "int SequenceNumber;"
              + "int AcknowledgementNumber;"

              + "bit:1 NONCE;"
              + "bit:3 RESERVED;"
              + "bit:4 HLEN;"

              + "bit:1 FIN;"
              + "bit:1 SYN;"
              + "bit:1 RST;"
              + "bit:1 PSH;"
              + "bit:1 ACK;"
              + "bit:1 URG;"
              + "bit:1 ECNECHO;"
              + "bit:1 CWR;"

              + "ushort WindowSize;"
              + "ushort TCPCheckSum;"
              + "ushort UrgentPointer;"
              + "byte [$$-34-HLEN*4] Option;"
              + "byte [_] Data;"
      );

      final JBBPFieldStruct result = tcpParser.parse(tcpFrameStream);

F.A.Q.

Is it possible to use @Bin annotations for parsing and not only mapping?

No, @Bin annotations in classes are used only for mapping and data writing, but there is a code snippet that generates JBBP DSL from the @Bin annotations detected in a class.

My Binary data format is too complex one to be decoded by a JBBP script

No problem! The parser works over the com.igormaznitsa.jbbp.io.JBBPBitInputStream class, which can be used directly to read bits and bytes, count bytes and align data from a stream (for output there is a similar class, JBBPBitOutputStream).

I want to make a bin block instead of parsing!

The framework contains a special helper class, com.igormaznitsa.jbbp.io.JBBPOut, which allows building binary blocks with a kind of DSL:

import static com.igormaznitsa.jbbp.io.JBBPOut.*;
...
final byte [] array =
          BeginBin().
            Bit(1, 2, 3, 0).
            Bit(true, false, true).
            Align().
            Byte(5).
            Short(1, 2, 3, 4, 5).
            Bool(true, false, true, true).
            Int(0xABCDEF23, 0xCAFEBABE).
            Long(0x123456789ABCDEF1L, 0x212356239091AB32L).
          End().toByteArray();

About

most comfortable and dynamic way to process binary data in Java and Android
