This library reads and parses packet capture (PCAP) files on Hadoop and Hive. This allows analysis to be done directly on raw PCAP files.

zzenonn/hcap

Hcap Parser

Overview

Hcap is a PCAP parser for Hadoop. Its binary packet parsers are generated with Kaitai Struct, and the library is based on RIPE's hadoop-pcap library.

Although it supports several fields, only flow-related data is explicitly parsed:

  TIMESTAMP
  uTIMESTAMP
  SRCMAC
  DSTMAC
  SRCIP
  DSTIP
  PROTOCOL
  LENGTH
  TTL
  ID
  SRCPORT
  DSTPORT
  LINKTYPE
  ETHERTYPE
  IPVERSION

The library can, however, be easily modified to parse other network fields to suit different project needs.

Building

Simply run mvn clean install in the project root directory.

Usage

Hcap can be used either on its own as a Hadoop InputFormat, or from Hive to analyze PCAP files in tabular form.

Additional Tips

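Splitting large capture files before analysis, as in the script below, speeds up the analysis by as much as 98%. The script uses tcpdump's -C option to split each .pcap file in the current directory into pieces of roughly 127 MB and writes them to a splitfiles directory.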

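#!/bin/bash

# Split every .pcap file in the current directory into pieces of about 127 MB.
tcpdump_cmd='tcpdump'
mkdir -p splitfiles

for file in *.pcap
do
	# Skip the literal pattern when no .pcap files are present.
	[ -e "$file" ] || continue
	# Quote all paths so file names containing spaces are handled correctly.
	mkdir -p "splitfiles/$file"
	"$tcpdump_cmd" -r "$file" -w "splitfiles/$file/$file.split" -C 127
done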
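
To get a feel for how many pieces the split produces, the following is a minimal sketch of a rough estimate. It assumes the -C value is interpreted in tcpdump's default unit of 1,000,000 bytes. The helper name estimate_chunks is hypothetical and not part of hcap or tcpdump. Because tcpdump only checks the file size before writing each packet, the real count may differ slightly.

```shell
#!/bin/bash

# Hypothetical helper: rough estimate of how many files `tcpdump -C <limit>`
# writes for an input of the given size in bytes.
# Assumption: the limit is in units of 1,000,000 bytes (tcpdump's default unit).
estimate_chunks() {
	local size_bytes=$1
	local limit_mb=${2:-127}
	local chunk_bytes=$((limit_mb * 1000000))
	# Ceiling division of the input size by the chunk size.
	local chunks=$(( (size_bytes + chunk_bytes - 1) / chunk_bytes ))
	# Even an empty capture still produces one output file.
	if [ "$chunks" -lt 1 ]; then
		chunks=1
	fi
	echo "$chunks"
}
```

For example, estimate_chunks "$(wc -c < capture.pcap)" prints the estimate for a file named capture.pcap (a placeholder name), and a second argument overrides the 127 default.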
