The pcapkit project is an open source Python program focused on PCAP parsing and analysis, which works as a stream PCAP file extractor. With the support of dictdumper, it supports multiple output report formats.
Note that the whole project supports Python 3.4 or later.
pcapkit is an independent open source library, using only dictdumper as its formatted output dumper.
There is a project called jspcapy that works on pcapkit. It is a command line tool for PCAP extraction, but it is now DEPRECATED.
Unlike popular PCAP file extractors such as Scapy, dpkt, and pyshark, pcapkit uses a streaming strategy to read input files. That is, it reads the input frame by frame, which reduces memory usage and can improve efficiency.
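To illustrate what reading frame by frame means, here is a minimal sketch of a streaming reader for the classic PCAP file layout (a 24-byte global header followed by 16-byte record headers). It is not pcapkit's implementation; the names `iter_frames` and `build_demo_capture` are hypothetical, and only the microsecond-resolution magic number is handled.

```python
import io
import struct


def iter_frames(stream):
    """Yield (ts_sec, ts_usec, data) one record at a time from a classic PCAP stream.

    Hypothetical helper for illustration; only one frame is held in memory at once.
    """
    header = stream.read(24)  # fixed-size global header
    if len(header) < 24:
        raise ValueError("truncated PCAP global header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file was written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file was written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")
    while True:
        record = stream.read(16)  # per-packet record header
        if len(record) < 16:
            return  # end of file (or a truncated trailing record)
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated packet data
        yield ts_sec, ts_usec, data


def build_demo_capture():
    """Build a tiny in-memory capture with two fake frames (demo data only)."""
    global_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    frames = [b"\x01\x02\x03", b"\x04\x05"]
    body = b"".join(
        struct.pack("<IIII", index, 0, len(frame), len(frame)) + frame
        for index, frame in enumerate(frames)
    )
    return io.BytesIO(global_header + body)


for ts_sec, ts_usec, data in iter_frames(build_demo_capture()):
    print(ts_sec, ts_usec, len(data))
```

Because `iter_frames` is a generator, a caller can stop early or process an arbitrarily large file without loading it whole.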
In pcapkit, all files can be described as the following parts.
- Interface (pcapkit.interface) -- user interface for the pcapkit library, which standardises and simplifies the usage of this library
- Foundation (pcapkit.foundation) -- synthesises file I/O and protocol analysis, and coordinates information exchange across all network layers
- Reassembly (pcapkit.reassembly) -- based on algorithms described in RFC 815, implements datagram reassembly of IP and TCP packets
- IPSuite (pcapkit.ipsuite) -- collection of constructors for the Internet Protocol Suite
- Protocols (pcapkit.protocols) -- collection of all protocol families, with detailed implementations and methods
- Utilities (pcapkit.utilities) -- collection of four utility functions and classes
- CoreKit (pcapkit.corekit) -- core utilities for pcapkit implementation
- ToolKit (pcapkit.toolkit) -- capability tools for pcapkit implementation
- DumpKit (pcapkit.dumpkit) -- dump utilities for pcapkit implementation
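As a rough illustration of the hole-descriptor idea from RFC 815 that the Reassembly part refers to, the sketch below tracks the missing byte ranges ("holes") of a datagram and shrinks them as fragments arrive. It is a simplified stand-in, not the code in pcapkit.reassembly: the function name `reassemble` is hypothetical, offsets are plain byte offsets (real IP fragment offsets are in 8-byte units), and no timeouts are handled.

```python
import math


def reassemble(fragments):
    """Reassemble (offset, payload, more_fragments) tuples, RFC 815 style.

    Returns the full payload as bytes, or None if holes remain.
    Hypothetical helper for illustration only.
    """
    holes = [(0, math.inf)]  # one hole covering the whole, still unknown, datagram
    data = bytearray()
    for offset, payload, more in fragments:
        if not payload:
            continue
        first, last = offset, offset + len(payload) - 1
        remaining = []
        for hole_first, hole_last in holes:
            # Fragment does not touch this hole: keep the hole unchanged.
            if first > hole_last or last < hole_first:
                remaining.append((hole_first, hole_last))
                continue
            # Part of the hole before the fragment is still missing.
            if first > hole_first:
                remaining.append((hole_first, first - 1))
            # Part of the hole after the fragment is still missing,
            # unless this is the final fragment of the datagram.
            if last < hole_last and more:
                remaining.append((last + 1, hole_last))
        holes = remaining
        end = offset + len(payload)
        if len(data) < end:
            data.extend(bytes(end - len(data)))  # grow the buffer with zero padding
        data[offset:end] = payload
    return bytes(data) if not holes else None


# Fragments may arrive out of order.
print(reassemble([(4, b"efgh", False), (0, b"abcd", True)]))
```

The datagram is complete exactly when the hole list is empty, which is why the final fragment (the one with `more_fragments` unset) never creates a trailing hole.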
Besides, due to the complexity of pcapkit, its extraction procedure takes around 0.01 seconds per packet, which is not ideal. Thus, pcapkit introduced alternative extraction engines to accelerate this procedure. By now, pcapkit supports Scapy, DPKT, and PyShark. In addition, pcapkit supports two multiprocessing strategies (server & pipeline). For more information, please refer to the documentation.
Engine | Performance (seconds per packet) |
---|---|
dpkt | 0.0003609057267506917 |
scapy | 0.002443440357844035 |
default | 0.017523006995519 |
pipeline | 0.014550424114863079 |
server | 0.04667099356651306 |
pyshark | 0.0792640733718872 |
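To make the figures easier to compare, the following sketch converts the seconds-per-packet values from the table into packets per second and a ratio relative to the default engine. The numbers are copied from the table; the helper names are hypothetical.

```python
# Seconds per packet, copied from the table above.
SECONDS_PER_PACKET = {
    "dpkt": 0.0003609057267506917,
    "scapy": 0.002443440357844035,
    "default": 0.017523006995519,
    "pipeline": 0.014550424114863079,
    "server": 0.04667099356651306,
    "pyshark": 0.0792640733718872,
}


def packets_per_second(engine):
    """Throughput is the reciprocal of the per-packet time."""
    return 1.0 / SECONDS_PER_PACKET[engine]


def speedup_over_default(engine):
    """Ratio above 1 means faster than the default engine, below 1 means slower."""
    return SECONDS_PER_PACKET["default"] / SECONDS_PER_PACKET[engine]


# List engines from fastest to slowest.
for engine in sorted(SECONDS_PER_PACKET, key=SECONDS_PER_PACKET.get):
    print(f"{engine:<9} {packets_per_second(engine):10.1f} pkt/s "
          f"{speedup_over_default(engine):6.2f}x vs default")
```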
Note that pcapkit supports Python versions since 3.4.
Simply run the following to install the current version from PyPI:
pip install pypcapkit
Or install the latest version from the git repository:
git clone https://github.com/JarryShaw/PyPCAPKit.git
cd PyPCAPKit
pip install -e .
# and to update at any time
git pull
Since pcapkit supports various extraction engines and extensive plug-in functions, you may want to install the optional ones:
# for DPKT only
pip install pypcapkit[DPKT]
# for Scapy only
pip install pypcapkit[Scapy]
# for PyShark only
pip install pypcapkit[PyShark]
# and to install all the optional packages
pip install pypcapkit[all]
# or to do this explicitly
pip install pypcapkit dpkt scapy pyshark
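After installing optional packages, it can be handy to check which of them are importable in the current environment. The sketch below uses only the standard library; the function name `available_engines` and the lowercase engine labels are assumptions for illustration and are not part of pcapkit.

```python
import importlib.util

# Optional engine label -> top-level module it depends on (labels are illustrative).
ENGINE_MODULES = {
    "dpkt": "dpkt",
    "scapy": "scapy",
    "pyshark": "pyshark",
}


def available_engines():
    """Return the engine labels whose backing package can be imported."""
    found = ["default"]  # the built-in engine needs no extra package
    for engine, module in ENGINE_MODULES.items():
        if importlib.util.find_spec(module) is not None:
            found.append(engine)
    return found


print(available_engines())
```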
NAME | DESCRIPTION |
---|---|
extract | extract a PCAP file |
analyse | analyse application layer packets |
reassemble | reassemble fragmented datagrams |
trace | trace TCP packet flows |

NAME | DESCRIPTION |
---|---|
JSON | JavaScript Object Notation (JSON) format |
PLIST | macOS Property List (PLIST) format |
TREE | Tree-View text format |
PCAP | PCAP format |

NAME | DESCRIPTION |
---|---|
RAW | no specific layer |
LINK | data-link layer |
INET | internet layer |
TRANS | transport layer |
APP | application layer |

NAME | DESCRIPTION |
---|---|
PCAPKit | the default engine |
MPServer | the multiprocessing engine with server process strategy |
MPPipeline | the multiprocessing engine with pipeline strategy |
DPKT | the DPKT engine |
Scapy | the Scapy engine |
PyShark | the PyShark engine |

NAME | DESCRIPTION |
---|---|
NoPayload | No-Payload |
Raw | Raw Packet Data |
ARP | Address Resolution Protocol |
Ethernet | Ethernet Protocol |
L2TP | Layer Two Tunnelling Protocol |
OSPF | Open Shortest Path First |
RARP | Reverse Address Resolution Protocol |
VLAN | 802.1Q Customer VLAN Tag Type |
AH | Authentication Header |
HIP | Host Identity Protocol |
HOPOPT | IPv6 Hop-by-Hop Options |
IP | Internet Protocol |
IPsec | Internet Protocol Security |
IPv4 | Internet Protocol version 4 |
IPv6 | Internet Protocol version 6 |
IPv6_Frag | Fragment Header for IPv6 |
IPv6_Opts | Destination Options for IPv6 |
IPv6_Route | Routing Header for IPv6 |
IPX | Internetwork Packet Exchange |
MH | Mobility Header |
TCP | Transmission Control Protocol |
UDP | User Datagram Protocol |
HTTP | Hypertext Transfer Protocol |

Documentation can be found in the submodules of pcapkit. You may also find usage samples in the test folder. For further information, please refer to the source code -- the docstrings should help you :)
ps: the help function in Python should always help you out.
The following part was originally described in jspcapy, which is now deprecated and merged into this repository.
As shown in the help manual, it is quite easy to use:
$ pcapkit --help
usage: pcapkit [-h] [-V] [-o file-name] [-f format] [-j] [-p] [-t] [-a] [-v]
[-F] [-E PKG] [-P PROTOCOL] [-L LAYER]
input-file-name
PCAP file extractor and formatted exporter
positional arguments:
input-file-name The name of input pcap file. If ".pcap" omits, it will
be automatically appended.
optional arguments:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-o file-name, --output file-name
The name of input pcap file. If format extension
omits, it will be automatically appended.
-f format, --format format
Print a extraction report in the specified output
format. Available are all formats supported by
dictdumper, e.g.: json, plist, and tree.
-j, --json Display extraction report as json. This will yield
"raw" output that may be used by external tools. This
option overrides all other options.
-p, --plist Display extraction report as macOS Property List
(plist). This will yield "raw" output that may be used
by external tools. This option overrides all other
options.
-t, --tree Display extraction report as tree view text. This will
yield "raw" output that may be used by external tools.
This option overrides all other options.
-a, --auto-extension If output file extension omits, append automatically.
-v, --verbose Show more information.
-F, --files Split each frame into different files.
-E PKG, --engine PKG Indicate extraction engine. Note that except default
engine, all other engines need support of corresponding
packages.
-P PROTOCOL, --protocol PROTOCOL
Indicate extraction stops after which protocol.
-L LAYER, --layer LAYER
Indicate extract frames until which layer.
Under most circumstances, you should indicate the name of the input PCAP file (the extension may be omitted) and, at least, the output format (json, plist, or tree). If the format is unspecified, the name of the output file must have a proper extension (*.json, *.plist, or *.txt); otherwise FormatError will be raised.
In verbose mode, detailed information is printed during extraction (as in the following examples). The auto-extension flag applies to the output file and indicates whether extensions should be appended.
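The interplay between the output name, the format, and the auto-extension flag can be sketched as follows. This is only an illustration of the rules described above, not the CLI's actual code: `resolve_output`, the mapping tables, and this `FormatError` class are hypothetical stand-ins.

```python
import os


class FormatError(Exception):
    """Stand-in for the FormatError mentioned above (hypothetical definition)."""


EXTENSION_TO_FORMAT = {".json": "json", ".plist": "plist", ".txt": "tree"}
FORMAT_TO_EXTENSION = {fmt: ext for ext, fmt in EXTENSION_TO_FORMAT.items()}


def resolve_output(fout, fmt=None, auto_extension=True):
    """Return (output file name, format) following the rules sketched above."""
    _, ext = os.path.splitext(fout)
    ext = ext.lower()
    if fmt is None:
        # No format given: it must be inferable from the file extension.
        if ext not in EXTENSION_TO_FORMAT:
            raise FormatError(f"cannot infer output format from {fout!r}")
        return fout, EXTENSION_TO_FORMAT[ext]
    fmt = fmt.lower()
    if fmt not in FORMAT_TO_EXTENSION:
        raise FormatError(f"unsupported output format {fmt!r}")
    # Format given: optionally append the matching extension if it is missing.
    if auto_extension and ext != FORMAT_TO_EXTENSION[fmt]:
        fout += FORMAT_TO_EXTENSION[fmt]
    return fout, fmt


print(resolve_output("out.json"))
print(resolve_output("out", "plist"))
print(resolve_output("out", "tree", auto_extension=False))
```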
As described in the test folder, pcapkit is quite easy to use, with just a few verbs as its main interface. Several scenarios are shown below.
- extract a PCAP file and dump the result to a specific file (with no reassembly)
    import pcapkit
    # dump to a PLIST file with no frame storage (property frame disabled)
    plist = pcapkit.extract(fin='in.pcap', fout='out.plist', format='plist', store=False)
    # dump to a JSON file with no extension auto-complete
    json = pcapkit.extract(fin='in.cap', fout='out.json', format='json', extension=False)
    # dump to a folder with each tree-view text file per frame
    tree = pcapkit.extract(fin='in.pcap', fout='out', format='tree', files=True)
- extract a PCAP file and fetch IP packet (both IPv4 and IPv6) from a frame (with no output file)
    >>> import pcapkit
    >>> extraction = pcapkit.extract(fin='in.pcap', nofile=True)
    >>> frame0 = extraction.frame[0]
    >>> # check if IP is in this frame; otherwise ProtocolNotFound will be raised
    >>> flag = pcapkit.IP in frame0
    >>> ip = frame0[pcapkit.IP] if flag else None
- extract a PCAP file and reassemble TCP payload (with no output file nor frame storage)
    import pcapkit
    # set strict to make sure full reassembly
    extraction = pcapkit.extract(fin='in.pcap', store=False, nofile=True, tcp=True, strict=True)
    # print extracted packet if HTTP in reassembled payloads
    for packet in extraction.reassembly.tcp:
        for reassembly in packet.packets:
            if pcapkit.HTTP in reassembly.protochain:
                print(reassembly.info)
The CLI (command line interface) of pcapkit can be accessed in two ways.
- through console scripts -- use the command name pcapkit [...] directly (as shown in the samples)
- through the Python module -- python -m pypcapkit [...] works exactly the same as above
Here are some usage samples:
- export to a macOS Property List (Xcode has special support for this format)
$ pcapkit in --format plist --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out.plist'
- export to a JSON file (with no format specified)
$ pcapkit in --output out.json --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out.json'
- export to a text tree view file (without extension autocorrect)
$ pcapkit in --output out --format tree --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out'
- specify Raw packet
- interface verbs
- review docstrings
- merge jspcapy
- write documentation
- implement IP and MAC address containers
- implement option list extractors
- implement more protocols