Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Packet sanitization and IP masking #615

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lilchurro
Copy link

Created a separate branch so I can keep upstream and workgroup README and version strings separate.

@guyharris guyharris changed the title Send upstream Packet sanitization and IP masking Jul 18, 2017
@infrastation
Copy link
Member

As a general note, this problem is not new and similar tools already exist:

If you still find it better to suggest a new solution, the proposed changes need to be one clean commit, which explains why this specific solution is better.

tcpdump.c Outdated
@@ -87,6 +87,7 @@ The Regents of the University of California. All rights reserved.\n";
#include <sys/nv.h>
#endif /* HAVE_CASPER */
#endif /* HAVE_CAPSICUM */
#include <arpa/inet.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this exist on all platforms? No.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gvanem which platforms are you concerned about?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows (and possibly DOS).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guyharris DOS (i.e. Watt-32) do have <arpa/inet.h>. But I'm more concerned about Windows off-course.

/* List of all the reserved IPv4 address spaces, per RFC5735,
* ...PROVIDED IN NETWORK BYTE ORDER!! */
struct netblock specialblock[] = {
{ .netip = 0x00000000, .netmask = 0x000000ff }, /* 0.0.0.0/8 */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This C99 (?) syntax is not universally available.

Copy link
Member

@guyharris guyharris Jul 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Wireshark, we currently require compilers that support these C99 features:

  • flexible array members
  • compound literals
  • designated initializers
  • "//" comments
  • mixed declarations and code
  • new block scopes for selection and iteration statements (that is, declaring the type in a for-loop like: for (int i = 0; i < n; i++) ;)
  • macros with a variable number of arguments (variadic macros)
  • trailing comma in enum declarations
  • inline functions (guaranteed only by use of glib.h - tcpdump handles that itself, where it may define inline as __inline)

The syntax in question is the "designated initializer" syntax.

Those features are supported by all versions of GCC and Clang I've dealt with, and I think it's supported by sufficiently-recent versions of UN*X vendor C compilers. Before expanding the list of C99 features, I dug through Microsoft's documentation to see which compilers support which features; as I remember, Visual Studio 2013 (the compiler currently used on the Wireshark buildbots) support all of those. The Wireshark Developer's Guide also mentions VS 2010; I think it might support them as well.

So, are there any compilers that we still want to support and that don't support those features? If not, perhaps it's time to require compilers that support those features to build tcpdump.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gvanem, could you clarify if it is not available because you use a compiler that does not support C99 in principle or because tcpdump runs the compiler with C99 disabled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's designated initializers that Guy wrote, then MSVC-2013 does not have that. I'm not sure how I can disable C99. So no.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's designated initializers that Guy wrote, then MSVC-2013 does not have that

"Does not have that" in the sense of "this page doesn't mention it" or in the sense of "Microsoft didn't implement it"? The feature appears to exist, with some bugs, in VS 2013.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the recommendation from this discussion?

@lilchurro
Copy link
Author

lilchurro commented Jul 19, 2017

Hi there!

I'm currently working on the +#include <arpa/inet.h> issue, and a better description for the PR request, but I'd like to take a moment to respond to the question of "Why does this PR exist?"

As a former network analyst, I'm certainly aware that tools exist that sort of do these anonymization/privatization and sanitization features, but there are (in my experience) some not-insignificant issues with their adoption. Our motivation for ultimately adding this to tcpdump are the following:

  1. Simplicity and automation. We are trying to automate the collection and analysis of network traffic coming off of SDN-enabled switches, and we wanted something that did bulk sanitization without requiring multistep or manual processes (i.e., first collecting, then moving, then scrubbing, etc.) because...

  2. Ability to scale. Most of our work entail large packet captures (e.g., hundreds of GBs), which can pose scalability issues when running subsequent tools, especially if they require writing full packet captures to memory before sanitizing and anonymizing. We had looked at scapy, for example (which is a backbone to some of these tools), which does not seem to handle scaling challenges particularly well.

  3. Thorough sanitization. In the discussions referenced, one will see comments about sanitizing higher layer protocols (e.g., DNS) but potentially omitting or mishandling another (e.g., HTTP). As well, a problem in one layer can undo the anonymity offered by scrubbing a lower layer protocol. This approach may not solve every privacy concern, but completely zeroing out (or truncating) all payload data above TCP/UDP seems to pretty much resolve many of the more conventional privacy issues.

  4. Runs cleanly on Linux. We require something that can be run on Linux as that's where we do the bulk of our work. We are also finding that Linux is typically the base OS on "bare metal" switches / "disaggregated" network gear. Windows-only based approaches (like TraceWrangler) are challenging for those of us running tcpdump on primarily Linux-based environments or on physical network switches.

  5. Is an active project. Ideally, we'd like something that is maintained/not orphaned, and is stable. For example, SCRUBtcpdump looks promising, but doesn't seem to have been updated in a decade, and segfaults from the command line with unexpected input. At the risk of stating the obvious, tcpdump's long history and widescale adoption, significant userbase, active community and development base makes it an ideal and attractive method.

Finally, there's the utility of the proposed -00 option which shrinks the size of data collection. This reduction has significant impact for folks doing downstream ingest and analytics of really large network collections. (Sending 1GB to a GPU farm is way better than 10GB!) Having this functionality built into tcpdump opens it up as a really great option not just for traditional networking folks, but for people doing research in machine learning (e.g., our team) in computer networks and security.

All that being said, thanks so much for your input and guidance! I'll work on better defining the description and the code issues pointed out.

@mcr
Copy link
Member

mcr commented Jul 20, 2017 via email

@mcr
Copy link
Member

mcr commented Jul 24, 2017 via email

@lilchurro
Copy link
Author

lilchurro commented Jul 26, 2017

Hello again... I'm having some issues compiling on Windows. (Using Visual Studio 2017, and attempting to compile with headers and libraries as provided by WpcapSrc_4_1_3.zip.)

I'm trying to compile just the master branch of tcpdump (i.e., without my changes), but am stuck with some dependency issues (among other things). It appears to me that windump version 4.10 doesn't compile on Windows as it refers to the file util.c, which doesn't exist. Also, I'm having issues using wpcapsrc 4.1.3, as the pcap.h included therein contains reference to sys/time.h -- so is there a specific version of winpcap that I should be using? The win32 readme seems vague about this.

Is there some way I can get assistance in compiling the master branch on Windows? It seems like I should get that working before trying to verify that my changes can compile.

@lilchurro
Copy link
Author

lilchurro commented Sep 28, 2017

FYI, I've added new tests to verify the anonymization and packet sanitization for TCP/UDP packet payloads, and have integrated them into TESTrun.sh. I noticed that there wasn't really a way to test for cases where -w [savefile] differs from the usual tcpdump output, so the new test file TESToutfile checks for such a case. For ease of use, it's basically coded in the same structure as TESTonce and uses TESTOUTLIST to keep track of tests related to savefiles.

If there's something else I can do to move this PR along, please let me know; otherwise, I will consider this about as done as it could be. 🙂

@lilchurro
Copy link
Author

Ping... is there anything else I can do to move this PR along?

Makefile.in Outdated
@@ -298,7 +300,7 @@ HDR = \
tcp.h \
netdissect-stdinc.h \
timeval-operations.h \
udp.h
udp.h \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a continuation supposed to be on this line?

README.md Outdated
@@ -1,7 +1,6 @@
# tcpdump

[![Build
Status](https://travis-ci.org/the-tcpdump-group/tcpdump.png)](https://travis-ci.org/the-tcpdump-group/tcpdump)
[![Build Status](https://travis-ci.org/lilchurro/tcpdump.svg?branch=send-upstream)](https://travis-ci.org/lilchurro/tcpdump)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this reference the tcpdump travis ci instead of @lilchurro travis ci?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this one, actually, because it specifically tests the changes to my branch.

clean_cap_dump.c Outdated
#include "tcp.h"
#include "udp.h"

#define MAXPACKET 65549

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment about what MAXPACKET defines, (what parts of the packet does it cover?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding comment about it being largest IP packet size + eth header.

u_char *s;
const struct ip *ip;
const struct ether_header *ep = (const struct ether_header *)bp;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can h,bp ever be Null?

}

static uint32_t
nd_ipv4_to_network_uint(nd_ipv4 a) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a be Null


return 0;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/* comment of what function does */


void
pcap_mod_and_dump(u_char *user, const struct pcap_pkthdr *h, const u_char *sp,
int dlt, int no_payload_flag, int mask_ip_flag, const char *maskIP) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can any of the incoming pointers be Null?

sf_hdr.caplen = h->caplen;
sf_hdr.len = h->len;

switch(dlt) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will there be more things processed by this switch? yes -> ok
no -> probably should become an if

case DLT_EN10MB:
if ((ip = get_iph_ptr(h, modp)) == NULL)
break;
p_len -= ETHER_HDRLEN;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is p_len always >= ETHER_HDRLEN?

}
}

if (no_payload_flag > 1) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are valid values for no_payload_flag?

@lilchurro
Copy link
Author

😱 Well, that rebase was a harrowing experience. 1 clean commit now tho.

Added two new flags to tcpdump

To incorporate privacy concerns in sharing network traffic among disparate organizations,
we've added two (three?) new flags to tcpdump:
    - `--zero-tcpudp-payload` zeros out packet payload after TCP/UDP headers (in IPv4);
    - `--no-tcpudp-payload` truncates the packet payload after TCP/UDP headers (in IPv4);
    - `--mask-external-address mask_ip` masks all IPv4 addresses outside of the blocks noted in RFC5735.

These currently assume DLT_EN10MB and will leave unchanged any other link layer traffic.
Additionally, these flags only work when writing to a savefile.
@gregs5
Copy link

gregs5 commented Aug 1, 2018

Is this something @lilchurro should consider spending the time to bring up to date? Would be great to get some guidance here!

@infrastation
Copy link
Member

There were two discussions made in this pull request. One about the code itself, which indeed belongs here. I cannot help completing it before I finish other bits of work, other developers may be available. Another discussion, which would belong more to the tcpdump-workers mailing list, was whether it is right to incorporate the masking function into tcpdump as opposed to doing it in a separate binary. I guess the latter one could go into more detail to reach a more pronounced consensus. I, for instance, am not quite convinced yet, but I could be wrong.

@gregs5
Copy link

gregs5 commented Aug 1, 2018

Thanks - really appreciate the reply. Understood on your concerns. I'll share that we looked at other methods (and all the tools you referenced last year) and still came to the conclusion that for anyone who wants this functionality, this is the right path. That said though, I'll start a thread in the tcpdump-workers and we'll see where it takes us. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants