Protection tool for Whonix, Tails #1

Open
HulaHoopWhonix opened this issue Aug 17, 2016 · 23 comments

Comments

@HulaHoopWhonix HulaHoopWhonix commented Aug 17, 2016

Hi. I've been reading up on the privacy implications of keystroke dynamics and came across your (excellent) past research and now this.

I am affiliated with the Whonix project, a Tor-centric privacy OS similar to Tails, but one that uses a VM-based anonymizing middlebox architecture.

We've been interested in a countermeasure for this deanonymization vector for the longest time. Unfortunately none of us knows C or enough about the guts of the kernel to write such a tool. Only recently I learned that the uinput API (maybe even python-uinput) can provide a way to influence keystroke timings but there is no program readily available to set this up AFAIK.

Can you please consider writing something we can include?

Owner

@vmonaco vmonaco commented Aug 17, 2016

Thanks for your interest. I agree that some type of kernel-level keystroke timing obfuscation would make a nice addition to Whonix and other privacy-preserving tools.

I think the right way to go about this is to write a custom keyboard device driver. This would create a special device file for obfuscated keystroke input, available system-wide.

Another option is to write a browser plugin, but then this functionality won't be available in other programs (SSH in interactive mode is especially vulnerable).

Unfortunately, I too am not well versed in the kernel. I could probably hack something together, but there are no guarantees that this wouldn't open up other vulnerabilities. If there is someone who knows enough about writing device drivers, I would be happy to assist in writing the obfuscation mechanism in C.

Author

@HulaHoopWhonix HulaHoopWhonix commented Aug 18, 2016

I've had an idea about an alternative to writing kernel code that directly does this. It was inspired by this answer on SE:
https://stackoverflow.com/a/33134735

Basically, funnel all system input events through a local network interface and inject random latency into it. This runs on the host, so it's system-wide.

Tools for adding the network latency: the iperf stress tool, or the kernel's netfilter_queue to delay packets randomly.

Fortunately there is a tool out there that can redirect all the host input to some destination. Netevent: https://github.com/Blub/netevent/wiki/Share-devices-over-the-net

Netevent cobbles a netcat host/client together. We can run it as a service and set it to send on the loopback interface so the client and server communication never leaves the machine. Pros: kernel solution, display-server agnostic. (It uses the uinput interface to capture all events.)


This sounds very convoluted but it would be great if it does indeed work.

Owner

@vmonaco vmonaco commented Aug 19, 2016

Interesting. How well is the loopback interface protected? Would it be easier for malware to listen to network traffic than register a system-wide hook? I'm guessing this depends partly on the permissions of the device files. Although, even if just the timings are observed, the keys pressed can be reconstructed with a fair amount of confidence.

This seems like it could work using netfilter_queue. I could write the buffering mechanism as a library that's fairly self contained, which could be used in this or other solutions.

Author

@HulaHoopWhonix HulaHoopWhonix commented Aug 19, 2016

> Interesting. How well is the loopback interface protected? Would it be easier for malware to listen to network traffic than register a system-wide hook?

Looks like sniffing a network interface (including loopback) needs root and net capabilities: https://security.stackexchange.com/a/58031

I also recall that using tools like tshark for network leak tests required root so it wouldn't be any easier than the privileges needed for system hooks.

> This seems like it could work using netfilter_queue. I could write the buffering mechanism as a library that's fairly self contained, which could be used in this or other solutions.

That would be great. Thanks for offering to help. Also please feel free to drop by our bugtracker: https://phabricator.whonix.org/T542

There is some netfilter_queue code a researcher has written to foil a network latency covert channel: https://gist.github.com/ethan2-0/2c8505049c991fe0aac3d303dddb6075

Maybe there are some parts of it that can be repurposed so you don't have to start from scratch?

Owner

@vmonaco vmonaco commented Aug 30, 2016

Thanks for the additional info. I've been actively planning how to go about this, and I'm going to take this approach:

  • First, develop a standalone C library that contains all the core functions needed to obfuscate. These functions are pretty basic, mostly just sampling some distribution that changes over time.
  • Then develop a rudimentary device driver that uses the above library. This will mostly just be an example of how to use the library, and a starting point for someone who wants to develop a full-fledged device driver or the network queuing approach.

I'm in the process of extending the existing work. There's a key connection to queuing processes that hasn't yet been made. The system is essentially an MMPP/MMPP/1 queue: https://en.wikipedia.org/wiki/Kendall%27s_notation

I'm also looking at how this technique, or something similar, could be applied to mouse pointer motion. Mouse biometrics are less studied, but the consensus is that they can also be effective for identification/verification.

Author

@HulaHoopWhonix HulaHoopWhonix commented Sep 1, 2016

Excellent. We really appreciate your work. Please feel free to ping this thread when it's ready.

> I'm also looking at how this technique, or something similar, could be applied to mouse pointer motion. Mouse biometrics are less studied, but the consensus is that they can also be effective for identification/verification.

Indeed. There are some successful attack prototypes for this too [1]. Does mitigating mouse motion fingerprinting need more than just delaying input events?

[1] http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprinting.html
[2] http://www.cs.wm.edu/~hnw/paper/ccs11.pdf

Owner

@vmonaco vmonaco commented Dec 21, 2016

So, here's a first attempt at something that could be used for keystroke privacy protection: https://github.com/vmonaco/kloak

The above application grabs the input device, randomly delays the key events, and writes the events to a user-level input device via uinput. I thought this approach was less intrusive and more portable than a kernel module or device driver. It could be run in a startup script run by root, or turned on/off as needed.
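
A minimal sketch of the "randomly delay the key events" step, written in Python for readability. This is not kloak's actual code: the function name, the uniform integer delay, and the sample timestamps are assumptions made for illustration. The one property it tries to show is that each event is held back by at most a fixed maximum while the original key order is preserved.

```python
import random


def schedule_release_times(event_times_ms, max_delay_ms, rng):
    """Return the time at which each event is released.

    Assumes event_times_ms is sorted in non-decreasing order.
    """
    release_times = []
    prev_release = 0
    for t in event_times_ms:
        # Never release an event before the previous one, so key order
        # is preserved; never hold it longer than max_delay_ms.
        lower = min(max(prev_release - t, 0), max_delay_ms)
        delay = rng.randint(lower, max_delay_ms)
        prev_release = t + delay
        release_times.append(prev_release)
    return release_times


rng = random.Random(0)  # fixed seed only to make the demo repeatable
presses = [0, 120, 180, 400, 430]  # hypothetical key event times (ms)
released = schedule_release_times(presses, max_delay_ms=100, rng=rng)
delays = [r - t for r, t in zip(released, presses)]
```

In a real tool the release times would drive writes to the uinput device; here they are only computed, so the sketch can be run without any input hardware or root privileges.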

And to answer your question above (sorry for the delay...), I'm not really sure. I think the delays might help, but most mouse motion biometrics are based primarily on the shape of the trajectory. Fortunately, the OS introduces an artificial acceleration, and this varies greatly by OS, so what ends up being measured is the pointer motion and not the physical mouse motion.

Lastly, I forgot to mention that similar to keystroke biometrics, the packet inter-arrival times of a wireless device can be used to identify the device type (and sometimes even a particular device) in a passive analysis, e.g., http://www2.ece.gatech.edu/cap/papers/1569740227-3.pdf. I don't know if anything like this has been deployed, but a similar obfuscation strategy should make techniques like that less effective.

Author

@HulaHoopWhonix HulaHoopWhonix commented Dec 25, 2016

Thank you so much and Happy Holidays. Seems Christmas came a little earlier this year :D We will test and deploy this ASAP.

> And to answer your question above (sorry for the delay...), I'm not really sure. I think the delays might help, but most mouse motion biometrics are based primarily on the shape of the trajectory. Fortunately, the OS introduces an artificial acceleration, and this varies greatly by OS, so what ends up being measured is the pointer motion and not the physical mouse motion.

Your solutions are much more effective than the network latency suggestion - that was really a (desperate) hack in the absence of a better way.

You have been very generous with us and I don't want to ask too much - if you feel like it and find the time to write something similar to obfuscate pointer motion, we would appreciate it a lot. This would shut the door on all major ways they can track behavior. Combined with the anti-stylometry tool Anonymouth (whenever they finish migrating to OpenJDK), users would have a powerful toolbox.

> Lastly, I forgot to mention that similar to keystroke biometrics, the packet inter-arrival times of a wireless device can be used to identify the device type (and sometimes even a particular device) in a passive analysis

Very interesting! and scary. I always had a hunch something like this is possible. I'll look into the ramifications of this on user anonymity. I hope there is some easy way to mitigate.

> Higher level cognitive behavior, such as editing and application usage, are still apparent. These lower-frequency actions are less understood at this point, but could potentially be used to reveal identity.

Is there some literature on this? I'd like to know more.

Owner

@vmonaco vmonaco commented Dec 26, 2016

I'm happy to work on a solution for mouse biometrics. Implementation can be done similarly to kloak, by modifying mouse events before they're written back to the user device. The hard part is developing an obfuscation model that doesn't affect user experience too much and doesn't defeat its purpose. The relative mouse motion events are usually generated up to some maximum frequency (e.g., 1 event/8 ms), which decreases when velocity decreases. Introducing a random delay may do more harm than good, allowing users with the tool running to be identified.

Re. higher level behavior: For example, see this paper. I think that most higher level "cognitive" behavioral biometrics will be pretty application specific. That paper uses descriptive statistics for actions that are very specific to the game and don't really apply anywhere else.

[Edit] See also "Identifying Users with Application-Specific Command Streams" and references therein. This work used an older dataset containing MS Word actions: http://www.research.rutgers.edu/~sofmac/ml4um/

@adrelanos adrelanos commented Jan 2, 2017

Happy New Year!

Thank you for creating kloak!

The kloak compilation and usage instructions are super simple to follow. (Tested in a VirtualBox Whonix VM.) Was running:

sudo ./kloak -r /dev/input/event0 -w /dev/uinput -v

In my first test using iceweasel, my keytrac detection scores dropped: once to 8% and once to 82%. So we might have to fine tune the delays?

In my second test using Tor Browser I bumped into a bug:

pthread_create() failed: cannot allocate memory

And the last key pressed (o) kept being sent over and over again. (Very unlikely that my VM was really out of memory.)

The emergency key combination Right Shift + Right Ctrl is non-ideal, since the VirtualBox default host key is Right Ctrl. Would be great if we could change that.

Owner

@vmonaco vmonaco commented Jan 2, 2017

Thank you for the feedback, and happy new year!

The key combo is an easy fix. How about defaulting to something else and letting the user specify the combo as command line params?

I was able to reproduce the repeating key bug a few times, typically with longer delays. Still investigating the cause... See updates here:
vmonaco/kloak#1

Re. choosing a delay, were those results the train kloak/test kloak scenario? Some fine tuning might be required. You can try something like ~500 ms and work your way down until the delay becomes tolerable or not noticeable. I'm also looking into a variable maximum delay that depends on typing speed. This would avoid having to choose a delay, automatically setting a sensible max delay according to typing speed (I think slower typists can tolerate a larger delay than faster typists).

Author

@HulaHoopWhonix HulaHoopWhonix commented Jan 2, 2017

Thanks a lot and Happy New Year :)

I've been looking at mouse click dynamics (as opposed to movements) and this study[1] proposes a system based on just that for keyboardless devices like tablets. Its success rate is not high enough to be used on its own yet, so the authors recommend it as a backup to keyboard fingerprinting. In your opinion, is there a similar practical solution for click duration, like what you thought of for movements?

[1] http://www.ijicic.org/ijicic-ksi-03.pdf - User Authentication using Rhythm Click Characteristics for Non-Keyboard Devices

[2] https://www.ibm.com/developerworks/library/os-userauth-mouse/index.html - IBM perl guide for fingerprinting mouse click-hold times

@adrelanos adrelanos commented Jan 2, 2017

Since keytrac isn't Open Source, I guess there is no way to know how bad 1% is vs 2% or 0%?

> The key combo is an easy fix. How about defaulting to something else and letting the user specify the combo as command line params?

Yes, that would be great!

> Re. choosing a delay, were those results the train kloak/test kloak scenario?

Yes.


My typing speed is above 500 CPM. 10 finger and "untrained". Well, many years ago I learned 10 finger typing but didn't make an effort for years now to improve that since typing is probably not my productivity bottleneck. Just tried in a 1 minute typing speed test (for whatever that's worth). And I doubt I could keep doing that speed for long times, but it is probably the speed with which I am typing usernames / passwords at keytrac.


default (100) ms
Train normal, test normal
94 % / 97 %

Train normal, test kloak
33 % / 99 % / 82 %


300ms
Train normal, test kloak
1 %


200ms
Train normal, test kloak
26 %


300ms
Train kloak, test kloak
85 %

Author

@HulaHoopWhonix HulaHoopWhonix commented Jan 3, 2017

@vmonaco Is it okay to stack kloak instances? - run it on the host and VM at the same time

Owner

@vmonaco vmonaco commented Jan 3, 2017

@HulaHoopWhonix thanks for the links. Yes, the same exact techniques could be applied to mouse clicks. It would be effective against the "higher frequency" click actions (mainly the duration of a single click and the various double click time intervals). I agree that mouse clicks alone aren't particularly effective, except possibly in applications with a high volume of clicks.

In a study we did a few years ago, ~20 users played Solitaire and Star Bubbles (both online games, the latter requires many clicks), and we could identify users by mouse click behavior with 37% accuracy. That's using a pretty simple classifier, training on the first session, and using the remaining sessions for testing (see https://gist.github.com/vmonaco/209647bc6438b1d045d738156179367f).

@adrelanos Correct - with only a few scores, it's difficult to say how they relate to each other. With the scores from many users and sessions per user, it would be possible to determine the accuracy of their system. Since keytrac gives a numeric value (instead of just an accept/reject decision), these can be used to derive an ROC curve and estimate system performance by obtaining many genuine and impostor scores. This would require a bunch of volunteers to obtain the scores and simulate the impostor scores by swapping credentials.
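
To make the ROC idea above concrete, here is a small Python sketch that sweeps a decision threshold over genuine and impostor scores and reports the false accept rate (FAR) and false reject rate (FRR) at each one. It assumes that a higher score means "more likely the genuine user". The score lists are made-up placeholders, not measurements from keytrac, and the function names are hypothetical.

```python
def roc_points(genuine, impostor):
    """Return (threshold, FAR, FRR) for every distinct score used as threshold.

    A sample is accepted when its score is >= threshold.
    """
    points = []
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < thr for s in genuine) / len(genuine)     # genuine users rejected
        points.append((thr, far, frr))
    return points


def approx_eer(points):
    """Pick the threshold where FAR and FRR are closest and report their mean."""
    thr, far, frr = min(points, key=lambda p: abs(p[1] - p[2]))
    return thr, (far + frr) / 2


# Placeholder scores for illustration only (NOT real keytrac results).
genuine_scores = [90, 80, 70, 60]
impostor_scores = [10, 20, 30, 65]

points = roc_points(genuine_scores, impostor_scores)
eer_threshold, eer = approx_eer(points)
```

With real data, the genuine list would hold scores from users logging in as themselves and the impostor list would hold scores obtained by swapping credentials, as described above.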

Owner

@vmonaco vmonaco commented Jan 3, 2017

@HulaHoopWhonix and yes, stacking the kloaks should be fine. Though, the maximum delay on the VM will be the sum of maximum delays in each instance, so you might experience more lag there.
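
A rough Python illustration of the stacking point, assuming each instance behaves like a simple order-preserving random delay stage (a simplification, not kloak's actual code). The per-instance maximum delays and event times are hypothetical. The end-to-end delay seen inside the VM stays within the sum of the two maxima.

```python
import random


def delay_stage(times_ms, max_delay_ms, rng):
    """One obfuscation stage: delay each event by at most max_delay_ms, keeping order."""
    out, prev = [], 0
    for t in times_ms:
        lower = min(max(prev - t, 0), max_delay_ms)
        prev = t + rng.randint(lower, max_delay_ms)
        out.append(prev)
    return out


rng = random.Random(1)        # fixed seed only to make the demo repeatable
typed = [0, 90, 150, 300]     # hypothetical event times (ms)
host_max, vm_max = 100, 50    # hypothetical per-instance maximum delays (ms)

after_host = delay_stage(typed, host_max, rng)    # instance on the host
after_vm = delay_stage(after_host, vm_max, rng)   # instance inside the VM
total_delays = [out - t for out, t in zip(after_vm, typed)]
```

Here the worst case inside the VM is 100 + 50 = 150 ms, even though neither instance alone exceeds its own maximum.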

@adrelanos adrelanos commented Jan 3, 2017

I found an overview with special keys used by various virtualizers. It would be great if the default emergency key of kloak would not use any of these.

http://vmetc.com/2008/10/02/stuck-in-a-vm-%E2%80%93-to-release-the-mouse-press-the-host-key/

Owner

@vmonaco vmonaco commented Jan 4, 2017

@adrelanos thanks! How about "Left Shift + Right Shift + Escape"? This should be pretty hard to press accidentally - a situation we definitely want to avoid.

Owner

@vmonaco vmonaco commented Jan 5, 2017

@adrelanos Key combo fixed in the latest commit, can now be specified on the command line and has the above default.
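
For readers curious how such a combo can be detected, here is a hedged Python sketch (not kloak's implementation). It tracks which keys are currently held and fires once every key of the combo is down. The key codes are the standard Linux input event codes for Escape, Left Shift and Right Shift; the class name and the sample event sequence are hypothetical.

```python
# Linux input event key codes (from linux/input-event-codes.h).
KEY_ESC, KEY_LEFTSHIFT, KEY_RIGHTSHIFT = 1, 42, 54
RESCUE_COMBO = frozenset({KEY_LEFTSHIFT, KEY_RIGHTSHIFT, KEY_ESC})


class ComboDetector:
    """Tracks held keys and reports when all keys of a combo are down."""

    def __init__(self, combo):
        self.combo = frozenset(combo)
        self.pressed = set()

    def feed(self, code, value):
        """Process one key event; value is 1 = press, 0 = release, 2 = autorepeat."""
        if value == 0:
            self.pressed.discard(code)
        else:
            self.pressed.add(code)
        return self.combo <= self.pressed


detector = ComboDetector(RESCUE_COMBO)
# Hypothetical sequence: both shifts down, left shift released and pressed
# again, then Escape pressed while both shifts are held.
events = [(KEY_LEFTSHIFT, 1), (KEY_RIGHTSHIFT, 1), (KEY_LEFTSHIFT, 0),
          (KEY_LEFTSHIFT, 1), (KEY_ESC, 1)]
results = [detector.feed(code, value) for code, value in events]
```

Making the combo a constructor argument mirrors the idea of letting the user pick it on the command line.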

@adrelanos adrelanos commented Jan 5, 2017

@adrelanos adrelanos commented Jan 6, 2017

keycodes.c introduced a compilation error. Reported it here:
vmonaco/kloak#2

@adrelanos adrelanos commented Jan 7, 2017

As per debian-mentors mailing list - Mixed kloak anti keystroke / mice deanonymization tool package or better two separate packages?...

If you were to provide a mice anti fingerprinting tool also, please add the sources to your existing kloak source code repository. Of course this is just a friendly suggestion. After all, distributions have to wrap their head around packaging and not upstream around distribution policies.

Owner

@vmonaco vmonaco commented Jan 7, 2017

Thanks, that's what I'll do. The timing delays could be applied to clicks and other discrete events, so it makes sense to share some of this code. The pointer "shape" is more difficult. I created vmonaco/kloak#7 to track progress.
