Domain Knowledge: Learning with Pydomains

You are what you browse. More or less. We jest, just a bit.

To make it easier to learn from browsing data, we developed a Python package, pydomains. The package provides multiple ways to infer the kind of content hosted by a domain. To illustrate its power (and the general workflow), we use it to answer two important questions:

  1. Do poor people, minorities, and the less-well-educated visit sites that distribute malware or engage in phishing more frequently than their respective complementary groups: the better-off, the racial majority, and the better educated?

  2. How does consumption of pornography vary by education and age?
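
Both questions follow the same broad workflow: label each visited domain with a content category, then compare the share of visits in a category across demographic groups. A minimal sketch of that aggregation step follows. It does not call pydomains; the hand-written `DOMAIN_CATEGORY` dictionary, the toy `VISITS` log, and the `share_by_group` helper are all hypothetical stand-ins for the labels a domain classifier would supply and for the real browsing data.

```python
from collections import defaultdict

# Hypothetical domain -> category labels. In practice these would come from
# a domain classifier such as pydomains, not a hand-written dictionary.
DOMAIN_CATEGORY = {
    "news.example": "news",
    "badsite.example": "malware",
    "shop.example": "shopping",
}

# Toy browsing log: (person_id, education_group, domain). All values are made up.
VISITS = [
    (1, "college", "news.example"),
    (1, "college", "shop.example"),
    (2, "no_college", "badsite.example"),
    (2, "no_college", "news.example"),
    (3, "no_college", "shop.example"),
    (3, "no_college", "unknown.example"),
]


def share_by_group(visits, categories, target):
    """Return {group: share of categorized visits that fall in `target`}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for _person, group, domain in visits:
        category = categories.get(domain)
        if category is None:
            continue  # skip domains that have no label
        totals[group] += 1
        hits[group] += category == target
    return {group: hits[group] / totals[group] for group in totals}


shares = share_by_group(VISITS, DOMAIN_CATEGORY, "malware")
print(shares)
```

In this toy log, one of the three labeled visits in the `no_college` group goes to a domain labeled as malware, and none of the labeled `college` visits do. Unlabeled domains are dropped from the denominator here; whether to drop or keep them is an analysis choice, not something this sketch settles.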

Data

Scripts

  1. Malware by Age, Race, Education
  2. Pornography Consumption by Age and Education for comScore 2004
    • We pick 2004 because we also have data from the Trusted Source API for that year. We plan to present supplementary data and analysis that illustrate some of the issues with comScore data, but much of that is beyond the scope of this illustration, and we may publish it separately.

Outputs

Authors

Suriyan Laohaprapanon and Gaurav Sood