Domain Knowledge: Learning with Pydomains

You are what you browse. More or less. We jest, just a bit.

To make it easier to learn from browsing data, we developed a Python package, pydomains. The package provides multiple ways to infer the kind of content hosted by a domain. To illustrate its power (and the general workflow), we use it to answer two important questions:

  1. Do poor people, minorities, and the less well-educated visit sites that distribute malware or engage in phishing more frequently than their respective complementary groups (the better-off, the racial majority, and the better educated)?

  2. How does consumption of pornography vary by education and age?
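Both questions reduce to the same computation: label each visited domain with a content category, then compare the share of visits in a target category across demographic groups. The following is a minimal sketch of that comparison using only the standard library. It does not call pydomains; the domain names, category labels, group names, and the helper `share_by_group` are all hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical domain -> category lookup. In practice the labels would come
# from a classifier or a curated list, not from a hand-typed dictionary.
DOMAIN_CATEGORY = {
    "news.example": "news",
    "bad.example": "malware",
    "shop.example": "shopping",
}

# Hypothetical browsing records: (education group, domain visited).
VISITS = [
    ("college", "news.example"),
    ("college", "shop.example"),
    ("college", "bad.example"),
    ("college", "news.example"),
    ("no_college", "bad.example"),
    ("no_college", "news.example"),
]


def share_by_group(visits, domain_category, target):
    """Return, per group, the fraction of visits whose domain is labeled `target`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, domain in visits:
        totals[group] += 1
        # Domains missing from the lookup count as visits but never as hits.
        if domain_category.get(domain) == target:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


shares = share_by_group(VISITS, DOMAIN_CATEGORY, "malware")
print(shares)
```

On real data the same grouping would typically be done on visit counts or time spent per domain, and unlabeled domains would need explicit handling rather than being silently counted as non-hits.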



  1. Malware by Age, Race, Education
  2. Pornography Consumption by Age and Education for comScore 2004
    • We pick 2004 because we also have Trusted Source API data for that year. We plan to present supplementary data and analysis illustrating some of the issues with comScore data, but much of that is beyond the scope of this illustration, and we may publish it separately.



Suriyan Laohaprapanon and Gaurav Sood