How To Contribute Terms Location
[derived from https://www.eff.org/deeplinks/2013/01/campus-party-hackathon-making-rule-contribution-tosback ]
The project uses Rails and we'd love people to contribute code. But if you aren't a Rails developer, you can still contribute by submitting rules and letting us know which policies are important to you. This is a developers' guide for submitting new policies for TOSBack to crawl. If you want to get started as quickly as possible, you can scroll down to the "Putting it all together" section below.
What you will need
A browser (Chrome, Firefox, Safari)
A text editor like TextEdit/TextMate (Mac), Notepad (Windows), or Emacs (all platforms) to modify the XML files.
Make sure you have these installed in order to test your rules:
Git and a GitHub account.
Ruby 1.9.3 - the version is important
The proper gems:
gem install nokogiri mechanize sanitize mail active_record bigdecimal
Note that you may need to install dependencies before installing these gems.
- (Optional) If you use Firefox, the Firebug extension adds functionality to the browser's developer tools.
Take a look at TOSBack's XML structure
The app scans a set of XML files that define attributes for sites and policies. Then, it uses those attributes to find the policy, look for changes, and store new versions. Here's an example from the current rules:
Curious about the content? (The elements and attributes)
- Each site has its own XML file ("500px.com.xml") in the rules directory, and a "sitename" element.
  - name: The name attribute shouldn't include "http://". It is only used to organize the policies.
- Nested beneath the "sitename" element should be one or more "docname" elements.
  - name: Make sure your docname's name is present and doesn't have any strange characters.
- Nested beneath each "docname" element is the "url" element and its attributes:
  - name: Use the full URL and encode your ampersands as "&amp;". This is the location that TOSBack's scraper will visit to find the policy. If the site is owned by another company, the URL may not match the "sitename". Example: "http://fullyQualifiedDomainName.com/includes?request=true&amp;lang=EN"
  - xpath: Use single quotes inside the brackets, and check the XPath section below.
  - lang: The two-character language code of the policy.
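Assembled from the elements described above, a rule file might look like the XML in the following sketch. The site, document name, and URL are hypothetical placeholders, and the exact layout of real rule files (including any additional child elements) is an assumption. The script uses Ruby's standard REXML library, not TOSBack's own code, to show that the attributes can be read back and that an encoded ampersand is decoded by the parser.

```ruby
require 'rexml/document'

# Hypothetical policy URL; the "&" must be written as "&amp;" inside XML.
raw_url = 'http://example.co.uk/legal?doc=terms&lang=EN'
encoded_url = raw_url.gsub('&', '&amp;')

# Hypothetical rule file contents, built from the elements described above.
rule_xml = <<XML
<sitename name="example.co.uk">
  <docname name="Terms of Service">
    <url name="#{encoded_url}" xpath="//div[@id='terms']" lang="en"/>
  </docname>
</sitename>
XML

doc = REXML::Document.new(rule_xml)
site = doc.root
docname = site.elements['docname']
url = docname.elements['url']

puts site.attributes['name']
puts docname.attributes['name']
puts url.attributes['name']  # the parser decodes "&amp;" back to "&"
puts url.attributes['xpath']
puts url.attributes['lang']
```

Note that the XPath uses single quotes inside the brackets so that it can sit inside a double-quoted XML attribute.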
Still confused about the XPath?
XPath defines where the policy is nested on the page, and allows us to strip away unrelated content (ads, related articles, etc.) from the policy. In this example, the policy at http://www.500px.com/terms has an XPath of "//div[@id='terms']". Here's a snippet from their source to give you an idea:
<div id="terms" class="col d1 rounded shadow filled">
  <div class="intro"> ... </div>
  <div class="left-legal"> ... </div>
</div>
Since the policy exists only in elements nested below the div tag with the id of "terms", we can extract it with XPath and ignore the headers, footers, etc.
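To see the selection in action, here is a minimal sketch that applies the same XPath to a simplified stand-in for the page. Only the div structure comes from the snippet above; the header, footer, and text contents are made up for illustration. It uses Ruby's standard REXML library, which needs well-formed XML, so it is not how TOSBack itself scrapes pages. Real pages are rarely well-formed, and an HTML-tolerant parser such as Nokogiri (from the gem list above) would be needed in practice.

```ruby
require 'rexml/document'

# Simplified, well-formed stand-in for a page (contents are hypothetical).
page = <<HTML
<html>
  <body>
    <div id="header">Site navigation</div>
    <div id="terms" class="col d1 rounded shadow filled">
      <div class="intro">Intro text</div>
      <div class="left-legal">Legal text</div>
    </div>
    <div id="footer">Footer links</div>
  </body>
</html>
HTML

doc = REXML::Document.new(page)

# Select only the div whose id is "terms"; header and footer are left out.
terms = REXML::XPath.first(doc, "//div[@id='terms']")

# Join the text of each child element of the selected div.
policy_text = terms.elements.to_a.map { |child| child.text.strip }.join(' ')
puts policy_text
```

If the XPath matches nothing, `REXML::XPath.first` returns nil, which is a quick sign that the expression needs another look.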
Putting it all together
Clone the current git repo to your local machine:
git clone https://github.com/tosdr/tosback2.git
Identify the website for which you want to add a rule. Search through the "rules" directory to make sure the website is not already present.
Visit the site and find its terms of service and privacy policies. The footer of the site is a great place to find the link, but you may have to really dig!
Finding the XPath is a complicated subject, and if you aren't familiar with the syntax, it might be pretty confusing at first. Take a moment on W3Schools and look at the XPath section above.
Save your new XML file and add it to your project: git add example.co.uk.xml
Switch to the rubycode directory and test your new rule by passing it as an argument to main.rb:
cd rubycode
rubycode$ ruby main.rb ../rules/500px.com.xml
If the rule works, the output is the text of the policy, for example:
Before using any of the 500px services, you are required to read, understand and agree to these terms. You may only create an account after reading and accepting these terms.
Make sure you run "git pull" often to ensure your code is up to date.
Commit often, descriptively, and in the present tense:
git commit -m "Add new rule for example.co.uk"
Pull once more, merge if needed, and recommit. Then push to a GitHub repository to which you have access, and submit a pull request to the master repo with your additions.
Add some more rules!
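Before committing, it can help to check a new rule against the guidelines in this guide. The sketch below is a hypothetical helper, not part of TOSBack: the method name `rule_problems` and its messages are made up here, and the checks only mirror the advice above (encoded ampersands, a "sitename" without "http://", named "docname" elements, a full URL, an xpath, and a two-character lang).

```ruby
require 'rexml/document'

# Matches an "&" that does not start an XML entity such as "&amp;".
UNENCODED_AMP = /&(?!(?:amp|lt|gt|quot|apos|#\d+|#x\h+);)/

# Returns a list of problems found in a rule file's XML (empty if none).
# Hypothetical helper; these checks are not TOSBack's own validation.
def rule_problems(xml)
  return ['contains an unencoded ampersand'] if xml =~ UNENCODED_AMP

  begin
    site = REXML::Document.new(xml).root
  rescue REXML::ParseException
    return ['is not well-formed XML']
  end
  return ['root element is not "sitename"'] if site.nil? || site.name != 'sitename'

  problems = []
  site_name = site.attributes['name'].to_s
  if site_name.empty? || site_name.include?('http://')
    problems << 'sitename name is empty or includes "http://"'
  end

  docs = site.elements.to_a('docname')
  problems << 'no docname elements' if docs.empty?
  docs.each do |docname|
    label = docname.attributes['name'].to_s
    problems << 'docname without a name' if label.empty?
    url = docname.elements['url']
    if url.nil?
      problems << "#{label}: no url element"
      next
    end
    problems << "#{label}: url name is not a full URL" unless url.attributes['name'].to_s =~ %r{\Ahttps?://}
    problems << "#{label}: missing xpath" if url.attributes['xpath'].to_s.empty?
    problems << "#{label}: lang is not a two-character code" unless url.attributes['lang'].to_s =~ /\A[A-Za-z]{2}\z/
  end
  problems
end

# Hypothetical rule used only to exercise the helper.
sample = <<XML
<sitename name="example.co.uk">
  <docname name="Privacy Policy">
    <url name="http://example.co.uk/privacy?a=1&amp;lang=EN" xpath="//div[@id='privacy']" lang="en"/>
  </docname>
</sitename>
XML

puts rule_problems(sample).inspect
```

To check a file on disk, the same helper could be called as `rule_problems(File.read('../rules/example.co.uk.xml'))`. A clean result here does not replace testing the rule with main.rb, since it says nothing about whether the XPath matches the live page.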
If you need some help...
It may seem very difficult if you're just starting out, but if your policy requires a tricky XPath attribute or if you just need help remembering which git command to use next, get on our IRC channel #tosback on irc.oftc.net and we'll be happy to help you!