
Added license and readme

1 parent b5f1f56 commit 5f4634a3b976b5cf1f13091219393d9280e12a85 @wilson428 committed Sep 28, 2012
Showing with 5,697 additions and 0 deletions.
  1. +18 −0 LICENSE.txt
  2. +28 −0 README.txt
  3. +5,650 −0 gender.json
  4. +1 −0 gender.py
@@ -0,0 +1,18 @@
+Copyright (c) 2012, Chris Wilson. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification, are
+permitted provided that the following conditions are met:
+
+Redistributions of source code must retain the above copyright notice, this list of
+conditions and the following disclaimer. Redistributions in binary form must reproduce the
+above copyright notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution. THIS SOFTWARE IS
+PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
+FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,28 @@
+This Python 2.7 code loads Federal Election Commission data, in CSV format, into a SQLite database.
+Once this is accomplished, it converts the raw strings of names into first and last names, then reduces
+those records to unique first and last names, retaining the frequency of those names and the splits
+in donations between candidates.
+
+This is not the cleanest data in the world, so a small number of names will be incorrectly loaded.
+
+To reduce double counting of names, middle initials are ignored. To prevent undercounting of
+common names, a name is treated as one person only within a given zip code and recipient candidate.
+This is necessary because many individuals are listed many times if they gave in installments.
+
+Thus, two distinct John Smiths in zip code 22901 both giving to Mitt Romney will be counted as one person.
+Such collisions appear to be rare or nonexistent in this data.
+
+The source code is heavily commented with more details. This code is licensed under the BSD 2-Clause License (see LICENSE.txt):
+http://opensource.org/licenses/BSD-2-Clause
+
+All comments, questions or concerns are welcome!
+Chris Wilson
+cewilson@yahoo-inc.com
+https://github.com/wilson428
+
+PERMANENT DISCLOSURE
+I am a reporter by training and a largely self-taught programmer, so I can guarantee that the code here
+is not as elegant or Pythonic as it ought to be. I'm posting this in the interest of transparency for
+anyone who wants to check the methodology or see how this sort of project is accomplished. I would only
+advise against assuming it's the BEST way for it to be accomplished. If you see places to improve, please
+let me know!
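The deduplication rule described in the README can be illustrated with a minimal sketch. This is not the code from gender.py. It assumes raw names arrive in a "LAST, FIRST MIDDLE" form, and the table name, column names, helper function, and sample rows are all hypothetical. It is written for a current Python interpreter rather than 2.7.

```python
import sqlite3


def split_name(raw):
    """Split an assumed 'LAST, FIRST MIDDLE' string into (first, last).

    Anything after the first given name (such as a middle initial) is dropped.
    """
    last, _, rest = raw.partition(",")
    parts = rest.split()
    first = parts[0] if parts else ""
    return first.strip().upper(), last.strip().upper()


# Hypothetical sample rows: (raw name, zip code, candidate)
rows = [
    ("SMITH, JOHN A", "22901", "Romney"),
    ("SMITH, JOHN", "22901", "Romney"),   # same donor, second installment
    ("SMITH, JOHN", "22901", "Obama"),
    ("SMITH, JOHN B", "10001", "Romney"),
    ("DOE, JANE", "10001", "Obama"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE donors (first TEXT, last TEXT, zip TEXT, candidate TEXT)"
)
conn.executemany(
    "INSERT INTO donors VALUES (?, ?, ?, ?)",
    [split_name(name) + (zip_code, candidate)
     for name, zip_code, candidate in rows],
)

# Collapse repeat donations: one row per distinct name, zip code and
# candidate. Then count the remaining donors per first name and candidate.
query = """
SELECT first, candidate, COUNT(*) AS donors
FROM (SELECT DISTINCT first, last, zip, candidate FROM donors)
GROUP BY first, candidate
ORDER BY first, candidate
"""
counts = conn.execute(query).fetchall()
for first, candidate, donors in counts:
    print(first, candidate, donors)
```

In this sample the two John Smith rows for Romney in zip code 22901 collapse into one donor, while the John Smith in zip code 10001 is still counted separately.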