FLIM SPRINGFIELD

0 parents commit 227b672323a72a12f702e80995ce788d0dff28e0 @toddwschneider committed Sep 26, 2016
Showing with 4,879 additions and 0 deletions.
  1. +16 −0 .gitignore
  2. +17 −0 Gemfile
  3. +165 −0 Gemfile.lock
  4. +21 −0 LICENSE
  5. +32 −0 README.md
  6. +6 −0 Rakefile
  7. +314 −0 analysis/analysis.R
  8. +324 −0 analysis/data/character_genders.csv
  9. +561 −0 analysis/data/episode_summaries.csv
  10. +1,982 −0 analysis/data/nielsen_ratings.csv
  11. BIN analysis/graphs/01_homer_simpson.png
  12. BIN analysis/graphs/02_marge_simpson.png
  13. BIN analysis/graphs/03_bart_simpson.png
  14. BIN analysis/graphs/04_lisa_simpson.png
  15. BIN analysis/graphs/05_c_montgomery_burns.png
  16. BIN analysis/graphs/06_moe_szyslak.png
  17. BIN analysis/graphs/07_seymour_skinner.png
  18. BIN analysis/graphs/08_ned_flanders.png
  19. BIN analysis/graphs/09_krusty_the_clown.png
  20. BIN analysis/graphs/10_grampa_simpson.png
  21. BIN analysis/graphs/11_chief_wiggum.png
  22. BIN analysis/graphs/12_kent_brockman.png
  23. BIN analysis/graphs/13_milhouse_van_houten.png
  24. BIN analysis/graphs/14_apu_nahasapeemapetilon.png
  25. BIN analysis/graphs/15_lenny_leonard.png
  26. BIN analysis/graphs/16_waylon_smithers.png
  27. BIN analysis/graphs/17_nelson_muntz.png
  28. BIN analysis/graphs/18_dr_julius_hibbert.png
  29. BIN analysis/graphs/19_edna_krabappel-flanders.png
  30. BIN analysis/graphs/20_carl_carlson.png
  31. BIN analysis/graphs/21_rev_timothy_lovejoy.png
  32. BIN analysis/graphs/22_sideshow_bob.png
  33. BIN analysis/graphs/23_mayor_joe_quimby.png
  34. BIN analysis/graphs/24_comic_book_guy.png
  35. BIN analysis/graphs/25_gary_chalmers.png
  36. BIN analysis/graphs/26_groundskeeper_willie.png
  37. BIN analysis/graphs/27_announcer.png
  38. BIN analysis/graphs/28_selma_bouvier.png
  39. BIN analysis/graphs/29_professor_jonathan_frink.png
  40. BIN analysis/graphs/30_barney_gumble.png
  41. BIN analysis/graphs/31_troy_mcclure.png
  42. BIN analysis/graphs/32_patty_bouvier.png
  43. BIN analysis/graphs/33_martin_prince.png
  44. BIN analysis/graphs/34_otto_mann.png
  45. BIN analysis/graphs/35_sideshow_mel.png
  46. BIN analysis/graphs/36_ralph_wiggum.png
  47. BIN analysis/graphs/37_fat_tony.png
  48. BIN analysis/graphs/38_lou.png
  49. BIN analysis/graphs/39_jimbo_jones.png
  50. BIN analysis/graphs/40_kirk_van_houten.png
  51. BIN analysis/graphs/41_agnes_skinner.png
  52. BIN analysis/graphs/42_narrator.png
  53. BIN analysis/graphs/43_cletus_spuckler.png
  54. BIN analysis/graphs/44_gil_gunderson.png
  55. BIN analysis/graphs/45_lionel_hutz.png
  56. BIN analysis/graphs/46_snake_jailbird.png
  57. BIN analysis/graphs/47_kearney_zzyzwicz.png
  58. BIN analysis/graphs/48_the_rich_texan.png
  59. BIN analysis/graphs/49_herb.png
  60. BIN analysis/graphs/50_rainier_wolfcastle.png
  61. BIN analysis/graphs/avg_simpsons_world_views_by_season.png
  62. BIN analysis/graphs/nielsen.png
  63. BIN analysis/graphs/nielsen_1985.png
  64. BIN analysis/graphs/supporting_cast_word_count.png
  65. BIN analysis/graphs/tv_ratings.png
  66. BIN analysis/graphs/word_count.png
  67. BIN analysis/graphs/words_by_location.png
  68. +58 −0 analysis/helpers.R
  69. +62 −0 analysis/nielsen_ratings_by_year.rb
  70. 0 app/assets/images/.keep
  71. +15 −0 app/assets/javascripts/application.js
  72. +15 −0 app/assets/stylesheets/application.css
  73. +5 −0 app/controllers/application_controller.rb
  74. 0 app/controllers/concerns/.keep
  75. +2 −0 app/helpers/application_helper.rb
  76. 0 app/mailers/.keep
  77. 0 app/models/.keep
  78. +72 −0 app/models/character.rb
  79. 0 app/models/concerns/.keep
  80. +193 −0 app/models/episode.rb
  81. +19 −0 app/models/location.rb
  82. +30 −0 app/models/script_line.rb
  83. +14 −0 app/views/layouts/application.html.erb
  84. +3 −0 bin/bundle
  85. +5 −0 bin/delayed_job
  86. +8 −0 bin/rails
  87. +8 −0 bin/rake
  88. +29 −0 bin/setup
  89. +18 −0 bin/spring
  90. +4 −0 config.ru
  91. +30 −0 config/application.rb
  92. +3 −0 config/boot.rb
  93. +85 −0 config/database.yml
  94. +5 −0 config/environment.rb
  95. +41 −0 config/environments/development.rb
  96. +79 −0 config/environments/production.rb
  97. +42 −0 config/environments/test.rb
  98. +5 −0 config/initializers/array_extensions.rb
  99. +11 −0 config/initializers/assets.rb
  100. +7 −0 config/initializers/backtrace_silencers.rb
  101. +3 −0 config/initializers/cookies_serializer.rb
  102. +4 −0 config/initializers/filter_parameter_logging.rb
  103. +16 −0 config/initializers/inflections.rb
  104. +4 −0 config/initializers/mime_types.rb
  105. +3 −0 config/initializers/session_store.rb
  106. +14 −0 config/initializers/wrap_parameters.rb
  107. +23 −0 config/locales/en.yml
  108. +56 −0 config/routes.rb
  109. +22 −0 config/secrets.yml
  110. +24 −0 db/migrate/20150113010701_create_episodes.rb
  111. +22 −0 db/migrate/20150113011505_create_delayed_jobs.rb
  112. +43 −0 db/migrate/20160724230335_create_script_lines_and_characters_and_locations.rb
  113. +96 −0 db/schema.rb
  114. +7 −0 db/seeds.rb
  115. 0 lib/assets/.keep
  116. 0 lib/tasks/.keep
  117. +3 −0 lib/tasks/import_data.rake
  118. +18 −0 lib/text_normalizer.rb
  119. 0 log/.keep
  120. +67 −0 public/404.html
  121. +67 −0 public/422.html
  122. +66 −0 public/500.html
  123. 0 public/favicon.ico
  124. +5 −0 public/robots.txt
  125. 0 test/controllers/.keep
  126. 0 test/fixtures/.keep
  127. 0 test/helpers/.keep
  128. 0 test/integration/.keep
  129. 0 test/mailers/.keep
  130. 0 test/models/.keep
  131. +10 −0 test/test_helper.rb
  132. 0 vendor/assets/javascripts/.keep
  133. 0 vendor/assets/stylesheets/.keep
.gitignore
@@ -0,0 +1,16 @@
+# See https://help.github.com/articles/ignoring-files for more about ignoring files.
+#
+# If you find yourself ignoring temporary files generated by your text editor
+# or operating system, you probably want to add a global ignore instead:
+# git config --global core.excludesfile '~/.gitignore_global'
+
+# Ignore bundler config.
+/.bundle
+
+# Ignore all logfiles and tempfiles.
+/log/*
+!/log/.keep
+/tmp
+.byebug_history
+.Rapp.history
+.DS_Store
Gemfile
@@ -0,0 +1,17 @@
+source 'https://rubygems.org'
+ruby '2.3.1'
+
+gem 'rails', '~> 4.2.6'
+gem 'pg'
+gem 'sass-rails', '~> 5.0'
+gem 'uglifier', '>= 1.3.0'
+gem 'jquery-rails'
+gem 'unicorn'
+gem 'nokogiri'
+gem 'rest-client'
+gem 'delayed_job_active_record'
+
+group :development, :test do
+ gem 'byebug'
+ gem 'web-console', '~> 2.0'
+end
Gemfile.lock
@@ -0,0 +1,165 @@
+GEM
+ remote: https://rubygems.org/
+ specs:
+ actionmailer (4.2.6)
+ actionpack (= 4.2.6)
+ actionview (= 4.2.6)
+ activejob (= 4.2.6)
+ mail (~> 2.5, >= 2.5.4)
+ rails-dom-testing (~> 1.0, >= 1.0.5)
+ actionpack (4.2.6)
+ actionview (= 4.2.6)
+ activesupport (= 4.2.6)
+ rack (~> 1.6)
+ rack-test (~> 0.6.2)
+ rails-dom-testing (~> 1.0, >= 1.0.5)
+ rails-html-sanitizer (~> 1.0, >= 1.0.2)
+ actionview (4.2.6)
+ activesupport (= 4.2.6)
+ builder (~> 3.1)
+ erubis (~> 2.7.0)
+ rails-dom-testing (~> 1.0, >= 1.0.5)
+ rails-html-sanitizer (~> 1.0, >= 1.0.2)
+ activejob (4.2.6)
+ activesupport (= 4.2.6)
+ globalid (>= 0.3.0)
+ activemodel (4.2.6)
+ activesupport (= 4.2.6)
+ builder (~> 3.1)
+ activerecord (4.2.6)
+ activemodel (= 4.2.6)
+ activesupport (= 4.2.6)
+ arel (~> 6.0)
+ activesupport (4.2.6)
+ i18n (~> 0.7)
+ json (~> 1.7, >= 1.7.7)
+ minitest (~> 5.1)
+ thread_safe (~> 0.3, >= 0.3.4)
+ tzinfo (~> 1.1)
+ arel (6.0.3)
+ binding_of_caller (0.7.2)
+ debug_inspector (>= 0.0.1)
+ builder (3.2.2)
+ byebug (9.0.5)
+ concurrent-ruby (1.0.2)
+ debug_inspector (0.0.2)
+ delayed_job (4.1.2)
+ activesupport (>= 3.0, < 5.1)
+ delayed_job_active_record (4.1.1)
+ activerecord (>= 3.0, < 5.1)
+ delayed_job (>= 3.0, < 5)
+ domain_name (0.5.20160615)
+ unf (>= 0.0.5, < 1.0.0)
+ erubis (2.7.0)
+ execjs (2.7.0)
+ globalid (0.3.6)
+ activesupport (>= 4.1.0)
+ http-cookie (1.0.2)
+ domain_name (~> 0.5)
+ i18n (0.7.0)
+ jquery-rails (4.1.1)
+ rails-dom-testing (>= 1, < 3)
+ railties (>= 4.2.0)
+ thor (>= 0.14, < 2.0)
+ json (1.8.3)
+ kgio (2.10.0)
+ loofah (2.0.3)
+ nokogiri (>= 1.5.9)
+ mail (2.6.4)
+ mime-types (>= 1.16, < 4)
+ mime-types (2.99.2)
+ mini_portile2 (2.1.0)
+ minitest (5.9.0)
+ netrc (0.11.0)
+ nokogiri (1.6.8)
+ mini_portile2 (~> 2.1.0)
+ pkg-config (~> 1.1.7)
+ pg (0.18.4)
+ pkg-config (1.1.7)
+ rack (1.6.4)
+ rack-test (0.6.3)
+ rack (>= 1.0)
+ rails (4.2.6)
+ actionmailer (= 4.2.6)
+ actionpack (= 4.2.6)
+ actionview (= 4.2.6)
+ activejob (= 4.2.6)
+ activemodel (= 4.2.6)
+ activerecord (= 4.2.6)
+ activesupport (= 4.2.6)
+ bundler (>= 1.3.0, < 2.0)
+ railties (= 4.2.6)
+ sprockets-rails
+ rails-deprecated_sanitizer (1.0.3)
+ activesupport (>= 4.2.0.alpha)
+ rails-dom-testing (1.0.7)
+ activesupport (>= 4.2.0.beta, < 5.0)
+ nokogiri (~> 1.6.0)
+ rails-deprecated_sanitizer (>= 1.0.1)
+ rails-html-sanitizer (1.0.3)
+ loofah (~> 2.0)
+ railties (4.2.6)
+ actionpack (= 4.2.6)
+ activesupport (= 4.2.6)
+ rake (>= 0.8.7)
+ thor (>= 0.18.1, < 2.0)
+ raindrops (0.16.0)
+ rake (11.2.2)
+ rest-client (2.0.0)
+ http-cookie (>= 1.0.2, < 2.0)
+ mime-types (>= 1.16, < 4.0)
+ netrc (~> 0.8)
+ sass (3.4.22)
+ sass-rails (5.0.5)
+ railties (>= 4.0.0, < 6)
+ sass (~> 3.1)
+ sprockets (>= 2.8, < 4.0)
+ sprockets-rails (>= 2.0, < 4.0)
+ tilt (>= 1.1, < 3)
+ sprockets (3.6.3)
+ concurrent-ruby (~> 1.0)
+ rack (> 1, < 3)
+ sprockets-rails (3.1.1)
+ actionpack (>= 4.0)
+ activesupport (>= 4.0)
+ sprockets (>= 3.0.0)
+ thor (0.19.1)
+ thread_safe (0.3.5)
+ tilt (2.0.5)
+ tzinfo (1.2.2)
+ thread_safe (~> 0.1)
+ uglifier (3.0.0)
+ execjs (>= 0.3.0, < 3)
+ unf (0.1.4)
+ unf_ext
+ unf_ext (0.0.7.2)
+ unicorn (5.1.0)
+ kgio (~> 2.6)
+ raindrops (~> 0.7)
+ web-console (2.3.0)
+ activemodel (>= 4.0)
+ binding_of_caller (>= 0.7.2)
+ railties (>= 4.0)
+ sprockets-rails (>= 2.0, < 4.0)
+
+PLATFORMS
+ ruby
+
+DEPENDENCIES
+ byebug
+ delayed_job_active_record
+ jquery-rails
+ nokogiri
+ pg
+ rails (~> 4.2.6)
+ rest-client
+ sass-rails (~> 5.0)
+ uglifier (>= 1.3.0)
+ unicorn
+ web-console (~> 2.0)
+
+RUBY VERSION
+ ruby 2.3.1p112
+
+BUNDLED WITH
+ 1.12.5
LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2016
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
README.md
@@ -0,0 +1,32 @@
+# The Simpsons by the Data
+
+Code in support of this post: [The Simpsons by the Data](http://toddwschneider.com/posts/the-simpsons-by-the-data/)
+
+It's a Rails app, but it isn't intended to be run as a server. It processes data from [Simpsons World](http://www.simpsonsworld.com/), [Wikipedia](https://en.wikipedia.org/wiki/List_of_The_Simpsons_episodes), and [IMDb](http://www.imdb.com/title/tt0096697/eprate), and populates a PostgreSQL database called `simpsons_development`. The database contains 4 primary tables: `episodes`, `script_lines`, `characters`, and `locations`.
+
+## Instructions
+
+These steps assume you have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and [PostgreSQL](https://wiki.postgresql.org/wiki/Detailed_installation_guides) installed.
+
+```
+git clone git@github.com:toddwschneider/flim-springfield.git
+cd flim-springfield/
+createdb simpsons_development
+bundle exec rake db:migrate
+bundle exec rake import_data
+bundle exec rake jobs:work
+```
+
+It takes about 45 minutes to process everything with one worker.
+
+## Analysis
+
+R code to analyze the data lives in the `analysis/` folder.
+
+## Caveats/areas for improvement
+
+- I deduped some character names that are printed in different ways, e.g. "TROY" is the same as "Troy McClure", but I certainly did not dedupe all 6000+ characters that appear in the scripts
+- Similarly, I manually assigned genders to the top 320 or so characters, who collectively account for 86% of the show's dialogue
+- I did not dedupe any locations
+
+![tab](https://cloud.githubusercontent.com/assets/70271/18603957/9c00df58-7c44-11e6-8222-6073565db089.png)
Rakefile
@@ -0,0 +1,6 @@
+# Add your own tasks in files placed in lib/tasks ending in .rake,
+# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.
+
+require File.expand_path('../config/application', __FILE__)
+
+Rails.application.load_tasks
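
The README's first caveat says some character names were deduped when the scripts print them in different ways (e.g. "TROY" and "Troy McClure"). The model code that does this is not shown in the hunks above, so the following is only a minimal sketch of one way such deduping could work, assuming a hand-maintained alias table. `CHARACTER_ALIASES` and `canonical_name` are hypothetical names that do not come from this commit; only the TROY example is taken from the README.

```ruby
# Hypothetical alias table (an assumption for illustration, not the repo's code).
# Keys are lowercased raw names; values are the canonical character names.
CHARACTER_ALIASES = {
  "troy" => "Troy McClure"
}.freeze

# Strip surrounding whitespace, look the lowercased name up in the alias
# table, and fall back to the stripped raw name when no alias is known.
def canonical_name(raw_name)
  cleaned = raw_name.strip
  CHARACTER_ALIASES.fetch(cleaned.downcase, cleaned)
end

puts canonical_name("TROY")
puts canonical_name("Troy McClure")
```

Names missing from the table pass through unchanged, which matches the README's caveat that most of the 6000+ names in the scripts were not deduped.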