weimer-coders/machine-learning

Machine Learning (Woo!)

Andrew

Tutorials

This set of TensorFlow tutorials was quick and easy to follow, thanks in no small part to Google's tremendous production budget. I highly recommend the series if you want to breeze through the basics quickly (no pun intended).

  1. Hello World: A basic introduction to how machine learning classifies things.
  2. Decision Trees: A deeper look at how the machine makes these decisions using features.
  3. Feature Selection: A theoretical dive into what makes a good feature (the property you use to measure the differences between your types).
  4. Pipelines: There are several things your app needs to do every time it runs: process the data, analyze it, and classify new data. This video shows you how to build that pipeline.
  5. Classifiers: An introduction to building an actual classifier.
  6. Image Classifier: An example of a classifier that recognizes the content of an image.
  7. OCR Classifier: An example of a classifier that recognizes a handwritten digit.
  8. Decision Tree Classifier: A deep dive into getting more granular when editing your decision tree.
  9. Feature Engineering: Sometimes a feature isn't a single data point but a combination of data points. For example, height and weight on their own might not tell you much about someone's likelihood of developing heart disease, but a calculated BMI might be a feature worth analyzing. This process is called feature engineering, and this video introduces the concept.
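Tutorials 2, 8 and 9 fit together in a few lines of scikit-learn. The sketch below is not code from the videos; it assumes scikit-learn is installed, and the heights, weights and "higher risk" labels are invented for illustration. It computes BMI as an engineered feature and trains a decision tree on that single column.

```python
from sklearn.tree import DecisionTreeClassifier


def bmi(height_m, weight_kg):
    """Engineered feature: body mass index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


# Hypothetical toy records (height in metres, weight in kilograms); not real patient data.
people = [
    (1.80, 70.0),
    (1.65, 95.0),
    (1.55, 50.0),
    (1.90, 120.0),
    (1.75, 68.0),
    (1.60, 88.0),
    (1.95, 90.0),
]
# Hypothetical labels: 1 = "flagged as higher risk", 0 = "not flagged".
labels = [0, 1, 0, 1, 0, 1, 0]

# Feature engineering: collapse the two raw columns into one more informative column.
features = [[bmi(h, w)] for h, w in people]

# A decision tree trained only on the engineered BMI column.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(features, labels)

# Classify two new, equally hypothetical people.
print(clf.predict([[bmi(1.70, 100.0)], [bmi(1.85, 72.0)]]))  # → [1 0]
```

In this toy data neither height nor weight separates the labels with a single cut, while BMI does, so the tree needs only one split (`clf.get_depth()` returns 1).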

Structured Journalism

I found a lot of content from 2015 about something called Structured Journalism: the idea that journalism itself can be broken down into data points, which can then be used to create more stories down the road or to provide interesting links between stories. One Medium post went so far as to call it The Next Revolution in Storytelling. It looks like both The New York Times and the BBC were working on ways to use machine learning to turn news stories into this kind of structured data, for one reason or another. It was also the inspiration for Circa (before it was bought out and gutted by Sinclair). It seems like a big thing that died soon after it was born.

In the same vein, a 2015 story about the AP using "robo-journalism" showed that stories automatically generated from data are not nearly as simplistic as Mad Libs. Wordsmith is a commercial software product that does this. MM

I know The LA Times Data Desk still uses this philosophy on some of its projects. One that I worked on is called The Homicide Report, and I know it's also used for their recipes site. The data for these projects is entered manually, though; they seem to have stopped using machine learning as the classifier.
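To make the Mad Libs comparison from the robo-journalism story concrete, here is a minimal rule-based sketch of generating a sentence from data. It is not how the AP or Wordsmith actually work; the function `earnings_sentence`, the company name and the 1% and 10% cut-offs are hypothetical. The point is that the template picks its wording from the numbers instead of just filling blanks.

```python
def earnings_sentence(company: str, revenue: float, prior_revenue: float) -> str:
    """Hypothetical template: word choice depends on how much revenue changed."""
    change = (revenue - prior_revenue) / prior_revenue * 100  # percent change
    # Hypothetical cut-offs: under 1% is "steady", 10% or more gets a stronger verb.
    if abs(change) < 1:
        return f"{company}'s quarterly revenue held roughly steady at ${revenue:,.0f} million."
    if change > 0:
        verb = "rose" if change < 10 else "jumped"
    else:
        verb = "fell" if change > -10 else "plunged"
    return f"{company}'s quarterly revenue {verb} {abs(change):.1f}% to ${revenue:,.0f} million."


print(earnings_sentence("Acme Corp", 120, 100))  # Acme Corp is a made-up company
print(earnings_sentence("Acme Corp", 95, 100))
```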

Ethics in Machine Learning

As I normally do with these research things, here's my quick blurb about ethics: we should consider them. Machines aren't objective, algorithms aren't objective, and machine learning algorithms aren't objective. All technology has the biases of its creators embedded in it. But don't take my word for it. I'll get off my pulpit now.

Caitlin

Tutorials

Adding on to Andrew's videos, DataCamp has a good walkthrough on machine learning. The creators have answered people's questions in the comments, so those might be useful too.

Machine Learning in Journalism

Some of this will be over our heads, but here are some examples of machine learning in news stories.

  1. ProPublica used machine learning for Represent, analyzing Congressional representatives' priorities by reading keywords in their press releases.
  2. ProPublica also used machine learning in its project Machine Bias to find out that the Broward County Sheriff's Office was using a biased system to predict whether black and white felons would reoffend. If you're feeling really brave, their code is here.
  3. The Atlanta Journal-Constitution used machine learning for Doctors and Sex Abuse, analyzing documents by keyword to flag the ones that needed to be read.
  4. Machine learning can also be used for analytics, for understanding one's audience, or for moderating content.
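The keyword-flagging idea behind the AJC and Represent projects can be sketched as a small text classifier. This is a generic scikit-learn pattern (TF-IDF word weights plus logistic regression), not either newsroom's actual code; scikit-learn is assumed to be installed, and the training snippets and labels are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labelled snippets: 1 = "a reporter should read this", 0 = "probably routine".
docs = [
    "board suspended license after sexual misconduct with patient",
    "physician disciplined for inappropriate contact with patient",
    "license revoked following abuse allegations by patients",
    "license renewed, continuing education requirements met",
    "address change filed with the medical board",
    "routine renewal of license approved",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each document into keyword weights; logistic regression learns
# which keywords push a document toward "read this".
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

new_docs = [
    "patient reported misconduct; board suspended license",
    "renewal approved after education requirements met",
]
print(model.predict(new_docs))                # predicted flag for each document
print(model.predict_proba(new_docs)[:, 1])    # estimated probability it is worth reading
```

In practice the probability column is the useful part: a reporter could sort a large pile of documents by it and read from the top down.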

Gabrielle

Machine Learning + Journalism

That link does not work - MM

  • This is a very cool conversation from a conference that has good advice for journalists venturing into AI.

  • This Medium post explains a bunch of different uses for AI in journalism, plus trends in the industry and tips to get started.

Cool examples of how journalists are using machine learning

  • Juicer (BBC News Lab): Interactive map that collects stories from around the world. Users can click on a country and see the latest news from that area.

  • Lazarus (The New York Times): A really cool archive of images/visual journalism from the last century, organized with tags and metadata (publication dates, etc).

  • Editor (The New York Times): This is an “experimental text editing interface” that uses neural networks (fancy) to annotate articles and apply New York Times tags to text.

Ryan

Background

I think it's a good idea to understand how machine learning actually works without getting bogged down in the technicalities, so I recommend everyone watch this video by CGP Grey on how machines learn.

Resources

  • Joseph Misiti has a curated list of ML Python frameworks and libraries on GitHub.
  • Marco Bonzanini walks you through mining Twitter data and performing sentiment analysis on it.
  • The University of California, Irvine, has a machine learning repository of 426 data sets meant to assist the machine learning community.
  • William Koehrsen was inspired by Moneyball and wanted to use Python and machine learning to mimic the same kind of sabermetrics. He succeeded and created a tutorial so that others could do the same.
  • Here's an article about using Python and machine learning to predict the winners of NBA games.
  • I'm seeing a lot of talk about scikit-learn across all these tutorials. It may be the Python library for ML.

I've been exploring Supervised Learning with scikit-learn at DataCamp. In addition to using NumPy and pandas, it assumes you understand linear regression, which I really do not. Just a heads-up. MM
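For a heads-up on linear regression itself: with one input, it finds the straight line y = slope·x + intercept that minimizes the sum of squared errors, and the ordinary least squares formulas are short enough to write by hand. This sketch uses plain Python only; the "hours practiced vs. quiz score" data is hypothetical. scikit-learn's `LinearRegression` fits the same kind of least-squares line, generalized to many input columns.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours practiced vs. quiz score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
# Predict the score for 6 hours of practice using the fitted line.
print(round(slope, 2), round(intercept, 2), round(intercept + slope * 6, 1))  # → 4.1 47.7 72.3
```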

Mary-Lou

To avoid repeating what has already been said, I kept my section kind of short. Below are more examples and discussion of what machine learning is, because Andrew really covered the tutorial stuff and Caitlin gave some great examples.

Resources

  1. How machine learning could change journalism (Storybench): This is a pretty good article about how journalists can use machine learning. It covers what machine learning is, what the future with machine learning looks like, and some books to help learn it (although we won’t be buying any books in this short time span).
  2. Data journalism’s AI opportunity: the 3 different types of machine learning & how they have already been used (Online Journalism Blog): This article is good because it covers the three types of machine learning: supervised, unsupervised and reinforcement.
  3. Three examples of machine learning in the newsroom (Global Editors Network, Medium): This article also explains what machine learning is, then gives three examples of machine learning in journalism. All three are ones we discussed in class: the LA Times crime story, the BuzzFeed plane story and the New York Times Congress Shazam app.
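The supervised/unsupervised distinction from the second article fits in a few lines, assuming scikit-learn is installed. The points and labels are hypothetical. Reinforcement learning, which learns from rewards over repeated actions, does not reduce to a snippet like this and is not shown.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D points forming two obvious groups.
points = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [8.0, 8.1], [8.3, 7.9], [7.9, 8.2]]

# Supervised: we supply the answer (a label) for every training point,
# and the model learns to reproduce those answers for new points.
labels = ["small", "small", "small", "large", "large", "large"]
knn = KNeighborsClassifier(n_neighbors=3).fit(points, labels)
print(knn.predict([[0.9, 1.1]]))  # → ['small']

# Unsupervised: no labels at all; k-means only groups points that sit close together.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)  # cluster ids are arbitrary numbers, not meaningful names
```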

Nicole

I think we have most of the tutorials covered, so I'll be posting some more background on machine learning.

  1. Automated Journalism - AI Applications at New York Times, Reuters, and Other Media Giants - This article shows what other news organizations have done and can serve as inspiration for our projects. Many of these involve "automated journalism."
  2. What News Writing Bots Mean For the Future of Journalism - This article further explores the implications of having bots write articles for news orgs. It discusses how automated articles were written about elections and the Olympics using templates created by editors and Heliograf.
  3. After years of testing, The Wall Street Journal has built a paywall that bends to the individual reader - This article explores how The Wall Street Journal gives each reader a score and adjusts the paywall depending on how likely that person is to subscribe. The score factors in their location, what they clicked on, and more, and was probably developed through machine learning by comparing the traits of people who subscribed with those who did not.
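The scoring idea in the WSJ article can be sketched as a logistic regression whose predicted probability becomes the reader's score. The WSJ's real model and features are not described in the article summary above; the reader data, the two features, `paywall_for`, and the 0.5 threshold are all hypothetical, and scikit-learn is assumed to be installed.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical reader history: [articles read in last 30 days, days visited in last 30 days]
readers = [[1, 1], [2, 2], [3, 2], [4, 3], [15, 10], [18, 12], [20, 15], [25, 14]]
subscribed = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = this (made-up) reader went on to subscribe

# Learn how reading habits relate to subscribing.
model = LogisticRegression().fit(readers, subscribed)


def paywall_for(reader, threshold=0.5):
    """Hypothetical rule: score a reader, then choose how tight the paywall should be."""
    score = model.predict_proba([reader])[0, 1]  # estimated probability of subscribing
    return score, ("tight" if score >= threshold else "loose")


print(paywall_for([22, 14]))  # a heavy (hypothetical) reader
print(paywall_for([2, 1]))    # an occasional (hypothetical) reader
```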

About

A place to learn about machine learning.
