Hello! 👻

Today we're going to talk a bit about text scraping, manipulation, and analysis in Python.

Workshop by Phil, Riley, and Yeli.

Tools

Sublime Text

If you don't have a favorite text editor already, download Sublime Text. You can use Xcode for these exercises if you're used to it, but we recommend Sublime since it's simpler and less clunky.

pip

Open up a terminal and run this command to download the installer:

$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Then run the installer:

$ sudo python get-pip.py

Beautiful Soup

Once we've got pip set up, we can install Beautiful Soup with

$ sudo pip install beautifulsoup4

Beautiful Soup helps us scrape text from the internet. Muahahaha! 👹
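To give a feel for what that looks like, here is a minimal sketch. The HTML page is made up and inlined as a string so the example runs without touching the network, and the variable names are just for illustration. In a real scrape you would fetch the page first and hand its text to Beautiful Soup the same way.

```python
from bs4 import BeautifulSoup

# A made-up page, inlined so the example runs without a network connection.
html = """
<html>
  <head><title>Spooky Page</title></head>
  <body>
    <h1>Ghost stories</h1>
    <p class="story">The house was <b>very</b> quiet.</p>
    <p class="story">Then the lights went out.</p>
    <a href="https://example.com/more">More stories</a>
  </body>
</html>
"""

# "html.parser" is the parser built into Python, so there is nothing extra to install.
soup = BeautifulSoup(html, "html.parser")

# Pull out the page title, the text of each story paragraph, and every link target.
title = soup.title.string
stories = [p.get_text() for p in soup.find_all("p", class_="story")]
links = [a["href"] for a in soup.find_all("a")]

print(title)
print(stories)
print(links)
```

Note that get_text() drops the tags inside a paragraph (like the <b> above) and keeps only the words.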

Natural Language Toolkit (NLTK)

Once we've got pip set up, we can install NLTK with

$ sudo pip install nltk

NLTK is a suite of text processing libraries for Python that lets us analyze text in some really interesting and powerful ways. For the intro exercises, we'll work through part of the NLTK Book. It's a great resource, so check it out!
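As a small taste, here is a sketch of two NLTK helpers that work without downloading any extra data. The sample sentence is made up, and splitting on whitespace is a simplifying assumption standing in for a real tokenizer.

```python
import nltk

# A made-up sentence; a plain whitespace split keeps this free of NLTK data downloads.
text = "the ghost saw the other ghost and the cat"
tokens = text.split()

# FreqDist counts how often each token appears.
freq = nltk.FreqDist(tokens)
top_two = freq.most_common(2)

# bigrams yields each pair of neighboring tokens; keep the first three.
pairs = list(nltk.bigrams(tokens))[:3]

print(top_two)  # [('the', 3), ('ghost', 2)]
print(pairs)    # [('the', 'ghost'), ('ghost', 'saw'), ('saw', 'the')]
```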

NLTK also offers a bunch of corpora and trained models as separate downloads. We're going to use some of them, so in your Python REPL type:

import nltk
nltk.download()

If it looks like nothing happened, check whether a new window popped open in the background. In that window, download "book" under the "Collections" tab.
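If the window never shows up (for example on a machine without a display), passing the collection name to nltk.download should fetch it straight from the REPL. This is a sketch: download_book_collection is a hypothetical helper name, and the call needs a network connection.

```python
import nltk


def download_book_collection(quiet=False):
    """Fetch the "book" collection directly, skipping the downloader window.

    Needs a network connection. Should return True if the download succeeded.
    """
    return nltk.download("book", quiet=quiet)


# Uncomment to run the download (it can take a few minutes):
# download_book_collection()
```

Once the download finishes, `from nltk.book import *` loads the sample texts used in the first chapter of the NLTK Book.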

Cool links

Python scripts for getting stuff from social media: https://github.com/lamthuyvo/social-media-data-scripts
