# Search and Save Query Pages

Author: Zipporah Cohen

**Summary**

This notebook uses Selenium to run a Google search for each query in a fixed list and saves the resulting HTML page locally.

**Table of Contents**
1. [Create JSON for Queries](#sec1)
2. [Set up WebDriver](#sec2)
3. [Create and Call Searching Function](#sec3)

### Perform Imports

In [2]:
import selenium
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

from time import sleep

<a id="sec1"></a>
## Create a JSON file to contain all of the queries for re-use

Break the queries into four categories, labeled by the type of language used or by the content of the query.

In [1]:
from collections import defaultdict

queries = defaultdict(list)
queries['inclusive-terms'] = ['Why do people get periods',
    'why do we menstruate?',
    'advice for people who menstruate',
    'how to mitigate dysphoria while menstruating',
    'things that everyone who has a period should know',
    'should menstruators drink while on their period',
    'do trans men stop menstruating?',
    'do trans women menstruate?',
    'do trans women ever want to menstruate?',
    'how does menstruation affect the trans experience?']

queries['non-gendered'] = ['what menstrual pads are best?',
    'average length of menstrual cycle',
    'how much blood is lost in one period',
    'How do periods work',
    'what\'s the point of menstruation?',
    'what\'s the deal with menstruation?',
    'why do people menstruate?',
    'why is menstruation so uncomfortable?',
    'how much blood is too much blood',
    'what is a period',
    'how long do periods last',
    'when do i get my period',
    'what is menstruation?',
    'why do humans menstruate?',
    'is menstruating annoying?',
    'will we ever evolve out of menstruating?',
    'why is there blood in my underwear',
    'is my period normal',
    'period',
    'cramps']

queries['female-gendered'] = ['how long are women\'s periods?',
    'how many periods does a woman get in her lifetime',
    'why is my girlfriend so moody before her period',
    'how to take care of my girlfriend on her period',
    'When do girls start their periods?',
    'female monthly period',
    'why do women menstruate?',
    'is menstruating annoying to women?',
    'do only women menstruate?',
    'why do only women menstruate?',
    'how long do women menstruate',
    'how late should women be on their period']

queries['anatomical-terms'] = ['menstruation and vaginas',
    'breasts hurt during period',
    'tampon isn\'t fitting in my vagina',
    'diva cup fit test for vulva',
    'why does menstruation cause cramps and butt pain?',
    'Does menstruation come out of uterus',
    'what is going on in with the uterus and ovaries during menstruation?',
    'what happens in the body during menstruation?',
    'why do boobs hurt during menstruation?',
    'blood coming out of my vagina',
    'my uterus hurts',
    'vagina',
    'uterus']

In [2]:
import json
with open('search-phrases.json', 'w') as outFile:
    json.dump(queries, outFile)
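
Since the file is written for re-use, a later session can read the phrases back instead of redefining them. The following is a minimal sketch, assuming the same layout that `json.dump` produced above (each category name mapped to a list of phrases). The helper name `load_queries` is hypothetical and not part of the notebook.

```python
import json

def load_queries(path="search-phrases.json"):
    """Load the category -> list-of-phrases mapping from a JSON file."""
    with open(path, encoding="utf-8") as in_file:
        # json.load returns a plain dict, not a defaultdict
        return json.load(in_file)
```

Because the result is a plain `dict`, looking up a category that is not in the file raises `KeyError` rather than silently creating an empty list.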

In [3]:
sum(len(phrases) for phrases in queries.values())

55

In [4]:
for category in queries:
    print(f"{category} --- {len(queries[category])} phrases")

inclusive-terms --- 10 phrases
non-gendered --- 20 phrases
female-gendered --- 12 phrases
anatomical-terms --- 13 phrases


<a id="sec2"></a>
## Set up the driver

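Configure Chrome to run headless and point Selenium at the local chromedriver binary. The driver itself is created later, inside the search function.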

In [6]:
driverpath = 'driver/chromedriver'

# This option skips opening a visible browser window
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')

# Create the service that launches chromedriver from the path above
service = Service(driverpath)
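
If the file at `driver/chromedriver` is missing or not executable, the problem typically surfaces only once a browser is launched. A small pre-flight check can make that failure easier to spot. This is a sketch using only the standard library; `driver_is_usable` is a hypothetical helper, not something Selenium provides.

```python
import os

def driver_is_usable(path):
    """Return True if `path` is an existing file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

Calling `driver_is_usable(driverpath)` before the first search gives a simple yes/no answer about the local binary.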

<a id="sec3"></a>
## Searching function

Create a search function that opens a headless instance of Google Chrome and searches for a given query. The function creates a folder named after the query and saves the generated HTML in that folder.

In [8]:
import os, time
from urllib.parse import quote_plus

def search(query):
    """
    Search Google for a given query and save the resulting HTML page in a new directory.
    Parameters:
    query - a string that contains the phrase that will be searched
    """
    # Create a new instance of the driver for every search
    driver = webdriver.Chrome(service=service, options=chrome_options)

    try:
        # Perform the search, because we need the location link to show;
        # URL-encode the query so spaces and punctuation are sent intact
        url = f"https://google.com/search?q={quote_plus(query)}"
        driver.get(url)

        # Access the content of the page
        htmlPage = driver.page_source

        # Create the folder named after the query (and its parent) if needed, then save the file
        os.makedirs(f"queries/{query}", exist_ok=True)
        with open(f"queries/{query}/{query}.html", 'w', encoding='utf-8') as output:
            output.write(htmlPage)
        print(query)
    finally:
        # Quit the driver so the browser and chromedriver processes shut down, even on errors
        driver.quit()
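
Several phrases contain characters such as `?` or `'` that some file systems reject in folder names (Windows does not allow `?`, for example), and re-running the loop fetches every page again. The sketch below shows one way to derive a file-system-safe name and to detect pages that were already saved. Both `safe_name` and `already_saved` are hypothetical helpers. The replacement rule (anything other than letters, digits, spaces, hyphens, and underscores becomes `_`) is an assumption, and `already_saved` assumes files were written under the sanitized name, which differs from the raw-query folder names the notebook uses.

```python
import os
import re

def safe_name(query):
    """Replace characters that are risky in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9 _-]", "_", query).strip()

def already_saved(query, root="queries"):
    """Return True if an HTML file for this query already exists under `root`."""
    name = safe_name(query)
    return os.path.isfile(os.path.join(root, name, f"{name}.html"))
```

With this rule, distinct queries could in principle map to the same name, so a real run would need to check for collisions before relying on it.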

### Call the Function for each Query

In [37]:
for category in queries:
    for q in queries[category]:
        search(q)
        time.sleep(10)

Why do people get periods
why do we menstruate?
advice for people who menstruate
how to mitigate dysphoria while menstruating
things that everyone who has a period should know
should menstruators drink while on their period
do trans men stop menstruating?
do trans women menstruate?
do trans women ever want to menstruate?
how does menstruation affect the trans experience?
what menstrual pads are best?
average length of menstrual cycle
how much blood is lost in one period
How do periods work
what's the point of menstruation?
what's the deal with menstruation?
why do people menstruate?
why is menstruation so uncomfortable?
how much blood is too much blood
what is a period
how long do periods last
when do i get my period
How do periods work
what is menstruation?
why do humans menstruate?
is menstruating annoying?
will we ever evolve out of menstruating?
why is there blood in my underwear
is my period normal
menstruation and vaginas
breasts hurt during period
tampon isn’t fitting in my vagin