# Main task: Web scraping for answers and upvotes on the Quora website
- Extract tags under each question
    - In the following jupyter notebook, predict tags for each question.

E.g., the Quora question "What is the expected price of Bitcoin in 2018?" has tags including **Virtual Currencies, Bitcoin, and Cryptocurrencies**.
<img height="500" width="700" src="tag.png" /> 

### 1. Import packages

In [36]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.keys import Keys
import time
import csv
import sys
import datetime
import pandas as pd
import string

### 2. Read the dataset
- Originally, all_link.csv held 6790 questions, which I had collected earlier using web-scraping techniques.
- Collecting tags for every question in all_link.csv takes time, so I collected tags for 1200 questions (about 18% of the original 6790).
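
The 300-question batches described above can be generated rather than typed by hand. The following is a minimal sketch, not the notebook's own code: `batch_bounds` is a hypothetical helper, and `all_links` is a made-up stand-in for `links['links'].tolist()`.

```python
def batch_bounds(total, batch_size):
    """Yield (start, stop) index pairs that cover range(total) in batch_size steps."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


# Hypothetical stand-in for links['links'].tolist()
all_links = [f"https://www.quora.com/question-{i}" for i in range(1200)]

batches = list(batch_bounds(len(all_links), 300))
print(batches)  # [(0, 300), (300, 600), (600, 900), (900, 1200)]

# Pick one batch per run, as the notebook does with links['links'][900:1200]
start, stop = batches[-1]
test_batch = all_links[start:stop]
print(len(test_batch))  # 300
```

Each `(start, stop)` pair can be passed straight into a slice, so the last, shorter batch of a file whose length is not a multiple of 300 is handled without extra code.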

In [37]:
# Collect tags for 300 questions at a time
links = pd.read_csv("all_link.csv")
print(len(links))

test_links = links['links'][900:1200].tolist()
print(len(test_links))

7161
300


### Collection history
#### March 16, 0-300 links 
- (last one: https://www.quora.com/Is-Bitcoin-just-a-big-con-and-nothing-more-than-a-hype-or-does-anybody-make-any-money-with-it)

#### March 17, 300-600 links
- (last one: https://www.quora.com/How-can-I-trade-Bitcoin-at-100X-leverage-or-margin)

#### March 18, 600-900 links
- (last one: https://www.quora.com/Has-Bruce-Wayne-invested-in-Bitcoin)

#### March 18, 900-1200 links
- (last one: https://www.quora.com/What-will-be-the-highest-price-of-Bitcoin-in-this-February)

In [38]:
test_links[-1]

'https://www.quora.com/What-will-be-the-highest-price-of-Bitcoin-in-this-February'

### 3. Main part: Extract tags and answers for each question

In [39]:
#questions = []
tags = {}
answers = []
start_time = datetime.datetime.now() 

for link in test_links:
    #print(link)
    # Path to the geckodriver executable (the Firefox WebDriver)
    executable_path = './geckodriver'

    # Initialize the webdriver for the Firefox browser
    driver = webdriver.Firefox(executable_path=executable_path)

    # Attach an explicit-wait helper with a 5-second timeout
    driver.wait = WebDriverWait(driver, 5)

    # send a request to the website
    driver.get(link)
    
    # Grab the page body so key presses can be sent to it
    body = driver.find_element(By.CSS_SELECTOR, 'body')
    
    for each in range(10):

        # scroll page down
        body.send_keys(Keys.PAGE_DOWN)

        # sleep; wait for the page to load more content
        time.sleep(2)
    
    # search & store questions (titles) part
    selen_questions = driver.find_elements(By.CSS_SELECTOR, "span.QuestionText span.rendered_qtext")
    
    #for question in selen_questions:
        #questions.append(question.text)
     
    # search & store tags part
    selen_tags = driver.find_elements(By.CSS_SELECTOR, "span.name_text span.TopicName.TopicNameSpan")
    tag_ls = [each.text for each in selen_tags]
        
    # Read the titles once and map each one to this page's tags
    question_texts = [question.text for question in selen_questions]
    for question_text in question_texts:
        tags[question_text] = tag_ls

    # Assumption: fall back to the URL when no title matched,
    # so the answers below are never filed under a stale title
    question_text = question_texts[-1] if question_texts else link

    # search & store answer part
    selen_answers = driver.find_elements(By.CSS_SELECTOR, "div.ui_qtext_expanded span.ui_qtext_rendered_qtext")
    
    if not selen_answers:
        # No answers found: record a placeholder row
        answers.append([question_text, "NA"])
        
    else:
        for each in selen_answers:
            answers.append([question_text, each.text])
        
    driver.quit()

print("running time:", datetime.datetime.now()-start_time)

https://www.quora.com/unanswered/How-regulated-are-Bitcoin-s-exchanges
https://www.quora.com/Does-a-bank-have-the-legal-right-to-stop-you-from-making-specific-purchases-Bitcoin-Are-they-not-reneging-on-the-promise-of-credit
https://www.quora.com/Is-the-Bitcoin-rally-over
https://www.quora.com/How-do-I-buy-Bitcoins-in-the-UK-I-have-an-account-on-Bittrex-Are-there-any-secure-websites-or-exchanges-where-I-can-buy-from
https://www.quora.com/I-have-some-money-I-want-to-invest-in-Bitcoins-Is-it-advisable-now-to-take-such-venture
https://www.quora.com/Is-Bitcoin-and-other-cryptocurrency-not-going-to-survive-again
https://www.quora.com/unanswered/Is-this-Bitcoin-graph-authentic
https://www.quora.com/How-much-does-the-cost-of-selling-and-buying-Bitcoin-do
https://www.quora.com/Should-I-maintain-my-stop-loss-at-6-000-on-Bitcoin-or-adjust-down-to-5-000
https://www.quora.com/There-is-a-login-problem-with-Coinsecure-Are-they-afraid-that-Bitcoin-will-crash-or-do-they-not-want-users-to-sell-the-coins

https://www.quora.com/Is-it-legal-for-an-Indian-citizen-to-wire-money-to-a-foreign-cryptocurrency-exchange-like-Bitfinex-or-Bitstamp-to-purchase-crypto-Would-Indian-banks-allow-the-transaction
https://www.quora.com/How-much-would-be-charged-if-I-transfer-my-bitcoin-from-coinbase-to-Zebpay-I-want-to-exchange-to-an-Indian-wallet-from-coinbase-I-bought-it-while-I-was-in-US-as-I-have-returned-back-to-India-and-now-I-want-to-sell
https://www.quora.com/unanswered/What-is-Bitcoin-mining-and-other-work-in-this-currency
https://www.quora.com/Is-it-true-that-every-BTC-crash-coincides-with-the-24-days-before-the-Chinese-New-Year
https://www.quora.com/Is-it-safe-to-invest-in-Bitcoin-or-other-cryptocurrencies-in-India
https://www.quora.com/Should-I-have-listened-to-my-millennial-son-and-placed-my-retirement-money-into-Bitcoin
https://www.quora.com/How-is-your-country-as-of-February-2018
https://www.quora.com/unanswered/Do-transaction-fees-depreciating-Bitcoin-value
https://www.quora.com/Bitcoin-kee

https://www.quora.com/Why-do-many-disregard-Bitcoin-but-not-blockchain-technology
https://www.quora.com/I-invested-50k-in-btc-and-alts-when-it-was-13k-Im-worried-but-I-dont-want-to-sell-at-loss-Willing-to-hold-6-12months-Will-it-go-up
https://www.quora.com/The-price-of-Bitcoin-is-falling-quickly-Will-it-really-reach-1-million-by-2020-like-John-McAfee-predicted
https://www.quora.com/What-are-the-tax-implications-if-I-buy-Bitcoin-in-the-US-and-sell-it-in-India
https://www.quora.com/How-does-shorting-Bitcoin-really-work-Can-you-provide-a-non-technical-description-of-how-the-transaction-open-and-close-is-recorded-within-the-blockchain-This-is-in-regards-to-Bitcoin-itself
https://www.quora.com/Do-you-think-it-is-a-good-idea-to-invest-in-bitcoin-stocks-as-a-fresher-in-corporate
https://www.quora.com/Is-Bitcoins-atomic-swap-going-to-make-it-much-harder-for-forensic-authorities-to-trace-transactions-especially-involving-coins-such-as-Monero-or-Dash
https://www.quora.com/For-long-term-holding-s

https://www.quora.com/Is-the-Stratis-coin-still-rising-Do-you-think-itll-be-around-like-Bitcoin
https://www.quora.com/Why-did-India-ban-cryptocurrency
https://www.quora.com/Is-BTC-difficulty-going-to-decrease-in-the-next-weeks-because-of-the-low-prices-or-will-it-stay-at-least-stable
https://www.quora.com/Is-it-possible-to-buy-Altcoins-with-cash-or-only-with-Bitcoin
https://www.quora.com/unanswered/What-is-the-worst-Bitcoin-loss-story-youve-heard-or-experienced
https://www.quora.com/Is-the-current-downfall-of-crypto-currency-similar-to-the-one-when-China-banned-Bitcoin
https://www.quora.com/Is-there-a-website-that-sells-Bitcoins-for-cash-or-where-you-can-pay-with-a-credit-card-How-do-those-companies-buy-Bitcoin-with-cash
https://www.quora.com/Should-I-set-my-stop-loss-on-bitcoin-at-5-000
https://www.quora.com/Where-do-you-see-Ardor-ADR-by-the-end-of-2018
https://www.quora.com/What-would-happen-if-some-master-hacker-managed-to-delete-all-virtual-money-bitcoins-bank-accounts-etc-in-a-cou
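
The bookkeeping inside the loop (mapping a title to its tags, and writing an "NA" row when a page has no answers) can be separated from the browser calls so it can be checked without opening Firefox. This is a hedged sketch under stated assumptions: `record_page` is a hypothetical helper that takes plain strings already read from the page, and the fallback to the URL when no title is found is an assumption, not something the notebook does.

```python
def record_page(tags, answers, link, question_texts, tag_texts, answer_texts):
    """Store one page's scraped strings in the shared `tags` dict and `answers` list."""
    # Assumption: use the URL as the key when no title matched on the page
    title = question_texts[-1] if question_texts else link

    # Every title found on the page maps to the same list of tags
    for text in question_texts:
        tags[text] = list(tag_texts)

    if answer_texts:
        answers.extend([title, text] for text in answer_texts)
    else:
        # No answers on the page: keep a placeholder row
        answers.append([title, "NA"])
    return title


tags, answers = {}, []
record_page(tags, answers, "https://example.com/q1",
            ["Is X safe?"], ["Bitcoin"], ["Yes.", "No."])
record_page(tags, answers, "https://example.com/q2", [], [], [])

print(tags)     # {'Is X safe?': ['Bitcoin']}
print(answers)  # [['Is X safe?', 'Yes.'], ['Is X safe?', 'No.'], ['https://example.com/q2', 'NA']]
```

In the scraping loop, the three string lists would come from the `.text` of the matched elements; the example URLs and strings here are made up.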

### 4. Store (Update) data

In [41]:
with open('tag.csv', mode='r') as infile:
    reader = csv.reader(infile)
    next(reader)
    mydict = {rows[0]:rows[1:] for rows in reader}

In [42]:
len(tags)

300

In [43]:
len(mydict)

898

In [44]:
mydict.update(tags)

In [45]:
len(mydict)

1198
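
The counts above add up exactly (898 + 300 = 1198), which suggests that none of the 300 new questions were already stored, because `dict.update` overwrites an existing key instead of adding a second entry. A small sketch with made-up data shows how an overlap could be checked before merging; `stored` and `new` are hypothetical stand-ins for `mydict` and `tags`.

```python
# Hypothetical stand-ins for mydict (from tag.csv) and tags (the new batch)
stored = {"q1": ["Bitcoin"], "q2": ["Cryptocurrencies"]}
new = {"q2": ["Bitcoin", "Virtual Currencies"], "q3": ["Bitcoin"]}

# Keys present in both dicts would be overwritten by the update
overlap = stored.keys() & new.keys()
print(sorted(overlap))  # ['q2']

merged = dict(stored)
merged.update(new)

# With an overlap, the merged size is smaller than the sum of the two sizes
print(len(stored), len(new), len(merged))  # 2 2 3
print(merged["q2"])  # ['Bitcoin', 'Virtual Currencies']
```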

In [48]:
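# Use a name other than `csv` so the csv module imported above is not shadowed;
# the with-block also closes the file once writing is done
with open("tag.csv", "w") as outfile:
    columnTitleRow = "question, tag\n"
    outfile.write(columnTitleRow)

    for key, value in mydict.items():
        # Replace commas in the question so it stays in a single CSV column
        question = key.replace(',',';')
        tag = str(value)
        row = question + "," + tag + "\n"
        outfile.write(row)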
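
The cell above builds each row by hand, so the tag list is written in its `str()` form and commas in the question have to be swapped for semicolons. An alternative, offered only as a sketch and not as the notebook's method, is to let `csv.writer` handle quoting and give each tag its own column, which matches the `rows[1:]` reading pattern used earlier. `write_tags` and `read_tags` are hypothetical helper names, and the in-memory buffer stands in for `tag.csv`.

```python
import csv
import io


def write_tags(outfile, tag_dict):
    """Write one row per question: the question first, then one column per tag."""
    writer = csv.writer(outfile)
    writer.writerow(["question", "tag"])
    for question, tag_list in tag_dict.items():
        writer.writerow([question, *tag_list])


def read_tags(infile):
    """Read rows back into {question: [tag, ...]}, skipping the header row."""
    reader = csv.reader(infile)
    next(reader)
    return {row[0]: row[1:] for row in reader}


# Made-up sample; the comma in the question is quoted automatically by csv.writer
sample = {"What is Bitcoin, exactly?": ["Bitcoin", "Virtual Currencies"]}

buffer = io.StringIO(newline="")  # stands in for open("tag.csv", "w", newline="")
write_tags(buffer, sample)
buffer.seek(0)
restored = read_tags(buffer)
print(restored == sample)  # True
```

With a real file, opening it with `newline=""` is what the `csv` module documentation recommends for both reading and writing.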