Overview: Scrape questions, answers, comments, and metadata from StackOverflow, specifically questions about R, starting at the URL.
- start at the first page of the search results on StackOverflow
- get the links to the questions on this page
- get the next page of results and the links to the questions on that page
- process the questions on the first 3 pages and the last page of the search results, fetching 50 questions per page
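The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the base URL is a stand-in for the starting URL, the `page` and `pagesize` query parameters are assumptions about how the search results are paginated, and the rule that question links look like `/questions/<number>/...` is also an assumption to verify against the actual page source. The helper names (`page_url`, `pages_to_process`, `question_links`) are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Stand-in for the starting URL (assumption; substitute the real one).
BASE = "https://stackoverflow.com/questions/tagged/r"


def page_url(page, pagesize=50, base=BASE):
    """Build the URL of one results page (parameter names are assumed)."""
    return base + "?" + urlencode({"page": page, "pagesize": pagesize})


def pages_to_process(last_page, first_n=3):
    """Return the first `first_n` page numbers plus the last page, without duplicates."""
    pages = list(range(1, min(first_n, last_page) + 1))
    if last_page not in pages:
        pages.append(last_page)
    return pages


class QuestionLinkParser(HTMLParser):
    """Collect hrefs that look like question links, e.g. /questions/123/some-title."""

    QUESTION_HREF = re.compile(r"^/questions/\d+/")  # assumed link pattern

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.QUESTION_HREF.match(href):
            url = urljoin(self.base_url, href)  # make the link absolute
            if url not in self.links:           # keep first occurrence only
                self.links.append(url)


def question_links(html, base_url=BASE):
    """Return the absolute question URLs found in one results page."""
    parser = QuestionLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

With these pieces, a crawl would fetch `page_url(p)` for each `p` in `pages_to_process(last_page)` and pass the returned HTML to `question_links`. The number of the last page has to be read from the pagination controls on the first results page.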
The goal is to explore the HTML source of the search-result pages to find the HTML structures that identify the elements of interest listed below:
- the number of views of the question
- the number of votes
- the text of the question
- the tags for the question
- when the question was posted
- the user/display name of the person posting the question, their reputation, and how many gold, silver, and bronze badges they have
- who edited the question and when
- for each answer and comment: the text, who posted it, when it was posted, and the poster's reputation and badge information
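Once the identifying HTML structures are known, pulling out the fields listed above amounts to collecting the text of elements with particular attributes. The sketch below shows one way to do this with the standard library. Every class name in it (`vote-count`, `post-text`, `post-tag`, `user-name`, `reputation-score`, `badge-gold`, and so on) and the sample markup are hypothetical placeholders; the real names have to be read from the page source.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element carrying one of the wanted CSS classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {name: [] for name in wanted}
        self._open = []   # stack of (class_name, depth, text_parts)
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._depth += 1
        for name in (dict(attrs).get("class") or "").split():
            if name in self.wanted:
                self._open.append((name, self._depth, []))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Close every collected element that was opened at the current depth.
        while self._open and self._open[-1][1] == self._depth:
            name, _, parts = self._open.pop()
            self.found[name].append(" ".join("".join(parts).split()))
        self._depth -= 1

    def handle_data(self, data):
        for _, _, parts in self._open:
            parts.append(data)


def to_int(text):
    """Convert '1,234' to 1234; return None when the field was not found."""
    return int(text.replace(",", "")) if text else None


def parse_post(html):
    """Extract a few fields from the HTML of a single post (hypothetical class names)."""
    wanted = ["vote-count", "post-text", "post-tag", "user-name",
              "reputation-score", "badge-gold", "badge-silver", "badge-bronze"]
    collector = ClassTextCollector(wanted)
    collector.feed(html)
    found = collector.found

    def first(name):
        return found[name][0] if found[name] else None

    return {
        "votes": to_int(first("vote-count")),
        "text": first("post-text"),
        "tags": found["post-tag"],
        "user": first("user-name"),
        "reputation": to_int(first("reputation-score")),
        "badges": {kind: to_int(first("badge-" + kind))
                   for kind in ("gold", "silver", "bronze")},
    }


# Made-up markup for illustration; it is not StackOverflow's real structure.
SAMPLE = """
<div class="question">
  <div class="vote-count">12</div>
  <div class="post-text"><p>How do I reshape a data frame?</p></div>
  <a class="post-tag">r</a> <a class="post-tag">reshape</a>
  <div class="user-details"><a class="user-name">alice</a>
    <span class="reputation-score">1,234</span>
    <span class="badge-gold">1</span><span class="badge-silver">5</span>
    <span class="badge-bronze">20</span>
  </div>
</div>
"""

if __name__ == "__main__":
    print(parse_post(SAMPLE))
```

The same collector can be applied to each answer or comment by feeding it that post's HTML fragment on its own, so that the text, poster, and badge fields of different posts are not mixed. The view count, posting time, and editor information would follow the same pattern once their identifying structures are found.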