# r/feminism Baseline Analysis

These two datasets contain every archived post from r/feminism between June 2017 to the end of 2017.

In [124]:
femPosts <- read.csv("feminism_posts_06.30.2017-12.31.2107.csv")                          # contains posts
femComments <- read.csv("feminism_comments_on_posts_with_body_06.30.2017-12.31.2107.csv",
                       stringsAsFactors = FALSE) # contains comments

### First question: How many newcomers does the subreddit receive per day?

This depends on how we define a newcomer. For this analysis, I'll break down the definition of 'newcomer' into four possibilities:

- someone posting who has never commented / posted before.
- someone posting who has never comemented before.
- someone commenting who has never commented / posted before.
- someone commenting who has never commented before.

In [126]:
## posted: never commented / posted before
newcomers <- femPosts[femPosts$previous.comments == 0 & femPosts$previous.posts == 0,]
## posted: never commented before
firstPost <- femPosts[femPosts$previous.comments == 0,]

## commented: never commented / posted before
newcomerCommenters <- femComments[femComments$previous.comments == 0 & femComments$previous.posts == 0,]
## commented: never commented before
firstComment <- femComments[femComments$previous.comments == 0,]

In [127]:
# how many days are represented in our data?
(range <- as.numeric(max(as.Date(newcomers$created)) - min(as.Date(newcomers$created))))

In [132]:
round(nrow(newcomers) / range)             # posters: never comment or post
round(nrow(firstPost) / range)             # posters: never comment before
round(nrow(newcomerCommenters) / range)    # comment: never comment or post
round(nrow(firstComment) / range)          # comment: never comment before

On an average day, r/feminism receives:

- 16 posts from users who have never commented / posted before.
- 19 posts from users who have never commented before.
- 18 comments from users who have never commented / posted before.
- 20 comments from users who have never commented before.

### Second question: How many newcomer commenters does the subreddit receive per post?


In [104]:
round(sum(femComments$previous.comments == 0) / nrow(femPosts))

In [136]:
round(sum(femComments$previous.comments == 0 & femComments$previous.posts == 0) / nrow(femPosts))

### Third: How many first-time commenters comment a second time in the first two weeks? In the first three months?

In [106]:
## first, get first-time commenters names
firstTimeNames <- femComments$author[femComments$previous.comments == 0]
firstTimeData <- femComments[femComments$author %in% firstTimeNames,]

nums <- NULL
nums3Months <- NULL

## for each first-time commenter, how many post a second time in two weeks? / 3 months?
for (i in 1:length(firstTimeNames)) {
    
    currentAuthor <- firstTimeNames[i]
    authorData <- firstTimeData[firstTimeData$author == currentAuthor,]
    earliestPost <- which(as.Date(authorData$created) == min(as.Date(authorData$created)))
    earliestPost <- authorData[earliestPost[1],]
    otherPosts <- subset(authorData, id != earliestPost$id)
    
    ## get earliest data
    earliestPost <- as.Date(earliestPost$created)
    
    ## exclude earliest post from analysis
    twoWeekPosts <- authorData[as.Date(otherPosts$created) >= earliestPost & 
                               as.Date(otherPosts$created) <= (earliestPost + 14),]
    threeMonthPosts <- authorData[as.Date(otherPosts$created) >= earliestPost & 
                               as.Date(otherPosts$created) <= (earliestPost + 90),]
    nums <- rbind(nums, nrow(twoWeekPosts))
    nums3Months <- rbind(nums3Months, nrow(threeMonthPosts))
}
print("done")

[1] "done"


### How many times do these users comment two weeks / three months after their first comment?

In [107]:
table(nums)

nums
   0    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
2303  624  263  148   80   53   36   23   19   15    7    7   14    2    4    5 
  17   18   19   20   21   22   23   24   25   27   28   30   33   35   49 
   5    2    3    2    1    2    3    3    1    1    1    2    1    1    1 

In [108]:
table(nums3Months)

nums3Months
   0    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
2124  652  301  169  102   63   45   29   19   20   13   11   18    5    7    5 
  17   18   19   20   21   22   23   24   25   26   28   29   32   33   34   36 
   8    4    4    1    4    2    3    5    1    1    1    1    2    3    1    1 
  38   41   51   54   57   65  102 
   1    1    1    1    1    1    1 

### Among first-time commenters, 37% comment again within two weeks and 42% comment again within 3 months.

In [109]:
c(sum(nums == 0), sum(nums != 0))       # counts
sum(nums != 0) / length(firstTimeNames) # prop.

In [110]:
c(sum(nums3Months == 0), sum(nums3Months != 0)) # counts
sum(nums3Months != 0) / length(firstTimeNames)  # prop.