## Understanding the for-loop 
Learning to use vectorized operations is a key skill in R.
Why? 

A vectorized function works not just on a single value, but on a whole vector of values at the same time.
http://www.dummies.com/programming/r/how-to-vectorize-your-functions-in-r/
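
To make the definition concrete, here is a minimal sketch that compares an element-by-element loop with a single vectorized call. The three strings are copied from the sample shown further down; the variable names are only illustrative.

```r
# A few passwords from the sample, used only for illustration
demo_passwords <- c("Bigmaccas", "sewing12", "198246")

# Loop version: test one element at a time
has_digit_loop <- logical(length(demo_passwords))
for (i in seq_along(demo_passwords)) {
  has_digit_loop[i] <- grepl("[0-9]", demo_passwords[i])
}

# Vectorized version: grepl() takes the whole vector in one call
has_digit_vec <- grepl("[0-9]", demo_passwords)

# Both approaches give the same logical vector
identical(has_digit_loop, has_digit_vec)
```

The vectorized call is shorter and leaves the looping to compiled code inside `grepl`.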

In [101]:
library(tidyverse)
library(data.table)
library(rbenchmark)
options(warn=-1)
set.seed(123)

# load data 
df <- read.csv('http://datashaping.com/passwords.txt', header = F, skip = 16) %>%
                sample_n(10) %>% 
                rename(password = V1)
head(df, 5)

Unnamed: 0,password
310629,Bigmaccas
851491,0127515559
441758,dbqky73p
953793,sewing12
1015846,990990990990


#### My attempt: wrong!

As the output below shows, the three columns I want to create *always* end up with the same value.
Let's understand why.

In [64]:
# define patterns
patterns = c("[a-z]+","[A-Z]+","[A-Za-z]+")

# empty columns
df$has_lower <- 0 
df$has_upper <- 0
df$has_numeric <- 0

# start the loop
for(i in 1:nrow(df)){
    for(j in patterns){
        n <- ifelse(grepl(j, df$password[i]),1,0)
        }
    df$has_lower[i] <- n
    df$has_upper[i] <- n 
    df$has_numeric[i] <- n
}
head(df)

Unnamed: 0,password,has_lower,has_upper,has_numeric
310629,Bigmaccas,1,1,1
851491,0127515559,0,0,0
441758,dbqky73p,1,1,1
953793,sewing12,1,1,1
1015846,990990990990,0,0,0
49208,5cmajo76,1,1,1
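
The problem can be reproduced on a single password. This is a small sketch that runs only the inner loop for `dbqky73p` and prints what `n` holds after each pattern; the variable names `demo_patterns` and `demo_password` are introduced here for illustration.

```r
demo_patterns <- c("[a-z]+", "[A-Z]+", "[A-Za-z]+")
demo_password <- "dbqky73p"

for (j in demo_patterns) {
  # n is overwritten on every pass of the loop
  n <- ifelse(grepl(j, demo_password), 1, 0)
  cat(j, "->", n, "\n")
}

# After the loop, n only remembers the last pattern
n
```

The second pattern does not match (there is no upper-case letter), yet the value left in `n` comes from the third pattern, and that single value is what gets written to every column.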


### Method 1: canonical for-loop

First, `has_lower`, `has_upper` and `has_numeric` need to be updated inside the `j` loop. Otherwise `n` is overwritten on every pass, only the result for the last pattern survives, and the same value ends up in all three columns.
To do so, you need to be able to loop over the names of the columns:

In [73]:
# define patterns and the matching column names
patterns <- c("[a-z]+", "[A-Z]+", "[0-9]+")
names <- c("has_lower", "has_upper", "has_numeric")

for(i in 1:nrow(df)){
    # loop over the indices 1:length(patterns), where
    # length(patterns) = 3
  for(j in 1:length(patterns)){
      # df[i, names[j]] selects row i of the column whose name is stored in names[j].
      # R evaluates names[j] first and then looks up the resulting string
      # (has_lower, has_upper, ...) among the columns of df;
      # there is no column literally called "names[j]".
      # patterns[j] is the regular expression that belongs to that column.
    df[i, names[j]] <- as.numeric(grepl(patterns[j], df$password[i]))
  }
}
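
The indexing idea can be tried on a toy data frame. The sketch below uses a hypothetical two-row frame called `toy` and a variable `col` holding a column name. It also shows why `seq_along()` is often preferred over `1:length()` for loop indices.

```r
# Toy data frame (hypothetical), with one flag column already created
toy <- data.frame(password = c("Bigmaccas", "0127515559"),
                  has_upper = 0,
                  stringsAsFactors = FALSE)

col <- "has_upper"

# The string stored in `col` picks the column; only row 1 is updated
toy[1, col] <- as.numeric(grepl("[A-Z]", toy$password[1]))

# There is no column literally called "col"
"col" %in% colnames(toy)

# 1:length(x) counts down when x is empty; seq_along(x) yields nothing
empty <- character(0)
length(1:length(empty))    # 2, so a loop would run for 1 and 0
length(seq_along(empty))   # 0, so a loop would not run at all
```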


### Method 2: lapply with data.table
A quicker, more compact alternative that uses `lapply` and the fact that `grepl` is already vectorized:

In [76]:
patterns <- c("[a-z]+", "[A-Z]+", "[0-9]+")

# convert df to a data.table by reference so that := is available
setDT(df)
df[, c("has_lower","has_upper","has_numeric") := lapply(patterns, function(x) grepl(x, password))]
df

password,has_lower,has_upper,has_numeric
Bigmaccas,True,True,False
0127515559,False,False,True
dbqky73p,True,False,True
sewing12,True,False,True
990990990990,False,False,True
5cmajo76,True,False,True
acolite4,True,False,True
Ladychamp09,True,True,True
198246,False,False,True
230203zx,True,False,True
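
For readers new to `:=`, here is a self-contained sketch of the same idea on a three-row table. It assumes the data.table package is installed; `dt_demo` and `demo_patterns` are illustrative names, and the digit pattern `[0-9]` is my choice for the numeric check.

```r
library(data.table)

dt_demo <- data.table(password = c("Bigmaccas", "0127515559", "dbqky73p"))

# Named patterns: the names become the new column names
demo_patterns <- c(has_lower = "[a-z]", has_upper = "[A-Z]", has_numeric = "[0-9]")

# lapply() returns one logical vector per pattern;
# := adds them as columns by reference, without copying dt_demo
dt_demo[, (names(demo_patterns)) := lapply(demo_patterns, function(p) grepl(p, password))]

dt_demo
```

Inside `[ ]`, `password` refers to the column directly, so there is no need to write `dt_demo$password`.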


### Method 3: a little simplification: storing the column names in the pattern vector itself

We can simplify things if we just name the pattern vector.

Basically, we loop through each of the names, grab the regular expression corresponding to that name, do the matching, and add the column.

In [67]:
patterns = c(has_lower="[a-z]",
             has_upper="[A-Z]",
             has_numeric="[0-9]+")

for(i in names(patterns)) {
  df[, i] = as.numeric(grepl(patterns[i], df$password))
}
head(df)

password,has_lower,has_upper,has_numeric
Bigmaccas,1,1,0
0127515559,0,0,1
dbqky73p,1,0,1
sewing12,1,0,1
990990990990,0,0,1
5cmajo76,1,0,1
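
Once the patterns carry names, the explicit loop can also be dropped. The following is a sketch of one possible variant using base R's `vapply`, not the approach used in this notebook; `demo_passwords` and `demo_patterns` are illustrative names.

```r
demo_passwords <- c("Bigmaccas", "0127515559", "dbqky73p")
demo_patterns <- c(has_lower = "[a-z]",
                   has_upper = "[A-Z]",
                   has_numeric = "[0-9]+")

# One column per pattern; the pattern names become the column names
flags <- vapply(demo_patterns,
                function(p) grepl(p, demo_passwords),
                logical(length(demo_passwords)))

# Bind the logical matrix to the passwords
demo_df <- data.frame(password = demo_passwords, flags)
demo_df
```

`vapply` checks that every pattern returns a logical vector of the expected length, which makes a mistake in the helper function fail loudly.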


### Method 4: easy-peasy
A data frame is above all a list.
So, you can simply do:

In [68]:
df <- as.data.frame(df)
patterns <- c("[a-z]+", "[A-Z]+", "[0-9]+")

df[c("has_lower", "has_upper", "has_numeric")] <- 
  lapply(patterns, function(pattern) grepl(pattern, df$password) + 0)
head(df)

password,has_lower,has_upper,has_numeric
Bigmaccas,1,1,0
0127515559,0,0,1
dbqky73p,1,0,1
sewing12,1,0,1
990990990990,0,0,1
5cmajo76,1,0,1


Use `+ 0L` instead of `+ 0` if you want integers instead of doubles (I would recommend doing neither and keeping the logicals).
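
A short sketch of the difference between the three storage types, and of why keeping logicals is usually enough. The names `demo_flags` and `demo_frame` are illustrative.

```r
demo_flags <- grepl("[0-9]", c("sewing12", "Bigmaccas"))

typeof(demo_flags)        # "logical"
typeof(demo_flags + 0)    # "double"
typeof(demo_flags + 0L)   # "integer"

# Logicals already behave like 0/1 in arithmetic
sum(demo_flags)           # number of passwords containing a digit
mean(demo_flags)          # share of passwords containing a digit

# A data frame is a list of equal-length columns
demo_frame <- data.frame(a = 1:2)
is.list(demo_frame)
```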

### And now let's benchmark!

In [111]:
# rbenchmark
benchmark_table <- benchmark(
# method 1
"simple_loop" = {
    patterns = c("[a-z]+","[A-Z]+","[0-9]+")
    names <- c("has_lower","has_upper","has_numeric")
    for(i in 1:nrow(df)){
      for(j in 1:length(patterns)){
        df[i,(names[j])] <- as.logical(grepl(patterns[j], df$password[i]))
      }
    }
}, 
# method 2
"data.table_apply" = {
    dt <- setDT(df)
    dt[, c("has_lower","has_upper","has_numeric"):=lapply(patterns, function(x) grepl(x,dt$password))]
},
# method 3
"apply.with.columns.trick" = {
    patterns = c(has_lower="[a-z]",
                 has_upper="[A-Z]",
                 has_numeric="[0-9]+")
    for(i in names(patterns)) {
      df[, i] = as.logical(grepl(patterns[i], df$password))
    }
},
# method 4
"easy-peasy" = {
    df[,c("has_lower", "has_upper", "has_numeric")] <- lapply(patterns, function(pattern) grepl(pattern, df$password) + 0)
},
replications = 10,
columns = c("test", "replications", "elapsed", "relative", "user.self", "sys.self"))
benchmark_table

Unnamed: 0,test,replications,elapsed,relative,user.self,sys.self
3,apply.with.columns.trick,10,0.298,49.667,0.277,0.02
2,data.table_apply,10,0.006,1.0,0.006,0.0
4,easy-peasy,10,0.036,6.0,0.025,0.012
1,simple_loop,10,4.935,822.5,4.908,0.026
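
To read the `relative` column: as far as I understand rbenchmark, it is each test's elapsed time divided by the fastest elapsed time. The sketch below recomputes it from the elapsed values in the table above; the vector name `elapsed` is illustrative.

```r
# Elapsed times copied from the benchmark table above
elapsed <- c("apply.with.columns.trick" = 0.298,
             "data.table_apply"         = 0.006,
             "easy-peasy"               = 0.036,
             "simple_loop"              = 4.935)

# Ratio to the fastest test, rounded to three decimals
relative <- round(elapsed / min(elapsed), 3)
relative
```

In this run the data.table version is the baseline and the row-by-row loop is roughly 800 times slower.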


#### It's time to start learning data.table :( - damn! 