## Single Customer View
Steps involved in creating the Single Customer View (SCV):
### 1. Build base UserSCV table  
> 1.1. Cleanse the data (validate Email, format Phone No and Landline No). This is done by calling a function on each row of the DataFrame.

> 1.2. An intermediate table is created to hold the validated/cleansed data before the original data is transformed.

> 1.3. Create additional fields by combining base fields (FirstName, UserName, LastName, DOB). Some of the combinations are as follows:
> * Firstname_Lastname_RegIP
> * Firstname_Lastname_LastIP
> * Firstname_Lastname_Username
> * Firstname_DOB_City
> * Firstname_Postcode
> * Firstname_Mobilephone
> * DOB_Postcode
> * Address1_Postcode
> * Firstname_Lastname_Address1_City
>
> 1.4. Create the UserSCV Hive table with the base fields and the additional fields.
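
The combined fields in step 1.3 are plain underscore-joined strings. The sketch below illustrates the idea in ordinary Python rather than Spark; `COMBINATIONS` and `build_combined_fields` are hypothetical names used only here, and only three of the combinations are shown. It assumes that a missing base field leaves the whole combined field empty, which is how Spark's `concat` treats NULL inputs.

```python
# Hypothetical mapping: combined field name -> base fields it is built from.
COMBINATIONS = {
    "Firstname_Lastname_RegIP": ("first_name", "last_name", "RegIP"),
    "Firstname_Postcode": ("first_name", "Postcode"),
    "DOB_Postcode": ("DOB", "Postcode"),
}

def build_combined_fields(record):
    """Return the underscore-joined combined fields for one user record."""
    combined = {}
    for name, fields in COMBINATIONS.items():
        parts = [record.get(field) for field in fields]
        # Assumption: any missing part makes the whole combined field None,
        # mirroring how Spark's concat returns NULL when an input is NULL.
        if any(part is None for part in parts):
            combined[name] = None
        else:
            combined[name] = "_".join(parts)
    return combined

# Illustrative record; the Postcode is deliberately missing.
record = {
    "first_name": "Wakefield",
    "last_name": "Mavin",
    "RegIP": "208.55.139.3",
    "DOB": "06/27/1968",
    "Postcode": None,
}
keys = build_combined_fields(record)
print(keys)
```

Here `Firstname_Lastname_RegIP` is built normally, while the two combinations that depend on the missing postcode come back as `None`.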


### 2. For each data load, perform a check against the base UserSCV table. A record is considered the same user if it meets any one of these criteria (X marks a field that matches):
| FirstName| Lastname | DOB  | Email | Postcode | Result   |
| :-------:| :-------:| :---:| :----:| :-------:| :-------:|
| X|X|X|X|X|**MATCH**|
|  |X|X|X|X|**MATCH**|
| X| |X|X|X|**MATCH**|
| X|X| |X|X|**MATCH**|
| X|X|X| |X|**MATCH**|
| X|X|X|X| |**MATCH**|

**Minimal conditions for match:**

| S.No| Criteria|
| :--:| :------:|
|1.|Firstname + IP Address|
|2.|Firstname + Username|
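
Read row by row, the first table says that two records match when all five fields agree or when any four of them do. The following plain-Python sketch expresses that rule and the two minimal conditions as separate checks. The function names are hypothetical, the field names follow the notebook's schema, and treating a missing value as "no agreement" is an assumption of this sketch. How the two checks are combined in practice is not specified here, so they are kept apart.

```python
MATCH_FIELDS = ("first_name", "last_name", "DOB", "email", "Postcode")

def count_agreeing_fields(a, b):
    """Count the match fields on which two records agree (None never agrees)."""
    return sum(
        1 for field in MATCH_FIELDS
        if a.get(field) is not None and a.get(field) == b.get(field)
    )

def is_table_match(a, b):
    """True when all five fields agree, or any four of the five do."""
    return count_agreeing_fields(a, b) >= 4

def _same(a, b, field):
    return a.get(field) is not None and a.get(field) == b.get(field)

def meets_minimal_condition(a, b):
    """True for Firstname + IP Address, or Firstname + Username."""
    if not _same(a, b, "first_name"):
        return False
    return _same(a, b, "ip_address") or _same(a, b, "username")

existing = {
    "first_name": "Wakefield", "last_name": "Mavin", "DOB": "06/27/1968",
    "email": "wmavin0@patch.com", "Postcode": "13205",
    "username": "wmavin0", "ip_address": "145.90.162.218",
}
# Same person with a new email address: four of five fields still agree.
incoming_same = dict(existing, email="w.mavin@example.com")
# Only first name and username agree: no table match, but a minimal match.
incoming_other = dict(
    existing, last_name="Smith", DOB="01/01/1990",
    email="smith@example.com", Postcode="99999", ip_address="10.0.0.1",
)

print(is_table_match(existing, incoming_same))
print(is_table_match(existing, incoming_other))
print(meets_minimal_condition(existing, incoming_other))
```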

### 2.1. The data arrives as a CSV file and is converted into a table of cleansed data. The loaded data is joined with the UserSCV table, and each criterion mentioned above is checked to find matches against the existing master UserSCV table.
### 3. If matching records are found, insert a new version of the user record, with its Related Id, into the UserSCV table.
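
As a rough illustration of the versioning idea, the sketch below keeps the table as a list of dictionaries. `insert_new_version` is a hypothetical helper, not the notebook's Spark function. It assumes that base records carry `RelatedID = -1`, that a new version points back to the matched record through `RelatedID`, and that a new ID is taken as the current maximum plus one.

```python
def insert_new_version(scv_rows, matched_id, incoming):
    """Append a new version of a user that points back to the matched record."""
    # Assumption: the next ID is simply the current maximum plus one.
    next_id = max(row["ID"] for row in scv_rows) + 1
    new_row = dict(incoming, ID=next_id, RelatedID=matched_id)
    scv_rows.append(new_row)
    return new_row

# Base record: RelatedID of -1 means "not a version of another record".
scv = [{"ID": 0, "RelatedID": -1, "username": "wmavin0", "email": "wmavin0@patch.com"}]

# An incoming record that matched record 0 is stored as a new version of it.
added = insert_new_version(scv, 0, {"username": "wmavin0", "email": "new@example.com"})
print(added)
```

The original record is left untouched, so the history of a user can be followed through the `RelatedID` links.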


####  Issues faced while building UserSCV

1. **Pre-processing data:** Pre-processing and cleansing the data was the main challenge when building the base UserSCV table. A function is called on each row to pre-process the data.
2. **Transforming data:** Transforming the pre-processed data before loading it into the UserSCV table involves adding many new fields, built from different combinations of existing fields, and assigning each row a unique Id. This unique Id is used as the "Related Id" when a matching user record is found in the UserSCV table.

In [0]:
# import libraries
from pyspark.sql.types import StringType, IntegerType, TimestampType, DateType, DoubleType, StructType, StructField
import requests
import json
import re
import datetime
import schedule
import time
import pandas as pd
import phonenumbers
from pyspark.sql import SQLContext, Row
from pyspark.sql.functions import col, udf
from pyspark.sql.functions import unix_timestamp, from_unixtime
from pyspark.sql import functions as F
from pyspark.sql.functions import lit
from pyspark.sql.functions import monotonically_increasing_id

In [0]:
# schema for SCV User Table 
user_schema = StructType([
            StructField("id", IntegerType(), False),
            StructField("Userid", IntegerType(), True),
            StructField("SkinID", StringType(), True),
            StructField("username", StringType(), True),
            StructField("first_name", StringType(), True),
            StructField("last_name", StringType(), True),
            StructField("email", StringType(), True),
            StructField("gender", StringType(), True), 
            StructField("ip_address", StringType(), True), 
            StructField("RegDate", StringType(), True), 
            StructField("RegIP", StringType(), True), 
            StructField("LastIP", StringType(), True), 
            StructField("DOB", StringType(), True), 
            StructField("Postcode", StringType(), True), 
            StructField("MobilePhone", StringType(), True), 
            StructField("Landline", StringType(), True), 
            StructField("Address1", StringType(), True),
            StructField("City", StringType(), True),
            StructField("County", StringType(), True),
            StructField("Country", StringType(), True),
            StructField("SelfExcludedUntil", StringType(), True),
            StructField("Status", StringType(), True)])
            

In [0]:
# schema for incoming stream 
stream_schema = StructType([
            StructField("id", IntegerType(), False),
            StructField("Userid", IntegerType(), True),
            StructField("SkinID", StringType(), True),
            StructField("username", StringType(), True),
            StructField("first_name", StringType(), True),
            StructField("last_name", StringType(), True),
            StructField("email", StringType(), True),
            StructField("gender", StringType(), True), 
            StructField("ip_address", StringType(), True), 
            StructField("RegDate", StringType(), True), 
            StructField("RegIP", StringType(), True), 
            StructField("LastIP", StringType(), True), 
            StructField("DOB", StringType(), True), 
            StructField("Postcode", StringType(), True), 
            StructField("MobilePhone", StringType(), True), 
            StructField("Landline", StringType(), True), 
            StructField("Address1", StringType(), True),
            StructField("City", StringType(), True),
            StructField("County", StringType(), True),
            StructField("Country", StringType(), True),
            StructField("SelfExcludedUntil", StringType(), True),
            StructField("Status", StringType(), True),
            StructField("batch", StringType(), True)])
# The batch field is added to show the batch for a record 

In [0]:
# This function cleans the user MobilePhone
def fixUserMobile(number, country):
  clean_number = None

  if number is not None:
      try:
          # parse the raw number, using the user's country as the default region
          p = phonenumbers.parse(number, country)

          # if the number is not valid as parsed, try dropping trailing digits;
          # truncate_too_long_number() updates p in place when that yields a valid number
          if not phonenumbers.is_valid_number(p):
              phonenumbers.truncate_too_long_number(p)

          # country code followed by the national number, digits only
          clean_number = "%s%s" % (p.country_code, p.national_number)

      except phonenumbers.NumberParseException:
          # the input could not be parsed as a phone number
          clean_number = None

  return clean_number

In [0]:
# This function cleans the user row; it cleans the Landline field
def fixUserLandline(phone_no):
  # clean up PhoneNumber
  if phone_no is not None:
    phone_no = phone_no.replace('-', '')
    if (len(phone_no) != 10):
      phone_no = None

  return phone_no
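
The landline rule above is small enough to check by hand. The sketch below repeats it as a standalone function (`clean_landline` is a hypothetical copy made only so the example runs without Spark) and applies it to a few illustrative inputs.

```python
def clean_landline(phone_no):
    """Strip dashes and keep the number only if exactly 10 characters remain."""
    if phone_no is not None:
        phone_no = phone_no.replace("-", "")
        if len(phone_no) != 10:
            phone_no = None
    return phone_no

print(clean_landline("347-815-3933"))  # dashes removed, 10 digits kept
print(clean_landline("12345"))         # too short, rejected
print(clean_landline(None))            # missing value stays missing
```

Note that the rule only checks the length after removing dashes, so spaces, brackets or letters are not handled by it.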

In [0]:
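# This function validates the user Email
def fixUserEmail(email):
  # return the email if it looks valid, otherwise None
  valid_mail = None
  # guard against NULL emails, which re.match() cannot handle
  if email is not None and re.match(r"^[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]*$", email):
    valid_mail = email
  return valid_mail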
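
To see what the email pattern accepts, it can be tried outside Spark. In this sketch `EMAIL_PATTERN` uses the same regular expression as the function above, while `is_valid_email` is a hypothetical wrapper that also treats a missing value as invalid.

```python
import re

# Same regular expression as in fixUserEmail, compiled once.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]*$")

def is_valid_email(email):
    """Return True when the value is a string matching the pattern."""
    return email is not None and EMAIL_PATTERN.match(email) is not None

print(is_valid_email("wmavin0@patch.com"))  # ordinary address
print(is_valid_email("not-an-email"))       # no @ sign
print(is_valid_email(None))                 # missing value
print(is_valid_email("user@domain."))       # trailing dot, empty last label
```

The last case is accepted because the final `[a-zA-Z]*` may match nothing. If addresses ending in a dot should be rejected, changing that `*` to `+` would be one way to do it.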

In [0]:
# Convert the data pre-processing functions into Spark UDFs
udf_fixUserMobile = udf(fixUserMobile, returnType=StringType())
udf_fixUserLandline = udf(fixUserLandline, returnType=StringType())
udf_fixUserEmail = udf(fixUserEmail, returnType=StringType())

In [0]:
# insert matching records into UserSCV table
def insertNewVersionOfUser(tableName):
  df = spark.sql("select  * from " + tableName)
  dateTimeStr = datetime.datetime.today().strftime("%m-%d-%Y %H:%M:%S")

  # select max of id from userSCV table
  lv = sqlContext.sql("select max(ID) as lastVal from UserSCV").collect()
  lastValue = lv[0]["lastVal"]
  df_userSCV = df.select("ID", \
                         "Userid1", \
                         "SkinID1", \
                         "username1", \
                         "first_name1", \
                         "last_name1", \
                         "email1", \
                         "gender1", "ip_address1", "RegDate1", "RegIP1", \
                         "LastIP1", "DOB1", "Postcode1", "MobilePhone1", "Landline1", \
                         "Address11", "City1", "County1", "Country1", \
                         "SelfExcludedUntil1", "Status1", \
                         "EntityId", \
                         "OriginalEmail", \
                         "OriginalFirstname", \
                         "OriginalLastname", \
                         "OriginalRegDate", \
                         "OriginalDOB", \
                         "OriginalPostcode", \
                         "OriginalMobilePhone", \
                         "OriginalAddress1", \
                         "OriginalCity", \
                         "Firstname_Lastname_RegIP", \
                         "Firstname_Lastname_LastIP", \
                         "Firstname_Lastname_Username", \
                         "Firstname_DOB_City",\
                         "Firstname_Postcode", \
                         "Firstname_Mobilephone", \
                         "DOB_Postcode",  \
                         "Address1_Postcode", \
                         "Firstname_Lastname_Address1_City")
  #df_userSCV = df_userSCV.withColumnRenamed("ID", "RelatedID") 
  df_userSCV = df_userSCV.withColumn("RelatedID", col("ID"))
  df_userSCV = df_userSCV.withColumn("Load_date", lit(dateTimeStr))
  df_userSCV = df_userSCV.withColumn("LastModifiedDate", lit(dateTimeStr))
  df_userSCV = df_userSCV.withColumn("CompareStatus", lit(0))
   
  #df_userSCV = df_userSCV.withColumn("ID", monotonically_increasing_id() + lastValue)
  df_userSCV = df_userSCV.select("ID", \
                        col("Userid1").alias("Userid"), col("SkinID1").alias("SkinID"), \
                        col("username1").alias("username"), col("first_name1").alias("first_name"), \
                        col("last_name1").alias("last_name"), col("email1").alias("email"), \
                        col("gender1").alias("gender"), col("ip_address1").alias("ip_address"), \
                        col("RegDate1").alias("RegDate"), col("RegIP1").alias("RegIP"), \
                        col("LastIP1").alias("LastIP"), col("DOB1").alias("DOB"), \
                        col("Postcode1").alias("Postcode"), col("MobilePhone1").alias("MobilePhone"), \
                        col("Landline1").alias("Landline"), col("Address11").alias("Address1"), \
                        col("City1").alias("City"), col("County1").alias("County"), \
                        col("Country1").alias("Country"), col("SelfExcludedUntil1").alias("SelfExcludedUntil"), \
                        col("Status1").alias("Status"), \
                         "RelatedID", \
                         "EntityId", \
                         "OriginalEmail", \
                         "OriginalFirstname", \
                         "OriginalLastname", \
                         "OriginalRegDate", \
                         "OriginalDOB", \
                         "OriginalPostcode", \
                         "OriginalMobilePhone", \
                         "OriginalAddress1", \
                         "OriginalCity", \
                         "Firstname_Lastname_RegIP", \
                         "Firstname_Lastname_LastIP", \
                         "Firstname_Lastname_Username", \
                         "Firstname_DOB_City",\
                         "Firstname_Postcode", \
                         "Firstname_Mobilephone", \
                         "DOB_Postcode",  \
                         "Address1_Postcode", \
                         "Firstname_Lastname_Address1_City", \
                         "Load_date", \
                         "LastModifiedDate",\
                         "CompareStatus")

  df_userSCV.write.insertInto("UserSCV")
  
  

In [0]:
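# This function renames every column of the incoming DataFrame with a "1" suffix,
# so that the names stay distinct when the result is joined with the original columns.
def getDataFrameFromStream(df_new):
  # alias each column with a "1" suffix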
  df_new = df_new.select (col("ID").alias("ID1"), col("Userid").alias("Userid1"), col("SkinID").alias("SkinID1"), \
                        col("username").alias("username1"), col("first_name").alias("first_name1"), \
                        col("last_name").alias("last_name1"), col("email").alias("email1"), \
                        col("gender").alias("gender1"), col("ip_address").alias("ip_address1"), \
                        col("RegDate").alias("RegDate1"), col("RegIP").alias("RegIP1"), \
                        col("LastIP").alias("LastIP1"), col("DOB").alias("DOB1"), \
                        col("Postcode").alias("Postcode1"), col("MobilePhone").alias("MobilePhone1"), \
                        col("Landline").alias("Landline1"), col("Address1").alias("Address11"), \
                        col("City").alias("City1"), col("County").alias("County1"), \
                        col("Country").alias("Country1"), col("SelfExcludedUntil").alias("SelfExcludedUntil1"), \
                        col("Status").alias("Status1")) 
  return df_new
  

In [0]:
# This function creates and/or inserts new records to Output Table - UserSCV
def createOutputTable(tableName):
  # create output table
  df = spark.sql("select * from " + tableName)
  dateTimeStr = datetime.datetime.today().strftime("%m-%d-%Y %H:%M:%S")


  userSCV =  df.withColumn("ID", F.monotonically_increasing_id()) \
    .withColumn("RelatedID", lit(-1).cast(IntegerType())) 
  userSCV = userSCV.withColumn("EntityId", col("ID")) 

  # keep a copy of the original values in "Original*" columns
  userSCV = userSCV.withColumn("OriginalEmail", col("email")) 
  userSCV = userSCV.withColumn("OriginalFirstname", col("first_name")) 
  userSCV = userSCV.withColumn("OriginalLastname", col("last_name")) 
  userSCV = userSCV.withColumn("OriginalRegDate", col("RegDate"))
  userSCV = userSCV.withColumn("OriginalDOB", col("DOB"))
  userSCV = userSCV.withColumn("OriginalPostcode", col("Postcode"))             
  userSCV = userSCV.withColumn("OriginalMobilePhone", col("MobilePhone"))
  userSCV = userSCV.withColumn("OriginalAddress1", col("Address1"))            
  #userSCV = userSCV.withColumn("OriginalAddress2", col("Address2"))            
  userSCV = userSCV.withColumn("OriginalCity", col("City"))
  userSCV = userSCV.withColumn("Firstname_Lastname_RegIP", F.concat(col('first_name'),lit('_'), col('last_name'), lit('_'),col('RegIP') ))       
  userSCV = userSCV.withColumn("Firstname_Lastname_LastIP", \
                               F.concat(col('first_name'),lit('_'), col('last_name'), lit('_'),col('LastIP') ))
  userSCV = userSCV.withColumn("Firstname_Lastname_Username", \
                               F.concat(col('first_name'),lit('_'), col('last_name'), lit('_'),col('Username') ))
  userSCV = userSCV.withColumn("Firstname_DOB_City", F.concat(col('first_name'),lit('_'), col('DOB'), lit('_'),col('City') ))
  userSCV = userSCV.withColumn("Firstname_Postcode", F.concat(col('first_name'),lit('_'), col('Postcode')  )) 
  userSCV = userSCV.withColumn("Firstname_Mobilephone", F.concat(col('first_name'),lit('_'), col('MobilePhone')  ))          
  userSCV = userSCV.withColumn("DOB_Postcode", F.concat(col('DOB'),lit('_'), col('Postcode')  )) 
  userSCV = userSCV.withColumn("Address1_Postcode", F.concat(col('Address1'),lit('_'), col('Postcode')  ))              
  userSCV = userSCV.withColumn("Firstname_Lastname_Address1_City", \
                               F.concat(col('first_name'),lit('_'), col('last_name'), lit('_'),col('Address1'), lit('_'), col('City') ))
  userSCV = userSCV.withColumn("Load_date", lit(dateTimeStr))
  userSCV = userSCV.withColumn("LastModifiedDate", lit(dateTimeStr))
  userSCV = userSCV.withColumn("CompareStatus", lit(None).cast(StringType()))
  # Create a Hive table to save the data from the DataFrame
  if (len(spark.sql("SHOW TABLES LIKE '" + "UserSCV"+ "'").collect()) == 1):
    userSCV.write.insertInto("UserSCV")
  else:
    userSCV.write.saveAsTable("UserSCV")



In [0]:
# This function compares the existing user records with the new user records.
def compareData(tableName):
  spark.sql("REFRESH TABLE  " + tableName)
  df_temp = spark.sql ("select * from " + tableName)
  count = df_temp.count()
  if (count > 0):
    insertNewVersionOfUser(tableName)
  

In [0]:
# streaming starts here by reading the input file 
inputPath = "/FileStore/users/inprogress/"
streamingInputDF = (
  spark
    .readStream
    .schema(stream_schema)
    .option("maxFilesPerTrigger", "1")
    .option("header", "true")
    .csv(inputPath)
)

In [0]:
# Generate the clean stream by applying udf functions to the original stream
clean_stream = ( streamingInputDF.withColumn("MobilePhone", udf_fixUserMobile(streamingInputDF.MobilePhone, streamingInputDF.Country))
                          .withColumn("email", udf_fixUserEmail(streamingInputDF.email))
                          .withColumn("Landline", udf_fixUserLandline(streamingInputDF.Landline)))

In [0]:
def processUserInfo(df_user):
  print("-----------------------------------------------------------------------------------")
  print("Reading streaming data")
  
  # check if UserSCV table exists:
  if (len(spark.sql("SHOW TABLES LIKE '" + "UserSCV"+ "'").collect()) == 0) :
    print("UserSCV table does not exist; hence creating it")
    df_user.createOrReplaceTempView("UserSCV")
  else: 
    # compare the data with existing data in UserSCV
#     userSCV = spark.sql("select * from UserSCV")
    # 1. Rename the base columns 
    df_new =  getDataFrameFromStream(df_user)
    # 2. compare the data
    # check for the minimal condition
    # whether firstName + IP equals
#     print("1. checking for firstName + IP")
#     df_criteria_min = df_user.join(df_new, (df_user.first_name == df_new.first_name1) & (df_user.ip_address == df_new.ip_address1) )
#     df_criteria_min.createOrReplaceTempView("c1_FN_IP") 
#     compareData("c1_FN_IP")

    # This is to check  criteria: FirstName + username 
    print("2. checking for firstName + username") 
    #df_new =  getDataFrameFromCSV(csvFilePath_new, user_schema)
    df_criteria_fn_username = df_user.join(df_new, (df_user.first_name == df_new.first_name1) & (df_user.username == df_new.username1) )
    df_criteria_fn_username.createOrReplaceTempView("c1_FN_username") 
#     compareData("c1_FN_username")

    # check for firstName, Dob and city
#     print("3. checking for firstName + DOB+ City") 
#     df_criteria_fn_dob_city = df_user.join(df_new, (df_user.first_name == df_new.first_name1) \
#                                        & (df_user.DOB == df_new.DOB1) \
#                                        & (df_user.City == df_new.City1) )
#     df_criteria_fn_dob_city.createOrReplaceTempView("c1_fn_dob_city") 
#     compareData("c1_fn_dob_city") 

    # This is to check  criteria: FirstName + postcode 
#     print("4. checking for firstName + PostCode") 
#     df_criteria_fn_postcode = df_user.join(df_new, (df_user.first_name == df_new.first_name1) \
#                                        & (df_user.Postcode == df_new.Postcode1) )
#     df_criteria_fn_postcode.createOrReplaceTempView("c1_fn_postcode") 
#     compareData("c1_fn_postcode")

    # This is to check  criteria: DOB + postcode
#     print("5. checking for DOB + PostCode") 
#     df_criteria_postcode_dob = df_user.join(df_new, (df_user.DOB == df_new.DOB1) \
#                                        & (df_user.Postcode == df_new.Postcode1) )
#     df_criteria_postcode_dob.createOrReplaceTempView("c1_postcode_dob") 
#     compareData("c1_postcode_dob")

    # This is to check  criteria: Address1 + postcode 
#     print("6. checking for Address1 + PostCode") 
#     df_criteria_postcode_addr1 = df_user.join(df_new, (df_user.Address1 == df_new.Address11) \
#                                        & (df_user.Postcode == df_new.Postcode1) )
#     df_criteria_postcode_addr1.createOrReplaceTempView("c1_postcode_addr1") 
#     compareData("c1_postcode_addr1")

    # This is to check  criteria: FirstName + LastName + Address1 + city 
#     print("7. checking for firstName + Address1 + LastName + City") 
#     df_criteria_fn_ln_addr1_city = df_user.join(df_new, (df_user.first_name == df_new.first_name1) \
#                                        & (df_user.last_name == df_new.last_name1)
#                                        & (df_user.Address1 == df_new.Address11) \
#                                        & (df_user.City == df_new.City1) )

#     df_criteria_fn_ln_addr1_city.createOrReplaceTempView("c1_fn_ln_addr1_city") 
#     compareData("c1_fn_ln_addr1_city")

    # check for FirstName and MobilePhone
#     print("8. checking for firstName + MobilePhone") 
#     df_fn_mobile = df_user.join(df_new, (df_user.first_name == df_new.first_name1) & (df_user.MobilePhone == df_new.MobilePhone1) )
#     df_fn_mobile.createOrReplaceTempView("c1_fn_mobile") 
#     compareData("c1_fn_mobile")
  print("-----------------------------------------------------------------------------------")


In [0]:
# This function filters new data
def filterUpdatedDF(df_user, tracker_table):
  if (len(spark.sql("SHOW TABLES LIKE '" + tracker_table+ "'").collect()) == 1):
    spark.sql("REFRESH TABLE " + tracker_table)
    tracker_df = spark.sql("select batch from " + tracker_table).distinct()
    # keep only the rows whose batch has not been recorded in the tracker table yet
    df_filtered = df_user.join(tracker_df, df_user.batch == tracker_df.batch, "left_anti")
    df_filtered.drop("batch").createOrReplaceTempView("filteredView")
    df_filtered.select("batch").distinct().write.insertInto(tracker_table)
  else:
    df_user.drop("batch").createOrReplaceTempView("filteredView")
    df_user.select("batch").distinct().write.saveAsTable(tracker_table)
  return spark.sql("select * from filteredView")
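
The intent of the batch tracker can be shown without Spark. The sketch below is a plain-Python analogue under simple assumptions: rows are dictionaries, the tracker is a set of batch names, and `filter_new_batches` is a hypothetical name used only for this illustration.

```python
def filter_new_batches(rows, seen_batches):
    """Return rows from unseen batches (without the batch field) and record them."""
    new_rows = [row for row in rows if row["batch"] not in seen_batches]
    # Remember the batches just processed so they are skipped next time.
    seen_batches.update(row["batch"] for row in new_rows)
    return [
        {key: value for key, value in row.items() if key != "batch"}
        for row in new_rows
    ]

seen = {"b1"}
rows = [
    {"id": 1, "batch": "b1"},  # already processed
    {"id": 2, "batch": "b2"},  # new
]
fresh = filter_new_batches(rows, seen)
print(fresh)
print(sorted(seen))
```

Calling the function a second time with the same rows returns an empty list, because both batches are then in the tracker.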


In [0]:
row_count = 0
def job():
  global row_count
  # NOTE: count() needs a static (batch) DataFrame; Spark rejects it on a
  # streaming DataFrame that has not been started through writeStream.
  current_count = clean_stream.count()
  if current_count > row_count:
    total = current_count - row_count
    print("New csv files received with {} records. Total Records received {}".format(total, current_count))
    row_count = current_count
    df = clean_stream.drop("batch")
    if current_count > 0:
      processUserInfo(df)
    else:
      print("No new record received ...")
  else:
    print("No new csv file received ...")

schedule.every(10).seconds.do(job)

while True:
  schedule.run_pending()
  time.sleep(1)

In [0]:
processUserInfo(clean_stream.drop("batch"))

In [0]:
%sql select * from c1_FN_username

In [0]:
userSCV = spark.sql("select * from UserSCV")

In [0]:
display(userSCV)

ID,Userid,SkinID,username,first_name,last_name,email,gender,ip_address,RegDate,RegIP,LastIP,DOB,Postcode,MobilePhone,Landline,Address1,City,County,Country,SelfExcludedUntil,Status,RelatedID,EntityId,OriginalEmail,OriginalFirstname,OriginalLastname,OriginalRegDate,OriginalDOB,OriginalPostcode,OriginalMobilePhone,OriginalAddress1,OriginalCity,Firstname_Lastname_RegIP,Firstname_Lastname_LastIP,Firstname_Lastname_Username,Firstname_DOB_City,Firstname_Postcode,Firstname_Mobilephone,DOB_Postcode,Address1_Postcode,Firstname_Lastname_Address1_City,Load_date,LastModifiedDate,CompareStatus
0,1,Violet,wmavin0,Wakefield,Mavin,wmavin0@patch.com,Male,145.90.162.218,01/17/2018,208.55.139.3,134.168.200.8,06/27/1968,13205,13156520392,3478153933,5 Hauk Trail,Syracuse,NY,US,5/6/2017,False,-1,0,wmavin0@patch.com,Wakefield,Mavin,01/17/2018,06/27/1968,13205,13156520392,5 Hauk Trail,Syracuse,Wakefield_Mavin_208.55.139.3,Wakefield_Mavin_134.168.200.8,Wakefield_Mavin_wmavin0,Wakefield_06/27/1968_Syracuse,Wakefield_13205,Wakefield_13156520392,06/27/1968_13205,5 Hauk Trail_13205,Wakefield_Mavin_5 Hauk Trail_Syracuse,06-12-2018 16:43:15,06-12-2018 16:43:15,
1,2,Puce,rpankettman1,Rosabella,Pankettman,rpankettman1@github.io,Female,116.160.176.111,02/06/2018,24.115.80.212,45.0.33.98,08/03/1968,13217,13158379757,7166225965,909 Wayridge Place,Syracuse,NY,US,5/15/2017,True,-1,1,rpankettman1@github.io,Rosabella,Pankettman,02/06/2018,08/03/1968,13217,13158379757,909 Wayridge Place,Syracuse,Rosabella_Pankettman_24.115.80.212,Rosabella_Pankettman_45.0.33.98,Rosabella_Pankettman_rpankettman1,Rosabella_08/03/1968_Syracuse,Rosabella_13217,Rosabella_13158379757,08/03/1968_13217,909 Wayridge Place_13217,Rosabella_Pankettman_909 Wayridge Place_Syracuse,06-12-2018 16:43:15,06-12-2018 16:43:15,
2,3,Green,lhachette2,Lindsy,Hachette,lhachette2@apple.com,Female,149.102.46.105,09/14/2017,195.26.30.138,184.102.154.95,05/29/1965,10125,12125499210,5188991277,90 Shoshone Trail,New York City,NY,US,5/10/2017,False,-1,2,lhachette2@apple.com,Lindsy,Hachette,09/14/2017,05/29/1965,10125,12125499210,90 Shoshone Trail,New York City,Lindsy_Hachette_195.26.30.138,Lindsy_Hachette_184.102.154.95,Lindsy_Hachette_lhachette2,Lindsy_05/29/1965_New York City,Lindsy_10125,Lindsy_12125499210,05/29/1965_10125,90 Shoshone Trail_10125,Lindsy_Hachette_90 Shoshone Trail_New York City,06-12-2018 16:43:15,06-12-2018 16:43:15,
3,4,Pink,hbance3,Harp,Bance,hbance3@indiegogo.com,Male,142.252.161.131,10/23/2017,209.200.97.103,90.216.208.96,05/03/1980,14652,15855116341,3476003754,68 Birchwood Avenue,Rochester,NY,US,8/25/2017,False,-1,3,hbance3@indiegogo.com,Harp,Bance,10/23/2017,05/03/1980,14652,15855116341,68 Birchwood Avenue,Rochester,Harp_Bance_209.200.97.103,Harp_Bance_90.216.208.96,Harp_Bance_hbance3,Harp_05/03/1980_Rochester,Harp_14652,Harp_15855116341,05/03/1980_14652,68 Birchwood Avenue_14652,Harp_Bance_68 Birchwood Avenue_Rochester,06-12-2018 16:43:15,06-12-2018 16:43:15,
4,5,Indigo,vmatzkaitis4,Vitia,Matzkaitis,vmatzkaitis4@ox.ac.uk,Female,153.214.222.184,03/04/2017,188.62.77.155,19.142.249.219,11/27/1984,11231,12127641807,3152282744,4495 Mcbride Parkway,Brooklyn,NY,US,2/1/2018,True,-1,4,vmatzkaitis4@ox.ac.uk,Vitia,Matzkaitis,03/04/2017,11/27/1984,11231,12127641807,4495 Mcbride Parkway,Brooklyn,Vitia_Matzkaitis_188.62.77.155,Vitia_Matzkaitis_19.142.249.219,Vitia_Matzkaitis_vmatzkaitis4,Vitia_11/27/1984_Brooklyn,Vitia_11231,Vitia_12127641807,11/27/1984_11231,4495 Mcbride Parkway_11231,Vitia_Matzkaitis_4495 Mcbride Parkway_Brooklyn,06-12-2018 16:43:15,06-12-2018 16:43:15,
5,6,Pink,fkrauze5,Francene,Krauze,fkrauze5@seesaa.net,Female,81.75.195.232,08/04/2017,4.200.198.170,107.52.205.146,07/09/1974,10249,12122494591,9174554402,2605 Fordem Center,New York City,NY,US,3/13/2018,False,-1,5,fkrauze5@seesaa.net,Francene,Krauze,08/04/2017,07/09/1974,10249,12122494591,2605 Fordem Center,New York City,Francene_Krauze_4.200.198.170,Francene_Krauze_107.52.205.146,Francene_Krauze_fkrauze5,Francene_07/09/1974_New York City,Francene_10249,Francene_12122494591,07/09/1974_10249,2605 Fordem Center_10249,Francene_Krauze_2605 Fordem Center_New York City,06-12-2018 16:43:15,06-12-2018 16:43:15,
6,7,Blue,scliff6,Selby,Cliff,scliff6@stumbleupon.com,Male,46.39.126.38,12/08/2017,114.108.70.31,60.17.51.193,06/22/1982,10275,12125503348,5181020553,5082 Toban Terrace,New York City,NY,US,5/12/2017,True,-1,6,scliff6@stumbleupon.com,Selby,Cliff,12/08/2017,06/22/1982,10275,12125503348,5082 Toban Terrace,New York City,Selby_Cliff_114.108.70.31,Selby_Cliff_60.17.51.193,Selby_Cliff_scliff6,Selby_06/22/1982_New York City,Selby_10275,Selby_12125503348,06/22/1982_10275,5082 Toban Terrace_10275,Selby_Cliff_5082 Toban Terrace_New York City,06-12-2018 16:43:15,06-12-2018 16:43:15,
7,8,Violet,grulton7,Gretta,Rulton,grulton7@nbcnews.com,Female,33.223.128.52,11/10/2017,51.38.123.40,117.109.43.3,04/15/1996,10099,16461959692,5856227253,693 Messerschmidt Lane,New York City,NY,US,12/2/2017,True,-1,7,grulton7@nbcnews.com,Gretta,Rulton,11/10/2017,04/15/1996,10099,16461959692,693 Messerschmidt Lane,New York City,Gretta_Rulton_51.38.123.40,Gretta_Rulton_117.109.43.3,Gretta_Rulton_grulton7,Gretta_04/15/1996_New York City,Gretta_10099,Gretta_16461959692,04/15/1996_10099,693 Messerschmidt Lane_10099,Gretta_Rulton_693 Messerschmidt Lane_New York City,06-12-2018 16:43:15,06-12-2018 16:43:15,
8,9,Orange,ebartoszewski8,Esteban,Bartoszewski,ebartoszewski8@dell.com,Male,151.243.237.163,04/15/2018,218.12.107.71,5.124.46.243,07/09/1985,10110,16463474899,7163390281,44 Pankratz Point,New York City,NY,US,10/23/2017,True,-1,8,ebartoszewski8@dell.com,Esteban,Bartoszewski,04/15/2018,07/09/1985,10110,16463474899,44 Pankratz Point,New York City,Esteban_Bartoszewski_218.12.107.71,Esteban_Bartoszewski_5.124.46.243,Esteban_Bartoszewski_ebartoszewski8,Esteban_07/09/1985_New York City,Esteban_10110,Esteban_16463474899,07/09/1985_10110,44 Pankratz Point_10110,Esteban_Bartoszewski_44 Pankratz Point_New York City,06-12-2018 16:43:15,06-12-2018 16:43:15,
9,10,Orange,jlappin9,Jourdain,Lappin,jlappin9@forbes.com,Male,51.167.45.250,02/16/2017,254.106.123.207,73.126.165.25,04/18/1998,14210,17161288347,7162084343,8 Farwell Point,Buffalo,NY,US,8/21/2017,False,-1,9,jlappin9@forbes.com,Jourdain,Lappin,02/16/2017,04/18/1998,14210,17161288347,8 Farwell Point,Buffalo,Jourdain_Lappin_254.106.123.207,Jourdain_Lappin_73.126.165.25,Jourdain_Lappin_jlappin9,Jourdain_04/18/1998_Buffalo,Jourdain_14210,Jourdain_17161288347,04/18/1998_14210,8 Farwell Point_14210,Jourdain_Lappin_8 Farwell Point_Buffalo,06-12-2018 16:43:15,06-12-2018 16:43:15,


In [0]:
%sql select count(*) from userSCV

count(1)
916


In [0]:
%sql select count(*) from users

count(1)
300
