# Kickstarter

What will make your Kickstarter project successful?

## Import stuff

In [74]:
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
import seaborn as sns

plt.style.use('fivethirtyeight')

%matplotlib inline

## Load the data

In [75]:
data = pd.read_csv('./DSI_kickstarterscrape_dataset.csv')

## Look at the data

Take an initial look at the data, especially the columns.

In [76]:
data.head(3)

,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,backers,funded date,levels,reward levels,updates,comments,duration
0,39409,WHILE THE TREES SLEEP,http://www.kickstarter.com/projects/emiliesaba...,Film & Video,Short Film,"Columbia, MO",successful,10500.0,11545.0,1.099524,66,"Fri, 19 Aug 2011 19:28:17 -0000",7,"$25,$50,$100,$250,$500,$1,000,$2,500",10,2,30.0
1,126581,Educational Online Trading Card Game,http://www.kickstarter.com/projects/972789543/...,Games,Board & Card Games,"Maplewood, NJ",failed,4000.0,20.0,0.005,2,"Mon, 02 Aug 2010 03:59:00 -0000",5,"$1,$5,$10,$25,$50",6,0,47.18
2,138119,STRUM,http://www.kickstarter.com/projects/185476022/...,Film & Video,Animation,"Los Angeles, CA",live,20000.0,56.0,0.0028,3,"Fri, 08 Jun 2012 00:00:31 -0000",10,"$1,$10,$25,$40,$50,$100,$250,$1,000,$1,337,$9,001",1,0,28.0


How many rows and columns?

In [77]:
data.shape

(45957, 17)

Get more info about the data

In [78]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45957 entries, 0 to 45956
Data columns (total 17 columns):
project id           45957 non-null int64
name                 45957 non-null object
url                  45957 non-null object
category             45957 non-null object
subcategory          45957 non-null object
location             44635 non-null object
status               45957 non-null object
goal                 45957 non-null float64
pledged              45945 non-null float64
funded percentage    45957 non-null float64
backers              45957 non-null int64
funded date          45957 non-null object
levels               45957 non-null int64
reward levels        45898 non-null object
updates              45957 non-null int64
comments             45957 non-null int64
duration             45957 non-null float64
dtypes: float64(4), int64(5), object(8)
memory usage: 6.0+ MB


Get more info about missing data

In [79]:
data.isnull().sum()

project id              0
name                    0
url                     0
category                0
subcategory             0
location             1322
status                  0
goal                    0
pledged                12
funded percentage       0
backers                 0
funded date             0
levels                  0
reward levels          59
updates                 0
comments                0
duration                0
dtype: int64

Look for duplicates

In [80]:
data.duplicated().sum()

89

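Get summary statistics for the numeric columns (these may need to be recomputed once the other problems in the data are fixed). The statistics also show signs of outliers: for goal, pledged, funded percentage, backers and comments, the maximum is far above the 75th percentile.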

In [81]:
data.describe()

| | project id | goal | pledged | funded percentage | backers | levels | updates | comments | duration |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| count | 45957.0 | 45957.0 | 45945.0 | 45957.0 | 45957.0 | 45957.0 | 45957.0 | 45957.0 | 45957.0 |
| mean | 1080800000.0 | 11942.71 | 4980.75 | 1.850129 | 69.973192 | 8.004939 | 4.08508 | 8.379529 | 39.995547 |
| std | 621805700.0 | 188758.3 | 56741.62 | 88.492706 | 688.628479 | 4.233907 | 6.43922 | 174.015737 | 17.414458 |
| min | 39409.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 543896200.0 | 1800.0 | 196.0 | 0.044 | 5.0 | 5.0 | 0.0 | 0.0 | 30.0 |
| 50% | 1078345000.0 | 4000.0 | 1310.0 | 1.0 | 23.0 | 7.0 | 2.0 | 0.0 | 32.0 |
| 75% | 1621596000.0 | 9862.0 | 4165.0 | 1.11564 | 59.0 | 10.0 | 6.0 | 3.0 | 48.39 |
| max | 2147460000.0 | 21474840.0 | 10266840.0 | 15066.0 | 87142.0 | 80.0 | 149.0 | 19311.0 | 91.96 |
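The table above already hints at outliers: the maximum `goal` (21,474,840) and `funded percentage` (15,066) are orders of magnitude above their 75th percentiles (9,862 and 1.12). One common way to flag such values is Tukey's IQR fences, sketched below. The helper name `iqr_outliers` and the conventional 1.5 multiplier are illustrative choices, not something this notebook uses. Because these columns are strongly right-skewed, the rule will also flag many legitimate large campaigns, so treat the flags as rows to inspect rather than rows to delete.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Hypothetical goals; on the real data you could try iqr_outliers(data["goal"]).
goals = pd.Series([100, 200, 300, 400, 500, 100000], name="goal")
flags = iqr_outliers(goals)
print(goals[flags])
```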


### What I need to fix

- Category column: make dummies out of the categories
- Subcategory column: make dummies out of the subcategories
- Location column: split into two new columns (city and state), possibly make dummies
- Status column: encode the status as numbers (1 = successful, 2 = failed, 3 = live)
- Funded date column: convert to datetime format
- Reward levels column: make numeric


- Handle missing values.
- Look for duplicates.
- Look for outliers.
- ...
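A minimal sketch of three items from the fix list (location, status, reward levels), run on hypothetical sample rows. It assumes locations look like `"City, ST"` and that reward levels are `$`-prefixed amounts separated by commas, where amounts of $1,000 and up also contain a thousands comma, as in the rows shown earlier. Splitting on `",$"` rather than `","` keeps `$1,000` in one piece. Any status other than the three listed becomes `NaN` in `status_nr`. The helper `parse_reward_levels` and the new column names are illustrative, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows in the same format as the Kickstarter data.
sample = pd.DataFrame({
    "location": ["Columbia, MO", "Maplewood, NJ", None],
    "status": ["successful", "failed", "live"],
    "reward levels": ["$25,$50,$100,$1,000", "$1,$5,$10,$25,$50", None],
})

# Location: everything before the last ", " is the city, the rest is the state.
city_state = sample["location"].str.extract(r"^(?P<city>.*), (?P<state>[^,]+)$")
sample = sample.join(city_state)

# Status: numeric code as planned (1 = successful, 2 = failed, 3 = live).
sample["status_nr"] = sample["status"].map({"successful": 1, "failed": 2, "live": 3})


def parse_reward_levels(levels):
    """Turn '$25,$50,$1,000' into [25.0, 50.0, 1000.0]; missing or empty -> []."""
    if not isinstance(levels, str) or not levels.strip():
        return []
    # Every amount starts with '$', so ',$' separates amounts while the comma
    # inside '1,000' stays in one piece and is removed before conversion.
    amounts = levels.strip().lstrip("$").split(",$")
    return [float(a.replace(",", "")) for a in amounts]


sample["reward_list"] = sample["reward levels"].apply(parse_reward_levels)
sample["min_reward"] = sample["reward_list"].apply(lambda r: min(r) if r else np.nan)
sample["max_reward"] = sample["reward_list"].apply(lambda r: max(r) if r else np.nan)
print(sample[["city", "state", "status_nr", "min_reward", "max_reward"]])
```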

## Work the data (EDA/Munging)

### Duplicates

#### Handle
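`data.duplicated().sum()` found 89 fully duplicated rows. A minimal sketch of removing them, assuming exact copies are scraping artifacts and that keeping the first copy is acceptable; the small frame is hypothetical. Counting repeats of `project id` is a suggested extra check: it would also catch a project that was scraped twice with slightly different values, which `duplicated()` over all columns misses.

```python
import pandas as pd

# Hypothetical frame where rows 0 and 2 are exact duplicates.
df = pd.DataFrame({
    "project id": [39409, 126581, 39409],
    "name": ["WHILE THE TREES SLEEP", "Educational Online Trading Card Game",
             "WHILE THE TREES SLEEP"],
    "status": ["successful", "failed", "successful"],
})

print(df.duplicated().sum())                     # fully duplicated rows
print(df.duplicated(subset="project id").sum())  # repeated project ids

# Keep the first occurrence of each duplicated row and renumber the index.
df = df.drop_duplicates(keep="first").reset_index(drop=True)
```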

### Missing values
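The missing values are concentrated in three columns: `location` (1,322), `pledged` (12) and `reward levels` (59). The sketch below shows one possible strategy on hypothetical rows, not one the notebook settles on: keep rows without a location under an explicit `"Unknown"` label, treat a missing reward list as empty, and drop the 12 rows without a `pledged` amount, a negligible share of 45,957.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking the three columns that have gaps.
df = pd.DataFrame({
    "location": ["Columbia, MO", None, "Novi, MI"],
    "pledged": [11545.0, 20.0, np.nan],
    "reward levels": ["$25,$50", None, "$10,$25"],
})

# location: keep the rows, but label the gap explicitly.
df["location"] = df["location"].fillna("Unknown")

# reward levels: treat a missing list as "no reward levels".
df["reward levels"] = df["reward levels"].fillna("")

# pledged: only a handful of rows, so drop them.
df = df.dropna(subset=["pledged"]).reset_index(drop=True)

print(df.isnull().sum())
```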

### funded date column: Split column into different columns

In [82]:
data['funded date'].head()

0    Fri, 19 Aug 2011 19:28:17 -0000
1    Mon, 02 Aug 2010 03:59:00 -0000
2    Fri, 08 Jun 2012 00:00:31 -0000
3    Sun, 08 Apr 2012 02:14:00 -0000
4    Wed, 01 Jun 2011 15:25:39 -0000
Name: funded date, dtype: object

In [83]:
# Split "Fri, 19 Aug 2011 19:28:17 -0000" into the weekday and the rest
new_date = data['funded date'].str.split(", ", n=1, expand=True)
data["weekday"] = new_date[0]
data["date"] = new_date[1]

In [84]:
# Split off the day of the month
new_date2 = data['date'].str.split(" ", n=1, expand=True)
data["nr_in_month"] = new_date2[0]
data["date2"] = new_date2[1]

In [85]:
# Split off the month abbreviation
new_date3 = data['date2'].str.split(" ", n=1, expand=True)
data["month"] = new_date3[0]
data["date3"] = new_date3[1]

In [86]:
# Split off the year; what remains is the time plus the UTC offset
new_date4 = data['date3'].str.split(" ", n=1, expand=True)
data["year"] = new_date4[0]
data["time"] = new_date4[1]

In [87]:
# remove columns not needed
data = data.drop(['date', 'date2', 'date3'], axis=1)
data.head(3)

,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,...,levels,reward levels,updates,comments,duration,weekday,nr_in_month,month,year,time
0,39409,WHILE THE TREES SLEEP,http://www.kickstarter.com/projects/emiliesaba...,Film & Video,Short Film,"Columbia, MO",successful,10500.0,11545.0,1.099524,...,7,"$25,$50,$100,$250,$500,$1,000,$2,500",10,2,30.0,Fri,19,Aug,2011,19:28:17 -0000
1,126581,Educational Online Trading Card Game,http://www.kickstarter.com/projects/972789543/...,Games,Board & Card Games,"Maplewood, NJ",failed,4000.0,20.0,0.005,...,5,"$1,$5,$10,$25,$50",6,0,47.18,Mon,2,Aug,2010,03:59:00 -0000
2,138119,STRUM,http://www.kickstarter.com/projects/185476022/...,Film & Video,Animation,"Los Angeles, CA",live,20000.0,56.0,0.0028,...,10,"$1,$10,$25,$40,$50,$100,$250,$1,000,$1,337,$9,001",1,0,28.0,Fri,8,Jun,2012,00:00:31 -0000
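The four `str.split` steps above can also be done by parsing `funded date` once with `pd.to_datetime` and reading the parts off the `.dt` accessor. This also covers the "convert to datetime format" item and yields numeric columns directly, so the later weekday and month mappings become unnecessary. A sketch on the first five values shown earlier. The format string assumes every value follows the `Fri, 19 Aug 2011 19:28:17 -0000` pattern, and `utc=True` assumes the `-0000` offset should be kept as UTC.

```python
import pandas as pd

# The first five "funded date" values from the output above.
s = pd.Series([
    "Fri, 19 Aug 2011 19:28:17 -0000",
    "Mon, 02 Aug 2010 03:59:00 -0000",
    "Fri, 08 Jun 2012 00:00:31 -0000",
    "Sun, 08 Apr 2012 02:14:00 -0000",
    "Wed, 01 Jun 2011 15:25:39 -0000",
], name="funded date")

# Parse the whole string at once; %z reads the "-0000" UTC offset.
funded = pd.to_datetime(s, format="%a, %d %b %Y %H:%M:%S %z", utc=True)

parts = pd.DataFrame({
    "weekday_nr": funded.dt.dayofweek + 1,  # pandas uses Monday=0, so +1 gives Mon=1 ... Sun=7
    "nr_in_month": funded.dt.day,
    "month_nr": funded.dt.month,
    "year": funded.dt.year,
    "time": funded.dt.time,
})
print(parts)
```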


In [88]:
# Map weekday abbreviations to numbers (1 = Monday, 2 = Tuesday, etc.)
data['weekday_nr'] = data['weekday'].map({'Mon': 1, 'Tue': 2, 'Wed': 3, 'Thu': 4, 'Fri': 5, 'Sat': 6, 'Sun': 7})

In [89]:
# Map month abbreviations to numbers (1 = Jan, 2 = Feb, etc.)
data['month_nr'] = data['month'].map({'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, 'Jul': 7,
                                      'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12})

In [90]:
data.head()

,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,...,updates,comments,duration,weekday,nr_in_month,month,year,time,weekday_nr,month_nr
0,39409,WHILE THE TREES SLEEP,http://www.kickstarter.com/projects/emiliesaba...,Film & Video,Short Film,"Columbia, MO",successful,10500.0,11545.0,1.099524,...,10,2,30.0,Fri,19,Aug,2011,19:28:17 -0000,5,8
1,126581,Educational Online Trading Card Game,http://www.kickstarter.com/projects/972789543/...,Games,Board & Card Games,"Maplewood, NJ",failed,4000.0,20.0,0.005,...,6,0,47.18,Mon,2,Aug,2010,03:59:00 -0000,1,8
2,138119,STRUM,http://www.kickstarter.com/projects/185476022/...,Film & Video,Animation,"Los Angeles, CA",live,20000.0,56.0,0.0028,...,1,0,28.0,Fri,8,Jun,2012,00:00:31 -0000,5,6
3,237090,GETTING OVER - One son's search to finally kno...,http://www.kickstarter.com/projects/charnick/g...,Film & Video,Documentary,"Los Angeles, CA",successful,6000.0,6535.0,1.089167,...,4,0,32.22,Sun,8,Apr,2012,02:14:00 -0000,7,4
4,246101,The Launch of FlyeGrlRoyalty &quot;The New Nam...,http://www.kickstarter.com/projects/flyegrlroy...,Fashion,Fashion,"Novi, MI",failed,3500.0,0.0,0.0,...,2,0,30.0,Wed,1,Jun,2011,15:25:39 -0000,3,6


In [91]:
data[['project id', 'name', 'url', 'category', 'subcategory', 'location', 'status', 'goal', 'pledged', 
             'funded percentage', 'reward levels', 'updates', 'comments', 'duration', 'weekday', 'weekday_nr', 
             'month', 'month_nr', 'year', 'time']]

,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,reward levels,updates,comments,duration,weekday,weekday_nr,month,month_nr,year,time
0,39409,WHILE THE TREES SLEEP,http://www.kickstarter.com/projects/emiliesaba...,Film & Video,Short Film,"Columbia, MO",successful,10500.0,11545.0,1.099524,"$25,$50,$100,$250,$500,$1,000,$2,500",10,2,30.00,Fri,5,Aug,8,2011,19:28:17 -0000
1,126581,Educational Online Trading Card Game,http://www.kickstarter.com/projects/972789543/...,Games,Board & Card Games,"Maplewood, NJ",failed,4000.0,20.0,0.005000,"$1,$5,$10,$25,$50",6,0,47.18,Mon,1,Aug,8,2010,03:59:00 -0000
2,138119,STRUM,http://www.kickstarter.com/projects/185476022/...,Film & Video,Animation,"Los Angeles, CA",live,20000.0,56.0,0.002800,"$1,$10,$25,$40,$50,$100,$250,$1,000,$1,337,$9,001",1,0,28.00,Fri,5,Jun,6,2012,00:00:31 -0000
3,237090,GETTING OVER - One son's search to finally kno...,http://www.kickstarter.com/projects/charnick/g...,Film & Video,Documentary,"Los Angeles, CA",successful,6000.0,6535.0,1.089167,"$1,$10,$25,$30,$50,$75,$85,$100,$110,$250,$500...",4,0,32.22,Sun,7,Apr,4,2012,02:14:00 -0000
4,246101,The Launch of FlyeGrlRoyalty &quot;The New Nam...,http://www.kickstarter.com/projects/flyegrlroy...,Fashion,Fashion,"Novi, MI",failed,3500.0,0.0,0.000000,"$10,$25,$50,$100,$150,$250",2,0,30.00,Wed,3,Jun,6,2011,15:25:39 -0000
5,316217,Dinner Party - a short film about friendship.....,http://www.kickstarter.com/projects/249354515/...,Film & Video,Short Film,"Portland, OR",successful,3500.0,3582.0,1.023331,"$5,$25,$50,$100,$250,$500,$1,000",8,0,21.43,Wed,3,Jun,6,2011,13:33:00 -0000
6,325034,Mezzo,http://www.kickstarter.com/projects/geoffsaysh...,Film & Video,Short Film,"Collegedale, TN",failed,1000.0,280.0,0.280000,"$5,$10,$25,$50,$100",0,0,30.00,Sat,6,Feb,2,2012,02:17:08 -0000
7,407836,Help APORTA continue to make handwoven/knit ac...,http://www.kickstarter.com/projects/1078097864...,Fashion,Fashion,"Chicago, IL",successful,2000.0,2180.0,1.090000,"$10,$20,$50,$100,$250,$500,$1,000",13,5,30.00,Fri,5,Dec,12,2011,04:36:53 -0000
8,436325,Music - Comedy - Album!,http://www.kickstarter.com/projects/mattgriffo...,Music,Music,"Chicago, IL",successful,1000.0,1125.0,1.125000,"$5,$8,$10,$15,$20,$30,$50,$100,$120,$250,$500,...",10,1,67.53,Sun,7,Apr,4,2010,04:59:00 -0000
9,610918,The Apocalypse Calendar,http://www.kickstarter.com/projects/tqvinn/the...,Art,Illustration,"Chicago, IL",successful,7500.0,9836.0,1.311527,"$1,$20,$35,$50,$60,$100,$110,$500,$1,000,$1,500",6,5,35.29,Tue,2,Nov,11,2011,04:59:00 -0000


### name column: Turn all letters to lower case

In [13]:
data['name'] = data['name'].str.lower()
data.head(3)

,project id,name,url,category,subcategory,location,status,goal,pledged,funded percentage,backers,funded date,levels,reward levels,updates,comments,duration
0,39409,while the trees sleep,http://www.kickstarter.com/projects/emiliesaba...,Film & Video,Short Film,"Columbia, MO",successful,10500.0,11545.0,1.099524,66,"Fri, 19 Aug 2011 19:28:17 -0000",7,"$25,$50,$100,$250,$500,$1,000,$2,500",10,2,30.0
1,126581,educational online trading card game,http://www.kickstarter.com/projects/972789543/...,Games,Board & Card Games,"Maplewood, NJ",failed,4000.0,20.0,0.005,2,"Mon, 02 Aug 2010 03:59:00 -0000",5,"$1,$5,$10,$25,$50",6,0,47.18
2,138119,strum,http://www.kickstarter.com/projects/185476022/...,Film & Video,Animation,"Los Angeles, CA",live,20000.0,56.0,0.0028,3,"Fri, 08 Jun 2012 00:00:31 -0000",10,"$1,$10,$25,$40,$50,$100,$250,$1,000,$1,337,$9,001",1,0,28.0


## Were the initial findings themselves correct?

#### Best length for a campaign?
Goal of more than $6,000: 35 days

Goal of less than $6,000: a shorter duration is better

#### Different pledge goals at low and high amounts
In both intervals, a lower pledge goal seems to be more successful (no surprise).
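A minimal sketch of how success rates by goal and campaign length could be computed. It assumes success means `status == "successful"` and that the $6,000 split refers to the goal; the duration bins and sample rows are hypothetical, since the notebook does not show the bins it used. Live projects count as unsuccessful here, so dropping them first is likely more accurate.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; on the real data use the full `data` DataFrame.
df = pd.DataFrame({
    "goal":     [1000, 3000, 5000, 8000, 12000, 20000, 2500, 7000],
    "duration": [20.0, 30.0, 45.0, 35.0, 60.0, 35.0, 15.0, 30.0],
    "status":   ["successful", "successful", "failed", "successful",
                 "failed", "failed", "successful", "failed"],
})

# 1 if the project succeeded, else 0.
df["success"] = (df["status"] == "successful").astype(int)

# Split at the $6,000 goal threshold used in the findings.
df["goal_band"] = np.where(df["goal"] > 6000, "over $6,000", "up to $6,000")

# Bucket campaign length (days); bins are right-inclusive, edges illustrative.
df["duration_bin"] = pd.cut(df["duration"], bins=[0, 30, 40, 60, 100])

# Success rate for each observed (goal band, duration range) combination.
rates = (df.groupby(["goal_band", "duration_bin"], observed=True)["success"]
           .mean()
           .rename("success_rate"))
print(rates)
```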

#### Some types of campaign are more successful than others
<u>Category</u>

Dance, theater, music (most successful)

Fashion, publishing, technology (least successful; fashion is by far the least)


<u>Subcategories of the Music category</u>

Indie rock, country & folk, jazz, classical music (most successful)

Hip hop, electronic music, world music (least successful; hip hop is by far the least)
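The category findings come down to comparing the share of successful projects per group. A minimal sketch with hypothetical rows, assuming success means `status == "successful"`. The number of projects is kept next to each rate because a small subcategory can look very good or very bad by chance.

```python
import pandas as pd

# Hypothetical rows; replace with the real `data` DataFrame.
df = pd.DataFrame({
    "category":    ["Dance", "Dance", "Music", "Music", "Music", "Fashion", "Fashion"],
    "subcategory": ["Dance", "Dance", "Jazz", "Hip-Hop", "Jazz", "Fashion", "Fashion"],
    "status":      ["successful", "successful", "successful", "failed",
                    "successful", "failed", "successful"],
})
df["success"] = (df["status"] == "successful").astype(int)

# Success rate and number of projects per category, best first.
by_category = (df.groupby("category")["success"]
                 .agg(success_rate="mean", n_projects="count")
                 .sort_values("success_rate", ascending=False))
print(by_category)

# Same idea for the subcategories within one category (here: Music).
music = df[df["category"] == "Music"]
by_sub = music.groupby("subcategory")["success"].mean().sort_values(ascending=False)
print(by_sub)
```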


#### When to launch?

Month: February

Day: Monday

Time: 9:00 am (EST)

(The variables were studied individually, so a Monday in February at 9:00 am is not necessarily the best combination.)
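The dataset has a `funded date` but no launch-date column, and the notebook does not show how launch timing was derived. A minimal sketch, assuming `funded date` is the campaign's end (in UTC) and `duration` is its length in days, so that launch = end minus duration; both assumptions should be checked against the data source. It converts to New York time (EST or EDT depending on the date) to compare with the "9:00 am (EST)" finding. Grouping a success flag by `launch_weekday`, `launch_month` or `launch_hour` would then give per-variable rates.

```python
import pandas as pd

# Hypothetical rows copied from the first two projects shown earlier.
df = pd.DataFrame({
    "funded date": ["Fri, 19 Aug 2011 19:28:17 -0000", "Mon, 02 Aug 2010 03:59:00 -0000"],
    "duration": [30.0, 47.18],
})

# ASSUMPTION: "funded date" is the deadline (UTC) and "duration" is in days.
deadline = pd.to_datetime(df["funded date"], format="%a, %d %b %Y %H:%M:%S %z", utc=True)
launch_utc = deadline - pd.to_timedelta(df["duration"], unit="D")

# Convert to US Eastern time; daylight saving time is handled by the time zone.
launch_et = launch_utc.dt.tz_convert("America/New_York")
df["launch_weekday"] = launch_et.dt.day_name()
df["launch_month"] = launch_et.dt.month_name()
df["launch_hour"] = launch_et.dt.hour
print(df[["launch_weekday", "launch_month", "launch_hour"]])
```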

#### More comments, more success

Double the success rate with 1-2 comments

Triple the success rate with 10 comments
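A sketch of how the comment effect could be expressed as a multiple of the success rate for projects with no comments, assuming that is the baseline behind "double" and "triple". The bins (0, 1-2, 3-9, 10+) and the rows are hypothetical. The result shows an association, not that comments cause success.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; replace with the real `data` DataFrame.
df = pd.DataFrame({
    "comments": [0, 0, 0, 0, 1, 2, 2, 12, 15, 10],
    "status": ["failed", "failed", "successful", "failed", "successful",
               "successful", "failed", "successful", "successful", "successful"],
})
df["success"] = (df["status"] == "successful").astype(int)

# Comment-count buckets: (-1, 0] -> "0", (0, 2] -> "1-2", (2, 9] -> "3-9", (9, inf] -> "10+".
df["comment_bin"] = pd.cut(df["comments"], bins=[-1, 0, 2, 9, np.inf],
                           labels=["0", "1-2", "3-9", "10+"])
rates = df.groupby("comment_bin", observed=True)["success"].mean()

# Each bucket's success rate as a multiple of the 0-comment rate.
lift = rates / rates.loc["0"]
print(lift)
```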