## Facebook Recruiting IV: Human or Robot?

train.csv

* bidder_id – Unique identifier of a bidder.
* payment_account – Payment account associated with a bidder. These are obfuscated to protect privacy. 
* address – Mailing address of a bidder. These are obfuscated to protect privacy. 
* outcome – Label of a bidder indicating whether or not it is a robot. A value of 1.0 indicates a robot; 0.0 indicates a human. 
    
    Half of the outcomes were hand-labeled and half were stats-based. There are two types of "bots" with different levels of proof:

    1. Bidders who are identified as bots/fraudulent with clear proof. Their accounts were banned by the auction site.

    2. Bidders who may have just started their business/clicks, or whose stats exceed the system-wide average. There is no clear proof that they are bots. 

bids.csv

* bid_id – Unique identifier of this bid
* bidder_id – Unique identifier of a bidder (same as the bidder_id used in train.csv and test.csv)
* auction – Unique identifier of an auction
* merchandise – The category of the auction site campaign that brought the bidder to the site. For example, a bidder who arrived by searching for "home goods" but ended up bidding on "sporting goods" has "home goods" in this field. This categorical field could be a search term or an online advertisement. 
* device – Phone model of a visitor
* time – Time that the bid is made (transformed to protect privacy).
* country – The country that the IP belongs to
* ip – IP address of a bidder (obfuscated to protect privacy).
* url – URL where the bidder was referred from (obfuscated to protect privacy). 

### Evaluation

area under the ROC curve
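
The area under the ROC curve can be read as the probability that a randomly chosen robot gets a higher score than a randomly chosen human. Below is a minimal pure-Python sketch of that definition; the function name `pairwise_auc` and the toy labels and scores are illustrative assumptions, not part of the competition code.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = robot, 0 = human
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(toy_labels, toy_scores)
```

Because only the ranking of the scores matters, the submitted probabilities do not need to be calibrated. The quadratic loop is for illustration only and would be slow on large inputs.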

### Submission format

Submit the id and the predicted probability.

bidder_id,prediction
<br>38d9e2e83f25229bd75bfcdc39d776bajysie,0.3
<br>9744d8ea513490911a671959c4a530d8mp2qm,0.0
<br>dda14384d59bf0b3cb883a7065311dac3toxe,0.9
<br>...
<br>etc
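
A file in this format could be written with pandas roughly as follows. This is a sketch: the prediction values are made up, and the output goes to an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

# Hypothetical predictions; the ids are the ones from the sample above.
submission = pd.DataFrame({
    'bidder_id': ['38d9e2e83f25229bd75bfcdc39d776bajysie',
                  '9744d8ea513490911a671959c4a530d8mp2qm'],
    'prediction': [0.3, 0.0],
})

# index=False keeps the row index out of the file, so the header is exactly
# "bidder_id,prediction".
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```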

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
%matplotlib inline

In [2]:
train = pd.read_csv('data/train.csv')
test = pd.read_csv('data/test.csv')
sampleSubmission = pd.read_csv('data/sampleSubmission.csv')
bids = pd.read_csv('data/bids.csv')

In [3]:
print(train.shape)
train.head()

(2013, 4)


Unnamed: 0,bidder_id,payment_account,address,outcome
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0


In [4]:
print(test.shape)
test.head()

(4700, 3)


Unnamed: 0,bidder_id,payment_account,address
0,49bb5a3c944b8fc337981cc7a9ccae41u31d7,a3d2de7675556553a5f08e4c88d2c228htx90,5d9fa1b71f992e7c7a106ce4b07a0a754le7c
1,a921612b85a1494456e74c09393ccb65ylp4y,a3d2de7675556553a5f08e4c88d2c228rs17i,a3d2de7675556553a5f08e4c88d2c228klidn
2,6b601e72a4d264dab9ace9d7b229b47479v6i,925381cce086b8cc9594eee1c77edf665zjpl,a3d2de7675556553a5f08e4c88d2c228aght0
3,eaf0ed0afc9689779417274b4791726cn5udi,a3d2de7675556553a5f08e4c88d2c228nclv5,b5714de1fd69d4a0d2e39d59e53fe9e15vwat
4,cdecd8d02ed8c6037e38042c7745f688mx5sf,a3d2de7675556553a5f08e4c88d2c228dtdkd,c3b363a3c3b838d58c85acf0fc9964cb4pnfa


In [5]:
print(bids.shape)
bids.head()

(7656334, 9)


Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3


## Concat train + test

In [6]:
all_data = pd.concat([train, test], sort=False)

In [7]:
print(all_data.shape)
all_data.head()

(6713, 4)


Unnamed: 0,bidder_id,payment_account,address,outcome
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0
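
Since test.csv has no `outcome` column, the test rows get `NaN` there after the concat, and that is enough to split the combined frame again later. A small sketch on toy frames (the `toy_*` names are hypothetical, and `ignore_index=True` is an addition the notebook does not use):

```python
import pandas as pd

toy_train = pd.DataFrame({'bidder_id': ['a', 'b'], 'outcome': [0.0, 1.0]})
toy_test = pd.DataFrame({'bidder_id': ['c', 'd', 'e']})

# ignore_index=True avoids duplicate index labels in the combined frame.
toy_all = pd.concat([toy_train, toy_test], sort=False, ignore_index=True)

# Rows with a label came from train; rows without came from test.
train_part = toy_all[toy_all['outcome'].notna()]
test_part = toy_all[toy_all['outcome'].isna()].drop(columns='outcome')
```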


## bidder_id count from bid table

In [8]:
bidder_id_cnt = bids.groupby('bidder_id')['bid_id'].count().reset_index().rename(columns={'bid_id' : 'count'})
print(bidder_id_cnt.shape)
bidder_id_cnt

(6614, 2)


Unnamed: 0,bidder_id,count
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,2
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,20
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,25075
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,22
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,664
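
`count()` counts the non-null `bid_id` values per bidder, while `size()` counts rows. The two agree whenever `bid_id` has no missing values, which is assumed (not verified) for bids.csv. A toy comparison with made-up data:

```python
import pandas as pd

toy_bids = pd.DataFrame({
    'bid_id': [0, 1, 2, 3],
    'bidder_id': ['a', 'a', 'b', 'a'],
})

by_count = toy_bids.groupby('bidder_id')['bid_id'].count()  # non-null bid_id values
by_size = toy_bids.groupby('bidder_id').size()              # rows per bidder
```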


In [9]:
merged = pd.merge(all_data, bidder_id_cnt, how='left')
print(merged.shape)
merged.head()

(6713, 5)


Unnamed: 0,bidder_id,payment_account,address,outcome,count
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0
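
The bids table contains 6614 distinct bidders while `all_data` has 6713 rows, so (assuming `bidder_id` is unique in `all_data`) at least 99 bidders have no bids. With `how='left'` they keep their row and get `NaN` in `count`, which is also why the column is shown as float. A sketch of that behavior on toy frames; filling the gaps with 0 is an assumption here, not something the notebook does:

```python
import pandas as pd

toy_bidders = pd.DataFrame({'bidder_id': ['a', 'b', 'c']})
toy_counts = pd.DataFrame({'bidder_id': ['a', 'b'], 'count': [24, 3]})

# 'c' has no bids, so the left merge leaves its count as NaN.
toy_merged = pd.merge(toy_bidders, toy_counts, on='bidder_id', how='left')
n_missing = int(toy_merged['count'].isna().sum())

# Optional: treat "no bids" as a count of zero and restore the integer dtype.
toy_merged['count'] = toy_merged['count'].fillna(0).astype(int)
```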


## bidder_id auction nunique count from bid table

In [10]:
bidder_id_auction_nunique = bids.groupby('bidder_id')['auction'].nunique().reset_index().rename(columns={'auction': 'auction_nunique'})
bidder_id_auction_nunique.head()

Unnamed: 0,bidder_id,auction_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,13


In [11]:
merged = pd.merge(merged, bidder_id_auction_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 6)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0


In [12]:
bids.head()

Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3


## device nunique

In [14]:
bidder_id_device_nunique = bids.groupby('bidder_id')['device'].nunique().reset_index().rename(columns={'device': 'device_nunique'})
bidder_id_device_nunique

Unnamed: 0,bidder_id,device_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,2
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,8
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,792
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,13
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,96


In [15]:
merged = pd.merge(merged, bidder_id_device_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 7)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0


## ip nunique

In [16]:
bidder_id_ip_nunique = bids.groupby('bidder_id')['ip'].nunique().reset_index().rename(columns={'ip': 'ip_nunique'})
bidder_id_ip_nunique

Unnamed: 0,bidder_id,ip_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,10
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,18726
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,18
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,37


In [17]:
merged = pd.merge(merged, bidder_id_ip_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 8)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0


## country nunique

In [18]:
bidder_id_country_nunique = bids.groupby('bidder_id')['country'].nunique().reset_index().rename(columns={'country': 'country_nunique'})
bidder_id_country_nunique

Unnamed: 0,bidder_id,country_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,1
4,00486a11dff552c4bd7696265724ff81yeo9v,1
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,102
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,6
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,1


In [19]:
merged = pd.merge(merged, bidder_id_country_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 9)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0


In [20]:
bids.head()

Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3


## time nunique

In [21]:
bidder_id_time_nunique = bids.groupby('bidder_id')['time'].nunique().reset_index().rename(columns={'time': 'time_nunique'})
bidder_id_time_nunique

Unnamed: 0,bidder_id,time_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,2
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,20
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,23487
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,22
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,664


In [22]:
merged = pd.merge(merged, bidder_id_time_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 10)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0


## url nunique

In [23]:
bidder_id_url_nunique = bids.groupby('bidder_id')['url'].nunique().reset_index().rename(columns={'url': 'url_nunique'})
bidder_id_url_nunique

Unnamed: 0,bidder_id,url_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,2
4,00486a11dff552c4bd7696265724ff81yeo9v,7
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,8039
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,12
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,144


In [24]:
merged = pd.merge(merged, bidder_id_url_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 11)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0
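
The repeated groupby / rename / merge steps for the nunique features could also be written as one grouped `nunique` over several columns. A sketch on toy data (the `toy_bids` frame and the `feature_cols` list are illustrative):

```python
import pandas as pd

toy_bids = pd.DataFrame({
    'bidder_id': ['a', 'a', 'a', 'b'],
    'auction': ['x', 'x', 'y', 'x'],
    'device': ['phone0', 'phone1', 'phone1', 'phone0'],
    'ip': ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.1'],
})

feature_cols = ['auction', 'device', 'ip']

# One pass over the groups; add_suffix produces names like auction_nunique.
toy_features = (
    toy_bids.groupby('bidder_id')[feature_cols]
    .nunique()
    .add_suffix('_nunique')
    .reset_index()
)
```

The resulting frame can be merged onto the bidder table once, instead of once per feature.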


## merchandise nunique (not useful)

In [25]:
bidder_id_merchandise_nunique = bids.groupby('bidder_id')['merchandise'].nunique().reset_index().rename(columns={'merchandise': 'merchandise_nunique'})
bidder_id_merchandise_nunique

Unnamed: 0,bidder_id,merchandise_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,1
4,00486a11dff552c4bd7696265724ff81yeo9v,1
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,1
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,1
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,1


In [26]:
bids.head()

Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3


## time converting

In [27]:
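# `time` is an obfuscated integer; pd.to_datetime reads integers as nanoseconds since the
# Unix epoch, so the derived dates are not real calendar dates.
bids['time_clean'] = pd.to_datetime(bids['time'])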
bids['year'] = bids['time_clean'].dt.year
bids['month'] = bids['time_clean'].dt.month
bids['day'] = bids['time_clean'].dt.day
bids['hour'] = bids['time_clean'].dt.hour
bids['minute'] = bids['time_clean'].dt.minute
bids['second'] = bids['time_clean'].dt.second
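
To see what the conversion does with the obfuscated values, the sketch below converts the single `time` value shown in `bids.head()`. With no `unit` argument, pandas treats integers as nanoseconds since 1970-01-01, so the parts extracted afterwards are relative positions on that artificial clock rather than real dates.

```python
import pandas as pd

raw_time = pd.Series([9759243157894736])  # value taken from bids.head()

as_default = pd.to_datetime(raw_time)            # integers default to nanoseconds
as_explicit = pd.to_datetime(raw_time, unit='ns')

first = as_default[0]
parts = (first.year, first.month, first.day, first.hour, first.minute, first.second)
```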

## add second_sum

In [28]:
bidder_id_second_sum = bids.groupby('bidder_id')['second'].sum().reset_index().rename(columns={'second': 'second_sum'})
bidder_id_second_sum

Unnamed: 0,bidder_id,second_sum
0,001068c415025a009fee375a12cff4fcnht8y,25
1,002d229ffb247009810828f648afc2ef593rb,8
2,0030a2dd87ad2733e0873062e4f83954mkj86,33
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,117
4,00486a11dff552c4bd7696265724ff81yeo9v,606
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,741189
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,556
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,19
6612,ffd62646d600b759a985d45918bd6f0431vmz,19790


In [29]:
merged = pd.merge(merged, bidder_id_second_sum, how='left')
print(merged.shape)
merged.head()

(6713, 12)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0


In [30]:
# day: 3, hour: OK, minute: OK, second: OK

## add minute_sum

In [31]:
bidder_id_minute_sum = bids.groupby('bidder_id')['minute'].sum().reset_index().rename(columns={'minute': 'minute_sum'})
bidder_id_minute_sum

Unnamed: 0,bidder_id,minute_sum
0,001068c415025a009fee375a12cff4fcnht8y,12
1,002d229ffb247009810828f648afc2ef593rb,118
2,0030a2dd87ad2733e0873062e4f83954mkj86,42
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,88
4,00486a11dff552c4bd7696265724ff81yeo9v,625
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,713727
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,744
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,26
6612,ffd62646d600b759a985d45918bd6f0431vmz,18226


In [32]:
merged = pd.merge(merged, bidder_id_minute_sum, how='left')
print(merged.shape)
merged.head()

(6713, 13)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum,minute_sum
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0,680.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0,62.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0,111.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0,5527.0


## add hour_sum

In [33]:
bidder_id_hour_sum = bids.groupby('bidder_id')['hour'].sum().reset_index().rename(columns={'hour': 'hour_sum'})
bidder_id_hour_sum

Unnamed: 0,bidder_id,hour_sum
0,001068c415025a009fee375a12cff4fcnht8y,8
1,002d229ffb247009810828f648afc2ef593rb,0
2,0030a2dd87ad2733e0873062e4f83954mkj86,7
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,27
4,00486a11dff552c4bd7696265724ff81yeo9v,224
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,213334
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,232
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,14
6612,ffd62646d600b759a985d45918bd6f0431vmz,4997


In [34]:
merged = pd.merge(merged, bidder_id_hour_sum, how='left')
print(merged.shape)
merged.head()

(6713, 14)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum,minute_sum,hour_sum
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0,680.0,152.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0,62.0,2.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0,111.0,5.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0,8.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0,5527.0,2884.0


## add second_nunique

In [35]:
bidder_id_second_nunique = bids.groupby('bidder_id')['second'].nunique().reset_index().rename(columns={'second': 'second_nunique'})
bidder_id_second_nunique

Unnamed: 0,bidder_id,second_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,18
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,60
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,20
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,60


In [36]:
merged = pd.merge(merged, bidder_id_second_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 15)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum,minute_sum,hour_sum,second_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0,680.0,152.0,21.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0,62.0,2.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0,111.0,5.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0,8.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0,5527.0,2884.0,58.0


## add minute_nunique

In [37]:
bidder_id_minute_nunique = bids.groupby('bidder_id')['minute'].nunique().reset_index().rename(columns={'minute': 'minute_nunique'})
bidder_id_minute_nunique

Unnamed: 0,bidder_id,minute_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,16
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,60
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,18
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,60


In [38]:
merged = pd.merge(merged, bidder_id_minute_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 16)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum,minute_sum,hour_sum,second_nunique,minute_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0,680.0,152.0,21.0,16.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0,62.0,2.0,3.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0,111.0,5.0,4.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0,8.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0,5527.0,2884.0,58.0,45.0


## add hour_nunique

In [39]:
bidder_id_hour_nunique = bids.groupby('bidder_id')['hour'].nunique().reset_index().rename(columns={'hour': 'hour_nunique'})
bidder_id_hour_nunique

Unnamed: 0,bidder_id,hour_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,8
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,5
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,9
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,5


In [40]:
merged = pd.merge(merged, bidder_id_hour_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 17)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum,minute_sum,hour_sum,second_nunique,minute_nunique,hour_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0,680.0,152.0,21.0,16.0,5.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0,62.0,2.0,3.0,3.0,2.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0,111.0,5.0,4.0,4.0,3.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0,8.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0,5527.0,2884.0,58.0,45.0,5.0


## add day_nunique

In [41]:
bidder_id_day_nunique = bids.groupby('bidder_id')['day'].nunique().reset_index().rename(columns={'day': 'day_nunique'})
bidder_id_day_nunique

Unnamed: 0,bidder_id,day_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,2
4,00486a11dff552c4bd7696265724ff81yeo9v,2
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,2
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,2
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,2


In [42]:
merged = pd.merge(merged, bidder_id_day_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 18)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum,minute_sum,hour_sum,second_nunique,minute_nunique,hour_nunique,day_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0,680.0,152.0,21.0,16.0,5.0,2.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0,62.0,2.0,3.0,3.0,2.0,1.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0,111.0,5.0,4.0,4.0,3.0,1.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0,8.0,1.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0,5527.0,2884.0,58.0,45.0,5.0,2.0
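
Every feature above follows the same pattern: aggregate one column of `bids` per `bidder_id`, rename it, and left-merge it onto `merged`. A small helper can capture that pattern. The function `add_group_feature` and the toy frames below are hypothetical and only sketch the idea:

```python
import pandas as pd


def add_group_feature(features, events, column, func):
    """Aggregate events[column] per bidder_id with func and left-merge it onto features."""
    name = f'{column}_{func}'
    agg = (
        events.groupby('bidder_id')[column]
        .agg(func)
        .reset_index()
        .rename(columns={column: name})
    )
    return pd.merge(features, agg, on='bidder_id', how='left')


toy_bids = pd.DataFrame({
    'bidder_id': ['a', 'a', 'b'],
    'hour': [22, 23, 22],
    'second': [3, 3, 10],
})
toy_merged = pd.DataFrame({'bidder_id': ['a', 'b', 'c']})

# Same kind of features as in the notebook: sums and distinct counts of time parts.
for column, func in [('second', 'sum'), ('hour', 'sum'),
                     ('second', 'nunique'), ('hour', 'nunique')]:
    toy_merged = add_group_feature(toy_merged, toy_bids, column, func)
```

Bidder `c` has no bids in the toy data, so all of its feature columns stay `NaN`, matching what the left merges in the notebook do.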


In [43]:
bids.head()

Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url,time_clean,year,month,day,hour,minute,second
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3


## nanosecond & microsecond

In [44]:
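# Sub-second components of each timestamp:
# .dt.nanosecond is only the last three digits (0-999), .dt.microsecond the 0-999999 part
bids['nanosecond'] = bids['time_clean'].dt.nanosecond
bids['microsecond'] = bids['time_clean'].dt.microsecond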

In [45]:
bidder_id_nanosecond_nunique = bids.groupby('bidder_id')['nanosecond'].nunique().reset_index().rename(columns={'nanosecond': 'nanosecond_nunique'})
bidder_id_nanosecond_nunique

Unnamed: 0,bidder_id,nanosecond_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,2
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,14
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,19
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,12
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,19


In [46]:
merged = pd.merge(merged, bidder_id_nanosecond_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 19)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum,minute_sum,hour_sum,second_nunique,minute_nunique,hour_nunique,day_nunique,nanosecond_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0,680.0,152.0,21.0,16.0,5.0,2.0,15.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0,62.0,2.0,3.0,3.0,2.0,1.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0,111.0,5.0,4.0,4.0,3.0,1.0,3.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0,8.0,1.0,1.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0,5527.0,2884.0,58.0,45.0,5.0,2.0,19.0


In [47]:
bidder_id_nanosecond_sum = bids.groupby('bidder_id')['nanosecond'].sum().reset_index().rename(columns={'nanosecond': 'nanosecond_sum'})
bidder_id_nanosecond_sum

Unnamed: 0,bidder_id,nanosecond_sum
0,001068c415025a009fee375a12cff4fcnht8y,578
1,002d229ffb247009810828f648afc2ef593rb,472
2,0030a2dd87ad2733e0873062e4f83954mkj86,421
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,1998
4,00486a11dff552c4bd7696265724ff81yeo9v,9623
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,11958388
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,9200
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,421
6612,ffd62646d600b759a985d45918bd6f0431vmz,316785


In [48]:
merged = pd.merge(merged, bidder_id_nanosecond_sum, how='left')
print(merged.shape)
merged.head()

(6713, 20)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_sum,minute_sum,hour_sum,second_nunique,minute_nunique,hour_nunique,day_nunique,nanosecond_nunique,nanosecond_sum
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,1.0,636.0,680.0,152.0,21.0,16.0,5.0,2.0,15.0,9619.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,2.0,77.0,62.0,2.0,3.0,3.0,2.0,1.0,3.0,1682.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,2.0,117.0,111.0,5.0,4.0,4.0,3.0,1.0,3.0,1103.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0,8.0,1.0,1.0,1.0,1.0,1.0,315.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,91.0,4613.0,5527.0,2884.0,58.0,45.0,5.0,2.0,19.0,71029.0


In [49]:
bidder_id_nanosecond_mean = bids.groupby('bidder_id')['nanosecond'].mean().reset_index().rename(columns={'nanosecond': 'nanosecond_mean'})
bidder_id_nanosecond_mean

Unnamed: 0,bidder_id,nanosecond_mean
0,001068c415025a009fee375a12cff4fcnht8y,578.000000
1,002d229ffb247009810828f648afc2ef593rb,236.000000
2,0030a2dd87ad2733e0873062e4f83954mkj86,421.000000
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,666.000000
4,00486a11dff552c4bd7696265724ff81yeo9v,481.150000
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,476.904806
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,418.181818
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,421.000000
6612,ffd62646d600b759a985d45918bd6f0431vmz,477.085843


In [50]:
merged = pd.merge(merged, bidder_id_nanosecond_mean, how='left')
print(merged.shape)
merged.head()

(6713, 21)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,second_sum,minute_sum,hour_sum,second_nunique,minute_nunique,hour_nunique,day_nunique,nanosecond_nunique,nanosecond_sum,nanosecond_mean
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,636.0,680.0,152.0,21.0,16.0,5.0,2.0,15.0,9619.0,400.791667
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,77.0,62.0,2.0,3.0,3.0,2.0,1.0,3.0,1682.0,560.666667
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,117.0,111.0,5.0,4.0,4.0,3.0,1.0,3.0,1103.0,275.75
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,2.0,14.0,8.0,1.0,1.0,1.0,1.0,1.0,315.0,315.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,4613.0,5527.0,2884.0,58.0,45.0,5.0,2.0,19.0,71029.0,458.251613


In [51]:
bidder_id_microsecond_sum = bids.groupby('bidder_id')['microsecond'].sum().reset_index().rename(columns={'microsecond': 'microsecond_sum'})
bidder_id_microsecond_sum

Unnamed: 0,bidder_id,microsecond_sum
0,001068c415025a009fee375a12cff4fcnht8y,52631
1,002d229ffb247009810828f648afc2ef593rb,315789
2,0030a2dd87ad2733e0873062e4f83954mkj86,947368
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,999998
4,00486a11dff552c4bd7696265724ff81yeo9v,9421043
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,11939830135
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,10473675
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,947368
6612,ffd62646d600b759a985d45918bd6f0431vmz,303736525


In [52]:
merged = pd.merge(merged, bidder_id_microsecond_sum, how='left')
print(merged.shape)
merged.head()

(6713, 22)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,minute_sum,hour_sum,second_nunique,minute_nunique,hour_nunique,day_nunique,nanosecond_nunique,nanosecond_sum,nanosecond_mean,microsecond_sum
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,680.0,152.0,21.0,16.0,5.0,2.0,15.0,9619.0,400.791667,10421043.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,62.0,2.0,3.0,3.0,2.0,1.0,3.0,1682.0,560.666667,789472.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,111.0,5.0,4.0,4.0,3.0,1.0,3.0,1103.0,275.75,1736841.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,14.0,8.0,1.0,1.0,1.0,1.0,1.0,315.0,315.0,210526.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,5527.0,2884.0,58.0,45.0,5.0,2.0,19.0,71029.0,458.251613,70736771.0


In [53]:
bidder_id_microsecond_mean = bids.groupby('bidder_id')['microsecond'].mean().reset_index().rename(columns={'microsecond': 'microsecond_mean'})
bidder_id_microsecond_mean

Unnamed: 0,bidder_id,microsecond_mean
0,001068c415025a009fee375a12cff4fcnht8y,52631.000000
1,002d229ffb247009810828f648afc2ef593rb,157894.500000
2,0030a2dd87ad2733e0873062e4f83954mkj86,947368.000000
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,333332.666667
4,00486a11dff552c4bd7696265724ff81yeo9v,471052.150000
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,476164.711266
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,476076.136364
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,947368.000000
6612,ffd62646d600b759a985d45918bd6f0431vmz,457434.525602


In [54]:
merged = pd.merge(merged, bidder_id_microsecond_mean, how='left')
print(merged.shape)
merged.head()

(6713, 23)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,hour_sum,second_nunique,minute_nunique,hour_nunique,day_nunique,nanosecond_nunique,nanosecond_sum,nanosecond_mean,microsecond_sum,microsecond_mean
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,152.0,21.0,16.0,5.0,2.0,15.0,9619.0,400.791667,10421043.0,434210.125
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,2.0,3.0,3.0,2.0,1.0,3.0,1682.0,560.666667,789472.0,263157.333333
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,5.0,4.0,4.0,3.0,1.0,3.0,1103.0,275.75,1736841.0,434210.25
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,8.0,1.0,1.0,1.0,1.0,1.0,315.0,315.0,210526.0,210526.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,2884.0,58.0,45.0,5.0,2.0,19.0,71029.0,458.251613,70736771.0,456366.264516


In [55]:
# Named for the column it aggregates (microsecond), so it does not overwrite the nanosecond frame from In [45]
bidder_id_microsecond_nunique = bids.groupby('bidder_id')['microsecond'].nunique().reset_index().rename(columns={'microsecond': 'microsecond_nunique'})
bidder_id_microsecond_nunique

Unnamed: 0,bidder_id,microsecond_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,2
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,14
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,19
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,12
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,19


In [56]:
merged = pd.merge(merged, bidder_id_microsecond_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 24)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,second_nunique,minute_nunique,hour_nunique,day_nunique,nanosecond_nunique,nanosecond_sum,nanosecond_mean,microsecond_sum,microsecond_mean,microsecond_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,21.0,16.0,5.0,2.0,15.0,9619.0,400.791667,10421043.0,434210.125,15.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,3.0,3.0,2.0,1.0,3.0,1682.0,560.666667,789472.0,263157.333333,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,4.0,4.0,3.0,1.0,3.0,1103.0,275.75,1736841.0,434210.25,3.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,315.0,315.0,210526.0,210526.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,58.0,45.0,5.0,2.0,19.0,71029.0,458.251613,70736771.0,456366.264516,19.0


In [57]:
bidder_id_second_mean = bids.groupby('bidder_id')['second'].mean().reset_index().rename(columns={'second': 'second_mean'})
bidder_id_second_mean

Unnamed: 0,bidder_id,second_mean
0,001068c415025a009fee375a12cff4fcnht8y,25.000000
1,002d229ffb247009810828f648afc2ef593rb,4.000000
2,0030a2dd87ad2733e0873062e4f83954mkj86,33.000000
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,39.000000
4,00486a11dff552c4bd7696265724ff81yeo9v,30.300000
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,29.558883
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,25.272727
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,19.000000
6612,ffd62646d600b759a985d45918bd6f0431vmz,29.804217


In [58]:
merged = pd.merge(merged, bidder_id_second_mean, how='left')
print(merged.shape)
merged.head()

(6713, 25)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,minute_nunique,hour_nunique,day_nunique,nanosecond_nunique,nanosecond_sum,nanosecond_mean,microsecond_sum,microsecond_mean,microsecond_nunique,second_mean
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,16.0,5.0,2.0,15.0,9619.0,400.791667,10421043.0,434210.125,15.0,26.5
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,3.0,2.0,1.0,3.0,1682.0,560.666667,789472.0,263157.333333,3.0,25.666667
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,4.0,3.0,1.0,3.0,1103.0,275.75,1736841.0,434210.25,3.0,29.25
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,315.0,315.0,210526.0,210526.0,1.0,2.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,45.0,5.0,2.0,19.0,71029.0,458.251613,70736771.0,456366.264516,19.0,29.76129
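
The sub-second features above are built one groupby and one merge at a time. As a rough sketch only, the same columns could be produced in a single pass with named aggregation. `toy_bids` and `toy_train` are made-up stand-ins for `bids` and the train/test frame, not data from this notebook.

```python
import pandas as pd

# Hypothetical stand-in for `bids`: one row per bid.
toy_bids = pd.DataFrame({
    'bidder_id': ['a', 'a', 'b'],
    'time_clean': pd.to_datetime([
        '1970-04-23 22:54:03.157894736',
        '1970-04-23 22:54:04.052631578',
        '1970-04-23 09:00:22.052631578',
    ]),
})
toy_bids['nanosecond'] = toy_bids['time_clean'].dt.nanosecond
toy_bids['microsecond'] = toy_bids['time_clean'].dt.microsecond

# One groupby, one output column per (source column, aggregation) pair.
subsecond_features = (
    toy_bids.groupby('bidder_id')
    .agg(
        nanosecond_nunique=('nanosecond', 'nunique'),
        nanosecond_sum=('nanosecond', 'sum'),
        nanosecond_mean=('nanosecond', 'mean'),
        microsecond_nunique=('microsecond', 'nunique'),
        microsecond_sum=('microsecond', 'sum'),
        microsecond_mean=('microsecond', 'mean'),
    )
    .reset_index()
)

# Hypothetical stand-in for `merged`; bidder 'c' has no bids and gets NaN features.
toy_train = pd.DataFrame({'bidder_id': ['a', 'b', 'c']})
toy_merged = pd.merge(toy_train, subsecond_features, on='bidder_id', how='left')
print(toy_merged)
```

Passing `on='bidder_id'` makes the join key explicit, so a stray shared column name cannot silently become part of the key.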


In [59]:
bids.head()

Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url,time_clean,year,month,day,hour,minute,second,nanosecond,microsecond
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3,736,157894
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3,736,157894
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3,736,157894
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3,736,157894
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,1970,4,23,22,54,3,736,157894


In [60]:
# Split each dotted IP string into a list of four octet strings
bids['ip_split'] = bids['ip'].str.split('.')

In [61]:
bids['ip_split'].head()

0     [69, 166, 231, 58]
1     [50, 201, 125, 84]
2    [112, 54, 208, 157]
3     [18, 99, 175, 133]
4      [145, 138, 5, 37]
Name: ip_split, dtype: object

In [62]:
# Expand the octet lists into four columns (the values remain strings)
ip_df = pd.DataFrame(bids['ip_split'].values.tolist(), columns=['ip_first', 'ip_second', 'ip_third', 'ip_fourth'])
ip_df

Unnamed: 0,ip_first,ip_second,ip_third,ip_fourth
0,69,166,231,58
1,50,201,125,84
2,112,54,208,157
3,18,99,175,133
4,145,138,5,37
...,...,...,...,...
7656329,140,204,227,63
7656330,24,232,159,118
7656331,80,237,28,246
7656332,91,162,27,152


In [63]:
# Number of distinct values in the last octet
ip_df['ip_fourth'].nunique()

256

In [64]:
bids = pd.concat([bids, ip_df], axis=1, sort=False)

In [65]:
bids.head()

Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url,time_clean,...,hour,minute,second,nanosecond,microsecond,ip_split,ip_first,ip_second,ip_third,ip_fourth
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[69, 166, 231, 58]",69,166,231,58
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[50, 201, 125, 84]",50,201,125,84
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[112, 54, 208, 157]",112,54,208,157
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[18, 99, 175, 133]",18,99,175,133
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[145, 138, 5, 37]",145,138,5,37
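
The octet columns above come from splitting into lists and rebuilding a DataFrame. A possible shortcut, shown here as a sketch on a made-up `toy_bids` frame, is `str.split` with `expand=True`, which returns the four columns directly. The cast to `int` is an added assumption; the notebook keeps the octets as strings.

```python
import pandas as pd

# Hypothetical stand-in for `bids` with only the `ip` column.
toy_bids = pd.DataFrame({'ip': ['69.166.231.58', '50.201.125.84', '145.138.5.37']})

octet_cols = ['ip_first', 'ip_second', 'ip_third', 'ip_fourth']

# expand=True yields one column per octet instead of a column of lists.
octets = toy_bids['ip'].str.split('.', expand=True)
octets.columns = octet_cols
octets = octets.astype(int)  # assumption: numeric octets are wanted downstream

toy_bids = toy_bids.join(octets)
print(toy_bids)
```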


## Split IP: nunique per octet

In [66]:
bidder_id_ip_first_nunique = bids.groupby('bidder_id')['ip_first'].nunique().reset_index().rename(columns={'ip_first': 'ip_first_nunique'})
bidder_id_ip_first_nunique

Unnamed: 0,bidder_id,ip_first_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,10
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,256
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,18
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,36


In [67]:
merged = pd.merge(merged, bidder_id_ip_first_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 26)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,hour_nunique,day_nunique,nanosecond_nunique,nanosecond_sum,nanosecond_mean,microsecond_sum,microsecond_mean,microsecond_nunique,second_mean,ip_first_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,5.0,2.0,15.0,9619.0,400.791667,10421043.0,434210.125,15.0,26.5,19.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,2.0,1.0,3.0,1682.0,560.666667,789472.0,263157.333333,3.0,25.666667,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,3.0,1.0,3.0,1103.0,275.75,1736841.0,434210.25,3.0,29.25,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,315.0,315.0,210526.0,210526.0,1.0,2.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,5.0,2.0,19.0,71029.0,458.251613,70736771.0,456366.264516,19.0,29.76129,93.0


In [68]:
bidder_id_ip_second_nunique = bids.groupby('bidder_id')['ip_second'].nunique().reset_index().rename(columns={'ip_second': 'ip_second_nunique'})
bidder_id_ip_second_nunique

Unnamed: 0,bidder_id,ip_second_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,10
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,256
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,18
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,35


In [69]:
merged = pd.merge(merged, bidder_id_ip_second_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 27)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,day_nunique,nanosecond_nunique,nanosecond_sum,nanosecond_mean,microsecond_sum,microsecond_mean,microsecond_nunique,second_mean,ip_first_nunique,ip_second_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,2.0,15.0,9619.0,400.791667,10421043.0,434210.125,15.0,26.5,19.0,20.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,1.0,3.0,1682.0,560.666667,789472.0,263157.333333,3.0,25.666667,3.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,1.0,3.0,1103.0,275.75,1736841.0,434210.25,3.0,29.25,4.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,315.0,315.0,210526.0,210526.0,1.0,2.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,2.0,19.0,71029.0,458.251613,70736771.0,456366.264516,19.0,29.76129,93.0,101.0


In [70]:
bidder_id_ip_third_nunique = bids.groupby('bidder_id')['ip_third'].nunique().reset_index().rename(columns={'ip_third': 'ip_third_nunique'})
bidder_id_ip_third_nunique

Unnamed: 0,bidder_id,ip_third_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,9
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,256
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,18
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,37


In [71]:
merged = pd.merge(merged, bidder_id_ip_third_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 28)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,nanosecond_nunique,nanosecond_sum,nanosecond_mean,microsecond_sum,microsecond_mean,microsecond_nunique,second_mean,ip_first_nunique,ip_second_nunique,ip_third_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,15.0,9619.0,400.791667,10421043.0,434210.125,15.0,26.5,19.0,20.0,20.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,3.0,1682.0,560.666667,789472.0,263157.333333,3.0,25.666667,3.0,3.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,3.0,1103.0,275.75,1736841.0,434210.25,3.0,29.25,4.0,4.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,315.0,315.0,210526.0,210526.0,1.0,2.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,19.0,71029.0,458.251613,70736771.0,456366.264516,19.0,29.76129,93.0,101.0,99.0


In [72]:
bidder_id_ip_fourth_nunique = bids.groupby('bidder_id')['ip_fourth'].nunique().reset_index().rename(columns={'ip_fourth': 'ip_fourth_nunique'})
bidder_id_ip_fourth_nunique

Unnamed: 0,bidder_id,ip_fourth_nunique
0,001068c415025a009fee375a12cff4fcnht8y,1
1,002d229ffb247009810828f648afc2ef593rb,1
2,0030a2dd87ad2733e0873062e4f83954mkj86,1
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,3
4,00486a11dff552c4bd7696265724ff81yeo9v,10
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,256
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,17
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,1
6612,ffd62646d600b759a985d45918bd6f0431vmz,35


In [73]:
merged = pd.merge(merged, bidder_id_ip_fourth_nunique, how='left')
print(merged.shape)
merged.head()

(6713, 29)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,nanosecond_sum,nanosecond_mean,microsecond_sum,microsecond_mean,microsecond_nunique,second_mean,ip_first_nunique,ip_second_nunique,ip_third_nunique,ip_fourth_nunique
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,9619.0,400.791667,10421043.0,434210.125,15.0,26.5,19.0,20.0,20.0,19.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,1682.0,560.666667,789472.0,263157.333333,3.0,25.666667,3.0,3.0,3.0,3.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,1103.0,275.75,1736841.0,434210.25,3.0,29.25,4.0,4.0,4.0,4.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,315.0,315.0,210526.0,210526.0,1.0,2.0,1.0,1.0,1.0,1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,71029.0,458.251613,70736771.0,456366.264516,19.0,29.76129,93.0,101.0,99.0,97.0


In [74]:
merged['ip_nunique_mean'] = merged[['ip_first_nunique', 'ip_second_nunique', 'ip_third_nunique', 'ip_fourth_nunique']].mean(axis=1)
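
The four per-octet cells and the row-wise mean could also be written as one groupby over all octet columns. The following is a sketch under that assumption, using a made-up `toy_bids` frame in place of `bids`.

```python
import pandas as pd

# Hypothetical stand-in for `bids` after the IP split (octets kept as strings).
toy_bids = pd.DataFrame({
    'bidder_id': ['a', 'a', 'a', 'b'],
    'ip_first': ['69', '69', '69', '18'],
    'ip_second': ['166', '201', '201', '99'],
    'ip_third': ['231', '125', '125', '175'],
    'ip_fourth': ['58', '84', '37', '133'],
})

octet_cols = ['ip_first', 'ip_second', 'ip_third', 'ip_fourth']

# Distinct values per octet for every bidder, in one call.
octet_nunique = (
    toy_bids.groupby('bidder_id')[octet_cols]
    .nunique()
    .add_suffix('_nunique')
)

# Row-wise average of the four counts, computed before the new column exists.
octet_nunique['ip_nunique_mean'] = octet_nunique.mean(axis=1)
octet_nunique = octet_nunique.reset_index()
print(octet_nunique)
```

The resulting frame can then be joined to the bidder table with a single left merge on `bidder_id`.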

In [75]:
bids

Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url,time_clean,...,hour,minute,second,nanosecond,microsecond,ip_split,ip_first,ip_second,ip_third,ip_fourth
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[69, 166, 231, 58]",69,166,231,58
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[50, 201, 125, 84]",50,201,125,84
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[112, 54, 208, 157]",112,54,208,157
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[18, 99, 175, 133]",18,99,175,133
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,22,54,3,736,157894,"[145, 138, 5, 37]",145,138,5,37
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7656329,7656329,626159dd6f2228ede002d9f9340f75b7puk8d,3e64w,jewelry,phone91,9709222052631578,ru,140.204.227.63,cghhmomsaxi6pug,1970-04-23 09:00:22.052631578,...,9,0,22,578,52631,"[140, 204, 227, 63]",140,204,227,63
7656330,7656330,a318ea333ceee1ba39a494476386136a826dv,xn0y0,mobile,phone236,9709222052631578,pl,24.232.159.118,wgggpdg2gx5pesn,1970-04-23 09:00:22.052631578,...,9,0,22,578,52631,"[24, 232, 159, 118]",24,232,159,118
7656331,7656331,f5b2bbad20d1d7ded3ed960393bec0f40u6hn,gja6c,sporting goods,phone80,9709222052631578,za,80.237.28.246,5xgysg14grlersa,1970-04-23 09:00:22.052631578,...,9,0,22,578,52631,"[80, 237, 28, 246]",80,237,28,246
7656332,7656332,d4bd412590f5106b9d887a43c51b254eldo4f,hmwk8,jewelry,phone349,9709222052631578,my,91.162.27.152,bhtrek44bzi2wfl,1970-04-23 09:00:22.052631578,...,9,0,22,578,52631,"[91, 162, 27, 152]",91,162,27,152


In [76]:
bids.sort_values(['time']).groupby(by='bidder_id')['time'].diff()

2351187            NaN
2351202            NaN
2351201            NaN
2351200            NaN
2351199            NaN
              ...     
2351182    263157894.0
2351183    421052631.0
2351184            0.0
2351185    526315789.0
2351186            0.0
Name: time, Length: 7656334, dtype: float64

In [77]:
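# Caution: this diffs in the current row order of bids; unlike In [76] there is no sort by time first,
# so the values are true inter-bid gaps only if each bidder's rows are already chronological.
# fillna(0) also counts every bidder's first bid as a zero gap, which lowers the mean.
bids['time_diff'] = bids.groupby('bidder_id')['time'].diff().fillna(0)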

In [78]:
bidder_id_time_diff_mean = bids.groupby('bidder_id')['time_diff'].mean().reset_index().rename(columns={'time_diff': 'time_diff_mean'})
bidder_id_time_diff_mean

Unnamed: 0,bidder_id,time_diff_mean
0,001068c415025a009fee375a12cff4fcnht8y,0.000000e+00
1,002d229ffb247009810828f648afc2ef593rb,5.263158e+07
2,0030a2dd87ad2733e0873062e4f83954mkj86,0.000000e+00
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,2.198523e+13
4,00486a11dff552c4bd7696265724ff81yeo9v,3.817492e+12
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,5.439912e+08
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,3.432876e+12
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,0.000000e+00
6612,ffd62646d600b759a985d45918bd6f0431vmz,2.051855e+10


In [79]:
merged = pd.merge(merged, bidder_id_time_diff_mean, how='left')
print(merged.shape)
merged.head()

(6713, 31)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,microsecond_sum,microsecond_mean,microsecond_nunique,second_mean,ip_first_nunique,ip_second_nunique,ip_third_nunique,ip_fourth_nunique,ip_nunique_mean,time_diff_mean
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,10421043.0,434210.125,15.0,26.5,19.0,20.0,20.0,19.0,19.5,547315800000.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,789472.0,263157.333333,3.0,25.666667,3.0,3.0,3.0,3.0,3.0,2155719000000.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,1736841.0,434210.25,3.0,29.25,4.0,4.0,4.0,4.0,4.0,1784250000000.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,210526.0,210526.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,70736771.0,456366.264516,19.0,29.76129,93.0,101.0,99.0,97.0,97.5,77277080000.0


In [80]:
bidder_id_time_diff_max = bids.groupby('bidder_id')['time_diff'].max().reset_index().rename(columns={'time_diff': 'time_diff_max'})
bidder_id_time_diff_max

Unnamed: 0,bidder_id,time_diff_max
0,001068c415025a009fee375a12cff4fcnht8y,0.000000e+00
1,002d229ffb247009810828f648afc2ef593rb,1.052632e+08
2,0030a2dd87ad2733e0873062e4f83954mkj86,0.000000e+00
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,6.058642e+13
4,00486a11dff552c4bd7696265724ff81yeo9v,5.094174e+13
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,8.842105e+09
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,5.082974e+13
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,0.000000e+00
6612,ffd62646d600b759a985d45918bd6f0431vmz,2.590000e+11


In [81]:
merged = pd.merge(merged, bidder_id_time_diff_max, on='bidder_id', how='left')
print(merged.shape)
merged.head()

(6713, 32)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,microsecond_mean,microsecond_nunique,second_mean,ip_first_nunique,ip_second_nunique,ip_third_nunique,ip_fourth_nunique,ip_nunique_mean,time_diff_mean,time_diff_max
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,434210.125,15.0,26.5,19.0,20.0,20.0,19.0,19.5,547315800000.0,3167632000000.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,263157.333333,3.0,25.666667,3.0,3.0,3.0,3.0,3.0,2155719000000.0,4477842000000.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,434210.25,3.0,29.25,4.0,4.0,4.0,4.0,4.0,1784250000000.0,3154105000000.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,210526.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,456366.264516,19.0,29.76129,93.0,101.0,99.0,97.0,97.5,77277080000.0,1619211000000.0


In [82]:
bidder_id_time_diff_max['time_diff_max'].value_counts()

0.000000e+00    1058
5.002195e+13       6
1.052632e+08       4
5.002600e+13       4
5.002132e+13       4
                ... 
5.005758e+13       1
1.009853e+13       1
2.428000e+12       1
1.952737e+12       1
5.286705e+13       1
Name: time_diff_max, Length: 5380, dtype: int64

In [83]:
bids.groupby('bidder_id')['second'].diff()

0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
          ... 
7656329    1.0
7656330    4.0
7656331    0.0
7656332    1.0
7656333    1.0
Name: second, Length: 7656334, dtype: float64

In [84]:
bids['time_second_diff'] = bids.groupby('bidder_id')['second'].diff().fillna(-1)

In [85]:
diff = bids.groupby('bidder_id')['time_second_diff'].max().reset_index().rename(columns={'time_second_diff': 'time_second_diff_max'})
diff

Unnamed: 0,bidder_id,time_second_diff_max
0,001068c415025a009fee375a12cff4fcnht8y,-1.0
1,002d229ffb247009810828f648afc2ef593rb,0.0
2,0030a2dd87ad2733e0873062e4f83954mkj86,-1.0
3,003180b29c6a5f8f1d84a6b7b6f7be57tjj1o,-1.0
4,00486a11dff552c4bd7696265724ff81yeo9v,41.0
...,...,...
6609,ffbc0fdfbf19a8a9116b68714138f2902cc13,8.0
6610,ffc4e2dd2cc08249f299cab46ecbfacfobmr3,41.0
6611,ffd29eb307a4c54610dd2d3d212bf3bagmmpl,-1.0
6612,ffd62646d600b759a985d45918bd6f0431vmz,52.0
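
`time_second_diff` differences only the `second` component of each timestamp, so it can go negative when the minute rolls over, and the `-1` fill for a bidder's first bid is indistinguishable from a genuine difference of -1. It also depends on `bids` already being in chronological order. The following is a minimal sketch of one alternative, not the notebook's method: on a toy frame with hypothetical values, sort explicitly, difference the full `time` column, and leave first-bid gaps as NaN so they do not enter the aggregates. The names `toy_bids`, `gap` and `gap_stats` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for bids (hypothetical values)
toy_bids = pd.DataFrame({
    'bidder_id': ['a', 'a', 'a', 'b', 'b', 'c'],
    'time':      [100, 130, 190, 50, 55, 10],
})

# Sort so that diff() compares consecutive bids from the same bidder
toy_bids = toy_bids.sort_values(['bidder_id', 'time'])

# NaN marks each bidder's first bid instead of a sentinel value
toy_bids['gap'] = toy_bids.groupby('bidder_id')['time'].diff()

# Aggregations skip NaN; a bidder with a single bid ends up with NaN stats
gap_stats = (toy_bids.groupby('bidder_id')['gap']
             .agg(['min', 'mean', 'max'])
             .add_prefix('gap_')
             .reset_index())
print(gap_stats)
```

Single-bid bidders can then be filled with a sentinel in one place, at the same step where the other features are filled.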


In [86]:
merged = pd.merge(merged, diff, on='bidder_id', how='left')
print(merged.shape)
merged.head()

(6713, 33)


Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,microsecond_nunique,second_mean,ip_first_nunique,ip_second_nunique,ip_third_nunique,ip_fourth_nunique,ip_nunique_mean,time_diff_mean,time_diff_max,time_second_diff_max
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,15.0,26.5,19.0,20.0,20.0,19.0,19.5,547315800000.0,3167632000000.0,52.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,3.0,25.666667,3.0,3.0,3.0,3.0,3.0,2155719000000.0,4477842000000.0,38.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,3.0,29.25,4.0,4.0,4.0,4.0,4.0,1784250000000.0,3154105000000.0,12.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,-1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,19.0,29.76129,93.0,101.0,99.0,97.0,97.5,77277080000.0,1619211000000.0,53.0


In [87]:
bids.head()

Unnamed: 0,bid_id,bidder_id,auction,merchandise,device,time,country,ip,url,time_clean,...,second,nanosecond,microsecond,ip_split,ip_first,ip_second,ip_third,ip_fourth,time_diff,time_second_diff
0,0,8dac2b259fd1c6d1120e519fb1ac14fbqvax8,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,3,736,157894,"[69, 166, 231, 58]",69,166,231,58,0.0,-1.0
1,1,668d393e858e8126275433046bbd35c6tywop,aeqok,furniture,phone1,9759243157894736,in,50.201.125.84,jmqlhflrzwuay9c,1970-04-23 22:54:03.157894736,...,3,736,157894,"[50, 201, 125, 84]",50,201,125,84,0.0,-1.0
2,2,aa5f360084278b35d746fa6af3a7a1a5ra3xe,wa00e,home goods,phone2,9759243157894736,py,112.54.208.157,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,3,736,157894,"[112, 54, 208, 157]",112,54,208,157,0.0,-1.0
3,3,3939ac3ef7d472a59a9c5f893dd3e39fh9ofi,jefix,jewelry,phone4,9759243157894736,in,18.99.175.133,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,3,736,157894,"[18, 99, 175, 133]",18,99,175,133,0.0,-1.0
4,4,8393c48eaf4b8fa96886edc7cf27b372dsibi,jefix,jewelry,phone5,9759243157894736,in,145.138.5.37,vasstdc27m7nks3,1970-04-23 22:54:03.157894736,...,3,736,157894,"[145, 138, 5, 37]",145,138,5,37,0.0,-1.0


In [88]:
merged.head()

Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,microsecond_nunique,second_mean,ip_first_nunique,ip_second_nunique,ip_third_nunique,ip_fourth_nunique,ip_nunique_mean,time_diff_mean,time_diff_max,time_second_diff_max
0,91a3c57b13234af24875c56fb7e2b2f4rb56a,a3d2de7675556553a5f08e4c88d2c228754av,a3d2de7675556553a5f08e4c88d2c228vt0u4,0.0,24.0,18.0,14.0,20.0,6.0,24.0,...,15.0,26.5,19.0,20.0,20.0,19.0,19.5,547315800000.0,3167632000000.0,52.0
1,624f258b49e77713fc34034560f93fb3hu3jo,a3d2de7675556553a5f08e4c88d2c228v1sga,ae87054e5a97a8f840a3991d12611fdcrfbq3,0.0,3.0,1.0,2.0,3.0,1.0,3.0,...,3.0,25.666667,3.0,3.0,3.0,3.0,3.0,2155719000000.0,4477842000000.0,38.0
2,1c5f4fc669099bfbfac515cd26997bd12ruaj,a3d2de7675556553a5f08e4c88d2c2280cybl,92520288b50f03907041887884ba49c0cl0pd,0.0,4.0,4.0,2.0,4.0,1.0,4.0,...,3.0,29.25,4.0,4.0,4.0,4.0,4.0,1784250000000.0,3154105000000.0,12.0
3,4bee9aba2abda51bf43d639013d6efe12iycd,51d80e233f7b6a7dfdee484a3c120f3b2ita8,4cb9717c8ad7e88a9a284989dd79b98dbevyi,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,-1.0
4,4ab12bc61c82ddd9c2d65e60555808acqgos1,a3d2de7675556553a5f08e4c88d2c22857ddh,2a96c3ce94b3be921e0296097b88b56a7x1ji,0.0,155.0,23.0,53.0,123.0,2.0,155.0,...,19.0,29.76129,93.0,101.0,99.0,97.0,97.5,77277080000.0,1619211000000.0,53.0


In [89]:
merged.shape

(6713, 33)

## Fill Na Values

In [90]:
merged['count'].fillna(-1, inplace=True)

In [91]:
merged['auction_nunique'].fillna(-1, inplace=True)

In [92]:
merged['device_nunique'].fillna(-1, inplace=True)

In [93]:
merged['ip_nunique'].fillna(-1, inplace=True)

In [94]:
merged['country_nunique'].fillna(-1, inplace=True)

In [95]:
merged['time_nunique'].fillna(-1, inplace=True)

In [96]:
merged['url_nunique'].fillna(-1, inplace=True)

In [97]:
merged['second_nunique'].fillna(-1, inplace=True)

In [98]:
merged['second_sum'].fillna(-1, inplace=True)

In [99]:
merged['minute_nunique'].fillna(-1, inplace=True)

In [100]:
merged['minute_sum'].fillna(-1, inplace=True)

In [101]:
merged['hour_nunique'].fillna(-1, inplace=True)

In [102]:
merged['hour_sum'].fillna(-1, inplace=True)

In [103]:
merged['day_nunique'].fillna(-1, inplace=True)

In [104]:
merged['second_mean'].fillna(-1, inplace=True)

In [105]:
merged['nanosecond_nunique'].fillna(-1, inplace=True)

In [106]:
merged['nanosecond_sum'].fillna(-1, inplace=True)

In [107]:
merged['nanosecond_mean'].fillna(-1, inplace=True)

In [108]:
merged['microsecond_nunique'].fillna(-1, inplace=True)

In [109]:
merged['microsecond_sum'].fillna(-1, inplace=True)

In [110]:
merged['microsecond_mean'].fillna(-1, inplace=True)

In [111]:
merged['ip_first_nunique'].fillna(-1, inplace=True)

In [112]:
merged['ip_second_nunique'].fillna(-1, inplace=True)

In [113]:
merged['ip_third_nunique'].fillna(-1, inplace=True)

In [114]:
merged['ip_fourth_nunique'].fillna(-1, inplace=True)

In [115]:
merged['ip_nunique_mean'].fillna(-1, inplace=True)

In [116]:
merged['time_diff_mean'].fillna(-1, inplace=True)

In [117]:
merged['time_diff_max'].fillna(-1, inplace=True)

In [118]:
merged['time_second_diff_max'].fillna(-1, inplace=True)
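
The 29 cells above apply the same fill to one column at a time. As a sketch of a more compact form, assuming every column other than the identifiers and the label is an engineered feature, the fill can be done in one assignment. The toy frame and the `feature_cols` name are hypothetical; in the notebook the excluded list would also contain `payment_account` and `address`. Assigning the result back also sidesteps `inplace=True` on a column selection, which recent pandas versions warn about.

```python
import numpy as np
import pandas as pd

# Toy stand-in for merged (hypothetical values)
toy_merged = pd.DataFrame({
    'bidder_id': ['a', 'b', 'c'],
    'outcome': [0.0, np.nan, 1.0],
    'count': [24.0, np.nan, 3.0],
    'ip_nunique': [20.0, np.nan, np.nan],
})

# Everything except ids and the label is treated as a feature column
feature_cols = toy_merged.columns.difference(['bidder_id', 'outcome'])

# Fill only the feature columns; outcome must keep its NaN for the train/test split
toy_merged[feature_cols] = toy_merged[feature_cols].fillna(-1)
print(toy_merged)
```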

## Split train & test data

In [119]:
train = merged[merged['outcome'].notnull()]

In [120]:
test = merged[merged['outcome'].isnull()]

In [121]:
train.shape, test.shape

((2013, 33), (4700, 33))

## Encode ids

In [122]:
train['payment_code'] = train['payment_account'].str[:5].astype('category').cat.codes
train['address_code'] = train['address'].str[:5].astype('category').cat.codes

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [123]:
test['payment_code'] = test['payment_account'].str[:5].astype('category').cat.codes
test['address_code'] = test['address'].str[:5].astype('category').cat.codes

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
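
Because `cat.codes` is computed separately on `train` and `test`, the same 5-character prefix receives a different integer in each frame: in the outputs further down, accounts starting with `a3d2d` are coded 542 in `x_train` and 1202 in `x_test`. A minimal sketch of one way around this, on a toy frame with hypothetical values, is to encode once on the combined frame before splitting. Taking `.copy()` of each split also makes them independent frames, which avoids the `SettingWithCopyWarning` shown above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for merged (hypothetical values); NaN outcome marks test rows
toy_merged = pd.DataFrame({
    'payment_account': ['a3d2dxx1', '51d80yy2', 'a3d2dqq4', '92538ww5'],
    'outcome': [0.0, 1.0, np.nan, np.nan],
})

# Encode on the combined frame so a prefix maps to the same integer everywhere
toy_merged['payment_code'] = (toy_merged['payment_account']
                              .str[:5].astype('category').cat.codes)

# .copy() detaches each split from toy_merged
toy_train = toy_merged[toy_merged['outcome'].notnull()].copy()
toy_test = toy_merged[toy_merged['outcome'].isnull()].copy()
print(toy_train[['payment_account', 'payment_code']])
print(toy_test[['payment_account', 'payment_code']])
```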


## Make Predictions

In [124]:
feature = ['payment_code', 
           'address_code', 
           'count', 
           'auction_nunique', 
           'device_nunique', 
           'ip_nunique', 
           'country_nunique', 
           'time_nunique', 
           'url_nunique', 
           'second_nunique', 
           'second_sum', 
           'minute_sum', 
           'minute_nunique', 
           'hour_sum', 
           'hour_nunique',
           'day_nunique',
           'nanosecond_nunique',
#            'nanosecond_sum',
#            'nanosecond_mean',
           'microsecond_nunique',
#            'microsecond_sum',
#            'microsecond_mean',
           'ip_first_nunique',
#            'ip_second_nunique',
#            'ip_third_nunique',
#            'ip_fourth_nunique',
#            'ip_nunique_mean',
           'time_diff_mean',
#            'time_diff_max'
#            'time_second_diff_max',
          ]
label = ['outcome']

In [125]:
from sklearn.ensemble import RandomForestClassifier

In [126]:
# from sklearn.model_selection import train_test_split
# from sklearn.model_selection import cross_val_score
# from sklearn.model_selection import GridSearchCV
# from sklearn.metrics import accuracy_score

# # Create the parameter grid based on the results of random search 
# param_grid = {
#     'bootstrap': [True, False],
#     'warm_start': [True, False],
#     'max_depth': [None, 50, 200],
#     'max_features': ['auto', 'sqrt', 'log2', None],
#     'min_samples_leaf': [1, 3, 5, 7, 9, 11],
#     'min_samples_split': [6, 8, 10, 12, 14],
#     'criterion': ['gini', 'entropy'],
#     'n_estimators':[200],
# }

# # Create a base model
# rf = RandomForestClassifier(random_state=30)

# # Instantiate the grid search model
# grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, 
#                           cv=5, n_jobs=-1, verbose=2)

In [127]:
# Fit the grid search to the data
# grid_search.fit(x_train, y_train)

In [128]:
# grid_search.best_params_

In [129]:
# best_grid = grid_search.best_estimator_
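
The competition is scored on area under the ROC curve, while `GridSearchCV` without a `scoring` argument falls back to the classifier's default score, which is accuracy. The following is a sketch, on synthetic data rather than the real features, of estimating AUC with stratified cross-validation; the same `scoring='roc_auc'` argument can be passed to `GridSearchCV`. The data, the threshold used to create labels, and the smaller `n_estimators` are assumptions made to keep the example quick.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for x_train / y_train (hypothetical data)
rng = np.random.RandomState(30)
X_toy = rng.normal(size=(300, 5))
y_toy = (X_toy[:, 0] + 0.5 * rng.normal(size=300) > 1.2).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=30)

# Stratified folds keep the minority-class ratio similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=30)
auc_scores = cross_val_score(clf, X_toy, y_toy, cv=cv, scoring='roc_auc')
print(auc_scores.mean(), auc_scores.std())
```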

In [130]:
rfc = RandomForestClassifier(n_estimators=5000,
                             random_state=30, 
                             n_jobs=-1,
                             criterion='entropy',
                             max_features='auto',
                             min_samples_leaf=3,
                             min_samples_split=10,
                             bootstrap=False,
                            )
rfc

RandomForestClassifier(bootstrap=False, class_weight=None, criterion='entropy',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=3, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, n_estimators=5000,
                       n_jobs=-1, oob_score=False, random_state=30, verbose=0,
                       warm_start=False)

In [131]:
print(train.isnull().sum())
print(test.isnull().sum())

bidder_id               0
payment_account         0
address                 0
outcome                 0
count                   0
auction_nunique         0
device_nunique          0
ip_nunique              0
country_nunique         0
time_nunique            0
url_nunique             0
second_sum              0
minute_sum              0
hour_sum                0
second_nunique          0
minute_nunique          0
hour_nunique            0
day_nunique             0
nanosecond_nunique      0
nanosecond_sum          0
nanosecond_mean         0
microsecond_sum         0
microsecond_mean        0
microsecond_nunique     0
second_mean             0
ip_first_nunique        0
ip_second_nunique       0
ip_third_nunique        0
ip_fourth_nunique       0
ip_nunique_mean         0
time_diff_mean          0
time_diff_max           0
time_second_diff_max    0
payment_code            0
address_code            0
dtype: int64
bidder_id                  0
payment_account            0
address            

In [132]:
x_train = train[feature]
y_train = train[label]
x_test = test[feature]

In [133]:
x_train.head()

Unnamed: 0,payment_code,address_code,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_nunique,second_sum,minute_sum,minute_nunique,hour_sum,hour_nunique,day_nunique,nanosecond_nunique,microsecond_nunique,ip_first_nunique,time_diff_mean
0,542,901,24.0,18.0,14.0,20.0,6.0,24.0,1.0,21.0,636.0,680.0,16.0,152.0,5.0,2.0,15.0,15.0,19.0,547315800000.0
1,542,968,3.0,1.0,2.0,3.0,1.0,3.0,2.0,3.0,77.0,62.0,3.0,2.0,2.0,1.0,3.0,3.0,3.0,2155719000000.0
2,542,806,4.0,4.0,2.0,4.0,1.0,4.0,2.0,4.0,117.0,111.0,4.0,5.0,3.0,1.0,3.0,3.0,4.0,1784250000000.0
3,273,421,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,14.0,1.0,8.0,1.0,1.0,1.0,1.0,1.0,0.0
4,542,224,155.0,23.0,53.0,123.0,2.0,155.0,91.0,58.0,4613.0,5527.0,45.0,2884.0,5.0,2.0,19.0,19.0,93.0,77277080000.0


In [134]:
y_train.head()

Unnamed: 0,outcome
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0


In [135]:
x_test.head()

Unnamed: 0,payment_code,address_code,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,url_nunique,second_nunique,second_sum,minute_sum,minute_nunique,hour_sum,hour_nunique,day_nunique,nanosecond_nunique,microsecond_nunique,ip_first_nunique,time_diff_mean
2013,1202,1050,4.0,3.0,2.0,4.0,3.0,4.0,3.0,4.0,107.0,108.0,4.0,37.0,4.0,2.0,4.0,4.0,4.0,17555920000000.0
2014,1202,1906,3.0,2.0,3.0,2.0,2.0,3.0,1.0,2.0,78.0,126.0,2.0,30.0,2.0,2.0,3.0,3.0,2.0,25334020000000.0
2015,1064,1906,17.0,14.0,4.0,4.0,3.0,17.0,2.0,14.0,418.0,883.0,3.0,119.0,1.0,1.0,13.0,13.0,3.0,17120740000.0
2016,1202,2119,148.0,90.0,81.0,129.0,14.0,148.0,80.0,56.0,3760.0,4838.0,54.0,1344.0,9.0,2.0,19.0,19.0,97.0,517038100000.0
2017,1202,2297,23.0,20.0,17.0,17.0,2.0,23.0,1.0,20.0,850.0,383.0,7.0,302.0,3.0,1.0,14.0,14.0,17.0,285860400000.0


In [136]:
# y_train is a single-column DataFrame; flatten it to 1-D to avoid a DataConversionWarning
rfc.fit(x_train, y_train.values.ravel())


RandomForestClassifier(bootstrap=False, class_weight=None, criterion='entropy',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=3, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, n_estimators=5000,
                       n_jobs=-1, oob_score=False, random_state=30, verbose=0,
                       warm_start=False)

In [137]:
pred = rfc.predict_proba(x_test)

In [138]:
list(zip(feature, rfc.feature_importances_))

[('payment_code', 0.025907685615606194),
 ('address_code', 0.03882587326263935),
 ('count', 0.11076978018498133),
 ('auction_nunique', 0.03387851820376274),
 ('device_nunique', 0.06583397210093193),
 ('ip_nunique', 0.04099305903608344),
 ('country_nunique', 0.03259439392054428),
 ('time_nunique', 0.09499537125568551),
 ('url_nunique', 0.044242468222729266),
 ('second_nunique', 0.03521842200015284),
 ('second_sum', 0.09653120819022519),
 ('minute_sum', 0.07564851590515001),
 ('minute_nunique', 0.03347007585361921),
 ('hour_sum', 0.06494374039393476),
 ('hour_nunique', 0.03299295115598182),
 ('day_nunique', 0.032660956513180864),
 ('nanosecond_nunique', 0.02031097684345681),
 ('microsecond_nunique', 0.019420784988640395),
 ('ip_first_nunique', 0.0318728691017721),
 ('time_diff_mean', 0.06888837725092202)]
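
The list above is in feature order, which makes the ranking hard to read. A small sketch of sorting the importances, using a toy model fitted on synthetic data with hypothetical feature names; in the notebook, `rfc.feature_importances_` and `feature` would take the place of `toy_rfc.feature_importances_` and `toy_features`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data where only 'count' determines the label (hypothetical)
rng = np.random.RandomState(0)
toy_features = ['count', 'ip_nunique', 'noise']
X_toy = pd.DataFrame(rng.normal(size=(200, 3)), columns=toy_features)
y_toy = (X_toy['count'] > 0).astype(int)

toy_rfc = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_toy, y_toy)

# Label each importance with its feature name and rank from most to least important
importances = (pd.Series(toy_rfc.feature_importances_, index=toy_features)
               .sort_values(ascending=False))
print(importances)
```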

In [139]:
prediction = pred[:,1]

In [140]:
prediction.shape

(4700,)

In [141]:
test.head()

Unnamed: 0,bidder_id,payment_account,address,outcome,count,auction_nunique,device_nunique,ip_nunique,country_nunique,time_nunique,...,ip_first_nunique,ip_second_nunique,ip_third_nunique,ip_fourth_nunique,ip_nunique_mean,time_diff_mean,time_diff_max,time_second_diff_max,payment_code,address_code
2013,49bb5a3c944b8fc337981cc7a9ccae41u31d7,a3d2de7675556553a5f08e4c88d2c228htx90,5d9fa1b71f992e7c7a106ce4b07a0a754le7c,,4.0,3.0,2.0,4.0,3.0,4.0,...,4.0,4.0,4.0,4.0,4.0,17555920000000.0,59897210000000.0,21.0,1202,1050
2014,a921612b85a1494456e74c09393ccb65ylp4y,a3d2de7675556553a5f08e4c88d2c228rs17i,a3d2de7675556553a5f08e4c88d2c228klidn,,3.0,2.0,3.0,2.0,2.0,3.0,...,2.0,2.0,2.0,2.0,2.0,25334020000000.0,76001950000000.0,42.0,1202,1906
2015,6b601e72a4d264dab9ace9d7b229b47479v6i,925381cce086b8cc9594eee1c77edf665zjpl,a3d2de7675556553a5f08e4c88d2c228aght0,,17.0,14.0,4.0,4.0,3.0,17.0,...,3.0,4.0,4.0,4.0,3.75,17120740000.0,248578900000.0,8.0,1064,1906
2016,eaf0ed0afc9689779417274b4791726cn5udi,a3d2de7675556553a5f08e4c88d2c228nclv5,b5714de1fd69d4a0d2e39d59e53fe9e15vwat,,148.0,90.0,81.0,129.0,14.0,148.0,...,97.0,95.0,100.0,108.0,100.0,517038100000.0,50159470000000.0,50.0,1202,2119
2017,cdecd8d02ed8c6037e38042c7745f688mx5sf,a3d2de7675556553a5f08e4c88d2c228dtdkd,c3b363a3c3b838d58c85acf0fc9964cb4pnfa,,23.0,20.0,17.0,17.0,2.0,23.0,...,17.0,17.0,17.0,17.0,17.0,285860400000.0,3373105000000.0,40.0,1202,2297


In [142]:
sampleSubmission['prediction'] = prediction

In [143]:
sampleSubmission.to_csv('./rf_submission.csv', index=False)
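
Assigning `prediction` positionally assumes that `sampleSubmission` lists bidders in exactly the same order as `test`. If that assumption is in doubt, a sketch of a safer construction is to attach each probability to its `bidder_id` and join on that key. All names prefixed with `toy_` and all values below are hypothetical stand-ins for `test`, `prediction` and `sampleSubmission`.

```python
import pandas as pd

# Toy stand-ins (hypothetical values); the sample file is deliberately in a different order
toy_test = pd.DataFrame({'bidder_id': ['b1', 'b2', 'b3']})
toy_prediction = [0.3, 0.0, 0.9]
toy_sample = pd.DataFrame({'bidder_id': ['b3', 'b1', 'b2'], 'prediction': 0.0})

# Pair each probability with the bidder it was predicted for
scored = pd.DataFrame({'bidder_id': toy_test['bidder_id'].values,
                       'prediction': toy_prediction})

# A left join keeps the sample file's row order and matches on the key
submission = toy_sample[['bidder_id']].merge(scored, on='bidder_id', how='left')
print(submission)
```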