### Please complete the following sections sequentially to complete this assignment.

##### <span style="color:red">Note: You can create as many code or markdown cells as you deem necessary to answer each question. However, please leave the problems unchanged. We will evaluate your solutions by executing your code sequentially.</span> 
---

**With the expansion of the Internet and the Web, interest in online articles and reviews has grown, allowing information to spread quickly and easily worldwide. As a result, predicting the popularity of online news has become an active topic. Popularity is often measured by the number of interactions on the Web and social networks (e.g., number of shares, likes, and comments). Predicting such popularity is valuable for advertisers, authors, content providers, and even activists/politicians (e.g., to understand or influence public opinion). In this assignment, we use a news popularity dataset utilized by Fernandes et al. (2015) based on the articles published by [Mashable](https://mashable.com/) from January 7, 2013, to January 7, 2015.**

**<span style="color:red">The objective of this assignment is to predict the number of times a news article is shared. </span> The assignment's dataset is included in the homework's zipped folder. The table below describes each variable in the dataset.**

| Variable                      | Description                                                                       |
|-------------------------------|-----------------------------------------------------------------------------------|
| url                           | URL of the article (non-predictive)                                               |
| timedelta                     | Days between the article publication and the dataset acquisition (non-predictive) |
| n_tokens_title                | Number of words in the title                                                      |
| n_tokens_content              | Number of words in the content                                                    |
| n_unique_tokens               | Rate of unique words in the content                                               |
| n_non_stop_words              | Rate of non-stop words in the content                                             |
| n_non_stop_unique_tokens      | Rate of unique non-stop words in the content                                      |
| num_hrefs                     | Number of links                                                                   |
| num_self_hrefs                | Number of links to other articles published by Mashable                           |
| num_imgs                      | Number of images                                                                  |
| num_videos                    | Number of videos                                                                  |
| average_token_length          | Average length of the words in the content                                        |
| num_keywords                  | Number of keywords in the metadata                                                |
| data_channel_is_lifestyle     | Is data channel 'Lifestyle'?                                                      |
| data_channel_is_entertainment | Is data channel 'Entertainment'?                                                  |
| data_channel_is_bus           | Is data channel 'Business'?                                                       |
| data_channel_is_socmed        | Is data channel 'Social Media'?                                                   |
| data_channel_is_tech          | Is data channel 'Tech'?                                                           |
| data_channel_is_world         | Is data channel 'World'?                                                          |
| kw_min_min                    | Min. shares of the worst keyword in the article                                   |
| kw_max_min                    | Max. shares of the worst keyword in the article                                   |
| kw_avg_min                    | Avg. shares of the worst keyword in the article                                   |
| kw_min_max                    | Min. shares of the best keyword in the article                                    |
| kw_max_max                    | Max. shares of the best keyword in the article                                    |
| kw_avg_max                    | Avg. shares of the best keyword in the article                                    |
| kw_min_avg                    | Min. shares of the average keyword in the article                                 |
| kw_max_avg                    | Max. shares of the average keyword in the article                                 |
| kw_avg_avg                    | Avg. shares of the average keyword in the article                                 |
| self_reference_min_shares     | Min. shares of referenced articles in Mashable                                    |
| self_reference_max_shares     | Max. shares of referenced articles in Mashable                                    |
| self_reference_avg_sharess    | Avg. shares of referenced articles in Mashable                                    |
| weekday_is_monday             | Was the article published on a Monday?                                            |
| weekday_is_tuesday            | Was the article published on a Tuesday?                                           |
| weekday_is_wednesday          | Was the article published on a Wednesday?                                         |
| weekday_is_thursday           | Was the article published on a Thursday?                                          |
| weekday_is_friday             | Was the article published on a Friday?                                            |
| weekday_is_saturday           | Was the article published on a Saturday?                                          |
| weekday_is_sunday             | Was the article published on a Sunday?                                            |
| is_weekend                    | Was the article published on the weekend?                                         |
| LDA_00                        | Closeness to LDA topic 0                                                          |
| LDA_01                        | Closeness to LDA topic 1                                                          |
| LDA_02                        | Closeness to LDA topic 2                                                          |
| LDA_03                        | Closeness to LDA topic 3                                                          |
| LDA_04                        | Closeness to LDA topic 4                                                          |
| global_subjectivity           | Text subjectivity                                                                 |
| global_sentiment_polarity     | Text sentiment polarity                                                           |
| global_rate_positive_words    | Rate of positive words in the content                                             |
| global_rate_negative_words    | Rate of negative words in the content                                             |
| rate_positive_words           | Rate of positive words among non-neutral tokens                                   |
| rate_negative_words           | Rate of negative words among non-neutral tokens                                   |
| avg_positive_polarity         | Avg. polarity of positive words                                                   |
| min_positive_polarity         | Min. polarity of positive words                                                   |
| max_positive_polarity         | Max. polarity of positive words                                                   |
| avg_negative_polarity         | Avg. polarity of negative words                                                   |
| min_negative_polarity         | Min. polarity of negative words                                                   |
| max_negative_polarity         | Max. polarity of negative words                                                   |
| title_subjectivity            | Title subjectivity                                                                |
| title_sentiment_polarity      | Title polarity                                                                    |
| abs_title_subjectivity        | Absolute subjectivity level                                                       |
| abs_title_sentiment_polarity  | Absolute polarity level                                                           |
| **shares (Target)**           | **Number of shares**                                                              |
| popular (DO NOT USE)          | whether the article is popular (yes/no)                                           |

Reference:

Fernandes, K., Vinagre, P., & Cortez, P. (2015, September). A proactive intelligent decision support system for predicting the popularity of online news. In Portuguese Conference on Artificial Intelligence (pp. 535-546). Springer, Cham.

---
### Import Packages and Read the Data

**Before starting the assignment, import all necessary libraries and read the dataset into the Python environment.**

In [2]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
from sklearn import tree
from pydotplus import graph_from_dot_data  # used later to render the fitted tree

# Load the dataset shipped with the assignment
df = pd.read_csv('online_news_popularity.csv')
df.head()

Unnamed: 0,url,timedelta,n_tokens_title,n_tokens_content,n_unique_tokens,n_non_stop_words,n_non_stop_unique_tokens,num_hrefs,num_self_hrefs,num_imgs,num_videos,average_token_length,num_keywords,channel,kw_min_min,kw_max_min,kw_avg_min,kw_min_max,kw_max_max,kw_avg_max,kw_min_avg,kw_max_avg,kw_avg_avg,self_reference_min_shares,self_reference_max_shares,self_reference_avg_sharess,weekday,is_weekend,LDA_00,LDA_01,LDA_02,LDA_03,LDA_04,global_subjectivity,global_sentiment_polarity,global_rate_positive_words,global_rate_negative_words,rate_positive_words,rate_negative_words,avg_positive_polarity,min_positive_polarity,max_positive_polarity,avg_negative_polarity,min_negative_polarity,max_negative_polarity,title_subjectivity,title_sentiment_polarity,abs_title_subjectivity,abs_title_sentiment_polarity,shares,popular
0,http://mashable.com/2013/01/07/amazon-instant-...,731,12,219,0.663594,1.0,0.815385,4,2,1,0,4.680365,5,entertainment,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,496.0,496.0,496.0,monday,0,0.500331,0.378279,0.040005,0.041263,0.040123,0.521617,0.092562,0.045662,0.013699,0.769231,0.230769,0.378636,0.1,0.7,-0.35,-0.6,-0.2,0.5,-0.1875,0.0,0.1875,593,no
1,http://mashable.com/2013/01/07/ap-samsung-spon...,731,9,255,0.604743,1.0,0.791946,3,1,1,0,4.913725,4,bussiness,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,monday,0,0.799756,0.050047,0.050096,0.050101,0.050001,0.341246,0.148948,0.043137,0.015686,0.733333,0.266667,0.286915,0.033333,0.7,-0.11875,-0.125,-0.1,0.0,0.0,0.5,0.0,711,no
2,http://mashable.com/2013/01/07/apple-40-billio...,731,9,211,0.57513,1.0,0.663866,3,1,1,0,4.393365,6,bussiness,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,918.0,918.0,918.0,monday,0,0.217792,0.033334,0.033351,0.033334,0.682188,0.702222,0.323333,0.056872,0.009479,0.857143,0.142857,0.495833,0.1,1.0,-0.466667,-0.8,-0.133333,0.0,0.0,0.5,0.0,1500,yes
3,http://mashable.com/2013/01/07/astronaut-notre...,731,9,531,0.503788,1.0,0.665635,9,0,1,0,4.404896,7,entertainment,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,monday,0,0.028573,0.4193,0.494651,0.028905,0.028572,0.42985,0.100705,0.041431,0.020716,0.666667,0.333333,0.385965,0.136364,0.8,-0.369697,-0.6,-0.166667,0.0,0.0,0.5,0.0,1200,no
4,http://mashable.com/2013/01/07/att-u-verse-apps/,731,13,1072,0.415646,1.0,0.54089,19,19,20,0,4.682836,7,tech,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,545.0,16000.0,3151.157895,monday,0,0.028633,0.028794,0.028575,0.028572,0.885427,0.513502,0.281003,0.074627,0.012127,0.860215,0.139785,0.411127,0.033333,1.0,-0.220192,-0.5,-0.05,0.454545,0.136364,0.045455,0.136364,505,no


---
### Introduction to Regression Trees

**1- Watch this [video](https://ohiouniversity.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=403295c8-1da1-4c46-a3ed-acd9002069dd) for an introduction to regression trees.**

**2- Briefly describe how regression trees work. (10 pts)**

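##### A regression tree is built through binary recursive partitioning: the algorithm splits the data into two partitions (branches) using the input variable and threshold that most reduce the prediction error, then repeats the split within each partition as it moves down each branch. The prediction at each terminal node (leaf) is the mean target value of the training observations that fall in it.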

**3- What are the similarities of classification and regression tree models? (10 pts)**

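#### Both are decision trees built by recursive binary partitioning: each internal node tests one input variable against a threshold, and each leaf holds a prediction. Both can use numeric and (encoded) categorical inputs, both tend to overfit when grown deep, and both can be controlled with the same kinds of parameters (e.g., maximum depth, minimum samples per leaf, pruning).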

**4- What are the differences of classification and regression tree models? (10 pts)**

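###### Regression trees predict a continuous quantity, while classification trees predict a class label. As a result, a regression leaf returns the mean of its training targets whereas a classification leaf returns the majority class; splits are chosen by reducing MSE rather than Gini impurity or entropy; and performance is measured with metrics such as $ r^2 $ and MSE rather than accuracy.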

**5- How is MSE used in regression trees? (10 pts)**

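###### Regression trees normally use mean squared error (MSE) as the splitting criterion. For a binary tree, the algorithm considers each candidate variable and threshold, splits the node's data into two subsets, predicts each subset with its mean, and computes the weighted MSE of the two subsets. It keeps the split with the lowest weighted MSE and repeats the process in each child node. MSE is also used to evaluate the fitted tree on the train and test data.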
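
The split search for question 5 can be sketched for a single input variable as follows. This is a simplified illustration in plain Python, not scikit-learn's implementation; the function names `mse` and `best_split` and the toy data are assumptions made for this example.

```python
def mse(values):
    """MSE of the values around their own mean (the node's prediction)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, weighted_mse) of the best binary split on one variable."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # identical values cannot be separated by a threshold
        threshold = (left_x + right_x) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        # Weight each child's MSE by its share of the observations
        score = (len(left) * mse(left) + len(right) * mse(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy data (made up for illustration): the target jumps once x exceeds 3
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, score = best_split(x, y)
print(threshold, round(score, 4))
```

On this toy data the chosen threshold falls between 3 and 10, where the two groups of target values separate. A full tree would repeat the same search over every input variable and then recurse into each child node.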

**6- Why does overfitting happen in regression trees? and how can it be avoided? (10 pts)**

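###### An unrestricted tree keeps splitting until its leaves contain very few observations, so it fits the noise in the training data and its test-set error increases. Overfitting can be avoided by limiting the tree's complexity, for example by setting a maximum depth or a minimum number of samples per leaf, or by pruning the tree (e.g., cost-complexity pruning with `ccp_alpha`), and by checking performance on held-out data.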
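
As a rough illustration of question 6, the sketch below compares an unrestricted tree with a constrained one. The data are synthetic (a noisy sine curve, chosen only for this example), and the values `max_depth=3` and `min_samples_leaf=20` are arbitrary assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a sine curve plus noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 6, size=(400, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.5, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Unrestricted tree: grows until it reproduces the training targets
full_tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
# Constrained tree: shallow, with a minimum number of samples per leaf
small_tree = DecisionTreeRegressor(
    max_depth=3, min_samples_leaf=20, random_state=0
).fit(X_tr, y_tr)

results = {}
for name, model in [("full", full_tree), ("constrained", small_tree)]:
    results[name] = {
        "leaves": model.get_n_leaves(),
        "train_r2": r2_score(y_tr, model.predict(X_tr)),
        "test_r2": r2_score(y_te, model.predict(X_te)),
    }
    print(name, results[name])
```

The quantity to watch is the gap between `train_r2` and `test_r2`: a large gap for the unrestricted tree and a small one for the constrained tree is the typical signature of overfitting.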

---
### Regression Trees in Python

**7- Watch this [video](https://ohiouniversity.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=bd5b0d61-6837-4d54-b2b4-acd9002071b8) to learn about implementing regression trees in Python. The video's dataset is included in the assignment zipped folder, in case you want to replicate the code.**

**8- Check if there are any missing values and take care of them if needed. (5 pts)**

In [3]:
df.isna().sum()

url                             0
timedelta                       0
n_tokens_title                  0
n_tokens_content                0
n_unique_tokens                 0
n_non_stop_words                0
n_non_stop_unique_tokens        0
num_hrefs                       0
num_self_hrefs                  0
num_imgs                        0
num_videos                      0
average_token_length            0
num_keywords                    0
channel                         0
kw_min_min                      0
kw_max_min                      0
kw_avg_min                      0
kw_min_max                      0
kw_max_max                      0
kw_avg_max                      0
kw_min_avg                      0
kw_max_avg                      0
kw_avg_avg                      0
self_reference_min_shares       0
self_reference_max_shares       0
self_reference_avg_sharess      0
weekday                         0
is_weekend                      0
LDA_00                          0
LDA_01        

**9- Detect and eliminate the outliers of these variables: ```['LDA_02', 'LDA_03', 'LDA_04']``` (10 pts)**

In [4]:
df_clean = df.copy()
var_list = ['LDA_02', 'LDA_03', 'LDA_04']
for var in var_list:
    # Compute quartiles on the column itself; DataFrame.quantile may raise on non-numeric columns
    q1 = df_clean[var].quantile(0.25)
    q3 = df_clean[var].quantile(0.75)
    iqr = q3 - q1
    ub = q3 + 1.5*iqr
    lb = q1 - 1.5*iqr
    # Keep only the rows inside the IQR fences
    df_clean = df_clean[(df_clean[var] >= lb) & (df_clean[var] <= ub)]
df_clean.shape

(36103, 51)
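
The same IQR rule can be wrapped in a small reusable function. This is a sketch on a made-up toy frame; the helper name `drop_iqr_outliers` and its `k` parameter are hypothetical and not part of the assignment.

```python
import pandas as pd


def drop_iqr_outliers(frame, columns, k=1.5):
    """Sequentially drop rows outside [Q1 - k*IQR, Q3 + k*IQR] for each column."""
    out = frame.copy()
    for col in columns:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        # between() is inclusive on both ends, matching >= lb and <= ub
        out = out[out[col].between(q1 - k * iqr, q3 + k * iqr)]
    return out


# Toy frame (made up for illustration): 100 is an obvious outlier in column "a"
toy = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [10, 11, 12, 13, 14]})
cleaned = drop_iqr_outliers(toy, ["a", "b"])
print(cleaned.shape)
```

Because the rows are filtered one column at a time, the quartiles of later columns are computed on the already reduced data, which matches the loop used in the cell above.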

**10- Dummy encode all categorical variables. (5 pts)**

In [5]:
cat_vars = ['channel','weekday']
df1 = pd.get_dummies(df_clean, columns=cat_vars, drop_first=True)
df1

Unnamed: 0,url,timedelta,n_tokens_title,n_tokens_content,n_unique_tokens,n_non_stop_words,n_non_stop_unique_tokens,num_hrefs,num_self_hrefs,num_imgs,num_videos,average_token_length,num_keywords,kw_min_min,kw_max_min,kw_avg_min,kw_min_max,kw_max_max,kw_avg_max,kw_min_avg,kw_max_avg,kw_avg_avg,self_reference_min_shares,self_reference_max_shares,self_reference_avg_sharess,is_weekend,LDA_00,LDA_01,LDA_02,LDA_03,LDA_04,global_subjectivity,global_sentiment_polarity,global_rate_positive_words,global_rate_negative_words,rate_positive_words,rate_negative_words,avg_positive_polarity,min_positive_polarity,max_positive_polarity,avg_negative_polarity,min_negative_polarity,max_negative_polarity,title_subjectivity,title_sentiment_polarity,abs_title_subjectivity,abs_title_sentiment_polarity,shares,popular,channel_entertainment,channel_lifestyle,channel_other,channel_social_media,channel_tech,channel_world,weekday_monday,weekday_saturday,weekday_sunday,weekday_thursday,weekday_tuesday,weekday_wednesday
0,http://mashable.com/2013/01/07/amazon-instant-...,731,12,219,0.663594,1.0,0.815385,4,2,1,0,4.680365,5,0,0.0,0.000,0,0,0.0000,0.000000,0.000000,0.000000,496.0,496.0,496.000000,0,0.500331,0.378279,0.040005,0.041263,0.040123,0.521617,0.092562,0.045662,0.013699,0.769231,0.230769,0.378636,0.100000,0.70,-0.350000,-0.600,-0.200000,0.500000,-0.187500,0.000000,0.187500,593,no,1,0,0,0,0,0,1,0,0,0,0,0
1,http://mashable.com/2013/01/07/ap-samsung-spon...,731,9,255,0.604743,1.0,0.791946,3,1,1,0,4.913725,4,0,0.0,0.000,0,0,0.0000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0,0.799756,0.050047,0.050096,0.050101,0.050001,0.341246,0.148948,0.043137,0.015686,0.733333,0.266667,0.286915,0.033333,0.70,-0.118750,-0.125,-0.100000,0.000000,0.000000,0.500000,0.000000,711,no,0,0,0,0,0,0,1,0,0,0,0,0
2,http://mashable.com/2013/01/07/apple-40-billio...,731,9,211,0.575130,1.0,0.663866,3,1,1,0,4.393365,6,0,0.0,0.000,0,0,0.0000,0.000000,0.000000,0.000000,918.0,918.0,918.000000,0,0.217792,0.033334,0.033351,0.033334,0.682188,0.702222,0.323333,0.056872,0.009479,0.857143,0.142857,0.495833,0.100000,1.00,-0.466667,-0.800,-0.133333,0.000000,0.000000,0.500000,0.000000,1500,yes,0,0,0,0,0,0,1,0,0,0,0,0
3,http://mashable.com/2013/01/07/astronaut-notre...,731,9,531,0.503788,1.0,0.665635,9,0,1,0,4.404896,7,0,0.0,0.000,0,0,0.0000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0,0.028573,0.419300,0.494651,0.028905,0.028572,0.429850,0.100705,0.041431,0.020716,0.666667,0.333333,0.385965,0.136364,0.80,-0.369697,-0.600,-0.166667,0.000000,0.000000,0.500000,0.000000,1200,no,1,0,0,0,0,0,1,0,0,0,0,0
4,http://mashable.com/2013/01/07/att-u-verse-apps/,731,13,1072,0.415646,1.0,0.540890,19,19,20,0,4.682836,7,0,0.0,0.000,0,0,0.0000,0.000000,0.000000,0.000000,545.0,16000.0,3151.157895,0,0.028633,0.028794,0.028575,0.028572,0.885427,0.513502,0.281003,0.074627,0.012127,0.860215,0.139785,0.411127,0.033333,1.00,-0.220192,-0.500,-0.050000,0.454545,0.136364,0.045455,0.136364,505,no,0,0,0,0,1,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39638,http://mashable.com/2014/12/27/protests-contin...,8,11,223,0.653153,1.0,0.825758,5,3,1,0,4.923767,6,-1,459.0,91.000,0,843300,484083.3333,0.000000,4301.332394,2665.713159,2000.0,5700.0,3633.333333,0,0.551338,0.033337,0.033347,0.033335,0.348642,0.552041,0.268878,0.031390,0.004484,0.875000,0.125000,0.573469,0.214286,0.80,-0.250000,-0.250,-0.250000,0.000000,0.000000,0.500000,0.000000,1200,no,0,0,0,0,0,0,0,0,0,0,0,1
39639,http://mashable.com/2014/12/27/samsung-app-aut...,8,11,346,0.529052,1.0,0.684783,9,7,1,1,4.523121,8,-1,671.0,173.125,26900,843300,374962.5000,2514.742857,4004.342857,3031.115764,11400.0,48000.0,37033.333330,0,0.025038,0.025001,0.151701,0.025000,0.773260,0.482679,0.141964,0.037572,0.014451,0.722222,0.277778,0.333791,0.100000,0.75,-0.260000,-0.500,-0.125000,0.100000,0.000000,0.400000,0.000000,1800,yes,0,0,0,0,1,0,0,0,0,0,0,1
39640,http://mashable.com/2014/12/27/seth-rogen-jame...,8,12,328,0.696296,1.0,0.885057,9,7,3,48,4.405488,7,-1,616.0,184.000,6500,843300,192985.7143,1664.267857,5470.168651,3411.660830,2100.0,2100.0,2100.000000,0,0.029349,0.028575,0.231866,0.681635,0.028575,0.564374,0.194249,0.039634,0.009146,0.812500,0.187500,0.374825,0.136364,0.70,-0.211111,-0.400,-0.100000,0.300000,1.000000,0.200000,1.000000,1900,yes,0,0,0,1,0,0,0,0,0,0,0,1
39641,http://mashable.com/2014/12/27/son-pays-off-mo...,8,10,442,0.516355,1.0,0.644128,24,1,12,1,5.076923,8,-1,691.0,168.250,6200,843300,295850.0000,1753.882353,6880.687034,4206.439195,1400.0,1400.0,1400.000000,0,0.159004,0.025025,0.025207,0.643794,0.146970,0.510296,0.024609,0.033937,0.024887,0.576923,0.423077,0.307273,0.136364,0.50,-0.356439,-0.800,-0.166667,0.454545,0.136364,0.045455,0.136364,1900,yes,0,0,1,0,0,0,0,0,0,0,0,1


**11- Partition the data (Consider 80% of the data as train). (5 pts)**

In [15]:
var_list = ['n_tokens_title', 'n_tokens_content',
           'n_unique_tokens', 'n_non_stop_words', 'n_non_stop_unique_tokens',
           'num_hrefs', 'num_self_hrefs', 'num_imgs', 'num_videos',
           'average_token_length', 'num_keywords', 'kw_min_min', 'kw_max_min',
           'kw_avg_min', 'kw_min_max', 'kw_max_max', 'kw_avg_max', 'kw_min_avg',
           'kw_max_avg', 'kw_avg_avg', 'self_reference_min_shares',
           'self_reference_max_shares', 'self_reference_avg_sharess', 'is_weekend',
           'LDA_00', 'LDA_01', 'LDA_02', 'LDA_03', 'LDA_04', 'global_subjectivity',
           'global_sentiment_polarity', 'global_rate_positive_words',
           'global_rate_negative_words', 'rate_positive_words',
           'rate_negative_words', 'avg_positive_polarity', 'min_positive_polarity',
           'max_positive_polarity', 'avg_negative_polarity',
           'min_negative_polarity', 'max_negative_polarity', 'title_subjectivity',
           'title_sentiment_polarity', 'abs_title_subjectivity',
           'abs_title_sentiment_polarity',
           'channel_entertainment', 'channel_lifestyle', 'channel_other',
           'channel_social_media', 'channel_tech', 'channel_world',
           'weekday_monday', 'weekday_saturday', 'weekday_sunday',
           'weekday_thursday', 'weekday_tuesday', 'weekday_wednesday']

X=df1[var_list]
y=df1['shares']
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)

**12- Using proper input variables, build a regression tree that predicts the number of times a news article is shared. After building your model, do the following: (30 pts)**
* __Calculate the $ r^2 $ and MSE of the model on the train data,__
* __Visualize the tree,__
* __Set the parameters of the regression tree such that it does not overfit the data.__

In [18]:
dec_tree = tree.DecisionTreeRegressor(ccp_alpha=4000000)
dec_tree.fit(X_train, y_train)

y_train_hat = dec_tree.predict(X_train)

print("Train R2:", r2_score(y_train, y_train_hat))
print("Train MSE:", mean_squared_error(y_train, y_train_hat))

Train R2: 0.4871272819164989
Train MSE: 83234679.25844014
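
Rather than fixing `ccp_alpha` by hand, one option is to pick it by cross-validation from the candidates returned by `cost_complexity_pruning_path`. The sketch below uses synthetic data (an assumption for illustration); thinning the candidates to every tenth value and using 5 folds are arbitrary choices to keep the search quick.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a sine curve plus noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 6, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.5, size=300)

# Candidate alphas at which the fully grown tree would lose a subtree
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(
    X_demo, y_demo
)
# Clip guards against tiny negative values from floating-point error
alphas = np.unique(np.clip(path.ccp_alphas, 0, None))[::10]

# Cross-validate one tree per candidate alpha and keep the best one
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["ccp_alpha"]
print("Best ccp_alpha:", best_alpha)
print("Leaves in pruned tree:", search.best_estimator_.get_n_leaves())
```

On the assignment data the same pattern would use `X_train` and `y_train` in place of the synthetic arrays, so that the test set stays untouched until the final evaluation.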


In [19]:
text_representation = tree.export_text(dec_tree, feature_names=var_list)
print(text_representation)

|--- kw_avg_avg <= 3640.89
|   |--- n_tokens_content <= 2586.50
|   |   |--- value: [2758.75]
|   |--- n_tokens_content >  2586.50
|   |   |--- n_tokens_content <= 2592.00
|   |   |   |--- value: [663600.00]
|   |   |--- n_tokens_content >  2592.00
|   |   |   |--- value: [6786.16]
|--- kw_avg_avg >  3640.89
|   |--- self_reference_min_shares <= 254350.00
|   |   |--- self_reference_avg_sharess <= 5054.75
|   |   |   |--- value: [4334.07]
|   |   |--- self_reference_avg_sharess >  5054.75
|   |   |   |--- self_reference_avg_sharess <= 5099.56
|   |   |   |   |--- n_tokens_content <= 670.00
|   |   |   |   |   |--- value: [6533.33]
|   |   |   |   |--- n_tokens_content >  670.00
|   |   |   |   |   |--- value: [690400.00]
|   |   |   |--- self_reference_avg_sharess >  5099.56
|   |   |   |   |--- kw_avg_avg <= 3648.39
|   |   |   |   |   |--- self_reference_max_shares <= 7200.00
|   |   |   |   |   |   |--- value: [441000.00]
|   |   |   |   |   |--- self_reference_max_shares >  7200.00

In [20]:
dot_data = tree.export_graphviz(dec_tree, feature_names=var_list, rounded=True, filled=True)
graph = graph_from_dot_data(dot_data)
graph.write_pdf('Regressor.pdf')

True

**13- Test the tree you built on the test data by calculating the $ r^2 $ and MSE of the model on the test data: (10 pts)**

In [23]:
y_test_hat = dec_tree.predict(X_test)

# Label these as test metrics; they are computed on the held-out data
print("Test R2:", r2_score(y_test, y_test_hat))
print("Test MSE:", mean_squared_error(y_test, y_test_hat))

Test R2: -0.969702835227076
Test MSE: 162695993.6693348


**14- Comparing your train and test results, do you see any evidence of overfitting? Explain. (10 pts)**

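###### Yes. The train $ r^2 $ is about 0.49, while the test $ r^2 $ is about -0.97, which means the tree predicts the test data worse than simply predicting the mean number of shares. The test MSE (about 1.63e8) is also roughly twice the train MSE (about 8.32e7). This gap between train and test performance, not the size of the MSE by itself, is the evidence of overfitting.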
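
To make the meaning of a negative $ r^2 $ concrete, the small example below compares a constant prediction at the mean with a deliberately poor set of predictions. The numbers are made up for illustration.

```python
from sklearn.metrics import r2_score

# Made-up targets and two sets of predictions
y_true = [1.0, 2.0, 3.0, 4.0]
mean_pred = [2.5] * 4            # always predict the mean of y_true
bad_pred = [4.0, 3.0, 2.0, 1.0]  # predictions in reverse order

r2_mean = r2_score(y_true, mean_pred)
r2_bad = r2_score(y_true, bad_pred)
print(r2_mean, r2_bad)
```

Predicting the mean gives an $ r^2 $ of 0, so any model with a negative $ r^2 $ on a dataset has a larger squared error on that dataset than this constant baseline.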

**15- Which variables are the most important ones? Sort and show the input variables based on their importance. (5 pts)**

In [24]:
df2 = pd.DataFrame({'Variable':var_list, 'Importance': dec_tree.feature_importances_}).sort_values(by=['Importance'], ascending=False)
df2

Unnamed: 0,Variable,Importance
1,n_tokens_content,0.34585
18,kw_max_avg,0.305618
9,average_token_length,0.157117
21,self_reference_max_shares,0.07747
22,self_reference_avg_sharess,0.0592
20,self_reference_min_shares,0.026982
19,kw_avg_avg,0.022983
13,kw_avg_min,0.004779
41,title_subjectivity,0.0
0,n_tokens_title,0.0


**16- Why do you think the results of variable importance might not be reliable? (10 pts)**

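##### The tree overfits (its test $ r^2 $ is negative), so the importances describe splits fitted to the training sample rather than relationships that generalize. Several leaves in the printed tree predict extremely high share counts (above 400,000) within very narrow ranges of the inputs, which suggests that a few outlying articles drive the splits and therefore the importances. Because the train/test split is random and no seed is set, rerunning the notebook can also produce a different ranking.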
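
One common cross-check for impurity-based importances is permutation importance computed on held-out data. The sketch below uses synthetic data in which only the first column carries signal; the data, the column names, and the tree settings are assumptions made for this example.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): only column 0 affects y
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(0, 0.1, size=500)
feature_names = ["signal", "noise_1", "noise_2"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in score
perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
ranking = sorted(
    zip(feature_names, perm.importances_mean),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the scores are measured on data the tree did not see, a variable that only helped the tree memorize training outliers would tend to receive a permutation importance near zero.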

---
### Bonus Question

**17- When the classification counterpart of the problem was analyzed, the results were decent. However, the regression problem yielded poor results. What do you think is the reason? (20 pts)**