# 2021 Data Science and Machine Learning Pretest

*   This Colab is read-only. Please save a copy of it on your Drive to edit it by going to `Menu > File > Save a copy in Drive`.
*   Rename your Colab in the following format and replace 109423000 with your student ID:
> `Copy 109423000 of Data Science and Machine Learning pretest v21.02.ipynb`
*   You are required to complete this pretest **on your own**.
*   When you have completed the questions below, make sure you turn on the **share/edit/view permission**.


# Question 1: Text preprocessing
Most of the text data acquired through web crawling and review can be noisy. When handling this kind of text data, preprocessing is an important step to ensure the quality of the dataset. There are multiple ways of doing text preprocessing. Below is an example flow of preprocessing text data. 

1. lowercase
2. decontracting
3. remove tags, punctuations, numbers
4. tokenization
5. stopword removal
6. lemmatization
7. stemming


 

## 1-1. Please briefly explain what each step is doing. (30%)


1. Lowercasing standardizes the text into a single format, since 'love' and 'LoVE' are the same word.
2. English has many contractions (abbreviations); for instance, 'aren't' stands for 'are not'. Expanding (decontracting) them into their full forms keeps the text consistent.
3. In many cases we want to remove all non-word characters (e.g. punctuation marks, HTML tags), which is easy to do with regular expressions.
4. The raw data consists of sentences, but the basic unit of processing is the *token*, not the sentence. A *token* is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing.
    - *Sentence tokenization* (also called sentence segmentation) is the problem of dividing a string of written language into its component sentences.
    - *Word tokenization* (also called word segmentation) is the problem of dividing a string of written language into its component words.
5. Stop words are words that are **filtered out** before or after text processing. They are words that do not add much meaning to a sentence and can introduce noise, which is why we remove them. Common stop words: “and”, “the”, “a”.

The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. For instance:
```js
am, are, is => be // map to a common root form
car, cars, car's, cars' => car // strip inflectional endings
```
The result of this mapping of text will be something like:
```
the boy's cars are different colors =>
the boy car be differ color
```

6. Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words (that is, based on linguistic and dictionary definitions), normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
7. Stemmers use language-specific rules, but they require less knowledge than a lemmatizer. Stemming usually refers to a crude heuristic process (guessing the root) that chops off the ends of words, which increases recall while harming precision.
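
To make the "crude heuristic" in step 7 concrete, here is a minimal sketch of a suffix-stripping stemmer. It is a hypothetical toy (`toy_stem` and `SUFFIXES` are names invented for this illustration), not the Porter algorithm or any NLTK API, and it assumes plain English input.

```python
# A toy suffix-stripping stemmer, for illustration only (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical rule list, checked in this order

def toy_stem(word: str) -> str:
    """Chop one common suffix off the end of a word, keeping at least 3 characters."""
    word = word.lower().replace("'", "")  # drop apostrophes: car's -> cars
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["car", "cars", "car's", "cars'", "studies"]:
    print(w, "=>", toy_stem(w))
```

The four `car` variants collapse to the same stem, while `studies` becomes the non-word `studi`. That is the trade-off described above: a stemmer needs no dictionary, but it can produce stems that a lemmatizer would never return.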

References:
- https://nlp.stanford.edu/IR-book/html/htmledition/contents-1.html
- https://towardsdatascience.com/introduction-to-natural-language-processing-for-text-df845750fb63
- https://medium.com/analytics-vidhya/natural-language-processing-for-developers-912ee0fda979 (with github code)
- Tokenization https://www.analyticsvidhya.com/blog/2020/05/what-is-tokenization-nlp/
- Stop words https://medium.com/@saitejaponugoti/stop-words-in-nlp-5b248dadad47#:~:text=In%20computing%2C%20stop%20words%20are,universal%20list%20of%20stop%20words.
- [Youtube | Natural Language Processing In 10 Minutes](https://www.youtube.com/watch?v=5ctbvkAMQO4)

## 1-2. Please use the sample data and do the preprocessing following the provided flow. (70%)

In [1]:
documents = ["Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as 'Jumbo'",
        "I WAS VISITING MY FRIEND NATE THE OTHER MORNING FOR COFFEE , HE CAME OUT OF HIS STORAGE ROOM WITH ( A PACKET OF McCANNS INSTANT IRISH OATMEAL .) HE SUGGESTED THAT I TRY IT FOR MY OWN USE ,IN MY STASH . SOMETIMES NATE DOSE NOT GIVE YOU A CHANCE TO SAY NO , SO I ENDED UP TRYING THE APPLE AND CINN . FOUND IT TO BE VERY TASTEFULL WHEN MADE WITH WATER OR POWDERED MILK . IT GOES GOOD WITH O.J. AND COFFEE AND A SLICE OF TOAST AND YOUR READY TO TAKE ON THE WORLD...OR THE DAY AT LEAST..  JERRY REITH...",
        "I don't know if it's the cactus or the tequila or just the unique combination of ingredients, but the flavour of this hot sauce makes it one of a kind!  We picked up a bottle once on a trip we were on and brought it back home with us and were totally blown away!  When we realized that we simply couldn't find it anywhere in our city we were bummed.<br /><br />Now, because of the magic of the internet, we have a case of the sauce and are ecstatic because of it.<br /><br />If you love hot sauce..I mean really love hot sauce, but don't want a sauce that tastelessly burns your throat, grab a bottle of Tequila Picante Gourmet de Inclan.  Just realize that once you taste it, you will never want to use any other sauce.<br /><br />Thank you for the personal, incredible service!",
        "Product received is as advertised.<br /><br /><a href='http://www.amazon.com/gp/product/B001GVISJM'>Twizzlers, Strawberry, 16-Ounce Bags (Pack of 6)</a>",
        "this was sooooo deliscious but too bad i ate em too fast and gained 2 pds! my fault",
        "Deal was awesome!  Arrived before Halloween as indicated and was enough to satisfy trick or treaters.  I love the quality of this product and it was much less expensive than the local store's candy.",
        "I love these.........very tasty!!!!!!!!!!!  Infact, I think I am addicted to them.<br />Buying them in packs of 6 bags - is very reasonable than going to Target and getting a bag.  Savings are about a $1.00 a bag.  I use subscribe and save on these and several other product.  I love subscribe and save!!!!!!!!!!!",
        "I LOVE spicy ramen, but for whatever reasons this thing burns my stomach badly and the burning sensation doesn't go away for like 3 hours! Not sure if that is healthy or not .... and you can buy this at Walmart for $0.28, way cheaper than Amazon.",
        "Makes a tasty, super easy meal, fast. BUT high in calories.<br /><br />The instructions say to saute the veggies first but I recommend cooking the chicken first. The chicken takes longer to cook and the raw chicken ontop of veggies just makes a slimy mess. I made it with snow peas and carrots only. I dont like the little corn.  Added some red pepper flakes for heat and served ontop of rice.  It came out wonderful! Dinner on the table in less than 30mins.",
        "Love this sugar.  I also get muscavado sugar and they are both great to use in place of regular white sugar. Recommend!",
        "This is just Fantastic Chicken Noodle soup, the best I have ever eaten, with large hearty chunks of chicken,and vegetables and nice large noodles. This soup is just so full bodied, and is seasoned just right.  I am so glad Amazon carries this product.  I just can't find it here in Vermont."]

In [17]:
import pandas as pd
df= pd.DataFrame(documents,columns=['sentence'])
df

Unnamed: 0,sentence
0,Product arrived labeled as Jumbo Salted Peanut...
1,I WAS VISITING MY FRIEND NATE THE OTHER MORNIN...
2,I don't know if it's the cactus or the tequila...
3,Product received is as advertised.<br /><br />...
4,this was sooooo deliscious but too bad i ate e...
5,Deal was awesome! Arrived before Halloween as...
6,I love these.........very tasty!!!!!!!!!!! In...
7,"I LOVE spicy ramen, but for whatever reasons t..."
8,"Makes a tasty, super easy meal, fast. BUT high..."
9,Love this sugar. I also get muscavado sugar a...


In [43]:
import re

# decontracting
def decontract(phrase):
    # specific
    phrase = re.sub(r"won\'t", "will not", phrase)
    phrase = re.sub(r"can\'t", "can not", phrase)

    # general
    phrase = re.sub(r"n\'t", " not", phrase)
    phrase = re.sub(r"\'re", " are", phrase)
    phrase = re.sub(r"\'s", " is", phrase)
    phrase = re.sub(r"\'d", " would", phrase)
    phrase = re.sub(r"\'ll", " will", phrase)
    phrase = re.sub(r"\'t", " not", phrase)
    phrase = re.sub(r"\'ve", " have", phrase)
    phrase = re.sub(r"\'m", " am", phrase)
    return phrase

# remove characters that are neither alphanumeric nor whitespace (punctuation marks)
def text(phrase):
    # In Python, \W is equivalent to [^a-zA-Z0-9_] for ASCII text; \s is added to the negated class so whitespace is kept
    phrase = re.sub(r'[^a-zA-Z0-9_\s]+', '', phrase)
    return phrase

# stopword removal & tokenization
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# nltk.download('stopwords') # must download at first execution
# nltk.download('punkt')

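# Build the stop word set once; calling stopwords.words() for every token is slow
STOP_WORDS = set(stopwords.words('english'))

def tokenize_rm_stopwords(phrase):
    text_tokens = word_tokenize(phrase)
    tokens_without_sw = [word for word in text_tokens if word not in STOP_WORDS]
    return tokens_without_sw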

# stem/lemmatize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import wordnet
# nltk.download('wordnet') 
lemmatizer = WordNetLemmatizer()

def lemmatize(token_lst):
    return " ".join([lemmatizer.lemmatize(word) for word in token_lst])
    


[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Weber\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\wordnet.zip.
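
The `text` function above deletes punctuation but leaves digits and the contents of HTML tags in place, and it can fuse words that were separated only by punctuation. The sketch below is one possible variant that removes tags and numbers as well. The name `clean_text` is hypothetical, and the tag pattern is a simple assumption that will not handle every kind of HTML. Decontracting should still run first, because this step drops apostrophes.

```python
import re

def clean_text(phrase: str) -> str:
    """Remove HTML tags, digits and punctuation, keeping words separated by single spaces."""
    phrase = re.sub(r"<[^>]+>", " ", phrase)       # replace each tag such as <br /> with a space
    phrase = re.sub(r"[^a-zA-Z\s]+", " ", phrase)  # replace runs of non-letters with a space
    return re.sub(r"\s+", " ", phrase).strip()     # collapse repeated whitespace

sample = "Product received is as advertised.<br /><br />Twizzlers, 16-Ounce Bags (Pack of 6)"
print(clean_text(sample))
```

Replacing with a space rather than an empty string keeps `advertised` and `Twizzlers` as two separate tokens.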


In [46]:
# Unit test
sample_text = "Oh man, this's pretty cool. We will do more such things."
res = decontract(sample_text)
res = text(res)
print(res)
res = tokenize_rm_stopwords(res) # note: stopword removal drops a lot of words here
print(res)
res = lemmatize(res)

res

Oh man this is pretty cool We will do more such things
['Oh', 'man', 'pretty', 'cool', 'We', 'things']


'Oh man pretty cool We thing'

In [47]:
# combine the steps into one function; the order of the steps must not change
def process_script(phrase):
    phrase = phrase.lower()
    res = decontract(phrase)
    res = text(res)
    res = tokenize_rm_stopwords(res)
    res = lemmatize(res)
    return res


In [48]:
# show results
df['processed'] = df.apply(lambda row: process_script(row['sentence']), axis = 1)
df

Unnamed: 0,sentence,processed
0,Product arrived labeled as Jumbo Salted Peanut...,product arrived labeled jumbo salted peanutsth...
1,I WAS VISITING MY FRIEND NATE THE OTHER MORNIN...,visiting friend nate morning coffee came stora...
2,I don't know if it's the cactus or the tequila...,know cactus tequila unique combination ingredi...
3,Product received is as advertised.<br /><br />...,product received advertisedbr br hrefhttpwwwam...
4,this was sooooo deliscious but too bad i ate e...,sooooo deliscious bad ate em fast gained 2 pd ...
5,Deal was awesome! Arrived before Halloween as...,deal awesome arrived halloween indicated enoug...
6,I love these.........very tasty!!!!!!!!!!! In...,love thesevery tasty infact think addicted the...
7,"I LOVE spicy ramen, but for whatever reasons t...",love spicy ramen whatever reason thing burn st...
8,"Makes a tasty, super easy meal, fast. BUT high...",make tasty super easy meal fast high caloriesb...
9,Love this sugar. I also get muscavado sugar a...,love sugar also get muscavado sugar great use ...


# Question 2: DataFrame handling



*   Please download the datasets from the following link, https://www.kaggle.com/aaron7sun/stocknews
*   Save the downloaded files on your own drive and load the file for later use.
- There are two channels of data provided in this dataset:

  - **News data:** Crawled historical news headlines from the Reddit WorldNews Channel. They are ranked by Reddit users' votes, and only the top 25 headlines are considered for a single date. (Range: 2008-06-08 to 2016-07-01)

  - **Stock data:** Dow Jones Industrial Average (DJIA) is used to "prove the concept". (Range: 2008-08-08 to 2016-07-01)



In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
import pandas as pd
corpus_root = 'drive/My Drive/Colab Notebooks/datasets/' 

In [1]:
# load local file (optional)
# date type : <class 'pandas._libs.tslibs.timestamps.Timestamp'>
import pandas as pd
news_df = pd.read_csv('RedditNews.csv', parse_dates =["Date"])
stock_df = pd.read_csv('upload_DJIA_table.csv', parse_dates =["Date"], index_col ="Date")

In [2]:
news_df = pd.read_csv(corpus_root+'RedditNews.csv')
stock_df = pd.read_csv(corpus_root+'upload_DJIA_table.csv')

NameError: name 'corpus_root' is not defined

In [3]:
news_df.head()

Unnamed: 0,Date,News
0,2016-07-01,A 117-year-old woman in Mexico City finally re...
1,2016-07-01,IMF chief backs Athens as permanent Olympic host
2,2016-07-01,"The president of France says if Brexit won, so..."
3,2016-07-01,British Man Who Must Give Police 24 Hours' Not...
4,2016-07-01,100+ Nobel laureates urge Greenpeace to stop o...


In [4]:
stock_df.head(15)

Date,Open,High,Low,Close,Volume,Adj Close
2016-07-01,17924.240234,18002.380859,17916.910156,17949.369141,82160000,17949.369141
2016-06-30,17712.759766,17930.609375,17711.800781,17929.990234,133030000,17929.990234
2016-06-29,17456.019531,17704.509766,17456.019531,17694.679688,106380000,17694.679688
2016-06-28,17190.509766,17409.720703,17190.509766,17409.720703,112190000,17409.720703
2016-06-27,17355.210938,17355.210938,17063.080078,17140.240234,138740000,17140.240234
2016-06-24,17946.630859,17946.630859,17356.339844,17400.75,239000000,17400.75
2016-06-23,17844.109375,18011.070312,17844.109375,18011.070312,98070000,18011.070312
2016-06-22,17832.669922,17920.160156,17770.359375,17780.830078,89440000,17780.830078
2016-06-21,17827.330078,17877.839844,17799.800781,17829.730469,85130000,17829.730469
2016-06-20,17736.869141,17946.359375,17736.869141,17804.869141,99380000,17804.869141



## 2-1. Define a *function* to determine the weekly (7 days) Stock **Close value** trend. (80%)
- Implement a *function* that determines the weekly (7 days) Stock **Close value** trend using the Stock data and **then record the trend in the News data in a new column "label" with 0, 1 and -1**. For example, 
  - On 2016-06-13, the market closed at 17732.480469. Seven days later, 2016-06-20, the market closed **higher** at 17804.869141. In this scenario, all entries corresponding to 2016-06-13 in the News data will be **marked 1** in the "label" column. Dates 2016-06-14 and 2016-06-15 will also be marked 1 because 2016-06-21 and 2016-06-22 closed higher, respectively.
  - On the other hand, 2016-06-17 will be **marked 0** because 2016-06-24 (7 days later) closed **lower**.
  - If a given date does not have a corresponding date for 7 days later, the given date will be **marked -1**.




In [5]:
import datetime
def label(date, close):
  '''
  Please answer here. You may add any parameters you want,
  but do not import other libraries; this notebook already provides all the libraries you need.
  Remember that the evaluation criteria are:
    1. Time Complexity (80%)
    2. Program Logic (10%)
    3. Creativity (10%)
  '''
  date_7d = date + pd.Timedelta(7, unit="d")

  try:
    # find 7 days later, mark rises(1) and falls(0)
    close_7d = stock_df.loc[date_7d]['Close']
    price_delta = close_7d - close
    # print(str(date)+" -> "+str(date_7d)+", delta= "+str(price_delta))
    return 1 if (price_delta > 0) else 0
  
  except KeyError:
    # no closing value exists 7 days later (e.g. a market holiday or past the end of the data), so mark -1
    # print("except: "+str(date_7d.date()))
    return -1

In [6]:
'''
Then apply your defined function to stock_df here
and store the returned results in a new column named "Label".

'''
label_lst = []
for index, row in stock_df.iterrows():
    label_lst.append(label(index, row['Close']))
    # print(index, row['Close'])

# label_lst
# stock_df
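
Since time complexity carries most of the weight in the evaluation, it may be worth comparing the row-by-row loop above with a vectorized lookup. The sketch below is one possible approach, not the expected answer. `weekly_trend_labels` is a hypothetical helper that assumes the closes are a Series with a unique `DatetimeIndex`. The toy data uses three closing values quoted in this notebook.

```python
import numpy as np
import pandas as pd

def weekly_trend_labels(close: pd.Series) -> pd.Series:
    """Label each date 1 (higher), 0 (not higher) or -1 (no data) versus the close 7 days later."""
    # Look up the close 7 calendar days after every date in a single reindex call;
    # dates with no row 7 days later come back as NaN.
    future = close.reindex(close.index + pd.Timedelta(days=7)).to_numpy()
    current = close.to_numpy()
    labels = np.where(np.isnan(future), -1, (future > current).astype(int))
    return pd.Series(labels, index=close.index, name="Label")

# Toy data: closes for 2016-06-13, 2016-06-20 and 2016-06-27 as quoted in this notebook
toy_close = pd.Series(
    [17732.480469, 17804.869141, 17140.240234],
    index=pd.to_datetime(["2016-06-13", "2016-06-20", "2016-06-27"]),
)
toy_labels = weekly_trend_labels(toy_close)
print(toy_labels.tolist())  # [1, 0, -1]
```

Under the same assumptions, `stock_df['Label'] = weekly_trend_labels(stock_df['Close'])` would replace the loop, since `stock_df` was loaded with `Date` as its index.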

In [9]:
stock_df['Label'] = label_lst
stock_df.tail(20)

Date,Open,High,Low,Close,Volume,Adj Close,Label
2008-09-05,11185.629883,11245.150391,11037.849609,11220.959961,198300000,11220.959961,1
2008-09-04,11532.480469,11532.480469,11176.019531,11188.230469,229200000,11188.230469,1
2008-09-03,11506.009766,11554.379883,11416.530273,11532.879883,174250000,11532.879883,0
2008-09-02,11545.629883,11790.169922,11471.900391,11516.919922,177090000,11516.919922,0
2008-08-29,11713.230469,11713.230469,11543.389648,11543.959961,166910000,11543.959961,0
2008-08-28,11499.870117,11715.179688,11499.790039,11715.179688,149150000,11715.179688,0
2008-08-27,11412.459961,11554.459961,11381.769531,11502.509766,120580000,11502.509766,1
2008-08-26,11383.55957,11436.240234,11340.410156,11412.870117,119800000,11412.870117,1
2008-08-25,11626.19043,11626.269531,11362.629883,11386.25,148610000,11386.25,-1
2008-08-22,11426.790039,11632.129883,11426.790039,11628.05957,138790000,11628.05957,0


## 2-2. Map your label to news data (20%)

- join dataframes https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html
- fillna https://www.geeksforgeeks.org/replace-nan-values-with-zeros-in-pandas-dataframe/

In [16]:
def news_label(news_df, stock_df):
  '''
  Map your close value trend label to the news data by date, and also name the column "Label".
  If a date does not appear in the stock data, also label it as -1.
  e.g. all news on 2016-06-30 will be labeled -1
  '''
  # Create new dataframe with labels
  df = news_df.join(stock_df, on='Date')
  
  # Dates missing from the stock data have no label after the join; mark them -1
  df['Label'] = df['Label'].fillna(-1) 
  return df[["Date", "News", "Label"]]
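
As an alternative to joining the whole stock table, the labels alone can be looked up by date with `Series.map`. This is a minimal sketch on hypothetical toy frames: the headlines are placeholders and `toy_news` / `toy_labels` are names invented here. Casting to `int` avoids the `-1.0` / `0.0` floats that appear after `fillna`.

```python
import pandas as pd

# Hypothetical toy frames; the headlines are placeholders, not real data
toy_news = pd.DataFrame({
    "Date": pd.to_datetime(["2016-06-13", "2016-06-23", "2016-06-25"]),
    "News": ["headline A", "headline B", "headline C"],
})
toy_labels = pd.Series(
    [1, 0],
    index=pd.to_datetime(["2016-06-13", "2016-06-23"]),
    name="Label",
)

# Look up each news date in the label index; dates without a label become NaN, then -1
toy_news["Label"] = toy_news["Date"].map(toy_labels).fillna(-1).astype(int)
print(toy_news["Label"].tolist())  # [1, 0, -1]
```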


In [19]:
res = news_label(news_df, stock_df)
res.head(10)

Unnamed: 0,Date,News,Label
0,2016-07-01,A 117-year-old woman in Mexico City finally re...,-1.0
1,2016-07-01,IMF chief backs Athens as permanent Olympic host,-1.0
2,2016-07-01,"The president of France says if Brexit won, so...",-1.0
3,2016-07-01,British Man Who Must Give Police 24 Hours' Not...,-1.0
4,2016-07-01,100+ Nobel laureates urge Greenpeace to stop o...,-1.0
5,2016-07-01,Brazil: Huge spike in number of police killing...,-1.0
6,2016-07-01,Austria's highest court annuls presidential el...,-1.0
7,2016-07-01,"Facebook wins privacy case, can track any Belg...",-1.0
8,2016-07-01,Switzerland denies Muslim girls citizenship af...,-1.0
9,2016-07-01,China kills millions of innocent meditators fo...,-1.0


In [20]:
res.tail(10)

Unnamed: 0,Date,News,Label
73598,2008-06-08,"b""S. Korean protesters, police clash in beef r...",-1.0
73599,2008-06-08,"b""Oil reserves 'will last decades' - a BBC Sco...",-1.0
73600,2008-06-08,b'Cameras designed to detect terrorist facial ...,-1.0
73601,2008-06-08,b'Israeli peace activists protest 41 years of ...,-1.0
73602,2008-06-08,"b""A 5.1 earthquake hits China's Southern Qingh...",-1.0
73603,2008-06-08,b'Man goes berzerk in Akihabara and stabs ever...,-1.0
73604,2008-06-08,b'Threat of world AIDS pandemic among heterose...,-1.0
73605,2008-06-08,b'Angst in Ankara: Turkey Steers into a Danger...,-1.0
73606,2008-06-08,"b""UK: Identity cards 'could be used to spy on ...",-1.0
73607,2008-06-08,"b'Marriage, they said, was reduced to the stat...",-1.0


In [22]:
res.iloc[200:210]

Unnamed: 0,Date,News,Label
200,2016-06-23,Today The United Kingdom decides whether to re...,0.0
201,2016-06-23,"E-cigarettes should not be banned in public, m...",0.0
202,2016-06-23,Report: China is still harvesting organs from ...,0.0
203,2016-06-23,"Man opens fire at cinema complex in Germany, s...",0.0
204,2016-06-23,"Erdoan: Europe, you dont want us because were ...",0.0
205,2016-06-23,Asian millionaires now control more wealth tha...,0.0
206,2016-06-23,A Japanese porn industry association has apolo...,0.0
207,2016-06-23,University students are being warned when clas...,0.0
208,2016-06-23,Afghan interpreters 'betrayed' by UK and US,0.0
209,2016-06-23,Contagious cancer cells are spreading between ...,0.0


# Question 3: Compute cosine similarity of TF-IDF (term frequency–inverse document frequency)
- **Cosine similarity** is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
- **TF-IDF** is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.








## 3-1. Please explain why cosine similarity can be used to compare TF-IDF representations. Are there other representations that can also be compared with cosine similarity? (20%)

## 3-2. Define a converting function to compute the tf-idf vectors from a list of documents. (40%)

In [None]:
documents = ['terrible service this time','terrible terrible service','most terrible service','terrible service and experience','what a terrible service','so terrible service experience','what a terrible disappointment','what a terrible place','this time it was so horrible','the staff was horrible']

In [None]:
'''
Answer here
You can define other functions to support the defined function if you need.

The resulting TF-IDF DataFrame should look like the following table.
'''
import numpy as np
import math
def computTFIDF(documents):

  return 
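
The values in the expected table below are consistent with one common formulation: tf = (count of the term in the document) / (document length) and idf = log10(N / number of documents containing the term). The sketch below follows that assumption. `tfidf_sketch` is a hypothetical name rather than the required `computTFIDF`, and splitting on whitespace is assumed to be a sufficient tokenizer for these sentences.

```python
import math
from collections import Counter

def tfidf_sketch(docs):
    """Return (vocabulary, rows) where rows[i][j] is the tf-idf of vocabulary[j] in docs[i]."""
    tokenized = [doc.split() for doc in docs]
    # Vocabulary in order of first appearance, to match the column order of the table
    vocab = []
    for tokens in tokenized:
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(tok for tokens in tokenized for tok in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([
            counts[t] / len(tokens) * math.log10(n_docs / doc_freq[t])
            for t in vocab
        ])
    return vocab, rows

sample_docs = ['terrible service this time', 'terrible terrible service', 'most terrible service',
               'terrible service and experience', 'what a terrible service',
               'so terrible service experience', 'what a terrible disappointment',
               'what a terrible place', 'this time it was so horrible', 'the staff was horrible']
vocab, rows = tfidf_sketch(sample_docs)
print(vocab[:4], [round(v, 6) for v in rows[0][:4]])
```

Passing `rows` and `vocab` to `pd.DataFrame(rows, columns=vocab)` would give a table in the same layout as the expected output.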

In [None]:
import pandas as pd
tf_idf_list = computTFIDF(documents)
df = pd.DataFrame(tf_idf_list)
df

Unnamed: 0,terrible,service,this,time,most,and,experience,what,a,so,disappointment,place,it,was,horrible,the,staff
0,0.024228,0.055462,0.174743,0.174743,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.064607,0.07395,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.032303,0.07395,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.024228,0.055462,0.0,0.0,0.0,0.25,0.174743,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.024228,0.055462,0.0,0.0,0.0,0.0,0.0,0.13072,0.13072,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.024228,0.055462,0.0,0.0,0.0,0.0,0.174743,0.0,0.0,0.174743,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.024228,0.0,0.0,0.0,0.0,0.0,0.0,0.13072,0.13072,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
7,0.024228,0.0,0.0,0.0,0.0,0.0,0.0,0.13072,0.13072,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.116495,0.116495,0.0,0.0,0.0,0.0,0.0,0.116495,0.0,0.0,0.166667,0.116495,0.116495,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.174743,0.174743,0.25,0.25


## 3-3. Define a scoring function to compute the cosine similarity between two input vectors. (30%)

In [None]:
'''
Answer here
Return the cosine similarity between the two given vectors.
Apply the function you designed to all sentences, and show your scoring results as in the following table.
'''
def cosine_sim(vec_a, vec_b):

  return score
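
Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. The sketch below implements that definition in plain Python. `cosine_sketch` is a hypothetical name, and returning 0.0 for an all-zero vector is an assumption made here, since the measure is undefined in that case. The example vectors are the non-zero TF-IDF entries of documents 0 and 1 from the table in 3-2.

```python
import math

def cosine_sketch(vec_a, vec_b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)

# First four columns (terrible, service, this, time) of documents 0 and 1
doc0 = [0.024228, 0.055462, 0.174743, 0.174743]
doc1 = [0.064607, 0.07395, 0.0, 0.0]
print(round(cosine_sketch(doc0, doc1), 3))
```

The result agrees with the 0.226813 entry for documents 0 and 1 in the cross-comparison table to about three decimal places.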

## 3-4. Show the cross-comparison table for the given sentences. (10%)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,1.0,0.226813,0.055972,0.046299,0.074014,0.056587,0.007397,0.007397,0.517451,0.0
1,0.226813,1.0,0.224349,0.185576,0.296664,0.226813,0.051111,0.051111,0.0,0.0
2,0.055972,0.224349,1.0,0.045796,0.073209,0.055972,0.007317,0.007317,0.0,0.0
3,0.046299,0.185576,0.045796,1.0,0.060557,0.432244,0.006053,0.006053,0.0,0.0
4,0.074014,0.296664,0.073209,0.060557,1.0,0.074014,0.57302,0.57302,0.0,0.0
5,0.056587,0.226813,0.055972,0.432244,0.074014,1.0,0.007397,0.007397,0.258725,0.0
6,0.007397,0.051111,0.007317,0.006053,0.57302,0.007397,1.0,0.357407,0.0,0.0
7,0.007397,0.051111,0.007317,0.006053,0.57302,0.007397,0.357407,1.0,0.0,0.0
8,0.517451,0.0,0.0,0.0,0.0,0.258725,0.0,0.0,1.0,0.305206
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.305206,1.0
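
A table like the one above can also be computed in one step by normalizing each row to unit length and multiplying the matrix by its transpose. This is a sketch with NumPy on a small hypothetical matrix, not on the TF-IDF vectors themselves, and `pairwise_cosine` is a name invented for this illustration.

```python
import numpy as np

def pairwise_cosine(matrix):
    """Return the square matrix of cosine similarities between all rows."""
    X = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # assumption: leave all-zero rows as zeros instead of dividing by 0
    unit = X / norms         # every non-zero row now has length 1
    return unit @ unit.T     # dot products of unit vectors are cosines

toy_matrix = [[1, 0], [0, 1], [1, 1]]  # hypothetical vectors
sim = pairwise_cosine(toy_matrix)
print(np.round(sim, 3))
```

The diagonal is 1 because every non-zero vector is identical to itself, and the matrix is symmetric, matching the structure of the table above.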
