# Doctor and Veterinary Classification using NLP

This notebook builds a model that classifies a given set of Reddit users as practicing doctors, practicing veterinarians, or others, based on each user's comments

The dataset for this task is sourced from a database whose link is given as

[postgresql://niphemi.oyewole:W7bHIgaN1ejh@ep-delicate-river-a5cq94ee-pooler.us-east-2.aws.neon.tech/Vetassist?statusColor=F8F8F8&env=&name=redditors%20db&tLSMode=0&usePrivateKey=false&safeModeLevel=0&advancedSafeModeLevel=0&driverVersion=0&lazyload=false](postgresql://niphemi.oyewole:W7bHIgaN1ejh@ep-delicate-river-a5cq94ee-pooler.us-east-2.aws.neon.tech/Vetassist?statusColor=F8F8F8&env=&name=redditors%20db&tLSMode=0&usePrivateKey=false&safeModeLevel=0&advancedSafeModeLevel=0&driverVersion=0&lazyload=false)

However, trying to access the database with the given link results in errors

Therefore, a modified version of the link is used

## Module Importations and Data Retrieval

Before continuing, the required libraries are imported below

In [1]:
import re             # for regex operations
import string         # for removing punctuation
import numpy as np    # for mathematical calculations
import pandas as pd    # for working with structured data (dataframes)
from sqlalchemy import create_engine # for connecting to database
from nltk.tokenize import word_tokenize
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.initializers import Constant
from sklearn.model_selection import train_test_split

In [2]:
from tensorflow.keras.preprocessing.text import Tokenizer
from autocorrect import Speller
from nltk.corpus import stopwords

In [81]:
import random

The modified link to access the database is defined below

In [3]:
# # define the connection link
# conn_str = "postgresql://niphemi.oyewole:endpoint=ep-delicate-river-a5cq94ee-pooler;W7bHIgaN1ejh@ep-delicate-river-a5cq94ee-pooler.us-east-2.aws.neon.tech/Vetassist?sslmode=allow"

# # create connection to the database
# engine =  create_engine(conn_str)
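
Hard-coding the password in the notebook exposes it to anyone who reads the file. A minimal sketch of one alternative, assuming the credentials are supplied through environment variables (the names `PGUSER`, `PGPASSWORD`, `PGHOST` and `PGDATABASE`, the placeholder defaults and the helper `build_conn_str` are illustrative assumptions, not part of the original notebook), builds the connection string at run time:

```python
import os
from urllib.parse import quote_plus


def build_conn_str(user, password, host, database, sslmode="allow"):
    """Assemble a PostgreSQL URL, percent-encoding the user and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?sslmode={sslmode}"
    )


# Hypothetical environment variable names; placeholders are used when unset.
conn_str = build_conn_str(
    user=os.environ.get("PGUSER", "example_user"),
    password=os.environ.get("PGPASSWORD", "example_password"),
    host=os.environ.get("PGHOST", "localhost"),
    database=os.environ.get("PGDATABASE", "example_db"),
)
# engine = create_engine(conn_str)  # same call as in the cell above
```

Encoding the password matters when it contains characters such as `@` or `/`, which would otherwise be read as part of the URL structure.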

First, let's take a look at the tables in the database

In [4]:
# define sql query for retrieving the tables in the database
sql_for_tables = """
SELECT
    table_schema || '.' || table_name
FROM
    information_schema.tables
WHERE
    table_type = 'BASE TABLE'
AND
    table_schema NOT IN ('pg_catalog', 'information_schema');
"""

In [5]:
# # retrieve the tables in a dataframe
# tables_df = pd.read_sql_query(sql_for_tables, engine)

In [6]:
# tables_df

There are two tables in the database as shown above

Each table would be saved in a pandas dataframe

In [7]:
sql_for_table1 = """
SELECT
    *
FROM
    public.reddit_usernames_comments;
"""

> Note: The code below may take a while to run. If it fails, reconnect the engine above then rerun the cell

In [8]:
# user_comment_df = pd.read_sql_query(sql_for_table1, engine)

Let's save the table as a CSV file so that it can be reloaded without querying the database again

In [9]:
user_comment_df = pd.read_csv("reddit_usernames_comments.csv")

In [10]:
# user_comment_df.to_csv('reddit_usernames_comments.csv', index=False)

In [11]:
sql_for_table2 = """
SELECT
    *
FROM
    public.reddit_usernames;
"""

> Note: The code below may take a while to run. If it fails, reconnect the engine above then rerun the cell

In [12]:
# user_info_df = pd.read_sql_query(sql_for_table2, engine)

In [13]:
user_info_df = pd.read_csv("reddit_usernames.csv")

Let's save the table as a CSV file so that it can be reloaded without querying the database again

In [14]:
# user_info_df.to_csv('reddit_usernames.csv', index=False)

Let's take a look at the tables one after the other

In [15]:
user_comment_df.head()

Unnamed: 0.1,Unnamed: 0,username,comments
0,0,LoveAGoodTwist,"Female, Kentucky. 4 years out. Work equine on..."
1,1,wahznooski,"As a woman of reproductive age, fuck Texas|As ..."
2,2,Churro_The_fish_Girl,what makes you want to become a vet?|what make...
3,3,abarthch,"I see of course there are changing variables, ..."
4,4,VoodooKing,I have 412+ and faced issues because wireguard...


In [16]:
user_comment_df = user_comment_df.drop(columns="Unnamed: 0")

In [17]:
user_comment_df.head()

Unnamed: 0,username,comments
0,LoveAGoodTwist,"Female, Kentucky. 4 years out. Work equine on..."
1,wahznooski,"As a woman of reproductive age, fuck Texas|As ..."
2,Churro_The_fish_Girl,what makes you want to become a vet?|what make...
3,abarthch,"I see of course there are changing variables, ..."
4,VoodooKing,I have 412+ and faced issues because wireguard...


In [18]:
user_comment_df.shape

(3276, 2)

In [19]:
user_info_df.head()

Unnamed: 0.1,Unnamed: 0,username,isused,subreddit,created_at
0,0,LoveAGoodTwist,True,Veterinary,2024-05-02
1,1,drawntage,True,Veterinary,2024-05-02
2,2,LinkPast84,True,Veterinary,2024-05-02
3,3,heatthequestforfire,True,Veterinary,2024-05-02
4,4,Most-Exit-5507,True,Veterinary,2024-05-02


In [20]:
user_info_df = user_info_df.drop(columns="Unnamed: 0")

In [21]:
user_info_df.head()

Unnamed: 0,username,isused,subreddit,created_at
0,LoveAGoodTwist,True,Veterinary,2024-05-02
1,drawntage,True,Veterinary,2024-05-02
2,LinkPast84,True,Veterinary,2024-05-02
3,heatthequestforfire,True,Veterinary,2024-05-02
4,Most-Exit-5507,True,Veterinary,2024-05-02


In [22]:
user_info_df.shape

(8259, 4)

## Data Exploration

The first table (now the dataframe `user_comment_df`) contains usernames and each user's comments

Let's look at a comment in order to understand how it is structured

In [23]:
# print all comments for first user
user_comment_df["comments"][0]

'Female, Kentucky.  4 years out. Work equine only private practice. Base salary $85k plus bonuses/production which was $20k 2023. 6 days a week Jan-June/July then variable in the off season. No limit on PTO - took ~5 weeks last year. One paid conference a year (registration/travel/ 1/2 hotel/ transportation) or online CE program. All licensures & professional group fees covered. Cell phone allowance and mileage reimbursement.|Female, Kentucky.  4 years out. Work equine only private practice. Base salary $85k plus bonuses/production which was $20k 2023. 6 days a week Jan-June/July then variable in the off season. No limit on PTO - took ~5 weeks last year. One paid conference a year (registration/travel/ 1/2 hotel/ transportation) or online CE program. All licensures & professional group fees covered. Cell phone allowance and mileage reimbursement.|Female, Kentucky.  4 years out. Work equine only private practice. Base salary $85k plus bonuses/production which was $20k 2023. 6 days a wee

In [24]:
# split comments into individual comments
first_comments = user_comment_df["comments"][0].split("|")

# get the number of comments for first user
len(first_comments)

16

In [25]:
# remove repeated comments
unique_comment = []
for comment in first_comments:
    if comment in unique_comment:
        continue
    else:
        unique_comment.append(comment)

In [26]:
print(f"Length of unique comments for first user: {len(unique_comment)}")
print()
print(unique_comment)

Length of unique comments for first user: 1

['Female, Kentucky.  4 years out. Work equine only private practice. Base salary $85k plus bonuses/production which was $20k 2023. 6 days a week Jan-June/July then variable in the off season. No limit on PTO - took ~5 weeks last year. One paid conference a year (registration/travel/ 1/2 hotel/ transportation) or online CE program. All licensures & professional group fees covered. Cell phone allowance and mileage reimbursement.']


It can be seen that the comment column contains multiple comments separated with "|"

It can also be seen that there are repeated comments
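
The two observations above (a `|` separator and repeated comments) can be handled in one step. The following is a minimal sketch, not the notebook's own implementation: the helper name `split_unique_comments` is hypothetical, and it additionally drops empty fragments, which is an assumption about the desired behaviour.

```python
def split_unique_comments(raw, sep="|"):
    """Split a delimited comment string and drop repeats, keeping first-seen order."""
    parts = [part.strip() for part in raw.split(sep)]
    # dict.fromkeys keeps insertion order, so duplicates collapse to the first occurrence
    return list(dict.fromkeys(part for part in parts if part))


example = "Get rid of windows|I'm getting this too|Get rid of windows"
print(split_unique_comments(example))
# ['Get rid of windows', "I'm getting this too"]
```

Using a dictionary for the membership test keeps the deduplication linear in the number of comments, whereas checking membership in a list grows quadratically for users with many comments.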

Let's check for missing values

In [27]:
user_comment_df.isna().sum()

username    1
comments    0
dtype: int64

In [28]:
user_comment_df[user_comment_df["username"].isna() == True]

Unnamed: 0,username,comments
23,,[deleted]|[deleted]|[deleted]|[deleted]|[delet...


In [29]:
# assign through a single .loc call; chained indexing may write to a temporary copy
user_comment_df.loc[23, "username"] = "None"

In [30]:
user_comment_df.iloc[23]

username                                                 None
comments    [deleted]|[deleted]|[deleted]|[deleted]|[delet...
Name: 23, dtype: object

In [31]:
user_comment_df.iloc[23]["comments"]

'[deleted]|[deleted]|[deleted]|[deleted]|[deleted]|[deleted]|[deleted]|[removed]|Can I ask a question about really basic vetmed certification? I’m in an area that has a serious shortage of emergency trained vets, so much so that there’s been a pivot to regular vets not doing emergency triage, and not being able to recognize emergencies. \n\nIs there a basic certification that’s available so that pet owners can know when it’s time for the ER?|[deleted]|[deleted]|I agree with some of the below threads. Pay varies from state and I’ve also found big cities tend to pay more than hospitals in burbs or rural areas. For instance, I’m not certified and as a tech in Boston, MA I make $27/hour but in Chicago, IL I made $23/hour. That being said, I live with my boyfriend and having dual incomes is honestly the only way I can afford to live.\n\nI know moving for a job is a big thing consider but maybe not a bad idea to see what’s out there. I’ve also learned to not be afraid to advocate for yoursel

In [32]:
user_comment_df[user_comment_df["username"] == "None"]

Unnamed: 0,username,comments
23,,[deleted]|[deleted]|[deleted]|[deleted]|[delet...


There are no missing values left

Let's check if there are duplicate usernames

In [33]:
if user_comment_df["username"].nunique() == user_comment_df.shape[0]:
    print("There are no duplicated usernames")
else:
    print("There are duplicated usernames")

There are no duplicated usernames


Let's explore the second dataframe as well

In [34]:
user_info_df.head()

Unnamed: 0,username,isused,subreddit,created_at
0,LoveAGoodTwist,True,Veterinary,2024-05-02
1,drawntage,True,Veterinary,2024-05-02
2,LinkPast84,True,Veterinary,2024-05-02
3,heatthequestforfire,True,Veterinary,2024-05-02
4,Most-Exit-5507,True,Veterinary,2024-05-02


In [35]:
user_info_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8259 entries, 0 to 8258
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   username    8258 non-null   object
 1   isused      8259 non-null   bool  
 2   subreddit   8259 non-null   object
 3   created_at  8259 non-null   object
dtypes: bool(1), object(3)
memory usage: 201.8+ KB


From the summary above, we see that `username` has 8258 non-null values out of 8259 entries, so one username is missing; the other features have no missing values

Let's check if there are duplicate usernames

In [36]:
if user_info_df["username"].nunique() == user_info_df.shape[0]:
    print("There are no duplicated usernames")
else:
    print("There are duplicated usernames")

There are duplicated usernames


In [37]:
user_info_df["username"].nunique()

8258

In [38]:
user_info_df.shape[0]

8259

In [39]:
user_info_df[user_info_df["username"].duplicated() == True]

Unnamed: 0,username,isused,subreddit,created_at


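The filter above returns no rows, so there are no real duplicates: the mismatch between 8258 unique usernames and 8259 rows comes from the single missing username, which `nunique()` does not count

At this point, let's create functions to preprocess the comments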

## Data Preprocessing

Let's define functions to clean the dataset

In [40]:
def remove_web_link(text):
    text_list = text.split("|")
    for i in range(len(text_list)):
        text_list[i] = re.sub(r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+",
                              "", text_list[i].strip())
    return " | ".join(text_list)

In [41]:
def remove_directories(text):
    text_list = text.split("|")
    for i in range(len(text_list)):
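        # NOTE: the "." in the last group is unescaped, so it matches any character and the
        # pattern also swallows plain words that follow a path (e.g. "/July then variable").
        # The outputs below were produced with this behaviour; use r"\." to match only a literal dot.
        text_list[i] = re.sub(r"(/[a-zA-Z0-9_]+)+(/)*(.[a-zA-Z_]+)*",
                              "", text_list[i]).strip()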
    return " | ".join(text_list)
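
To see what the path-removal pattern actually consumes, the sketch below compares it with a variant in which the dot is escaped. The `ESCAPED` pattern is an illustrative alternative, not something the notebook uses, and the sample sentence is adapted from the first user's comment.

```python
import re

# Pattern as written in remove_directories: "." matches any character
UNESCAPED = re.compile(r"(/[a-zA-Z0-9_]+)+(/)*(.[a-zA-Z_]+)*")
# Variant with "\." so that only a literal dot (e.g. a file extension) is consumed
ESCAPED = re.compile(r"(/[a-zA-Z0-9_]+)+(/)*(\.[a-zA-Z_]+)*")

text = "Jan-June/July then variable. No limit"
print(UNESCAPED.sub("", text))  # Jan-June. No limit
print(ESCAPED.sub("", text))    # Jan-June then variable. No limit

path = "see /home/user/notes.txt for details"
print(repr(UNESCAPED.sub("", path)))  # 'see '
print(repr(ESCAPED.sub("", path)))    # 'see  for details'
```

With the unescaped dot, every following "any character plus letters" group is removed until a non-letter is reached, so ordinary words after a slash disappear together with the path.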

In [42]:
def remove_deleted_comments(text):
    text_list = text.split("|")
    for i in range(len(text_list)):
        text_list[i] = re.sub(r"\[deleted\]", "", text_list[i].strip())
    return " | ".join(text_list)

In [43]:
def remove_stopwords(text):
    text_list = text.split("|")
    stop_words = set(stopwords.words('english'))
    for i in range(len(text_list)):
        text_list[i] = " ".join([word for word in text_list[i].split() if word.lower() not in stop_words])
    return " | ".join(text_list)

In [44]:
def remove_punctuations(text):
    text_list = text.split("|")
    for i in range(len(text_list)):
        text_list[i] = "".join([l if l not in string.punctuation else " " for l in text_list[i]])
        #text_list[i] = ''.join([l for l in text_list[i] if l not in string.punctuation])
    return " | ".join(text_list)

In [45]:
def remove_non_alphabets(text):
    text_list = text.split("|")
    for i in range(len(text_list)):
        text_list[i] = re.sub(r"[^a-zA-Z ]", "", text_list[i].strip())
    return " | ".join(text_list)

In [46]:
# def autocorrect_spelling(text):
#     spell = Speller()
#     text_list = text.split("|")
#     for i in range(len(text_list)):
#         text_list[i] = spell(text_list[i])
#     return " | ".join(text_list)

In [47]:
def remove_unneeded_spaces(text):
    text_list = text.split("|")
    for i in range(len(text_list)):
        text_list[i] = re.sub(r"(\s)+", " ", text_list[i].strip())
    return " | ".join(text_list)

In [48]:
def remove_repeated_sentence(text):
    text_list = text.split("|")
    unique_comment = []
    for comment in text_list:
        if comment.strip() in unique_comment:
            continue
        else:
            unique_comment.append(comment.strip())
    return " | ".join(unique_comment)

In [49]:
def nlp_preprocessing(text):
    text = remove_web_link(text)
    text = remove_directories(text)
    text = remove_deleted_comments(text)
    text = remove_stopwords(text)
    text = remove_punctuations(text)
    text = remove_non_alphabets(text)
    # text = autocorrect_spelling(text)
    text = remove_unneeded_spaces(text)
    text = remove_repeated_sentence(text)
    text = text.lower()
    return text

## Hand Engineering

Let's check out the unique values in the subreddit feature as well as the count of each value

In [50]:
subreddit_count = user_info_df['subreddit'].value_counts()
subreddit_count

subreddit
Veterinary          6170
MysteriumNetwork     967
medicine             409
HeliumNetwork        400
orchid               303
vet                   10
Name: count, dtype: int64

In [51]:
subreddit_list = list(subreddit_count.index)

Let's explore each of these subreddit categories, starting from the least frequent (the bottom)

In [52]:
# get the number of vet subscribers that are in the first dataset

# initialize counter
user_count = 0
# create container for vet subscribers also in the first dataframe
vet_subscribers = []

# for each username who is a subscriber of vet
for user in user_info_df[user_info_df['subreddit'] == "vet"]["username"]:
    # if username is found in table1
    if not user_comment_df[user_comment_df["username"] == user].empty:
        # increment counter by 1
        user_count += 1
        # capture the username
        vet_subscribers.append(user)

print("Vet Subreddit Count")
print("Table1: {}".format(subreddit_count["vet"]))
print(f"Table2: {user_count}")

Vet Subreddit Count
Table1: 10
Table2: 9


One of the subscribers of vet is not in the first dataset

At this point it would be better to combine both datasets into one

Let's do that

In [53]:
reddit_user_df = pd.merge(user_comment_df, user_info_df,
                          on="username", how="left")

In [54]:
reddit_user_df.head()

Unnamed: 0,username,comments,isused,subreddit,created_at
0,LoveAGoodTwist,"Female, Kentucky. 4 years out. Work equine on...",True,Veterinary,2024-05-02
1,wahznooski,"As a woman of reproductive age, fuck Texas|As ...",True,Veterinary,2024-05-02
2,Churro_The_fish_Girl,what makes you want to become a vet?|what make...,True,Veterinary,2024-05-02
3,abarthch,"I see of course there are changing variables, ...",True,MysteriumNetwork,2024-05-02
4,VoodooKing,I have 412+ and faced issues because wireguard...,False,MysteriumNetwork,2024-05-03


In [55]:
reddit_user_df.iloc[23]["comments"]

'[deleted]|[deleted]|[deleted]|[deleted]|[deleted]|[deleted]|[deleted]|[removed]|Can I ask a question about really basic vetmed certification? I’m in an area that has a serious shortage of emergency trained vets, so much so that there’s been a pivot to regular vets not doing emergency triage, and not being able to recognize emergencies. \n\nIs there a basic certification that’s available so that pet owners can know when it’s time for the ER?|[deleted]|[deleted]|I agree with some of the below threads. Pay varies from state and I’ve also found big cities tend to pay more than hospitals in burbs or rural areas. For instance, I’m not certified and as a tech in Boston, MA I make $27/hour but in Chicago, IL I made $23/hour. That being said, I live with my boyfriend and having dual incomes is honestly the only way I can afford to live.\n\nI know moving for a job is a big thing consider but maybe not a bad idea to see what’s out there. I’ve also learned to not be afraid to advocate for yoursel

In [56]:
reddit_user_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3276 entries, 0 to 3275
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   username    3276 non-null   object
 1   comments    3276 non-null   object
 2   isused      3275 non-null   object
 3   subreddit   3275 non-null   object
 4   created_at  3275 non-null   object
dtypes: object(5)
memory usage: 128.1+ KB


In [57]:
reddit_user_df["comments"][0]

'Female, Kentucky.  4 years out. Work equine only private practice. Base salary $85k plus bonuses/production which was $20k 2023. 6 days a week Jan-June/July then variable in the off season. No limit on PTO - took ~5 weeks last year. One paid conference a year (registration/travel/ 1/2 hotel/ transportation) or online CE program. All licensures & professional group fees covered. Cell phone allowance and mileage reimbursement.|Female, Kentucky.  4 years out. Work equine only private practice. Base salary $85k plus bonuses/production which was $20k 2023. 6 days a week Jan-June/July then variable in the off season. No limit on PTO - took ~5 weeks last year. One paid conference a year (registration/travel/ 1/2 hotel/ transportation) or online CE program. All licensures & professional group fees covered. Cell phone allowance and mileage reimbursement.|Female, Kentucky.  4 years out. Work equine only private practice. Base salary $85k plus bonuses/production which was $20k 2023. 6 days a wee

In [58]:
reddit_user_df_processed = reddit_user_df.copy()
reddit_user_df_processed["comments"] = reddit_user_df["comments"].apply(nlp_preprocessing)

In [59]:
reddit_user_df_processed["comments"][0]

'female kentucky years out work equine private practice base salary k plus bonuses k days week jan june limit pto took weeks last year one paid conference year registration transportation online ce program licensures professional group fees covered cell phone allowance mileage reimbursement'

### My Approach to Building the Model

The following assumptions and rules are used to solve this problem:
<ol>
    <li>All users are categorized as others unless the comments prove otherwise</li>
    <li>Comments are independent of each other (meaning a comment is not continued in another comment)</li>
    <li>When a comment indicates the user's category, the other comments do not matter (i.e. when users state that they are doctors in one comment, the user is still a doctor even if the other comments are unrelated)</li>
    <li>Comments made by a user are split and treated as separate data points to capture the independence among comments</li>
    <li>When a user is a doctor or a veterinarian, at least one word in the comment that shows the profession must be related to doctor or veterinarian (i.e. when no word in a user's comments is related (similar) to doctor or medicine, the user is automatically not a doctor)</li>
    <li>Any user found by the rule above to be neither a doctor nor a veterinarian is automatically classified as Others</li>
    <li>The model (to be built) is trained only on comments having at least one word related to doctor, medicine, veterinarian, animal, hospital or clinic</li>
</ol>
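
The last three points in the list above amount to a gate placed in front of the model. A minimal sketch of that gate follows, assuming exact keyword matching purely for illustration (the notebook itself goes on to explore embedding similarity instead); `PROFESSION_WORDS`, `needs_model` and `gate_user` are hypothetical names.

```python
# Hypothetical keyword set for illustration; exact matching is a simplification.
PROFESSION_WORDS = {"doctor", "medicine", "veterinarian", "animal", "hospital", "clinic"}


def needs_model(comment):
    """Return True when the comment contains at least one profession-related word."""
    return any(word in PROFESSION_WORDS for word in comment.lower().split())


def gate_user(comments):
    """Return the comments to pass to the model; an empty list means 'Others' by default."""
    return [comment for comment in comments if needs_model(comment)]


print(gate_user(["good point", "clinic work collects payment"]))
# ['clinic work collects payment']
print(gate_user(["hard"]))
# []
```

A user whose comments all fail the gate is classified as Others without the model ever being called, which is what the sixth point describes.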

Split comments

In [60]:
user_separated_comment_dict = {
    "username" : [],
    "comment" : [],
    "subreddit" : [],
    "former_index" : []
}

for i in reddit_user_df_processed.index:
    for comment in reddit_user_df_processed.iloc[i]["comments"].split("|"):
        user_separated_comment_dict["username"].append(reddit_user_df_processed.iloc[i]["username"])
        user_separated_comment_dict["comment"].append(comment.strip())
        user_separated_comment_dict["subreddit"].append(reddit_user_df_processed.iloc[i]["subreddit"])
        user_separated_comment_dict["former_index"].append(i)
        
user_separated_comment_df = pd.DataFrame(user_separated_comment_dict)

In [61]:
user_separated_comment_df

Unnamed: 0,username,comment,subreddit,former_index
0,LoveAGoodTwist,female kentucky years out work equine private ...,Veterinary,0
1,wahznooski,woman reproductive age fuck texas,Veterinary,1
2,Churro_The_fish_Girl,makes want become vet,Veterinary,2
3,abarthch,see course changing variables dimension change...,MysteriumNetwork,3
4,abarthch,mean far aware people already use torrent prot...,MysteriumNetwork,3
...,...,...,...,...
11185,Real_Use_3216,earn production everything touch preventatives...,Veterinary,3275
11186,Real_Use_3216,focus practicing good medicine surgery efficie...,Veterinary,3275
11187,Real_Use_3216,hard,Veterinary,3275
11188,Real_Use_3216,am crossfit its first thing every workday,Veterinary,3275


In [62]:
user_separated_comment_df.tail(30)

Unnamed: 0,username,comment,subreddit,former_index
11160,daliadeimos,good point,Veterinary,3269
11161,daliadeimos,clinic work collects payment euths first clien...,Veterinary,3269
11162,daliadeimos,euthanized cat week able void bladder own hes ...,Veterinary,3269
11163,Unhappy_Passenger_86,one also coming difficult situation trying pur...,Veterinary,3270
11164,B1u3Chips_,im looking applying veterinary nursing college...,Veterinary,3271
11165,B1u3Chips_,could study college veterinary nursing univers...,Veterinary,3271
11166,Daktari2018,good sticking standards care caring enough spe...,Veterinary,3272
11167,Daktari2018,wonderful wanting know more knowing learn driv...,Veterinary,3272
11168,Daktari2018,its tough come tight group esp area its hard t...,Veterinary,3272
11169,Daktari2018,call company tell length time theyll still gua...,Veterinary,3272


In [63]:
user_separated_comment_df.iloc[11169]["comment"]

'call company tell length time theyll still guarantee vaccine'

The next step is to identify comments in which at least one word related to any of doctor, medicine, veterinarian, animal, hospital or clinic is mentioned

The similarity measure used is cosine similarity, with a threshold of 0.7
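
For reference, the cosine similarity of two vectors u and v is their dot product divided by the product of their lengths, u·v / (‖u‖‖v‖), and it ranges from -1 to 1. The sketch below is a plain-Python illustration with made-up 3-dimensional vectors; the function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


THRESHOLD = 0.7  # threshold stated in the text

# Toy vectors, made up for illustration (the GloVe vectors used here have 100 dimensions)
related = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.5])
unrelated = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(related >= THRESHOLD, unrelated >= THRESHOLD)
# True False
```

Because the measure depends only on direction, two vectors pointing the same way score close to 1 regardless of their lengths, while orthogonal vectors score 0.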

To do this, the words need to be embedded. I will be making use of GloVe embeddings

First thing is to extract the embedding vectors

In [64]:
embeddings_index = dict()

with open("glove.6B.100d.txt", encoding="utf8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
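
Each line of a GloVe text file holds a word followed by its vector components, separated by spaces. The sketch below shows the same parsing on a tiny in-memory stand-in so that the format can be inspected without the full file; the entries are made up, and `load_embeddings` is a hypothetical helper that also skips blank lines.

```python
import io


def load_embeddings(lines):
    """Parse GloVe-style text lines ('word v1 v2 ...') into a dict of float lists."""
    index = {}
    for line in lines:
        values = line.split()
        if len(values) < 2:
            continue  # skip blank or malformed lines
        index[values[0]] = [float(v) for v in values[1:]]
    return index


# Made-up 3-dimensional entries standing in for glove.6B.100d.txt
fake_file = io.StringIO("vet 0.1 -0.2 0.3\nclinic 0.0 0.5 -0.4\n\n")
toy_index = load_embeddings(fake_file)
print(sorted(toy_index), len(toy_index["vet"]))
# ['clinic', 'vet'] 3
```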

In [65]:
print(f"{len(embeddings_index)} words found")

400000 words found


Next step is to define the vocabulary size

Since we are only concerned with this dataset, the number of unique words in this dataset would form the vocabulary size

In [66]:
words_in_dataset = set()

for comment in user_separated_comment_df["comment"]:
    for word in comment.split():
        words_in_dataset.add(word.lower())

In [67]:
vocabulary_size = len(words_in_dataset)

first_word = list(embeddings_index.keys())[0]
embedding_dim = len(embeddings_index[first_word])

The next step is to tokenize the words

For this the words in Glove embedding would be used as the trainin

In [68]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(words_in_dataset)

In [69]:
vocabulary_size

18084

The next step is to create the embedding matrix

In [70]:
embedding_matrix = np.zeros((vocabulary_size, embedding_dim))

for word, index in tokenizer.word_index.items():
    # tokenizer indices start at 1, so row index-1 holds the vector for that word
    if index <= vocabulary_size:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[index-1] = embedding_vector

In [71]:
print(embedding_matrix.shape)

(18084, 100)
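
Words from the dataset that have no GloVe vector keep an all-zero row in the matrix, so it can be useful to know how much of the vocabulary is actually covered. A minimal sketch of such a check, using made-up toy data and a hypothetical helper `embedding_coverage`:

```python
def embedding_coverage(vocabulary, embeddings):
    """Return (fraction of vocabulary with a vector, sorted list of missing words)."""
    missing = sorted(word for word in vocabulary if word not in embeddings)
    covered = len(vocabulary) - len(missing)
    fraction = covered / len(vocabulary) if vocabulary else 0.0
    return fraction, missing


# Toy data made up for illustration
toy_vocab = {"vet", "clinic", "yassss"}
toy_embeddings = {"vet": [0.1], "clinic": [0.2]}
fraction, missing = embedding_coverage(toy_vocab, toy_embeddings)
print(round(fraction, 2), missing)
# 0.67 ['yassss']
```

In the notebook the same call would take `words_in_dataset` and `embeddings_index` as arguments.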


Now let's check the comments containing at least one word that is related to any of doctor, medicine, veterinarian, animal, hospital or clinic as mentioned earlier

In [72]:
base_words = ["doctor", "medicine", "veterinarian", "vet", "animal", "hospital", "clinic", "surgery", "treat"]
base_words_embeddings = []
base_words_magnitudes = []

for base_word in base_words:
    embedding = embeddings_index.get(base_word)
    base_words_embeddings.append(embedding)
    
    magnitude = np.sqrt(np.sum(np.square(embedding)))
    base_words_magnitudes.append(magnitude)

In [73]:
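def check_cosine_similarity(text, base_words_embeddings, base_words_magnitudes):
    # highest cosine similarity between any word in the text and any base word
    similarity_list = []
    
    for word in text.split():
        word_embedding = embeddings_index.get(word.lower())
        # skip words without a GloVe vector (None) or with an all-zero vector
        if word_embedding is not None and np.any(word_embedding):
            word_magnitude = np.sqrt(np.sum(np.square(word_embedding)))

            for i in range(len(base_words_embeddings)):
                similarity = np.dot(word_embedding, base_words_embeddings[i]) / (word_magnitude * base_words_magnitudes[i])
                similarity_list.append(similarity)
    # np.max raises on an empty list, so return 0.0 when no word has an embedding
    return np.max(similarity_list) if similarity_list else 0.0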

In [74]:
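def check_cosine_similarity2(text, base_words_embeddings, base_words_magnitudes):
    # highest cosine similarity between the mean embedding of the text and any base word
    similarity_list = []
    word_embeddings = []
    
    for word in text.split():
        word_embedding = embeddings_index.get(word.lower())
        if word_embedding is not None and np.any(word_embedding):
            word_embeddings.append(word_embedding)
    
    if word_embeddings:
        # average per dimension (axis=0) to keep a vector; without axis, np.mean returns a scalar
        word_embedding = np.mean(word_embeddings, axis=0)
        word_magnitude = np.sqrt(np.sum(np.square(word_embedding)))
        for i in range(len(base_words_embeddings)):
            similarity = np.dot(word_embedding, base_words_embeddings[i]) / (word_magnitude * base_words_magnitudes[i])
            similarity_list.append(similarity)
    # np.max raises on an empty list, so return 0.0 when no word has an embedding
    return np.max(similarity_list) if similarity_list else 0.0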

Checking for similarities does not seem to help

Now, some comments will be labelled by hand

The approach I will use is to take a maximum of 50 comments (or fewer where a subreddit has less than 50) from each subreddit to form my training set

Let's take a look at the number of comments in each subreddit

In [75]:
subreddit_count = user_separated_comment_df['subreddit'].value_counts()
subreddit_count

subreddit
Veterinary          7451
MysteriumNetwork    3506
orchid                28
HeliumNetwork         14
vet                    9
medicine               8
Name: count, dtype: int64

I would be starting from the bottom up to the top

In [76]:
# check all comments in the specified subreddit and label
subreddit = "medicine"
indices = user_separated_comment_df[user_separated_comment_df["subreddit"] == subreddit]["former_index"]
comment_list = reddit_user_df.iloc[indices]["comments"]
for i in indices.values:
    print(f"Comment of user with index {i}")
    print(comment_list[i])
    print()
    print()

Comment of user with index 1438
The elderly man is recovering from hip replacement surgery.


Comment of user with index 1439
The teenage boy was treated for a sports injury.


Comment of user with index 1440
The woman is expecting a baby and visited for a prenatal check-up.


Comment of user with index 1441
I just performed an appendectomy on a patient.


Comment of user with index 1442
The patient’s blood pressure is stabilizing after the medication.


Comment of user with index 1443
The MRI scan revealed a tumor in the patient’s brain.


Comment of user with index 1444
I prescribed antibiotics for the patient’s bacterial infection.


Comment of user with index 1445
The patient’s EKG showed signs of a possible heart attack.




All users in the "medicine" subreddit are medical doctors

It should be noted that these users all have single comments

In [77]:
# check all comments in the specified subreddit and label
subreddit = "vet"
indices = user_separated_comment_df[user_separated_comment_df["subreddit"] == subreddit]["former_index"]
comment_list = reddit_user_df.iloc[indices]["comments"]
for i in indices.values:
    print(f"Comment of user with index {i}")
    print(comment_list[i])
    print()
    print()

Comment of user with index 1446
The puppy was brought in for its first round of vaccinations.


Comment of user with index 1447
The adult horse was treated for laminitis.


Comment of user with index 1448
The juvenile bird was treated for a wing injury.


Comment of user with index 1449
The senior cat was brought in for a routine health check-up.


Comment of user with index 1450
I just performed a neutering procedure on a cat.


Comment of user with index 1451
The dog’s condition is improving after the deworming treatment.


Comment of user with index 1452
The X-ray showed a fracture in the bird’s wing.


Comment of user with index 1453
I prescribed flea prevention medication for the puppy.


Comment of user with index 1454
The horse’s blood test revealed signs of equine infectious anemia.




All users in the "vet" subreddit are Veterinarians

It should be noted that these users all have single comments

In [79]:
# check all comments in the specified subreddit and label
subreddit = "HeliumNetwork"
indices = user_separated_comment_df[user_separated_comment_df["subreddit"] == subreddit]["former_index"].unique()
comment_list = reddit_user_df.iloc[indices]["comments"]
for i in indices:
    print(f"Comment of user with index {i}")
    print(comment_list[i])
    print()
    print()

Comment of user with index 93
I’m getting this too|They just told one of the accounts was compromised. They told to not click in any links|Get rid of windows 😂|I’m getting this too|They just told one of the accounts was compromised. They told to not click in any links|Get rid of windows 😂|I’m getting this too|They just told one of the accounts was compromised. They told to not click in any links|Get rid of windows 😂|I’m getting this too|They just told one of the accounts was compromised. They told to not click in any links|Get rid of windows 😂|I’m getting this too|They just told one of the accounts was compromised. They told to not click in any links|Get rid of windows 😂|I’m getting this too|They just told one of the accounts was compromised. They told to not click in any links|Get rid of windows 😂|I’m getting this too|They just told one of the accounts was compromised. They told to not click in any links|Get rid of windows 😂|I’m getting this too|They just told one of the accounts was 

All users in the "HeliumNetwork" subreddit are neither doctors nor veterinarians

It should be noted that some users have multiple comments and some have repeated comments. Repeated comments are taken care of during preprocessing

In [80]:
# check all comments in the specified subreddit and label
subreddit = "orchid"
indices = user_separated_comment_df[user_separated_comment_df["subreddit"] == subreddit]["former_index"].unique()
comment_list = reddit_user_df.iloc[indices]["comments"]
for i in indices:
    print(f"Comment of user with index {i}")
    print(comment_list[i])
    print()
    print()

Comment of user with index 205
Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen!|Yassss Queen! | Yassss Queen! ---> /r/MysteriumNetwork/comments/n0es73/the_wait_is_over_mysterium_network_decentralised/gw7eeyk/ | Yea. I am excited about this project too. I am happy to see their [collaboration with Storj](https://mysterium.network/blog/mysterium-and-storj-labs-join-forces/). I am still reviewing their whitepaper and comparing it to the Orchid Protocol. If any one has any cliff's notes I'd appreciate it :). I am interested in using VPNs for increased DAPP security (DDoS attacks). ---> /r/MysteriumNetwork/comments/mrg7bb/just_bought_my_first_myst_tokens_and_feel/guo6ytu/ | My first observation is that Mysterium highlights the intention of splitting up packets to traverse different paths along the VPN network which protects a user from a malicious

None of the users in the "orchid" subreddit are doctors or veterinarians.

Note that some users have multiple comments and some have repeated comments. Repeated comments will be handled during preprocessing.

The remaining two subreddits, MysteriumNetwork and Veterinary, contain far more users and comments. Therefore, only 50 randomly selected comments from each of them will be added to the training set.
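
A reproducible draw of this kind can also be sketched with pandas alone. The toy DataFrame and the helper `sample_indices` below are assumptions made for illustration. Note that `DataFrame.sample` will not pick the same rows as `random.sample`, even with the same seed value.

```python
import pandas as pd

# toy stand-in for user_separated_comment_df (hypothetical data)
toy_df = pd.DataFrame({
    "subreddit": ["Veterinary"] * 6 + ["MysteriumNetwork"] * 6,
    "comment": [f"comment {i}" for i in range(12)],
})


def sample_indices(df: pd.DataFrame, subreddit: str, n: int, seed: int = 123) -> list:
    """Return the index labels of n randomly chosen rows from one subreddit."""
    subset = df[df["subreddit"] == subreddit]
    # random_state makes the draw repeatable without touching the global seed
    return list(subset.sample(n=n, random_state=seed).index)


selected = sample_indices(toy_df, "Veterinary", n=3)
print(selected)
```

Passing the seed per call keeps the result stable even when cells are re-run out of order, which a single global `random.seed` call does not guarantee.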

In [109]:
random.seed(123)

In [110]:
# randomly select 50 comments made by users in the MysteriumNetwork subreddit
MysteriumNetwork_subreddit_index = user_separated_comment_df[user_separated_comment_df["subreddit"] == "MysteriumNetwork"].index
selected_MysteriumNetwork = random.sample(list(MysteriumNetwork_subreddit_index), 50)

In [111]:
# randomly select 50 comments made by users in the Veterinary subreddit
Veterinary_subreddit_index = user_separated_comment_df[user_separated_comment_df["subreddit"] == "Veterinary"].index
selected_Veterinary = random.sample(list(Veterinary_subreddit_index), 50)

It is time to form the training set.

In [146]:
train_indices = []
medicine_indices = list(user_separated_comment_df[user_separated_comment_df["subreddit"] == "medicine"].index)
train_indices.extend(medicine_indices)
vet_indices = list(user_separated_comment_df[user_separated_comment_df["subreddit"] == "vet"].index)
train_indices.extend(vet_indices)
HeliumNetwork_indices = list(user_separated_comment_df[user_separated_comment_df["subreddit"] == "HeliumNetwork"].index)
train_indices.extend(HeliumNetwork_indices)
orchid_indices = list(user_separated_comment_df[user_separated_comment_df["subreddit"] == "orchid"].index)
train_indices.extend(orchid_indices)
train_indices.extend(selected_MysteriumNetwork)
train_indices.extend(selected_Veterinary)
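
The same assembly can be written more compactly with `isin`. This is only a sketch on a toy DataFrame: the data and the `sampled_indices` value are hypothetical. Unlike the cell above, `isin` returns rows in DataFrame order rather than grouped by subreddit.

```python
import pandas as pd

# toy stand-in for user_separated_comment_df (hypothetical data)
toy_df = pd.DataFrame({
    "subreddit": ["medicine", "vet", "orchid", "Veterinary", "Veterinary", "HeliumNetwork"],
    "comment": ["a", "b", "c", "d", "e", "f"],
})

fully_included = ["medicine", "vet", "HeliumNetwork", "orchid"]
sampled_indices = [4]  # hypothetical: index labels drawn from the larger subreddits

# boolean mask selects every row of the fully included subreddits
full_indices = list(toy_df[toy_df["subreddit"].isin(fully_included)].index)
train_indices = full_indices + sampled_indices

# .loc selects by index label, which is what .index returns
train_set = toy_df.loc[train_indices].copy()
print(train_set["subreddit"].value_counts())
```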

In [147]:
# train_indices holds index labels (taken from .index), so select with .loc
train_set_df = user_separated_comment_df.loc[train_indices].copy()

In [148]:
train_set_df.loc[:, "Label"] = ""

In [149]:
train_set_df.head()

Unnamed: 0,username,comment,subreddit,former_index,Label
5564,test_doctor2,elderly man recovering hip replacement surgery,medicine,1438,
5565,test_doctor3,teenage boy treated sports injury,medicine,1439,
5566,test_doctor4,woman expecting baby visited prenatal check up,medicine,1440,
5567,test_doctor5,performed appendectomy patient,medicine,1441,
5568,test_doctor6,patients blood pressure stabilizing medication,medicine,1442,


In [150]:
train_set_df.reindex()

Unnamed: 0,username,comment,subreddit,former_index,Label
5564,test_doctor2,elderly man recovering hip replacement surgery,medicine,1438,
5565,test_doctor3,teenage boy treated sports injury,medicine,1439,
5566,test_doctor4,woman expecting baby visited prenatal check up,medicine,1440,
5567,test_doctor5,performed appendectomy patient,medicine,1441,
5568,test_doctor6,patients blood pressure stabilizing medication,medicine,1442,
...,...,...,...,...,...
9129,daabilge,tbh kind matter picking battles ideally we d p...,Veterinary,2692,
5321,_rosanna_,literally leaving clinic right similar problem...,Veterinary,1345,
4569,extinctplanet,applied internships jobs tab aza com also loca...,Veterinary,1072,
6477,socialdistraction,prices vet care going la its also harder get a...,Veterinary,1726,


Next, I label the comments from the subreddits checked above.

In [155]:
train_set_df.loc[train_set_df["subreddit"] == "medicine", "Label"] = "Medical Doctor"

In [157]:
train_set_df.loc[train_set_df["subreddit"] == "vet", "Label"] = "Veterinarian"

In [158]:
train_set_df.loc[train_set_df["subreddit"] == "HeliumNetwork", "Label"] = "Other"

In [159]:
train_set_df.loc[train_set_df["subreddit"] == "orchid", "Label"] = "Other"
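
The four assignments above follow one pattern, so they can be expressed as a single subreddit-to-label mapping. The sketch below runs on a toy DataFrame. The name `label_by_subreddit` is an assumption, while the label strings are the ones used above.

```python
import pandas as pd

# toy stand-in for train_set_df (hypothetical data)
toy_df = pd.DataFrame({
    "subreddit": ["medicine", "vet", "HeliumNetwork", "orchid", "Veterinary"],
    "Label": [""] * 5,
})

label_by_subreddit = {
    "medicine": "Medical Doctor",
    "vet": "Veterinarian",
    "HeliumNetwork": "Other",
    "orchid": "Other",
}

mapped = toy_df["subreddit"].map(label_by_subreddit)
# subreddits missing from the mapping come back as NaN; keep their existing label
toy_df["Label"] = mapped.fillna(toy_df["Label"])
print(toy_df)
```

Rows from subreddits that are not in the mapping keep their empty label, so they can still be labeled by hand afterwards.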

In [160]:
train_set_df["Label"].value_counts()

Label
                  100
Other              42
Veterinarian        9
Medical Doctor      8
Name: count, dtype: int64

Next, I label the comments by users in the last two subreddits, starting with MysteriumNetwork.

For this step, I print the preprocessed comment from the training set together with the raw (unprocessed) group of comments it was extracted from, because the preprocessed comment alone usually lacks meaning.
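
Because the categories are typed by hand, a small check can guard against misspelled labels. The following is a minimal sketch, assuming the three label names used in this notebook. The helper `parse_category` is hypothetical and not part of the original workflow.

```python
VALID_LABELS = ("Medical Doctor", "Veterinarian", "Other")


def parse_category(text: str) -> str:
    """Map typed input to one of the known labels, ignoring case and outer spaces."""
    cleaned = text.strip().lower()
    for label in VALID_LABELS:
        if cleaned == label.lower():
            return label
    raise ValueError(f"Unknown category {text!r}; expected one of {VALID_LABELS}")


print(parse_category("  other "))
```

In a labeling loop, the result of `input(...)` could be passed through such a function before it is written to the `Label` column, so that a typo raises an error instead of creating a new class.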

In [179]:
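for index in selected_MysteriumNetwork:
    # index is an index label, so look it up with .loc rather than .iloc
    former_index = user_separated_comment_df.loc[index, "former_index"]
    print(f"Comment of user with index {index}")
    # preprocessed comment as stored in the training set
    print(train_set_df.loc[index, "comment"])
    print()
    print()
    # raw, unprocessed group of comments the comment was extracted from
    print(reddit_user_df.iloc[former_index]["comments"])
    category = input("Input user category")
    train_set_df.loc[index, "Label"] = category
    print()
    print()
    print("="*20)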

Comment of user with index 454
hey sorry ot question safe run node cloud machine i ve oracle free tier i m thinking starting node oracle policies crypto mining strict i m secure it even know myst token mined classic proof work way


Hey, sorry for the OT question, but is it safe run a node on a Cloud machine? I've an oracle free tier and I'm thinking about starting a node here but oracle policies about crypto mining are very strict, so I'm not very secure to do it. (Even if I know that Myst token are not mined in the Classic proof of work way)|Hey, sorry for the OT question, but is it safe run a node on a Cloud machine? I've an oracle free tier and I'm thinking about starting a node here but oracle policies about crypto mining are very strict, so I'm not very secure to do it. (Even if I know that Myst token are not mined in the Classic proof of work way)|Hey, sorry for the OT question, but is it safe run a node on a Cloud machine? I've an oracle free tier and I'm thinking about startin

Input user categoryOther


Comment of user with index 673
node public ip address fine


Use the internal IP address.|Use the internal IP address.|No, you can only host 1 node per public IP address. You can have as many nodes as you wish but again you will need to have that many public IP addresses.|If you at all worried then just enable B2B services only then your node will only transfer business traffic.|Your nodes are working it's a visual bug in the display of monitoring failed.|Yes.|Kryptex handles your payments, I would check with them.|The token have been sent to your selected wallet: [https://polygonscan.com/address/0x6A548987399FAbBbAbb667F59a69A7555F6448CC#tokentxns](https://polygonscan.com/address/0x6A548987399FAbBbAbb667F59a69A7555F6448CC#tokentxns)

&#x200B;

Doesn't look like you have configured Metamask correctly. 

Please note you must first enable your wallet under the Polygon Network and then import custom token MYST from the Polygon Network in order to see the receive

Input user categoryOther


Comment of user with index 5636
use see activity dashboard mysteriumm wireshark


what do you use to see their activity? is there a dashboard on mysteriumm for that or do you Wireshark ---> /r/MysteriumNetwork/comments/xeow68/should_i_change_my_ip_address_every_now_and_then/iojpaab/ | thank you I saw this once and couldn't find it again ---> /r/MysteriumNetwork/comments/x50jy8/as_a_node_runner_what_happens_if_i_format_my_sd/imyhtlt/
Input user categoryOther


Comment of user with index 2514
thats great hope mntd also make thing


We like it fresh and simple Good job!|Thats great! I hope MNTD also make the same thing!!!|We like it fresh and simple Good job!|Thats great! I hope MNTD also make the same thing!!!|We like it fresh and simple Good job!|Thats great! I hope MNTD also make the same thing!!!|We like it fresh and simple Good job!|Thats great! I hope MNTD also make the same thing!!!|We like it fresh and simple Good job!|Thats great! I hope MNTD also make 

Input user categoryOther


Comment of user with index 757
restart system running mysterium try connecting still working check mysterium log say ip port number using


Use the internal IP address.|Use the internal IP address.|No, you can only host 1 node per public IP address. You can have as many nodes as you wish but again you will need to have that many public IP addresses.|If you at all worried then just enable B2B services only then your node will only transfer business traffic.|Your nodes are working it's a visual bug in the display of monitoring failed.|Yes.|Kryptex handles your payments, I would check with them.|The token have been sent to your selected wallet: [https://polygonscan.com/address/0x6A548987399FAbBbAbb667F59a69A7555F6448CC#tokentxns](https://polygonscan.com/address/0x6A548987399FAbBbAbb667F59a69A7555F6448CC#tokentxns)

&#x200B;

Doesn't look like you have configured Metamask correctly. 

Please note you must first enable your wallet under the Polygon Network and the

Input user categoryOther


Comment of user with index 6026
great thanks


Great thanks ---> /r/MysteriumNetwork/comments/s9uwn6/does_mystberry_act_like_a_normal_linux_distro/htr8sm6/ | NetworkChuck actually did a video setting up a node on a headless 32-bit rasbianOS install, I wound up following that to add it to my pi that was already running pi-hole and my other apps. Wound up not using the mystberry image.

[https://www.youtube.com/watch?v=El19X-zHt-c](https://www.youtube.com/watch?v=El19X-zHt-c) ---> /r/MysteriumNetwork/comments/s9uwn6/does_mystberry_act_like_a_normal_linux_distro/humpmle/
Input user categoryOther


Comment of user with index 381
yeah gonna hack right already smaller range


I have 50000 to 60000 udp change it in settings and router. ---> /r/MysteriumNetwork/comments/137fcnb/new_to_this_i_just_wanna_have_reassurance_and/jixryxz/ | If you go to the website where u can view the nice status and all. There are settings which u can adjust accordingly.

U could be conne

Input user categoryOther


Comment of user with index 3154
also ran scprime node started selling licenses node runners me feels bit scammy in compensate lack paying costumers lets generate revenue charging owners node yearly license fee


You can limit traffic to verified business accounts only (used for web/data scraping etc) instead of allowing anyone to use your node. Can be set directly from your node’s dashboard. ---> /r/MysteriumNetwork/comments/12x217l/questions_on_using_and_giving_bandwidth/jhi6w6m/ | Looks like Powershell isn’t installed on your Windows machine? ---> /r/MysteriumNetwork/comments/12ie1uf/i_keep_running_into_this_error/jfte0b4/ | https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-windows?view=powershell-7.3 ---> /r/MysteriumNetwork/comments/12ie1uf/i_keep_running_into_this_error/jfte354/ | Windows defender flags quite a few VPN clients as malware.

https://www.malwarebytes.com/blog/news/2017/09/explained-false-positives/amp -

Input user categoryOther


Comment of user with index 6129
wont trouble this nothing new tor like dont worry


Yep people were using torrent p2p and no one got hacked… Besides if you are running it on docker which windows users do… so nothing to worry about . 

Still unsure use an old laptop and make it separate… Your funds are safu…

P2P and Decentralisation makes hacking impossible.. Don’t worry. ---> /r/MysteriumNetwork/comments/oqktx6/can_we_trust_mysterium_network_node_provider/h6end5n/ | You won’t be in trouble….This is nothing new..Tor is just like that ..Don’t worry ---> /r/MysteriumNetwork/comments/oqktx6/can_we_trust_mysterium_network_node_provider/h6gcyzr/ | Good thing I am not in west ---> /r/MysteriumNetwork/comments/oqktx6/can_we_trust_mysterium_network_node_provider/helcdc7/ | Good thing  I am from india ---> /r/MysteriumNetwork/comments/oqktx6/can_we_trust_mysterium_network_node_provider/heted87/
Input user categoryOther


Comment of user with index 452
run node hyper v

Input user categoryOther


Comment of user with index 2106
ok people proper agreement isp wondering case really nice isp


Ok so people are doing it 
With out proper agreement from the isp 
I was wondering if that was the case or really nice ISP 😅 ---> /r/MysteriumNetwork/comments/stiz99/best_isp_in_australia_to_run_a_myst_node/hx8cxk2/
Input user categoryOther


Comment of user with index 3282
variable apr


If u jailbreak it u might be able to install that apk|Your fire stick|iOS
This is going to be 👍|YES was waiting for this for a while|If u jailbreak it u might be able to install that apk|Your fire stick|iOS
This is going to be 👍|YES was waiting for this for a while|If u jailbreak it u might be able to install that apk|Your fire stick|iOS
This is going to be 👍|YES was waiting for this for a while|If u jailbreak it u might be able to install that apk|Your fire stick|iOS
This is going to be 👍|YES was waiting for this for a while|If u jailbreak it u might be able to install that apk|Y

Input user categoryOther


Comment of user with index 3937
latest update may see actual github activity yet team working r d researching setting frameworks researching testing required libraries etc result current stage provide us clear knowledge technologies use use order create functioning payments prepare frameworks top work done


From the latest update:

While you may not see any actual GitHub activity yet — that is because our team is working on the R&D, researching and setting up frameworks, researching and testing required libraries, etc... The result of current stage will provide us with a clear knowledge which technologies to use and how to use them in order to create functioning payments and to prepare frameworks on top of all the work that is being done.|From the latest update:

While you may not see any actual GitHub activity yet — that is because our team is working on the R&D, researching and setting up frameworks, researching and testing required libraries, etc... The r

Input user categoryOther


Comment of user with index 2632
well put though want stress dissolve reassemble level privacy definitely goal going happen night previously communicated plan get steps first step deploying initial network nodes testing sorts use cases it discovery connected nodes establishing vpn connection detected nodes payments services provided etc touching deleting posts didnt come us enigma whole team what happened deleted posts one team this believe person wrote comments deleted themselves


I am excited to see what we as a team can deliver in the close/far future. We haven't yet finally planned Q2, but we are looking to work with larger pool of developers and also looking to add CMO and CFO to the team preparing for future developments.
Also looking forward to product advancements as we are getting ready to open the network to 3rd party Service Providers(Node operators) and 3'rd party apps (currently we are getting ready to release our I'st app - "Mysterion")

As for 

Input user categoryOther


Comment of user with index 5655
running usb port router


I found a solution blindly googling every router setting and found a solution. I set the node in the DMZ, have I made a horrible mistake? ---> /r/MysteriumNetwork/comments/xdr4e1/help/iozwjec/ | This is the very reason the network needs to exist! ---> /r/MysteriumNetwork/comments/xdodtt/notice_of_action_under_the_digital_millennium/iocdtb9/ | Awesome, exactly what I was hoping to get ---> /r/MysteriumNetwork/comments/ww9ym3/what_are_your_tips_for_new_node_runners/ilncom7/ | I am running it on the usb port in the router ---> /r/MysteriumNetwork/comments/ww9ym3/what_are_your_tips_for_new_node_runners/ilkfese/
Input user categoryOther


Comment of user with index 674
currently worked that s looks now


Use the internal IP address.|Use the internal IP address.|No, you can only host 1 node per public IP address. You can have as many nodes as you wish but again you will need to have that many public IP addre

Input user categoryOther


Comment of user with index 3440
describing possible outcomes likely ones even extremely unlikely event authorities contacted traffic ip node interface including connectivity logs huge amounts proof running vpn node way somebody driving speed limit possible caught happen time doesnt personally time im even part country node prove wasnt present activity occurred crypto world referred fud


You are describing possible outcomes, but not likely ones.  
Even in the extremely unlikely event that authorities contacted you about traffic at your IP, you do have your node interface, including connectivity logs,  and huge amounts of proof that you are running a VPN node.  
In the same way as somebody driving above the speed limit, it is possible you will be caught and it does happen, but 99.99% of the time it doesn’t.  
For me personally, 99% of the time I’m not even in the same part of the country as my node, so can prove I wasn’t present when an activity occurred.  
In

Input user categoryOther


Comment of user with index 2005
thats strange as refreshing wizard let set node free also dont think get answer support monday ive seen option unlocked basically deleted everything system node settings ive made new docker container wont weird errors please let know


Hi, I have asked about the same issue. I think there is a general issue with the payment system as I have tried to register a new node today myself and for all 3 transactions, the banks authorized them and they sent the money but the platform gave me the "payment error" message. I am currently waiting for a response after sending a message to them and I have also been told that however, every day at 11 am (Greenwich time) you can use the free registration method for your nodes.  


I really hope this helps :) ---> /r/MysteriumNetwork/comments/t1imnv/anyone_can_help_me_with_my_02_myst_i_tried_buying/hygiwc8/ | That’s very strange as, after refreshing, the wizard let me set my node up for free. Als

Input user categoryOther


Comment of user with index 2682
users choose mysterium avoid usual issues come using known vpn addresses running node trough vpn basically ruining whole point someone needed that would using vpn directly


Most users who choose Mysterium do so to AVOID usual issues that come with using known VPN addresses. So by running your node trough VPN you are basically ruining the whole point - if someone needed that, they would be using such VPN directly.|MysteriumVPN 2.0 has some disappointingly simple UI with not much customization. For example I can unmark "refresh IP address" if I want to keep connecting to same IP, but I can not save and connect to specific nodes I liked. 

It also flat out changed my IP in one of the sessions without saying so, and UI did not see IP change either - I just had connection interruption in the browser and my side window that monitors IP popped to show that I have new IP. No indication of that drop and reconnection to another IP was a

Input user categoryOther


Comment of user with index 733
apr variable


Use the internal IP address.|Use the internal IP address.|No, you can only host 1 node per public IP address. You can have as many nodes as you wish but again you will need to have that many public IP addresses.|If you at all worried then just enable B2B services only then your node will only transfer business traffic.|Your nodes are working it's a visual bug in the display of monitoring failed.|Yes.|Kryptex handles your payments, I would check with them.|The token have been sent to your selected wallet: [https://polygonscan.com/address/0x6A548987399FAbBbAbb667F59a69A7555F6448CC#tokentxns](https://polygonscan.com/address/0x6A548987399FAbBbAbb667F59a69A7555F6448CC#tokentxns)

&#x200B;

Doesn't look like you have configured Metamask correctly. 

Please note you must first enable your wallet under the Polygon Network and then import custom token MYST from the Polygon Network in order to see the received tokens in you

Input user categoryOther


Comment of user with index 404
upnp always on using last year even income good yesterday reinstalled whole thing paid get registered still same dont even change anything router cause took way much time configure find proper channels area all


same is happening to me too. From IN. no active sessions, not a single dime was earned last month.|no the internal ip was set specifically on dhcp client list. but that port forwarding I haven't done anything in the router. neither did I change anything in the router. It was getting connected to clients automatically.|yes, there is an upnp settings in router page and it has been turned on from the start.|I added my pi local ip to DMZ of my router after reinstalling and registering again. Now everything is working fine.|same is happening to me too. From IN. no active sessions, not a single dime was earned last month.|no the internal ip was set specifically on dhcp client list. but that port forwarding I haven't done anyt

Input user categoryOther


Comment of user with index 3778
also mysterium s app rating inflated worker mysterium could get sales get profit turn salary s may increased c per review spending bucks might good extra work money


Some aren’t residential (try reconnecting till you get one) and some are residential yet still detected as a vpn, and you won’t be traced back ;)|Some aren’t residential (try reconnecting till you get one) and some are residential yet still detected as a vpn, and you won’t be traced back ;)|Some aren’t residential (try reconnecting till you get one) and some are residential yet still detected as a vpn, and you won’t be traced back ;)|Some aren’t residential (try reconnecting till you get one) and some are residential yet still detected as a vpn, and you won’t be traced back ;)|Some aren’t residential (try reconnecting till you get one) and some are residential yet still detected as a vpn, and you won’t be traced back ;)|Some aren’t residential (try reconnecting ti

Input user categoryOther


Comment of user with index 833
may wintun get


Use the internal IP address.|Use the internal IP address.|No, you can only host 1 node per public IP address. You can have as many nodes as you wish but again you will need to have that many public IP addresses.|If you at all worried then just enable B2B services only then your node will only transfer business traffic.|Your nodes are working it's a visual bug in the display of monitoring failed.|Yes.|Kryptex handles your payments, I would check with them.|The token have been sent to your selected wallet: [https://polygonscan.com/address/0x6A548987399FAbBbAbb667F59a69A7555F6448CC#tokentxns](https://polygonscan.com/address/0x6A548987399FAbBbAbb667F59a69A7555F6448CC#tokentxns)

&#x200B;

Doesn't look like you have configured Metamask correctly. 

Please note you must first enable your wallet under the Polygon Network and then import custom token MYST from the Polygon Network in order to see the received tokens in y

Input user categoryOther


Comment of user with index 5725
home hosted vpns get blocked tried friend lives dubai quite good blocking commercial vpns never blocked fastweb home ip milan


If they are home hosted VPNs, you won't get blocked. Tried with my friend who lives in Dubai, they are quite good in blocking commercial VPNs, but never blocked my Fastweb home IP here in Milan. ---> /r/MysteriumNetwork/comments/wq7761/is_myst_vpn_good_at_being_undetected/ikkvi56/ | Roughly 3.5 MYST. Very bad. ---> /r/MysteriumNetwork/comments/tuxlzk/hi_my_last_two_weeks_of_activity_from_a_node/i38tnlh/ | I totally agree with you. After hosting a node for more than two months, for me the risks are higher than the gain. I will give the project other 4 months to see if something changes, then I will quit. ---> /r/MysteriumNetwork/comments/tiq12f/mysterium_concerns/i1gm7fd/
Input user categoryOther


Comment of user with index 309
hope helps you cannot understand issue kernel supposed docker skips kind is

Input user categoryOther


Comment of user with index 1831
know raspberry pi b can maybe pi zero w sure half ram b would issue


Sorted

Clone the github page     

[https://github.com/NebraLtd/myst-rockpi](https://github.com/NebraLtd/myst-rockpi)

cd myst-rockpi/scripts

sudo bash install.sh|Sorted

Clone the github page     

[https://github.com/NebraLtd/myst-rockpi](https://github.com/NebraLtd/myst-rockpi)

cd myst-rockpi/scripts

sudo bash install.sh|Sorted

Clone the github page     

[https://github.com/NebraLtd/myst-rockpi](https://github.com/NebraLtd/myst-rockpi)

cd myst-rockpi/scripts

sudo bash install.sh|Sorted

Clone the github page     

[https://github.com/NebraLtd/myst-rockpi](https://github.com/NebraLtd/myst-rockpi)

cd myst-rockpi/scripts

sudo bash install.sh|Sorted

Clone the github page     

[https://github.com/NebraLtd/myst-rockpi](https://github.com/NebraLtd/myst-rockpi)

cd myst-rockpi/scripts

sudo bash install.sh|Sorted

Clone the github page     

[https://g

Input user categoryOther


Comment of user with index 3337
first week earnings slow depends location said time increases exponentially like first week made worry electricity cost also download honeygain iproyal pawns app peerprofit increase profit tbh honeygain slow always take payout crypto jumptask think cross minimum paypal peerprofit good download use refferal somewhere these referrals benefit greatly


17.45% for me ---> /r/MysteriumNetwork/comments/12nnw0i/question_on_staking/jggtkf9/ | it happened to me too it was working fine a week ago but suddenly connections dropped and everything

what i did was restarded everything 

the router the device on which the node was installed

then got into the router settings: ensured that all was correct

port forwarding(UDP:range given in walkthrough) then checked if all the settings were as per the guide

then clear the DNS cache(change it to [1.1.1.1](https://1.1.1.1)  {my advice-it is cloudfare DNS-fastest}

then again started up the node


Input user categoryOther


Comment of user with index 1718
agreed running server ur place right vps


Agreed.. so your running the server at ur place right ? Not a VPS ---> /r/MysteriumNetwork/comments/ta0roo/how_am_i_mining_so_many_different_cryptos_on_one/ilmbbed/
Input user categoryOther


Comment of user with index 2796
can t


No you can't ---> /r/MysteriumNetwork/comments/14qe3x4/can_i_use_vpn_on_mysterium_node/jqnn5il/ | So it's a data center node right  first of all have you purchased diff ips from provider? and have you setupped seperate network for all nodes ? Because you can't have more than 1node/ip

2ndly what are you using to host it docker? Seperate vms? 

From what i can deduce ita a routing issue ---> /r/MysteriumNetwork/comments/1035ypu/connection_failed/j2yop7c/ | First of all hello fellow selfhoster :) so it worked before ? 

Btw if you are saying you cloned original have you regenerated their identity again? ---> /r/MysteriumNetwork/comments/1035ypu/connection_fail

In [180]:
train_set_df["Label"].value_counts()

Label
Other             92
                  50
Veterinarian       9
Medical Doctor     8
Name: count, dtype: int64

Now, for the final subreddit, Veterinary.

For this part, repeated comments are removed when printing the original group of comments, to make the labeling task easier.

In [186]:
for index in selected_Veterinary:
    # index is an index label, so look it up with .loc rather than .iloc
    former_index = user_separated_comment_df.loc[index, "former_index"]
    print(f"Comment of user with index {index}")
    # preprocessed comment as stored in the training set
    print(train_set_df.loc[index, "comment"])
    print()
    print()
    # raw group of comments, with repeated comments removed for readability
    print(remove_repeated_sentence(reddit_user_df.iloc[former_index]["comments"]))
    category = input("Input user category")
    train_set_df.loc[index, "Label"] = category
    print()
    print()
    print("="*20)

Comment of user with index 1413
looking euthanasia house calls scared general practice work think bad idea want euthanasia house calls think enjoy mobile work limited scope i e one thing good thing second think avoiding crazy clients exchanging crazy clients grief think avoiding stress needing learn bunch stuff there s thing routine euthanasia routine ear infection routine surgery create possibly stress less stress every owner s worst day bit counsellor well vet words avoid things that s want do providing euthanasia is think one best last gifts give patients we re blessed able provide it


Whereas you say they're nitpicking, I'm wondering whether they're really helping  you to become better at what you do - giving you advice and correction to teach you things (because you're education will go on for decades).   Only you know if their attitude is respectful or disrespectful.....The act of nit picking what you're doing isn't in and of itself disrespectful.

You worked hard to get your DV

Input user categoryVeterinarian


Comment of user with index 10036
mark mine one colour nailpolish wreck instrument removes needed also come washing etc


I won't repeat what was already said, but just wanted to suggest getting some experience in the field before diving head first into this path. Working in vet is sometimes not what people think it is, and I'd hate for you to do all this work just to discover this isn't something you like after all. If after working/volunteering in the field you still want to go ahead, then absolutely it's possible! | Keep in mind, regarding tuition, you may end up needing to go to a school other than the one you have free tuition for vet school, so I wouldn't write that aspect off altogether | I mark mine with one colour of nailpolish. Doesn't wreck the instrument and can be removes if needed but also doesn't come off with washing etc | This might not be true for vet-related jobs so take this with a grain of salt, but when I moved to the states with m

Input user categoryOther


Comment of user with index 6547
thank much super helpful really appreciate


2025 graduate here. Given that the AVMA released the 2023 average new grad salary being $133k for private practice (forgive me if my figure is wrong), would anyone be willing to give me a reasonable new grad base salary (including production) for a private NON corporate practice in a HCOL area? I will be (hopefully) signing somewhere in the next month, and would like to hear what an appropriate salary would be to be better prepared. Thank you all for sharing! | Thank you so much, super helpful! Really appreciate it | Thank you so much! Will also be taking the NAVLE during the Nov-Dec 2024 window and am seeking accommodations as well. Do you happen to know the general time frame of when the registration window begins? You are so right… it is a great idea to have the documentation ready. I am just wondering when that should be! | Thank you so much!
Input user categoryOther


Comment of

Input user categoryMedical Doctor


Comment of user with index 10466
surgery pregnancy totally safe long equipment date functional youre comfortable talk doctor boss see compromise made


I did surgery through my pregnancy. Totally safe as long as all equipment is up to date and functional. 
But if you’re not comfortable, talk with your doctor and your boss and see if a compromise can be made.. | Reasons I didn’t bother to specialize. It was much less stressful to go straight into general practice with fixed hours and a competitive salary than going for an exhausting underpaid internship. Where I did my clinical rotations, the internship salary was like 32,000 a year. With our outrageous student loans, no one could live on that. It’s ridiculous.

I’ve always thought since I was in vet school that the system is broken, but I don’t think it’ll ever be fixed. | Fellow Rossie here 

I hated fourth year clinics. I never thought about quitting because frankly it just wasn’t an option. I was 

Input user categoryOther


Comment of user with index 6316
thanks much advice


Thanks so much for your advice.
Input user categoryOther


Comment of user with index 599
doctors working vet schools also research addition clinical teaching duties lot willing let students work lab help current projects thats way get name something also going specialty youre interested radiology case asking would interested mentoring helping write case report way shorter easier full research study essentially find grown up lol help get started edit add shouldve assigned advisor started vet school probably help find project jump


Steps to becoming a Radiologist:

Vet school:
x. Be within the top 1/3 of your class in Vet school (4 years)
x. Try and get published while in vet school (even just a case report is incredibly helpful)
x. Get to know your radiology department in vet school; if at all possible try to get a student worker position in the department. Ask questions, don't be annoying, read Thrall tex

Input user categoryOther


Comment of user with index 10040
dont enjoy medicine happy veterinarian think its important evaluate want vet is animals instead find similar career suited passions veterinarians take ton biology youre that fun time school


If you don’t enjoy medicine, you will not be happy as a veterinarian. I think it’s important to evaluate why you want to be a vet (is it the animals?) and instead find a similar career that it more suited for your passions. Veterinarians take a ton of biology, so if you’re not into that, you will not have a fun time in school.
Input user categoryVeterinarian


Comment of user with index 9032
thank much sharing experience truly appreciate its good perspective thank saying dog perfect couldnt understand happening last months especially punctuated aggressive episodes weeks vet prescribed fluoxetine see would help didnt everything got much worse hindsight see mentation changes lethargy offness blamed fluoxetine probably due whatever going hea

Input user categoryMedical Doctor


Comment of user with index 8968
neb scores still


Anyone else suffering WAY too much right now? The waiting game is absolutely horrible. I just feel so alone because I don't know anyone else taking the exam since I am a foreign vet and it's just been too much :( all I have is reddit
I have no idea if my scores really were correct, and I told my entire family I had passed... having a hard time coping. I know stressing about it won't make a difference, but I can't help it | Now they have taken my score down from the NEB CVMA portal... anyone else from Canada with the same issue? | I still have the no documents available message! | Canadians/international students: scores available through the NEB portal. | Yeah, I'm terrified honestly. Counting every minute and stressed out of my mind. | Did you just get this email? Or was it earlier in the day before all hell broke loose? | My NEB scores still aren't up :( | Well, what makes me calmer is that on the 

Input user categoryVeterinarian


Comment of user with index 11030
dont really like medicine want vet please stay far far away field


This. | You need to leave, ASAP. This is a classic toxic clinic, and they will burn you out, fast. Every clinic is NOT like this....please find another one for your own sanity. You are not stupid, they are assholes. | Everyone I work with is covered in tatts. If vetmed didn't allow it, they wouldn't have any staff. I personally would avoid face tattoos. I have a coworker who comes in with a new face tattoo every week, and while I believe self expression is important, imo it comes across tacky and unprofessional. Might get hate for that but just doesn't seem classy to me and doesn't help our cause of trying to convince owners that we know what we're doing. It's wrong to have stereotypes like that, but it still is what it is. | Your dosimeter badge measures the radiation you are being exposed to. In most places this is a legal requirement. If they don't h

Input user categoryVeterinarian


Comment of user with index 7171
biggest sticking point corporate medicine boss wants video chat weekly say im busy half time barely interact meanwhile privately owned practice worked previously owner entering exam room question doctors front clients time would want joking around afterward ya kidding non veterinarian behind desk different state setting expected patients day day much better its maybe soulless feel like opening own its either corporate relief me


This is my biggest sticking point for corporate medicine. My boss wants to have a video chat once weekly that I just say I’m too busy for half the time and I barely have to interact. Meanwhile, the privately owned practice I worked in previously had the owner entering the exam room to question doctors in front of clients all the time and then would want to be joking around afterward. Are ya kidding?  

The non-veterinarian behind a desk in a different state setting my expected patients/surgeries

Input user categoryMedical Doctor


Comment of user with index 1372
kind unit have pocus manufacturers online support online courses


Whereas you say they're nitpicking, I'm wondering whether they're really helping  you to become better at what you do - giving you advice and correction to teach you things (because you're education will go on for decades).   Only you know if their attitude is respectful or disrespectful.....The act of nit picking what you're doing isn't in and of itself disrespectful.

You worked hard to get your DVM, but they also worked hard to get their licence/registration, so don't minimize that.  Remember they're a part of your team, working towards the same goal, so respect their effort and input too.  My techs know more about some things than I do, and I ask their advice about those things when I need them.  

As for calling you "doctor", I personally don't see it as a big deal in regards to showing you respect - after all, you're not her doctor.   I ask all my

Input user categoryMedical Doctor


Comment of user with index 10680
leptospirosis urine rabies saliva big ones bites cause bacterial infections


Yeah, my biggest critique of a lot of these hospitals is a lack of teaching hospitals. My school has interns, residents, 4th years, 3rd years, and students from the Caribbean schools without a teaching hospital. We barely have the case load and staff to adequately teach these students. Adding more students in classes does nothing to remedy the need for more places to practice and hone clinical skills such as S/N, diagnosis, client communication, etc, etc, etc. Many of these schools have no plans to build a clinical hospital alongside. So their students will be sent in to hospitals that may not be able to adequately teach them. | I would go for cheapest. I went to a tiny state school for undergraduate, was one of the top picks for my first choice school and my friend got into Cornell. You can save money, and use your time wisely to get involv

Input user categoryOther


Comment of user with index 9684
french physician ren leriche every surgeon carries within small cemetery time time goes pray a place bitterness regret must look explanation failures make mistakes one perfect man ever walk earth things end well either focus good do patients help lives save


French physician René Leriche: "Every surgeon carries within himself a small cemetery, where from time to time he goes to pray-a place of bitterness and regret, where he must look for an explanation for his failures".

You will make mistakes.  There was only one perfect man to ever walk the earth, and things didn't end so well for Him either.

You have to focus on all the good you do, the patients you help, the lives you save. | Graduating from what?  High school, undergrad? | Your doctor would be very pleased to get an effusive 5-star google review.
Input user categoryMedical Doctor


Comment of user with index 10457
go whichever school leave least amount debt done


I li

Input user categoryVeterinarian


Comment of user with index 5321
literally leaving clinic right similar problem actually numerous problems one them new clinic excited


Even as a new grad, my coworkers call me Dr (last) or just (last) if we’re close. No one has ever even tried to call me by my first name? If they did I would gently and kindly correct them on my work preferences. Either it’s a weird workplace culture or you need to take some hints from experienced staff for them to respect you | Literally leaving a clinic right now for a similar problem (actually there are numerous problems this is just one of them). My new clinic is excited to have me :) | That offer sucks in my opinion. I wouldn’t take less than 100k and 15 days off. 20% production no negative accrual is good though. CE should not be deducted from your pay, most places it’s an additional 1-2k+ they reimburse you when you pay/travel for CE. Also licensing fees and test fees you can have then reimburse (NAVLE, state fe

Input user categoryOther


Comment of user with index 6477
prices vet care going la its also harder get appointment without long wait time


Prices for vet care are going up in LA, and it’s also harder to get an appointment without a long wait time.
Input user categoryOther


Comment of user with index 6494
technicians assholes you everyone new point career would speak managing director hospital manager bullying condescending comments need stopped said start looking kinder hospital setting gaining experience new grad would recommend doctor practice excellent collaboration grow skills best luck


The technicians are being assholes to you. Everyone is new at some point in their career. I would speak with the managing director or hospital manager. Their bullying, condescending comments need to be stopped. That being said you should start looking for a kinder hospital setting while you are gaining your experience. As you are a new grad, I would recommend a 3-5 doctor practice so you have e

In [187]:
train_set_df.head()

Unnamed: 0,username,comment,subreddit,former_index,Label
5564,test_doctor2,elderly man recovering hip replacement surgery,medicine,1438,Medical Doctor
5565,test_doctor3,teenage boy treated sports injury,medicine,1439,Medical Doctor
5566,test_doctor4,woman expecting baby visited prenatal check up,medicine,1440,Medical Doctor
5567,test_doctor5,performed appendectomy patient,medicine,1441,Medical Doctor
5568,test_doctor6,patients blood pressure stabilizing medication,medicine,1442,Medical Doctor


In [188]:
train_set_df["Label"].value_counts()

Label
Other             116
Veterinarian       24
Medical Doctor     19
Name: count, dtype: int64
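
The counts above are quite imbalanced (116 "Other" against 24 "Veterinarian" and 19 "Medical Doctor"). One common way to compensate, which this notebook does not necessarily use, is to weight each class inversely to its frequency. The sketch below is only an illustration: `balanced_class_weights` is a hypothetical helper, and the label list is rebuilt from the counts printed above rather than read from `train_set_df`.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {label: n_samples / (n_classes * count)} for an iterable of labels."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Label counts copied from the value_counts() output above (assumption: no other labels)
labels = ["Other"] * 116 + ["Veterinarian"] * 24 + ["Medical Doctor"] * 19

class_weights = balanced_class_weights(labels)
for label, weight in class_weights.items():
    print(f"{label}: {weight:.3f}")
```

If the labels are later encoded as integers, a dictionary of this kind (keyed by class index) could be passed to the `class_weight` argument of Keras `model.fit`, so that mistakes on the rarer classes cost more during training.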

In [189]:
train_set_df.to_csv("train_set.csv", index=False)

At this point, it would be better to build the model on the 4 subreddits checked so far, then predict the category for the remaining users, who subscribe to the remaining 2 subreddits.
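
A minimal sketch of that split, assuming the data sits in a DataFrame with a `subreddit` column as in `train_set_df`. The subreddit names other than `medicine`, the toy rows, and the helper `split_by_subreddit` are all hypothetical and only serve to show the idea.

```python
import pandas as pd

# Hypothetical list of the subreddits that were checked and labelled by hand
LABELLED_SUBREDDITS = ["medicine", "veterinary", "vet_students", "doctors"]

def split_by_subreddit(df, labelled_subreddits):
    """Split df into rows from labelled subreddits and rows from all other subreddits."""
    mask = df["subreddit"].isin(labelled_subreddits)
    labelled = df[mask].reset_index(drop=True)
    unlabelled = df[~mask].reset_index(drop=True)
    return labelled, unlabelled

# Toy data standing in for the real user table
toy_df = pd.DataFrame({
    "username": ["u1", "u2", "u3", "u4"],
    "comment": ["performed appendectomy patient", "dog prescribed fluoxetine",
                "thanks much advice", "prices vet care going"],
    "subreddit": ["medicine", "veterinary", "some_other_sub", "another_sub"],
})

train_part, predict_part = split_by_subreddit(toy_df, LABELLED_SUBREDDITS)
print(len(train_part), "rows to train on,", len(predict_part), "rows to predict")
```

The first frame would be used to fit the model, and the second would only be passed through prediction once the model is trained.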

## Model Building

For model building, I will use the GloVe embedding matrix to embed the words.

In [None]:
# load the GloVe word vectors (word embeddings) into a dictionary: word -> 100-d vector
embedding_index = {}

with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        # each line holds a word followed by its vector components, separated by spaces
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embedding_index[word] = coefs
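
Once the vectors are in a dictionary, they are usually arranged into a matrix whose row `i` holds the vector of the word with tokenizer index `i`. The sketch below shows one way this could be done; `build_embedding_matrix`, the toy dictionaries and the 3-dimensional vectors are assumptions made for illustration, not the notebook's own code. Row 0 is left as zeros because Keras `Tokenizer` word indices start at 1 (0 is reserved for padding), and words missing from GloVe also keep an all-zero row.

```python
import numpy as np

def build_embedding_matrix(word_index, embedding_index, embedding_dim=100):
    """Stack pretrained vectors into a (vocab_size + 1, embedding_dim) matrix."""
    matrix = np.zeros((len(word_index) + 1, embedding_dim), dtype="float32")
    for word, idx in word_index.items():
        vector = embedding_index.get(word)
        if vector is not None:
            # word found in the pretrained vectors: copy it into its row
            matrix[idx] = vector
    return matrix

# Toy stand-ins for the real GloVe dictionary and the tokenizer's word_index
toy_embedding_index = {
    "vet": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "doctor": np.array([0.4, 0.5, 0.6], dtype="float32"),
}
toy_word_index = {"vet": 1, "doctor": 2, "pocus": 3}  # "pocus" has no pretrained vector

embedding_matrix = build_embedding_matrix(toy_word_index, toy_embedding_index, embedding_dim=3)
print(embedding_matrix.shape)
```

With the real data, `embedding_dim` would be 100 to match `glove.6B.100d.txt`, and the resulting matrix could be given to an `Embedding` layer through `embeddings_initializer=Constant(embedding_matrix)`.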