# Step 1: Extracting Keywords from Facebook Posts

In this section, we'll perform keyword extraction on a dataset of Facebook group posts. The steps include reading the data, preprocessing the content, and applying the YAKE (Yet Another Keyword Extractor) algorithm to extract keywords.

### Process Overview
- Load the dataset from the CSV file `facebookgroups_posts.csv`.
- Preprocess the content of each post with the `preprocess_tweet` function.
- Concatenate the preprocessed posts into a single document.
- Apply YAKE to the concatenated document to extract up to 2,000 candidate keywords of at most three words each.
- Store the extracted keywords and their scores in a DataFrame and export them to CSV for further analysis.

**Note**: The `is_keyword` column is added empty to the exported file `step_2_twitter_keywords.csv` and should be filled in by domain experts based on their knowledge.
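
The `preprocess_tweet` helper lives in the project's `utils` module and is not shown in this notebook. As a rough idea of what such a preprocessing step typically does, here is a minimal sketch. The function name `preprocess_post_sketch` and its cleaning rules (lowercasing, stripping URLs and @-mentions, collapsing whitespace) are assumptions for illustration, not the actual implementation.

```python
import re

# Patterns for the (assumed) cleaning rules
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess_post_sketch(text):
    """Hypothetical stand-in for utils.preprocess_tweet (assumed behavior)."""
    if not isinstance(text, str):
        return ""  # treat missing values (e.g. NaN from pandas) as empty text
    text = text.lower()
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @-mentions
    return WHITESPACE_RE.sub(" ", text).strip()
```

Returning an empty string for non-string input keeps the later `' '.join(...)` step from failing on posts with missing content.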

In [None]:
import pandas as pd
import swifter  # importing registers the .swifter accessor on pandas objects
from utils import preprocess_tweet
import yake

In [None]:
facebook_posts_df = pd.read_csv('../data/facebookgroups_posts.csv')

In [None]:
facebook_posts_df['processed_content'] = facebook_posts_df['content'].swifter.apply(preprocess_tweet)

In [None]:
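# Concatenate all preprocessed posts into one document for YAKE
concatted_doc = ' '.join(facebook_posts_df['processed_content'].tolist())
# n=3: keyphrases of up to three words; top=2000: keep the 2,000 best candidates
yake_kw_extractor = yake.KeywordExtractor(n=3, top=2000)
# Returns (keyword, score) pairs; a lower score means a more relevant keyword
yake_keywords = yake_kw_extractor.extract_keywords(concatted_doc)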

In [None]:
yake_keywords_df = pd.DataFrame(yake_keywords, columns=['keyword', 'score'])

# Save the raw YAKE output of this step
yake_keywords_df.to_csv('../data/intermediate/output/step_1_yake_keywords.csv', index=False)
# Add an empty column for domain experts to fill in later, then save the input for step 2
yake_keywords_df['is_keyword'] = None
yake_keywords_df.to_csv('../data/intermediate/input/step_2_twitter_keywords.csv', index=False)
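
Once domain experts have filled in `is_keyword`, a later step can keep only the confirmed keywords. The sketch below shows one way this could look using only the standard library. The `1`/`0` labels, the `confirmed_keywords` helper, and the sample rows are made up for illustration, since the notebook does not specify how the column is encoded.

```python
import csv
import io

# Illustrative stand-in for the annotated step 2 file (rows and labels are made up)
annotated_csv = io.StringIO(
    "keyword,score,is_keyword\n"
    "example phrase,0.01,1\n"
    "another phrase,0.02,0\n"
    "unlabeled phrase,0.03,\n"
)


def confirmed_keywords(csv_file, positive_labels=("1", "true", "yes")):
    """Return keywords whose is_keyword cell holds an assumed positive label."""
    reader = csv.DictReader(csv_file)
    return [
        row["keyword"]
        for row in reader
        # Blank cells (not yet annotated) are treated as "not a keyword"
        if (row["is_keyword"] or "").strip().lower() in positive_labels
    ]


print(confirmed_keywords(annotated_csv))
```

Treating blank cells as negative means partially annotated files can still be processed, at the cost of silently skipping keywords nobody has reviewed yet.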