
# **AI Society Final Project: ASU Event RecSys**



## Jupyter Notebooks and Python
For workshops, we typically use **Google Colab**, a service that lets you run Python code in a hosted, containerized environment. It even provides free GPU/TPU runtimes!
___

In Google Colab, you write code using **Jupyter notebooks**. Notebooks are made up of text and code cells; press `Shift + Enter` to run the selected cell. For your convenience, here are a couple more nifty keyboard shortcuts (Jupyter command-mode shortcuts, so press `Esc` first; in Colab, press `Ctrl + M` followed by the letter):

* `b`: New cell below
* `a`: New cell above


## Hardware Needed
Any computer with internet access and a web browser.

* 👉🏻 Link to dataset: https://drive.google.com/file/d/1UeoAi3Aab-0Gx-B62aGqzeL3GY6SkQBE/view?usp=sharing

In [None]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Let's import the data here.
data =

This dataset includes event details like name, description, location, host organization, and perks. Let's preprocess this data to make it suitable for recommendations.
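The import cell above is left as an exercise. As a hedged sketch that assumes the downloaded file is a CSV (the format isn't stated here), `pd.read_csv` is the usual way to load it. To keep the example runnable without the real file, it reads a tiny hypothetical CSV from an in-memory string; the column names follow the features used later in this notebook, and the rows are invented.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file. In Colab you would instead pass the
# path of the downloaded dataset, e.g. pd.read_csv('/content/<your_file>.csv').
csv_text = """event_name,datetime,description,host_org,event_perks,location,categories
Jazz Night,2024-10-03 19:00,Live jazz and open mic,Music Club,Free Food,Student Center,Music
Resume Workshop,2024-10-04 12:00,Get your resume reviewed by recruiters,Career Services,,Hall A,Career
"""

data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # (2, 7)
print(data.head())  # the empty event_perks field is read as NaN
```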

In [None]:
# Let's print a few entries to see if data is properly loaded.


## **Exploring the Data**
Here, we’re answering questions like:

* How many rows and columns does this data have?
* Are there missing values?
* What are the ranges of numerical data?
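A minimal sketch of how each question maps onto standard pandas calls, shown on a small hypothetical DataFrame (swap in `data` once it is loaded): `shape` gives rows and columns, `isna().sum()` counts missing values per column, and `describe()` summarizes numeric columns. The `attendees` column is invented purely so there is something numeric to describe; the real dataset may have no numeric columns at all.

```python
import pandas as pd

# Hypothetical toy data; 'attendees' is an invented numeric column for illustration
events = pd.DataFrame({
    'event_name': ['Jazz Night', 'Resume Workshop', 'Hack Night'],
    'host_org': ['Music Club', None, 'Coding Club'],
    'event_perks': ['Free Food', None, None],
    'attendees': [40, 25, 60],
})

# How many rows and columns?
n_rows, n_cols = events.shape
print(n_rows, n_cols)  # 3 4

# Are there missing values? (count per column)
print(events.isna().sum())

# What are the ranges of the numerical data?
print(events.describe())  # count, mean, std, min, quartiles, max of numeric columns
print(events['attendees'].min(), events['attendees'].max())  # 25 60
```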

In [None]:
# Let's find out all the columns/features.


In [None]:
# Let's check the dimensions of the dataset.


Relevant features for the output (from the consumer's point of view):
`['event_name', 'datetime', 'description', 'host_org', 'event_perks', 'location']`
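One way to build `df` from these columns is to select them and take a `.copy()`, so that later edits to `df` (such as filling missing values) leave `data` untouched and don't trigger pandas' `SettingWithCopyWarning`. The sketch below uses a hypothetical one-row DataFrame; `extra_col` stands in for any column that is not shown to users.

```python
import pandas as pd

# Consumer-facing columns listed above
output_cols = ['event_name', 'datetime', 'description', 'host_org', 'event_perks', 'location']

# Hypothetical one-row dataset with an extra internal column
data = pd.DataFrame([{
    'event_name': 'Jazz Night', 'datetime': '2024-10-03 19:00',
    'description': 'Live jazz and open mic', 'host_org': 'Music Club',
    'event_perks': None, 'location': 'Student Center', 'extra_col': 123,
}])

# Select only the output columns and copy them into an independent DataFrame
df = data[output_cols].copy()
df['event_perks'] = df['event_perks'].fillna('unknown')  # modifies df only

print(df.columns.tolist())
print(data['event_perks'].isna().all())  # True: the original is unchanged
```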

In [None]:
# Let's create another DataFrame named df containing only these output features.


In [None]:
df['event_name'].value_counts().head(10).plot(kind='bar', title='Top Events')
plt.show()

## Preprocessing

In [None]:
# Fill missing values in the host_org and event_perks columns of df with 'unknown'.
df['host_org'] = df['host_org'].fillna('unknown')
df['event_perks'] = df['event_perks'].fillna('unknown')

In [None]:
# In the data DataFrame, fill missing values with empty strings so these columns stay text and can be joined later.
data['host_org'] = data['host_org'].fillna('')
data['event_perks'] = data['event_perks'].fillna('')
data['categories'] = data['categories'].fillna('')
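Filling with empty strings matters because the next step joins several columns into one text feature: a missing value would otherwise either wipe out the whole combined string (pandas `+` concatenation yields `NaN` for that row) or show up as the literal token `nan`. A hedged sketch of the combination step on hypothetical rows; `combined_text` is a hypothetical name for the feature created in the next cell, and the column choice is an assumption (pick whichever columns carry the essential information).

```python
import numpy as np
import pandas as pd

# Hypothetical rows; host_org is missing in the second one
data = pd.DataFrame({
    'event_name': ['Jazz Night', 'Resume Workshop'],
    'description': ['Live jazz and open mic', 'Resume reviews with recruiters'],
    'host_org': ['Music Club', np.nan],
    'event_perks': ['Free Food', 'Networking'],
})

text_cols = ['event_name', 'description', 'host_org', 'event_perks']  # assumed choice

# Without filling, plain concatenation makes the whole row NaN
unfilled = data['event_name'] + ' ' + data['host_org']
print(unfilled.isna().tolist())  # [False, True]

# After fillna(''), join the columns row by row, then collapse repeated spaces
filled = data[text_cols].fillna('')
data['combined_text'] = filled.apply(' '.join, axis=1).str.split().str.join(' ')
print(data['combined_text'].tolist())
```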

In [None]:
# Let's create a feature with all the essential information.


In [None]:
# Importing text preprocessing libraries.
import re
import string
!pip install contractions
import contractions
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

In [None]:
# Load the English stop-word list once instead of on every call.
STOP_WORDS = set(stopwords.words('english'))

def preprocess_text(data):

    # Apply the function element-wise when given a pandas Series (e.g., a whole column)
    if isinstance(data, pd.Series):
        return data.astype(str).apply(preprocess_text)

    # Normalize curly apostrophes so contractions like "I’m" are recognized
    data = str(data).replace('’', "'")

    # Remove HTML tags
    data = re.sub(r'<.*?>', '', data)

    # Remove URLs
    data = re.sub(r'https?://\S+', '', data)

    # Remove mentions
    data = re.sub(r'@\w+', '', data)

    # Remove hashtags
    data = re.sub(r'#\w+', '', data)

    # Expand contractions (e.g., "don't" -> "do not") BEFORE apostrophes are stripped
    data = contractions.fix(data)

    # Convert to lowercase
    data = data.lower()

    # Replace special characters, punctuation, and digits with spaces (keep letters only)
    data = re.sub(r'[^a-z\s]', ' ', data)

    # Remove extra whitespace
    data = re.sub(r'\s+', ' ', data).strip()

    # Remove stop words using NLTK
    data = ' '.join(word for word in data.split() if word not in STOP_WORDS)

    return data

In [None]:
# Apply this preprocess_text function to our information feature.


## **TF-IDF Vectorization**

*In this section, we will use the `TfidfVectorizer` from the `sklearn` library to convert our preprocessed text data into TF-IDF features. This method helps in representing the importance of words in the documents relative to the entire corpus.*

## **What is TF-IDF?**

*TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure of how important a word is to a document relative to a collection of documents (the corpus). The weight increases with the number of times the word appears in the document but is offset by how many documents in the corpus contain the word, so words that appear in most documents receive low weights.*

### **Formula**

- **TF (Term Frequency)**:
  TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)

- **IDF (Inverse Document Frequency)**:
  IDF(t) = log(Total number of documents / Number of documents with term t in it)

- **TF-IDF**:
  TF-IDF(t) = TF(t) * IDF(t)
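To make the formulas concrete, here is a small worked example on a hypothetical three-document corpus. It assumes the natural logarithm for IDF (the base only rescales the weights). It also shows a detail worth knowing: scikit-learn's `TfidfVectorizer` does not apply the textbook formula verbatim. By default it uses raw term counts for TF, a smoothed IDF of ln((1 + n) / (1 + df(t))) + 1, and then L2-normalizes each document row.

```python
import math
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus of three short, already-preprocessed event descriptions
docs = [
    "free food music night",
    "resume workshop career fair",
    "music workshop free pizza",
]

def textbook_tfidf(term, doc, corpus):
    """TF-IDF exactly as in the formulas above (natural log assumed)."""
    words = doc.split()
    tf = words.count(term) / len(words)                      # TF(t)
    n_docs_with_term = sum(term in d.split() for d in corpus)
    idf = math.log(len(corpus) / n_docs_with_term)            # IDF(t)
    return tf * idf                                           # TF-IDF(t)

# "music" appears once in a 4-word doc and in 2 of the 3 docs:
# TF = 1/4, IDF = ln(3/2) ≈ 0.405, so TF-IDF ≈ 0.101
print(round(textbook_tfidf("music", docs[0], docs), 3))  # 0.101

# scikit-learn's default (smoothed) IDF: ln((1 + 3) / (1 + 2)) + 1 ≈ 1.288
vectorizer = TfidfVectorizer()
vectorizer.fit(docs)
idf_music = vectorizer.idf_[vectorizer.vocabulary_["music"]]
print(round(idf_music, 3))  # 1.288
```

The textbook helper divides by zero for a term that appears in no document; the smoothing in scikit-learn avoids exactly that case.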

## **Using TfidfVectorizer**

*We use the `TfidfVectorizer` to transform the processed text into a sparse TF-IDF matrix (one row per event, one column per vocabulary term), which can be converted into a DataFrame for inspection.*

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend(user_input):

    # Preprocess user input


    # Create the TF-IDF vectorizer (vocabulary capped at the 10,000 most frequent terms)
    vectorizer = TfidfVectorizer(max_features=10000)

    # Fit the vectorizer on the combined text data


    # Transform the user input


    # Transform the text data


    # Calculate cosine similarity between the user input and every event
    cosine_sim = cosine_similarity(user_tfidf, df_tfidf)

    # Get the indices of the 5 most similar events, most similar first
    top_indices = cosine_sim.argsort().flatten()[-5:][::-1]
    recommendations = df.iloc[top_indices]
    return recommendations

## Finally finished! Let's find some interesting events.

In [None]:
recommend('I like music, free food, and networking')

Let's try another prompt: `As a business student, I’m interested in networking events, resume workshops, and opportunities to connect with recruiters. Any weekday events that offer chances to meet professionals or get career tips?`

Let's try another prompt: `I’m a computer science student and like tech events and workshops where I can learn new skills and meet people in my field. Are there any good events on campus?`
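If a prompt returns surprising results, it helps to see the ranking step in isolation. The sketch below mirrors the cosine-similarity and `argsort` logic of the recommendation function above on four hypothetical events; `top_k` and the `text` column are illustrative names, not part of the notebook. Because each TF-IDF row is L2-normalized, cosine similarity here reduces to a dot product, and prompt words that never appear in any event (like "with" below) simply contribute nothing.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical events; 'text' plays the role of the preprocessed combined feature
events = pd.DataFrame({
    'event_name': ['Jazz Night', 'Resume Workshop', 'Hackathon', 'Career Fair'],
    'text': [
        'live jazz music free food',
        'resume workshop career tips recruiters',
        'coding workshop tech free pizza',
        'career fair networking recruiters professionals',
    ],
})

vectorizer = TfidfVectorizer()
event_matrix = vectorizer.fit_transform(events['text'])

def top_k(query, k=2):
    """Return the names of the k events most similar to the query (hypothetical helper)."""
    query_vec = vectorizer.transform([query])
    # Shape (1, n_events): similarity of the query to every event
    scores = cosine_similarity(query_vec, event_matrix)
    # argsort is ascending, so take the last k positions and reverse them
    top = scores.argsort().flatten()[-k:][::-1]
    return events.iloc[top]['event_name'].tolist()

print(top_k('networking with recruiters'))
```

Here the call returns `['Career Fair', 'Resume Workshop']`: the first event shares both informative terms with the prompt, the second shares only `recruiters`. Events with identical scores (for example, several zeros) come back in no guaranteed order.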

----------------------------
**Try other prompts of your choice and play with the model!**