# LinkedIn Views Analytics 🔍

LinkedIn lacks comprehensive content analytics, only providing insights on individual posts.

But what if we crave an overview of our posts' performance (likes, views, comments) over the past month?

Solution? A Python program to visualize trends in post views over recent days. 📊 Here's a peek at the results:

[LinkedIn Views Analytics Notebook](https://www.kaggle.com/code/zulqarnainalipk/linkediln-views-analytics/notebook)

---

## Table of Contents

- [Introduction](#introduction)
- [Solution Overview](#solution-overview)
- [How to Use](#how-to-use)
- [Contributing](#contributing)
- [Feedback](#feedback)

## Introduction

LinkedIn provides limited content analytics, focusing primarily on individual post metrics. However, users often desire a broader perspective on their posts' performance, encompassing metrics like likes, views, and comments over a longer period. This README introduces a Python program designed to address this need by visualizing trends in post views over recent days.

## Solution Overview

This Python program uses Selenium, a portable framework for testing web applications, to log in to LinkedIn and scroll through your recent posts. It then parses the loaded page with BeautifulSoup to extract the view counts. Plotting those counts in posting order, together with a fitted trend line, shows how your content's performance changes over time.

## How to Use

To utilize this program effectively, follow these steps:

1. **Install Dependencies:** Make sure Python is installed along with the libraries the notebook imports: Selenium, BeautifulSoup (`bs4`), pandas, NumPy, and Matplotlib. The parsing step also uses the `lxml` parser, and Selenium needs Google Chrome with a matching ChromeDriver.

2. **Run the Program:** Execute the cells in the notebook, or download the code and run it locally.

3. **Enter Your Details:** When prompted, provide your LinkedIn login (email or phone) and password, the username from your profile link (the part after `/in/`), and the number of recent posts to analyze.

4. **Review Results:** Once the program finishes, review the generated plot and the saved CSV file to analyze post view trends.

## Contributing

Contributions to enhance the functionality or usability of this program are welcome! Please fork the repository, make your changes, and submit a pull request. Ensure your code adheres to best practices and includes appropriate documentation.

## Feedback

Your feedback is essential for the continuous improvement of this project. If you have any comments, suggestions, or encounter any issues, please don't hesitate to reach out. You can contact the project owner via email at [zulqar445ali@gmail.com](mailto:zulqar445ali@gmail.com).

🔧 **1. Installing Selenium:**
To begin, I install Selenium, a powerful tool for automating web testing tasks.

🤔 **What's Selenium?**
Selenium is a versatile framework designed for testing web applications. It supports various programming languages, including Python.

🕸️ **Key Features:**
- **Cross-Language Compatibility:** Selenium supports multiple programming languages, making it accessible for developers.
- **Browser Navigation:** With Selenium, developers can navigate through web browsers like Google Chrome (or others, but here we'll focus on Chrome).
- **Python Integration:** Selenium provides a Python library equipped with methods tailored for web automation tasks.

In [1]:
!pip install selenium

Collecting selenium
  Downloading selenium-4.18.1-py3-none-any.whl.metadata (6.9 kB)
Collecting trio~=0.17 (from selenium)
  Downloading trio-0.24.0-py3-none-any.whl.metadata (4.9 kB)
Collecting trio-websocket~=0.9 (from selenium)
  Downloading trio_websocket-0.11.1-py3-none-any.whl.metadata (4.7 kB)
Collecting sortedcontainers (from trio~=0.17->selenium)
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting outcome (from trio~=0.17->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl.metadata (2.6 kB)
Collecting wsproto>=0.14 (from trio-websocket~=0.9->selenium)
  Downloading wsproto-1.2.0-py3-none-any.whl.metadata (5.6 kB)
Downloading selenium-4.18.1-py3-none-any.whl (10.0 MB)
Downloading trio-0.24.0-py3-none-any.whl (460 kB)

## Import Packages for Managing Web Scraping

In [2]:
#importing packages for managing web scraping
from selenium import webdriver

In [3]:
!pip install bs4

Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2


In [4]:
from bs4 import BeautifulSoup
import re
import time

In [None]:
#request user input for LinkedIn username and password:
from getpass import getpass

username_string = input("Please enter the exact LinkedIn username you use to login (email/phone?): ")
#getpass keeps the password from being echoed into the notebook output
password_string = getpass("Please enter the exact LinkedIn password: ")
#strip stray spaces and slashes so the value can be dropped straight into the profile URL
link_username = input("Please enter your username exactly as it appears in your profile link (after '/in/'): ").strip().strip("/")
number_of_posts = int(input("Please enter the number of most recent posts you want to analyse: "))

Please enter the exact LinkedIn username you use to login (email/phone?):


In [None]:
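#Selenium 4 expects the driver path wrapped in a Service object, not passed positionally
browser = webdriver.Chrome(service=webdriver.ChromeService("chromedrivers.exe"))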

In [None]:
#open the LinkedIn login page and login under a specified account:
from selenium.webdriver.common.by import By

browser.get('https://www.linkedin.com/login')
#enter the specified information to login to LinkedIn
#(Selenium 4 locates elements with find_element(By.ID, ...) instead of find_element_by_id):
elementID = browser.find_element(By.ID, 'username')
elementID.send_keys(username_string)
elementID = browser.find_element(By.ID, 'password')
elementID.send_keys(password_string)
elementID.submit()

In [None]:
#open the recent post activity page of the LinkedIn user you specified
#(link_username must be the full profile slug, including any trailing ID):
recent_activity_link = "https://www.linkedin.com/in/" + link_username + "/recent-activity/shares/"
browser.get(recent_activity_link)

## Scrape Post Stats

In [None]:
#calculate number of scrolls depending on the input
number_of_scrolls = -(-number_of_posts // 5)  # 5 is LinkedIn's number of posts per scroll
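The negated floor division above is a compact way of rounding up. As a minimal sketch, the same calculation can be wrapped in a function. The names `scrolls_needed` and `posts_per_scroll` are hypothetical, and the default of 5 posts per scroll is the notebook's own assumption about how many posts LinkedIn loads at a time.

```python
def scrolls_needed(number_of_posts: int, posts_per_scroll: int = 5) -> int:
    """Return how many scrolls are needed to load at least number_of_posts."""
    if number_of_posts <= 0:
        return 0
    # -(-a // b) rounds a / b up using only integer arithmetic
    return -(-number_of_posts // posts_per_scroll)


print(scrolls_needed(12))  # 3
```

Rounding up matters here: 12 posts at 5 per scroll need 3 scrolls, whereas plain floor division would stop at 2 and miss the last two posts.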

In [None]:
#the view counts scraped from the page are collected in this list
views = []

#seconds to wait after each scroll so that LinkedIn can load more posts
SCROLL_PAUSE_TIME = 5

In [None]:
# Get scroll height
last_height = browser.execute_script("return document.body.scrollHeight")

for scroll in range(number_of_scrolls) : 
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
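The loop above stops early once scrolling no longer makes the page taller. A minimal sketch of the same pattern, separated from the browser so it can be tried without Selenium, might look like this. The function name and its parameters are hypothetical, not part of the notebook.

```python
import time
from typing import Callable


def scroll_until_stable(get_height: Callable[[], int],
                        scroll_to_bottom: Callable[[], None],
                        max_scrolls: int,
                        pause_seconds: float = 5.0) -> int:
    """Scroll up to max_scrolls times, stopping early when the page stops growing.

    Returns the number of scrolls that actually loaded new content.
    """
    last_height = get_height()
    productive_scrolls = 0
    for _ in range(max_scrolls):
        scroll_to_bottom()
        time.sleep(pause_seconds)  # give the page time to load more posts
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so the end of the feed was reached
        last_height = new_height
        productive_scrolls += 1
    return productive_scrolls
```

With the notebook's browser, `get_height` could be `lambda: browser.execute_script("return document.body.scrollHeight")` and `scroll_to_bottom` could wrap the `window.scrollTo` call used above.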

In [None]:
#grab the HTML of the fully scrolled page (page_source returns it as a string):
src = browser.page_source
#beautiful soup instance (the "lxml" parser requires the lxml package to be installed):
soup = BeautifulSoup(src, features="lxml")

### 🔍 Searching for Likes on LinkedIn

1. **Locating "Likes" on the Page:**
   Inspect a post with the browser's developer tools and find the element that displays the reaction count. On this page it is a "span" tag.

2. **Identifying Relevant "span" Tags:**
   Select those "span" tags by their `class` attribute. LinkedIn's class names can change whenever the page markup changes, so check them against the live page before running the code.

3. **Converting Tags to Strings:**
   Convert each matched tag to a string so that a regular expression can be run against it.

4. **Extracting the Counts:**
   Finally, use the regular expression to pull the number (which may contain commas) out of each string and convert it to an integer.
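The string-to-number step can be tried on its own, without a browser. The sketch below is a minimal illustration: `parse_count` is a hypothetical helper, and the sample strings are made up rather than taken from LinkedIn's markup.

```python
import re
from typing import Optional

# A digit followed by any mix of digits and thousands separators, e.g. "1,234"
COUNT_PATTERN = re.compile(r"\d[\d,]*")


def parse_count(text: str) -> Optional[int]:
    """Return the last number found in text as an int, or None if there is none."""
    matches = COUNT_PATTERN.findall(text)
    if not matches:
        return None
    return int(matches[-1].replace(",", ""))


print(parse_count("1,234 views"))      # 1234
print(parse_count("no numbers here"))  # None
```

Returning `None` instead of raising lets the caller skip tags that contain no number.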

In [None]:
#the like counts are collected in this list
likes = []
likes_bs4tags = soup.find_all("span", attrs = {"class" : "v-align-middle social-details-social-counts__reactions-count"})
for tag in likes_bs4tags:
    #use the tag's visible text so that digits inside attributes are never matched
    strtag = tag.get_text()
    #the regular expression matches a number that may contain commas (e.g. "1,234")
    list_of_matches = re.findall(r'\d[\d,]*', strtag)
    if not list_of_matches:
        continue  #skip tags that contain no number
    #converts the last match (string) to int, appends to likes list
    without_comma = list_of_matches[-1].replace(',', '')
    likes_int = int(without_comma)
    likes.append(likes_int)

In [None]:
#find VIEWS on LinkedIn
#same concept here: read each tag's visible text and keep the last number in it
views_bs4tags = soup.find_all("span", attrs = {"class" : "icon-and-text-container t-14 t-black--light t-normal"})
for tag in views_bs4tags:
    strtag = tag.get_text()
    list_of_matches = re.findall(r'\d[\d,]*', strtag)
    if not list_of_matches:
        continue  #skip tags that contain no number
    without_comma = list_of_matches[-1].replace(',', '')
    views_int = int(without_comma)
    views.append(views_int)

print(views)

## Data Visualization

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [None]:
# Reverse the list so the oldest post comes first (the page lists the newest first)
views.reverse()

# Convert the list into a pandas DataFrame
views_df = pd.DataFrame(views, columns=['Views'])

In [None]:
# Get rid of the outliers
#   mask data points further than 3 standard deviations from the median (they become NaN)...
views_df_no_outliers = views_df.where((views_df - views_df.median()).abs() <= 3 * views_df.std())

#   replace NaN values (masked outliers) with the median of the remaining values
#   (assigning the result back avoids pandas' chained-assignment warning)
views_df_no_outliers['Views'] = views_df_no_outliers['Views'].fillna(views_df_no_outliers['Views'].median())
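To make the outlier rule explicit, here is a minimal sketch of the same idea using only the standard library. The function name and the sample numbers are made up for illustration. It uses the sample standard deviation, which is also the default for `DataFrame.std()` in pandas.

```python
from statistics import median, stdev


def replace_outliers_with_median(values, n_std=3.0):
    """Replace values further than n_std standard deviations from the median."""
    if len(values) < 2:
        return list(values)  # stdev needs at least two data points
    centre = median(values)
    limit = n_std * stdev(values)  # sample standard deviation
    kept = [v for v in values if abs(v - centre) <= limit]
    fill = median(kept) if kept else centre
    return [v if abs(v - centre) <= limit else fill for v in values]


views_example = [100] * 19 + [10_000]  # made-up counts with one viral post
print(replace_outliers_with_median(views_example)[-1])  # 100
```

Note that with only a handful of posts, a single extreme value inflates the standard deviation so much that it may not be flagged at all. The rule works best with a few dozen posts.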

In [None]:
print('**************************')
print('********* VIEWS **********')
print('**************************')
# Fit a straight line (degree-1 polynomial) to the views, using the post index as x
x_views = np.arange(len(views_df_no_outliers))
y_views = views_df_no_outliers['Views'].to_numpy()
coefficients_views, residuals_views, _, _, _ = np.polyfit(x_views, y_views, 1, full=True)
# residuals_views[0] is the sum of squared residuals; normalise the RMSE by the range of the data
mse_views = residuals_views[0] / len(y_views)
nrmse_views = np.sqrt(mse_views) / (y_views.max() - y_views.min())
slope_views = coefficients_views[0]
print('Slope: ' + str(slope_views))
print('NRMSE: ' + str(nrmse_views))
plt.plot(x_views, y_views)
plt.plot(x_views, slope_views * x_views + coefficients_views[1])
plt.title('LinkedIn Post Views for ' + link_username)
plt.xlabel('Posts')
plt.ylabel('Views')
plt.savefig(link_username + '-linkedin-views-last-' + str(number_of_posts) + '-posts-GRAPH.png', dpi=600)
plt.show()
plt.clf()
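For readers who want to see what the slope and NRMSE above represent, the following is a minimal sketch of a least-squares line fit in plain Python. The function name `linear_trend` and the sample values are hypothetical. It is meant to mirror the calculation, not to replace `np.polyfit`.

```python
import math


def linear_trend(ys):
    """Fit a least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept, nrmse), where nrmse is the root-mean-square
    error of the fit divided by the range of ys.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals between the data and the fitted line
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    rmse = math.sqrt(sse / n)
    value_range = max(ys) - min(ys)
    nrmse = rmse / value_range if value_range else float("nan")
    return slope, intercept, nrmse


slope, intercept, nrmse = linear_trend([10, 12, 14, 16])  # made-up view counts
print(slope, intercept, nrmse)  # 2.0 10.0 0.0
```

A positive slope means views are trending upward from older to newer posts, and an NRMSE close to 0 means the straight line describes the data well.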

In [None]:
# Save the DataFrame as a CSV file
views_df_no_outliers.to_csv(link_username + '-linkedin-views-last-' + str(number_of_posts) + '-posts.csv')

## Keep Exploring! 👀

Thank you for delving into this notebook! If you found it insightful or beneficial, I encourage you to explore more of my projects and contributions on my profile.

👉 [Visit my Profile](https://www.kaggle.com/zulqarnainalipk) 👈

[GitHub](https://github.com/zulqarnainali01) | [LinkedIn](https://www.linkedin.com/in/zulqarnain-ali-a9867a273/)

## Share Your Thoughts! 🙏

Your feedback is invaluable! Your insights and suggestions drive this project's ongoing improvement. If you have any comments, questions, or ideas to contribute, please feel free to reach out.

📬 Contact me via email: [zulqar445ali@gmail.com](mailto:zulqar445ali@gmail.com)

I extend my sincere gratitude for your time and engagement. Your support inspires me to create even more valuable content.