# Scraping Active Security Prices from the HSX Website

### What?
This program scrapes the latest price of each requested security from the HSX website.

### Why? 
The latest security price is needed to calculate the market value and the realised gain/loss.

### How? 
1. Find out which security positions are open.
2. Iterate over the open positions, passing each security symbol in as the input parameter.
3. Use the symbol to build the URL of the security's web page (e.g. `https://www.hsx.com/security/view/<security symbol>`).
4. Scrape the latest price by extracting the relevant chunk of the page as a string, then slicing that string to obtain the price.
5. Tidy up the dataframe and save it to a CSV file.
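
Step 4 can be tried offline on a hard-coded string. This is only a sketch: the sample markup below is an assumption made for illustration and the real HSX page may look different. Only the slicing logic mirrors the program.

```python
# Hypothetical sample of the extracted chunk; the real page markup may differ.
sample_chunk = '[<p class="value">H$46.42</p>]'

def slice_price(chunk):
    """Return the text from just after the first 'H' through two decimal places."""
    start = chunk.find('H') + 1   # character right after the 'H' in "H$"
    end = chunk.find('.') + 3     # the decimal point plus two digits
    return chunk[start:end]

print(slice_price(sample_chunk))  # prints $46.42
```

The slicing relies on the first `H` and the first `.` in the chunk belonging to the price, so it is sensitive to any change in the surrounding markup.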

## Main Program

#### Import Libraries

In [1]:
import platform
import pyodbc
import pandas as pd
import requests
from bs4 import BeautifulSoup as soup

#### System Parameters

In [2]:
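def db_connect_str_from_env():
    # Pick the ODBC driver that matches the current operating system
    if platform.system() == 'Windows':
        driver = 'DRIVER={SQL Server};'
    elif platform.system() == 'Darwin': # macOS
        if platform.machine() == 'arm64': # M1 chip
            driver = 'DRIVER=/opt/homebrew/lib/libmsodbcsql.18.dylib;'
        else:
            driver = 'DRIVER=/Library/simba/sqlserverodbc/lib/libsqlserverodbc_sbu.dylib;'
    else:
        # Fail early with a clear message instead of an UnboundLocalError on the return line
        raise RuntimeError(f'No ODBC driver configured for {platform.system()}')

    return driver + 'SERVER=dlyle.database.windows.net;DATABASE=HSX;UID=student;PWD=Viz(Data);'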

DB = db_connect_str_from_env()
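
The function name suggests that the connection details come from the environment. A minimal sketch of that idea follows, assuming the server, database, user and password are stored in environment variables. The helper `build_connection_string` and the variable names `HSX_DB_SERVER`, `HSX_DB_NAME`, `HSX_DB_USER` and `HSX_DB_PASSWORD` are hypothetical and not part of the original program.

```python
import os

def build_connection_string(driver, environ=os.environ):
    """Assemble an ODBC connection string from environment variables.

    The variable names used here are hypothetical; pick whatever suits your setup.
    """
    required = ["HSX_DB_SERVER", "HSX_DB_NAME", "HSX_DB_USER", "HSX_DB_PASSWORD"]
    missing = [name for name in required if name not in environ]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return (
        f"{driver}"
        f"SERVER={environ['HSX_DB_SERVER']};"
        f"DATABASE={environ['HSX_DB_NAME']};"
        f"UID={environ['HSX_DB_USER']};"
        f"PWD={environ['HSX_DB_PASSWORD']};"
    )

# Demonstration with a stand-in dictionary instead of the real environment
demo_env = {
    "HSX_DB_SERVER": "example.invalid",
    "HSX_DB_NAME": "demo",
    "HSX_DB_USER": "user",
    "HSX_DB_PASSWORD": "secret",
}
print(build_connection_string("DRIVER={SQL Server};", demo_env))
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real environment.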

#### Extract data from database into dataframe

In [3]:
# Extract only the security symbols of open positions
SQL = """
SELECT DISTINCT Security_symbol
FROM (
    -- Net balance per security and user; a positive balance means the position is still open
    SELECT Security_symbol, User_name, sum(signed_quant) AS balance
    FROM (
        -- Buy and Short add to the balance; every other action subtracts from it
        SELECT Security_Symbol, User_Name,
        CASE WHEN Action = 'Buy' or Action = 'Short' THEN Quantity
        ELSE -1 * Quantity END AS signed_quant
        FROM Trades
        WHERE User_Name in ('will_ho', 'rkhoo', 'lucasee')
    ) AS signed_table
    GROUP BY Security_Symbol, User_Name
    HAVING sum(signed_quant) > 0
) AS open_positions
"""
# Create the dataframe
data = pd.read_sql(SQL, pyodbc.connect(DB))
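
The signed-quantity logic in the query can be checked without a database. The sketch below applies the same rule in plain Python to a handful of made-up trades. The symbols, user names and the closing action name `Sell` are assumptions for illustration; only `Buy` and `Short` appear in the query itself.

```python
from collections import defaultdict

# Made-up trades for illustration: (security symbol, user name, action, quantity)
trades = [
    ("AAA", "user1", "Buy", 100),
    ("AAA", "user1", "Sell", 100),   # position fully closed
    ("BBB", "user1", "Buy", 50),
    ("BBB", "user1", "Sell", 20),    # 30 still open
    ("CCC", "user2", "Short", 10),   # short position still open
]

def open_symbols(trades):
    """Return the sorted symbols that have a positive net balance for any user."""
    balance = defaultdict(int)
    for symbol, user, action, quantity in trades:
        # Buy and Short add to the balance; every other action subtracts from it
        sign = 1 if action in ("Buy", "Short") else -1
        balance[(symbol, user)] += sign * quantity
    return sorted({symbol for (symbol, _user), net in balance.items() if net > 0})

print(open_symbols(trades))  # prints ['BBB', 'CCC']
```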




#### Functions

In [4]:
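# Function to scrape data
def fetch_latest_price(_symbol):
    # Build the URL of the security's page
    url = f"https://www.hsx.com/security/view/{_symbol}"
    # Request the page; the timeout stops a stalled connection from hanging the loop
    r = requests.get(url, timeout=30)
    # Check that the URL is not broken
    if r.status_code == 200:
        # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
        page = soup(r.content, 'html.parser')
        # Extract the portion of the page that we are interested in and convert it into a string
        focus = str(page.select('p[class="value"]'))
        # The price starts one character after the first 'H'
        start = focus.find('H') + 1
        # The price ends two digits after the first decimal point
        end = focus.find('.') + 3
        price = focus[start:end]
        return [_symbol, price]
    # If the URL is broken, report it and return None so the caller can tell the fetch failed
    else:
        print('Error', r.status_code, 'Page not found for:', url)
        return None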
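
String slicing depends on the exact position of the first `H` and the first `.` in the extracted chunk. As a hedged alternative, and not the method this notebook uses, a regular expression can pull out the first dollar amount and report when none is present. The helper name `extract_price` is hypothetical, and the pattern assumes prices are written with a dollar sign, optional thousands separators and two decimal places.

```python
import re

# Matches a dollar amount such as "$46.42" or "$1,225.84"
PRICE_PATTERN = re.compile(r"\$\d[\d,]*\.\d{2}")

def extract_price(text):
    """Return the first dollar amount found in text, or None if there is none."""
    match = PRICE_PATTERN.search(text)
    return match.group(0) if match else None

print(extract_price('[<p class="value">H$46.42</p>]'))  # prints $46.42
print(extract_price('[]'))                               # prints None
```

Returning `None` for a chunk without a price makes a missing or redesigned page element visible instead of silently producing a nonsense slice.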

#### Main Program

In [5]:
# Empty list to store [_symbol, price]
info = []
# For each security symbol in data, fetch the latest price
for symbol in data["Security_symbol"]:
    result = fetch_latest_price(symbol)
    # Skip symbols whose page could not be fetched (the function returns None for them)
    if result is not None:
        info.append(result)
# Convert info into a dataframe with useful column names
latest_price_df = pd.DataFrame(info, columns=["Security Symbol", "Latest price"])


In [6]:
# Save the dataframe to a CSV file
latest_price_df.to_csv('Latest_price.csv', index=False)
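
The saved prices are strings such as `$46.42`, so they need converting to numbers before any market value can be calculated. A minimal sketch of that conversion follows, assuming market value is quantity multiplied by the latest price. The helper `price_to_decimal` and the quantity of 100 are hypothetical.

```python
from decimal import Decimal

def price_to_decimal(price_text):
    """Convert a string such as "$46.42" into Decimal("46.42")."""
    return Decimal(price_text.replace("$", "").replace(",", ""))

# Hypothetical position: 100 shares valued at the scraped price "$46.42"
quantity = 100
market_value = quantity * price_to_decimal("$46.42")
print(market_value)  # prints 4642.00
```

`Decimal` avoids the rounding noise that binary floats can introduce when working with currency amounts.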


Unnamed: 0,Security Symbol,Latest price
0,50CEN,$46.42
1,ACORN,$32.84
2,ALASKA,$13.86
3,AMSTR,$14.57
4,ARMGT.OW,$0.79
5,ASCLB,$13.00
6,BADAM.OW,$67.00
7,CDSG,$12.02
8,DEAD3,$225.84
9,FANF2,$140.87
