# Scraping a List of All Counter Strike: Global Offensive Items

## Why?
Counter-Strike: Global Offensive has thousands of different items that can be traded and sold on the community market. There is **no comprehensive list of skins and cosmetics** available online that we could use to collect data. Previous methods of analyzing the CS:GO market queried the community market for all items. That approach, however, leaves out crucial information on a skin's quality and collection. I therefore built this notebook to collect a list of all skins and items from the website csgostash.com.

## xlsx
.xlsx files were used instead of .csv so that symbols in item names are preserved when written. Many skins have Asian characters or special characters that are necessary for querying later.

In [None]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import re


In [None]:
url = "https://csgostash.com/"

html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")

### SCRAPING ALL GUN SKINS

To scrape all gun skins, I compiled a list of links to every weapon's skin page on csgostash.com and pulled the crucial information on collection and quality from each page. Skins can also have a special quality called Souvenir, which can later be queried as a separate item.
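
As a minimal sketch of two small steps used in the scraping cells that follow, the helpers below derive the weapon name from a page link and restore the " Grade" suffix on a quality label. The function names and the example links are hypothetical and only illustrate the string handling; they are not part of the notebook.

```python
def weapon_from_link(link):
    """Return the weapon name encoded in the last path segment of a link."""
    # "+" stands in for a space in names such as "R8 Revolver"
    return link.split("/")[-1].replace("+", " ")


def full_quality(quality):
    """Restore the ' Grade' suffix lost when only the first word is kept."""
    if quality in ("Consumer", "Industrial"):
        return quality + " Grade"
    return quality


# Hypothetical links, used only to show the transformation
print(weapon_from_link("https://csgostash.com/weapon/R8+Revolver"))
print(weapon_from_link("https://csgostash.com/weapon/AK-47"))
print(full_quality("Industrial"))
```

Keeping these steps as pure functions makes them easy to check without sending any requests.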

In [None]:
skins = pd.DataFrame(columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak", "Souvenir"])

In [None]:
# getting the list of all weapon links
skin_ul = soup.find_all("ul", {"class": "dropdown-menu navbar-dropdown-large"})

skin_links = []
for x in skin_ul[:4]: # sliced to just the guns
    skin_children = x.findChildren("a", href = True)

    for child in skin_children:
        skin_links.append(child["href"])

In [None]:
def get_skins(items):
    skins = []
    for weapon in reversed(items[:-1]): # sliced to ignore default
        try:
            skin = weapon.h3.text
            gun = link.split("/")[-1].replace("+", " ") # `link` is the global loop variable; "+" encodes the space in names like R8 Revolver
            quality = weapon.p.text.split()[0]
            if quality in ("Consumer", "Industrial"):
                quality += " Grade"
            collection = weapon.find("div", {"class": "collection"}).text.replace("\n", "")
            # the marker div is only present when the variant exists
            stattrak = weapon.find("div", {"class": "stattrak"}) is not None
            souvenir = weapon.find("div", {"class": "souvenir"}) is not None
            skins.append([gun, collection, skin, quality, stattrak, souvenir])
        except Exception as e: # accounts for ad spaces
            pass

    return skins

In [None]:
for link in skin_links:
    soup = BeautifulSoup(requests.get(link).content, "lxml")
    items = soup.find_all("div", class_= "well result-box nomargin")
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
    page_df = pd.DataFrame(data = get_skins(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak", "Souvenir"])
    skins = pd.concat([skins, page_df], ignore_index = True)

In [None]:
skins.to_excel("skins.xlsx")

### SCRAPING ALL KNIVES AND GLOVES

To scrape all knives, we used the collection pages on CS:GO Stash instead of a plain list of knives. Knives can reappear in different collections, which makes it difficult to determine the exact collection a knife belongs to.

Note: Knives are categorized by the first box they appeared in.
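
The sketch below shows, under stated assumptions, how a collection link can yield both the collection name and the list of filtered page URLs to visit. The helper names `collection_from_link` and `page_urls` are hypothetical; the query-string pattern mirrors the one used in the cells that follow.

```python
def collection_from_link(link):
    """Turn the last path segment of a collection link into a readable name."""
    return link.split("/")[-1].replace("-", " ")


def page_urls(link, item_filter, pages):
    """Build the filtered page URLs for one collection, last page first.

    item_filter is the query flag for the item type, e.g. "Knives" or "Gloves".
    """
    template = link + "?" + item_filter + "=1&page={}"
    return [template.format(page) for page in range(pages, 0, -1)]


link = "https://csgostash.com/case/179/Glove-Case"
print(collection_from_link(link))
for url in page_urls(link, "Knives", 2):
    print(url)
```

Iterating from the last page to the first matches the order used by the scraping loops in this notebook.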

In [None]:
knives = pd.DataFrame(columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak", "Souvenir"])

In [None]:
# fetching link of all collections
knives_ul = soup.find("ul", {"class": "dropdown-menu navbar-dropdown-small"})

knives_links = []
knives_children = knives_ul.findChildren("a", href = True)

for child in knives_children:
    knives_links.append(child["href"])

In [None]:
def get_knives(items):
    skins = []
    for weapon in reversed(items):
        try:
            x = weapon.h3.text.split(" | ")
            gun = "★ " + x[0] # star used in querying
            skin = x[1]
            collection = link.split("/")[-1].replace("-", " ")
            quality = weapon.p.text.split()[0]
            stattrak = True # all knives have a stattrak variant
            skins.append([gun, collection, skin, quality, stattrak])
        except Exception: # accounts for ad spaces and entries that are not knives
            pass

    return skins

In [None]:
for link in knives_links[:-3]: # slice off last 3 as are not cases
    start = link + "?Knives=1" # navigates to the knives page
    url = link + "?Knives=1&page={}"

    soup = BeautifulSoup(requests.get(start).content, "lxml")
    try: # checks if there are multiple pages
        page_selector = soup.find("ul", {"class": "pagination"})
        page_children = page_selector.findChildren("a", href = True)

        pages = int(page_children[-2].text.split(None, 1)[0])
    except Exception: # no pagination bar means there is a single page
        pages = 1

    for page in range(pages, 0, -1): # loops through the different pages to get all items
        soup = BeautifulSoup(requests.get(url.format(page)).content, "lxml")
        items = soup.find_all("div", class_= "well result-box nomargin")
        page_df = pd.DataFrame(data = get_knives(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])
        knives = pd.concat([knives, page_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0
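
The page-count check used in the loop above recurs throughout this notebook, so it may help to see it in isolation. The sketch assumes the pagination bar ends with a "next" link, so the second-to-last link text carries the highest page number; `last_page` is a hypothetical helper that works on the link texts alone, and the arrow characters are placeholders.

```python
def last_page(labels):
    """Return the highest page number given the visible text of each pagination link.

    Falls back to 1 when there is no usable pagination bar.
    """
    try:
        # the final link is assumed to be "next", so the page number sits just before it
        return int(labels[-2].split(None, 1)[0])
    except (IndexError, ValueError):
        return 1


print(last_page(["1", "2", "3", "›"]))
print(last_page([]))
```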


In [None]:
def get_gloves(items):
    skins = []
    for weapon in reversed(items):
        try:
            x = weapon.h3.text.split(" | ")
            gun = x[0]
            skin = x[1]
            collection = link.split("/")[-1].replace("-", " ")
            quality = weapon.p.text.split()[0]
            stattrak = False # gloves dont have a stattrak version
            skins.append([gun, collection, skin, quality, stattrak]) 
        except Exception: # accounts for ad spaces and entries that are not gloves
            pass

    return skins

In [None]:
# loops through all gloves
glove_links = ["https://csgostash.com/case/179/Glove-Case", "https://csgostash.com/case/238/Clutch-Case"]

for link in glove_links:
    start = link + "?Gloves=1" # navigates to the gloves page
    url = link + "?Gloves=1&page={}"

    soup = BeautifulSoup(requests.get(start).content, "lxml")
    try: # checks if there are multiple pages
        page_selector = soup.find("ul", {"class": "pagination"})
        page_children = page_selector.findChildren("a", href = True)

        pages = int(page_children[-2].text.split(None, 1)[0])
    except Exception: # no pagination bar means there is a single page
        pages = 1
    for page in range(pages, 0, -1): # loops through the different pages to get all items
        soup = BeautifulSoup(requests.get(url.format(page)).content, "lxml")
        items = soup.find_all("div", class_= "well result-box nomargin")
        page_df = pd.DataFrame(data = get_gloves(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])
        knives = pd.concat([knives, page_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
knives.to_excel("knivesgloves.xlsx")

### SCRAPING ALL STICKERS

Stickers vary because they can be a part of different tournaments and collections. We therefore categorize them by their collection if it exists, or otherwise by the tournament they were released in. Certain collections (slid3, team roles, and pinup) have multiple foil and holo stickers due to a Valve mistake in the item names. These were added manually later, and the sticker list reflects this.
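
The categorization rule can be stated as a one-line fallback. This is only an illustrative sketch: `sticker_category` and the example values are hypothetical, and the notebook itself applies the rule by scraping tournament and regular stickers from separate pages.

```python
def sticker_category(collection, tournament):
    """Prefer the collection; fall back to the tournament when there is none."""
    return collection if collection else tournament


# Placeholder values, used only to show both branches
print(sticker_category("Example Capsule", None))
print(sticker_category(None, "Example Tournament 2019"))
```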

In [None]:
stickers = pd.DataFrame(columns = ["Weapon", "Collection", "Skin", "Quality"])

In [None]:
# function to pull tournament stickers
def get_stickers(items):
    sticks = []
    for x in reversed(items): 
        try:
            weapon = "Sticker"
            name = x.h3.text
            quality = x.p.text.split()[0]
            if (quality == "High"):
                quality += " Grade"
            collection = x.h4.text
            sticks.append([weapon, collection, name, quality])
        except Exception as e: # accounts for ad spaces
            pass

    return sticks

In [None]:
# iterates through the tournament stickers pages
start = "https://csgostash.com/stickers/tournament"
url = "https://csgostash.com/stickers/tournament" + "?page={}"

soup = BeautifulSoup(requests.get(start).content, "lxml")
try: # checks if there are multiple pages
    page_selector = soup.find("ul", {"class": "pagination"})
    page_children = page_selector.findChildren("a", href = True)

    pages = int(page_children[-2].text.split(None, 1)[0])
except Exception: # no pagination bar means there is a single page
    pages = 1

for page in range(pages, 0, -1): # loops through the different pages to get all items
    soup = BeautifulSoup(requests.get(url.format(page)).content, "lxml")
    items = soup.find_all("div", class_= "well result-box nomargin")
    page_df = pd.DataFrame(data = get_stickers(items), columns = ["Weapon", "Collection", "Skin", "Quality"])
    stickers = pd.concat([stickers, page_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
# function to pull normal stickers
def get_stickers(items):
    sticks = []
    for x in reversed(items): 
        try:
            weapon = "Sticker"
            name = x.h3.text
            quality = x.p.text.split()[0]
            if (quality == "High"):
                quality += " Grade"
            if quality == "Contraband": # contraband doesnt have collection
                collection = None
            else:
                collection = x.find("p", {"class": "nomargin item-resultbox-collection-container-info"}).text.replace("\n", "")
            sticks.append([weapon, collection, name, quality])
        except Exception as e: # accounts for ad spaces
            pass

    return sticks

In [None]:
# iterates through the regular stickers pages
start = "https://csgostash.com/stickers/regular"
url = "https://csgostash.com/stickers/regular" + "?page={}"

soup = BeautifulSoup(requests.get(start).content, "lxml")
try: # checks if there are multiple pages
    page_selector = soup.find("ul", {"class": "pagination"})
    page_children = page_selector.findChildren("a", href = True)

    pages = int(page_children[-2].text.split(None, 1)[0])
except Exception: # no pagination bar means there is a single page
    pages = 1

for page in range(pages, 0, -1): # loops through the different pages to get all items
    soup = BeautifulSoup(requests.get(url.format(page)).content, "lxml")
    items = soup.find_all("div", class_= "well result-box nomargin")
    page_df = pd.DataFrame(data = get_stickers(items), columns = ["Weapon", "Collection", "Skin", "Quality"])
    stickers = pd.concat([stickers, page_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
stickers.to_excel("stickers.xlsx")

### SCRAPING ALL CASES
Cases are a valuable trading commodity that used to be a viable investment choice. When scraping them, we have to be careful with major sticker capsules, which are placed in the collection of their respective major.
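
A minimal sketch of that rule, assuming the year in a capsule name always starts with "201" and follows the name of the major: the function returns the major as the collection and otherwise falls back to the capsule's own name. `capsule_collection` is a hypothetical helper, and the capsule names are illustrative inputs rather than scraped data.

```python
def capsule_collection(name):
    """Return the major a capsule belongs to, or the name itself for other containers."""
    words = name.split()
    if "eSports" not in name:  # eSports cases carry a year but are not major capsules
        for i, word in enumerate(words):
            if "201" in word and i > 0:
                # "Columbus" is preceded by the organizer, so keep one more word
                start = i - 2 if words[i - 1] == "Columbus" and i >= 2 else i - 1
                return " ".join(words[start:i + 1])
    return name


print(capsule_collection("Katowice 2019 Legends (Holo/Foil)"))
print(capsule_collection("MLG Columbus 2016 Legends (Holo/Foil)"))
print(capsule_collection("Community Sticker Capsule 1"))
```

Using `enumerate` keeps the loop index within bounds and avoids reusing the name of the outer loop variable.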

In [None]:
# hand-created list of lists for the container types not included on csgostash
uncategorized = [
    ["Patch Pack", "Half-Life: Alyx Patch Pack", "Half-Life: Alyx Patch Pack"],
    ["Patch Pack", "CS:GO Patch Pack", "CS:GO Patch Pack"],
    ["Music Kit Box", "Masterminds Music Kit Box", "Masterminds Music Kit Box"],
    ["Music Kit Box", "StatTrak™ Radicals Box", "StatTrak™ Radicals Box"],
    ["Pins Capsule", "Collectible Pins Capsule Series 1", "Collectible Pins Capsule Series 1"],
    ["Pins Capsule", "Collectible Pins Capsule Series 2", "Collectible Pins Capsule Series 2"],
    ["Pins Capsule", "Collectible Pins Capsule Series 3", "Collectible Pins Capsule Series 3"],
    ["Pins Capsule", "Half-Life: Alyx Collectible Pins Capsule", "Half-Life: Alyx Collectible Pins Capsule"],
    ["Graffiti Box", "Community Graffiti Box 1", "Community Graffiti Box 1"],
    ["Graffiti Box", "CS:GO Graffiti Box", "CS:GO Graffiti Box"],
    ["Graffiti Box", "Perfect World Graffiti Box", "Perfect World Graffiti Box"],
]

cases = pd.DataFrame(data = uncategorized, columns = ["Weapon", "Collection", "Skin"])

In [None]:
case_links = ["https://csgostash.com/containers/skin-cases", "https://csgostash.com/containers/souvenir-packages", "https://csgostash.com/containers/gift-packages", "https://csgostash.com/containers/sticker-capsules", "https://csgostash.com/containers/autograph-capsules"]

In [None]:
def get_cases_sc(items, item_type): # get cases with the same collection as their name
    case_list = []
    for x in reversed(items): 
        try:
            name = x.h4.text # cases have text in h4 instead
            if "201" in name and "eSports" not in name: # checks if its a major sticker capsule
                words = name.split()
                for i in range(len(words)): # separate index so the item `x` is not overwritten
                    if "201" in words[i]:
                        if words[i-1] == "Columbus": # Columbus needs the preceding word as well
                            collection = " ".join(words[i-2:i+1])
                        else:
                            collection = " ".join(words[i-1:i+1])
                        break
            else:
                collection = name
            case_list.append([item_type, collection, name])
        except Exception as e: # accounts for ad spaces
            pass

    return case_list

In [None]:
for link in (case_links[0], case_links[2], case_links[3], case_links[4]): # links i can get all data from just a header
    start = link
    url = link + "?page={}"

    soup = BeautifulSoup(requests.get(start).content, "lxml")
    try: # checks if there are multiple pages
        page_selector = soup.find("ul", {"class": "pagination"})
        page_children = page_selector.findChildren("a", href = True)

        pages = int(page_children[-2].text.split(None, 1)[0])
    except Exception: # no pagination bar means there is a single page
        pages = 1

    for page in range(pages, 0, -1): # loops through the different pages to get all items
        soup = BeautifulSoup(requests.get(url.format(page)).content, "lxml")
        item_type = soup.find("div", {"class": "col-lg-12 text-center col-widen content-header"}).text.replace("\n", "")
        items = soup.find_all("div", class_= "well result-box nomargin")
        page_df = pd.DataFrame(data = get_cases_sc(items, item_type), columns = ["Weapon", "Collection", "Skin"])
        cases = pd.concat([cases, page_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
def get_cases_souv(items, item_type): # get souvenir cases
    case_list = []
    for x in reversed(items): 
        try:
            name = x.h4.text # cases have text in h4 instead
            collection = x.find("div", {"class": "containers-details-link"}).text.replace("\n", "")
            case_list.append([item_type, collection, name])
        except Exception as e: # accounts for ad spaces
            pass

    return case_list

In [None]:
start = case_links[1]
url = case_links[1] + "?page={}"

soup = BeautifulSoup(requests.get(start).content, "lxml")
try: # checks if there are multiple pages
    page_selector = soup.find("ul", {"class": "pagination"})
    page_children = page_selector.findChildren("a", href = True)

    pages = int(page_children[-2].text.split(None, 1)[0])
except Exception: # no pagination bar means there is a single page
    pages = 1

for page in range(pages, 0, -1): # loops through the different pages to get all items
    soup = BeautifulSoup(requests.get(url.format(page)).content, "lxml")
    item_type = soup.find("div", {"class": "col-lg-12 text-center col-widen content-header"}).text.replace("\n", "")
    items = soup.find_all("div", class_= "well result-box nomargin")
    page_df = pd.DataFrame(data = get_cases_souv(items, item_type), columns = ["Weapon", "Collection", "Skin"])
    cases = pd.concat([cases, page_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
cases.to_excel("cases.xlsx")

### SCRAPING OTHER
There is also a collection of miscellaneous items we can scrape for further insight. Each item type is scraped separately because their pages all differ in some way.

In [None]:
other = pd.DataFrame(columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])

In [None]:
def get_agents(items): # get agents
    case_list = []
    for x in reversed(items): 
        try:
            weapon = "Agents"
            name = x.h3.text # agents are in h3
            collection = "Shattered Web Case"
            quality = x.p.text.split()[0]
            stattrak = False
            case_list.append([weapon, collection, name, quality, stattrak])
        except Exception as e: # accounts for ad spaces
            pass

    return case_list

In [None]:
start = "https://csgostash.com/agents"

soup = BeautifulSoup(requests.get(start).content, "lxml")
items = soup.find_all("div", class_= "well result-box nomargin")
agents_df = pd.DataFrame(data = get_agents(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])
other = pd.concat([other, agents_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
def get_patches(items): # get agents patches
    case_list = []
    for x in reversed(items): 
        try:
            weapon = "Patch"
            name = x.h3.text # patches are in h3
            name = name.replace(" Patch", "") # replace patch in name for easier scraping later
            quality = x.p.text.split()[0]
            collection = x.find("p", {"class": "nomargin item-resultbox-collection-container-info"}).text.replace("\n", "")
            stattrak = False
            case_list.append([weapon, collection, name, quality, stattrak])
        except Exception as e: # accounts for ad spaces
            pass

    return case_list

In [None]:
start = "https://csgostash.com/patches"

soup = BeautifulSoup(requests.get(start).content, "lxml")
items = soup.find_all("div", class_= "well result-box nomargin")
patches_df = pd.DataFrame(data = get_patches(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])
other = pd.concat([other, patches_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
def get_kits(items): # get music kits
    case_list = []
    for x in reversed(items): 
        try:
            weapon = "Music Kit"
            title = x.h3.text # name is in h3
            author = " ".join(x.h4.text.split()[1:]) # get author
            name = "{0}, {1}".format(author, title)
            quality = "High Grade"
            collection = x.find("p", {"class": "nomargin item-resultbox-collection-container-info"})
            if collection is not None:
                if collection.text.replace("\n", "") == "Found in 2 boxes": # finds the Masterminds collection
                    collection = "Masterminds Music Kit Box"
                else:
                    collection = collection.text

            st = x.find("div", {"class": "stattrak"})
            if st is None:
                stattrak = False
            elif st.text.replace("\n", "") == "StatTrak Available":
                stattrak = True
            elif st.text.replace("\n", "") == "StatTrak Only": # if stattrak only, make the name show it to make querying data easier later
                name = "StatTrak™ Music Kit | " + name
                stattrak = False
                    
            case_list.append([weapon, collection, name, quality, stattrak])
        except Exception as e: # accounts for ad spaces
            pass

    return case_list

In [None]:
start = "https://csgostash.com/music"

soup = BeautifulSoup(requests.get(start).content, "lxml")
items = soup.find_all("div", class_= "well result-box nomargin")
kits_df = pd.DataFrame(data = get_kits(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])
other = pd.concat([other, kits_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0
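
The way `get_kits` assembles a music kit's name can be sketched on plain strings. The helper `kit_name` is hypothetical, the inputs are placeholders, and the sketch assumes the byline's first word is a prefix such as "By" that should be dropped, which is what the `split()[1:]` step above implies.

```python
def kit_name(title, byline, stattrak_only=False):
    """Combine author and title into the 'Author, Title' form used for querying."""
    author = " ".join(byline.split()[1:])  # drop the leading word of the byline
    name = "{0}, {1}".format(author, title)
    if stattrak_only:
        # kits that exist only as StatTrak carry the prefix in their name
        name = "StatTrak™ Music Kit | " + name
    return name


print(kit_name("Example Title", "By Example Artist"))
print(kit_name("Example Title", "By Example Artist", stattrak_only=True))
```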

In [None]:
def get_pins(items): # get pins
    case_list = []
    for x in reversed(items): 
        try:
            weapon = "Collectible Pins"
            name = x.h3.text # pins are in h3
            quality = x.p.text.split()[0]
            if quality == "High":
                quality = "High Grade"
            collection = x.find("p", {"class": "nomargin item-resultbox-collection-container-info"}).text.replace("\n", "")
            stattrak = False
            case_list.append([weapon, collection, name, quality, stattrak])
        except Exception as e: # accounts for ad spaces
            pass

    return case_list

In [None]:
start = "https://csgostash.com/pins"

soup = BeautifulSoup(requests.get(start).content, "lxml")
items = soup.find_all("div", class_= "well result-box nomargin")
pins_df = pd.DataFrame(data = get_pins(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])
other = pd.concat([other, pins_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
def get_graffiti(items): # get graffiti
    case_list = []
    for x in reversed(items): 
        
        try:
            weapon = "Sealed Graffiti"
            name = x.h3.text.replace("\n", "") # graffitis are in h3
            if "201" in name:
                words = name.split()
                for i in range(len(words)):
                    if "201" in words[i]: # the year marks a tournament graffiti
                        collection = " ".join(words[i-1:i+1])
                        break
            else:
                collection = x.find("p", {"class": "nomargin item-resultbox-collection-container-info"}).text.replace("\n", "")
            quality = x.p.text.split()[0]
            if quality == "High":
                quality = "High Grade"
            elif quality == "Base": # base grade graffitis are not in a collection
                quality = "Base Grade"
                collection = None
            stattrak = False
            
            if quality == "Base Grade": # base grade graffiti come in color variants, which all must be added
                colors = ("Wire Blue", "SWAT Blue", "Monarch Blue", "Jungle Green", "Blood Red", "Dust Brown", "Desert Amber", "Brick Red", # some items dont have all these colors
                        "Cash Green", "Frog Green", "Battle Green", "Monster Purple", "Shark White", "Tracer Yellow", "Bazooka Pink", 
                        "Violent Violet", "Princess Pink", "Tiger Orange", "War Pig Pink")
                for c in colors:
                    colored_name = name + f" ({c})"
                    case_list.append([weapon, collection, colored_name, quality, stattrak])
            else:
                case_list.append([weapon, collection, name, quality, stattrak])
        except Exception as e: # accounts for ad spaces
            pass

    return case_list

In [None]:
start = "https://csgostash.com/graffiti"
url = "https://csgostash.com/graffiti" + "?page={}"

soup = BeautifulSoup(requests.get(start).content, "lxml")
try: # checks if there are multiple pages
    page_selector = soup.find("ul", {"class": "pagination"})
    page_children = page_selector.findChildren("a", href = True)

    pages = int(page_children[-2].text.split(None, 1)[0])
except Exception: # no pagination bar means there is a single page
    pages = 1

for page in range(pages, 0, -1): # loops through the different pages to get all items
    soup = BeautifulSoup(requests.get(url.format(page)).content, "lxml")
    items = soup.find_all("div", class_= "well result-box nomargin")
    page_df = pd.DataFrame(data = get_graffiti(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])
    other = pd.concat([other, page_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0
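
The color expansion that `get_graffiti` applies to base grade graffiti can be illustrated on its own. This is a sketch with a shortened color tuple and a placeholder graffiti name; `expand_colors` is a hypothetical helper, not a function from the notebook.

```python
def expand_colors(name, colors):
    """Return one item name per color variant, in the 'Name (Color)' form."""
    return [f"{name} ({color})" for color in colors]


# Shortened selection of the colors listed in get_graffiti
colors = ("Wire Blue", "Blood Red", "Tiger Orange")
for variant in expand_colors("Example Graffiti", colors):
    print(variant)
```

As the comment in `get_graffiti` notes, some items do not exist in every color, so the expanded list can contain names that return no market results.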

In [None]:
def get_keys(items): # get keys from the items page
    case_list = []
    for x in reversed(items): 
        try:
            weapon = "Items"
            name = x.h4.text # keys are in h4
            quality = None
            collection = None
            stattrak = False
            case_list.append([weapon, collection, name, quality, stattrak])
        except Exception as e: # accounts for ad spaces
            pass

    return case_list

In [None]:
start = "https://csgostash.com/items"

soup = BeautifulSoup(requests.get(start).content, "lxml")
items = soup.find_all("div", class_= "well result-box nomargin")
keys_df = pd.DataFrame(data = get_keys(items), columns = ["Weapon", "Collection", "Skin", "Quality", "StatTrak"])
other = pd.concat([other, keys_df], ignore_index = True) # DataFrame.append was removed in pandas 2.0

In [None]:
other.to_excel("other.xlsx")