This notebook downloads users' posts (entries) from <b>wykop.pl</b> through the <b>Wykop API</b> and saves them to a CSV file for further analysis. First, let's load the necessary modules:

In [1]:
import requests
import pandas as pd
import seaborn as sns

Next, we define a few variables that hold the download parameters.

In [2]:
public_key = "your_public_API_key"
private_key = "your_private_API_key"
tag = "gownowpis" # hashtag to download entries from
n_of_entries = 2000 # number of entries to load; must be a multiple of 50
year = 2024 # year passed to the API as a filter
month = 2 # month passed to the API as a filter
save_folder_path = "data" # folder where the CSV file with the loaded entries will be saved
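
Since the entries are later requested in pages of 50, it can be useful to check the parameters before contacting the API. The cell below is a minimal sketch of such a check; the helper name `validate_params` and the constant `PAGE_SIZE` are our own illustrative names, not part of the Wykop API.

```python
PAGE_SIZE = 50  # assumed page size, matching the limit used by the download loop


def validate_params(n_of_entries, year, month, page_size=PAGE_SIZE):
    """Hypothetical helper: raise ValueError if the download parameters are inconsistent."""
    if n_of_entries <= 0 or n_of_entries % page_size != 0:
        raise ValueError(f"n_of_entries must be a positive multiple of {page_size}")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if year < 1:
        raise ValueError("year must be positive")
    return n_of_entries // page_size  # number of requests needed


n_pages = validate_params(n_of_entries=2000, year=2024, month=2)
print("Requests needed:", n_pages)
```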

To use the <b>Wykop API</b>, we first have to generate a token from our public and private keys. The other API methods require this token. We obtain it by sending a POST request containing both keys to the <code>/auth</code> endpoint.

In [None]:
# Request body for /auth: the key pair wrapped in a "data" object
auth_payload = {
    "data": {
        "key": public_key,
        "secret": private_key,
    }
}
print("Authorization...")
auth_req = requests.post("https://wykop.pl/api/v3/auth", json=auth_payload)
token = auth_req.json()['data']['token']
print("Token:", token)
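
The cell above assumes that the request succeeds. Below is a minimal sketch of a more defensive way to read the token. It assumes the response body has the same `data` / `token` shape used above, and the helper name `extract_token` is hypothetical. The helper takes the already decoded JSON, so it can be tried without contacting the API.

```python
def extract_token(payload):
    """Return the token from a decoded /auth response, or raise a clear error."""
    try:
        token = payload["data"]["token"]
    except (KeyError, TypeError) as exc:
        # Missing keys or an unexpected structure (e.g. an error response)
        raise RuntimeError(f"Unexpected auth response: {payload!r}") from exc
    if not token:
        raise RuntimeError("Auth response contained an empty token")
    return token


# Made-up payload in the same shape as the response read above.
sample_payload = {"data": {"token": "example-token"}}
print(extract_token(sample_payload))
```

In the notebook this could be used as `auth_req.raise_for_status()` followed by `token = extract_token(auth_req.json())`; `raise_for_status()` is the standard `requests` method that raises an exception for 4xx and 5xx responses.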

To download entries from a specific hashtag, we send a GET request to <code>/tags/{tagName}/stream</code>, where <code>{tagName}</code> is the desired hashtag name. It is convenient to load the data into a pandas DataFrame. A single request returns at most 50 entries, so to load more we perform multiple requests and concatenate the results.

In [4]:
# Every request must carry the token obtained during authorization
headers = {
    'Authorization': f'Bearer {token}'
}
pages_tab = []  # one DataFrame per downloaded page
for i in range(n_of_entries // 50):
    params = {
        'page': i,
        'limit': 50, # max 50
        'sort': 'all',
        'type': 'entry',
        'year': year,
        'month': month
    }
    print("Downloading page", i+1)
    response = requests.get(f'https://wykop.pl/api/v3/tags/{tag}/stream', params=params, headers=headers)
    pages_tab.append(pd.DataFrame(response.json()['data']))
# Join the pages into a single DataFrame with a continuous index
data = pd.concat(pages_tab, ignore_index=True)
print("Data loaded. Shape:", data.shape)
data.head()

Downloading page 1
Downloading page 2
Downloading page 3
Downloading page 4
Downloading page 5
Downloading page 6
Downloading page 7
Downloading page 8
Downloading page 9
Downloading page 10
Downloading page 11
Downloading page 12
Downloading page 13
Downloading page 14
Downloading page 15
Downloading page 16
Downloading page 17
Downloading page 18
Downloading page 19
Downloading page 20
Downloading page 21
Downloading page 22
Downloading page 23
Downloading page 24
Downloading page 25
Downloading page 26
Downloading page 27
Downloading page 28
Downloading page 29
Downloading page 30
Downloading page 31
Downloading page 32
Downloading page 33
Downloading page 34
Downloading page 35
Downloading page 36
Downloading page 37
Downloading page 38
Downloading page 39
Downloading page 40
Data loaded. Shape: (1290, 20)


,id,slug,author,device,created_at,voted,content,media,adult,tags,favourite,parent,votes,editable,deletable,comments,resource,actions,archive,status
0,75246945,mam-wrazenie-ze-w-telewizji-leca-tylko-3-rekla...,"{'username': 'Kopyto96', 'gender': None, 'comp...",,2024-02-29 22:52:00,0,"Mam wrażenie, że w telewizji lecą tylko 3 rekl...","{'photo': None, 'embed': None, 'survey': None}",False,"[reklama, telewizja, zalesie, gownowpis, niebi...",False,,"{'up': 2, 'down': 0, 'users': [{'username': 'W...",False,False,"{'items': [{'id': 264795361, 'slug': 'kopyto96...",entry,"{'create': False, 'update': False, 'delete': F...",False,visible
1,75246657,lizme-stupki-natychmiast-gownowpis,"{'username': 'vikop-ru', 'gender': 'f', 'compa...",,2024-02-29 22:25:00,0,Liżme stupki natychmiast\n\n#gownowpis,"{'photo': None, 'embed': None, 'survey': None}",False,[gownowpis],False,,"{'up': 0, 'down': 0, 'users': []}",False,False,"{'items': [{'id': 264794819, 'slug': 'vikop-ru...",entry,"{'create': False, 'update': False, 'delete': F...",False,visible
2,75246421,umyje-sie-dzis-od-stop-po-czubek-glowy-szarym-...,"{'username': 'ItsyBitsyPajonk', 'gender': 'm',...",,2024-02-29 22:08:17,0,Umyję się dziś od stóp po czubek głowy szarym ...,"{'photo': None, 'embed': None, 'survey': None}",False,"[gownowpis, zzyciapajonka]",False,,"{'up': 2, 'down': 0, 'users': [{'username': 'p...",False,False,"{'items': [{'id': 264793397, 'slug': 'umyje-si...",entry,"{'create': False, 'update': False, 'delete': F...",False,visible
3,75246367,heheszki-humorobrazkowy-gownowpis,"{'username': 'pogop', 'gender': 'm', 'company'...",,2024-02-29 22:04:33,0,#heheszki #humorobrazkowy #gownowpis,{'photo': {'key': 'xRQ1D37lOaMPgwX49B5n0jMy9V2...,False,"[heheszki, humorobrazkowy, gownowpis]",False,,"{'up': 105, 'down': 0, 'users': [{'username': ...",False,False,"{'items': [], 'count': 0}",entry,"{'create': False, 'update': False, 'delete': F...",False,visible
4,75246323,przegryw-kiciochpyta-gownowpis,"{'username': 'niedorzecznybubr', 'gender': 'm'...",,2024-02-29 22:02:05,0,#przegryw #kiciochpyta #gownowpis,"{'photo': None, 'embed': None, 'survey': {'key...",False,"[przegryw, kiciochpyta, gownowpis]",False,,"{'up': 0, 'down': 0, 'users': []}",False,False,"{'items': [{'id': 264793163, 'slug': 'niedorze...",entry,"{'create': False, 'update': False, 'delete': F...",False,visible
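
The log above shows 40 requests but only 1290 rows, which suggests that some of the later pages came back with fewer entries or none at all. Below is a sketch of a loop that stops at the first empty page instead of always sending every request. The page fetcher is passed in as a function so the logic can be tried offline; `collect_pages` and `fake_fetch` are hypothetical names, and the fake data only imitates a list of entries.

```python
def collect_pages(fetch_page, max_pages):
    """Call fetch_page(i) for i = 0 .. max_pages - 1 and stop at the first empty page."""
    entries = []
    for i in range(max_pages):
        page = fetch_page(i)
        if not page:  # an empty page means there is nothing more to download
            break
        entries.extend(page)
    return entries


# Stand-in for the real request: 120 fake entries served in pages of 50.
FAKE_ENTRIES = [{"id": n} for n in range(120)]


def fake_fetch(page_index, limit=50):
    start = page_index * limit
    return FAKE_ENTRIES[start:start + limit]


entries = collect_pages(fake_fetch, max_pages=40)
print(len(entries))  # 120
```

With the real API, `fetch_page` would wrap the `requests.get` call from the download cell and return `response.json()['data']`; the combined list can then be passed to `pd.DataFrame`.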


Finally, let's save the downloaded data to a CSV file for further analysis.

In [5]:
data.to_csv(f"{save_folder_path}/{tag}.csv")
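
Two details of `to_csv` are worth keeping in mind: it raises an error if the target folder does not exist, and by default it writes the DataFrame index as an extra unnamed first column. The cell below is a small sketch that handles both. It uses a made-up DataFrame and a temporary directory in place of `data` and `save_folder_path`, so it can be run without downloading anything.

```python
import os
import tempfile

import pandas as pd

# Made-up rows standing in for the downloaded entries.
sample = pd.DataFrame({"id": [1, 2], "content": ["first entry", "second entry"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    save_dir = os.path.join(tmp_dir, "data")  # plays the role of save_folder_path
    os.makedirs(save_dir, exist_ok=True)      # create the folder if it is missing
    csv_path = os.path.join(save_dir, "gownowpis.csv")
    sample.to_csv(csv_path, index=False)      # index=False skips the index column
    reloaded = pd.read_csv(csv_path)

print(list(reloaded.columns))
```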