# Garmin Connect scraper

**Garmin Connect** offers a map of user-submitted routes which we can easily filter. Those routes can be accessed by searching for the location (town name). Let's try downloading the *gpx* files for some routes to see if **Garmin** has implemented scraping protection on its website.

In [1]:
#Importing libraries.

import pandas as pd
import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

In [2]:
#Initializing our webdriver (Chrome).

driver = webdriver.Chrome()

In [126]:
#Accessing Garmin Connect. At this point we will have to login manually.

driver.get('https://connect.garmin.com/modern/')

In [43]:
#Accessing the activity map.

driver.get('https://connect.garmin.com/modern/courses')

At this stage I'm setting the activity filters manually, since we're only testing the viability of large-scale *gpx* scraping.

In [13]:
#Clicking on the search box.

searchbox = driver.find_element(By.XPATH, '//*[@id="pageContainer"]/div/div[2]/div[1]/div[1]/input')
searchbox.click()

In [14]:
#Typing a town name and hitting enter.

searchbox.send_keys('Mataró')
time.sleep(0.5)
searchbox.send_keys(Keys.ENTER)

In [15]:
#Accessing the routes displayed on the current map.

routes = driver.find_elements(By.CLASS_NAME, 'course-link')

In [16]:
#Accessing a single route.

routes[10].get_attribute('href')

'https://connect.garmin.com/modern/course/15465472'

In [17]:
#Storing all route links in a list.

route_list = []

for i in routes:
    link = i.get_attribute('href')
    route_list.append(link)

In [19]:
#Inspecting one element of the list.

route_list[0]

'https://connect.garmin.com/modern/course/53159249'
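
Each link ends in a numeric course identifier, and the same course can show up more than once when overlapping map views are collected. As a minimal sketch (the helper names `course_id` and `unique_links` are hypothetical, not part of the notebook), the identifier can be extracted and repeated links dropped with the standard library alone:

```python
from urllib.parse import urlparse

def course_id(link):
    """Return the last path segment of a course URL, e.g. '15465472'."""
    return urlparse(link).path.rstrip('/').split('/')[-1]

def unique_links(links):
    """Drop repeated links while keeping first-seen order."""
    return list(dict.fromkeys(links))

links = [
    'https://connect.garmin.com/modern/course/15465472',
    'https://connect.garmin.com/modern/course/53159249',
    'https://connect.garmin.com/modern/course/15465472',
]
print([course_id(link) for link in unique_links(links)])
# prints ['15465472', '53159249']
```

Keeping first-seen order makes it easy to resume a download run from a known position in the list.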

In [20]:
#Let's open a single route link.

driver.get(route_list[0])

In [22]:
#Clicking on the dot button to show the download options.

options = driver.find_element(By.XPATH, '//*[@id="main-card"]/div/div[4]/div[1]/div/button')
options.click()

In [23]:
#Locating the download button and clicking. Success!

download = driver.find_element(By.XPATH, '//*[@id="main-card"]/div/div[4]/div[1]/div/ul/li[2]/a')
download.click()

## Creating a function to download all routes in our list

Now that we've demonstrated the viability of downloading a *gpx* file, let's try to download all routes from our list until we hit a limit.

In [25]:
#We'll begin by creating a loop that performs the desired operation.

start = time.time()

def gpx_downloader(link):
    try:
        driver.get(link) #Accessing the route.
        cond = False
        while not cond: #Retrying until the options button can be clicked.
            try:
                options = driver.find_element(By.XPATH, '//*[@id="main-card"]/div/div[4]/div[1]/div/button')
                time.sleep(0.2)
                options.click()
                cond = True
            except Exception:
                time.sleep(0.3)
        cond = False
        while not cond: #Retrying until the download link can be clicked.
            try:
                download = driver.find_element(By.XPATH, '//*[@id="main-card"]/div/div[4]/div[1]/div/ul/li[2]/a')
                time.sleep(0.2)
                download.click()
                cond = True
            except Exception:
                time.sleep(0.3)
    except Exception:
        time.sleep(5) #If the page fails to load, wait and retry the same link.
        gpx_downloader(link)
            
for i in route_list:
    gpx_downloader(i)
    
stop = time.time() 
duration = (stop - start)
print('Seconds:', int(duration))

Seconds: 208
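
The retry loops above keep trying forever if a button never appears. A minimal sketch of a bounded alternative follows; `poll_until` is a hypothetical helper, and `flaky_click` is only a stand-in for a Selenium click so the example runs without a browser:

```python
import time

def poll_until(action, timeout=10.0, interval=0.3):
    """Call action() until it succeeds or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except Exception as error:
            if time.monotonic() >= deadline:
                raise TimeoutError(f'gave up after {timeout} s') from error
            time.sleep(interval)

# Stand-in for a click that fails twice before the page is ready.
attempts = {'count': 0}

def flaky_click():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise RuntimeError('element not ready')
    return 'clicked'

result = poll_until(flaky_click, timeout=5.0, interval=0.01)
print(result, attempts['count'])
# prints: clicked 3
```

In the notebook, `action` could be a small function that locates a button and clicks it; a link that raises `TimeoutError` can then be logged and skipped instead of blocking the whole run.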


In [27]:
#Packing the loop into a function that we can use on lists of links.

def downloader(link_list):
    start = time.time()

    def gpx_downloader(link):
        try:
            driver.get(link) #Accessing the route.
            cond = False
            while not cond: #Retrying until the options button can be clicked.
                try:
                    options = driver.find_element(By.XPATH, '//*[@id="main-card"]/div/div[4]/div[1]/div/button')
                    time.sleep(0.2)
                    options.click()
                    cond = True
                except Exception:
                    time.sleep(0.3)
            cond = False
            while not cond: #Retrying until the download link can be clicked.
                try:
                    download = driver.find_element(By.XPATH, '//*[@id="main-card"]/div/div[4]/div[1]/div/ul/li[2]/a')
                    time.sleep(0.2)
                    download.click()
                    cond = True
                except Exception:
                    time.sleep(0.3)
        except Exception:
            time.sleep(5) #If the page fails to load, wait and retry the same link.
            gpx_downloader(link)

    for i in link_list:
        gpx_downloader(i)

    stop = time.time()
    duration = (stop - start)
    print('Seconds:', int(duration))

In [28]:
#Testing the function.

downloader(route_list)

Seconds: 267


## Creating a function that grabs all route links from a destination

Now that we know how to download all *gpx* files from a list of route links, we need a way to obtain all route links for a given destination (town).

### Locating elements to navigate the map

In [45]:
searchbox = driver.find_element(By.XPATH, '//*[@id="pageContainer"]/div/div[2]/div[1]/div[1]/input')
searchbox.click()
searchbox.send_keys('Mataró')
time.sleep(0.5)
searchbox.send_keys(Keys.ENTER)
time.sleep(0.5)
more_zoom = driver.find_element(By.XPATH, '//*[@id="pageContainer"]/div/div[2]/div[2]/a[1]')
more_zoom.click()
time.sleep(0.3)
more_zoom.click()

In [15]:
routes = driver.find_elements(By.CLASS_NAME, 'course-link')

route_list = []

for i in routes:
    link = i.get_attribute('href')
    route_list.append(link)

In [35]:
#These little functions will navigate the Garmin map for us. They can be stacked as needed.

mapa = driver.find_element(By.XPATH, '//*[@id="leafletMap_399"]')

def down():
    for i in range(10):
        time.sleep(0.4)
        mapa.send_keys(Keys.ARROW_DOWN)
    
def up():
    for i in range(10):
        time.sleep(0.4)
        mapa.send_keys(Keys.ARROW_UP)
    
def left():
    for i in range(21):
        time.sleep(0.4)
        mapa.send_keys(Keys.ARROW_LEFT)
    
def right():
    for i in range(21):
        time.sleep(0.4)
        mapa.send_keys(Keys.ARROW_RIGHT)
        
def s_down():
    for i in range(5):
        time.sleep(0.4)
        mapa.send_keys(Keys.ARROW_DOWN)
    
def s_up():
    for i in range(5):
        time.sleep(0.4)
        mapa.send_keys(Keys.ARROW_UP)
    
def s_left():
    for i in range(10):
        time.sleep(0.4)
        mapa.send_keys(Keys.ARROW_LEFT)
    
def s_right():
    for i in range(10):
        time.sleep(0.4)
        mapa.send_keys(Keys.ARROW_RIGHT)
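
The eight helpers above differ only in the key they send and the number of presses. As a minimal sketch, a single parametrized helper could cover all of them; `pan` is a hypothetical name, and `FakeMap` is a stand-in for the map element so the example runs without a browser:

```python
import time

def pan(element, key, presses, delay=0.4):
    """Send key to element `presses` times, pausing `delay` seconds before each press."""
    for _ in range(presses):
        time.sleep(delay)
        element.send_keys(key)

class FakeMap:
    """Stand-in for the map element; it only records the keys it receives."""
    def __init__(self):
        self.sent = []

    def send_keys(self, key):
        self.sent.append(key)

fake = FakeMap()
pan(fake, 'DOWN', presses=10, delay=0)   # same press count as down()
pan(fake, 'LEFT', presses=21, delay=0)   # same press count as left()
print(len(fake.sent))
# prints 31
```

With a real driver, a call such as `pan(mapa, Keys.ARROW_DOWN, 10)` would behave like `down()`.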

In [13]:
#This function appends the displayed routes to the main list.

def getroutes():
    time.sleep(2)
    routes = driver.find_elements(By.CLASS_NAME, 'course-link')
    for i in routes:
        link = i.get_attribute('href')
        route_list.append(link)

In [6]:
#Second variation, 7x13 grid.

def short():
    start = time.time()

    #Moving from the centre of the map to the top-left cell of the grid.
    for _ in range(3):
        left()
    for _ in range(6):
        up()

    #Sweeping 7 columns in a serpentine pattern, 13 stops per column.
    for col in range(7):
        if col > 0:
            right()
        getroutes()
        step = down if col % 2 == 0 else up
        for _ in range(12):
            step()
            getroutes()

    stop = time.time()
    duration = (stop - start) / 60
    print('Minutes:', int(duration))

In [7]:
#Third variation, 9x17 grid.

def long():
    start = time.time()

    #Moving from the centre of the map to the top-left cell of the grid.
    for _ in range(4):
        left()
    for _ in range(8):
        up()

    #Sweeping 9 columns in a serpentine pattern, 17 stops per column.
    for col in range(9):
        if col > 0:
            right()
        getroutes()
        step = down if col % 2 == 0 else up
        for _ in range(16):
            step()
            getroutes()

    stop = time.time()
    duration = (stop - start) / 60
    print('Minutes:', int(duration))

In [8]:
#3x7 grid for cities between 50-100K inhabitants, using max zoom minus one.

def mid():
    start = time.time()

    #Moving from the centre of the map to the top-left cell of the grid.
    left()
    for _ in range(3):
        up()

    #Sweeping 3 columns in a serpentine pattern, 7 stops per column.
    for col in range(3):
        if col > 0:
            right()
        getroutes()
        step = down if col % 2 == 0 else up
        for _ in range(6):
            step()
            getroutes()

    stop = time.time()
    duration = (stop - start)
    print('Seconds:', int(duration))

In [9]:
#3x5 grid for towns under 50K inhabitants, max zoom minus one.

def town_50():
    start = time.time()

    #Moving from the centre of the map to the top-left cell of the grid.
    left()
    for _ in range(2):
        up()

    #Sweeping 3 columns in a serpentine pattern, 5 stops per column.
    for col in range(3):
        if col > 0:
            right()
        getroutes()
        step = down if col % 2 == 0 else up
        for _ in range(4):
            step()
            getroutes()

    stop = time.time()
    duration = (stop - start)
    print('Seconds:', int(duration))

In [10]:
#1x3 custom grid for small towns.

def town_20_b():
    up()
    getroutes()
    down()
    getroutes()
    down()
    getroutes()

In [11]:
#1x2 custom grid for small towns.

def town_20():
    s_up()
    getroutes()
    down()
    getroutes()

In [44]:
#1x1 custom grid for smallest towns.

def town_10():
    getroutes()
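
The larger grids above all follow the same serpentine pattern: move to the top-left cell, then walk down one column and up the next. A minimal sketch of a generic version is shown below. `sweep` and its parameters are hypothetical names, it assumes odd grid dimensions so that the starting view is the centre cell, and the lambdas only record the moves so the example runs without a browser:

```python
def sweep(cols, rows, visit, left, right, up, down):
    """Visit every cell of a cols x rows grid centred on the current view."""
    # Move from the centre to the top-left cell.
    for _ in range(cols // 2):
        left()
    for _ in range(rows // 2):
        up()
    # Walk the columns, alternating direction: down, up, down, ...
    for col in range(cols):
        if col > 0:
            right()
        visit()
        step = down if col % 2 == 0 else up
        for _ in range(rows - 1):
            step()
            visit()

log = []
sweep(
    3, 5,
    visit=lambda: log.append('visit'),
    left=lambda: log.append('left'),
    right=lambda: log.append('right'),
    up=lambda: log.append('up'),
    down=lambda: log.append('down'),
)
print(log.count('visit'))
# prints 15
```

In the notebook, the callbacks would be `getroutes`, `left`, `right`, `up` and `down`, so a 7x13 sweep would read `sweep(7, 13, getroutes, left, right, up, down)`.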

In [46]:
len(route_list)

34853

In [47]:
#Creating a dictionary with the route links and saving it.

dict_links = {'links': route_list}
df = pd.DataFrame(dict_links)
df.to_csv('links_7000.csv', index=False)

In [18]:
#Importing our dataframe of towns.

towns = pd.read_csv('towns.csv')

In [21]:
#Creating a new dataframe with towns of up to 20,000 inhabitants.

df_20 = towns[towns['poblacion'] <= 20000]
df_20.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7715 entries, 414 to 8128
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   nombre     7715 non-null   object
 1   poblacion  7715 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 180.8+ KB


In [23]:
#Saving the town names as a list.

list_20 = df_20['nombre'].tolist()

In [24]:
list_20[0]

'Tolosa'

In [45]:
start = time.time()

count = 0

for i in list_20[6000:]:
    searchbox = driver.find_element(By.XPATH, '//*[@id="pageContainer"]/div/div[2]/div[1]/div[1]/input')
    searchbox.click()
    time.sleep(0.5)
    #Clearing the previous town name from the search box.
    for _ in range(35):
        searchbox.send_keys(Keys.BACKSPACE)
    for _ in range(27):
        searchbox.send_keys(Keys.DELETE)
    name = i + ', España'
    searchbox.send_keys(name)
    time.sleep(0.5)
    searchbox.send_keys(Keys.ENTER)
    time.sleep(3)
    town_10()
    count += 1
    print('Finished towns: ', count)
    
stop = time.time() 
duration = (stop - start) / 60
print('Minutes:', int(duration))

Finished towns:  1
Finished towns:  2
Finished towns:  3
Finished towns:  4
Finished towns:  5
Finished towns:  6
Finished towns:  7
Finished towns:  8
Finished towns:  9
Finished towns:  10
Finished towns:  11
Finished towns:  12
Finished towns:  13
Finished towns:  14
Finished towns:  15
Finished towns:  16
Finished towns:  17
Finished towns:  18
Finished towns:  19
Finished towns:  20
Finished towns:  21
Finished towns:  22
Finished towns:  23
Finished towns:  24
Finished towns:  25
Finished towns:  26
Finished towns:  27
Finished towns:  28
Finished towns:  29
Finished towns:  30
Finished towns:  31
Finished towns:  32
Finished towns:  33
Finished towns:  34
Finished towns:  35
Finished towns:  36
Finished towns:  37
Finished towns:  38
Finished towns:  39
Finished towns:  40
Finished towns:  41
Finished towns:  42
Finished towns:  43
Finished towns:  44
Finished towns:  45
Finished towns:  46
Finished towns:  47
Finished towns:  48
Finished towns:  49
Finished towns:  50
Finished 

Finished towns:  397
Finished towns:  398
Finished towns:  399
Finished towns:  400
Finished towns:  401
Finished towns:  402
Finished towns:  403
Finished towns:  404
Finished towns:  405
Finished towns:  406
Finished towns:  407
Finished towns:  408
Finished towns:  409
Finished towns:  410
Finished towns:  411
Finished towns:  412
Finished towns:  413
Finished towns:  414
Finished towns:  415
Finished towns:  416
Finished towns:  417
Finished towns:  418
Finished towns:  419
Finished towns:  420
Finished towns:  421
Finished towns:  422
Finished towns:  423
Finished towns:  424
Finished towns:  425
Finished towns:  426
Finished towns:  427
Finished towns:  428
Finished towns:  429
Finished towns:  430
Finished towns:  431
Finished towns:  432
Finished towns:  433
Finished towns:  434
Finished towns:  435
Finished towns:  436
Finished towns:  437
Finished towns:  438
Finished towns:  439
Finished towns:  440
Finished towns:  441
Finished towns:  442
Finished towns:  443
Finished town

Finished towns:  788
Finished towns:  789
Finished towns:  790
Finished towns:  791
Finished towns:  792
Finished towns:  793
Finished towns:  794
Finished towns:  795
Finished towns:  796
Finished towns:  797
Finished towns:  798
Finished towns:  799
Finished towns:  800
Finished towns:  801
Finished towns:  802
Finished towns:  803
Finished towns:  804
Finished towns:  805
Finished towns:  806
Finished towns:  807
Finished towns:  808
Finished towns:  809
Finished towns:  810
Finished towns:  811
Finished towns:  812
Finished towns:  813
Finished towns:  814
Finished towns:  815
Finished towns:  816
Finished towns:  817
Finished towns:  818
Finished towns:  819
Finished towns:  820
Finished towns:  821
Finished towns:  822
Finished towns:  823
Finished towns:  824
Finished towns:  825
Finished towns:  826
Finished towns:  827
Finished towns:  828
Finished towns:  829
Finished towns:  830
Finished towns:  831
Finished towns:  832
Finished towns:  833
Finished towns:  834
Finished town

Finished towns:  1171
Finished towns:  1172
Finished towns:  1173
Finished towns:  1174
Finished towns:  1175
Finished towns:  1176
Finished towns:  1177
Finished towns:  1178
Finished towns:  1179
Finished towns:  1180
Finished towns:  1181
Finished towns:  1182
Finished towns:  1183
Finished towns:  1184
Finished towns:  1185
Finished towns:  1186
Finished towns:  1187
Finished towns:  1188
Finished towns:  1189
Finished towns:  1190
Finished towns:  1191
Finished towns:  1192
Finished towns:  1193
Finished towns:  1194
Finished towns:  1195
Finished towns:  1196
Finished towns:  1197
Finished towns:  1198
Finished towns:  1199
Finished towns:  1200
Finished towns:  1201
Finished towns:  1202
Finished towns:  1203
Finished towns:  1204
Finished towns:  1205
Finished towns:  1206
Finished towns:  1207
Finished towns:  1208
Finished towns:  1209
Finished towns:  1210
Finished towns:  1211
Finished towns:  1212
Finished towns:  1213
Finished towns:  1214
Finished towns:  1215
Finished t

Finished towns:  1544
Finished towns:  1545
Finished towns:  1546
Finished towns:  1547
Finished towns:  1548
Finished towns:  1549
Finished towns:  1550
Finished towns:  1551
Finished towns:  1552
Finished towns:  1553
Finished towns:  1554
Finished towns:  1555
Finished towns:  1556
Finished towns:  1557
Finished towns:  1558
Finished towns:  1559
Finished towns:  1560
Finished towns:  1561
Finished towns:  1562
Finished towns:  1563
Finished towns:  1564
Finished towns:  1565
Finished towns:  1566
Finished towns:  1567
Finished towns:  1568
Finished towns:  1569
Finished towns:  1570
Finished towns:  1571
Finished towns:  1572
Finished towns:  1573
Finished towns:  1574
Finished towns:  1575
Finished towns:  1576
Finished towns:  1577
Finished towns:  1578
Finished towns:  1579
Finished towns:  1580
Finished towns:  1581
Finished towns:  1582
Finished towns:  1583
Finished towns:  1584
Finished towns:  1585
Finished towns:  1586
Finished towns:  1587
Finished towns:  1588
Finished t
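
The loop above processes one slice of the town list (`list_20[6000:]`), and the file names loaded in the next cell suggest that earlier slices were saved separately. As a minimal sketch of automating that, the list can be split into batches with one CSV written per batch. The helper names, the batch size, the placeholder towns and the `example.invalid` links are all assumptions made for illustration:

```python
import csv
import os
import tempfile

def batches(items, size):
    """Yield (start, stop, chunk) slices of items, `size` elements at a time."""
    for start in range(0, len(items), size):
        chunk = items[start:start + size]
        yield start, start + len(chunk), chunk

def save_links(path, links):
    """Write links to a one-column CSV with a 'links' header."""
    with open(path, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        writer.writerow(['links'])
        writer.writerows([link] for link in links)

towns = ['town_a', 'town_b', 'town_c', 'town_d', 'town_e']
out_dir = tempfile.mkdtemp()
written = []
for start, stop, chunk in batches(towns, 2):
    # Placeholder for the real scraping step: one fake link per town.
    links = [f'https://example.invalid/course/{name}' for name in chunk]
    path = os.path.join(out_dir, f'links_{start}_{stop}.csv')
    save_links(path, links)
    written.append(os.path.basename(path))
print(written)
# prints ['links_0_2.csv', 'links_2_4.csv', 'links_4_5.csv']
```

Saving after every batch means a crash or a blocked session costs at most one batch of work.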

In [48]:
df1 = pd.read_csv('links.csv')
df2 = pd.read_csv('links_0_500.csv')
df3 = pd.read_csv('links_500_2000.csv')
df4 = pd.read_csv('links_2000_4000.csv')
df5 = pd.read_csv('links_4000_6000.csv')
df6 = pd.read_csv('links_7000.csv')

In [49]:
df = pd.concat([df1, df2, df3, df4, df5, df6], axis=0)

In [52]:
df = df.drop_duplicates()

In [53]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 79275 entries, 0 to 34851
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   links   79275 non-null  object
dtypes: object(1)
memory usage: 1.2+ MB


In [57]:
df.to_csv('df.csv', index=False)

In [54]:
list_links = df['links'].tolist()

In [56]:
len(list_links)

79275
