# Tourists at Home

### Problem Statement

With the world still battling the spread of Covid-19, countries have shut their borders. In the wake of a devastating loss in tourism dollars, government bodies and organisations announced the launch of the SingapoRediscovers campaign on 22nd July 2020 and set aside 45 million dollars to boost domestic tourism. Less than a month later, a new initiative in the form of SingapoRediscovers vouchers was announced. This initiative comes with a budget of 320 million dollars and aims to distribute $100 vouchers to all Singaporeans aged 18 and above, valid for seven months from December 2020 to end-June 2021.

With so many stimulus dollars planned to revive our hardest-hit industry, just what exactly is domestic tourism? On further thought, the concept appears paradoxical: can you still be a tourist back home? Yet if you were to ask around, almost everyone seems to have an idea of what domestic tourism is. As soon as word about the $100 vouchers started circulating, everyone seemed to begin planning their next staycation or a trip to Universal Studios Singapore (USS). Is that all there is to domestic tourism, or is there more?

Representing the Singapore Tourism Board (STB), my project aims to uncover Singaporeans' perception of domestic tourism, and to predict where the $100 vouchers would most likely go.

This project hypothesises that Singaporeans do not think of domestic tourism as tourism, but simply as activities they would be interested in doing. They do, however, understand what tourism is, as can be seen from their recommendations.


Through this, my project hopes to gather enough insight to understand who would most likely benefit from these stimulus dollars, and to propose recommendations on the next phase of campaigns.

In [None]:
# https://www.stb.gov.sg/content/stb/en/media-centre/media-releases/SingapoRediscovers-and-Expanded-Attractions-Guidelines.html
# https://www.stb.gov.sg/content/stb/en/media-centre/media-releases/Enterprise-Singapore-Sentosa-Development-Corporation-and-Singapore-Tourism-Board-team-up-with-industry-to-encourage-locals-to-rediscover-Singapore.html.html
# https://www.channelnewsasia.com/news/singapore/singaporediscovers-45-million-tourism-campaign-stb-singapoliday-12952932

In [1]:
import pandas as pd
import numpy as np
import time
from datetime import datetime

import requests
from selenium import webdriver
from bs4 import BeautifulSoup
import json
import re

### Instagram

In [2]:
# print datetime of scrape
print(f"Scrape performed on {datetime.now().date()} at {datetime.now().time()}.")

Scrape performed on 2020-10-11 at 19:37:15.953028.


In [3]:
%%time

# launch driver
driver = webdriver.Chrome()
url = "https://www.instagram.com/explore/tags/singaporediscovers/?hl=en/"
driver.get(url)
time.sleep(3)

# create empty set to add urls to
link_posts = set() # a set drops duplicates: consecutive scrapes overlap because earlier posts may still be on the page

# javascript function for insta's dynamic page to scroll to last post
# scrape posts url while scrolling to capture all elements before they disappear

# scrape post urls
tags = driver.find_elements_by_tag_name("a")
for tag in tags:
    link = tag.get_attribute("href")
    if link and "/p/" in link:  # skip anchors without an href (get_attribute returns None)
        link_posts.add(link)

# first scroll
lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
match = False
time.sleep(3)

# scroll repeatedly until the page height stops growing
while not match:
    # scrape post urls first
    tags = driver.find_elements_by_tag_name("a")
    for tag in tags:
        link = tag.get_attribute("href")
        if link and "/p/" in link:  # skip anchors without an href (get_attribute returns None)
            link_posts.add(link)
    # execute scroll
    lastCount = lenOfPage
    lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
    time.sleep(3)
    # stop once the page height no longer changes, i.e. the last post has loaded
    if lastCount == lenOfPage:
        match = True

Wall time: 35min 38s
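
The link-collection step above is written twice, once before the loop and once inside it. A minimal sketch of how it could be factored into a single helper, assuming `hrefs` is a list of `href` values already read from the anchor elements. The name `collect_post_links` and the sample URLs are hypothetical and only for illustration.

```python
def collect_post_links(hrefs, seen=None):
    """Add every href that points to a post ("/p/" in the URL) to the set `seen`."""
    seen = set() if seen is None else seen
    for href in hrefs:
        # anchors without an href attribute yield None, so guard before testing
        if href and "/p/" in href:
            seen.add(href)
    return seen


# illustrative hrefs, not scraped data
sample = [
    "https://www.instagram.com/p/AAA111/",
    None,
    "https://www.instagram.com/explore/tags/singaporediscovers/",
    "https://www.instagram.com/p/AAA111/",
]
links = collect_post_links(sample)
print(len(links))
```

Passing the same set back in on every scroll (`collect_post_links(hrefs, link_posts)`) would keep the de-duplicating behaviour of the original loop.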


In [4]:
len(link_posts)

3108

In [5]:
def make_insta_dicts(list_urls):
    
    list_dict = [] # create empty list to append dicts of info
    
    for i, url in enumerate(list_urls):
        r = requests.get(url)
        
        if r.status_code == 200:
            
            print(f"Client response {i} received")
            # parse response as html
            html = BeautifulSoup(r.text, "lxml")
            # find body of post and convert to string (guard against script tags with no text)
            script = html.find("script", text=lambda t: t and t.startswith("window._sharedData")).string
            # parse script as json obj
            post_json = json.loads(script.split("window._sharedData = ")[-1].rstrip(";"))
            # find where target info is stored
            core_json = post_json["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"]
            
            # try-except statement to extract target info since not all keys are present in each post's json
            try:
                post_id = core_json["id"]
            except:
                post_id = None
            try:
                post_slug =  core_json["shortcode"]
            except:
                post_slug = None
            try:
                unix_time = core_json["taken_at_timestamp"]
            except:
                unix_time = None
            try:
                date_time = datetime.utcfromtimestamp(core_json["taken_at_timestamp"]).strftime('%Y-%m-%d %H:%M:%S')
            except:
                date_time = None
            try:
                post_caption = core_json["edge_media_to_caption"]["edges"][0]["node"]["text"]
            except:
                post_caption = None
            try:
                hashtags = re.findall(r"#\w+", core_json["edge_media_to_caption"]["edges"][0]["node"]["text"])
            except:
                hashtags = None
            try:
                topic_tags = [topic.strip() for topic in core_json["accessibility_caption"].split(":")[-1].replace("and", ",").split(",")]
            except:
                topic_tags = None
            try:
                is_video = core_json["is_video"]
            except:
                is_video = None
            try:
                is_ad = core_json["is_ad"]
            except:
                is_ad = None
            try:
                post_likes = core_json["edge_media_preview_like"]["count"]
            except:
                post_likes = None
            try:
                geo_tag = core_json["location"]["name"]
            except:
                geo_tag = None
            try:
                geo_slug = core_json["location"]["slug"]
            except:
                geo_slug = None
            try:
                owner_id = core_json["owner"]["id"]
            except:
                owner_id = None
            try:
                owner_verified = core_json["owner"]["is_verified"]
            except:
                owner_verified = None
            try:
                owner_privacy = core_json["owner"]["is_private"]
            except:
                owner_privacy = None
            try:
                owner_unpublished = core_json["owner"]["is_unpublished"]
            except:
                owner_unpublished = None
            try:
                owner_total_posts = core_json["owner"]["edge_owner_to_timeline_media"]["count"]
            except:
                owner_total_posts = None
            try:
                owner_total_followers = core_json["owner"]["edge_followed_by"]["count"]
            except:
                owner_total_followers = None
                        
            # compile target info into dict format
            targets = ['post_id', 'post_slug', 'unix_time', 'date_time', 'post_caption', 'hashtags', 'topic_tags',
                       'is_video', 'is_ad', 'post_likes', 'geo_tag', 'geo_slug', 'owner_id', 'owner_verified',
                       'owner_privacy', 'owner_unpublished', 'owner_total_posts', 'owner_total_followers']
            dict_info = {}
            for variable in targets:
                dict_info[variable] = eval(variable)
            
            # append dict to list
            list_dict.append(dict_info)
            
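        else:
            # a response arrived, but not with status 200; log the status and skip this URL
            print(f"Request for URL index {i} failed with status code {r.status_code}!")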
            
        time.sleep(3) # sleep 3s between each request
        
    return list_dict # return appended list of dicts
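
The long run of `try`/`except` blocks in the function above all follow the same pattern: walk a path of keys and fall back to `None` when a key is missing. A minimal sketch of a helper that could express this in one line per field, assuming the same key paths as above. The name `dig` is hypothetical, and the sample dictionary is illustrative rather than real scraped data.

```python
def dig(data, *keys, default=None):
    """Follow `keys` through nested dicts/lists; return `default` if any step is missing."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # missing key, empty list, or a None in the middle of the path
            return default
    return current


# illustrative structure shaped like the keys used above, not real scraped data
core_json = {
    "shortcode": "AAA111",
    "edge_media_to_caption": {"edges": [{"node": {"text": "Hello #sg"}}]},
    "location": None,
}
post_slug = dig(core_json, "shortcode")
post_caption = dig(core_json, "edge_media_to_caption", "edges", 0, "node", "text")
geo_tag = dig(core_json, "location", "name")  # location is None, so the default is returned
```

Catching only `KeyError`, `IndexError` and `TypeError` keeps unrelated errors visible, which a bare `except:` would hide.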

In [6]:
%%time
list_dict = make_insta_dicts(link_posts)

Client response 0 received
Client response 1 received
Client response 2 received
...
Client response 3079 received
Client response 3080 received
...

In [7]:
df_insta = pd.DataFrame(list_dict)
df_insta

Unnamed: 0,post_id,post_slug,unix_time,date_time,post_caption,hashtags,topic_tags,is_video,is_ad,post_likes,geo_tag,geo_slug,owner_id,owner_verified,owner_privacy,owner_unpublished,owner_total_posts,owner_total_followers
0,2415039914432249111,CGD8gG6H70X,1602115214,2020-10-08 00:00:14,We may be stressed and tied up during the week...,"[#selfcaresaturday, #saturday, #skincare, #sat...","[one or more people, text that says 'Charge yo...",False,False,2,,,3986460102,False,False,False,169,263
1,2403781281091523733,CFb8lbhMZSV,1600773080,2020-09-22 11:11:20,"Wanted mooncakes, but found something else... ...","[#oldman, #lunchtime, #streetphotography, #Chi...",,False,False,26,Chinatown,chinatown,32884185,False,False,False,512,269
2,2412436218553090968,CF6sfWKJMOY,1601804829,2020-10-04 09:47:09,"Me the local tourist, visited Pulau Ubin, Part...","[#singaporediscovers, #pulauubin, #bumboatride...",,False,False,15,"Pulau Ubin, Singapore",pulau-ubin-singapore,1477803308,False,False,False,1221,377
3,2405321894373859563,CFha4Tcn6Dr,1600956735,2020-09-24 14:12:15,How are you so far in 2020? ⁣\n ⁣\nIn the blin...,"[#exploreSingapore, #discoverSingapore, #Visit...","[1 person, bridge, night, outdoor.]",False,False,973,Singapore,singapore,4344244,False,False,False,1545,21104
4,2404452579460280953,CFeVOG2AHJ5,1600853121,2020-09-23 09:25:21,The COVID-19 pandemic has definitely made us u...,"[#hhwt, #havehalalwilltravel, #visitsingapore,...","[ocean, sky, outdoor, water, nature.]",False,False,331,Pulau Hantu,pulau-hantu,1708651786,False,False,False,4863,135003
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3099,2405132000655600678,CFgvs_JAGwm,1600934098,2020-09-24 07:54:58,Singapore Changi Airport 「Day」☀️✈️\n\nチャンギ空港\...,"[#travels_cockaigne, #旅行_, #gulacockaigne, #ク,...","[sky, outdoor.]",False,False,10,Changi Airport,changi-airport,39849119645,False,False,False,143,48
3100,2408722895675960709,CFtgLYvHa2F,1601362166,2020-09-29 06:49:26,A table for five at Spice Brasserie please! \n...,"[#ShareYourMoments, #asianstreetfood, #throwba...","[6 people, people sitting.]",False,False,24,"PARKROYAL on Kitchener Road, Singapore",parkroyal-on-kitchener-road-singapore,2219928946,False,False,False,1236,2713
3101,2396497695679274342,CFCEfUAAV1m,1599904870,2020-09-12 10:01:10,The former Six Senses Duxton will be relaunche...,[],"[table, indoor.]",False,False,269,Six Senses Duxton,six-senses-duxton,777821651,False,False,False,4121,15353
3102,2405347795375872311,CFhgxNopdk3,1600959823,2020-09-24 15:03:43,#SingapoRediscovers 24 Sept 2020\n\nHave you e...,"[#SingapoRediscovers, #changiairport, #changij...",,False,False,3,Jewel Changi Airport,jewel-changi-airport,11840482991,False,False,False,208,31


In [8]:
df_insta.shape

(3104, 18)
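
The DataFrame has 3104 rows while 3108 post URLs were collected, so a few URLs did not end up in the table. A minimal sketch of how the missing posts could be identified, under the assumption that the path segment after `/p/` in each URL equals the `post_slug` value stored for that post. The helper names and sample values are hypothetical.

```python
from urllib.parse import urlparse


def shortcode_from_url(url):
    """Return the path segment that follows "p" in a post URL, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "p" in parts and parts.index("p") + 1 < len(parts):
        return parts[parts.index("p") + 1]
    return None


def missing_shortcodes(urls, scraped_slugs):
    """Shortcodes that were requested but are absent from the scraped slugs."""
    requested = {shortcode_from_url(u) for u in urls}
    requested.discard(None)
    return requested - set(scraped_slugs)


# illustrative values, not scraped data
urls = {
    "https://www.instagram.com/p/AAA111/",
    "https://www.instagram.com/p/BBB222/",
}
missing = missing_shortcodes(urls, ["AAA111"])
print(missing)
```

In the notebook this might be called as `missing_shortcodes(link_posts, df_insta["post_slug"])`.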

In [15]:
filename = f"df_insta_{datetime.now().date()}"
df_insta.to_csv(f"../datasets/{filename}.csv", index=False)
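
Note that `hashtags` and `topic_tags` hold Python lists, which a CSV file stores as their string form (for example `"['#a', '#b']"`). A minimal sketch of how such cells could be turned back into lists after reloading the file. The name `parse_list_cell` is hypothetical, and the sample string is illustrative.

```python
import ast


def parse_list_cell(cell):
    """Turn the string form of a list (as written to CSV) back into a list."""
    if not isinstance(cell, str) or not cell.strip():
        return None  # empty or missing cell
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return None  # not a valid Python literal
    return value if isinstance(value, list) else None


parsed = parse_list_cell("['#singaporediscovers', '#pulauubin']")
print(parsed)
```

A function like this could be supplied to `pd.read_csv` through its `converters` argument for the two list columns.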