---

### Travel location recommender:  

Before working on this task, I looked into some academic papers to get a general idea of how recommender systems are typically designed. Many state-of-the-art recommender systems appear to use one or both of the following approaches:  
- [Collaborative Filtering](https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0):
    - This approach is based on the assumption that if customer A and customer B both like item X, and customer A also likes item Y, then customer B is very likely to like item Y as well.
    - The main drawback of this approach is that it suffers from the [cold start problem](https://en.wikipedia.org/wiki/Cold_start_(computing)). For the recommendations to be good, we need a large number of existing records of customers' past engagements.  
  
   
- [Content Filtering](https://medium.com/@InDataLabs/approaching-the-cold-start-problem-in-recommender-systems-e225e0084970):
    - This approach looks at a customer's past consumption of products/services and recommends similar products/services to that customer.
    - In other words, we use a customer's past engagement with other products to predict whether he/she will like a future product.
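
To make the collaborative-filtering assumption concrete, here is a minimal sketch on made-up data. The customers, items, and the `jaccard` and `recommend` helpers are all hypothetical and not part of this project; the scoring rule (summing set-overlap similarity) is just one simple choice among many.

```python
# Toy "likes" data; every name here is invented for illustration.
likes = {
    "A": {"X", "Y"},
    "B": {"X"},
    "C": {"Z"},
}

def jaccard(a, b):
    """Overlap between two sets of liked items, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(customer, likes):
    """Rank items the customer has not liked yet, weighting each item
    by how similar the customers who liked it are to this customer."""
    scores = {}
    for other, items in likes.items():
        if other == customer:
            continue
        sim = jaccard(likes[customer], items)
        if sim == 0:
            continue  # customers with nothing in common contribute nothing
        for item in items - likes[customer]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("B", likes))
```

Customer B shares item X with customer A, so A's other item Y is suggested. A customer with an empty history shares nothing with anyone and gets an empty list, which is the cold start problem in miniature.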
    
In this task, the goal is to **develop a travel location recommender engine using social media data (IG, Facebook, etc)**, so we have no access to any sort of travelling history. This means that neither of the two approaches above can be applied directly. 

*It should be mentioned that, if travelling history is provided, it might be possible to develop a **hybrid collaborative filtering model** that uses social media data as additional input to alleviate the **cold start problem**. [This paper](https://arxiv.org/pdf/1606.07659.pdf) demonstrated this: based on their findings, social media data is very useful when historical data is scarce, but the benefit of adding it becomes less significant as more historical data becomes available.*

----
### Overall Framework   

In this task, I have narrowed down my data source to **Instagram**. To crawl data from Instagram, I used two third-party libraries:  
- [huaying/instagram-crawler](https://github.com/huaying/instagram-crawler)
- [LevPasha/Instagram-API-python](https://github.com/LevPasha/Instagram-API-python)
  
To suggest a **travel location** to an Instagram user, the following assumption is made:  
- A user is more likely to enjoy travelling to locations visited by *the people he/she is following on Instagram*
  
*For instance, if I am following user X on Instagram, and user X recently posted about a vacation in Australia, then I am likely to enjoy visiting Australia.*

#### Idea: 
1. Look into a person's followings (the people he/she is following), and iterate through their recent posts.
2. For each post, check if it belongs to a **"travelling"** topic. 
3. If it does, check if it also belongs to one of these topics: **"australia"**, **"china"**, **"japan"** etc. 
4. If yes, then we know that one of the user's followings has travelled to **"australia"**/**"china"**/**"japan"** etc. We can then recommend that location to the user.

In step 2, to determine whether a post belongs to the **travelling** topic, we will make use of the *hashtags* in that post. More specifically, if any of the hashtags is *semantically closely related* to **travelling**, then we will consider the post as belonging to the **travelling** topic.

Similarly, in step 3, we need to determine whether any of the hashtags is *semantically closely related* to any of the *location topics* (i.e. **"australia"**, **"china"**, **"japan"** etc.). 
  
So the problem reduces to: how to determine the *semantic similarity of two hashtags*. 
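
Assuming the similarity question has already been answered in the form of one set of related hashtags per topic, the four steps of the idea could be sketched as below. The data layout (each post as a list of hashtags), the function names, and the sample data are assumptions made for illustration. Ranking locations by how many matching posts were found is also an assumption; the idea above only says that a matching location can be recommended.

```python
from collections import Counter

def matches(post_hashtags, cluster):
    """A post belongs to a topic if any of its hashtags is in the cluster."""
    return any(tag in cluster for tag in post_hashtags)

def recommend_locations(followings_posts, travel_cluster, location_clusters):
    """Count, per location, the travel posts made by the user's followings.

    followings_posts: dict mapping a followed account to a list of posts,
                      each post being a list of hashtags (assumed format).
    """
    counts = Counter()
    for account, posts in followings_posts.items():        # step 1
        for hashtags in posts:
            if not matches(hashtags, travel_cluster):       # step 2
                continue
            for location, cluster in location_clusters.items():
                if matches(hashtags, cluster):               # step 3
                    counts[location] += 1                    # step 4
    return [location for location, _ in counts.most_common()]

# Made-up data for illustration only.
travel_cluster = {"travel", "wanderlust", "vacation"}
location_clusters = {
    "australia": {"australia", "sydney"},
    "japan": {"japan", "tokyo"},
}
followings_posts = {
    "user_x": [["vacation", "sydney"], ["foodie"]],
    "user_y": [["travel", "australia"], ["wanderlust", "tokyo"], ["tokyo"]],
}
print(recommend_locations(followings_posts, travel_cluster, location_clusters))
```

Note that the last post of `user_y` is ignored: it mentions a location hashtag but none of its hashtags falls in the travel cluster, so it fails step 2.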

---
### Clustering for determining semantic similarity
My approach is largely inspired by the one adopted in [this project report](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.383.1024&rep=rep1&type=pdf), which used clustering of hashtags to classify tweets into different topics.  

That report mentions that there is strong evidence in the academic literature (for instance [this paper](http://arxiv.org/pdf/1111.6553v1)) that co-occurrence of hashtags implies semantic similarity. In other words, if **P(hashtag X occurs | hashtag Y occurs)** and **P(hashtag Y occurs | hashtag X occurs)** are both high, then hashtags X and Y should be closely related.   

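More formally, MIN(**P(X | Y)**, **P(Y | X)**) can be used as a similarity score between X and Y: the larger it is, the more closely related the two hashtags are. Equivalently, 1 - MIN(**P(X | Y)**, **P(Y | X)**) can serve as a distance.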

In this task, there is a simple way to estimate **P(X | Y)** and **P(Y | X)** using the concept of *sampling*:
- First, crawl the most recent N posts containing hashtag Y, and count how many of those N posts also contain hashtag X. If N is large, **num_times_X_occurred / N** is a good estimate of **P(X | Y)**. 
- Use the same approach, with the roles of X and Y swapped, to estimate **P(Y | X)**.
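
The estimate described in the two bullets above can be sketched as follows. The samples are made up, each post is represented as a set of hashtags, and the function names are hypothetical; in the real pipeline the samples would come from the crawler.

```python
def estimate_conditional(posts_with_y, x):
    """Estimate P(x | y) as the fraction of sampled posts (all known to
    contain y) that also contain x."""
    if not posts_with_y:
        return 0.0
    hits = sum(1 for hashtags in posts_with_y if x in hashtags)
    return hits / len(posts_with_y)

def similarity(posts_with_x, posts_with_y, x, y):
    """MIN(P(x | y), P(y | x)) computed from the two samples."""
    return min(estimate_conditional(posts_with_y, x),
               estimate_conditional(posts_with_x, y))

# Made-up samples: 4 posts tagged #travel and 4 posts tagged #wanderlust.
travel_posts = [{"travel", "wanderlust"}, {"travel", "beach"},
                {"travel", "wanderlust"}, {"travel"}]
wanderlust_posts = [{"wanderlust", "travel"}, {"wanderlust", "travel"},
                    {"wanderlust", "travel"}, {"wanderlust", "hiking"}]

p_w_given_t = estimate_conditional(travel_posts, "wanderlust")   # 2 of 4 posts
p_t_given_w = estimate_conditional(wanderlust_posts, "travel")   # 3 of 4 posts
score = similarity(wanderlust_posts, travel_posts, "wanderlust", "travel")
print(p_w_given_t, p_t_given_w, score)
```

Taking the minimum means that a very generic hashtag, which appears alongside almost everything but is rarely accompanied by the specific topic, ends up with a low score.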

In theory, we could estimate the *distance* between all pairs of hashtags we have encountered, and then perform clustering to identify the clusters representing the **travel** topic and the different location topics. However, after several experiments I found this to be far too time-consuming, so I decided to construct the clusters in a different manner:

#### Example: constructing the 'travel' cluster
1. Crawl the most recent N posts containing the **#travel** hashtag. Store all hashtags that we encountered in these N posts.
2. For each encountered hashtag *#H*, estimate **P(#H | #travel)**. If it is smaller than a specified threshold, prune *#H*. Store the remaining hashtags in a *travel_set*.
3. For each remaining hashtag *#T*, perform steps 1 and 2 using *#T* as the subject (substituting *#T* for **#travel**). Store the resulting hashtags in *T_set*.
4. For every *#T*, set *T_set* = *travel_set* **intersect** *T_set*. 
5. Return the union of all *T_sets*. 

We can use the same approach to construct a cluster for each travel location that we may want to recommend to the user.   
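
The sketch below illustrates the general shape of this procedure on a tiny in-memory corpus. It is a simplified variant, not a transcription of steps 3 to 5 and not the code in **hashtag_cluster.py**: a candidate hashtag is kept only if the seed hashtag also passes the threshold among the candidate's own posts, so that both conditional probabilities meet the threshold. `CORPUS`, `fetch_recent_posts`, `related_tags`, and `build_cluster` are hypothetical names, and `fetch_recent_posts` merely stands in for the crawler.

```python
from collections import Counter

# Made-up corpus standing in for Instagram; each post is a set of hashtags.
CORPUS = [
    {"travel", "wanderlust", "love"},
    {"travel", "wanderlust"},
    {"travel", "love"},
    {"wanderlust", "travel"},
    {"love", "selfie"},
    {"love", "selfie"},
    {"love", "food"},
    {"love"},
    {"love", "food"},
]

def fetch_recent_posts(tag, n):
    """Hypothetical stand-in for the crawler: up to n posts containing tag."""
    return [post for post in CORPUS if tag in post][:n]

def related_tags(tag, n, threshold):
    """Steps 1 and 2: hashtags H whose estimated P(H | tag) meets the threshold."""
    posts = fetch_recent_posts(tag, n)
    if not posts:
        return set()
    counts = Counter(h for post in posts for h in post)
    return {h for h, c in counts.items() if c / len(posts) >= threshold}

def build_cluster(seed, n, threshold):
    """Keep candidates that pass the threshold in both directions."""
    candidates = related_tags(seed, n, threshold)      # P(H | seed) is high
    cluster = set()
    for tag in candidates:
        if seed in related_tags(tag, n, threshold):     # P(seed | H) is high
            cluster.add(tag)
    return cluster

print(sorted(build_cluster("travel", n=10, threshold=0.5)))
```

In this toy corpus `#love` appears in half of the `#travel` posts, so it survives the first pruning, but `#travel` appears in only 2 of the 7 `#love` posts, so it is dropped in the second pass.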

---

In this Jupyter notebook, we will construct the clusters using the approach described above.

---

### Load packages

For implementation details of the **Cluster** class, refer to **hashtag_cluster.py** in the **src** folder.

In [1]:
import sys
sys.path.extend(['../src', '../instagram_crawler'])

from hashtag_cluster import Cluster

%load_ext autoreload
%autoreload 2

### Set hyperparameters

The MIN_COOCURRENCE_PROBABILITY here is the threshold for MIN(**P(X | Y)**, **P(Y | X)**). SAMPLE_SIZE is the number of posts we are going to crawl during estimation of **P(X | Y)** and **P(Y | X)**.

In [2]:
SAMPLE_SIZE = 10
MIN_COOCURRENCE_PROBABILITY = 0.01
DIR = './output/'
topics = ['travel', 'japan', 'taiwan', 'korea', 'china', 'singapore', 'australia', 'europe']

We have selected a SAMPLE_SIZE of only 10 due to time constraints. In practice, it would be better to use a larger number (e.g. 1000). 
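
To see why the sample size matters, note that an estimate of the form count / SAMPLE_SIZE can only take values that are multiples of 1 / SAMPLE_SIZE. The sketch below assumes the estimate is computed that way and that a hashtag is kept when its estimate is at least the threshold. The helper names are hypothetical, and the standard-error formula assumes the sampled posts are independent, which recent posts may not be.

```python
import math

SAMPLE_SIZE = 10
MIN_COOCURRENCE_PROBABILITY = 0.01

def min_hits_to_pass(n, threshold):
    """Fewest co-occurrences out of n posts whose fraction meets the threshold."""
    return next(k for k in range(1, n + 1) if k / n >= threshold)

def standard_error(p, n):
    """Standard error of a sample proportion (independence assumed)."""
    return math.sqrt(p * (1 - p) / n)

for n in (SAMPLE_SIZE, 1000):
    hits = min_hits_to_pass(n, MIN_COOCURRENCE_PROBABILITY)
    print("N=%d: %d co-occurrence(s) needed, std. error at p=0.1 is %.4f"
          % (n, hits, standard_error(0.1, n)))
```

With 10 posts, a single co-occurrence already gives an estimate of 0.1, well above the 0.01 threshold, so under these assumptions almost nothing is pruned. With 1000 posts, a hashtag needs at least 10 co-occurrences to pass.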

### Crawl posts and construct topic clusters

In [3]:
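MAX_ATTEMPTS = 5  # crawling can fail intermittently, so retry each topic a few times

for topic in topics:
    for attempt in range(MAX_ATTEMPTS):
        try:
            print("Constructing cluster for: %s" % topic)
            # Crawl posts and build the hashtag cluster for this topic.
            cluster = Cluster(topic,
                              min_coocurrence_probablity=MIN_COOCURRENCE_PROBABILITY,
                              sample_size=SAMPLE_SIZE)
            cluster.save_cluster(DIR + topic)
            break
        except Exception:
            # Catch Exception rather than using a bare except, so that
            # KeyboardInterrupt can still stop the loop.
            print("Error occured constructing cluster for: %s \n" % topic)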

Constructing cluster for: travel


fetching: 33it [00:01, 28.73it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.15it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 26.86it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 25.87it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.39it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.65it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 25.22it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.90it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 28.17it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 26.10it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 30.41it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.56it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 29.83it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.84it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.99it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.
Constructing cluster for: japan


fetching: 33it [00:01, 25.14it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 25.44it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 22.77it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  8.02it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 23.62it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.
Error occured constructing cluster for: japan 

Constructing cluster for: japan


fetching: 33it [00:01, 23.50it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 30.72it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 25.32it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 28.49it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.34it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 26.42it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 29.18it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 26.55it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 25.44it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.
Constructing cluster for: taiwan


fetching: 33it [00:01, 22.88it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.54it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 22.24it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:03,  6.99it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  8.11it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.79it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.25it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.19it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.34it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.68it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.14it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.58it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.
Constructing cluster for: korea


fetching: 33it [00:01, 26.49it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.47it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.67it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.87it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 23.23it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 26.87it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 23.53it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.59it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.34it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.62it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.69it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:03,  6.78it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.82it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.58it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.56it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.82it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.36it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.51it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  8.26it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.02it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 28.65it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.88it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.
Error occured constructing cluster for: korea 

Constructing cluster for: korea


fetching: 33it [00:01, 28.79it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.10it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.66it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.68it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 29.86it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.43it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 25.63it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  8.28it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.
Error occured constructing cluster for: korea 

Constructing cluster for: korea


fetching: 33it [00:01, 29.48it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.69it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  8.09it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.33it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 25.03it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 26.68it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.92it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.70it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:03,  6.94it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.53it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:03,  6.74it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  8.47it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.49it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.66it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.44it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.51it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.43it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  8.34it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  7.66it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 23.50it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.14it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 28.03it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.61it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 24.39it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.62it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 21it [00:02,  8.08it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.37it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.
Constructing cluster for: china


fetching: 33it [00:01, 27.18it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.


fetching: 33it [00:01, 27.68it/s]                                                                                                                                                                                 


Done. Fetched 10 posts.
Error occured constructing cluster for: china 

Constructing cluster for: china


Error occured constructing cluster for: china 

Constructing cluster for: china


Error occured constructing cluster for: china 

Constructing cluster for: china


Constructing cluster for: singapore


Error occured constructing cluster for: singapore 

Constructing cluster for: singapore


Constructing cluster for: australia


Error occured constructing cluster for: australia 

Constructing cluster for: australia


Constructing cluster for: europe




### Load Saved Clusters

In [9]:
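# Rebuild each cluster from its saved copy (the last argument is presumably
# the path it was saved under) and print its hashtag set.
for topic in topics:
    cluster = Cluster(topic, None, None, DIR + topic)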
    print(topic)
    print(cluster.tags)
    print('----------------------------------------------------------------------')

travel
{'travelphotography', 'picoftheday', 'love', 'travelgram', 'beautiful', 'travel', 'wanderlust', 'instagood', 'landscape', 'nature', 'city', 'photography', 'photooftheday', 'travelling'}
----------------------------------------------------------------------
japan
{'travel', 'visitjapan', 'japan', 'ファインダー越しの私の世界', 'nature', 'japanesegirl', 'model', 'followme'}
----------------------------------------------------------------------
taiwan
{'台灣', '台湾', 'taiwan', '旅行', 'travel', 'taiwan1', 'taipei', 'iseetaiwan', 'iformosa', '写真', 'amazingtaiwan'}
----------------------------------------------------------------------
korea
{'いいね返し', '좋아요', '셀카', 'follow', 'daily', 'instagood', '인친', '얼스타그램', '맞팔', '좋아요반사', '데일리', '일상', '소통', '셀스타그램', 'ootd', '선팔하면맞팔', '맞팔해요', 'followme', '셀피', 'like4like', 'selfie', 'korea', 'outfit', 'fff', '팔로우', 'girl'}
----------------------------------------------------------------------
china
{'travel', 'china', 'nature'}
----------------------------------------
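
One way to inspect how distinct the loaded clusters are is to measure the overlap between their tag sets. The snippet below is a minimal sketch rather than part of the original notebook: it assumes each `cluster.tags` is a plain Python `set` (as the printed output suggests), copies three of the printed sets by hand, and uses a hypothetical helper `jaccard` that is not defined anywhere in the notebook.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; returns 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Tag sets copied by hand from the printed output above (three topics only).
clusters = {
    "travel": {
        "travelphotography", "picoftheday", "love", "travelgram", "beautiful",
        "travel", "wanderlust", "instagood", "landscape", "nature", "city",
        "photography", "photooftheday", "travelling",
    },
    "japan": {
        "travel", "visitjapan", "japan", "ファインダー越しの私の世界",
        "nature", "japanesegirl", "model", "followme",
    },
    "china": {"travel", "china", "nature"},
}

# Compare every pair of clusters and list the hashtags they share.
for (name_a, tags_a), (name_b, tags_b) in combinations(clusters.items(), 2):
    shared = sorted(tags_a & tags_b)
    score = jaccard(tags_a, tags_b)
    print(f"{name_a} / {name_b}: jaccard={score:.2f}, shared={shared}")
```

For these three sets the only shared hashtags are the generic *travel* and *nature*, which suggests that most of the remaining tags are specific to their own topic.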

### Observation

Even though SAMPLE_SIZE is relatively small, the results are reasonable.
- Most of the clusters make sense semantically. For instance, the **australia cluster** contains hashtags such as *exploringaustralia*, *queensland*, and *newsouthwales*, all of which are closely related to *australia*.