Is Web scraping legal? 
- [Link](https://blog.apify.com/is-web-scraping-legal/#:~:text=Myth%201%3A%20Web%20scraping%20is%20illegal,-It's%20all%20a&text=In%20most%20cases%2C%20it%20is,or%20rule%20banning%20web%20scraping.)


Purpose
- Scrape all listings with every feature they have ✅
- While scraping listings, also scrape their reviews to use as the target variable ✅
- Scrape the reviews hosts leave for users

Random ideas
- Probably group users by certain criteria (e.g. region) to get a denser dataset
- Need to understand the basics of recommendation systems to see how the scraped dataset can help personalize recommendations for each user
- I am more interested in building a graph learning system, so I can keep my focus on that
- For model interpretability, it matters more to see how the diversity and relevance of the content recommended to a given user, for given inputs, change over time when models are retrained frequently
- How to debias logged data: inverse propensity (importance) weighting
- Check out what affects user interest (text? image? length of text? price?)
- How to adjust to new data/user taste (e.g. when performance for a user drops on a certain "metric")
- Multi-objective optimization
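
For the logged-data debiasing idea above, inverse propensity weighting reweights each logged reward by how much more (or less) likely the new policy is to take the logged action than the logging policy was. A minimal sketch, assuming the logging propensities are known or can be estimated; the function name and the sample numbers are made up for illustration:

```python
def ips_estimate(rewards, logging_propensities, target_propensities, clip=None):
    """Estimate the average reward of a target policy from logged interactions.

    rewards[i]               reward observed for logged interaction i
    logging_propensities[i]  probability the logging policy gave to the logged action
    target_propensities[i]   probability the target policy would give to it
    clip                     optional cap on the importance weight (trades bias for variance)
    """
    if not rewards:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for reward, p_log, p_target in zip(rewards, logging_propensities, target_propensities):
        weight = p_target / p_log
        if clip is not None:
            weight = min(weight, clip)
        total += weight * reward
    return total / len(rewards)


# Hypothetical log: three impressions, two of which led to a positive outcome.
rewards = [1, 0, 1]
logging_propensities = [0.5, 0.25, 0.25]
target_propensities = [0.25, 0.25, 0.5]
estimate = ips_estimate(rewards, logging_propensities, target_propensities)
clipped_estimate = ips_estimate(rewards, logging_propensities, target_propensities, clip=1.0)
```

Clipping the weights is one common way to keep a few rare actions from dominating the estimate; whether it is needed here depends on how skewed the logged exposure turns out to be.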

Problems you wanna solve
- scalability of GNN
- pipeline that streams data in and makes the recommendation system adaptive to changing data, so it learns more about the users and adapts to their taste
  - debias logged data
  - CT
  - how to measure whether the user behaviour has changed?
    - Check the distance between the location of the latest listing the user rented and the central point/average location of the listings the user rented before
    - use bandits to find the threshold
- lack of ability to identify and utilize intrinsic relationships between data points in recommendation systems / lack of interpretability
  - graph learning
  - mathematical way
  - a way to break down the graph into subgraph of interest
- solve the cold start problem
  - meta-learning
- challenge of understanding the representation and content of the items
  - training item embeddings that preserve semantically meaningful relationships/information
- challenge of understanding relationships between items, or between items and user behaviours
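
The location-based check for a change in user behaviour could be sketched as follows. It assumes each rented listing is a (latitude, longitude) pair, uses a plain coordinate average as the "central point" (reasonable only when earlier stays are geographically close together), and measures distance with the haversine formula. The function names and coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(point_a, point_b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_from_history_km(previous_stays, latest_stay):
    """Distance between the latest stay and the average location of earlier stays."""
    if not previous_stays:
        raise ValueError("need at least one previous stay")
    mean_lat = sum(lat for lat, _ in previous_stays) / len(previous_stays)
    mean_lon = sum(lon for _, lon in previous_stays) / len(previous_stays)
    return haversine_km((mean_lat, mean_lon), latest_stay)


# Illustrative coordinates: two earlier stays near Bangkok, latest stay near Chiang Mai.
previous_stays = [(13.75, 100.50), (13.73, 100.52)]
latest_stay = (18.79, 98.98)
drift_km = distance_from_history_km(previous_stays, latest_stay)
```

The threshold above which `drift_km` counts as a behaviour change is left open here, as in the notes.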




Assumptions
- All the listings retrieved are always available
- Use the original price to exclude the effect of promotions
- Price per night is determined using the same check-in and check-out dates across all listings

Questions
- **Feasibility of this project** 
  - Is a bandit algorithm suitable for this Airbnb dataset?
    - It can definitely be applied: assuming what I scraped from the Airbnb website is a user-listing rating matrix, we can set a threshold on the rating to decide whether a review counts as a reward (binary variable)
    - But I realize that a bandit algorithm is not a batch machine learning approach; it is closer to online learning, continuously learning to maximize its reward, so we don't have to train a model offline
      - But how can graph features be applied?
        - I believe a contextual bandit will come in handy here
    - According to eugeneyan's recap of RecSys 2022, most bandit approaches are not robust to non-stationary environments
    - So I might need to consider whether to use a bandit approach or a batch machine learning approach
  - Is graph learning applicable?
  - Is Airbnb listings a good dataset for training recommendation system
  - Is review rating a suitable target variable to train my recommender? 
  - Reason of using airbnb listings?
    - There is a growing trend of staycation/renting gorgeous place for photo shooting
    - It is undeniable that there is a growing trend in staycations and in renting stays for photo shoots. For example, social media influencers tend to rent Airbnb stays with particular attractions, such as gorgeous scenery or modern design, for their photo shoots. Furthermore, more people prefer relaxing with a staycation on the weekend because of advantages like budget saving and ease of travel. Therefore, this growing trend has opened a new market of profit for Airbnb to exploit but t
- **Potential improvement of this project** 
  - Thompson sampling
    - Learn Bayesian Statistics first
  
- **Dataset detail**
  - How many listings I can get?
      - 25768
  - Which regions should I look into for listings?
      - Inspired by this article(https://www.stratosjets.com/blog/airbnb-statistics/#:~:text=Airbnb%20has%20listings%20in%20more%20than%20220%20countries%20and%20regions.&text=People%20stay%20an%20average%20of,has%20about%20245%2C000%20Airbnb%20listings.)
  - Is there a way to extract regions from Airbnb?
      - They can be scraped from the listing detail page
  - Is there a way to extract the category tags?
      - Record the category tags manually
  - Retrieve image as well
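
The reward definition and the Thompson sampling idea from the questions above can be sketched together. The sketch assumes ratings on a 1-5 scale, a hypothetical threshold of 4 for a positive reward, and a non-contextual Beta-Bernoulli bandit with one arm per listing. It does not use graph features and does nothing about non-stationarity.

```python
import random


def rating_to_reward(rating, threshold=4):
    """Binarise a review rating: 1 if it meets the threshold, else 0."""
    return 1 if rating >= threshold else 0


class BetaBernoulliThompson:
    """Thompson sampling with a Beta(1, 1) prior per arm (here: per listing)."""

    def __init__(self, arms, seed=None):
        self.successes = {arm: 0 for arm in arms}
        self.failures = {arm: 0 for arm in arms}
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample from each arm's posterior and pick the largest.
        samples = {
            arm: self.rng.betavariate(self.successes[arm] + 1, self.failures[arm] + 1)
            for arm in self.successes
        }
        return max(samples, key=samples.get)

    def update(self, arm, reward):
        # A reward of 1 counts as a success, 0 as a failure.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


# Replay a few made-up (listing_id, rating) pairs as feedback.
bandit = BetaBernoulliThompson(arms=["listing_a", "listing_b"], seed=0)
for listing_id, rating in [("listing_a", 5), ("listing_a", 4), ("listing_b", 2)]:
    bandit.update(listing_id, rating_to_reward(rating))
next_listing = bandit.select_arm()
```

Because the posterior is updated one observation at a time, this fits the online-learning framing in the notes; replaying scraped reviews like this is only an offline approximation of real feedback.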


In [1]:
import json
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
import time
import os
import pandas as pd
from multiprocessing import Pool
import math
from enum import Enum


API_KEY = "d306zoyjsyarp7ifhu67rjxn52tv0t20"
API_ENDPOINT = "https://www.airbnb.com/api/v3/StaysSearch?operationName=StaysSearch&locale=en&currency=USD"
HEADERS = {
    "X-Airbnb-API-Key":API_KEY
}
QUERY_PARAMS = "?check_in=2022-09-17&check_out=2022-09-18&enable_auto_translate=true&locale=en&country_override=US"
class LISTING_DETAIL_URL_PREFIX(str, Enum):
    NORMAL = 'rooms'
    PLUS = 'rooms/plus'
    LUXE = 'luxury/listing'
    
    @classmethod
    def get_url_prefix(cls, listing_detail_type):
        for item in cls:
            if item.name == listing_detail_type:
                return item.value

class LISTING_DETAIL_TYPE(str, Enum):
    NORMAL = 'NORMAL'
    PLUS = 'PLUS'
    LUXE = 'LUXE'

    
RAW_FOLDER = "raw"
PROCESSED_FOLDER = "processed"

    
def write_dict_into_json(dictionary, filename):
    with open(filename, "w") as write_file:
        json.dump(dictionary, write_file, indent=4)

# Getting list of listing ids

### StaysSearch

In [9]:
class StaysSearchScraper:
    def __init__(self, query):
        self.query = query
        self.max_page = 20
        self.API_ENDPOINT = "https://www.airbnb.com/api/v3/StaysSearch?operationName=StaysSearch&locale=en&currency=USD"
        self.API_KEY = "d306zoyjsyarp7ifhu67rjxn52tv0t20"
        self.HEADERS = {
            "X-Airbnb-API-Key":self.API_KEY 
        }
        self.listings = []
        
    def build_request_payload(self, items_offset,section_offset):
        return {
            "operationName": "StaysSearch",
            "variables": {
            "isInitialLoad": 'true',
            "hasLoggedIn": 'true',
            "cdnCacheSafe": 'false',
            "source": "EXPLORE",
            "exploreRequest": {
              "metadataOnly": 'false',
              "version": "1.8.3",
              "tabId": "home_tab",
              "refinementPaths": [
                "/homes"
              ],
              "priceFilterInputType": 0,
              "datePickerType": "flexible_dates",
              "source": "structured_search_input_header",
              "searchType": "unknown",
              "flexibleTripLengths": [
                "one_week"
              ],
              "federatedSearchSessionId": "7d4abc19-5d13-4b1e-9390-4aaa3075b565",
              "itemsOffset": items_offset,
              "sectionOffset": section_offset,
              "query": self.query,
              "itemsPerGrid": 20,
              "cdnCacheSafe": 'false',
              "treatmentFlags": [
                "flex_destinations_june_2021_launch_web_treatment",
                "new_filter_bar_v2_fm_header",
                "new_filter_bar_v2_and_fm_treatment",
                "merch_header_breakpoint_expansion_web",
                "flexible_dates_12_month_lead_time",
                "storefronts_nov23_2021_homepage_web_treatment",
                "lazy_load_flex_search_map_compact",
                "lazy_load_flex_search_map_wide",
                "im_flexible_may_2022_treatment",
                "im_flexible_may_2022_treatment",
                "flex_v2_review_counts_treatment",
                "search_add_category_bar_ui_ranking_web_aa",
                "flexible_dates_options_extend_one_three_seven_days",
                "super_date_flexibility",
                "micro_flex_improvements",
                "micro_flex_show_by_default",
                "search_input_placeholder_phrases",
                "pets_fee_treatment"
              ],
              "screenSize": "large",
              "isInitialLoad": 'true',
              "hasLoggedIn": 'true'
            },
            "staysSearchM2Enabled": 'false',
            "staysSearchM6Enabled": 'false'
            },
            "extensions": {
            "persistedQuery": {
              "version": 1,
              "sha256Hash": "a743098b24b86de94a0af99620b9f6a35664ca58e1dd0afdc81802dd157d1155"
            }
            }
        }
    
    def get_total_inventory_count(self):
        json_payload = self.build_request_payload(0,1)
        r = requests.post(url = self.API_ENDPOINT, json = json_payload, headers=self.HEADERS)
        data = r.json()
        sections = data["data"]['presentation']['explore']['sections']['sections']
        pagination_section = None
        for section in sections:
            if section['sectionComponentType'] == "EXPLORE_NUMBERED_PAGINATION":
                pagination_section = section
        
        self.total_inventory_count = int(pagination_section['section']['totalInventoryCount'])

    def get_list_of_listing_id(self):
        items_offset = 0
        section_offset = 1
        while items_offset <= self.total_inventory_count:
            try:
                json_payload = self.build_request_payload(items_offset,section_offset)
                r = requests.post(url = self.API_ENDPOINT, json = json_payload, headers=self.HEADERS)
                data = r.json()
                sections = data["data"]['presentation']['explore']['sections']['sections']
                listing_section = None
                for section in sections:
                    if section['sectionComponentType'] == "EXPLORE_SECTION_WRAPPER":
                        listing_section = section

                listings = listing_section['section']["child"]['section']['items']
                for listing in listings:
                    self.listings.append(listing["listing"]["id"])
            except Exception as error:
                print(error)
            finally:
                # Always advance; otherwise a failing page would be retried forever
                items_offset += self.max_page
                if items_offset in (300, 600, 900):
                    section_offset += 1

        self.listings = list(set(self.listings))


## ExploreSections

In [3]:
class ExploreSectionScraper:
    def __init__(self, category_tag):
        self.category_tag = category_tag
        self.max_page = 20
        self.max_items_offset = 500
        self.API_KEY = "d306zoyjsyarp7ifhu67rjxn52tv0t20"
        self.HEADERS = {"X-Airbnb-API-Key": self.API_KEY}
        self.listings = []

    def build_request_url(self, category_tag, items_offset, section_offset):
        return f"https://www.airbnb.com/api/v3/ExploreSections?operationName=ExploreSections&locale=en&currency=USD&variables=%7B%22isInitialLoad%22%3Atrue%2C%22hasLoggedIn%22%3Atrue%2C%22cdnCacheSafe%22%3Afalse%2C%22source%22%3A%22EXPLORE%22%2C%22exploreRequest%22%3A%7B%22metadataOnly%22%3Afalse%2C%22version%22%3A%221.8.3%22%2C%22tabId%22%3A%22all_tab%22%2C%22refinementPaths%22%3A%5B%22%2Fhomes%22%5D%2C%22searchMode%22%3A%22flex_destinations_search%22%2C%22itemsPerGrid%22%3A20%2C%22cdnCacheSafe%22%3Afalse%2C%22treatmentFlags%22%3A%5B%22flex_destinations_june_2021_launch_web_treatment%22%2C%22merch_header_breakpoint_expansion_web%22%2C%22flexible_dates_12_month_lead_time%22%2C%22storefronts_nov23_2021_homepage_web_treatment%22%2C%22lazy_load_flex_search_map_compact%22%2C%22lazy_load_flex_search_map_wide%22%2C%22im_flexible_may_2022_treatment%22%2C%22im_flexible_may_2022_treatment%22%2C%22flex_v2_review_counts_treatment%22%2C%22search_add_category_bar_ui_ranking_web_aa%22%2C%22flexible_dates_options_extend_one_three_seven_days%22%2C%22super_date_flexibility%22%2C%22micro_flex_improvements%22%2C%22micro_flex_show_by_default%22%2C%22search_input_placeholder_phrases%22%2C%22pets_fee_treatment%22%5D%2C%22screenSize%22%3A%22large%22%2C%22isInitialLoad%22%3Atrue%2C%22hasLoggedIn%22%3Atrue%2C%22flexibleTripLengths%22%3A%5B%22one_week%22%5D%2C%22locationSearch%22%3A%22MIN_MAP_BOUNDS%22%2C%22categoryTag%22%3A%22Tag%3A{category_tag}%22%2C%22priceFilterInputType%22%3A0%2C%22priceFilterNumNights%22%3A5%2C%22itemsOffset%22%3A{items_offset}%2C%22sectionOffset%22%3A{section_offset}%2C%22federatedSearchSessionId%22%3A%2253c99239-03a7-4259-a5fe-22d1bb362ed3%22%7D%2C%22gpRequest%22%3A%7B%22expectedResponseType%22%3A%22INCREMENTAL%22%7D%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%2247aebb6c939057cf8d630a02a99021e11ab88eb0c9b889a21cb9a4c720cabf07%22%7D%7D"

    def get_list_of_listing_id(self):
        items_offset = 0
        section_offset = 0
        while items_offset <= self.max_items_offset:
            try:
                url = self.build_request_url(self.category_tag, items_offset, section_offset)
                r = requests.get(url=url, headers=self.HEADERS)
                data = r.json()
                items = data["data"]["presentation"]["explore"]["sections"]["responseTransforms"][
                    "transformData"
                ][0]["sectionContainer"]["section"]["child"]["section"]["items"]
                for item in items:
                    self.listings.append(item["listing"]["id"])
            except Exception as error:
                print(error)
            finally:
                items_offset += self.max_page

### Read data

In [4]:
# import json
  

# f1 = open('../data/category_tags_output.json')
# f2 = open('../data/querys_output.json')
# f3 = open('../data/listings.json')

# output1 = json.load(f1)
# unique_output1 = (output1['unique'])
# print(len(output1['all']))
# print(len(unique_output1))
# print(len(list(set(unique_output1))))

# output2 = json.load(f2)
# unique_output2 = (output2['unique'])
# print(len(output2['all']))
# print(len(unique_output2))
# print(len(list(set(unique_output2))))

# output3 = json.load(f3)
# unique_output3 = (output3['Thailand'])
# print(len(unique_output3))
# print(len(list(set(unique_output3))))

In [5]:
# all_listings = [*unique_output1,*unique_output2,*unique_output3]
# all_unique_listings = (list(set(all_listings)))
# print(len(all_listings))
# print(len(all_unique_listings))
# scraped_data = {
#     "all_unique_listing_ids":all_unique_listings
# }
# write_dict_into_json(scraped_data, "scraped_data.json")

In [6]:
with open(f"../data/{PROCESSED_FOLDER}/all_unique_listing_ids.json") as f:
    all_unique_listing_ids = json.load(f)
len(all_unique_listing_ids)

25767

In [7]:
all_unique_listing_ids.index('12419192')

21009

In [60]:
all_unique_listing_ids[20938]

'13931366'

Reviews Counts

In [38]:
all_reviews_counts = []
directory = f"../data/{RAW_FOLDER}/reviews_counts"
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        # the with-block closes each file once it has been read
        with open(f) as review_count_f:
            review_count = json.load(review_count_f)
        all_reviews_counts.append(review_count)
print(len(all_reviews_counts))
write_dict_into_json(all_reviews_counts, f"../data/{PROCESSED_FOLDER}/all_reviews_counts.json")

25767


In [44]:
filename = f"../data/{PROCESSED_FOLDER}/all_reviews_counts.json"
all_reviews_counts_df = pd.read_json(filename)
error_reviews_counts = all_reviews_counts_df[all_reviews_counts_df['reviews_count'] == "Error"]['listing_id'].to_list()

Reviews

In [82]:
with open(f"../data/{RAW_FOLDER}/old_reviews.json") as old_reviews_f:
    old_reviews = json.load(old_reviews_f)

with open(f"../data/{RAW_FOLDER}/new_reviews.json") as new_reviews_f:
    new_reviews = json.load(new_reviews_f)

new_new_reviews = {}
directory = f"../data/{RAW_FOLDER}/reviews"
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        try:
            with open(f) as reviews_f:
                reviews = json.load(reviews_f)
            listing_id = list(reviews.keys())[0]
            new_new_reviews[listing_id] = reviews[listing_id]
        except Exception:
            # report files that cannot be parsed (e.g. .DS_Store)
            print(f)

../data/raw/reviews/.DS_Store


In [83]:
o = list(old_reviews.keys())
n = list(new_reviews.keys())
print(len(o))
print(len(n))

len([i for i in n if i not in o])

636
2951


2315

In [79]:
unique_listing_ids_from_scraped_reviews = []
unique_listing_ids_from_scraped_reviews.extend(new_reviews.keys())
unique_listing_ids_from_scraped_reviews.extend(old_reviews.keys())
unique_listing_ids_from_scraped_reviews.extend(new_new_reviews.keys())
unique_listing_ids_from_scraped_reviews  = list(set(unique_listing_ids_from_scraped_reviews ))
len((unique_listing_ids_from_scraped_reviews))

2951

In [85]:
[i for i in list(new_reviews.keys()) if i in list(new_new_reviews.keys())]

['24414782', '672646130912795985']

In [92]:
len(new_reviews['672646130912795985'])

0

In [91]:
len(new_new_reviews['672646130912795985'])

5

In [93]:
write_dict_into_json(new_new_reviews, f"../data/{RAW_FOLDER}/new_new_reviews.json")

In [94]:
all_reviews = new_reviews.copy()
all_reviews.update(new_new_reviews)

In [95]:
len(all_reviews.keys())

25767

In [98]:
write_dict_into_json(all_reviews, f"../data/{RAW_FOLDER}/compiled_reviews.json")

### compiled_reviews.json contains all the reviews in first iteration
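
`dict.update` lets the later source win even when it holds fewer reviews for a listing. Since the checks above show the same listing with 0 reviews in one file and 5 in another, a merge that keeps the longer list per listing may be safer. A minimal sketch; `merge_reviews`, the listing ids, and the `comments` field are hypothetical names used only for illustration:

```python
def merge_reviews(*sources):
    """Merge {listing_id: [reviews]} dicts, keeping the longest list per listing."""
    merged = {}
    for source in sources:
        for listing_id, reviews in source.items():
            if listing_id not in merged or len(reviews) > len(merged[listing_id]):
                merged[listing_id] = reviews
    return merged


# Toy data mirroring the overlap seen above: one source has no reviews for a listing.
first_pass = {"111": [], "222": [{"comments": "great"}]}
second_pass = {"111": [{"comments": "nice"}, {"comments": "clean"}]}
merged = merge_reviews(first_pass, second_pass)
```

With this rule the order of the sources no longer decides which copy survives, as long as the longer list is the more complete one.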

##### Retry folder

In [10]:
directory = f"../data/{RAW_FOLDER}/retry"
retry_reviews = {}

for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        try:
            with open(f) as reviews_f:
                reviews = json.load(reviews_f)
            listing_id = list(reviews.keys())[0]
            retry_reviews[listing_id] = reviews[listing_id]
        except Exception:
            # report files that cannot be parsed
            print(f)

In [15]:
list(retry_reviews.keys())[0]
write_dict_into_json(retry_reviews, f"../data/{RAW_FOLDER}/compiled_retry_reviews.json")

##### Retry2 folder

In [2]:
directory = f"../data/{RAW_FOLDER}/retry2"
retry2_reviews = {}

for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        try:
            with open(f) as reviews_f:
                reviews = json.load(reviews_f)
            listing_id = list(reviews.keys())[0]
            retry2_reviews[listing_id] = reviews[listing_id]
        except Exception:
            # report files that cannot be parsed
            print(f)

In [7]:
list(retry2_reviews.keys())[0]
write_dict_into_json(retry2_reviews, f"../data/{RAW_FOLDER}/compiled_retry2_reviews.json")
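
The folder-loading cells above repeat the same loop for `reviews`, `retry` and `retry2`. A small helper could replace all three. This is a sketch, assuming every file holds a single-key `{listing_id: reviews}` dict as in those cells; `load_reviews_folder` is a hypothetical name, not part of the original notebook.

```python
import json
import os


def load_reviews_folder(directory):
    """Read every *.json file in a folder into one {listing_id: reviews} dict.

    Returns the merged dict and the list of files that could not be parsed.
    """
    merged, failed = {}, []
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if not (os.path.isfile(path) and filename.endswith(".json")):
            continue  # skips sub-folders and stray files such as .DS_Store
        try:
            with open(path) as reviews_file:
                reviews = json.load(reviews_file)
            listing_id = next(iter(reviews))
            merged[listing_id] = reviews[listing_id]
        except (ValueError, StopIteration, TypeError, KeyError):
            # invalid JSON, an empty dict, or a payload that is not a dict
            failed.append(path)
    return merged, failed
```

It would be called once per folder, for example `retry_reviews, failed = load_reviews_folder(f"../data/{RAW_FOLDER}/retry")`, and the `failed` list replaces the printed paths.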

## Scrape reviews
Reviews (Target variable)
- Trigger API request
    - Link: https://www.airbnb.com.sg/api/v3/PdpReviews?operationName=PdpReviews&locale=en-SG&currency=USD&variables=%7B%22request%22%3A%7B%22fieldSelector%22%3A%22for_p3_translation_only%22%2C%22limit%22%3A7%2C%22listingId%22%3A%221083329%22%2C%22offset%22%3A%2249%22%2C%22showingTranslationButton%22%3Afalse%2C%22checkinDate%22%3A%222022-12-04%22%2C%22checkoutDate%22%3A%222022-12-09%22%2C%22numberOfAdults%22%3A%224%22%2C%22numberOfChildren%22%3A%220%22%2C%22numberOfInfants%22%3A%220%22%7D%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%226a71d7bc44d1f4f16cced238325ced8a93e08ea901270a3f242fd29ff02e8a3a%22%7D%7D
    - Vary the values of "limit" and "offset" to page through the reviews
    - The total number of reviews is given by "data.merlin.pdpReviews.metadata.reviewsCount" in the response
- The request header needs to include "X-Airbnb-API-Key" with value "d306zoyjsyarp7ifhu67rjxn52tv0t20"
- Note:
    - If a listing has no reviews, the API returns a "reviewsCount" of 0
    - Some listings to be double checked: 4153164, 18720892
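
The request described above can also be assembled from plain dicts instead of a hand-encoded string, which makes "limit" and "offset" easier to vary. This is a sketch: the field values are copied from the link above, the helper names are hypothetical, and whether the endpoint still accepts this hash and these fields is not verified here.

```python
import json
from urllib.parse import urlencode

PDP_REVIEWS_ENDPOINT = "https://www.airbnb.com/api/v3/PdpReviews"
# Hash copied from the request link in the notes; it may change over time.
PDP_REVIEWS_HASH = "6a71d7bc44d1f4f16cced238325ced8a93e08ea901270a3f242fd29ff02e8a3a"


def build_pdp_reviews_url(listing_id, offset, limit=7):
    """Build the PdpReviews URL by JSON-encoding the variables and extensions."""
    variables = {
        "request": {
            "fieldSelector": "for_p3_translation_only",
            "limit": limit,
            "listingId": str(listing_id),
            "offset": str(offset),
            "showingTranslationButton": False,
            "checkinDate": "2022-12-04",
            "checkoutDate": "2022-12-09",
            "numberOfAdults": "4",
            "numberOfChildren": "0",
            "numberOfInfants": "0",
        }
    }
    extensions = {"persistedQuery": {"version": 1, "sha256Hash": PDP_REVIEWS_HASH}}
    query = {
        "operationName": "PdpReviews",
        "locale": "en",
        "currency": "USD",
        # compact separators keep the encoded JSON free of spaces
        "variables": json.dumps(variables, separators=(",", ":")),
        "extensions": json.dumps(extensions, separators=(",", ":")),
    }
    return f"{PDP_REVIEWS_ENDPOINT}?{urlencode(query)}"


def page_offsets(reviews_count, limit=7):
    """Offsets needed to page through `reviews_count` reviews, `limit` at a time."""
    return list(range(0, reviews_count, limit))


url = build_pdp_reviews_url("1083329", offset=49)
```

`page_offsets` uses 0-based offsets that stop before `reviews_count`, so a listing with no reviews produces no requests at all.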
    

In [11]:
class ReviewsScraper:
    def __init__(self, listing_ids):
        self.listing_ids = listing_ids
        self.max_limit = 7
        self.API_KEY = "d306zoyjsyarp7ifhu67rjxn52tv0t20"
        self.HEADERS = {"X-Airbnb-API-Key": self.API_KEY}
        self.all_reviews = {}

    def build_request_url(self, listing_id, offset, limit):
        return f"https://www.airbnb.com/api/v3/PdpReviews?operationName=PdpReviews&locale=en&currency=USD&variables=%7B%22request%22%3A%7B%22fieldSelector%22%3A%22for_p3_translation_only%22%2C%22limit%22%3A{limit}%2C%22listingId%22%3A%22{listing_id}%22%2C%22offset%22%3A%22{offset}%22%2C%22showingTranslationButton%22%3Afalse%2C%22checkinDate%22%3A%222022-11-06%22%2C%22checkoutDate%22%3A%222022-11-12%22%2C%22numberOfAdults%22%3A%221%22%2C%22numberOfChildren%22%3A%220%22%2C%22numberOfInfants%22%3A%220%22%7D%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%226a71d7bc44d1f4f16cced238325ced8a93e08ea901270a3f242fd29ff02e8a3a%22%7D%7D"

    def get_reviews_count(self, listing_id):
        url = self.build_request_url(listing_id, 0, self.max_limit)
        r = requests.get(url=url, headers=self.HEADERS)
        data = r.json()
        reviews_count = data["data"]["merlin"]["pdpReviews"]["metadata"]["reviewsCount"]
        return reviews_count

    def export_reviews(self, listing_id, reviews):
        self.all_reviews[listing_id] = reviews
        write_dict_into_json(self.all_reviews, "reviews.json")

    def get_reviews(self):
        for listing_id in self.listing_ids:
            print(f"{listing_id} started")
            offset = 0
            self.export_reviews(listing_id, [])
            try:
                max_reviews_count = self.get_reviews_count(listing_id)
            except Exception as error:
                # Skip this listing instead of aborting the whole run
                print(error)
                continue
            print(f"{listing_id} has {max_reviews_count} review count")
            if max_reviews_count == 0:
                continue
            # Offsets are 0-based, so offset == max_reviews_count is already past the end
            while offset < max_reviews_count:
                try:
                    url = self.build_request_url(listing_id, offset, self.max_limit)
                    r = requests.get(url=url, headers=self.HEADERS)
                    data = r.json()
                    reviews = data["data"]["merlin"]["pdpReviews"]["reviews"]
                    temp = self.all_reviews[listing_id]
                    temp.extend(reviews)
                    self.export_reviews(listing_id, temp)
                except Exception as error:
                    print(error)
                finally:
                    offset += self.max_limit

In [None]:
with open('reviews.json') as f:
    reviews = json.load(f)

In [None]:
# got response and translated review
reviews['4316692'][84]

In [None]:
# no response and no translated review
reviews['4316692'][0]

### Potentially a better way to scrape reviews?

In [82]:
def extract_element_data(soup, params):
    """Extracts data from a specified HTML element"""

    # 1. Find the right tag
    if "class" in params:
        elements_found = soup.find_all(params["tag"], params["class"])
    else:
        elements_found = soup.find_all(params["tag"])

    # 2. Extract text from these tags
    if "get" in params:
        element_texts = [el.get(params["get"]) for el in elements_found]
    else:
        element_texts = [el.get_text() for el in elements_found]

    # 3. Select a particular text or concatenate all of them
    tag_order = params.get("order", 0)
    if tag_order == -1:
        output = "**__**".join(element_texts)
    else:
        output = element_texts[tag_order]

    return output


def build_reviews_page_url(listing_id, listing_detail_type=LISTING_DETAIL_TYPE.NORMAL.value):
    url_prefix = LISTING_DETAIL_URL_PREFIX.get_url_prefix(listing_detail_type)
    return f"https://www.airbnb.com/{url_prefix}/{listing_id}/reviews{QUERY_PARAMS}"


def get_listing_detail_type_from_url(url):
    if "/plus" in url:
        return LISTING_DETAIL_TYPE.PLUS.value
    elif "/luxury" in url:
        return LISTING_DETAIL_TYPE.LUXE.value
    else:
        return LISTING_DETAIL_TYPE.NORMAL.value


def extract_reviews_soup(listing_id, required_waiting_time=(20, 60), small_delay_time=1):
    """Opens a listing's reviews page, scrolls the reviews modal and returns the page soup.

    Returns an empty soup when the listing has no reviews or when anything fails.
    """
    print(listing_id)
    empty_soup = BeautifulSoup("", features="html.parser")
    driver = None
    try:
        # Get right listing detail url prefix
        temp_url = build_reviews_page_url(listing_id)
        driver = webdriver.Chrome(ChromeDriverManager().install())
        driver.get(temp_url)
        current_url = driver.current_url
        listing_detail_type = get_listing_detail_type_from_url(current_url)
        if listing_detail_type != LISTING_DETAIL_TYPE.NORMAL.value:
            driver.quit()
            driver = None
            url = build_reviews_page_url(listing_id, listing_detail_type)
            driver = webdriver.Chrome(ChromeDriverManager().install())
            driver.get(url)

        # Check whether listing has review
        HAS_REVIEW_CLASS_NAME = "_s65ijh7"
        try:
            WebDriverWait(driver, required_waiting_time[0]).until(
                EC.presence_of_element_located((By.CLASS_NAME, HAS_REVIEW_CLASS_NAME))
            )
        except Exception:
            return empty_soup

        # Find review modal
        REVIEWS_MODAL_CLASS_NAME = "_17itzz4"
        RULES_REVIEWS_PAGE = {
            "num_of_reviews": {"tag": "span", "class": "_1qx9l5ba"},
        }
        try:
            review_modal = WebDriverWait(driver, required_waiting_time[1]).until(
                EC.presence_of_element_located((By.CLASS_NAME, REVIEWS_MODAL_CLASS_NAME))
            )
        except Exception:
            review_modal = None

        print(f"{listing_id}: {review_modal}")
        # Scroll according to the number of reviews available and extract html
        if review_modal:
            temp_soup = BeautifulSoup(driver.page_source, features="html.parser")
            num_of_reviews_html = extract_element_data(
                temp_soup, RULES_REVIEWS_PAGE["num_of_reviews"]
            )
            num_of_reviews = int(num_of_reviews_html.split(" ")[0])
            time.sleep(1)
            num_of_scrolling_needed = num_of_reviews // 2
            for i in range(num_of_scrolling_needed):
                driver.execute_script(
                    "arguments[0].scrollTop = arguments[0].scrollHeight", review_modal
                )
                time.sleep(2)

        return BeautifulSoup(driver.page_source, features="html.parser")
    except Exception as e:
        print(e)
        return empty_soup
    finally:
        # Always close the browser, including on the early returns above
        if driver is not None:
            driver.quit()


def process_reviews_pages():
    """Runs reviews pages processing in parallel"""
    with open("../data/scraped_data.json") as f:
        scraped_data = json.load(f)
    listing_ids = scraped_data["all_unique_listing_ids"]
    # Use at least one worker, even on a single-core machine
    n_pools = max(1, os.cpu_count() // 2)
    # The with-block shuts the pool down once map() has returned
    with Pool(n_pools) as pool:
        result = pool.map(extract_reviews_soup, listing_ids[:16])
    return result

  driver = webdriver.Chrome(ChromeDriverManager().install())


In [87]:
test = extract_reviews_soup("24598097")

  driver = webdriver.Chrome(ChromeDriverManager().install())


In [90]:
elements_found = test.find_all("div", "r1are2x1")

In [102]:
for i, child in enumerate(elements_found[0].children):
    print(i)
    print(child)

0
<div><div class="chnzxuf dir dir-ltr"><div class="t9gtck5 dir dir-ltr" id="review_697670638047608361_title"><h3 class="_14i3z6h" elementtiming="LCP-target" tabindex="-1">Jamie</h3><div class="s11wgnhd dir dir-ltr"><div class="s189e1ab dir dir-ltr"><ol class="_7h1p0g"><li class="_1f1oir5" theme="[object Object]">August 2022</li></ol></div></div></div><div class="_e7hn5" style="height: 40px; width: 40px;"><a aria-label="Jamie" class="_9bezani" href="/users/show/310377101" target="_blank"><div class="_1c81x67" style="height: 40px; width: 40px; border-radius: 50%;"><div aria-busy="false" aria-label="Jamie" class="_1h6n1zu" role="img" style="display: inline-block; vertical-align: bottom; height: 100%; width: 100%; min-height: 1px;"><img alt="Jamie" aria-hidden="true" class="_9ofhsl" data-original-uri="https://a0.muscache.com/im/pictures/user/b9215414-dddd-4379-b26b-f52e371957fb.jpg?im_w=240" decoding="async" elementtiming="LCP-target" src="https://a0.muscache.com/im/pictures/user/b9215414

## How to scrape listing details
1. https://www.airbnb.com/rooms/{listing_id}
2. https://www.airbnb.com/rooms/{listing_id}/amenities
3. https://www.airbnb.com/rooms/{listing_id}?modal=DESCRIPTION
4. https://www.airbnb.com/rooms/{listing_id}?modal=PHOTO_TOUR_SCROLLABLE
5. https://www.airbnb.com/rooms/53818538/safety
6. https://www.airbnb.com/rooms/47997939/house-rules
7. Latitude, longitude: search for an anchor tag whose title is "Open this area in Google Maps (opens a new window)"


Note:
- Skip the listing if it is a Luxe or Plus listing:
    - Example: 
    - https://www.airbnb.com/luxury/listing/20470768
    - https://www.airbnb.com/rooms/plus/33177355
- Add these query params (?check_in=2022-09-17&check_out=2022-09-18&enable_auto_translate=true&locale=en&country_override=US)
    - check_in=2022-09-17&check_out=2022-09-18
        - the same check-in and check-out dates are used across all listings so that prices are comparable
    - enable_auto_translate=true&locale=en&country_override=US
        - ensures we get translated text

In [12]:
def build_url(listing_id):
    query_params = 'check_in=2022-09-17&check_out=2022-09-18&enable_auto_translate=true&locale=en&country_override=US'
    return f'https://www.airbnb.com/rooms/{listing_id}?{query_params}'

In [13]:
def extract_soup_js(listing_url, required_waiting_time=20, small_delay_time=1):
    """Extracts HTML from JS pages: open, wait, click, wait, extract"""

    options = Options()
    driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)

    # if the URL is not valid - return an empty soup
    try:
        driver.get(listing_url)
    except Exception:
        print(f"Wrong URL: {listing_url}")
        driver.quit()  # avoid leaking the browser process
        return BeautifulSoup('', features='html.parser')
    current_url = driver.current_url
    feature_dict = {
        "is_airbnb_plus":False,
        "is_airbnb_luxe":False
    }
    
    try:
        my_elem1 = WebDriverWait(driver, required_waiting_time).until(EC.presence_of_element_located((By.CLASS_NAME, '_tyxjp1')))
        print(my_elem1)
        my_elem2 = WebDriverWait(driver, required_waiting_time).until(EC.presence_of_element_located((By.XPATH, "//script[@id='datadeferred-state']")))
        print(my_elem2)
    except Exception:
        # Timed out waiting for the page elements; continue with whatever has loaded
        pass
    
    if '/plus' in current_url:
        feature_dict['is_airbnb_plus'] = True
    elif '/luxury' in current_url:
        feature_dict['is_airbnb_luxe'] = True

    detail_page = driver.page_source

    driver.quit()
    print(feature_dict)
    return BeautifulSoup(detail_page, features='html.parser')

In [108]:
detail_url = build_url('24598097')
detail_soup = extract_soup_js(detail_url)

  driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)


<selenium.webdriver.remote.webelement.WebElement (session="a33ddb902b3b2bbde9949e0a90c91ddf", element="add48a21-e43e-461a-aac8-f3e4d8383226")>
{'is_airbnb_plus': False, 'is_airbnb_luxe': False}


In [109]:
detail_url = build_url('54347410')
detail_soup2 = extract_soup_js(detail_url)


<selenium.webdriver.remote.webelement.WebElement (session="8e5808030c481ae1bcdc5a67799292b9", element="fcae6f13-1e7a-4930-9710-cf26c9ab2b74")>
{'is_airbnb_plus': False, 'is_airbnb_luxe': False}


In [110]:
detail_url = build_url('19454887')
detail_soup3 = extract_soup_js(detail_url)


<selenium.webdriver.remote.webelement.WebElement (session="09eb29ba8ffdb2716791dfaa09a3a04e", element="8e01451e-2453-475f-b2ab-dea6bc9a23e9")>
{'is_airbnb_plus': False, 'is_airbnb_luxe': False}


In [111]:
# Listing without reviews
detail_url = build_url('734536884906701187')
detail_soup4 = extract_soup_js(detail_url)


<selenium.webdriver.remote.webelement.WebElement (session="668ce7c12d7457a29de55363b29676b8", element="2c3e6e17-7ea3-459c-8478-ce1f8744097c")>
{'is_airbnb_plus': False, 'is_airbnb_luxe': False}


In [112]:
# Airbnb plus
detail_url = build_url('33177355')
detail_soup5 = extract_soup_js(detail_url)


<selenium.webdriver.remote.webelement.WebElement (session="fa9f1dac4a64041edb9c4df010456320", element="25428825-4e28-4c04-b71b-bc7695b651c6")>
{'is_airbnb_plus': True, 'is_airbnb_luxe': False}


In [113]:
# Airbnb luxe with only 1 review
detail_url = build_url('20470768')
detail_soup6 = extract_soup_js(detail_url)


<selenium.webdriver.remote.webelement.WebElement (session="88e95701f4b53202e93e32e2c22f1bcd", element="39f000a8-6311-4ca9-aeae-5ca65261f8a3")>
{'is_airbnb_plus': False, 'is_airbnb_luxe': True}


In [114]:
# Airbnb luxe with only 4 reviews
detail_url = build_url('35875555')
detail_soup7 = extract_soup_js(detail_url)


<selenium.webdriver.remote.webelement.WebElement (session="9ec985c68251ea2009afc02d973e78d0", element="add0c28e-59cd-49aa-b938-b9240e4aedde")>
{'is_airbnb_plus': False, 'is_airbnb_luxe': True}


In [115]:
soups = [detail_soup,detail_soup2,detail_soup3,detail_soup4,detail_soup5,detail_soup6,detail_soup7]

In [166]:
# look for the price span in every soup; i is the 1-based position in soups
for i, soup in enumerate(soups, start=1):
    founds = soup.find_all('span', '_tyxjp1')
    length_founds = len(founds)
    if length_founds >= 1:
        texts = [found.get_text() for found in founds]
        print(f"{i}th found, length: {length_founds}, texts: {texts}")
    else:
        print(f"{i}th not found")


1th found, length: 2, texts: ['$113', '$113']
2th found, length: 1, texts: ['$26']
3th found, length: 1, texts: ['$190']
4th found, length: 1, texts: ['$53']
5th found, length: 2, texts: ['$34', '$34']
6th found, length: 1, texts: ['$3,431']
7th found, length: 1, texts: ['$1,823']
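
The price strings above still need to be turned into numbers before they can be used as features. A minimal sketch, assuming prices are always whole US dollars formatted like `'$3,431'` as in this output; `parse_price` and `first_price` are hypothetical helper names.

```python
def parse_price(text):
    """Turn a price string such as '$3,431' into an integer number of dollars."""
    return int(text.replace("$", "").replace(",", ""))


def first_price(texts):
    """Return the first parsed price, or None when the span was not found."""
    return parse_price(texts[0]) if texts else None


print(first_price(['$113', '$113']))
print(first_price(['$3,431']))
print(first_price([]))
```

Taking the first match sidesteps the listings where the same price appears twice on the page.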


In [None]:
# same check for the 'ol' element; founds must be recomputed for each soup
for i, soup in enumerate(soups, start=1):
    founds = soup.find_all('ol', 'lgx66tx')
    length_founds = len(founds)
    if length_founds >= 1:
        texts = [found.get_text() for found in founds]
        print(f"{i}th found, length: {length_founds}, text: {texts}")
    else:
        print(f"{i}th not found")

In [None]:
for sec in sections:
    print(sec['sectionComponentType'], sec['sectionId'])

In [116]:
founds = detail_soup.find_all("script",{"id":"data-deferred-state"})
y = json.loads(founds[0].get_text())
temp = y['niobeMinimalClientData'][0]
print(len(temp))
sections = temp[1]['data']['presentation']['stayProductDetailPage']['sections']['sections']
for sec in sections:
    print(sec['sectionId'])
sections

2
DESCRIPTION_MODAL
AVAILABILITY_CALENDAR_DEFAULT
BOOK_IT_CALENDAR_SHEET
AIRCOVER_LEARN_MORE_MODAL
REVIEWS_DEFAULT
LOCATION_DEFAULT
HOST_PROFILE_DEFAULT
POLICIES_DEFAULT
SEO_LINKS_DEFAULT
BOOK_IT_SIDEBAR
URGENCY_COMMITMENT_SIDEBAR
REPORT_TO_AIRBNB
NAV_DEFAULT
BOOK_IT_NAV
OVERVIEW_DEFAULT
AIRCOVER_PDP_BANNER
UGC_TRANSLATION
DESCRIPTION_DEFAULT
SLEEPING_ARRANGEMENT_DEFAULT
AMENITIES_DEFAULT
AVAILABILITY_CALENDAR_INLINE
TITLE_DEFAULT
HERO_DEFAULT
BOOK_IT_FLOATING_FOOTER
EDUCATION_FOOTER_BANNER
NAV_MOBILE
URGENCY_COMMITMENT
PHOTO_TOUR_SCROLLABLE_MODAL
WHAT_COUNTS_AS_A_PET_MODAL
EDUCATION_FOOTER_BANNER_MODAL
CANCELLATION_POLICY_PICKER_MODAL


[{'__typename': 'SectionContainer',
  'id': 'U2VjdGlvbkNvbnRhaW5lcjoyOTI2MDY5MDAyMDY1NTU2ODkz',
  'sectionComponentType': 'PDP_DESCRIPTION_MODAL',
  'sectionContentStatus': 'COMPLETE',
  'sectionId': 'DESCRIPTION_MODAL',
  'errors': None,
  'sectionDependencies': [],
  'enableDependencies': None,
  'disableDependencies': None,
  'loggingData': {'__typename': 'LoggingEventData',
   'loggingId': 'pdp_platform.description_modal',
   'experiments': None,
   'eventData': None,
   'eventDataSchemaName': None,
   'section': None,
   'component': None},
  'e2eLoggingSession': None,
  'mutationMetadata': None,
  'pluginPointId': 'DESCRIPTION_MODAL',
  'section': {'__typename': 'GeneralListContentSection',
   'buttons': None,
   'caption': None,
   'ctaButton': None,
   'flip': None,
   'headingLevel': None,
   'items': [{'__typename': 'BasicListItem',
     'action': None,
     'anchor': None,
     'accessibilityLabel': None,
     'button': None,
     'icon': None,
     'html': {'__typename': 'H
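
Since the same lookup path is repeated for every soup, it may help to wrap it in one function that indexes the sections by `sectionId`. This is a sketch on a tiny synthetic payload: `sections_by_id` is a hypothetical name, and the `"cache-key"` placeholder stands in for the first element of the pair, whose real content is not shown here.

```python
import json


def sections_by_id(state_json):
    """Parse the data-deferred-state JSON text and index its sections by sectionId."""
    state = json.loads(state_json)
    entry = state["niobeMinimalClientData"][0]  # a pair; the payload is the second item
    page = entry[1]["data"]["presentation"]["stayProductDetailPage"]
    return {sec["sectionId"]: sec for sec in page["sections"]["sections"]}


# Synthetic payload with the same nesting, for illustration only
sample_sections = [
    {"sectionId": "REVIEWS_DEFAULT", "section": {}},
    {"sectionId": "TITLE_DEFAULT", "section": {}},
]
sample_page = {"stayProductDetailPage": {"sections": {"sections": sample_sections}}}
sample = json.dumps(
    {"niobeMinimalClientData": [["cache-key", {"data": {"presentation": sample_page}}]]}
)

print(list(sections_by_id(sample)))
```

In the notebook this would be called with `founds[0].get_text()` as the argument.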

In [131]:
founds = detail_soup4.find_all("script",{"id":"data-deferred-state"})
y = json.loads(founds[0].get_text())
temp = y['niobeMinimalClientData'][0]
print(len(temp))
sections = temp[1]['data']['presentation']['stayProductDetailPage']['sections']['sections']
for sec in sections:
    print(sec['sectionId'])
sections

2
AIRCOVER_LEARN_MORE_MODAL
AVAILABILITY_CALENDAR_DEFAULT
BOOK_IT_CALENDAR_SHEET
CANCELLATION_POLICY_PICKER_MODAL
DESCRIPTION_MODAL
WHAT_COUNTS_AS_A_PET_MODAL
PHOTO_TOUR_SCROLLABLE_MODAL
REVIEWS_EMPTY_DEFAULT
LOCATION_DEFAULT
HOST_PROFILE_DEFAULT
POLICIES_DEFAULT
SEO_LINKS_DEFAULT
BOOK_IT_SIDEBAR
URGENCY_COMMITMENT_SIDEBAR
REPORT_TO_AIRBNB
NAV_DEFAULT
BOOK_IT_NAV
OVERVIEW_DEFAULT
HIGHLIGHTS_DEFAULT
AIRCOVER_PDP_BANNER
DESCRIPTION_DEFAULT
SLEEPING_ARRANGEMENT_DEFAULT
AMENITIES_DEFAULT
AVAILABILITY_CALENDAR_INLINE
TITLE_DEFAULT
HERO_DEFAULT
BOOK_IT_FLOATING_FOOTER
EDUCATION_FOOTER_BANNER
NAV_MOBILE
URGENCY_COMMITMENT
EDUCATION_FOOTER_BANNER_MODAL


[{'__typename': 'SectionContainer',
  'id': 'U2VjdGlvbkNvbnRhaW5lcjotNjY4NzMyODU2OTE3NjEyOTE1OQ==',
  'sectionComponentType': 'AIRCOVER_LEARN_MORE_MODAL',
  'sectionContentStatus': 'COMPLETE',
  'sectionId': 'AIRCOVER_LEARN_MORE_MODAL',
  'errors': None,
  'sectionDependencies': [],
  'enableDependencies': None,
  'disableDependencies': None,
  'loggingData': {'__typename': 'LoggingEventData',
   'loggingId': 'pdp.aircoverModal.modal',
   'experiments': None,
   'eventData': None,
   'eventDataSchemaName': None,
   'section': 'aircoverModal',
   'component': 'modal'},
  'e2eLoggingSession': None,
  'mutationMetadata': None,
  'pluginPointId': None,
  'section': {'__typename': 'AircoverLearnMoreModalSection',
   'titleMedia': {'__typename': 'Image',
    'id': 'SW1hZ2U6MC44NTcxODE2Mzc1NTczOTk2',
    'aspectRatio': 1,
    'orientation': None,
    'onPressAction': None,
    'accessibilityLabel': 'AirCover',
    'baseUrl': 'https://a0.muscache.com/im/pictures/54e427bb-9cb7-4a81-94cf-78f1915
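
Comparing the two ID lists by eye is error-prone; a set difference makes the contrast explicit. The sketch below uses only a hand-picked subset of the IDs printed above, and the `has_reviews` rule is an assumption drawn from these two listings alone.

```python
# Subsets of the section IDs printed for the listing with reviews and the one without
with_reviews = {"REVIEWS_DEFAULT", "UGC_TRANSLATION", "LOCATION_DEFAULT", "TITLE_DEFAULT"}
without_reviews = {"REVIEWS_EMPTY_DEFAULT", "HIGHLIGHTS_DEFAULT", "LOCATION_DEFAULT", "TITLE_DEFAULT"}

only_with = sorted(with_reviews - without_reviews)
only_without = sorted(without_reviews - with_reviews)
print("only in listing with reviews:", only_with)
print("only in listing without reviews:", only_without)


def has_reviews(section_ids):
    """Assumed rule: a listing has reviews when REVIEWS_DEFAULT is present."""
    return "REVIEWS_DEFAULT" in section_ids


print(has_reviews(with_reviews), has_reviews(without_reviews))
```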

In [21]:
a = {
    "section":{
        "overallCount":"hel"
    }
}

rule = {
    "avg_rating": {
            "section_id": "REVIEWS_DEFAULT",
            "key": lambda x: x["section"]["overallCount"],
        },
}
rule['avg_rating']['key'](a)

'hel'
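
One way the `rule` dict could be applied to a whole `sections` list is sketched below: look up each rule's section by ID, call its `key` function, and fall back to `None` when the section or field is missing. `apply_rules` is a hypothetical helper, and the `title` rule with its field name is invented purely to show the missing-section case.

```python
def apply_rules(rules, sections):
    """Apply each rule's key function to the section with the matching sectionId."""
    by_id = {sec["sectionId"]: sec for sec in sections}
    features = {}
    for name, rule in rules.items():
        sec = by_id.get(rule["section_id"])
        try:
            features[name] = rule["key"](sec) if sec is not None else None
        except (KeyError, TypeError):
            # the section exists but does not have the expected shape
            features[name] = None
    return features


rules = {
    "avg_rating": {
        "section_id": "REVIEWS_DEFAULT",
        "key": lambda x: x["section"]["overallCount"],
    },
    # hypothetical rule whose section is absent from the sample below
    "title": {
        "section_id": "TITLE_DEFAULT",
        "key": lambda x: x["section"]["title"],
    },
}
sample_sections = [{"sectionId": "REVIEWS_DEFAULT", "section": {"overallCount": "hel"}}]
print(apply_rules(rules, sample_sections))
```

Returning `None` for missing values keeps every listing's feature dict the same shape, which suits listings such as the one without reviews.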

In [117]:
# Airbnb plus
founds = detail_soup5.find_all("script",{"id":"data-deferred-state"})
y = json.loads(founds[0].get_text())
temp = y['niobeMinimalClientData'][0]
print(len(temp))
sections = temp[1]['data']['presentation']['stayProductDetailPage']['sections']['sections']
for sec in sections:
    print(sec['sectionId'])
sections

2
PHOTO_TOUR_CAROUSEL_MODAL
BOOK_IT_SIDEBAR
URGENCY_COMMITMENT_SIDEBAR
NAV_DEFAULT
BOOK_IT_NAV
OVERVIEW_DEFAULT
AIRCOVER_PDP_BANNER
DESCRIPTION_DEFAULT
MOSAIC_TOUR_PREVIEW_PLUS_TWO_COLUMNS
SLEEPING_ARRANGEMENT_WITH_IMAGES
ACCESSIBILITY_FEATURES_DEFAULT
AVAILABILITY_CALENDAR_INLINE
EDUCATION_PLUS
AMENITIES_PLUS
REVIEWS_DEFAULT
LOCATION_DEFAULT
HOST_PROFILE_DEFAULT
POLICIES_DEFAULT
SEO_LINKS_DEFAULT
TITLE_DEFAULT
HERO_PLUS
URGENCY_COMMITMENT
MOSAIC_TOUR_PREVIEW_PLUS_ONE_COLUMN
BOOK_IT_FLOATING_FOOTER
NAV_MOBILE
WHAT_COUNTS_AS_A_PET_MODAL
AVAILABILITY_CALENDAR_DEFAULT
BOOK_IT_CALENDAR_SHEET
AIRCOVER_LEARN_MORE_MODAL
ACCESSIBILITY_FEATURES_MODAL
PHOTO_TOUR_SCROLLABLE_MODAL
DESCRIPTION_MODAL
CANCELLATION_POLICY_PICKER_MODAL


[{'__typename': 'SectionContainer',
  'id': 'U2VjdGlvbkNvbnRhaW5lcjo0NzEzODgzNzc2MTEwNDM5NDE2',
  'sectionComponentType': 'PHOTO_TOUR_CAROUSEL',
  'sectionContentStatus': 'COMPLETE',
  'sectionId': 'PHOTO_TOUR_CAROUSEL_MODAL',
  'errors': None,
  'sectionDependencies': [],
  'enableDependencies': None,
  'disableDependencies': None,
  'loggingData': None,
  'e2eLoggingSession': None,
  'mutationMetadata': None,
  'pluginPointId': None,
  'section': {'__typename': 'PhotoTourModalSection',
   'title': 'Property overview',
   'mediaItems': [{'__typename': 'Image',
     'id': '821733230',
     'aspectRatio': 1.5,
     'orientation': 'LANDSCAPE',
     'onPressAction': None,
     'accessibilityLabel': 'Living room image 1',
     'baseUrl': 'https://a0.muscache.com/pictures/9eb3fa05-99d4-4755-af23-55306260270f.jpg',
     'displayAspectRatio': None,
     'imageMetadata': {'__typename': 'ImageMetadata',
      'caption': '',
      'imageType': 'DETAIL',
      'isProfessional': False,
      'isVe

In [118]:
# Airbnb luxe with only 4 reviews
founds = detail_soup7.find_all("script",{"id":"data-deferred-state"})
y = json.loads(founds[0].get_text())
temp = y['niobeMinimalClientData'][0]
print(len(temp))
sections = temp[1]['data']['presentation']['stayProductDetailPage']['sections']['sections']
for sec in sections:
    print(sec['sectionId'])
sections

2
DESCRIPTION_MODAL
CANCELLATION_POLICY_PICKER_MODAL
PHOTO_TOUR_CAROUSEL_MODAL
PHOTO_TOUR_SCROLLABLE_MODAL
BOOK_IT_SIDEBAR
INSERT_LUXE
INCLUDED_SERVICES_LUXE
ADD_ON_SERVICES_LUXE
AMENITIES_LUXE
EDUCATION_LUXE
REVIEWS_DEFAULT
LOCATION_DEFAULT
POLICIES_DEFAULT
CONTACT_TRIP_DESIGNER_LUXE
NAV_DEFAULT
BOOK_IT_NAV
TITLE_DEFAULT
HERO_LUXE
OVERVIEW_LUXE
HIGHLIGHTS_DEFAULT
UNSTRUCTURED_DESCRIPTION_LUXE
CHAT_BUBBLE_LUXE
MOSAIC_TOUR_PREVIEW_LUXE_TWO_COLUMNS
AVAILABILITY_CALENDAR_INLINE
BOOK_IT_FLOATING_FOOTER
MOSAIC_TOUR_PREVIEW_LUXE_ONE_COLUMN
NAV_MOBILE
WHAT_COUNTS_AS_A_PET_MODAL
AVAILABILITY_CALENDAR_DEFAULT
BOOK_IT_CALENDAR_SHEET


[{'__typename': 'SectionContainer',
  'id': 'U2VjdGlvbkNvbnRhaW5lcjotMzU0OTkzNzQwMzA1NTk5NzM3OA==',
  'sectionComponentType': 'PDP_DESCRIPTION_MODAL',
  'sectionContentStatus': 'COMPLETE',
  'sectionId': 'DESCRIPTION_MODAL',
  'errors': None,
  'sectionDependencies': [],
  'enableDependencies': None,
  'disableDependencies': None,
  'loggingData': {'__typename': 'LoggingEventData',
   'loggingId': 'pdp_platform.description_modal',
   'experiments': None,
   'eventData': None,
   'eventDataSchemaName': None,
   'section': None,
   'component': None},
  'e2eLoggingSession': None,
  'mutationMetadata': None,
  'pluginPointId': 'DESCRIPTION_MODAL',
  'section': {'__typename': 'GeneralListContentSection',
   'buttons': None,
   'caption': None,
   'ctaButton': None,
   'flip': None,
   'headingLevel': None,
   'items': [{'__typename': 'BasicListItem',
     'action': None,
     'anchor': None,
     'accessibilityLabel': None,
     'button': None,
     'icon': None,
     'html': {'__typename'
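
The three ID lists suggest the listing type can also be read from the sections themselves, since the standard, Plus and Luxe pages show `HERO_DEFAULT`, `HERO_PLUS` and `HERO_LUXE` respectively. A sketch of that idea follows; it is an assumption based only on the listings inspected here, and `listing_tier` is a hypothetical name.

```python
def listing_tier(section_ids):
    """Guess the listing tier from its hero section ID (assumed heuristic)."""
    ids = set(section_ids)
    if "HERO_LUXE" in ids:
        return "luxe"
    if "HERO_PLUS" in ids:
        return "plus"
    return "standard"


print(listing_tier(["TITLE_DEFAULT", "HERO_DEFAULT"]))
print(listing_tier(["TITLE_DEFAULT", "HERO_PLUS", "AMENITIES_PLUS"]))
print(listing_tier(["TITLE_DEFAULT", "HERO_LUXE", "AMENITIES_LUXE"]))
```

This could serve as a cross-check against the URL-based flags printed by `extract_soup_js`.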

In [None]:
soups[0].find_all('span', '_9xiloll')[0].get_text()

In [129]:
import pandas as pd
df = pd.DataFrame(dict(a=[None,1],b=['b',None]))
df.isnull()

       a      b
0   True  False
1  False   True


## startStaysCheckout

In [None]:
startStaysCheckout_ENDPOINT = "https://www.airbnb.com/api/v3/startStaysCheckout?operationName=startStaysCheckout&locale=en&currency=USD"
json_payload = {
  "operationName": "startStaysCheckout",
  "variables": {
    "input": {
      "businessTravel": {
        # Python False is serialized as JSON false; the string 'false' would be sent as text
        "workTrip": False
      },
      "checkinDate": "2022-11-25",
      "checkoutDate": "2022-11-27",
      "guestCounts": {
        "numberOfAdults": 1,
        "numberOfChildren": 0,
        "numberOfInfants": 0,
        "numberOfPets": 0
      },
      "guestCurrencyOverride": "USD",
      "lux": {},
      "metadata": {
        "internalFlags": [
          "LAUNCH_LOGIN_PHONE_AUTH"
        ]
      },
      "org": {},
      "productId": "U3RheUxpc3Rpbmc6NTQzNDc0MTA=",
      "china": {},
      "quickPayData": None
    }
  },
  "extensions": {
    "persistedQuery": {
      "version": 1,
      "sha256Hash": "30ee0796968836b4fca1aefdd8a33def3f212413c868b8a22d4cbeb2754bdc1a"
    }
  }
}
r = requests.post(url = startStaysCheckout_ENDPOINT, json = json_payload, headers=HEADERS)
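
The `productId` in the payload is base64: decoding `U3RheUxpc3Rpbmc6NTQzNDc0MTA=` gives `StayListing:54347410`, which is one of the listing IDs used earlier. A minimal sketch of building it for other listings, assuming they follow the same `StayListing:<id>` pattern; both helper names are hypothetical.

```python
import base64


def listing_id_to_product_id(listing_id):
    """Encode a listing ID as 'StayListing:<id>' in base64 (assumed pattern)."""
    raw = f"StayListing:{listing_id}".encode("ascii")
    return base64.b64encode(raw).decode("ascii")


def product_id_to_listing_id(product_id):
    """Decode a productId back to the bare listing ID."""
    decoded = base64.b64decode(product_id).decode("ascii")
    return decoded.split(":", 1)[1]


print(listing_id_to_product_id("54347410"))
print(product_id_to_listing_id("U3RheUxpc3Rpbmc6NTQzNDc0MTA="))
```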


In [None]:
data = r.json()
data

In [None]:
data.keys()

In [None]:
data['data']['startStayCheckoutFlow']['stayCheckout']['sections'].keys()
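
Chained lookups like the one above raise as soon as one level is missing, for example when the request is rejected. A small sketch of a tolerant accessor follows; `dig` is a hypothetical helper, and the sample dicts are invented to show both outcomes rather than taken from a real response.

```python
def dig(obj, *keys, default=None):
    """Follow keys/indexes through nested dicts and lists, returning default on a miss."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Invented examples: one with the expected nesting, one shaped like an error reply
ok = {"data": {"startStayCheckoutFlow": {"stayCheckout": {"sections": {"a": 1}}}}}
failed = {"errors": [{"message": "example error"}]}

print(dig(ok, "data", "startStayCheckoutFlow", "stayCheckout", "sections"))
print(dig(failed, "data", "startStayCheckoutFlow", "stayCheckout", "sections"))
```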