# The Relationship Between a Hotel's Geographical Proximity to the Center of the Country and Its Price

* Yarin Cohen, ID: 211361720
* Amit Shiber, ID: 322372582

## About Our Project

From time to time the issue of the periphery versus the center of the country comes up in the media. We decided to research more about the subject and check the hotel prices in the cities near Tel Aviv and in the distant cities. After crawling the data from the hotel website, we will use an additional function with an API to calculate distances between two locations. Is there a connection between the price of hotel charges and its proximity to the center of country?

### Information Sources and Data Acquisition Methods

* **Crawling Booking.com** - One of the largest online travel agencies. As of December 31, 2022, Booking.com offered lodging reservation services for approximately 2.7 million properties, including 400,000 hotels, motels, and resorts and 2.3 million homes, apartments in over 220 countries and in over 40 languages. It will help us getting data about the hotels in this project.

* **GeoDB Cities API** - Online cities database. It exposes city, region, and country data via both GraphQL and REST APIs. It will help us calculate the distance between two cities.

### Data Set Description

Each line in the data set represents a hotel.

Columns representation in the data set:
* Hotel name
* Hotel Address
* Hotel Description
* Price per night (on a fixed date, the cheapest deal)
* Score - general
* Score - stuff
* Score - facilities
* Score - convenience
* Score - value for money
* Score - location
* Score - clean
* Proximity to the center of the country (km)

### Machine Learning

* **Type of ML**: Regression

* We will start with easy regression models (one variable and low powers) and try to go through each pair of an explanatory variable and an explained variable.

* There is no rule that says how many variables make a regression heavy and sluggish. If the software starts to falter, we will stop and think whether adding the variables and holdings will contribute to the prediction or only to the complications of calculation, memory, etc. We are required to exercise discretion between predictability and complications and resources such as private time.

* If the learning results are not satisfactory, we will use
in classification and division into price levels.

## Imports

In [2]:
import requests
import bs4
from bs4 import BeautifulSoup
import time
import random
from tqdm import tqdm
import pandas as pd
import scipy as sc
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import sklearn
from sklearn import linear_model, metrics, preprocessing
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import r2_score, f1_score
%matplotlib inline

## Step 1: Defining a Research Question

Is it possible to predict the price of a night in a certain hotel, based on its proximity to the center and the score given to it by surfers in the various categories?

## Step 2: Data Acquisition

### Data Acquisition by Crawling

First of all, we will check Booking.com's Robots.txt terms, to understand if there are any pages we can't crawl: https://booking.com/robots.txt

* We will start by searching manually on Booking's main page for a vacation in Israel, on 01-02/08/2023.

* The results page will be crawled first.
* Due to complexity of HTML elements, we will use the mobile version of Booking.
* <a href="https://www.booking.com/searchresults.he.html?ss=%D7%99%D7%A9%D7%A8%D7%90%D7%9C&ss=%D7%99%D7%A9%D7%A8%D7%90%D7%9C&group_adults=2&group_children=0&no_rooms=1&sb_travel_purpose=leisure&ssne=%D7%99%D7%A9%D7%A8%D7%90%D7%9C&ssne_untouched=%D7%99%D7%A9%D7%A8%D7%90%D7%9C&sb_changed_dates=1&label=gen173nr-1BCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQGIAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIF4AIB&sid=ae3ca57b743d1747c5f828a2fabc4587&aid=304142&lang=he&sb=1&src_elem=sb&src=searchresults&dest_id=103&dest_type=country&checkin=2024-02-01&checkout=2024-02-02&prefer_site_type=mdot" >This is</a> the first page will be crawled.

#### Auxiliary Functions

In [3]:
# Load soup object:

def loadSoupObject(url):

    headers = { "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148" }
    
    time.sleep(random.randint(1,5))
    r = requests.get(url, headers=headers).content
    
    return BeautifulSoup(r,"html.parser")

In [4]:
# Getting URLs of all the hotels in the page:

def getHotelsURL(soupObj):

    links = []
   
    for link in soupObj.findAll("a", {"data-testid" : "title"}):
        links.append(link.get("href"))

    return links

In [5]:
# Getting URL of the next results page:

def getNextPage(soupObj):
    return soupObj.find("a", {"title" : "Next page"}).get("href")

In [6]:
# Getting information from a hotel page:

def getHotelData(soupObj):

    dataOfHotel = []

    # Hotel name:
    name = soupObj.find("span",{"class" : "hp-header--title--text"})
    dataOfHotel.append(name.text if name else np.nan)

    # Hotel address:
    add = soupObj.find("span",{"class" : "js_hp_address_text_line"})
    dataOfHotel.append(add.text if add else np.nan)

    # Hotel description:
    desc = soupObj.find("div",{"class" : "page-section--content"})
    dataOfHotel.append(desc.text if desc else np.nan)
    
    # Price per night (on a fixed date, the cheapest deal):
    price = soupObj.find("div",{"class" : "prco-js-headline-price"})
    dataOfHotel.append(price.text if price else np.nan)

    # Score - general:
    score = soupObj.find("div",{"data-testid" : "review-score-component"})
    dataOfHotel.append(score.text if score else np.nan)

    # Score - stuff:
    # dataOfHotel.append(soupObj.find("div",{"id" : ":rb:-label"}).text)

    # Score - facilities:
    # dataOfHotel.append(soupObj.find("div",{"id" : ":r9:-label"}).text)

    # Score - convenience:
    # dataOfHotel.append(soupObj.find("div",{"id" : ":ra:-label"}).text)

    # Score - value for money:
    value = soupObj.find("div",{"id" : ":R5m:-label"})
    dataOfHotel.append(value.text if value else np.nan)

    # Score - location:
    location = soupObj.find("div",{"id" : ":R4m:-label"})
    dataOfHotel.append(location.text if location else np.nan)

    # Score - clean:
    clean = soupObj.find("div",{"id" : ":R56:-label"})
    dataOfHotel.append(clean.text if clean else np.nan)

    return dataOfHotel


In [7]:
# The whole process of crawling all the hotels from results pages:

def getAllHotels(urlResults):

    currentPage = loadSoupObject(urlResults)

    resultsPages = []
    hotelsLinks = []

    # Collecting links of results pages:

    resultsPages.append(currentPage)

    for i in tqdm(range(33), desc="Collecting next-links..."):
        nextPage = loadSoupObject(getNextPage(currentPage))
        resultsPages.append(nextPage)
        currentPage = nextPage
        if not (currentPage.find("a", {"title" : "Next page"})):
            break

    # Collecting links of hotels:

    for page in tqdm(resultsPages, desc="Collecting links of hotels..."):
        hotelsLinks.extend(getHotelsURL(page))

    # Crawling data from hotel pages

    hotelsData = []

    for link in tqdm(hotelsLinks, desc="Crawling hotels data..."):
        hotelsData.append(getHotelData(loadSoupObject(link)))

    return hotelsData

#### Main Function

In [8]:
searches = [
    # All of Israel
    "https://www.booking.com/searchresults.en-us.html?ss=Israel&group_adults=2&group_children=0&no_rooms=1&sb_travel_purpose=leisure&ssne=Nazareth&ssne_untouched=Nazareth&sb_changed_dest=1&label=gen173nr-1FCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQH4AQ2IAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIG4AIB&sid=3835e8355663f34053900d62e4029676&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=103&dest_type=country&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=b14072e34a0d003f&ac_meta=GhBiMTQwNzJlMzRhMGQwMDNmIAAoATICZW46BmlzcmFlbEAASgBQAA%3D%3D&checkin=2024-02-01&checkout=2024-02-02&prefer_site_type=mdot",

    # Herzelia
    "https://www.booking.com/searchresults.html?ss=Herzelia%20%2C%20Center%20District%20Israel%2C%20Israel&ssne=Israel&ssne_untouched=Israel&label=gen173nr-1FCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQH4AQ2IAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIG4AIB&sid=3835e8355663f34053900d62e4029676&aid=304142&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=-780136&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=c143705976100181&ac_meta=GhBjMTQzNzA1OTc2MTAwMTgxIAAoATICZW46BmhlcnplbEAASgBQAA%3D%3D&checkin=2024-02-01&checkout=2024-02-02&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure&prefer_site_type=mdot",

    # Netanya
    "https://www.booking.com/searchresults.en-us.html?ss=Netanya%2C%20Center%20District%20Israel%2C%20Israel&group_adults=2&group_children=0&no_rooms=1&sb_travel_purpose=leisure&ssne=Herzliya&ssne_untouched=Herzliya&sb_changed_dest=1&label=gen173nr-1FCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQH4AQ2IAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIG4AIB&sid=3835e8355663f34053900d62e4029676&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=-780860&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=3&search_selected=true&search_pageview_id=84c970c1bc6a0337&ac_meta=GhA4NGM5NzBjMWJjNmEwMzM3IAAoATICZW46Bm5hdGFueUAASgBQAA%3D%3D&checkin=2024-02-01&checkout=2024-02-02&prefer_site_type=mdot",

    # Mizpe Ramon
    "https://www.booking.com/searchresults.en-us.html?ss=Mitzpe%20Ramon%2C%20South%20District%20Israel%2C%20Israel&group_adults=2&group_children=0&no_rooms=1&sb_travel_purpose=leisure&ssne=Israel&ssne_untouched=Israel&sb_changed_dest=1&label=gen173nr-1FCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQH4AQ2IAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIG4AIB&sid=3835e8355663f34053900d62e4029676&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=900040703&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=1&search_selected=true&search_pageview_id=27a3716c0fc000b9&ac_meta=GhAyN2EzNzE2YzBmYzAwMGI5IAAoATICZW46CG1penBlIHJhQABKAFAA&checkin=2024-02-01&checkout=2024-02-02&prefer_site_type=mdot",

    # Yeruham
    "https://www.booking.com/searchresults.en-us.html?ss=Yero%E1%BA%96am%2C%20South%20District%20Israel%2C%20Israel&ssne=Mitzpe%20Ramon&ssne_untouched=Mitzpe%20Ramon&label=gen173nr-1FCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQH4AQ2IAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIG4AIB&sid=3835e8355663f34053900d62e4029676&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=-781740&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=1&search_selected=true&search_pageview_id=0f8b727c12b502bb&ac_meta=GhAwZjhiNzI3YzEyYjUwMmJiIAAoATICZW46BXllcnVoQABKAFAA&checkin=2024-02-01&checkout=2024-02-02&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure&prefer_site_type=mdot",

    # Haifa
    "https://www.booking.com/searchresults.en-us.html?ss=Haifa%2C%20North%20District%20Israel%2C%20Israel&ssne=Yero%E1%BA%96am&ssne_untouched=Yero%E1%BA%96am&label=gen173nr-1FCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQH4AQ2IAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIG4AIB&sid=3835e8355663f34053900d62e4029676&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=-780112&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=db0572851eec0161&ac_meta=GhBkYjA1NzI4NTFlZWMwMTYxIAAoATICZW46BWhhaWZhQABKAFAA&checkin=2024-02-01&checkout=2024-02-02&group_adults=2&no_rooms=1&group_children=0&sb_travel_purpose=leisure&prefer_site_type=mdot",

    # Nazareth
    "https://www.booking.com/searchresults.en-us.html?ss=Nazareth%2C%20North%20District%20Israel%2C%20Israel&group_adults=2&group_children=0&no_rooms=1&sb_travel_purpose=leisure&ssne=Haifa&ssne_untouched=Haifa&sb_changed_dest=1&label=gen173nr-1FCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQH4AQ2IAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIG4AIB&sid=3835e8355663f34053900d62e4029676&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=-780833&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=817872d0f90601d0&ac_meta=GhA4MTc4NzJkMGY5MDYwMWQwIAAoATICZW46BG5henJAAEoAUAA%3D&checkin=2024-02-01&checkout=2024-02-02&prefer_site_type=mdot",

    # Israel's north district
    "https://www.booking.com/searchresults.en-us.html?ss=North%20District%20Israel%2C%20Israel&group_adults=2&group_children=0&no_rooms=1&sb_travel_purpose=leisure&ssne=Israel&ssne_untouched=Israel&sb_changed_dest=1&label=gen173nr-1FCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQH4AQ2IAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIG4AIB&sid=3835e8355663f34053900d62e4029676&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=3638&dest_type=region&ac_position=2&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=1a9f72fd1422006b&ac_meta=GhAxYTlmNzJmZDE0MjIwMDZiIAIoATICZW46BklzcmFlbEAASgBQAA%3D%3D&checkin=2024-02-01&checkout=2024-02-02&prefer_site_type=mdot"

]

fullData = []

In [9]:
for url in searches:
    fullData.append(getAllHotels(url))

 97%|█████████▋| 32/33 [02:42<00:05,  5.09s/it]
100%|██████████| 1020/1020 [1:42:43<00:00,  6.04s/it]
 48%|████▊     | 16/33 [01:14<01:19,  4.68s/it]
100%|██████████| 531/531 [51:55<00:00,  5.87s/it] 
  3%|▎         | 1/33 [00:09<05:11,  9.74s/it]
100%|██████████| 63/63 [05:59<00:00,  5.71s/it]
  0%|          | 0/33 [00:00<?, ?it/s]


AttributeError: 'NoneType' object has no attribute 'get'

In [16]:
# Creating a data set:

headers = ["hotelName", "hotelAddress", "hotelDescription", "pricePerNight","scoreGeneral","scoreValueForMoney","scoreLocation","scoreClean"]

fullData

df = pd.DataFrame(fullData)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1010,1011,1012,1013,1014,1015,1016,1017,1018,1019
0,"[Badolina Ein Gedi Glamping, \nEin Gedi Street...","[Daniel Herzliya Hotel, \n60 Ramat Yam Street,...","[וילה תהילה המחודש - The new Villa Tehila, \n7...","[Keshet Yehonatan Country Lodging, \nמושב קשת,...","[Daria- Authentic Israeli Experience, \nHa'on ...","[Six Senses Shaharut, \nHevel Eilot Negev Dese...","[Michel House, \n6098 No. 6, Nazareth\n, \nFe...","[Mantur Metula by Selina, \nHarishonim St., Me...","[Yarden Estate Boutique Hotel, \nYesud HaMa'al...","[Pastoral Hotel - Kfar Blum, \nKibbutz Kfar Bl...",...,"[Italian design apartment in Rotchild /habima,...","[Dizngof down town telaviv, \nDizngof 50. Apt ...","[Aida Guest house, \n15 אניס כרדוש קומה 1, Naz...","[מגדלי המלכים נופש על שפת הכינרת, \nגדוד ברק, ...","[אחוזת נווה גד, \nשעורה, Telamim\n, \nSituated...","[beautiful 2 bedroom apartment, \n34 Kehilat C...",[Jerusalem Hotel Private Luxury Suites near We...,"[Sil Place, \n29 חב""ד, Rishon LeẔiyyon\n, \nSi...","[Naim Mountain View, \nדירה 254, Yuval\n, \nLo...",[Sea View Designed Studio Apartment in Front o...
1,"[Daniel Herzliya Hotel, \n60 Ramat Yam Street,...","[Publica Isrotel, Autograph Collection, \nAba ...","[NYX Herzliya, \nAbba Even Blvd. 19, Herzelia ...","[מרינה הרצליה דירות נופש, \n3 HaTsedef Street,...","[The Ritz-Carlton, Herzliya, \n4 Hashunit Stre...","[Herods Herzliya, \nHerzliya Marina, Herzelia ...","[ApartHotel Okeanos on the Beach, \nHashunit 1...","[Dan Accadia Herzliya Hotel, \n122 Eli landau ...","[Loft center Herzliya, \n27 סירקין, Herzelia \...","[Oceanus apartment hotel, \n10 Ha-Shunit Stree...",...,,,,,,,,,,
2,"[Leonardo Plaza Netanya Hotel, \nOsishkin 1, N...","[Mizpe Yam Boutique Hotel, \n1 Jabotinski Stre...","[Kikar Boutique Hotel, \nKikar Haazmaut 9, Net...",[David Tower Hotel Netanya by Prima Hotels - 1...,[island place - 2 min from the sea super king ...,"[Margoa Hotel Netanya, \n9 Gad Machnes St., Ne...","[Colors Suites in Netanya, \n13 כיכר העצמאות, ...","[Ramada Hotel & Suites by Wyndham Netanya, \nB...","[Appartement kikar netanya, \n37 Dizengoff Str...","[apartment kikar near the sea, \n1 Kikar HaAts...",...,,,,,,,,,,


In [12]:
df.describe()

NameError: name 'df' is not defined

In [None]:
# # Collecting links of results pages:

# resultsPages.append(currentPage)

# for i in range(33):
#     nextPage = loadSoupObject(getNextPage(currentPage))
#     resultsPages.append(nextPage)
#     currentPage = nextPage

In [8]:
# # Collecting links of hotels:

# for page in resultsPages:
#     hotelsLinks.extend(getHotelsURL(page))

In [None]:
# urlResults = "https://www.booking.com/searchresults.html?ss=%D7%99%D7%A9%D7%A8%D7%90%D7%9C&ss=%D7%99%D7%A9%D7%A8%D7%90%D7%9C&group_adults=2&group_children=0&no_rooms=1&sb_travel_purpose=leisure&ssne=%D7%99%D7%A9%D7%A8%D7%90%D7%9C&ssne_untouched=%D7%99%D7%A9%D7%A8%D7%90%D7%9C&sb_changed_dates=1&label=gen173nr-1BCAEoggI46AdIM1gEaGqIAQGYAQ64AQfIAQzYAQHoAQGIAgGoAgO4AsPO_6IGwAIB0gIkN2EzYmVmMjgtNTkwYS00YjMyLWI5ZmUtMmZjMTQwOTdmM2I42AIF4AIB&sid=ae3ca57b743d1747c5f828a2fabc4587&aid=304142&lang=en-us&sb=1&src_elem=sb&src=searchresults&dest_id=103&dest_type=country&checkin=2024-02-01&checkout=2024-02-02&prefer_site_type=mdot&soz=1&lang_changed=1"
# currentPage = loadSoupObject(urlResults)

# resultsPages = []
# hotelsLinks = []

In [None]:
# # Collecting links of results pages:

# resultsPages.append(currentPage)

# for i in tqdm(range(33)):
#     nextPage = loadSoupObject(getNextPage(currentPage))
#     resultsPages.append(nextPage)
#     currentPage = nextPage

In [None]:
# # Collecting links of hotels:

# for page in resultsPages:
#     hotelsLinks.extend(getHotelsURL(page))

In [None]:
# # Crawling data from hotel pages

# hotelsData = []

# for link in tqdm(hotelsLinks):
#     hotelsData.append(getHotelData(loadSoupObject(link)))

### Data Acquisition by API

a little description and notes about this step...

#### Auxiliary Functions

In these two following function we calculate the distance between Tel-Aviv to another city.
The calculation using an API which called GEODB in order to get the city ID and to find the distance.

In [97]:
import requests

def getCityID(cityName):

    # Set up the API endpoint URL
    url = "https://wft-geo-db.p.rapidapi.com/v1/geo/cities"

    # Set your API key and headers
    api_key = "b357d38c99mshac61197df8fd7c2p1d5cd7jsn2bfe93de690c"
    headers = {
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": "wft-geo-db.p.rapidapi.com"
    }

    # Set the query parameters for the country code and name
    params = {
        "countryIds": "IL",
        "namePrefix": cityName
    }

    # Send GET request to the API
    response = requests.get(url, headers=headers, params=params)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the response JSON
        data = response.json()

        # Check if any cities were found
        if data["data"]:
            # Get the city ID from the first result
            city_id = data["data"][0]["id"]
            print("City ID:", city_id)
            return city_id
        else:
            print("No matching cities found.")
            return np.nan
    else:
        print("Error:", response.status_code)
        return np.nan


In [112]:
import requests

def getCityDistance(cityID): #by my own with returns
    url = "https://wft-geo-db.p.rapidapi.com/v1/geo/cities/54067/distance"

    querystring = {"fromCityId":"54067","distanceUnit":"km","toCityId":cityID}

    headers = {
        "X-RapidAPI-Key": "b357d38c99mshac61197df8fd7c2p1d5cd7jsn2bfe93de690c",
        "X-RapidAPI-Host": "wft-geo-db.p.rapidapi.com"
    }

    response = requests.get(url, headers=headers, params=querystring)

    print(response.json())
    data = response.json()

    if "data" in data:
        return data["data"]
    else:
        return None




#### Main Function

In [113]:

cityName = "Haifa"
city_id = getCityID(cityName)
time.sleep(1)
city_distance = getCityDistance(city_id)
if city_distance is not None:
    print("Distance:", city_distance)
else:
    print("Distance information not available.")


City ID: 53902
{'data': 84.7}
Distance: 84.7


## Step 3: Data Handling

At this point, we will need to handle our data and organize it. For example, in the hotel data we crawled, some hotels do not have ratings on Booking.com. In addition, we will have to deal with duplicate hotels and outliers that do not reflect most of our data. Let's go for it!

### Auxiliary Functions

In [128]:
#extract the adress to three new columns

import pandas as pd

# Create a sample dataframe
data = {'Full Address': ['18 geulim, Holon', '25 Rothschild, Tel Aviv', '10 HaYarkon, Jerusalem']}
df1 = pd.DataFrame(data)

#create 3 new columns
df1['House Number'] = ''
df1['Street'] = ''
df1['City'] = ''
#extract the data
df1['House Number'] = df1['Full Address'].str.split(' ').str[0]
df1['Street'] = df1['Full Address'].str.extract(r'\s(.*?)\,')
df1['City'] = df1['Full Address'].str.split(',', 1).str[1].str.strip()

print(df1)








              Full Address House Number      Street       City
0         18 geulim, Holon           18      geulim      Holon
1  25 Rothschild, Tel Aviv           25  Rothschild   Tel Aviv
2   10 HaYarkon, Jerusalem           10    HaYarkon  Jerusalem


In [138]:
# add a column with the distance from TLV
df1['distance-TLV'] = ''

city_name = df1['City'].tolist()
print(city_name)
TLVDis= []
for city in city_name:
    time.sleep(2)
    city_id = getCityID(city)
    time.sleep(1)
    city_distance = getCityDistance(city_id)
    TLVDis.append(city_distance)


for j in range(len(TLVDis)):
    df1.at[j,'distance-TLV'] = TLVDis[j]



    # df1['distance-TLV'] = city_distance

# cityName = "Haifa"
# city_id = getCityID(cityName)
# time.sleep(1)
# city_distance = getCityDistance(city_id)
# if city_distance is not None:
#     print("Distance:", city_distance)
# else:
#     print("Distance information not available.")


print(df1)

['Holon', 'Tel Aviv', 'Jerusalem']
City ID: 3575840
{'data': 7.2}
City ID: 54067
{'data': 0.0}
City ID: 3713657
{'data': 53.8}
              Full Address House Number      Street       City distance-TLV
0         18 geulim, Holon           18      geulim      Holon          7.2
1  25 Rothschild, Tel Aviv           25  Rothschild   Tel Aviv          0.0
2   10 HaYarkon, Jerusalem           10    HaYarkon  Jerusalem         53.8


In [12]:
# Handling "/n" signs and others

def cleanSigns(df, colName):

    newDf = df.copy()
    newDf[colName] = newDf[colName].str.replace("[\n\t\r]", "", regex=True)

    return newDf

In [None]:
# Handling missing data:



The two following functions Handling  duplication:

In [None]:
# Handling  duplication:

def count_duplicatives(df, col_name=None):
    if col_name is None:
        print("num of Duplicates",df.duplicated().sum())
        return df.duplicated().sum()
    print("num of Duplicates",df.duplicated().sum())
    return df.duplicated([col_name]).sum()

In [None]:
def remove_duplicatives(df, col_name=None):
    if col_name is None:
        return df[~df.duplicated()]
    return df[~df.duplicated([col_name])]

In [None]:
# Handling outliers:



### Main Function

In [13]:
# Main



### Data Duplication

#### Auxiliary Functions

In [14]:
# Code:



#### Main Function

In [15]:
# Main



### Outliers

#### Auxiliary Functions

In [16]:
# Code:



#### Main Function

In [17]:
# Main



## Step 4: Machine Learning

a little description and notes about this step...

#### Auxiliary Functions

In [18]:
# Code:



#### Main Function

In [19]:
# Main

