# STEP 1 : Data

We have donwloaded the data. It is saved in Airbnb_Texas_Rentals.csv. You can find the in the root of repository. 

# Step 2 : Create documents

For creating the documents I have used function named as *create_tsv_files* in the writing_of_data.py file. We will write all the steps one by one and will run the script here. 

In [3]:
def create_tsv_files(df, folder_name):  # this function will be used to crete tsv files
    # Create a tsv file for each row
    pathlib.Path(folder_name).mkdir(parents=True, exist_ok=True)  # create a doc folder first
    for i in range(len(df)):
        pd.DataFrame(df.iloc[i]).transpose().to_csv(folder_name + '/doc_%s.tsv' % i, sep='\t')

# Step 3: Search Engine

We have used nltk library for removing stopwords, punctuation, Stemming, //n values and integers. We have also removed the null values in title and description. All of this code you will find inside the cleaning_of_data.py. For convenience We are going to add functions of cleaning of data here. 

1. Function open_csv_file_and_remove_extra_values(filename) will be used to clean the title and description and will return the dataframe. 
2. Function remove_extras_from_query(query) will get the sentence or query. It can be paragraph or simple line or query it does not matter at the end it will return list of words back to the user.

In [4]:
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

def open_csv_file_and_remove_extra_values(filename):  # open csv file and clean the data

    # Import the data
    df = pd.read_csv(filename)
    df.isnull().sum()
    # Drop the rows with null numbers
    df = df[pd.notnull(df["description"])]
    df = df[pd.notnull(df["title"])]
    return df


def remove_extras_from_query(query):  # use all the techniques to remove unwanted items from words

    tokenizer = RegexpTokenizer(r'\w+')

    ps = PorterStemmer()
    # Remove "\\n" and replace with a space
    words = query.replace("\\n", " ")
    # Convert everything to lowercase
    words = words.lower()
    # Removing punctuation
    words = tokenizer.tokenize(words)
    # Stemming
    words = [ps.stem(word) for word in words]
    # Removing stopwords
    words = [word for word in words if word not in stopwords.words('english')]
    # No integers
    words = [x for x in words if not (x.isdigit() or x[0].isdigit())]

    return words

## 3.1) Conjunctive query

### 3.1.1) Create your index!

Lets see the script of search_engine.py in action. We will be using it for the whole assignment and we will be updating it step by step. 

1. Lets take a look at the script which will be using cleaning of the data by using nltk library.
2. After that it will create a vocabulary file.
3. After that it will create the inverted index file. 
4. Names of the files will be at the top of the script so that it can change later easily. 

In [1]:
import cleaning_of_data  
import writing_of_data
import reading_of_data
import search_engine_processing

In [2]:
CSV_FILE_NAME = "Airbnb_Texas_Rentals.csv"  # constant csv file name
FOLDER_NAME_FOR_TSV_File = "doc_files" # this will be used to create doc_files folder for tsv files. 
VOC_FILE_NAME = "vocabulary.json" # this will be vocabulary file for our search engine
INVERTED_INDEX_FILE_NAME = "inverted_index.json" # this will be inverted index file name

# cleaning of data
df = cleaning_of_data.open_csv_file_and_remove_extra_values(CSV_FILE_NAME)

In [6]:
writing_of_data.create_tsv_files(df, FOLDER_NAME_FOR_TSV_File) # creating tsv files inside doc_files folder

Above script created all the tsv files after reading it from the CSV file. 

1. Before creating inverted index file we have to create dictionary file which will contains all the words that are inside the csv file. For that I will be using the writing_of_data.py file which will will have the function named as create_vocabulary_file() function to create the vocabulary file. First argument will get the length of the dataframe so that it will get all the documents. Also. it will get folder name as second argument for TSV file for reading each document one by one. Also it will get vocabulary name as third argument. 
2. After that we will create inverted index file which will get the inverted index file name as third argumument other are the same for vocabulary function. 

Below are the definition of the functions. I have added comments so that they explain theirselves. 

In [7]:
def create_vocabulary_file(df_length, folder_name, voc_file_name):
    # Create an empty set for the vocabulary
    voc_set = set()

    # For every file...
    for i in range(df_length):
        doc = pd.read_csv(folder_name + '/doc_%s.tsv' % i, sep='\t')
        print("creating vocabulary ed index:" + str(i))
        # Concatenate the description and title in a string
        words = doc["description"][0] + doc["title"][0]

        words = cleaning_of_data.remove_extras_from_query(words)

        # Storage the words in vocabulary set
        voc_set.update(words)

    # As a result of this problem, we have a vocabulary set with unique words
    # an a dictionary, with key: number of the document values: a list of all the words (filtered) in the Airbnb post

    # Create a vocabulary dictionary from the set dictionary
    voc_dict = {}
    voc_list = list(voc_set)
    for i in range(len(voc_list)):
        voc_dict[i] = voc_list[i]

    # saving it to Json file
    with open(voc_file_name, 'w') as fp:
        json.dump(voc_dict, fp, sort_keys=True, indent=4)

    return voc_dict

def create_inverted_index_file(df_length, folder_name, dic_file_name):
    # And an empty dictionary for storage the words for each document
    dictionary = {}

    # For every file...
    for i in range(df_length):
        doc = pd.read_csv(folder_name + '/doc_%s.tsv' % i, sep='\t')
        print("creating inverted index:" + str(i))

        # Concatenate the description and title in a string
        words = doc["description"][0] + doc["title"][0]

        words = cleaning_of_data.remove_extras_from_query(words)

        # Storage the words in vocabulary set
        dictionary.update({i: words})

    # As a result of this problem, we have a vocabulary set with unique words
    # an a dictionary, with key: number of the document values: a list of all the words (filtered) in the Airbnb post

    # Create the index
    inverted_index = defaultdict(str)

    for key, value in dictionary.items():

        list_of_words = list(value)
        voc_dic = reading_of_data.get_vocabulary_dic()  # get a dictionary from dictionary file
        for value in list_of_words:
            # get term id from voc_dic
            term_id = list(voc_dic.keys())[list(voc_dic.values()).index(value)]
            if str(term_id) in inverted_index:

                inverted_index[str(term_id)].append(key)
            else:
                inverted_index[str(term_id)] = [key]

    # saving it to Json file
    with open(dic_file_name, 'w') as fp:
        json.dump(inverted_index, fp, sort_keys=True, indent=4)

    return inverted_index

To run above functions we will calling above functions one by one for creating vocabulary file and inverted index file. 

In [None]:
# These methods are used to create vocabulary and dictionary files.
writing_of_data.create_vocabulary_file(len(df), FOLDER_NAME_FOR_TSV_File, VOC_FILE_NAME)
writing_of_data.create_inverted_index_file(len(df), FOLDER_NAME_FOR_TSV_File, INVERTED_INDEX_FILE_NAME)

After above functions run we will have vocabulary and inverted index file in the root[Main Directory] our repository named as **vocabulary.json** and **inverted_index.json**

### 3.1.2) Execute the query

Now that we have **vocabulary.json** file and **inverted_index.json**. We can do do the first part of our seach engine. Which is getting the document using And query containing all the words.  

I have also created another function in **reading_of_data.py** file named as **get_inverted_index_file** which will get the content of the **inverted_index_file.json** file and assign it to **inverted_index_dic** object. Let me write its definition here

In [13]:
def get_inverted_index_file(file_name='inverted_index.json"'):  # reading inverted_index dictionary file

    # Create an empty set for the vocabulary
    with open(file_name, 'r') as fp:
        data = json.load(fp)
    return data

After that I will get the input from the user and clean the query as I did before. Let me write those lines here. 

In [3]:
# # Now we have our inverted_index_file
inverted_index_dic = reading_of_data.get_inverted_index_file(INVERTED_INDEX_FILE_NAME)
query = input()
words = cleaning_of_data.remove_extras_from_query(query)

Book Your Group at One Location and Save!!\nVideo Tour Available Upon Request


Now inverted_index_dic will have all the inverted_index_dictionary items in **inverted_index_dic** object and **words**  will have the dictionary list of each word in the query. Just run it. 

In [4]:
words

['book',
 'group',
 'one',
 'locat',
 'save',
 'video',
 'tour',
 'avail',
 'upon',
 'request']

We will also use two more functions which are **run_simple_conjunctive_query** inside the **search_engine_processing.py** file and **output_results** which is inside the **writing_of_data.py** file. Let me post both definition here. 

In [5]:

def run_simple_conjunctive_query(words, inverted_index_items):
    query = list(words)

    # Each of the querys matches should contain all the words on the list query
    voc_dic = reading_of_data.get_vocabulary_dic()  # get a dictionary from dictionary file
    querys_matches = []
    for word in query:
        term_id = list(voc_dic.keys())[list(voc_dic.values()).index(word)]
        if term_id in inverted_index_items.keys():
            querys_matches.append(set(inverted_index_items[term_id]))
    # As a result we are going to have a list, with all the matches, called inter
    k = 0
    if len(querys_matches) > 0:
        inter = querys_matches[k]
        k += 1
        for i in range(1, len(querys_matches)):
            inter = inter.intersection(querys_matches[k])
            k += 1
    return inter

def output_results(folder_name, inter):
    if inter == set():
        print("No results were found with those characteristics")
        return
    else:
        index = 0
        inter = list(inter)
        cols_of_interest = ["Title", "Description", "City", "Url"]
        if len(inter) > 0:
            doc_id = inter[0]
            df = pd.read_csv(folder_name + "/doc_%s.tsv" % doc_id, sep="\t")
            df = df.rename(index=str,
                           columns={'title': 'Title',
                                    "description": "Description", "city": "City", "url": 'Url'})
            df = df.filter(cols_of_interest, axis=1)
        index += 1
        for i in range(1, len(inter)):
            doc_id = int(inter[i])
            cols_of_interest = ["Title", "Description", "City", "Url"]
            file = pd.read_csv(folder_name + "/doc_%s.tsv" % doc_id, sep="\t")
            file = file.rename(index=str, columns={'title': 'Title',
                                                   "description": "Description", "city": "City", "url": 'Url'})
            df = df.append(file.filter(cols_of_interest, axis=1), ignore_index=True, sort=False)

    df.reset_index(drop=True, inplace=True)
    print(df.to_string())
    return df


In [8]:
result_items = search_engine_processing.run_simple_conjunctive_query(words, inverted_index_dic)
df = writing_of_data.output_results(FOLDER_NAME_FOR_TSV_File, result_items)

**result_items** will contain the name of the document and **df** will have the all the dataframe rows after simple query

In [9]:
result_items

{4403, 16387, 17534}

In [10]:
df

Unnamed: 0,Title,Description,City,Url
0,Quiet Getaway Near I-35/UNT/TWU!!,Book Your Group at One Location and Save!!\nVi...,Denton,https://www.airbnb.com/rooms/15609235?location...
1,Quiet Getaway Near I-35/UNT/TWU!!,Book Your Group at One Location and Save!!\nVi...,Denton,https://www.airbnb.com/rooms/15609235?location...
2,Quiet Getaway Near I-35/UNT/TWU!!,Book Your Group at One Location and Save!!\nVi...,Denton,https://www.airbnb.com/rooms/15609235?location...


**As from above you can clearly see the And result. Notice it looks like they are same dataframe but they are not to clearify this I have to display urls.** 

In [14]:
list(df["Url"])

['https://www.airbnb.com/rooms/15609235?location=Aubrey%2C%20TX',
 'https://www.airbnb.com/rooms/15609235?location=Corinth%2C%20TX',
 'https://www.airbnb.com/rooms/15609235?location=Argyle%2C%20TX']

Now you you can see that all the urls are different. Lets move to 3.2 section of working on 
**3.2) Conjunctive query & Ranking score** 

### 3.2) Conjunctive query & Ranking score
1. Find all the documents that contains all the words in the query (as before...).
2. Sort them by their similarity with the query
3. Return in output k documents, or all the documents with non-zero similarity with the query when the results are less than k. You must use a heap data structure (you can use Python libraries) for maintaining the top-k documents.

To create a search engine which will fulfill above requirements we have to create new inverted index file as rquired to show **Term Frequency – Inverse Document Frequency** for each word in the document. Also we have to calculate Cosine Similarity for each document. We have used this url to understand how TFIDF works and what is cosine similarity 
https://janav.wordpress.com/2013/10/27/tf-idf-and-cosine-similarity/

After that I have created **inverted_index_tfidf.json** using below function. But in the script it resides in **writing_of_data.py** file. 

In [16]:
def create_tfidf_inverted_index_file(df_length, folder_name, dic_file_name):
    # And an empty dictionary for storage the words for each document

    dictionary = {}
    dictionary_items = []
    # For every file...
    for i in range(df_length):
        doc = pd.read_csv(folder_name + '/doc_%s.tsv' % i, sep='\t')
        print("creating inverted index tfidf:" + str(i))

        # Concatenate the description and title in a string
        words = doc["description"][0] + doc["title"][0]

        words = cleaning_of_data.remove_extras_from_query(words)

        # Storage the words in vocabulary set
        dictionary.update({i: words})
        dictionary_items.append(" ".join(list(words)))

    # As a result of this problem, we have a vocabulary set with unique words
    # an a dictionary, with key: number of the document values: a list of all the words (filtered) in the Airbnb post

    # Create the index
    inverted_index = defaultdict(str)
    voc_long_string = reading_of_data.get_vocabulary_dic()
    voc_dic = reading_of_data.get_vocabulary_dic()
    file_items = " ".join(dictionary_items)
    tfidf = TfidfVectorizer(input=file_items, sublinear_tf=True)
    response = tfidf.fit_transform(dictionary_items)

    feature_names = tfidf.get_feature_names()

    for key, value in dictionary.items():
        print(" ---------- Document %s ------- " % key)
        feature_index = response[key, :].nonzero()[1]
        tfidf_scores = zip(feature_index, [response[key, x] for x in feature_index])
        for w, s in [(feature_names[i], s) for (i, s) in tfidf_scores]:
            term_id = list(voc_dic.keys())[list(voc_dic.values()).index(w)]
            if str(term_id) in inverted_index:
                inverted_index[str(term_id)].append({key: s})
            else:
                inverted_index[str(term_id)] = [{key: s}]
    # saving it to Json file
    with open(dic_file_name, 'w') as fp:
        json.dump(inverted_index, fp, sort_keys=True, indent=4)

    return inverted_index

For this We have used **sklearn** library and used particular namespace **TfidfVectorizer**. To use this correctly I have used https://github.com/mayank408/TFIDF/blob/master/Sklearn%20TFIDF.ipynb file as helper file. After that I know how it all works and I just continued on creating inverted_index_tfidf.json file. Lets run the function to see it in action. 

In [5]:
INVERTED_INDEX_TFIDF_FILE_NAME = "inverted_index_tfidf.json"

In [None]:
df = cleaning_of_data.open_csv_file_and_remove_extra_values(CSV_FILE_NAME)
writing_of_data.create_tfidf_inverted_index_file(len(df), FOLDER_NAME_FOR_TSV_File, INVERTED_INDEX_TFIDF_FILE_NAME)

Now above lines will create inverted index file with **tfidf** values. So to read the file we have to create function which will read the tfidf into dictionary object. For this we have created this function named as **get_tfidf_inverted_index_file** resides in **reading_of_data.py** file. Let me write the function here. 

In [19]:
def get_tfidf_inverted_index_file(file_name='inverted_index_tfidf.json"'):  # reading tfidf inverted_index dictionary file

    # Create an empty set for the vocabulary
    with open(file_name, 'r') as fp:
        data = json.load(fp)
    return data

After this we have to just call this function 

In [6]:
inverted_index_dic = reading_of_data.get_tfidf_inverted_index_file(INVERTED_INDEX_TFIDF_FILE_NAME)

In [24]:
list(inverted_index_dic.items())[:2] # only two elements of inverted_index_dic 

[('0', [{'1295': 0.3509427613911738}, {'15637': 0.5332534879691798}]),
 ('1',
  [{'33': 0.1668036991306726},
   {'42': 0.16853470157221048},
   {'68': 0.14250562593787316},
   {'77': 0.11374958724854627},
   {'98': 0.15002300650162795},
   {'138': 0.12824742576530407},
   {'232': 0.21031274165339703},
   {'234': 0.14246618989808812},
   {'235': 0.15019778013159235},
   {'287': 0.1207427602764078},
   {'309': 0.15228377972706172},
   {'317': 0.11207830683128686},
   {'327': 0.11082854898215158},
   {'360': 0.1286249832680968},
   {'363': 0.1702486806765383},
   {'374': 0.11273402296785252},
   {'426': 0.08645262523959589},
   {'445': 0.1367980077213027},
   {'459': 0.13691046155949088},
   {'461': 0.1748327559541605},
   {'500': 0.08695823736935757},
   {'534': 0.13730867748127507},
   {'545': 0.11793029673329705},
   {'563': 0.1482906348209445},
   {'589': 0.12148656843918638},
   {'609': 0.11896540969742374},
   {'648': 0.1423695184419867},
   {'664': 0.11995307795414739},
   {'688': 

From above you can clearly see that this is what we wanted from inverted_index_tfidf.json file. Now lets look at other question how we can use this dictionary items combined with query for this we have written a function in **search_engine_processing.py** file named as **run_cosine_similarity_tfidf_conjunctive_query** which will get above dictionary and user entered query. Let me write the definition of that function and its related function which are 


In [25]:
def run_cosine_similarity_tfidf_conjunctive_query(words, tdidf_inverted_index_items):
    query = list(words)

    # Each of the querys matches should contain all the words on the list query
    voc_dic = reading_of_data.get_vocabulary_dic()  # get a dictionary from dictionary file
    querys_matches = []
    for word in query:
        term_id = list(voc_dic.keys())[list(voc_dic.values()).index(word)]
        if term_id in tdidf_inverted_index_items.keys():
            querys_matches.append(list(tdidf_inverted_index_items[term_id]))
    # As a result we are going to have a list, with all the matches, called inter

    k = 0
    # documents_dictionary_with_tfidf = defaultdict(list)

    if len(querys_matches) > 0:
        inter = return_set_of_docs_from_tfidf_inverted_index_doc_item(querys_matches[k])
        k += 1
        for i in range(1, len(querys_matches)):
            inter = inter.intersection(return_set_of_docs_from_tfidf_inverted_index_doc_item(querys_matches[k]))
            k += 1

        documents_dictionary_with_tfidf = []

        for i in range(0, len(querys_matches)):
            common_docs = inter.intersection(return_set_of_docs_from_tfidf_inverted_index_doc_item(querys_matches[i]))
            documents_scores = get_tdidf_values_from_doc_ids(querys_matches[i], common_docs)
            documents_dictionary_with_tfidf.append(documents_scores)

        # I have 2D arry for each document in which columns contains tfidf values
        documents_dictionary_with_tfidf_matrix = np.array(documents_dictionary_with_tfidf)

        list_of_scores_for_query = []
        # I have to calculate the tdidf values for current sentence
        list_of_words = words
        voc_dic = reading_of_data.get_vocabulary_dic()  # get a dictionary from dictionary file
        voc_items = " ".join(voc_dic.values())
        sentence = " ".join(list_of_words)  # joining the sentence to use it later
        tfidf = TfidfVectorizer()
        tfidf_scores = tfidf.fit_transform([sentence])
        feature_names = tfidf.get_feature_names()
        for col in tfidf_scores.nonzero()[1]:
            word_item = feature_names[col]
            tfidf_score = tfidf_scores[0, col]
            list_of_scores_for_query.append(tfidf_score)

        # now I am going to calculate cosine similarity
        cosine_similarity_dictionary = defaultdict(str)
        for i in range(len(documents_dictionary_with_tfidf_matrix[0])):
            doc_n = documents_dictionary_with_tfidf_matrix[:, i]
            cosine_similarity_value = cosine(np.array(list_of_scores_for_query), doc_n)
            cosine_similarity_dictionary[list(common_docs)[i]] = cosine_similarity_value

        # sorting dictionary by value

        ordered_values = OrderedDict(sorted(cosine_similarity_dictionary.items(), key=lambda x: x[1]))

        priority_queue = []
        # priority queue
        for key, value in ordered_values.items():
            heapq.heappush(priority_queue,  (value * -1, key))

    return priority_queue
    # now we got the docs_ids which matched


def get_tdidf_values_from_doc_ids(items, common_docs):
    doc_ids = list()
    if len(items) > 0:
        for item in items:
            for key, value in item.items():  # iterating through dictionary it will only have one element
                for doc_id in common_docs:
                    if doc_id == key:
                        doc_ids.append(value)
                        break
    return doc_ids

For calculating consine similarity between two functions we have written a function named as cosine. It will get two vectors to caluclate the cosine similiraty between two docs. 


In [26]:
def cosine(v1, v2):
    v1 = np.array(v1)
    v2 = np.array(v2)
    prodouct_of_both = np.dot(v1, v2)
    square_root_of_v1 = np.sqrt(np.sum(v1 ** 2))
    square_root_of_v2 = np.sqrt(np.sum(v2 ** 2))
    prodcuct_of_square_root_of_both = (square_root_of_v1 * square_root_of_v2)
    cosine_value = (prodouct_of_both / prodcuct_of_square_root_of_both)
    return cosine_value

Please also notice in run_cosine_similarity_tfidf_conjunctive_query function that we have used different librarires for differnt functionalties. 
1. numpy for multiplying two vectors. 
2. collections for defaultdict
3. heapq for heap queue. [Specifically Max heap.]

There is no Max heap in python for this I have to multiply the actualy value to -1 before adding it to the queue and after getting its value from Queue. So the **run_cosine_similarity_tfidf_conjunctive_query** will return the proirity queue. 

Lets get the input from user and pass it to **run_cosine_similarity_tfidf_conjunctive_query** function. 

In [151]:
query = input()
words = cleaning_of_data.remove_extras_from_query(query)
result_items = search_engine_processing.run_cosine_similarity_tfidf_conjunctive_query(words, inverted_index_dic)

Easy access to highway


We have result_items object which is priority queue. Lets pass it over to our output function named as **output_results_cosine_similarity** which resides it in **writing_of_data.py** file. Let me write definition of it here. 

In [143]:
def output_results_cosine_similarity(folder_name, priority_queue):
    df = pd.DataFrame()
    cols_of_interest = ["Title", "Description", "City", "Url", "Similarity"]
    if priority_queue:
        value, key = heapq.heappop(priority_queue)
        doc_id = int(key)
        df = pd.read_csv(folder_name + "/doc_%s.tsv" % doc_id, sep="\t")
        df = df.rename(index=str,
                       columns={'title': 'Title',
                                "description": "Description", "city": "City", "url": 'Url'})
        df = df.filter(cols_of_interest, axis=1)
        df["Similarity"] = str((-1) * value)

        while priority_queue:
            value, key = heapq.heappop(priority_queue)
            doc_id = int(key)
            file = pd.read_csv(folder_name + "/doc_%s.tsv" % doc_id, sep="\t")
            file = file.rename(index=str, columns={'title': 'Title',
                                                   "description": "Description", "city": "City", "url": 'Url'})
            file["Similarity"] = str((-1) * value)
            df = df.append(file.filter(cols_of_interest, axis=1), ignore_index=True, sort=False)

        if df is not None:
            df.reset_index(drop=True, inplace=True)
            print(df.to_string())
            return df
        else:
            return
    else:
        print("No results were found with those characteristics")
        return

Lets see it in action. :) 

In [144]:
df = writing_of_data.output_results_cosine_similarity(FOLDER_NAME_FOR_TSV_File, result_items)
df

Unnamed: 0,Title,Description,City,Url,Similarity
0,Charm-Private Enrance-Good Location,With its own bath and your own private entranc...,San Antonio,https://www.airbnb.com/rooms/5183096?location=...,0.9933205307927605
1,"Book for Superbowl! 4/5 bedroom, 3.5 bath",30 minutes from NRG Stadium. My place is close...,Houston,https://www.airbnb.com/rooms/16056676?location...,0.9933205307927605
2,Beautiful home in Hurst. Easy access to Dallas/FW,Beautiful single story home completely updated...,Hurst,https://www.airbnb.com/rooms/14917951?location...,0.9933205307927604
3,Superbowl Staycation Getaway,"My place is close to restaurants and dining, q...",Sugar Land,https://www.airbnb.com/rooms/16854595?location...,0.9933205307927604
4,Clean & Comfortable Room w/King Bed,"Great location, easy access to the beach, kite...",Corpus Christi,https://www.airbnb.com/rooms/14057644?location...,0.9933205307927604
5,Private Room Near the Airports,One of three private rooms in a two story home...,Irving,https://www.airbnb.com/rooms/13602580?location...,0.9933205307927604
6,Amazing House - Near Katy Mills and Water Park,"Our house is beautiful, conveniently located, ...",Katy,https://www.airbnb.com/rooms/15911826?location...,0.9933205307927604
7,Happy Home,This happy home is located very close to resta...,Fort Worth,https://www.airbnb.com/rooms/14782837?location...,0.9933205307927604
8,Bedroom near shopping peaceful area,Our one bedroom is located in a nice two story...,Converse,https://www.airbnb.com/rooms/12201369?location...,0.9933205307927604
9,Private Room Near Airports,One of three private rooms in a two story home...,Irving,https://www.airbnb.com/rooms/13604316?location...,0.9933205307927604


You can clearly see the difference while looking at Similarity function. Now above section is completed for combining **Conjunctive query & Ranking score**. Lets move onto next part. All the above functionality code is inside the **search_engine.py** file which completes our script. 

### Step 4: Define a new score!

In [145]:
import pandas as pd

To define the new score, we use the following variables:
1. average_rate_per_night
2. bedrooms_count	
3. city
4. date_of_listing

Following we explain the functions that were used to compute the score, in each of the variables

#### 1. Average rate per night: 

This function compares the ideal rate for the user, which is explicitly asked, and the actual rate per nigth of each Airbnb, called by the variable doc_rate. Resulting in a rate between 0 and 1, called rate_score.

- The maximum rate_score you can have is 1, and it is reached when user_rate = doc_rate.
- The minimum rate_score you can have is 0, and it is reached when doc_rate it's more than 30 dollars away from user_rate
- The intermediate values are given by areas around our variable of interest: user_rate, giving more points to values closer and less to the farthest.

In [152]:
def rate_score(user_rate, doc_rate):
    if user_rate == doc_rate:
        rate_score = 1
    elif user_rate - 10 <= doc_rate <= user_rate + 10:
        rate_score = 0.75
    elif user_rate - 20 <= doc_rate <= user_rate + 20:
        rate_score = 0.5
    elif user_rate - 30 <= doc_rate <= user_rate + 30:
        rate_score = 0.25
    else:
        rate_score = 0
    return rate_score

In [153]:
rate_score(25, 45)

0.5

In [154]:
def get_data_frame_from_priority_queue(folder_name, priority_queue):
    df = pd.DataFrame()
    cols_of_interest = ["Title", "Description", "City", "Url", "Similarity"]
    if priority_queue:
        value, key = heapq.heappop(priority_queue)
        doc_id = int(key)
        df = pd.read_csv(folder_name + "/doc_%s.tsv" % doc_id, sep="\t")
        df["Similarity"] = str((-1) * value)

        while priority_queue:
            value, key = heapq.heappop(priority_queue)
            doc_id = int(key)
            file = pd.read_csv(folder_name + "/doc_%s.tsv" % doc_id, sep="\t")
            file["Similarity"] = str((-1) * value)
            df = df.append(file, ignore_index=True, sort=False)

        if df is not None:
            df.reset_index(drop=True, inplace=True)
            # print(df.to_string())
            return df
        else:
            return
    else:
        print("No results were found with those characteristics")
        return

Now we need whole dataframe rows rather than intrested rows which we get from above function. 

In [155]:
import heapq
dataframe = get_data_frame_from_priority_queue(FOLDER_NAME_FOR_TSV_File, result_items)

In [156]:
dataframe

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,average_rate_per_night,bedrooms_count,city,date_of_listing,description,latitude,longitude,title,url,Similarity
0,16697,16698,$45,1,San Antonio,March 2014,With its own bath and your own private entranc...,29.461442,-98.528715,Charm-Private Enrance-Good Location,https://www.airbnb.com/rooms/5183096?location=...,0.9933205307927605
1,3186,3187,$1000,4,Houston,November 2016,30 minutes from NRG Stadium. My place is close...,29.771038,-95.717095,"Book for Superbowl! 4/5 bedroom, 3.5 bath",https://www.airbnb.com/rooms/16056676?location...,0.9933205307927605
2,1553,1554,$150,3,Hurst,September 2016,Beautiful single story home completely updated...,32.815248,-97.195185,Beautiful home in Hurst. Easy access to Dallas/FW,https://www.airbnb.com/rooms/14917951?location...,0.9933205307927604
3,15627,15628,$2000,4,Sugar Land,January 2017,"My place is close to restaurants and dining, q...",29.601298,-95.670353,Superbowl Staycation Getaway,https://www.airbnb.com/rooms/16854595?location...,0.9933205307927604
4,2149,2150,$90,1,Corpus Christi,July 2016,"Great location, easy access to the beach, kite...",27.691706,-97.346182,Clean & Comfortable Room w/King Bed,https://www.airbnb.com/rooms/14057644?location...,0.9933205307927604
5,2414,2415,$38,1,Irving,June 2016,One of three private rooms in a two story home...,32.930455,-96.965192,Private Room Near the Airports,https://www.airbnb.com/rooms/13602580?location...,0.9933205307927604
6,3223,3224,$289,4,Katy,January 2014,"Our house is beautiful, conveniently located, ...",29.742156,-95.817678,Amazing House - Near Katy Mills and Water Park,https://www.airbnb.com/rooms/15911826?location...,0.9933205307927604
7,3601,3602,$37,1,Fort Worth,August 2016,This happy home is located very close to resta...,32.870463,-97.307442,Happy Home,https://www.airbnb.com/rooms/14782837?location...,0.9933205307927604
8,5856,5857,$45,1,Converse,March 2016,Our one bedroom is located in a nice two story...,29.517416,-98.295695,Bedroom near shopping peaceful area,https://www.airbnb.com/rooms/12201369?location...,0.9933205307927604
9,5986,5987,$36,1,Irving,June 2016,One of three private rooms in a two story home...,32.929012,-96.966895,Private Room Near Airports,https://www.airbnb.com/rooms/13604316?location...,0.9933205307927604


In [157]:
df = dataframe
# Remove the $ 
df['average_rate_per_night'] = df['average_rate_per_night'].str.replace("$", '')
# Convert to integer
df['average_rate_per_night'] = pd.to_numeric(df['average_rate_per_night'])

In [133]:
df

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,average_rate_per_night,bedrooms_count,city,date_of_listing,description,latitude,longitude,title,url,Similarity
0,16697,16698,45,1,San Antonio,March 2014,With its own bath and your own private entranc...,29.461442,-98.528715,Charm-Private Enrance-Good Location,https://www.airbnb.com/rooms/5183096?location=...,0.9933205307927605
1,3186,3187,1000,4,Houston,November 2016,30 minutes from NRG Stadium. My place is close...,29.771038,-95.717095,"Book for Superbowl! 4/5 bedroom, 3.5 bath",https://www.airbnb.com/rooms/16056676?location...,0.9933205307927605
2,1553,1554,150,3,Hurst,September 2016,Beautiful single story home completely updated...,32.815248,-97.195185,Beautiful home in Hurst. Easy access to Dallas/FW,https://www.airbnb.com/rooms/14917951?location...,0.9933205307927604
3,15627,15628,2000,4,Sugar Land,January 2017,"My place is close to restaurants and dining, q...",29.601298,-95.670353,Superbowl Staycation Getaway,https://www.airbnb.com/rooms/16854595?location...,0.9933205307927604
4,2149,2150,90,1,Corpus Christi,July 2016,"Great location, easy access to the beach, kite...",27.691706,-97.346182,Clean & Comfortable Room w/King Bed,https://www.airbnb.com/rooms/14057644?location...,0.9933205307927604
5,2414,2415,38,1,Irving,June 2016,One of three private rooms in a two story home...,32.930455,-96.965192,Private Room Near the Airports,https://www.airbnb.com/rooms/13602580?location...,0.9933205307927604
6,3223,3224,289,4,Katy,January 2014,"Our house is beautiful, conveniently located, ...",29.742156,-95.817678,Amazing House - Near Katy Mills and Water Park,https://www.airbnb.com/rooms/15911826?location...,0.9933205307927604
7,3601,3602,37,1,Fort Worth,August 2016,This happy home is located very close to resta...,32.870463,-97.307442,Happy Home,https://www.airbnb.com/rooms/14782837?location...,0.9933205307927604
8,5856,5857,45,1,Converse,March 2016,Our one bedroom is located in a nice two story...,29.517416,-98.295695,Bedroom near shopping peaceful area,https://www.airbnb.com/rooms/12201369?location...,0.9933205307927604
9,5986,5987,36,1,Irving,June 2016,One of three private rooms in a two story home...,32.929012,-96.966895,Private Room Near Airports,https://www.airbnb.com/rooms/13604316?location...,0.9933205307927604


In order to execute the previous function, we need to make some small modifications to the data as removing the dollar sign and convert the variable to an integer.

#### 2. Bedrooms count:

This function compares the number of bedrooms required by the user, which is explicitly asked, and the actual number of bedrooms of each Airbnb, called by the variable doc_bedroom. Resulting in a rate between 0 and 1, called bedroom_score.

- The maximum bedroom_score you can have is 1, and it is reached when user_bedroom = doc_bedroom.
- The minimum bedroom_score you can have is 0, and it is reached when doc_bedroom it's more than 3 bedrooms away from user_bedroom.
- The intermediate values are given by areas around our variable of interest: user_bedrrom, giving more points to values closer and less to the farthest.

In [158]:
def bedroom_score(user_bedroom, doc_bedroom):
    if user_bedroom == doc_bedroom:
        bedroom_score = 1
    elif user_bedroom - 1 <= doc_bedroom <= user_bedroom + 1:
        bedroom_score = 0.75
    elif user_bedroom - 2 <= doc_bedroom <= user_bedroom + 2:
        bedroom_score = 0.5
    elif user_bedroom - 3 <= doc_bedroom <= user_bedroom + 3:
        bedroom_score = 0.25
    else:
        bedroom_score = 0
    return bedroom_score

In order to execute the previous function, we need to make some small modifications to the data as removing null values. In this case, the variable bedrooms_count contains numbers, and the string "Studio". A Studio is a small apartment which combines living room, bedroom, and kitchen into a single room, for that reason, we assume that when it says "Studio", it refers to a single bedroom. Ultimately we convert the variable to an integer.

In [159]:
# Remove null values
df = dataframe
df = df[pd.notnull(df["bedrooms_count"])]
df

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,average_rate_per_night,bedrooms_count,city,date_of_listing,description,latitude,longitude,title,url,Similarity
0,16697,16698,45,1,San Antonio,March 2014,With its own bath and your own private entranc...,29.461442,-98.528715,Charm-Private Enrance-Good Location,https://www.airbnb.com/rooms/5183096?location=...,0.9933205307927605
1,3186,3187,1000,4,Houston,November 2016,30 minutes from NRG Stadium. My place is close...,29.771038,-95.717095,"Book for Superbowl! 4/5 bedroom, 3.5 bath",https://www.airbnb.com/rooms/16056676?location...,0.9933205307927605
2,1553,1554,150,3,Hurst,September 2016,Beautiful single story home completely updated...,32.815248,-97.195185,Beautiful home in Hurst. Easy access to Dallas/FW,https://www.airbnb.com/rooms/14917951?location...,0.9933205307927604
3,15627,15628,2000,4,Sugar Land,January 2017,"My place is close to restaurants and dining, q...",29.601298,-95.670353,Superbowl Staycation Getaway,https://www.airbnb.com/rooms/16854595?location...,0.9933205307927604
4,2149,2150,90,1,Corpus Christi,July 2016,"Great location, easy access to the beach, kite...",27.691706,-97.346182,Clean & Comfortable Room w/King Bed,https://www.airbnb.com/rooms/14057644?location...,0.9933205307927604
5,2414,2415,38,1,Irving,June 2016,One of three private rooms in a two story home...,32.930455,-96.965192,Private Room Near the Airports,https://www.airbnb.com/rooms/13602580?location...,0.9933205307927604
6,3223,3224,289,4,Katy,January 2014,"Our house is beautiful, conveniently located, ...",29.742156,-95.817678,Amazing House - Near Katy Mills and Water Park,https://www.airbnb.com/rooms/15911826?location...,0.9933205307927604
7,3601,3602,37,1,Fort Worth,August 2016,This happy home is located very close to resta...,32.870463,-97.307442,Happy Home,https://www.airbnb.com/rooms/14782837?location...,0.9933205307927604
8,5856,5857,45,1,Converse,March 2016,Our one bedroom is located in a nice two story...,29.517416,-98.295695,Bedroom near shopping peaceful area,https://www.airbnb.com/rooms/12201369?location...,0.9933205307927604
9,5986,5987,36,1,Irving,June 2016,One of three private rooms in a two story home...,32.929012,-96.966895,Private Room Near Airports,https://www.airbnb.com/rooms/13604316?location...,0.9933205307927604


In [160]:
# # Replace Studio for 1
df['bedrooms_count'] = df['bedrooms_count'].replace('Studio', '1')

In [161]:
df

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,average_rate_per_night,bedrooms_count,city,date_of_listing,description,latitude,longitude,title,url,Similarity
0,16697,16698,45,1,San Antonio,March 2014,With its own bath and your own private entranc...,29.461442,-98.528715,Charm-Private Enrance-Good Location,https://www.airbnb.com/rooms/5183096?location=...,0.9933205307927605
1,3186,3187,1000,4,Houston,November 2016,30 minutes from NRG Stadium. My place is close...,29.771038,-95.717095,"Book for Superbowl! 4/5 bedroom, 3.5 bath",https://www.airbnb.com/rooms/16056676?location...,0.9933205307927605
2,1553,1554,150,3,Hurst,September 2016,Beautiful single story home completely updated...,32.815248,-97.195185,Beautiful home in Hurst. Easy access to Dallas/FW,https://www.airbnb.com/rooms/14917951?location...,0.9933205307927604
3,15627,15628,2000,4,Sugar Land,January 2017,"My place is close to restaurants and dining, q...",29.601298,-95.670353,Superbowl Staycation Getaway,https://www.airbnb.com/rooms/16854595?location...,0.9933205307927604
4,2149,2150,90,1,Corpus Christi,July 2016,"Great location, easy access to the beach, kite...",27.691706,-97.346182,Clean & Comfortable Room w/King Bed,https://www.airbnb.com/rooms/14057644?location...,0.9933205307927604
5,2414,2415,38,1,Irving,June 2016,One of three private rooms in a two story home...,32.930455,-96.965192,Private Room Near the Airports,https://www.airbnb.com/rooms/13602580?location...,0.9933205307927604
6,3223,3224,289,4,Katy,January 2014,"Our house is beautiful, conveniently located, ...",29.742156,-95.817678,Amazing House - Near Katy Mills and Water Park,https://www.airbnb.com/rooms/15911826?location...,0.9933205307927604
7,3601,3602,37,1,Fort Worth,August 2016,This happy home is located very close to resta...,32.870463,-97.307442,Happy Home,https://www.airbnb.com/rooms/14782837?location...,0.9933205307927604
8,5856,5857,45,1,Converse,March 2016,Our one bedroom is located in a nice two story...,29.517416,-98.295695,Bedroom near shopping peaceful area,https://www.airbnb.com/rooms/12201369?location...,0.9933205307927604
9,5986,5987,36,1,Irving,June 2016,One of three private rooms in a two story home...,32.929012,-96.966895,Private Room Near Airports,https://www.airbnb.com/rooms/13604316?location...,0.9933205307927604


In [162]:
# # Convert to integer
df['bedrooms_count'] = df['bedrooms_count'].astype(int)

In [163]:
df

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,average_rate_per_night,bedrooms_count,city,date_of_listing,description,latitude,longitude,title,url,Similarity
0,16697,16698,45,1,San Antonio,March 2014,With its own bath and your own private entranc...,29.461442,-98.528715,Charm-Private Enrance-Good Location,https://www.airbnb.com/rooms/5183096?location=...,0.9933205307927605
1,3186,3187,1000,4,Houston,November 2016,30 minutes from NRG Stadium. My place is close...,29.771038,-95.717095,"Book for Superbowl! 4/5 bedroom, 3.5 bath",https://www.airbnb.com/rooms/16056676?location...,0.9933205307927605
2,1553,1554,150,3,Hurst,September 2016,Beautiful single story home completely updated...,32.815248,-97.195185,Beautiful home in Hurst. Easy access to Dallas/FW,https://www.airbnb.com/rooms/14917951?location...,0.9933205307927604
3,15627,15628,2000,4,Sugar Land,January 2017,"My place is close to restaurants and dining, q...",29.601298,-95.670353,Superbowl Staycation Getaway,https://www.airbnb.com/rooms/16854595?location...,0.9933205307927604
4,2149,2150,90,1,Corpus Christi,July 2016,"Great location, easy access to the beach, kite...",27.691706,-97.346182,Clean & Comfortable Room w/King Bed,https://www.airbnb.com/rooms/14057644?location...,0.9933205307927604
5,2414,2415,38,1,Irving,June 2016,One of three private rooms in a two story home...,32.930455,-96.965192,Private Room Near the Airports,https://www.airbnb.com/rooms/13602580?location...,0.9933205307927604
6,3223,3224,289,4,Katy,January 2014,"Our house is beautiful, conveniently located, ...",29.742156,-95.817678,Amazing House - Near Katy Mills and Water Park,https://www.airbnb.com/rooms/15911826?location...,0.9933205307927604
7,3601,3602,37,1,Fort Worth,August 2016,This happy home is located very close to resta...,32.870463,-97.307442,Happy Home,https://www.airbnb.com/rooms/14782837?location...,0.9933205307927604
8,5856,5857,45,1,Converse,March 2016,Our one bedroom is located in a nice two story...,29.517416,-98.295695,Bedroom near shopping peaceful area,https://www.airbnb.com/rooms/12201369?location...,0.9933205307927604
9,5986,5987,36,1,Irving,June 2016,One of three private rooms in a two story home...,32.929012,-96.966895,Private Room Near Airports,https://www.airbnb.com/rooms/13604316?location...,0.9933205307927604


#### 3. City: 

This function compares the city preference specified by the user, and the actual city of each Airbnb, called by the variable doc_city. Resulting in a binary rate that takes the value 1 when user_city = doc_city, and 0 otherwise. 

In [96]:
pip install geopy # this is required it should be run outside


The following command must be run outside of the IPython shell:

    $ pip install geopy # this is required it should be run outside

The Python package manager (pip) can only be used from outside of IPython.
Please reissue the `pip` command in a separate terminal or command prompt.

See the Python documentation for more information on how to install packages:

    https://docs.python.org/3/installing/


In [97]:
import geopy.distance

In [164]:
def city_score(user_city_latitude, user_city_longitude, doc_city_latitude, doc_city_longitude):
    coords_1 = (user_city_latitude, user_city_longitude) #we have to found a way to get the coordinates from the name of the city
    coords_2 = (doc_city_latitude, doc_city_longitude)
    distance = geopy.distance.vincenty(coords_1, coords_2).km
    if 0 < distance < 5:
        city_score = 1
    elif 5 <= distance <= 10:
        city_score = 0.50
    elif 10 <= distance <= 20:
        city_score = 0.25
    else:
        city_score = 0.10
    return city_score

####  Final Score
To finish, we will define the function that calculates the final score for each Airbnb, doing a weighted average of our four previously calculated scores. As we do not find any relevant reason to assign more weight to one variable than to another, we weight each sub score, as 25% of the total.

Again we have a result that oscillates between 0 and 1, where 1 represents a greater similarity of Airbnb with the preferences of the user.

In [165]:
def final_score(rate_score, bedroom_score, city_score):
    return 0.25*rate_score + 0.25*bedroom_score + 0.25*city_score

### EXAMPLE

So we already have result from conjuctive query lets apply it on dataframe. Let me referesh the memory how it looks 

In [166]:
df

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,average_rate_per_night,bedrooms_count,city,date_of_listing,description,latitude,longitude,title,url,Similarity
0,16697,16698,45,1,San Antonio,March 2014,With its own bath and your own private entranc...,29.461442,-98.528715,Charm-Private Enrance-Good Location,https://www.airbnb.com/rooms/5183096?location=...,0.9933205307927605
1,3186,3187,1000,4,Houston,November 2016,30 minutes from NRG Stadium. My place is close...,29.771038,-95.717095,"Book for Superbowl! 4/5 bedroom, 3.5 bath",https://www.airbnb.com/rooms/16056676?location...,0.9933205307927605
2,1553,1554,150,3,Hurst,September 2016,Beautiful single story home completely updated...,32.815248,-97.195185,Beautiful home in Hurst. Easy access to Dallas/FW,https://www.airbnb.com/rooms/14917951?location...,0.9933205307927604
3,15627,15628,2000,4,Sugar Land,January 2017,"My place is close to restaurants and dining, q...",29.601298,-95.670353,Superbowl Staycation Getaway,https://www.airbnb.com/rooms/16854595?location...,0.9933205307927604
4,2149,2150,90,1,Corpus Christi,July 2016,"Great location, easy access to the beach, kite...",27.691706,-97.346182,Clean & Comfortable Room w/King Bed,https://www.airbnb.com/rooms/14057644?location...,0.9933205307927604
5,2414,2415,38,1,Irving,June 2016,One of three private rooms in a two story home...,32.930455,-96.965192,Private Room Near the Airports,https://www.airbnb.com/rooms/13602580?location...,0.9933205307927604
6,3223,3224,289,4,Katy,January 2014,"Our house is beautiful, conveniently located, ...",29.742156,-95.817678,Amazing House - Near Katy Mills and Water Park,https://www.airbnb.com/rooms/15911826?location...,0.9933205307927604
7,3601,3602,37,1,Fort Worth,August 2016,This happy home is located very close to resta...,32.870463,-97.307442,Happy Home,https://www.airbnb.com/rooms/14782837?location...,0.9933205307927604
8,5856,5857,45,1,Converse,March 2016,Our one bedroom is located in a nice two story...,29.517416,-98.295695,Bedroom near shopping peaceful area,https://www.airbnb.com/rooms/12201369?location...,0.9933205307927604
9,5986,5987,36,1,Irving,June 2016,One of three private rooms in a two story home...,32.929012,-96.966895,Private Room Near Airports,https://www.airbnb.com/rooms/13604316?location...,0.9933205307927604


Lets gather information about user's preferences

In [167]:
user_rate = int(input("How much is the ideal rate you would pay per night? "))
user_bedroom = int(input("How many bedrooms do you need? "))
user_city = input("In wich city do you prefer to stay? ")

How much is the ideal rate you would pay per night? 56
How many bedrooms do you need? 2
In wich city do you prefer to stay? dellas


In [168]:
# create a new column with empty values
df["New Score"] = ""

User entered the dellas and we can get it from google API for its coordinates secifically latitude and longitude using this approach. https://people.revoledu.com/kardi/tutorial/Python/Automatic+Geocoding+using+Python.html
But Right now I am considering we already have this. 

In [169]:
user_city_latitude = -37.751949
user_city_longitude = 145.123856

In [170]:
# fill that empty space with the result of computing the final score
for i in range(len(df)):
    rs = rate_score(user_rate, df["average_rate_per_night"][i])
    bs = bedroom_score(user_bedroom, df["bedrooms_count"][i])
    cs = city_score(user_city_latitude, user_city_longitude, df["latitude"][i], df["longitude"][i])
    fs = final_score(rs, bs, cs)
    df["New Score"][i] = fs

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  import sys


Now we have a column named **New Score**. Lets sort the data frame accordingly and display 

In [171]:
# Sort the values by score in descending order
df = df.sort_values('New Score', ascending=False)
# Reset the index, using now the sorted values
df = df.reset_index(drop=True)
# Make the index start from 1
df.index += 1 
# Rename the index as Ranking
df.index.name = 'Ranking'
# Select only the columns of interest and show the results
cols_of_interes = ["title", "description", "city", "url"]
df[cols_of_interes]

Unnamed: 0_level_0,title,description,city,url
Ranking,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,Beautiful 2 bedroom/ 2 bath Condo,"Very calm, quiet, and relaxing environment. Ga...",San Antonio,https://www.airbnb.com/rooms/17176922?location...
2,Beautiful 2 bedroom/ 2 bath Condo,"Very calm, quiet, and relaxing environment. Ga...",San Antonio,https://www.airbnb.com/rooms/17176922?location...
3,Beautiful 2 bedroom/ 2 bath Condo,"Very calm, quiet, and relaxing environment. Ga...",San Antonio,https://www.airbnb.com/rooms/17176922?location...
4,Beautiful 2 bedroom/ 2 bath Condo,"Very calm, quiet, and relaxing environment. Ga...",San Antonio,https://www.airbnb.com/rooms/17176922?location...
5,Quaint with quick access to local hotspots!,Conveniently located 1 bedroom 1 bath option w...,Prosper,https://www.airbnb.com/rooms/19555797?location...
6,Home away from home,My cool comfortable3 bedroom home with plenty ...,Carrollton,https://www.airbnb.com/rooms/16392671?location...
7,Entire Apartment Near Downtown Houston,"Three lights away from Downtown, you’ll love t...",Houston,https://www.airbnb.com/rooms/16808594?location...
8,Quaint with quick access to local hotspots!,Conveniently located 1 bedroom 1 bath option w...,Prosper,https://www.airbnb.com/rooms/19555797?location...
9,Modern Retreat #2 w/Private Bath,Bedroom with own private bathroom in new house...,Austin,https://www.airbnb.com/rooms/2138690?location=...
10,Cozy 1 bedroom/bathroom with pool,My cool and comfortable bedroom apartment feel...,Irving,https://www.airbnb.com/rooms/7276294?location=...


You can see that no rows have been deleted and we have sorted our data frame according to new score. 
**Ciao**