<img src="https://cdn-assets-cloud.frontify.com/local/frontify/eyJwYXRoIjoiXC9wdWJsaWNcL3VwbG9hZFwvc2NyZWVuc1wvMTk3OTA0XC80M2ZmNTdhYjc4OTdlZjUzY2IzMWUwNGU0MTVjZTY2NC0xNTYyMTAzMDk0LnBuZyJ9:frontify:7CTV2DtJsWvlctEUEyFK36JoXsZuVtHssMaDED6O5z0" width='150' />

# TEXT-SEARCH

__The ability to search & stack rank records based on matching a search string - even if that search string is misspelled.__

## Introduction

MongoDB Atlas Search makes it easy to build fast, relevant, full-text search capabilities on top of your data in MongoDB Atlas. With search built directly into Atlas, there's no need to replicate data to a separate search platform that needs to be set up, maintained, and scaled. Atlas Search is built on top of [Apache Lucene](https://lucene.apache.org/core/), the industry standard library for full-text search.

<img src="https://webassets.mongodb.com/_com_assets/cms/mongotarchitecture-asa401j6rs.png" alt="Drawing" style="width: 600px;"/>

In [86]:
########## SETUP ##########
# Imports
import pymongo
import json
from pygments.style import Style
from pygments.token import Token
from pygments import highlight, lexers, formatters


# Initialization
# Replace the placeholder below with your Atlas application connection string
client = pymongo.MongoClient("<Atlas application connection string>")
db = client.sample_airbnb
listings_cltcn = db.listingsAndReviews

class MyStyle(Style):
        styles = {
            # https://pygments.org/docs/styles/#terminal-styles
            #Token.Name: 'ansiblack',
            Token.String:     'ansigreen',
            Token.Literal: 'ansibrightyellow',
            Token.Keyword: 'ansimagenta',
            Token.Operator: 'ansibrightmagenta',
            #Token.Punctuation: 'ansiblack'
        }
     
def color_print(doc):
    formatted_json = json.dumps(doc, indent=4)
    colorful_json = highlight(formatted_json, lexers.JsonLexer(), formatters.Terminal256Formatter(style=MyStyle))
    #colorful_json = highlight(formatted_json, lexers.JsonLexer(), formatters.Terminal256Formatter())
    print(colorful_json)

## Execution

In this example, MongoDB's Atlas Full-Text Search is used to perform various full text search queries against the ```listingsAndReviews``` collection. Here's an example listing:

```JSON
{
    "_id": "10066928",
    "access": "Le logement sera disponible en entier pour votre s\u00e9jour.",
    "accommodates": 6,
    "address": {
        "country": "Canada",
        "country_code": "CA",
        "government_area": "Le Plateau-Mont-Royal",
        "location": {
            "coordinates": [
                -73.57383,
                45.52233
            ],
            "is_location_exact": true,
            "type": "Point"
        },
        "market": "Montreal",
        "street": "Montr\u00e9al, Qu\u00e9bec, Canada",
        "suburb": "Le Plateau-Mont-Royal"
    },
    "amenities": [
        "Internet",
        "Wifi",
        "Kitchen",
        "Heating",
        "Family/kid friendly",
        "Washer",
        "Dryer",
        "Essentials",
        "Shampoo",
        "Hangers",
        "Hair dryer",
        "Iron",
        "Laptop friendly workspace"
    ],
    "availability": {
        "availability_30": 0,
        "availability_365": 0,
        "availability_60": 0,
        "availability_90": 0
    },
    "bathrooms": {
        "$numberDecimal": "1.0"
    },
    "bed_type": "Real Bed",
    "bedrooms": 3,
    "beds": 3,
    "calendar_last_scraped": {
        "$date": 1552276800000
    },
    "cancellation_policy": "flexible",
    "description": "Notre appartement comporte 3 chambres avec chacune un lit queen. Nous avons \u00e9galement un salon, une salle de bain avec baignoire, et une cuisine toute \u00e9quip\u00e9e, avec laveuse et s\u00e9cheuse. Notre logement est lumineux, plein de vie et chaleureux! Vous disposerez de l'appartement entier avec 3 chambres ferm\u00e9es, chacune avec 1 lit queen size. Le logement sera disponible en entier pour votre s\u00e9jour. N'h\u00e9sitez pas \u00e0 m'\u00e9crire pour toute demande de renseignement ! L'appartement se situe au coeur du Plateau, donc tr\u00e8s central pour toutes vos activit\u00e9s. \u00c0 1 min \u00e0 pied se trouvent bars, restaurants, magasins, \u00e9piceries. Le quartier est tr\u00e8s vivant et id\u00e9al pour d\u00e9couvrir Montr\u00e9al. L'appartement se situe \u00e0 \u00e9gale distance des m\u00e9tros Sherbrooke et Mont-Royal et \u00e0 proximit\u00e9 de nombreux arr\u00eats d'autobus.",
    "extra_people": {
        "$numberDecimal": "0.00"
    },
    "guests_included": {
        "$numberDecimal": "1"
    },
    "host": {
        "host_about": "",
        "host_has_profile_pic": true,
        "host_id": "9036477",
        "host_identity_verified": false,
        "host_is_superhost": false,
        "host_listings_count": 2,
        "host_location": "Montreal, Quebec, Canada",
        "host_name": "Margaux",
        "host_neighbourhood": "Le Plateau",
        "host_picture_url": "https://a0.muscache.com/im/users/9036477/profile_pic/1425247510/original.jpg?aki_policy=profile_x_medium",
        "host_thumbnail_url": "https://a0.muscache.com/im/users/9036477/profile_pic/1425247510/original.jpg?aki_policy=profile_small",
        "host_total_listings_count": 2,
        "host_url": "https://www.airbnb.com/users/show/9036477",
        "host_verifications": [
            "email",
            "phone",
            "reviews",
            "work_email"
        ]
    },
    "house_rules": "Merci de respecter ce lieu de vie.",
    "images": {
        "medium_url": "",
        "picture_url": "https://a0.muscache.com/im/pictures/f208bdd7-bdab-4d4d-b529-8e6b1e5a83c1.jpg?aki_policy=large",
        "thumbnail_url": "",
        "xl_picture_url": ""
    },
    "interaction": "N'h\u00e9sitez pas \u00e0 m'\u00e9crire pour toute demande de renseignement !",
    "last_scraped": {
        "$date": 1552276800000
    },
    "listing_url": "https://www.airbnb.com/rooms/10066928",
    "location": "CA",
    "maximum_nights": 1125,
    "minimum_nights": 1,
    "name": "3 chambres au coeur du Plateau",
    "neighborhood_overview": "L'appartement se situe au coeur du Plateau, donc tr\u00e8s central pour toutes vos activit\u00e9s. \u00c0 1 min \u00e0 pied se trouvent bars, restaurants, magasins, \u00e9piceries. Le quartier est tr\u00e8s vivant et id\u00e9al pour d\u00e9couvrir Montr\u00e9al.",
    "notes": "",
    "number_of_reviews": 0,
    "price": {
        "$numberDecimal": "140.00"
    },
    "property_type": "Apartment",
    "review_scores": {},
    "reviews": [],
    "room_type": "Entire home/apt",
    "space": "Notre logement est lumineux, plein de vie et chaleureux! Vous disposerez de l'appartement entier avec 3 chambres ferm\u00e9es, chacune avec 1 lit queen size.",
    "summary": "Notre appartement comporte 3 chambres avec chacune un lit queen. Nous avons \u00e9galement un salon, une salle de bain avec baignoire, et une cuisine toute \u00e9quip\u00e9e, avec laveuse et s\u00e9cheuse.",
    "transit": "L'appartement se situe \u00e0 \u00e9gale distance des m\u00e9tros Sherbrooke et Mont-Royal et \u00e0 proximit\u00e9 de nombreux arr\u00eats d'autobus."
}
```

Search Indexes are defined in Atlas under the Collections tab. An index has been created which includes the following fields:
* address
* cancellation_policy
* description
* location
* name

<img src="https://raw.githubusercontent.com/wbleonard/share/master/mside.listingsAndReviews-SearchIndex.png" alt="Search Index" />

Below is the complete index definition. Even though multiple fields are indexed, this demo relies only on the ```description``` field. The mSIDE application, by contrast, uses the `name` field:

```JSON
{
  "mappings": {
    "fields": {
      "address": {
        "fields": {
          "country": {
            "analyzer": "lucene.standard",
            "type": "string"
          },
          "location": {
            "type": "geo"
          },
          "market": [
            {
              "foldDiacritics": false,
              "maxGrams": 2,
              "minGrams": 1,
              "tokenization": "edgeGram",
              "type": "autocomplete"
            },
            {
              "analyzer": "lucene.standard",
              "type": "string"
            }
          ]
        },
        "type": "document"
      },
      "cancellation_policy": {
        "analyzer": "lucene.standard",
        "type": "string"
      },
      "description": {
        "analyzer": "lucene.standard",
        "type": "string"
      },
      "location": {
        "analyzer": "lucene.standard",
        "type": "string"
      },
      "name": [
        {
          "foldDiacritics": true,
          "maxGrams": 10,
          "minGrams": 2,
          "tokenization": "edgeGram",
          "type": "autocomplete"
        }
      ]
    }
  }
}
```
The `market` and `name` fields use the [autocomplete](https://docs.atlas.mongodb.com/reference/atlas-search/index-definitions/index.html#autocomplete) search type. The `name` field is configured with the following options (`market` uses `foldDiacritics: false`, `maxGrams: 2`, and `minGrams: 1` instead):
* foldDiacritics: true - diacritics are removed from the indexed text
* maxGrams: 10 - indexed terms are limited to 10 characters
* minGrams: 2 - indexed terms have a minimum of 2 characters
* tokenization: "edgeGram" - indexed terms are built starting from the beginning of each word.
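
To make these options concrete, here is a minimal Python sketch of edge n-gram generation. It only approximates what the index does and is not Lucene's implementation: the helper names `fold_diacritics` and `edge_grams` are hypothetical, words are split on whitespace as a simplifying assumption, and case handling is left out.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks so that, e.g., 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_grams(text, min_grams=2, max_grams=10, fold=True):
    """Return the prefixes of each word, from min_grams to max_grams characters.

    Illustrative only: whitespace splitting is an assumption, not the
    analyzer Atlas Search actually applies.
    """
    if fold:
        text = fold_diacritics(text)
    grams = []
    for word in text.split():
        longest = min(len(word), max_grams)
        for size in range(min_grams, longest + 1):
            grams.append(word[:size])
    return grams


print(edge_grams("Montréal"))
```

With the `name` settings (`minGrams: 2`, `maxGrams: 10`), the word `Montréal` yields seven prefixes, from `Mo` up to `Montreal`, so a partial query can match any of them.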


### Test 1: Run an Atlas Search query looking for matches in a listing's description
In this test we search the ```description``` field for the word ```baseball```.

```JSON
 {
   "$search": {
       "text": {
           "path": "description",
           "query": "baseball"
       }
   }
 }
 ```

In [87]:
docs = listings_cltcn.aggregate([  
 {
   "$search": {
       "text": {
           "path": "description",
           "query": "baseball"
       }
   }
 },
  {
    "$limit": 5
  },
 {
   "$project": {
       "_id" : 0,
       "name": 1,
       "description": 1,
       'score': { "$meta": 'searchScore' },
   }
 }
])
for doc in docs:
    color_print(doc)  

{
    "name": "Entire Private Brownstone in Brooklyn",
    "description": "Entire newly furnished brownstone home in Bayridge/Dyker Heights area in Brooklyn. If you're looking for a clean, comfortable, and cozy home than look no further. This home is equiped with polished wooden floors, cathedral decorated windows, and natural artistic architecture structure. This two-family home is equipped with polished wooden floors, cathedral decorated windows, and natural artistic architecture structure. Must climb stairs. The apartment is on the 2nd floor of our two family house and please be aware there are tenants on the first floor so the main entrance will be shared. Yes we can be reached via (Email hidden by Airbnb) and here on the app. It's located next to Mckinley Park (tennis, basketball, baseball courts) across St. Ephrems Church, minutes from local shops, pizzeria, and of course easy public parking and transportation (bus stop is 40 feet from the house). Walking and bikin

### Test 2: Run an Atlas Search fuzzy query looking for matches including misspellings in a listing's description

In this test we search the ```description``` field for the word ```baseball```, but misspelled as ```basball```. Results are still found because we include the ```fuzzy``` field:

```JSON
 "fuzzy": { "maxEdits": 1, "prefixLength": 2 }
```
The ```maxEdits``` field indicates that at most one single-character edit is allowed for a term to match the query. The ```prefixLength``` field indicates that the first two characters of the query term ```basball``` must match exactly; they may not be changed to match the query to a document.
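
The interaction between the two options can be sketched in plain Python. This is an illustration under stated assumptions, not how Atlas Search is implemented: `edit_distance` and `fuzzy_match` are hypothetical helpers, and plain Levenshtein distance is used here, while the engine's exact edit metric may differ (for example in how transpositions are counted).

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query_term, indexed_term, max_edits=1, prefix_length=2):
    """True if the terms share the fixed prefix and differ by <= max_edits."""
    if query_term[:prefix_length] != indexed_term[:prefix_length]:
        return False
    return edit_distance(query_term, indexed_term) <= max_edits


print(fuzzy_match("basball", "baseball"))  # one inserted "e", prefix "ba" kept
print(fuzzy_match("basball", "football"))  # prefix differs, so no match
```

Raising `prefix_length` makes the match stricter: with `prefix_length=4`, `basball` no longer matches `baseball`, because `basb` and `base` differ.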

In [88]:
docs = listings_cltcn.aggregate([  
 {
   "$search": {
       "text": {
           "path": "description",
           "query": "basball",
           "fuzzy": { "maxEdits": 1, "prefixLength": 2 }
       }
   }
 },
 {
   "$project": {
       "_id" : 0,
       "name": 1,
       "description": 1,
       'score': { "$meta": 'searchScore' },
   }
 }
])
for doc in docs:
    color_print(doc)   

{
    "name": "Entire Private Brownstone in Brooklyn",
    "description": "Entire newly furnished brownstone home in Bayridge/Dyker Heights area in Brooklyn. If you're looking for a clean, comfortable, and cozy home than look no further. This home is equiped with polished wooden floors, cathedral decorated windows, and natural artistic architecture structure. This two-family home is equipped with polished wooden floors, cathedral decorated windows, and natural artistic architecture structure. Must climb stairs. The apartment is on the 2nd floor of our two family house and please be aware there are tenants on the first floor so the main entrance will be shared. Yes we can be reached via (Email hidden by Airbnb) and here on the app. It's located next to Mckinley Park (tennis, basketball, baseball courts) across St. Ephrems Church, minutes from local shops, pizzeria, and of course easy public parking and transportation (bus stop is 40 feet from the house). Walking and bikin

### Test 3: Run an Atlas Search query using "phrase" and "slop" to find words that are in close proximity to one another

The [phrase](https://docs.atlas.mongodb.com/reference/atlas-search/phrase/index.html) operator performs a search for documents containing an ordered sequence of terms. 

```JSON
 {
   "$search": {
       "phrase": {
           "path": "description",
           "query": "spacious comfortable",
           "slop": 2
       }
   }
 }
 ```

The ```slop``` option allows for a distance of up to 2 words between ```spacious``` and ```comfortable```.

Notice in the results that the search terms need not appear exactly as written in the query; for example, ```spacious and comfortable``` still matches.
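
A rough way to picture ```slop``` is as a budget of extra words allowed between the query terms. The sketch below is a simplification under stated assumptions: `tokenize` and `phrase_matches` are hypothetical helpers, only in-order matches are considered, and Lucene's actual slop calculation is more general than this.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def phrase_matches(text, query, slop=0):
    """True if the query terms occur in order with at most `slop` extra words."""
    words = tokenize(text)
    terms = tokenize(query)
    if not terms:
        return False
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        position = start
        found_all = True
        for term in terms[1:]:
            try:
                # Earliest occurrence of the next term after the current one
                position = words.index(term, position + 1)
            except ValueError:
                found_all = False
                break
        extra_words = (position - start) - (len(terms) - 1)
        if found_all and extra_words <= slop:
            return True
    return False


text = "spacious and comfortable for a couple"
print(phrase_matches(text, "spacious comfortable", slop=2))
print(phrase_matches(text, "spacious comfortable", slop=0))
```

Here the word `and` sits between the two terms, so the text matches with `slop=2` but not with `slop=0`.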

In [89]:
docs = listings_cltcn.aggregate([  
 {
   "$search": {
       "phrase": {
           "path": "description",
           "query": "spacious comfortable",
           "slop": 2
       }
   }
 },
  {
    "$limit": 5
  },
 {
   "$project": {
       "_id" : 0,
       "name": 1,
       "description": 1,
       'score': { "$meta": 'searchScore' },
   }
 }
])
for doc in docs:
    color_print(doc)   

{
    "name": "the best location",
    "description": "located near the oratory st joseph, restaurants and bars, 10 minutes from the city center by car and 5 minutes to the station Snowdon walk. spacious and comfortable for a couple. well equipped: internet, TV, queen bed ...",
    "score": 1.8477208614349365
}

{
    "name": "Warm and Bright Studio in Hollywood Rd Sheung Wan",
    "description": "Great location near the city centre with extremely convenient transportation. You may find it incredible how easy it is to find cultural or social events within walking distance.  Spacious, comfortable and well-equipped flat, best for a recharge for the next journeys to come. Bright, spacious studio with a cute balcony on the 4th Floor (no elevator) You can basically make yourself at home :) As you basically have the private access to the whole premises, the host won't be around. But his little assistant will take care of everything online and off.

### Test 4: Run a complex Atlas Search query on listings (e.g. geoWithin, compound, global shard key)

In this example, we are running a complicated search with the following conditions:

- the keyword "pool" must exist in the description
- the listing's location must be "US"
- the listings must be within 2000 meters of a given lat/lng point (the ```radius``` of a ```geoWithin``` circle is specified in meters)
- "garden" should be in the description if possible
- and the cancellation policy must NOT be "strict"

Pay special attention to the listing's location of "US": by passing it in, we direct this query to our global shard for the US specifically, and not to EMEA or APAC!
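
To get a feel for what the circle condition means, the sketch below checks whether a point lies within a given radius of a center using the haversine formula. It is an approximation on a spherical Earth and not necessarily the computation Atlas Search performs; `haversine_m` and `within_circle` are hypothetical helpers. Coordinates follow the GeoJSON order `[longitude, latitude]`, and the radius is treated as meters, which is the unit Atlas Search uses for `circle.radius`.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_circle(point, center, radius_m):
    """True if `point` ([lon, lat]) is at most radius_m meters from `center`."""
    return haversine_m(point[0], point[1], center[0], center[1]) <= radius_m


center = [-74.00714, 40.71455]  # the center point used in the query
print(within_circle([-74.00714, 40.72455], center, 2000))  # about 1.1 km north
print(within_circle([-74.00714, 40.73455], center, 2000))  # about 2.2 km north
```

A point 0.01 degrees of latitude north of the center is roughly 1.1 km away and falls inside the 2000-meter circle, while a point 0.02 degrees north is roughly 2.2 km away and falls outside it.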

In [90]:
docs = listings_cltcn.aggregate([  
 {
     "$search": {
          "compound": {
            "must": [ 
              {
                "text": {
                  "query": "pool",
                  "path": "description"
                }
              },
              {
                "geoWithin": {
                  "circle": {
                    "center": {
                      "type": "Point",
                      "coordinates": [-74.00714, 40.71455]
                    },
                    "radius": 2000
                  },
                  "path": "address.location"
                }
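              },
              {
                # restrict results to US listings, as listed in the conditions above
                "text": {
                  "query": "US",
                  "path": "location"
                }
              }
            ],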
          "should": {
               "search": {
                   "path": "description",
                   "query": "garden"
               }
           },
           "mustNot": {
               "search": {
                   "path": "cancellation_policy",
                   "query": "strict",
                   "phrase": {"prefix": True}
               }
           }                  
          }
        }
      },
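      {
       "$project": {
          "name": 1,
          "description": 1,
          "cancellation_policy": 1,
          "accommodates": 1,
          "bedrooms": 1,
         # "bathrooms": 1,  # the field is "bathrooms" (not "bath"); like price it is a Decimal128, which json.dumps in color_print cannot serialize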
         # "price": 1,
          "images.picture_url": 1,
          "address.location": 1
         }
      }
])
for doc in docs:
    color_print(doc)   

{
    "_id": "1931341",
    "name": "Cozy bedroom on Wall Street",
    "description": "Fully Furnished BedRoom available in a 4 bedroom apt in a luxury building Gym inside the building,Laundry inside the building Connected to subway trains : 2,3,4,5, A,B,C,J,Z, Numerous eating places and restaurants around Grocery stores just a block away 2 minutes walk from new york stock exchange, 10 minutes walk from Battery park,statue of liberty, 10 minutes walk from Wall street Bull. 10 minutes walk from shopping arcades like century 21 Huge lounge area with pool table and library , gym, laundry in the building",
    "cancellation_policy": "flexible",
    "accommodates": 1,
    "bedrooms": 1,
    "images": {
        "picture_url": "https://a0.muscache.com/im/pictures/26745759/982deeba_original.jpg?aki_policy=large"
    },
    "address": {
        "location": {
            "type": "Point",
            "coordinates": [
