Skip to content

Latest commit

 

History

History
1056 lines (921 loc) · 20.2 KB

app-search-api.asciidoc

File metadata and controls

1056 lines (921 loc) · 20.2 KB

App Search APIs

On this page

Initializing the Client

The AppSearch client can either be configured directly:

# Use the AppSearch client directly:
from elastic_enterprise_search import AppSearch

app_search = AppSearch(
    "http://localhost:3002",
    http_auth="private-..."
)
# Now call API methods
app_search.search(...)

…​or can be used via a configured EnterpriseSearch.app_search instance:

from elastic_enterprise_search import EnterpriseSearch

ent_search = EnterpriseSearch("http://localhost:3002")

# Configure authentication of the AppSearch instance
ent_search.app_search.http_auth = "private-..."

# Now call API methods
ent_search.app_search.search(...)

API Key Privileges

Using the APIs require a key with read, write or search access depending on the action. If you’re receiving an UnauthorizedError make sure the key you’re using in http_auth has the proper privileges.

Engine APIs

Engines index documents and perform search functions. To use App Search you must first create an Engine.

Create Engine

Let’s create an Engine named national-parks and uses English as a language:

# Request:
app_search.create_engine(
    engine_name="national-parks",
    language="en",
)

# Response:
{
  "name": "national-parks",
  "type": "default",
  "language": "en"
}

Get Engine

Once we’ve created an Engine we can look at it:

# Request:
app_search.get_engine(engine_name="national-parks")

# Response:
{
  "document_count": 0,
  "language": "en",
  "name": "national-parks",
  "type": "default"
}

List Engines

We can see all our Engines in the App Search instance:

# Request:
app_search.list_engines()

# Response:
{
  "meta": {
    "page": {
      "current": 1,
      "size": 25,
      "total_pages": 1,
      "total_results": 1
    }
  },
  "results": [
    {
      "document_count": 0,
      "language": "en",
      "name": "national-parks",
      "type": "default"
    }
  ]
}

Delete Engine

If we want to delete the Engine and all documents inside you can use the delete_engine() method:

# Request:
app_search.delete_engine(engine_name="national-parks")

# Response:
{
  "deleted": True
}

Document APIs

Create and index Documents

Once you’ve created an Engine you can start adding documents with the index_documents() method:

# Request:
app_search.index_documents(
    engine_name="national-parks",
    documents=[{
        "id": "park_rocky-mountain",
        "title": "Rocky Mountain",
        "nps_link": "https://www.nps.gov/romo/index.htm",
        "states": [
            "Colorado"
        ],
        "visitors": 4517585,
        "world_heritage_site": False,
        "location": "40.4,-105.58",
        "acres": 265795.2,
        "date_established": "1915-01-26T06:00:00Z"
    }, {
        "id": "park_saguaro",
        "title": "Saguaro",
        "nps_link": "https://www.nps.gov/sagu/index.htm",
        "states": [
            "Arizona"
        ],
        "visitors": 820426,
        "world_heritage_site": False,
        "location": "32.25,-110.5",
        "acres": 91715.72,
        "date_established": "1994-10-14T05:00:00Z"
    }]
)

# Response:
[
  {
    "errors": [],
    "id": "park_rocky-mountain"
  },
  {
    "errors": [],
    "id": "park_saguaro"
  }
]

List Documents

Both of our new documents indexed without errors.

Now we can look at our indexed documents in the engine:

# Request:
app_search.list_documents(engine_name="national-parks")

# Response:
{
  "meta": {
    "page": {
      "current": 1,
      "size": 100,
      "total_pages": 1,
      "total_results": 2
    }
  },
  "results": [
    {
      "acres": "91715.72",
      "date_established": "1994-10-14T05:00:00Z",
      "id": "park_saguaro",
      "location": "32.25,-110.5",
      "nps_link": "https://www.nps.gov/sagu/index.htm",
      "states": [
        "Arizona"
      ],
      "title": "Saguaro",
      "visitors": "820426",
      "world_heritage_site": "false"
    },
    {
      "acres": "265795.2",
      "date_established": "1915-01-26T06:00:00Z",
      "id": "park_rocky-mountain",
      "location": "40.4,-105.58",
      "nps_link": "https://www.nps.gov/romo/index.htm",
      "states": [
        "Colorado"
      ],
      "title": "Rocky Mountain",
      "visitors": "4517585",
      "world_heritage_site": "false"
    }
  ]
}

Get Documents by ID

You can also retrieve a set of documents by their id with the get_documents() method:

# Request:
app_search.get_documents(
    engine_name="national-parks",
    document_ids=["park_rocky-mountain"]
)

# Response:
[
  {
    "acres": "265795.2",
    "date_established": "1915-01-26T06:00:00Z",
    "id": "park_rocky-mountain",
    "location": "40.4,-105.58",
    "nps_link": "https://www.nps.gov/romo/index.htm",
    "states": [
      "Colorado"
    ],
    "title": "Rocky Mountain",
    "visitors": "4517585",
    "world_heritage_site": "false"
  }
]

Update existing Documents

You can update documents with the put_documents() method:

# Request:
resp = app_search.put_documents(
    engine_name="national-parks",
    documents=[{
        "id": "park_rocky-mountain",
        "visitors": 10000000
    }]
)

# Response:
[
  {
    "errors": [],
    "id": "park_rocky-mountain"
  }
]

Delete Documents

You can delete documents from an Engine with the delete_documents() method:

# Request:
resp = app_search.delete_documents(
    engine_name="national-parks",
    document_ids=["park_rocky-mountain"]
)

# Response:
[
  {
    "deleted": True,
    "id": "park_rocky-mountain"
  }
]

Schema APIs

Now that we’ve indexed some data we should take a look at the way the data is being indexed by our Engine.

Get Schema

First take a look at the existing Schema inferred from our data:

# Request:
resp = app_search.get_schema(
    engine_name="national-parks"
)

# Response:
{
  "acres": "text",
  "date_established": "text",
  "location": "text",
  "nps_link": "text",
  "states": "text",
  "title": "text",
  "visitors": "text",
  "world_heritage_site": "text"
}

Update Schema

Looks like the date_established field wasn’t indexed as a date as desired. Update the type of the date_established field:

# Request:
resp = app_search.put_schema(
    engine_name="national-parks",
    schema={
        "date_established": "date"
    }
)

# Response:
{
  "acres": "number",
  "date_established": "date",  # Type has been updated!
  "location": "geolocation",
  "nps_link": "text",
  "square_km": "number",
  "states": "text",
  "title": "text",
  "visitors": "number",
  "world_heritage_site": "text"
}

Search APIs

Once documents are ingested and the Schema is set properly you can use the search() method to search through an Engine for matching documents.

The Search API has many options, read the Search API documentation for a list of all options.

# Request:
resp = app_search.search(
    engine_name="national-parks",
    body={
        "query": "rock"
    }
)

# Response:
{
  "meta": {
    "alerts": [],
    "engine": {
      "name": "national-parks-demo",
      "type": "default"
    },
    "page": {
      "current": 1,
      "size": 10,
      "total_pages": 2,
      "total_results": 15
    },
    "request_id": "6266df8b-8b19-4ff0-b1ca-3877d867eb7d",
    "warnings": []
  },
  "results": [
    {
      "_meta": {
        "engine": "national-parks-demo",
        "id": "park_rocky-mountain",
        "score": 6776379.0
      },
      "acres": {
        "raw": 265795.2
      },
      "date_established": {
        "raw": "1915-01-26T06:00:00+00:00"
      },
      "id": {
        "raw": "park_rocky-mountain"
      },
      "location": {
        "raw": "40.4,-105.58"
      },
      "nps_link": {
        "raw": "https://www.nps.gov/romo/index.htm"
      },
      "square_km": {
        "raw": 1075.6
      },
      "states": {
        "raw": [
          "Colorado"
        ]
      },
      "title": {
        "raw": "Rocky Mountain"
      },
      "visitors": {
        "raw": 4517585.0
      },
      "world_heritage_site": {
        "raw": "false"
      }
    }
  ]
}

Multiple searches can be executed at the same time with the multi_search() method:

# Request:
resp = app_search.multi_search(
    engine_name="national-parks",
    body={
        "queries": [
            {"query": "rock"},
            {"query": "lake"}
        ]
    }
)

# Response:
[
  {
    "meta": {
      "alerts": [],
      "engine": {
        "name": "national-parks-demo",
        "type": "default"
      },
      "page": {
        "current": 1,
        "size": 1,
        "total_pages": 15,
        "total_results": 15
      },
      "warnings": []
    },
    "results": [
      {
        "_meta": {
          "engine": "national-parks",
          "id": "park_rocky-mountain",
          "score": 6776379.0
        },
        "acres": {
          "raw": 265795.2
        },
        "date_established": {
          "raw": "1915-01-26T06:00:00+00:00"
        },
        "id": {
          "raw": "park_rocky-mountain"
        },
        "location": {
          "raw": "40.4,-105.58"
        },
        "nps_link": {
          "raw": "https://www.nps.gov/romo/index.htm"
        },
        "square_km": {
          "raw": 1075.6
        },
        "states": {
          "raw": [
            "Colorado"
          ]
        },
        "title": {
          "raw": "Rocky Mountain"
        },
        "visitors": {
          "raw": 4517585.0
        },
        "world_heritage_site": {
          "raw": "false"
        }
      }
    ]
  },
  ...
]

Curation APIs

Curations hide or promote result content for pre-defined search queries.

Create Curation

# Request:
resp = app_search.create_curation(
    engine_name="national-parks",
    queries=["rocks", "rock", "hills"],
    promoted_doc_ids=["park_rocky-mountains"],
    hidden_doc_ids=["park_saguaro"]
)

# Response:
{
  "id": "cur-6011f5b57cef06e6c883814a"
}

Get Curation

# Request:
resp = app_search.get_curation(
    engine_name="national-parks",
    curation_id="cur-6011f5b57cef06e6c883814a"
)
{
  "hidden": [
    "park_saguaro"
  ],
  "id": "cur-6011f5b57cef06e6c883814a",
  "promoted": [
    "park_rocky-mountains"
  ],
  "queries": [
    "rocks",
    "rock",
    "hills"
  ]
}

Update Curation

# Request:
app_search.put_curation(
  engine_name='my-engine',
  curation_id='cur-6011f5b57cef06e6c883814a',
  queries=["foo", "bar"],
  promoted=["doc-1", "doc-2"],
  hidden=["doc-3"]
)
# Response:
{
  "id": "cur-6011f5b57cef06e6c883814a"
}

List Curations

# Request:
app_search.list_curations(
    engine_name="national-parks"
)

Delete Curation

# Request:
app_search.delete_curation(
    engine_name="national-parks",
    curation_id="cur-6011f5b57cef06e6c883814a"
)

Meta Engine APIs

Meta Engines is an Engine that has no documents of its own, instead it combines multiple other Engines so that they can be searched together as if they were a single Engine.

The Engines that comprise a Meta Engine are referred to as "Source Engines".

Create Meta Engine

Creating a Meta Engine uses the create_engine() method and set the type parameter to "meta".

# Request:
app_search.create_engine(
    engine_name="meta-engine",
    type="meta",
    source_engines=["national-parks"]
)

# Response:
{
  "document_count": 1,
  "name": "meta-engine",
  "source_engines": [
    "national-parks"
  ],
  "type": "meta"
}

Searching Documents from a Meta Engine

# Request:
app_search.search(
    engine_name="meta-engine",
    body={
        "query": "rock"
    }
)

# Response:
{
  "meta": {
    "alerts": [],
    "engine": {
      "name": "meta-engine",
      "type": "meta"
    },
    "page": {
      "current": 1,
      "size": 10,
      "total_pages": 1,
      "total_results": 1
    },
    "request_id": "aef3d3d3-331c-4dab-8e77-f42e4f46789c",
    "warnings": []
  },
  "results": [
    {
      "_meta": {
        "engine": "national-parks",
        "id": "park_black-canyon-of-the-gunnison",
        "score": 2.43862
      },
      "id": {
        "raw": "national-parks|park_black-canyon-of-the-gunnison"
      },
      "nps_link": {
        "raw": "https://www.nps.gov/blca/index.htm"
      },
      "square_km": {
        "raw": 124.4
      },
      "states": {
        "raw": [
          "Colorado"
        ]
      },
      "title": {
        "raw": "Black Canyon of the Gunnison"
      },
      "world_heritage_site": {
        "raw": "false"
      }
    }
  ]
}

Notice how the id of the result we receive (national-parks|park_black-canyon-of-the-gunnison) includes a prefix of the Source Engine that the result is from to distinguish them from results with the same id but different Source Engine within a search result.

Adding Source Engines to an existing Meta Engine

If we have an existing Meta Engine named meta-engine we can add additional Source Engines to it with the add_meta_engine_source() method. Here we add the state-parks Engine:

# Request:
app_search.add_meta_engine_source(
    engine_name="meta-engine",
    source_engines=["state-parks"]
)

# Response:
{
  "document_count": 1,
  "name": "meta-engine",
  "source_engines": [
    "national-parks",
    "state-parks"
  ],
  "type": "meta"
}

Removing Source Engines from a Meta Engine

If we change our mind about state-parks being a Source Engine for meta-engine we can use the delete_meta_source_engines() method:

# Request:
app_search.delete_meta_engine_source(
    engine_name="meta-engine",
    source_engines=["state-parks"]
)

# Response:
{
  "document_count": 1,
  "name": "meta-engine",
  "source_engines": [
    "national-parks"
  ],
  "type": "meta"
}

Web Crawler APIs

Domains

# Create a domain
resp = app_search.create_crawler_domain(
  engine_name="crawler-engine",
  body={
    "name": "https://example.com"
  }
)
domain_id = resp["id"]

# Get a domain
app_search.get_crawler_domain(
  engine_name="crawler-engine",
  domain_id=domain_id
)

# Update a domain
app_search.put_crawler_domain(
  engine_name="crawler-engine",
  domain_id=domain_id,
  body={
    ...
  }
)

# Delete a domain
app_search.delete_crawler_domain(
  engine_name="crawler-engine",
  domain_id=domain_id
)

# Validate a domain
app_search.get_crawler_domain_validation_result(
  body={
    "url": "https://example.com",
    "checks": [
      "dns",
      "robots_txt",
      "tcp",
      "url",
      "url_content",
      "url_request"
    ]
  }
)

# Extract content from a URL
app_search.get_crawler_url_extraction_result(
  engine_name="crawler-engine",
  body={
    "url": "https://example.com"
  }
)

# Trace a URL
app_search.get_crawler_url_tracing_result(
  engine_name="crawler-engine",
  body={
    "url": "https://example.com"
  }
)

Crawls

# Get the active crawl
app_search.get_crawler_active_crawl_request(
  engine_name="crawler-engine",
)

# Start a crawl
app_search.create_crawler_crawl_request(
  engine_name="crawler-engine"
)

# Cancel the active crawl
app_search.delete_crawler_active_crawl_request(
  engine_name="crawler-engine"
)

Entry Points

# Create an entry point
resp = app_search.create_crawler_entry_point(
  engine_name="crawler-engine",
  body={
    "value": "/blog"
  }
)
entry_point_id = resp["id"]

# Delete an entry point
app_search.delete_crawler_entry_point(
  engine_name="crawler-engine",
  entry_point_id=entry_point_id
)

Crawl Rules

# Create a crawl rule
resp = app_search.create_crawler_crawl_rule(
  engine_name="crawler-engine",
  domain_id=domain_id,
  body={
    "policy": "deny",
    "rule": "ends",
    "pattern": "/dont-crawl"
  }
)
crawl_rule_id = resp["id"]

# Delete a crawl rule
app_search.delete_crawler_crawl_rule(
  engine_name="crawler-engine",
  domain_id=domain_id,
  crawl_rule_id=crawl_rule_id
)

Sitemaps

# Create a sitemap
resp = app_search.create_crawler_sitemap(
  engine_name="crawler-engine",
  domain_id=domain_id,
  url="https://example.com/sitemap.xml"
)
sitemap_id = resp["id"]

# Delete a sitemap
app_search.delete_crawler_sitemap(
  engine_name="crawler-engine",
  domain_id=domain_id,
  sitemap_id=sitemap_id
)

Adaptive Relevance APIs

Settings

# Get adaptive relevenace settings for an Engine
app_search.get_adaptive_relevance_settings(
  engine_name="adaptive-engine"
)
{
  "curation": {
    "enabled": True,
    "mode": "manual",
    "timeframe": 7,
    "max_size": 3,
    "min_clicks": 20,
    "schedule_frequency": 1,
    "schedule_unit": "day"
  }
}

# Enable automatic adaptive relevance
app_search.put_adaptive_relevance_settings(
  engine_name="adaptive-engine",
  curation={
    "mode": "automatic"
  }
)

Suggestions

# List all adaptive relevance suggestions for an engine
app_search.list_adaptive_relevance_suggestions(
  engine_name="adaptive-engine"
)
{
  "meta": {
    "page": {
      "current": 1,
      "total_pages": 1,
      "total_results": 2,
      "size": 25
    }
  },
  "results": [
    {
      "query": "forest",
      "type": "curation",
      "status": "pending",
      "updated_at": "2021-09-02T07:22:23Z",
      "created_at": "2021-09-02T07:22:23Z",
      "promoted": [
        "park_everglades",
        "park_american-samoa",
        "park_arches"
      ],
      "operation": "create"
    },
    {
      "query": "park",
      "type": "curation",
      "status": "pending",
      "updated_at": "2021-10-22T07:34:12Z",
      "created_at": "2021-10-22T07:34:54Z",
      "promoted": [
        "park_yellowstone"
      ],
      "operation": "create",
      "override_manual_curation": true
    }
  ]
}

# Get adaptive relevance suggestions for a query
app_search.get_adaptive_relevance_suggestions(
  engine_name="adaptive-engine",
  query="forest",
)
{
  "meta": {
    "page": {
      "current": 1,
      "total_pages": 1,
      "total_results": 1,
      "size": 25
    }
  },
  "results": [
    {
      "query": "forest",
      "type": "curation",
      "status": "pending",
      "updated_at": "2021-09-02T07:22:23Z",
      "created_at": "2021-09-02T07:22:23Z",
      "promoted": [
        "park_everglades",
        "park_american-samoa",
        "park_arches"
      ],
      "operation": "create"
    }
  ]
}

# Update status of adaptive relevance suggestions
app_search.put_adaptive_relevance_suggestions(
  engine_name="adaptive-engine",
  suggestions=[
    {"query": "forest", "type": "curation", "status": "applied"},
    {"query": "mountain", "type": "curation", "status": "rejected"}
  ]
)