0.8.0 draft - Part 1 #1

jasonbosco · 2018-01-02T04:41:40Z

This version aims to make the API endpoints and specifically the request/response formats more consistent with each other.

Goals:

Improve Developer Experience by making the API requests/responses consistent
Make it easy to implement client libraries, by standardizing responses.

Here's a summary of the changes:

API Routes:

The visual diff of api-routes.png in the PR below summarizes the API endpoint changes well.

Create a new sub-resource called documents under collections and update the end-points correspondingly:

So to index a document or technically create a document, instead of posting to /collections/{collectionName} we'd now post to /collections/{collectionName}/documents. Currently a POST to /collections/{collectionName} could mean that we're updating the collection definition itself, which is not the case.
/collections/{collectionName}/search becomes /collections/{collectionName}/documents/search since we're technically searching the documents within a collection rather than the collection (schema) itself. This removes any confusion.
/collections/{collectionName}/export becomes /collections/{collectionName}/documents/export. Same as above. We're exporting the documents and not the collection schema.
/collections/{collectionName}/{documentId} becomes /collections/{collectionName}/documents/{documentId} to keep it contained within the documents namespace within collections.

API Request:

This is not addressed yet in the spec below, but something for us to discuss is how we can avoid using : (colon) in the search endpoint's filter_by and sort_by fields since colons can't be part of URLs as per spec. They need to be URL encoded.

API Response:

I like how when creating a new Collection, the response contains the whole collection schema again. We should do the same thing when creating a document. Currently, it only returns the ID of the document. Instead, we should return the entire document after indexing it. This keeps the API response format consistent anytime the API returns a Document.
Similarly, in the GET /collections response, it currently only returns two attributes of a collection (name and num_documents). We should just return all attributes for the collection entity, for all entities. This way the response is consistent anytime a collection shows up in the response.
Only the GET /collections endpoint response is nested under a root-level attribute called collections. We should return an array of collection objects as the response.
When a collection is DELETEd, we should also return the whole collection entity in the response.
When a document is DELETEd, we should also return the whole document entity in the response.

A couple of other general observations:

I didn't find documentation for the response fields returned by the search endpoint. It's mostly intuitive, but would be good to explicitly document each attribute.
The search result response mutates the Document data model by injecting an _highlight field inside the document, which seems a little odd to me. How about instead something like this:

{
  "facet_counts": [],
  "found": 1,
  "took_ms": 1,
  "hits": [
    {
      "document": {
        "id": "124",
        "company_name": "Stark Industries",
        "num_employees": 5215,
        "country": "USA"
      }
      "_highlight": {
        "description": "<mark>Stark</mark> Industries"
      },
    }
  ]
}

Is performance the reason why the /export end-point returns JSON lines format? Does it stream the output line-by-line? It would make the API more consistent if we just returned an array of documents that people can then parse using say jq.

jasonbosco · 2018-01-02T04:42:10Z

@kishorenc See above

kishorenc · 2018-01-02T16:16:12Z

@jasonbosco

Great review - agree with most of those suggestions and I have gone ahead and fixed them on this branch: https://github.com/wreally/typesense/tree/0.8-api-changes

You can test via the docker image typesense/typesense:0.8-api-changes

The issue with the CORS header is also fixed in the above image.

something for us to discuss is how we can avoid using : (colon)

I don't think this is an issue. For e.g. see Github's search API which uses colon and this Stackoverflow answer.

I didn't find documentation for the response fields returned by the search endpoint. It's mostly intuitive, but would be good to explicitly document each attribute.

I will add this to the documentation once we freeze on the API spec, along with the API end-point changes.

Is performance the reason why the /export end-point returns JSON lines format?

As an end-user of APIs that used to return big, raw JSON in the past, I hated that. Not all languages have a nice streaming JSON parser. Without that, you need to buffer the response in memory and parse it. JSON line is an elegant substitute as an export format. It's a cross between CSV and pure JSON.

jasonbosco · 2018-01-03T15:45:05Z

Great!

I'm running into some other CORS issue now. I'll investigate further and reply in typesense/typesense#7.

I don't think this is an issue. For e.g. see Github's search API which uses colon and this Stackoverflow answer.

The main reason I brought it up was because Swagger Editor was automatically URL-encoding the parameters. Let me see if there's a way to disable that.

As an end-user of APIs that used to return big, raw JSON in the past, I hated that. Not all languages have a nice streaming JSON parser. Without that, you need to buffer the response in memory and parse it. JSON line is an elegant substitute as an export format. It's a cross between CSV and pure JSON.

👍

kishorenc · 2018-01-04T02:25:56Z

@jasonbosco

Swagger Editor was automatically URL-encoding the parameters

That's not a problem. The HTTP server automatically decodes the value. I just verified it by sending sort_by=points%3Adesc instead of sort_by=points:desc - it worked.

Similarly, a space in the "q" (query) field is also handled. For e.g.

http://localhost:8108/collections/<collection_name>/documents/search?q=steve jobs&prefix=false&query_by=title&sort_by=points:desc&per_page=10&num_typos=2

Notice the space in the query steve jobs. That will be encoded by Swagger as steve%20jobs and will be decoded by Typesense's HTTP server.

jasonbosco added 4 commits January 1, 2018 19:34

Added 0.7.2-draft-1

bfa8750

Merge branch 'master' into 0.7.2-draft

6db04f1

Updated visuals

bf2458f

Merge branch 'master' into 0.7.2-draft

3e7593c

jasonbosco requested a review from kishorenc January 2, 2018 04:41

jasonbosco mentioned this pull request Jan 3, 2018

Add support for CORS typesense/typesense#7

Closed

jasonbosco changed the title ~~0.7.2 draft~~ 0.8.0 draft Jan 4, 2018

jasonbosco closed this Jan 4, 2018

jasonbosco deleted the 0.7.2-draft branch January 4, 2018 15:35

jasonbosco mentioned this pull request Jan 4, 2018

0.8.0 draft - Part 1 merge #2

Merged

jasonbosco changed the title ~~0.8.0 draft~~ 0.8.0 draft - Part 1 Jan 4, 2018

jasonbosco mentioned this pull request Jan 4, 2018

0.8.0 draft - Part 2 #3

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0.8.0 draft - Part 1 #1

0.8.0 draft - Part 1 #1

jasonbosco commented Jan 2, 2018 •

edited

jasonbosco commented Jan 2, 2018

kishorenc commented Jan 2, 2018

jasonbosco commented Jan 3, 2018 •

edited

kishorenc commented Jan 4, 2018

0.8.0 draft - Part 1 #1

0.8.0 draft - Part 1 #1

Conversation

jasonbosco commented Jan 2, 2018 • edited

API Routes:

API Request:

API Response:

jasonbosco commented Jan 2, 2018

kishorenc commented Jan 2, 2018

jasonbosco commented Jan 3, 2018 • edited

kishorenc commented Jan 4, 2018

jasonbosco commented Jan 2, 2018 •

edited

jasonbosco commented Jan 3, 2018 •

edited