Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.8.0 draft - Part 1 #1

Closed
wants to merge 4 commits into from
Closed

0.8.0 draft - Part 1 #1

wants to merge 4 commits into from

Conversation

jasonbosco
Copy link
Member

@jasonbosco jasonbosco commented Jan 2, 2018

This version aims to make the API endpoints and specifically the request/response formats more consistent with each other.

Goals:

  • Improve Developer Experience by making the API requests/responses consistent
  • Make it easy to implement client libraries, by standardizing responses.

Here's a summary of the changes:

API Routes:

The visual diff of api-routes.png in the PR below summarizes the API endpoint changes well.

Create a new sub-resource called documents under collections and update the end-points correspondingly:

  • So to index a document or technically create a document, instead of posting to /collections/{collectionName} we'd now post to /collections/{collectionName}/documents. Currently a POST to /collections/{collectionName} could mean that we're updating the collection definition itself, which is not the case.
  • /collections/{collectionName}/search becomes /collections/{collectionName}/documents/search since we're technically searching the documents within a collection rather than the collection (schema) itself. This removes any confusion.
  • /collections/{collectionName}/export becomes /collections/{collectionName}/documents/export. Same as above. We're exporting the documents and not the collection schema.
  • /collections/{collectionName}/{documentId} becomes /collections/{collectionName}/documents/{documentId} to keep it contained within the documents namespace within collections.

API Request:

  • This is not addressed yet in the spec below, but something for us to discuss is how we can avoid using : (colon) in the search endpoint's filter_by and sort_by fields since colons can't be part of URLs as per spec. They need to be URL encoded.

API Response:

  • I like how when creating a new Collection, the response contains the whole collection schema again. We should do the same thing when creating a document. Currently, it only returns the ID of the document. Instead, we should return the entire document after indexing it. This keeps the API response format consistent anytime the API returns a Document.
  • Similarly, in the GET /collections response, it currently only returns two attributes of a collection (name and num_documents). We should just return all attributes for the collection entity, for all entities. This way the response is consistent anytime a collection shows up in the response.
  • Only the GET /collections endpoint response is nested under a root-level attribute called collections. We should return an array of collection objects as the response.
  • When a collection is DELETEd, we should also return the whole collection entity in the response.
  • When a document is DELETEd, we should also return the whole document entity in the response.

A couple of other general observations:

  • I didn't find documentation for the response fields returned by the search endpoint. It's mostly intuitive, but would be good to explicitly document each attribute.
  • The search result response mutates the Document data model by injecting an _highlight field inside the document, which seems a little odd to me. How about instead something like this:
{
  "facet_counts": [],
  "found": 1,
  "took_ms": 1,
  "hits": [
    {
      "document": {
        "id": "124",
        "company_name": "Stark Industries",
        "num_employees": 5215,
        "country": "USA"
      }
      "_highlight": {
        "description": "<mark>Stark</mark> Industries"
      },
    }
  ]
}
  • Is performance the reason why the /export end-point returns JSON lines format? Does it stream the output line-by-line? It would make the API more consistent if we just returned an array of documents that people can then parse using say jq.

@jasonbosco
Copy link
Member Author

@kishorenc See above

@kishorenc
Copy link
Member

@jasonbosco

Great review - agree with most of those suggestions and I have gone ahead and fixed them on this branch: https://github.com/wreally/typesense/tree/0.8-api-changes

You can test via the docker image typesense/typesense:0.8-api-changes

The issue with the CORS header is also fixed in the above image.

something for us to discuss is how we can avoid using : (colon)

I don't think this is an issue. For e.g. see Github's search API which uses colon and this Stackoverflow answer.

I didn't find documentation for the response fields returned by the search endpoint. It's mostly intuitive, but would be good to explicitly document each attribute.

I will add this to the documentation once we freeze on the API spec, along with the API end-point changes.

Is performance the reason why the /export end-point returns JSON lines format?

As an end-user of APIs that used to return big, raw JSON in the past, I hated that. Not all languages have a nice streaming JSON parser. Without that, you need to buffer the response in memory and parse it. JSON line is an elegant substitute as an export format. It's a cross between CSV and pure JSON.

@jasonbosco
Copy link
Member Author

jasonbosco commented Jan 3, 2018

Great!

I'm running into some other CORS issue now. I'll investigate further and reply in typesense/typesense#7.

I don't think this is an issue. For e.g. see Github's search API which uses colon and this Stackoverflow answer.

The main reason I brought it up was because Swagger Editor was automatically URL-encoding the parameters. Let me see if there's a way to disable that.

As an end-user of APIs that used to return big, raw JSON in the past, I hated that. Not all languages have a nice streaming JSON parser. Without that, you need to buffer the response in memory and parse it. JSON line is an elegant substitute as an export format. It's a cross between CSV and pure JSON.

👍

@kishorenc
Copy link
Member

@jasonbosco

Swagger Editor was automatically URL-encoding the parameters

That's not a problem. The HTTP server automatically decodes the value. I just verified it by sending sort_by=points%3Adesc instead of sort_by=points:desc - it worked.

Similarly, a space in the "q" (query) field is also handled. For e.g.

http://localhost:8108/collections/<collection_name>/documents/search?q=steve jobs&prefix=false&query_by=title&sort_by=points:desc&per_page=10&num_typos=2

Notice the space in the query steve jobs. That will be encoded by Swagger as steve%20jobs and will be decoded by Typesense's HTTP server.

@jasonbosco jasonbosco changed the title 0.7.2 draft 0.8.0 draft Jan 4, 2018
@jasonbosco jasonbosco closed this Jan 4, 2018
@jasonbosco jasonbosco deleted the 0.7.2-draft branch January 4, 2018 15:35
@jasonbosco jasonbosco changed the title 0.8.0 draft 0.8.0 draft - Part 1 Jan 4, 2018
@jasonbosco jasonbosco mentioned this pull request Jan 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants