Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature requests (running list) #363

Closed
zehawki opened this issue Aug 27, 2021 · 34 comments
Closed

Feature requests (running list) #363

zehawki opened this issue Aug 27, 2021 · 34 comments

Comments

@zehawki
Copy link

zehawki commented Aug 27, 2021

1. Allow search fields to be inferred

When a collection has been setup, its understood that these are fields to be indexed, and hence to be searched for. If no query_by is sent in the params, then it means that the search should be performed thru all fields.

2. Hide certain fields from being sent in the response

out_of and search_time_ms should not be sent except for the Admin key. I thought a search-only key would automatically hide it, but it still shows up. Then I created a Scoped Search Key, with a "exclude_fields":"out_of", but it still shows up in search results.

3. Control the snippet length

Allow controlling the length of the snippet that is generated. Its currently too short.

4. Control "allowed domains" under CORS

Right now CORS is a single on/off settings. Is it possible to enhance this to support a set of named & wildcarded domains instead, like Django CORS_ORIGIN_WHITELIST

@zehawki
Copy link
Author

zehawki commented Aug 29, 2021

@jasonbosco # 3 above is quite important for us - is there any way to achieve this now without a new release?

@kishorenc
Copy link
Member

@zehawki

You can use the highlight_affix_num_tokens for # 3:

The number of tokens that should surround the highlighted text on each side. Default: 4

The snippet_threshold parameter could also be useful:

Field values under this length will be fully highlighted, instead of showing a snippet of relevant portion. Default: 30

You can see the arguments here: https://typesense.org/docs/0.21.0/api/documents.html#arguments

@zehawki
Copy link
Author

zehawki commented Aug 30, 2021

Yup I saw these. I've read thru pretty much every bit of docs (BTW, superb work on documentation! Its really high quality and comprehensive.)

I tried snippet_threshold already, but the way I understood that feature (and the way I saw it working) didnt help - it did not increase the length of the snippet. I understand this helps only when the value at a key is less than X length, which in my case won't be true since the body of a post would be 100s of words.

TBH, I hadn't understood the other one highlight_affix_num_tokens - I'd been meaning to ask what that does. But now I see one could say that token = words.

@zehawki
Copy link
Author

zehawki commented Aug 30, 2021

Okays highlight_affix_num_tokens works just fine, thanks.

Please consider changing the help text as follows: The number of tokens that should surround the highlighted text on each side. This controls the length of the snippet.

@kishorenc
Copy link
Member

Glad to hear that it works!

Please consider changing the help text as follows

Thank you, I've committed this change and will get published in the next documentation refresh.

But now I see one could say that token = words.

Typesense uses tokens instead of words everywhere because it's a more accurate description of what gets indexed -- as a piece of string can be alpha-numerical in nature (pedantic, I know!).

Request # 1 has been asked previously and we want to support it in future.

Request # 2 is also something I'm open to supporting. Will leave this issue open to track that.

@zehawki
Copy link
Author

zehawki commented Sep 5, 2021

Added a # 4 item in the 1st post...

@zehawki
Copy link
Author

zehawki commented Nov 12, 2021

@kishorenc Please please take # 4 (Control allowed domains under CORS) in your upcoming release.

@kishorenc
Copy link
Member

@zehawki We are in a code freeze for the upcoming release (just ironing out last mile minor issues), but I will be happy to consider for the next one.

It would be great if you can elaborate on your use case for custom CORS domains. Because, this is not a fool-proof security measure since CORS headers can be spoofed by requests sent from outside the browser environment.

@zehawki
Copy link
Author

zehawki commented Dec 2, 2021

Hi @kishorenc

Re use case: we have a number of properties (domains) and 1 server, so we need a way to check whether the request is from one of the "authorized" domains. This is rather a standard use case I would imagine, like the example I gave of Django's CORS_ORIGIN_WHITELIST. Currently I'm sending requests thru a proxy server which does this check, but of course that adds a needless large delay and turns instant search into a joke.

Re security - yes that's well understood, but I'm not sure there is an alternative or? The only way we can handle is via CORS + auth key. Auth keys when used on the front end are exposed, so CORS adds a bit more security.

@kishorenc
Copy link
Member

@zehawki No. 2 (hide certain fields from response) and no.4 (allowed domains for CORS) are available in 0.23.0.rc30 build.

To hide the out_of and search_time_ms just add them to exclude_fields parameter embedded into a scoped API key (so that it can't be tampered by user making explicit requests).

CORS domains are now allowed by just specifying a comma separated string, e.g. --enable-cors=https://example.com,https://example2.com (make sure there is no trailing slash).

@zehawki
Copy link
Author

zehawki commented Feb 21, 2022

Oh man, I love you guys ;-) thank you!

@kishorenc
Copy link
Member

😄 Please try it out and let me know if everything works as expected.

@zehawki
Copy link
Author

zehawki commented Feb 23, 2022

Any plans to have CORS managed via an API?

@kishorenc
Copy link
Member

Yes, it's part of another feature where we intend to make a bunch of run-time configurations configurable by the /config end-point. Some configurations can already be changed via the /config end-point but they are not persisted yet.

@zehawki
Copy link
Author

zehawki commented Mar 10, 2022

@zehawki No. 2 (hide certain fields from response) and no.4 (allowed domains for CORS) are available in 0.23.0.rc30 build.

Hiya, this must be a terribly stupid Q - but where do I get the RC builds (Deb package) from?

@jasonbosco
Copy link
Member

We usually publish RC builds on Docker Hub and as DEB packages.

If you replace the version numbers on the downloads page with the RC version number, you should be able to get the Docker Image or DEB package. If you need RPM / Linux / Mac binaries let me know and we can publish one-off builds.

If you're on Typesense Cloud, we can upgrade your cluster from our side if you email contact at typesense dot org with your cluster ID.

@kishorenc
Copy link
Member

We found and fixed a few edge cases in the CORS domain feature introduced in 0.23.0.rc30 so I recommend using the last few RC builds, e.g. 0.23.0.rc40+.

@zehawki
Copy link
Author

zehawki commented Mar 13, 2022

So I'm testing 0.23.0.rc40. The read side works fine, ie as expected, without the Origin field, I get a 403 Forbidden. Now what about write, say from the TS python client from the server - this is also giving a 403 Forbidden. Is that expected? If so how do I get that to work? OR should CORS check be applied only to Search only API Keys (https://typesense.org/docs/0.22.2/api/api-keys.html#search-only-api-key)

@kishorenc
Copy link
Member

You don't need CORS for writes because you are going to use a API key with write permissions from your backend, and this operation is not going to be exposed to frontend. Regular API keys with permissions that allow write to only a specific collection is sufficient.

@zehawki
Copy link
Author

zehawki commented Mar 14, 2022

You don't need CORS for writes because you are going to use a API key with write permissions from your backend

That's right, that's what I expected. However I AM getting 403 forbidden. This is with the Admin key, and I'm doing an upsert on multiple (https://typesense.org/docs/0.22.2/api/documents.html#action-modes-batch-create-upsert-update).

Works fine as soon as I remove the enable-cors = in the typesense-server.ini, so this seems to be clearly linked to CORS setting, rather than any other reason.

  File "/home/a/mc/.venv/lib/python3.8/site-packages/typesense/documents.py", line 62, in import_
    api_response = self.api_call.post(self._endpoint_path('import'), docs_import, params, as_json=False)
  File "/home/a/mc/.venv/lib/python3.8/site-packages/typesense/api_call.py", line 143, in post
    return self.make_request(requests.post, endpoint, as_json,
  File "/home/a/mc/.venv/lib/python3.8/site-packages/typesense/api_call.py", line 115, in make_request
    raise ApiCall.get_exception(r.status_code)(r.status_code, error_message)
typesense.exceptions.RequestForbidden: [Errno 403] Forbidden.

@kishorenc
Copy link
Member

Can you try sending a Origin header from your Python client when making the request? When CORS domains are enabled, Typesense checks for the Origin header and ensures that the origin is allowed.

@zehawki
Copy link
Author

zehawki commented Mar 14, 2022

Can you try sending a Origin header from your Python client when making the request? When CORS domains are enabled, Typesense checks for the Origin header and ensures that the origin is allowed.

That's essentially what I mentioned in my earlier comment :-) :

  1. You check the Origin field.
  2. I'm using the Typesense python client and I see no way to add an Origin field.

@kishorenc
Copy link
Member

Oh right, sorry. Looks like we have to support passing a custom Origin header in the Python client. But that sounds a bit odd: maybe Typesense should check for CORS for only the GET /search and POST /multi_search end-points.

@zehawki
Copy link
Author

zehawki commented Mar 14, 2022

Yup, also what I mentioned up there ;-p "should CORS check be applied only to Search only API Keys"

@zehawki
Copy link
Author

zehawki commented Mar 14, 2022

I'm no expert on CORS, but here is my understanding: CORS is a directive from the server to the client and its for the client to enforce the directive. Ie a browser can ask "will you support requests from this domain", and the server says "no, bugger off" - now its up to the browser to actually follow this or not. Which is why server to server / Postman to server, etc works just fine since THAT client does not bother to follow the CORS directive.

Having said this, I'm perfectly happy that you are actively checking for the Origin field, and rejecting the request if its not a "valid origin".

Maybe this should not be called a CORS feature, rather just an "Allowed origins" feature.

@kishorenc
Copy link
Member

Got it. I just read up on this further, and as you mention, CORS indeed does not require sending a 403 for a request originating from a non-allowed origin. So if a request's origin does not match the allowed configuration, the server should simply not send a Access-Control-Allow-Origin header in the response, and it's upto the browsers to detect that and enforce the restriction.

I would rather stick to the standard to avoid confusion. I will make this change in the next RC build.

@zehawki
Copy link
Author

zehawki commented Mar 14, 2022

I would rather stick to the standard to avoid confusion. I will make this change in the next RC build.

Please reconsider maybe - I think the current method is better and more "correct" as per the original intent, ie allow only certain domains access. Which is why I was suggesting just calling this an "Allowed origins" feature, with no mention of CORS etc.

But anyhoo - either method is a bit of a joke anyway as far as access control goes since one can just send any Origin field from non browser environments.

@kishorenc
Copy link
Member

After some thinking, we've decided to stick to the CORS spec. Because, otherwise:

a) other users who are familiar with CORS will be totally confused as to why we're returning a 403
b) everyone using Typesense from their backend will have to explicitly set an Origin header

In any case, as you point out, neither method offers any real security beyond browser-bound cross origin requests.

@zehawki
Copy link
Author

zehawki commented Mar 23, 2022

Will you please let me know whenever you release another RC with the above changes...

@kishorenc
Copy link
Member

@zehawki Please check against 0.23.0.rc49.

@zehawki
Copy link
Author

zehawki commented Apr 1, 2022

Thank you, CORS working as expected.

@kishorenc
Copy link
Member

Ran into a limitation of the command line parsing library in using --enable-cors both as a boolean flag and as a string flag for taking a list of CORS domains.

From 0.23.0.rc53 we will be using two separate flags:

  • --enable-cors the existing flag will work as of now, but we are enabling CORS by default
  • --cors-domains=http://localhost:8108 is a new flag for specifying a comma separated list of CORS domains to be whitelisted.

@zehawki
Copy link
Author

zehawki commented May 13, 2022

@kishorenc I dont see this in your release list for 0.23 (https://github.com/typesense/typesense/projects/2#column-17153142). Can you pls confirm if it will (or wont) be there.

@kishorenc
Copy link
Member

It's on the release notes. Since this is a running list I've not added it to the release column :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants