How to pass custom delimiters, dictionary and non-dictionary schemas #3
Comments
Hey @kavirajk, right, there’s currently no easy way to specify them. Full schema support is in our open-sourcing pipeline. We are finishing up optimization & testing. Stay tuned!
Thanks @kirkrodrigues for the information! So basically, clp-core currently uses kinda static delimiters and variable checks, as done here and here respectively. Correct? I also tested with your hadoop dataset. The compression ratio is very impressive! Great work! Looking forward to the full schema support!
Yup, that's correct. Nice, thanks for trying it out! We'll post here when full schema support is open-sourced.
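For readers following along, here is a minimal sketch of what a "static delimiters plus variable check" scheme could look like. It is not clp-core's actual code: the delimiter set, the `classify` and `tokenize` helpers, and the rule that any token containing a digit is a variable are all assumptions made for illustration.

```python
import re

# Hypothetical delimiter set; clp-core's hard-coded set may differ.
DELIMITERS = " \t\r\n:,=!;%"

INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+\.\d+")


def classify(token):
    """Classify a token as 'int', 'float', 'dict_var', or 'static'.

    Assumed rule: pure numbers are non-dictionary variables, any other
    token containing a digit is a dictionary variable, and everything
    else is static text that belongs to the log type.
    """
    if INT_RE.fullmatch(token):
        return "int"
    if FLOAT_RE.fullmatch(token):
        return "float"
    if any(ch.isdigit() for ch in token):
        return "dict_var"
    return "static"


def tokenize(message):
    """Split a message on DELIMITERS and return (token, kind) pairs."""
    pattern = "[" + re.escape(DELIMITERS) + "]+"
    return [(tok, classify(tok)) for tok in re.split(pattern, message) if tok]


if __name__ == "__main__":
    for token, kind in tokenize("Took 12.5 ms for task_7, attempt=3"):
        print(token, kind)
```

Under these assumed rules, `12.5` and `3` would be encoded directly, `task_7` would go to a dictionary, and the remaining tokens would stay in the log type. Custom schemas, as requested in this issue, would amount to making `DELIMITERS` and the checks in `classify` configurable.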
Hi @kirkrodrigues,
Currently, we have a few schemas implemented implicitly (i.e., the logic is not implemented as a regular expression but as a set of conditions) in the code. Roughly, the logic works as follows:
Encoding a query is a little bit more complicated since we need to handle wildcards (admittedly, the logic is a bit messy). To see how it works, I would start from

We are working hard to try and get an easy-to-use version of schema support open-sourced so that the logic won't be so complicated. I hoped we would be done by now, but development always takes longer than we expect.
Thank you for your response @kirkrodrigues. Good to hear that development is in progress.
According to the paper, we can pass the following configs for CLP.
But, AFAIU, there is no way to pass these for `clg` and `clp` now. Can you help me if I missed anything? Thanks