Support vector operations #26
Comments
So this would basically be performing joins on a list of expected values? Won't you have to know the column names for the This seems a complicated way to get around returning a single record and performing a check. I don't fully understand why it is not convenient to return a single aggregate record (or a single raw record) for a test. I guess I could see some usefulness in the following scenario: Table Still, all this could be done with the previous setup. The tests would be longer but simpler. It really comes down to simple but verbose vs. complex but terse. I would side with simpler, but if someone really wants it I don't particularly mind.
Yeah, the problem is not just that you want to do It could all still be done, but it gets really verbose, error-prone and hard to understand. At the end of the day, if Validatar is meant to simplify your functional testing, I am of the opinion that taking on some of the annoyance and making tests simpler to express, read and share could be really valuable to users. I'll give it a shot and see if I can keep the implementation simple. Thanks for the feedback!
It is not convenient to boil your test data down into a single aggregate. If you have N dimensions and M metrics, then in the Hive execution engine you have to write a query for each of the M metrics, and in each query produce the metric for all N dimensions boiled down to a single row.
It is far more user-friendly if the user could provide a CSV representation of the expected data:
This could be a datasource that just reads a CSV file and turns it into a tabular format, as any datasource currently does in Validatar.
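As a rough sketch of what such a CSV datasource might do, the Python below parses CSV text into a column-oriented table. The helper name `load_csv_columns`, the column-oriented layout, and the sample values are assumptions made for illustration; they are not Validatar's actual datasource API or the expected data from this issue.

```python
import csv
import io


def load_csv_columns(text):
    """Parse CSV text into a column-oriented table: {column name: [values]}."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


# Hypothetical expected data; dimension and metric names are illustrative only.
EXPECTED_CSV = "Dim1,Met1,Met2\nN1,100,100\nN2,250,101\nN3,75,100\n"

expected = load_csv_columns(EXPECTED_CSV)
```

Values stay as strings here; a real datasource would presumably also need some way to assign types to columns.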
And in their asserts (assuming their expected data is in a query called R and the data being tested is in a query called Q), simply do:
The first assert checks that for all rows where R.Dim1 = Q.Dim1, Q.Met1 is less than R.Met1. The second one asserts that Q.Met2 is within 2% of R.Met2 for all rows where Q.Dim1 = R.Dim1 or R.Dim1 is N3.
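To make the intended semantics concrete, here is a minimal Python sketch of the two asserts described above: join the rows of R and Q with a predicate, then require the check to hold for every joined pair. The function names and the sample rows are hypothetical, and the nested-loop join is only the simplest way to express the idea, not a proposal for how Validatar should implement it.

```python
def rows_where(expected, actual, on):
    """Yield every (r, q) pair of rows for which the join predicate holds."""
    for r in expected:
        for q in actual:
            if on(r, q):
                yield r, q


def failing_pairs(expected, actual, on, check):
    """Return the joined pairs that fail the check; an empty list means pass."""
    return [(r, q) for r, q in rows_where(expected, actual, on) if not check(r, q)]


# Hypothetical expected rows (R) and rows under test (Q).
R = [
    {"Dim1": "N1", "Met1": 100, "Met2": 100},
    {"Dim1": "N2", "Met1": 250, "Met2": 101},
    {"Dim1": "N3", "Met1": 75, "Met2": 100},
]
Q = [
    {"Dim1": "N1", "Met1": 90, "Met2": 101},
    {"Dim1": "N2", "Met1": 240, "Met2": 110},
]

# First assert: Q.Met1 < R.Met1 for all rows where R.Dim1 = Q.Dim1.
met1_failures = failing_pairs(
    R, Q,
    on=lambda r, q: r["Dim1"] == q["Dim1"],
    check=lambda r, q: q["Met1"] < r["Met1"],
)

# Second assert: Q.Met2 within 2% of R.Met2 where Q.Dim1 = R.Dim1 or R.Dim1 is N3.
met2_failures = failing_pairs(
    R, Q,
    on=lambda r, q: q["Dim1"] == r["Dim1"] or r["Dim1"] == "N3",
    check=lambda r, q: abs(q["Met2"] - r["Met2"]) <= 0.02 * abs(r["Met2"]),
)
```

With this sample data the first assert passes for every joined pair, while the second fails wherever Q's N2 row is involved. Returning the failing pairs rather than a bare boolean would let a test report say exactly which dimension values broke.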
I thought about how we would implement this and it seems relatively easy. As for parsing the new assertion syntax, it's the same as our grammar right now, just separated by a new 'on' keyword, so we only need to add a new grammar level. We will need to relax our assertion framework, which currently forces everything to one row.
If you don't provide an 'on' keyword, we will behave as we do now and force everything to one row. This has the nice benefit that nothing changes for existing test files.
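The backward-compatible behaviour could look roughly like the Python sketch below: split an assertion on an optional 'on' keyword and fall back to the single-row mode when it is absent. The concrete syntax shown in the usage lines and the function names are assumptions, and a regex split only stands in for the extra grammar level mentioned above.

```python
import re


def split_assertion(text):
    """Split 'condition on join' into (condition, join); join is None if absent."""
    parts = re.split(r"\s+on\s+", text.strip(), maxsplit=1, flags=re.IGNORECASE)
    condition = parts[0]
    join = parts[1] if len(parts) == 2 else None
    return condition, join


def assertion_mode(text):
    """Vector mode only when an 'on' clause is present; otherwise single-row."""
    _, join = split_assertion(text)
    return "vector" if join is not None else "single-row"


# Hypothetical assertion strings; the exact syntax is illustrative only.
vector_example = split_assertion("Q.Met1 < R.Met1 on R.Dim1 == Q.Dim1")
legacy_example = split_assertion("Q.count > 0")
```

Because an assertion without 'on' yields no join clause, it would follow the current one-row path unchanged.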
The motivation for this comes from some users who contacted us about how they would keep their dimensions and metrics in check as they grow over time.
Thoughts? Comments?