Merge branch 'dev' into feature/kypher/optmatch
chalypso committed Apr 23, 2021
2 parents fe1ea6d + b13bb73 commit 6f0cbd5
Showing 3 changed files with 73 additions and 33 deletions.
87 changes: 61 additions & 26 deletions docs/curate/validate.md
@@ -491,6 +491,10 @@ Data errors reported: 2
 The first data line was flagged because it contained "00" in the day
 field, which violates the ISO 8601 specification.

+The following error message is sent to stderr. The return status is 1.
+
+Errors detected
+
 ### Validate with Verbose Feedback

 Sometimes you may wish to get more feedback about what `kgtk validate` is
@@ -531,6 +535,10 @@ Data lines excluded due to invalid values: 2
 Data errors reported: 2
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+Errors detected
+
 ### Validate Only the Header

 Validate only the header record, ignoring data records:
@@ -561,14 +569,10 @@ This generates the following message on standard output:
 Error: No header line in file
 ~~~

-This also generates the following message on standard error:
+The following error message is sent to stderr. The return status is 1.

 Exiting due to error

-!!! note
-At the present time, the latter error message is not routable
-to standard output.
-
 ### Supply a Missing Header Line

 Validate an empty input file, supplying a header line:
@@ -604,14 +608,10 @@ This generates the following message on standard output:
 Error: No header line to skip
 ~~~

-This also generates the following message on standard error:
+The following error message is sent to stderr. The return status is 1.

 Exiting due to error

-!!! note
-At the present time, this latter error message is not routable
-to standard output.
-
 ### Header Error: Column Name Is Empty

 Validate an input file with an empty column name:
@@ -633,13 +633,10 @@ The following error is reported on standard output:
 In input header ' label node2': Column 0 has an empty name in the file header
 ~~~

-The following message appears on standard error:
+The following error message is sent to stderr. The return status is 1.

 Exit requested

-!!! note
-The `Exit requested` message cannot be routed to standard output at the present time.
-
 ### Header Error: See All Header Errors

 Validate an input file with an empty column name. This will generate an error
@@ -671,6 +668,9 @@ Data lines passed: 0

 Processing continues without exiting.

+!!! note
+No error message is sent to stderr and the return status is 0.
+

 ### Header Error: Column Name Starts with White Space

@@ -701,6 +701,10 @@ Data lines read: 0
 Data lines passed: 0
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+xxx
+
 ### Header Error: Column Name Ends with White Space

 Validate an input file where the intended `node1`, `label`, and `node2`
@@ -730,6 +734,10 @@ Data lines read: 0
 Data lines passed: 0
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+xxx
+
 ### Header Error: Column Name Contains Internal White Space

 Validate an input file where the intended `node1` and `node2`
@@ -755,6 +763,10 @@ Data lines read: 0
 Data lines passed: 0
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+xxx
+
 ### Header Error: Column Name Contains a Comma (`,`)

 Validate an input file where the intended `node1`, `label`, and `node2`
@@ -784,6 +796,10 @@ Data lines read: 0
 Data lines passed: 0
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+xxx
+
 ### Header Error: Column Name Contains a Vertical Bar (`|`)

 Validate an input file where the intended `node1`, `label`, and `node2`
@@ -806,6 +822,10 @@ Data lines read: 0
 Data lines passed: 0
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+xxx
+
 ### Header Error: Column Name Is a Duplicate

 Validate an input file with two `node1` columns instead of
@@ -828,7 +848,7 @@ The following error is reported on standard output:
 In input header 'node1 label node1 id': Column 2 (node1) is a duplicate of column 0
 ~~~

-The following is reported on standard error:
+The following error message is sent to stderr. The return status is 1.

 Exit requested

@@ -856,7 +876,7 @@ The following error is reported on standard output:
 In input header 'col1 col2 col3': Missing required column: id | ID
 ~~~

-The following is reported on standard error:
+The following error message is sent to stderr. The return status is 1.

 Exit requested

@@ -884,7 +904,7 @@ The following error is reported on standard output:
 In input header 'col1 col2 col3': Missing required column: node1 | from | subject
 ~~~

-The following is reported on standard error:
+The following error message is sent to stderr. The return status is 1.

 Exit requested

@@ -912,7 +932,7 @@ The following error is reported on standard output:
 In input header 'col1 col2 col3': Missing required column: id | ID
 ~~~

-The following is reported on standard error:
+The following error message is sent to stderr. The return status is 1.

 Exit requested

@@ -963,7 +983,7 @@ The following error is reported on standard output:
 In input header 'node1 label node2 id from': Ambiguous required column names node1 and from
 ~~~

-The following is reported on standard error:
+The following error message is sent to stderr. The return status is 1.

 Exit requested

@@ -1683,10 +1703,10 @@ kgtk cat -i examples/docs/validate-strings.tsv
 | node1 | label | node2 |
 | -- | -- | -- |
 | line1 | invalid | "xxx |
-| line2 | valid | "xxx\"yyy" |
+| line2 | valid | "xxx\\"yyy" |
 | line3 | invalid | "xxx"yyy" |
-| line4 | valid | "xxx\\yyy" |
-| line5 | valid | "xxx\tyyy" |
+| line4 | valid | "xxx\\\\yyy" |
+| line5 | valid | "xxx\\tyyy" |

 ```bash
 kgtk validate -i examples/docs/validate-strings.tsv
@@ -1721,10 +1741,10 @@ kgtk cat -i examples/docs/validate-lax-strings.tsv
 | node1 | label | node2 |
 | -- | -- | -- |
 | line1 | invalid | "xxx |
-| line2 | valid | "xxx\"yyy" |
+| line2 | valid | "xxx\\"yyy" |
 | line3 | valid | "xxx"yyy" |
-| line4 | valid | "xxx\\yyy" |
-| line5 | valid | "xxx\tyyy" |
+| line4 | valid | "xxx\\\\yyy" |
+| line5 | valid | "xxx\\tyyy" |


 ```bash
@@ -1782,7 +1802,7 @@ kgtk cat -i examples/docs/validate-language-qualified-strings.tsv
 | node1 | label | node2 |
 | -- | -- | -- |
 | line1 | valid | 'abc'@en |
-| line2 | valid | 'a\'bc'@en |
+| line2 | valid | 'a\\'bc'@en |
 | line3 | invalid | 'a'bc'@en |
 | line4 | invalid | 'abc'@en-gb |
 | line5 | invalid | 'abc'@xxx |
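
Review note: the table rows changed in the three hunks above differ only in that each backslash is doubled, presumably so that a Markdown renderer displays a literal backslash. A minimal sketch of that transformation follows. The helper name `escape_backslashes` is hypothetical and is not claimed to be how KGTK produces these tables.

```python
def escape_backslashes(cell: str) -> str:
    """Double every backslash so Markdown renders it literally."""
    return cell.replace("\\", "\\\\")


# Cell values copied from the old side of the hunks above.
for old in [r'"xxx\"yyy"', r'"xxx\\yyy"', r'"xxx\tyyy"', r"'a\'bc'@en"]:
    print(old, "->", escape_backslashes(old))
```

Cells without backslashes pass through unchanged, which matches the untouched rows in the hunks.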
@@ -2312,6 +2332,10 @@ Data lines excluded due to invalid values: 6
 Data errors reported: 6
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+Errors detected
+
 ### Value Check: Change the Minimum Valid Year

 Suppose we want to exclude all dates before the year 1000.
@@ -2362,6 +2386,10 @@ Data lines excluded due to invalid values: 5
 Data errors reported: 5
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+Errors detected
+
 ### Value Check: Clamp the Minimum Valid Year

 Suppose we want to validate all records, converting any negative
@@ -2484,6 +2512,10 @@ Data lines excluded due to invalid values: 5
 Data errors reported: 5
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+Errors detected
+
 ### Value Check: Changing the Maximum Valid Year

 Let's change the maximum valid year to 9999:
@@ -2524,6 +2556,9 @@ Data lines excluded due to invalid values: 2
 Data errors reported: 2
 ~~~

+The following error message is sent to stderr. The return status is 1.
+
+Errors detected

 ### Value Check: Changing the Maximum Valid Year #2
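
Review note: the documentation added in this file states that a failed validation writes a short message to stderr and returns status 1. A caller might check both from Python roughly as sketched below. The helper name `run_and_capture` is hypothetical, and the command being run is a stand-in that imitates the documented behavior rather than an actual `kgtk` invocation.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return (return status, stripped stderr text)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


# Stand-in for a failing validation (an assumption for illustration):
# it writes the documented message to stderr and exits with status 1.
stand_in = [
    sys.executable,
    "-c",
    "import sys; print('Errors detected', file=sys.stderr); sys.exit(1)",
]

status, message = run_and_capture(stand_in)
print(status, message)
```

To try this against the real tool, the stand-in list could be swapped for a command such as `["kgtk", "validate", "-i", "examples/docs/validate-strings.tsv"]`, assuming `kgtk` is installed and on the path.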

8 changes: 4 additions & 4 deletions kgtk/cli/import_wikidata.py
@@ -1845,8 +1845,8 @@ def process(self, line):
 if len(nrows) > 0 or \
    len(erows) > 0 or \
    len(qrows) > 0 or \
-   len(invalid_erows) > 0 or \
-   len(invalid_qrows) > 0 or \
+   (invalid_erows is not None and len(invalid_erows) > 0) or \
+   (invalid_qrows is not None and len(invalid_qrows) > 0) or \
    len(description_erows) > 0 or \
    len(sitelink_erows) > 0:
     if collect_results:
@@ -1945,11 +1945,11 @@ def process(self, line):
 if skip_validation or validate(row, "detailed qual uncollected"):
     self.qual_wr.writerow(row)

-if invalid_edge_file:
+if invalid_edge_file and invalid_erows is not None:
     for row in invalid_erows:
         self.invalid_edge_wr.writerow(row)

-if invalid_qual_file:
+if invalid_qual_file and invalid_qrows is not None:
     for row in invalid_qrows:
         self.invalid_qual_wr.writerow(row)
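
Review note: the added guards suggest that `invalid_erows` and `invalid_qrows` can be `None`, in which case `len(...)` raises `TypeError` and iterating fails the same way. A minimal sketch of the guard pattern follows. The helper names `has_rows` and `write_rows` are hypothetical, and a plain list stands in for the writer objects.

```python
from typing import List, Optional


def has_rows(rows: Optional[List[list]]) -> bool:
    """Return True only for a non-empty list; None counts as empty."""
    return rows is not None and len(rows) > 0


def write_rows(rows: Optional[List[list]], sink: List[list]) -> int:
    """Append each row to sink, skipping a None list; return rows written."""
    if rows is None:
        return 0
    for row in rows:
        sink.append(row)
    return len(rows)


sink: List[list] = []
print(has_rows(None), has_rows([]), has_rows([["Q1", "P31", "Q5"]]))
print(write_rows(None, sink), write_rows([["Q1", "P31", "Q5"]], sink))
```

Collecting the check in one helper would avoid repeating the `is not None and len(...) > 0` expression, though whether that fits the surrounding code is a judgment call for the author.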

11 changes: 8 additions & 3 deletions kgtk/cli/validate.py
@@ -107,6 +107,7 @@ def run(input_files: KGTKFiles,
 print ("Validating from stdin", file=error_file, flush=True)

 kr: typing.Optional[KgtkReader] = None
+error_return: int = 1
 try:
     kr = KgtkReader.open(kgtk_file,
                          error_file=error_file,
@@ -140,8 +141,12 @@ def run(input_files: KGTKFiles,
 if report_summary or verbose:
     print("\n====================================================", file=error_file, flush=True)
     kr.report_summary()
-
-kr.close()
-
-return 0
+if kr.data_errors_reported == 0:
+    error_return = 0
+kr.close()
+
+if error_return == 0:
+    return 0
+else:
+    raise KGTKException("Errors detected")
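
Review note: the new control flow starts from a failing status, clears it only when the reader reports zero data errors, and raises otherwise. A self-contained sketch of that logic follows. `FakeReader` and `ValidationError` are hypothetical stand-ins for `KgtkReader` and `KGTKException`, used only so the example runs without KGTK installed.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for KGTKException."""


class FakeReader:
    """Hypothetical stand-in for KgtkReader with only the fields used here."""

    def __init__(self, data_errors_reported: int) -> None:
        self.data_errors_reported = data_errors_reported
        self.closed = False

    def close(self) -> None:
        self.closed = True


def finish(reader: FakeReader) -> int:
    """Return 0 when no data errors were reported, otherwise raise."""
    error_return = 1  # assume failure until the reader shows otherwise
    if reader.data_errors_reported == 0:
        error_return = 0
    reader.close()  # the reader is closed on both paths
    if error_return == 0:
        return 0
    raise ValidationError("Errors detected")


print(finish(FakeReader(0)))
try:
    finish(FakeReader(2))
except ValidationError as exc:
    print(exc)
```

Defaulting `error_return` to 1 means any path that never reaches the zero-error check is treated as a failure, which appears to be the intent of the change.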
