This repository has been archived by the owner on Nov 8, 2021. It is now read-only.

Add csv encoding option and fix a few spelling errors in example
villebro committed Nov 6, 2019
1 parent 7d14237 commit 046f826
Showing 3 changed files with 9 additions and 6 deletions.
6 changes: 3 additions & 3 deletions example/README.md
@@ -79,12 +79,12 @@ echo "select * from fact_customer_dq;" | sqlite3 --column --header target.db
 |report_date|customer_id|rowid|source|priority|category|column_name|message|
 | --- | --- | --- | --- | --- | --- | --- | --- |
 2019-06-30|Terminator|2019-11-06 05:59:52.380884|source|high|incorrect|blood_group|Invalid blood group: Liquid Metal
-2019-06-30|Terminator|2019-11-06 05:59:52.380825|transform|medium|missing|age|Age is undefined due to undefined
+2019-06-30|Terminator|2019-11-06 05:59:52.380825|transform|medium|missing|age|Age is undefined due to undefined birthdate
 2019-06-30|Terminator|2019-11-06 05:59:52.380767|source|high|incorrect|birthdate|Birthdate in future: 2095-01-01
 2019-06-30|Peter Impossible|2019-11-06 05:59:52.380575|source|high|incorrect|blood_group|Invalid blood group: X+
-2019-06-30|Peter Impossible|2019-11-06 05:59:52.380516|transform|medium|missing|age|Age is undefined due to undefined's blood group table
+2019-06-30|Peter Impossible|2019-11-06 05:59:52.380516|transform|medium|missing|age|Age is undefined due to undefined birthdate
 2019-06-30|Peter Impossible|2019-11-06 05:59:52.380459|source|high|incorrect|birthdate|Cannot parse birthdate: 1980-13-01
-2019-06-30|Mary Null|2019-11-06 05:59:52.380341|source|medium|missing|blood_group|Blood group undefined in customer's blood group table
+2019-06-30|Mary Null|2019-11-06 05:59:52.380341|source|medium|missing|blood_group|Blood group undefined in customer blood group table
 2019-06-30|Mary Null|2019-11-06 05:59:52.380280|transform|medium|missing|age|Age is undefined due to undefined birthdate
 2019-06-30|Mary Null|2019-11-06 05:59:52.380219|source|medium|missing|birthdate|Missing birthdate
 2019-06-30|John Connor|2019-11-06 05:59:52.378454|transform|medium|missing|age|Age is undefined due to undefined birthdate
2 changes: 1 addition & 1 deletion example/tasks/fact_customer_task.py
@@ -146,7 +146,7 @@ def transform(self) -> None:
                     source=dq.Source.SOURCE,
                     priority=dq.Priority.MEDIUM,
                     category=dq.Category.MISSING,
-                    message="Blood group undefined in customer's blood group table"
+                    message="Blood group undefined in customer blood group table"
                 )
             elif not valid_blood_group:
                 row.log_dq(
7 changes: 5 additions & 2 deletions sqltask/sources/csv.py
@@ -13,20 +13,23 @@ class CsvRowSource(BaseRowSource):
     def __init__(self,
                  file_path: str,
                  name: Optional[str] = None,
-                 delimiter: str = ","):
+                 delimiter: str = ",",
+                 encoding: Optional[str] = None):
         """
         :param name: name of data source.
         :param file_path: path to the csv file.
         :param delimiter: csv file delimiter.
+        :param encoding: Character encoding of csv file.
         """
 
         super().__init__(name)
         self.file_path = file_path
         self.delimiter = delimiter
+        self.encoding = encoding
 
         # populate column names
         self.columns: List[str] = []
-        with open(self.file_path, newline="") as csvfile:
+        with open(self.file_path, newline="", encoding=encoding) as csvfile:
             csvreader = csv.reader(csvfile, delimiter=self.delimiter)
             row = next(csvreader)
             for column in row:
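To illustrate what the new `encoding` argument changes, here is a minimal standalone sketch of the header-reading step from the hunk above. The helper `read_csv_columns` and the temporary Latin-1 file are hypothetical and exist only for this illustration; they are not part of sqltask. The sketch relies only on the standard library: `open()` falls back to the platform default encoding when `encoding` is `None`, which is the behaviour before this commit.

```python
import csv
import os
import tempfile
from typing import List, Optional


def read_csv_columns(file_path: str,
                     delimiter: str = ",",
                     encoding: Optional[str] = None) -> List[str]:
    """Return the header row of a csv file.

    Hypothetical helper mirroring the column-name lookup in the diff;
    it is not part of the sqltask API.
    """
    # encoding=None makes open() use the platform default encoding
    with open(file_path, newline="", encoding=encoding) as csvfile:
        csvreader = csv.reader(csvfile, delimiter=delimiter)
        return next(csvreader)


# Write a small Latin-1 file whose header contains a non-ASCII character
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="latin-1") as tmp:
    tmp.write("n\u00e4me;birthdate\nMary;1980-01-01\n")
    tmp_path = tmp.name

# Passing the matching encoding decodes the header correctly
columns = read_csv_columns(tmp_path, delimiter=";", encoding="latin-1")
os.remove(tmp_path)
```

Reading the same file without `encoding="latin-1"` on a system whose default is UTF-8 would typically raise `UnicodeDecodeError`, which is the situation the new option is presumably meant to address.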

0 comments on commit 046f826
