xls2csv #1480
e6c54e6 to 0bd0424
retriever/lib/engine_tools.py
Outdated
dire = book.rstrip(".xlsx")
if not os.path.exists(dire):
    os.makedirs(dire)
else:
Is this `else` required?
I was only using it to avoid the "path already exists" error. I'll remove it and pass `exist_ok=True` instead.
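A minimal sketch of that change, assuming the output directory is named after the workbook. The helper name `make_output_dir` is hypothetical and not part of the PR. `os.path.splitext` is swapped in for `rstrip(".xlsx")` here, since `str.rstrip` removes a set of characters rather than a suffix.

```python
import os
import tempfile


def make_output_dir(book):
    """Create (if needed) and return a directory named after the workbook."""
    # splitext drops only the final extension; rstrip(".xlsx") would also
    # strip trailing '.', 'x', 'l' or 's' characters from the base name.
    dire, _ext = os.path.splitext(book)
    # exist_ok=True turns the call into a no-op when the directory already
    # exists, so no if/else guard is needed.
    os.makedirs(dire, exist_ok=True)
    return dire


# Usage: calling it twice on the same workbook path does not raise.
with tempfile.TemporaryDirectory() as tmp:
    book = os.path.join(tmp, "sales.xlsx")
    first = make_output_dir(book)
    second = make_output_dir(book)
```

With this shape the `if not os.path.exists(...)` check and the empty `else` branch both become unnecessary.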
retriever/lib/engine_tools.py
Outdated
    pass
os.chdir(dire)
res = len(workbook.sheet_names())
for sheet in range(0, res):
I think we can omit the `0` and just use `range(res)`.
👍
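For reference, a small sketch of why the two forms are equivalent. The `sheet_names` list is a stand-in for the result of `workbook.sheet_names()`, and the `enumerate` variant is only a possible alternative, not something proposed in the PR.

```python
# Stand-in for workbook.sheet_names(); the real names come from the workbook.
sheet_names = ["Sheet1", "Sheet2", "Sheet3"]
res = len(sheet_names)

# range() starts at 0 by default, so both forms visit the same indices.
explicit = list(range(0, res))
implicit = list(range(res))

# enumerate yields the index and the name together, which avoids the
# separate length lookup when both are needed inside the loop.
indexed = [(i, name) for i, name in enumerate(sheet_names)]
```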
retriever/lib/engine_tools.py
Outdated
    for index in range(worksheet.nrows):
        df.loc[len(df)] = worksheet.row_values(1)
    table_name = workbook.sheet_names()
    df.to_csv(table_name[sheet] + '.csv', index_label='index')
We can make use of Python f-strings here.
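A short sketch of the f-string form next to the original concatenation. The `table_name` list and `sheet` index are stand-ins for the values used in the PR.

```python
# Stand-ins for workbook.sheet_names() and the current loop index.
table_name = ["plots", "species"]
sheet = 1

# Original style: string concatenation.
concatenated = table_name[sheet] + '.csv'

# f-string style: the expression is interpolated inside the braces.
formatted = f"{table_name[sheet]}.csv"
```

Both expressions build the same file name; the f-string keeps the extension and the variable in one literal.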
retriever/lib/engine_tools.py
Outdated
        df.loc[len(df)] = worksheet.row_values(1)
    table_name = workbook.sheet_names()
    df.to_csv(table_name[sheet] + '.csv', index_label='index')
os.chdir("..")
Is there a cleaner way to do this? I am not sure whether this will always be compatible across operating systems, though it may be. If we could pass an absolute path to `os.chdir()` somehow, that would be great.
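One possible way to sidestep the question is to never change the working directory and instead build each output path with `os.path.join`. The sketch below assumes the sheets are already available as a mapping of sheet name to rows; the function name `write_sheets` and that input shape are hypothetical, and the standard-library `csv` module stands in for the `DataFrame.to_csv` call used in the PR.

```python
import csv
import os
import tempfile


def write_sheets(out_dir, sheets):
    """Write each sheet (name -> list of rows) to <out_dir>/<name>.csv."""
    # Resolve the target once; every file path is built from it, so the
    # process working directory is never touched.
    out_dir = os.path.abspath(out_dir)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name, rows in sheets.items():
        # os.path.join picks the right separator for the current OS.
        path = os.path.join(out_dir, f"{name}.csv")
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written


# Usage with made-up data: the working directory is the same before and after.
with tempfile.TemporaryDirectory() as tmp:
    before = os.getcwd()
    paths = write_sheets(os.path.join(tmp, "book"), {"plots": [["id"], [1]]})
    after = os.getcwd()
```

Because no `os.chdir` call is made, there is nothing to undo afterwards and no dependence on what `".."` resolves to.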
@ashishpriyadarshiCIC Any updates on this?
@@ -162,6 +164,21 @@ def xml2csv(input_file, outputfile=None, header_values=None, row_tag="row"):
    return outputfile


def xlsxcsv(book):
The name should be `xlsx2csv` to be consistent with the other functions. It is also worth adding some comments to the code here.
Hi @apoorvaeternity, I have moved all the work on the XLSX-to-CSV conversion function to #1506.
Ref: #1506
xls to CSV conversion function