Etn/fix schedule name merge bug #34

Merged
ed-nykaza merged 2 commits into master from etn/fix-scheduleName-merge-bug
Oct 29, 2018

@ed-nykaza
Contributor

@jameno please review this bug fix when you get a chance. cc: @pazaan, I assume you will want to be aware of this.

This fixes a bug where the scheduleNames were being appended to data types that did not have that data, because the merge was happening on "time" instead of "id". In other words, if another data type occurred at the same "time" as the scheduleName, the schedule data was appended to that row of data as well.
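
To illustrate the behavior described above, here is a minimal sketch with made-up data. The row values, the "type" column, and the schedule names are hypothetical, chosen only so that two different data types share one timestamp. It is not taken from the actual dataset or script.

```python
import pandas as pd

# Hypothetical data: a basal row and a cbg row share the same "time".
df = pd.DataFrame({
    "id": ["a1", "b2", "c3"],
    "type": ["basal", "cbg", "basal"],
    "time": ["2018-10-29T12:00:00", "2018-10-29T12:00:00", "2018-10-29T13:00:00"],
})

# Schedule names exist only for the two basal rows (ids a1 and c3).
scheduleName = "scheduleName"
scheduleNameDataFrame = pd.DataFrame({
    "id": ["a1", "c3"],
    "time": ["2018-10-29T12:00:00", "2018-10-29T13:00:00"],
    scheduleName: ["Standard", "Weekend"],
})

# Merging on "time": the cbg row (b2) also picks up a schedule name,
# because it shares a timestamp with basal row a1.
merged_on_time = pd.merge(
    df, scheduleNameDataFrame.loc[:, ["time", scheduleName]], how="left", on="time"
)

# Merging on "id": only the rows that own schedule data receive it;
# the cbg row gets NaN.
merged_on_id = pd.merge(
    df, scheduleNameDataFrame.loc[:, ["id", scheduleName]], how="left", on="id"
)

print(merged_on_time)
print(merged_on_id)
```

In this toy case the "time" merge gives row b2 a schedule name it never had, while the "id" merge leaves it empty. This assumes "id" is unique per row.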
@ed-nykaza ed-nykaza requested a review from jameno October 29, 2018 15:27
Contributor

@jameno jameno left a comment

I added a suggestion to use the .csv format for retrieving column headers instead of loading the .json files. It is not a necessary change to complete this pull request.

  # drop and reattach the new data
  df = df.drop(columns=scheduleName)
- df = pd.merge(df, scheduleNameDataFrame.loc[:, ["time", scheduleName]], how="left", on="time")
+ df = pd.merge(df, scheduleNameDataFrame.loc[:, ["id", scheduleName]], how="left", on="id")
Contributor

Looks good!

fileSize = os.stat(jsonFileName).st_size
if fileSize > 1000:
    i = i + 1
    data = td.load.load_json(jsonFileName)
Contributor

If this script ran on .csv versions of the files instead, all column headers would be on the first line and could be read directly with:

# Read the first line of the file
with open(csvFileName, 'r') as f:
    header = f.readline()

This would make it easier to loop through files without having to load in the entire JSON file each time.
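
A slightly fuller sketch of this idea, assuming the files are ordinary comma-separated exports. The helper name `read_csv_header` and the example file contents are hypothetical. Using the standard-library `csv` reader splits the first line into individual column names and handles quoted fields.

```python
import csv
import os
import tempfile


def read_csv_header(csvFileName):
    """Return the column names from the first row of a CSV file.

    Only the first row is parsed; the rest of the file is never loaded.
    """
    with open(csvFileName, "r", newline="") as f:
        return next(csv.reader(f), [])


# Usage with a small throwaway file (contents are made up for illustration).
with tempfile.TemporaryDirectory() as tmpDir:
    csvFileName = os.path.join(tmpDir, "example.csv")
    with open(csvFileName, "w", newline="") as f:
        f.write("id,time,type,scheduleName\n")
        f.write("a1,2018-10-29T12:00:00,basal,Standard\n")

    columnHeads = read_csv_header(csvFileName)
    print(columnHeads)
```

An empty file yields an empty list rather than raising, so a loop over many files can skip those without special handling.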

@ed-nykaza ed-nykaza merged commit d08970f into master Oct 29, 2018
@ed-nykaza ed-nykaza deleted the etn/fix-scheduleName-merge-bug branch October 29, 2018 19:31