Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SG: don't fail when unable to OCR #1281

Merged
merged 5 commits into from Apr 3, 2018

Conversation

jarek
Copy link
Collaborator

@jarek jarek commented Mar 31, 2018

Solar production is a very small part of Singapore electricity, so when it can't be retrieved, just log warning and move on.

Add the OCR'd text to log message to possibly aid debugging.

Currently the image is stuck on data from ~12 hours ago. The parser has code to discard too old data, but unfortunately the time 11:44 confuses terract, which reads it as 2018-03-31 1 1 :44. Maybe the regexp could be improved to handle that, but in meanwhile, fall through and log better.

Solar production is a very small part of Singapore electricity, so when
it can't be retrieved, just log warning and move on.

Add the OCR'd text to log message to possibly aid debugging.

Currently OCR is stuck on parsing the datetime "2018-03-31 11:44"
which it reads as "2018-03-31 1 1 :44". Maybe the regexp could be
improved to handle that, but in meanwhile, log better.
@jarek jarek requested a review from systemcatch March 31, 2018 16:29
Copy link
Collaborator

@systemcatch systemcatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful improvements!

@@ -1,11 +1,12 @@
#!/usr/bin/env python3

from collections import defaultdict
import logging
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, since this only used once maybe it should be from logging import getLogger.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is a style thing (there is no perf difference). My very personal reason for liking logging.getLogger() is because I don't have to scroll up to see if getLogger was imported and from which module, or maybe a function somewhere within this file. To give an extreme example, if we were to do from re import search, it's impossible for someone diving straight into the code to guess what search is without reading all imports.

I don't think we have particular consistency in the codebase on this right now. At the same time I myself did the from collections import defaultdict right above :D

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current codebase: import logging 9 times including SG (and in all cases it is only used for getLogger), from logging import getLogger 7 times. So if I change it, it'll be a tie. Do we have a style guide? :D

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I thought there was a difference, never mind then it doesn't matter. If there was an EM style guide it would simply read "Chaos is good". 😈

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Carbon is bad, chaos is good"

black_white = gray.point(threshold_filter, '1')

text = image_to_string(black_white, lang='eng')

pattern = r'Est. PV Output: (.*)MWac'
val = re.search(pattern, text, re.MULTILINE).group(1)
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better handling, we shouldn't throw away data just because solar is not working.

parsers/SG.py Outdated
@@ -216,12 +219,16 @@ def fetch_production(zone_key='SG', session=None, target_datetime=None, logger=N
'hydro': 0
})

source = 'emcsg.com'
if generation_by_type['solar']:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea.

Copy link
Collaborator Author

@jarek jarek Mar 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it's too clever. When there is no solar production, the data that solar is 0 also comes from EMA...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe both should be included by default?

@corradio corradio requested a review from maxbellec April 1, 2018 09:37
@jarek
Copy link
Collaborator Author

jarek commented Apr 3, 2018

I am on mobile next couple of days, please merge if you're happy with it :)

@systemcatch systemcatch merged commit 7cec43e into electricitymaps:master Apr 3, 2018
@jarek jarek deleted the SG_handle_solar branch April 13, 2018 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants