DK real-time data #1747

Merged
merged 13 commits into tmrowco:master on Feb 14, 2019

4 participants
@tmslaine
Contributor

commented Jan 19, 2019

Changes Danish production and exchange data sources to https://www.energidataservice.dk/en/group/production-and-consumption

Estimates generation by fuel for the window from 2 hours ago to 5 minutes ago
by linear regression of the 5-minute-frequency small-scale and large-scale production against
recently reported hourly fuel data. When fuel data becomes available, the estimate is corrected.

tmslaine added some commits Jan 19, 2019

Real-time parser for DK-DK1, DK-DK2
Implements 5-min frequency real-time generation and exchange parsers for Danish bidding zones DK-DK1 and DK-DK2.
For each of coal, gas, oil, biomass, unknown and hydro, real-time generation is estimated from recent hourly data by
linear regression, and scaled to hourly values when these become available.
Change DK to real-time values
Change DK-DK1, DK-DK2 to use real-time values
Change DK to real-time
Change DK-DK1 and DK-DK2 to use the 5-minute frequency real-time exchange data reported by Energinet

@systemcatch systemcatch requested a review from brunolajoie Jan 21, 2019

@brunolajoie

Collaborator

commented Jan 21, 2019

Thanks @tmslaine, I'll test this locally and compare with the existing data.
Can you explain your last sentence about linear regression a bit more?

@tmslaine

Contributor Author

commented Jan 22, 2019

Getting real-time production by type is a bit ambitious, since the 5-minute source does not report it. Instead, it reports wind, solar, >100MW production and <100MW production.

The hourly data reports wind, solar, hydro, biomass, waste, oil, gas and coal.
The parser estimates each of hydro, bio (= biomass + 50% of waste), oil, gas, coal and unknown (= 50% of waste) as a weighted sum of >100MW and <100MW production.

To do this, it

  1. fetches production data df from the 5-minute frequency data source for the past N hours (N=8);
  2. takes the hourly means df_1h of the 5-minute data df;
  3. fetches production by type for the past N hours;
  4. joins the production-by-type data with the hourly means df_1h by timestamp;
  5. for each of the fuels and hydro in the resulting hourly dataframe, finds the weights a and b
     that solve aA + bB = F in the least-squares sense, where
     A := ">100MW prod.",
     B := "<100MW prod.",
     F := "production from the fuel in question (or hydro)",
     and applies a and b back to the ">100MW prod." and "<100MW prod." columns of the original 5-minute data df from step 1 to estimate the real-time (5-minute frequency) production from that fuel.
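Steps 2, 4 and 5 above can be sketched with pandas and NumPy roughly as follows. This is a minimal sketch, not the code from this PR: the column names "large" (>100MW prod.) and "small" (<100MW prod.), the function names and the synthetic data are all hypothetical, and the fetching in steps 1 and 3 is replaced by generated values.

```python
import numpy as np
import pandas as pd


def fit_fuel_weights(df_5min, fuel_hourly):
    """Least-squares fit of a, b in a*A + b*B = F on hourly data.

    df_5min: 5-minute DataFrame with hypothetical columns "large" (>100MW prod.)
             and "small" (<100MW prod.), indexed by timestamp.
    fuel_hourly: hourly Series F for one fuel (or hydro), indexed by timestamp.
    """
    # Step 2: hourly means df_1h of the 5-minute data
    df_1h = df_5min[["large", "small"]].resample("60min").mean()
    # Step 4: join on timestamp, keeping only hours present in both sources
    joined = df_1h.join(fuel_hourly.rename("fuel"), how="inner").dropna()
    # Step 5: minimise ||X @ [a, b] - F||^2 with X = [A, B]
    X = joined[["large", "small"]].to_numpy()
    F = joined["fuel"].to_numpy()
    (a, b), *_ = np.linalg.lstsq(X, F, rcond=None)
    return float(a), float(b)


def estimate_fuel_5min(df_5min, a, b):
    """Apply the fitted weights back to the original 5-minute data."""
    return a * df_5min["large"] + b * df_5min["small"]


# Usage with synthetic data: N = 8 hours of 5-minute values
rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-19 00:00", periods=8 * 12, freq="5min", tz="UTC")
df = pd.DataFrame(
    {"large": rng.uniform(1000, 2000, len(idx)), "small": rng.uniform(300, 600, len(idx))},
    index=idx,
)
hourly = df.resample("60min").mean()
gas_hourly = 0.3 * hourly["large"] + 0.1 * hourly["small"]  # made-up "true" weights
a, b = fit_fuel_weights(df, gas_hourly)
gas_5min = estimate_fuel_5min(df, a, b)
```

Because the hourly mean of a*A + b*B equals a*mean(A) + b*mean(B), fitting on hourly means and applying the weights to the 5-minute rows is consistent; with the synthetic data above the fit recovers the made-up weights 0.3 and 0.1.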

In addition, for each hour that has fuel data,
the parser scales the sum of ">100MW prod." and "<100MW prod."
in the 5-minute data df so that it adds up to the realized production of each of the fuels and hydro.
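One possible reading of this correction step is sketched below: for every hour with reported fuel data, the 5-minute estimate for that fuel is multiplied by a single factor so that its hourly mean matches the reported value. This is an assumption-laden sketch, not the PR's code: it assumes the hourly figures are average MW over the hour, that the hourly estimates are non-zero, and the function name is hypothetical.

```python
import pandas as pd


def rescale_to_hourly(est_5min, fuel_hourly):
    """Scale a fuel's 5-minute estimate so each reported hour matches exactly.

    Assumes fuel_hourly holds hourly average MW, so the mean of the 5-minute
    values in an hour should equal the reported value. Hours without reported
    data keep the regression estimate (factor 1). Assumes non-zero estimates.
    """
    hour = est_5min.index.floor("60min")               # hour bucket of each 5-min row
    est_hourly = est_5min.resample("60min").mean()     # hourly mean of the estimate
    factor = (fuel_hourly / est_hourly).reindex(hour)  # one factor per 5-min row
    return est_5min * factor.fillna(1.0).to_numpy()


# Usage: two hours of estimates, only the first hour has reported fuel data
idx = pd.date_range("2019-01-19 00:00", periods=24, freq="5min", tz="UTC")
est = pd.Series([100.0] * 12 + [200.0] * 12, index=idx)
reported = pd.Series([150.0], index=pd.DatetimeIndex(["2019-01-19 00:00"], tz="UTC"))
corrected = rescale_to_hourly(est, reported)
```

A multiplicative factor keeps the intra-hour shape of the regression estimate while forcing the hourly total to agree with the reported figure.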

If this seems too complicated, even implementing only the exchange part is probably an improvement.

@jarek

Collaborator

commented Jan 23, 2019

IMHO I would lean towards reporting only what we know for sure rather than trying to gain precision through estimation. The estimate would be more useful in analysis code than in data collection code.

To give a dumb example: if we have a region that reports its production in hourly increments, having a parser estimate wind production for 10:30 by calculating the mean of the production reported at 10:00 and at 11:00 is not useful IMO.

@corradio

Member

commented Jan 24, 2019

Hi @tmslaine,

Thank you for the very extensive work. Our philosophy at electricityMap is indeed to stay as close to the data as possible by limiting interpolation/estimation.
However, the updated data source for the exchange and production data is certainly welcome.
What I recommend:

  • using the newest exchange data
  • using the hourly mix data
  • comment out / disable the 5min data, but maybe keep it coded in case we want to switch to it later on

How does that sound?
Furthermore, I don't understand the 50% waste assumption. Can you maybe explain that a bit more, or point to a document that does?

Thanks!

Olivier

@tmslaine

Contributor Author

commented Jan 24, 2019

@corradio
That sounds good, I will make the changes and update the PR.

Dividing waste into 50% renewable, 50% fossil is just a rule of thumb.
For example https://ens.dk/sites/ens.dk/files/Statistik/int.reporting_2016.xls
shows a split closer to 55%/45% between MunWaste Renewable and MunWaste Non-renewable.
The historical split of waste into fossil and renewable is available for all EU countries in Eurostat table nrg_105a. If I recall correctly, the electricityMap ENTSO-E parser maps all waste to biomass.

There are still at least two open questions before merging:

  1. Should the production data source be changed from ENTSO-E to the new source? At a quick glance,
    the local source reports one more decimal place, but otherwise the values are the same as those reported
    by the ENTSO-E Transparency Platform.
  2. Should the mapping of Waste be changed to 100% biomass?

tmslaine added some commits Jan 24, 2019

Remove calculations, improve DK-DK2->SE exchange
Removes estimation of real-time generation by fuel.
Subtracts DK-BHM->SE-SE4 from the DK-DK2->SE-SE4 exchange
as DK-BHM is reported separately in electricityMap.
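The adjustment in this commit is a per-timestamp subtraction. A minimal sketch, assuming both series are in MW with positive values meaning flow towards SE-SE4, and that the reported DK-DK2->SE-SE4 figure includes the Bornholm (DK-BHM) flow; the function name is hypothetical and timestamps missing from either series are dropped rather than guessed.

```python
import pandas as pd


def dk2_to_se4_net(total_dk2_se4, bhm_se4):
    """DK-DK2->SE-SE4 exchange with the DK-BHM->SE-SE4 part removed (MW).

    Assumes both Series are indexed by timestamp and share the same sign
    convention (positive = flow towards SE-SE4).
    """
    # Align on timestamp; drop times where either value is missing
    aligned = pd.concat({"total": total_dk2_se4, "bhm": bhm_se4}, axis=1).dropna()
    return aligned["total"] - aligned["bhm"]


# Usage
idx = pd.DatetimeIndex(["2019-01-24 00:00", "2019-01-24 00:05"], tz="UTC")
total = pd.Series([500.0, 300.0], index=idx)
bhm = pd.Series([40.0, -20.0], index=idx)
net = dk2_to_se4_net(total, bhm)
```

Dropping unmatched timestamps avoids silently reporting the gross DK-DK2 figure whenever the Bornholm value is late.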
@corradio

Member

commented Jan 25, 2019

I'd say we follow ENS and add a clear comment/reference in the code.
I'd use your parser for the production data as it is closer to the data source.
What do you think @brunolajoie ?

@brunolajoie

Collaborator

commented Jan 28, 2019

  • I'd say we follow ENS and add a clear comment/reference in the code.
    Let's do this! We can safely assume that the ratio between renewable and non-renewable waste should remain approximately constant within a year. Let's write it down clearly in the code so that we'll be able to update this ratio every year or so.

  • I'd use your parser for the production data as it is closer to the data source.
    Agree

@corradio

Member

commented Feb 2, 2019

@brunolajoie are we waiting for anything?

tmslaine and others added some commits Feb 2, 2019

Change waste reporting ratio
Report 45% of waste under unknown (fossil share of municipal waste)
and 55% under biomass (renewable share of municipal waste)
Source: https://ens.dk/sites/ens.dk/files/Statistik/int.reporting_2016.xls (visited Jan 24th, 2019)
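Following the review request to "write it down clearly in the code", the ratio could live in a single documented constant. This is a hypothetical sketch: the constant and function names and the input keys are illustrative, and only the 55%/45% split and the ENS source come from this thread.

```python
# Share of municipal waste treated as renewable, per ENS
# https://ens.dk/sites/ens.dk/files/Statistik/int.reporting_2016.xls (visited Jan 24th, 2019).
# Review roughly once a year, as suggested in the review discussion.
WASTE_RENEWABLE_SHARE = 0.55
WASTE_FOSSIL_SHARE = 1.0 - WASTE_RENEWABLE_SHARE


def map_hourly_production(hourly):
    """Map one hour of reported production (MW) to electricityMap-style keys.

    `hourly` is a dict with the hypothetical keys wind, solar, hydro,
    biomass, waste, oil, gas and coal; a missing key raises KeyError
    instead of being silently treated as zero.
    """
    waste = hourly["waste"]
    return {
        "wind": hourly["wind"],
        "solar": hourly["solar"],
        "hydro": hourly["hydro"],
        "biomass": hourly["biomass"] + WASTE_RENEWABLE_SHARE * waste,
        "oil": hourly["oil"],
        "gas": hourly["gas"],
        "coal": hourly["coal"],
        "unknown": WASTE_FOSSIL_SHARE * waste,  # fossil share of municipal waste
    }


# Usage
hour = {"wind": 1000.0, "solar": 0.0, "hydro": 1.0, "biomass": 100.0,
        "waste": 200.0, "oil": 5.0, "gas": 50.0, "coal": 300.0}
mapped = map_hourly_production(hour)
```

Deriving the fossil share as 1 minus the renewable share guarantees that the split conserves total waste production when the ratio is updated.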
@brunolajoie

Collaborator

commented Feb 4, 2019

Tested the branch locally.

The production parser

  • goes back to 2017-01
  • our historical dataset currently goes back to 2015-01

The exchange parser

  • goes back to 2015-01-01
  • our historical dataset currently goes back to 2015-01
  • differs slightly from Statnett's numbers: tested now (16 vs. 8 MW) and on 2015-05-05 (316 vs. 183 MW).

Could you run another independent check on data consistency before we merge this?

@brunolajoie
Collaborator

left a comment

Thanks a lot for the help here @tmslaine!
The closer we are to the data source, the better!

@brunolajoie brunolajoie merged commit d0c39a2 into tmrowco:master Feb 14, 2019

1 check passed

ci/circleci Your tests passed on CircleCI!