This code is experimental and provided as is. Comments can be emailed to tony.bruguier@gmail.com

You need to download your PG&E data. These are the current instructions: https://www.pge.com/pge_global/common/pdfs/save-energy-money/analyze-your-usage/energy-data-hub/Download-My-Data-User-Guide.pdf

Be sure to use the option "Export usage for a range of days" so that you get hour-by-hour usage. There is a one-year limit, so if you want a longer period, you will have to repeat the process and concatenate the files. In any case, make sure that the period overlaps with the irradiation data. I have my own usage data here.
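If you need more than one year, here is a minimal way to concatenate several exports. It assumes the files are passed in chronological order and neither overlap nor leave gaps, because the parsing code below rejects duplicate timestamps. The parsing code only keeps rows whose first column is `Electric usage`, so the header lines repeated at the top of each file are harmless. `concatenate_exports` and the file names are hypothetical and are not part of the notebook.

```python
from pathlib import Path


def concatenate_exports(input_paths, output_path):
    """Append several PG&E CSV exports into one file, in the given order.

    Assumes the inputs are chronologically ordered and do not overlap.
    """
    with open(output_path, 'w', newline='') as out:
        for path in input_paths:
            text = Path(path).read_text()
            out.write(text)
            # Make sure the next file starts on its own line.
            if text and not text.endswith('\n'):
                out.write('\n')


# Hypothetical file names, one export per year:
# concatenate_exports(['pge_usage_2019.csv', 'pge_usage_2020.csv'], 'pge_usage.csv')
```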

In [1]:
import csv
from datetime import (date, datetime, time, timedelta)
import re

In [2]:
usage_filename = 'pge_usage.csv'  # Point to your file that you downloaded.

# The PG&E data takes daylight saving time into account, but the solar
# data (below) does not. So we convert everything to standard ("winter") time.
# You might have to change the initial value of 'in_winter_time' (set it to
# True if your export starts while standard time is in effect) because I
# didn't have the heart to handle time zones.
in_winter_time = False

prev_dt = None
usage_data = {}  # {datetime in standard time: usage for that hour}
with open(usage_filename, newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    for row in csvreader:
        # Only the hourly data rows are kept; header lines are skipped.
        if len(row) >= 7 and row[0] == 'Electric usage':
            d = date.fromisoformat(row[1])
            t = time.fromisoformat(row[2])  # Start of the hour.
            dt = datetime.combine(d, t)

            if prev_dt is not None:
                # total_seconds(), unlike .seconds, does not silently drop whole days.
                time_delta_hours = (dt - prev_dt).total_seconds() / 3600

                if time_delta_hours == 0:
                    # Repeated hour: clocks went back to standard time.
                    if in_winter_time:
                        raise ValueError('Initial value of in_winter_time should have been False')
                    in_winter_time = True
                elif time_delta_hours == 2:
                    # Skipped hour: clocks went forward to daylight saving time.
                    if not in_winter_time:
                        raise ValueError('Initial value of in_winter_time should have been True')
                    in_winter_time = False
                else:
                    assert time_delta_hours == 1, 'Unexpected gap of %s hours before %s' % (time_delta_hours, dt)

            u = float(row[4])  # Usage for that hour.

            # During daylight saving time, shift back by one hour.
            dt_corrected = dt if in_winter_time else dt - timedelta(hours=1)

            assert dt_corrected not in usage_data, '%s already inserted' % dt
            usage_data[dt_corrected] = u

            prev_dt = dt
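To see what the daylight saving correction does, here is a stripped-down version of the same logic applied to synthetic timestamps around the 2023 US clock changes. `to_standard_time` is a hypothetical helper, not part of the notebook, and it omits the consistency checks from the cell above.

```python
from datetime import datetime, timedelta


def to_standard_time(timestamps, in_winter_time=False):
    """Shift hourly local timestamps to standard ("winter") time.

    A repeated hour (0 h gap) marks the switch to standard time, a skipped
    hour (2 h gap) marks the switch to daylight saving time, and
    daylight-saving timestamps are moved back by one hour.
    """
    corrected = []
    prev = None
    for dt in timestamps:
        if prev is not None:
            gap_hours = (dt - prev).total_seconds() / 3600
            if gap_hours == 0:
                in_winter_time = True
            elif gap_hours == 2:
                in_winter_time = False
        corrected.append(dt if in_winter_time else dt - timedelta(hours=1))
        prev = dt
    return corrected


# Fall-back night (2023-11-05): 01:00 appears twice in local time.
fall = [datetime(2023, 11, 5, h) for h in (0, 1, 1, 2)]
print([dt.strftime('%H:%M') for dt in to_standard_time(fall)])
# ['23:00', '00:00', '01:00', '02:00']
```

The repeated 01:00 becomes two distinct standard-time hours, which is why the duplicate-key assertion in the parsing cell does not fire.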

Now, we can download the solar irradiation data. It is available here: https://nsrdb.nrel.gov/data-sets/download-instructions.html

The code currently approximates the irradiation reaching the panels with the GHI (global horizontal irradiance), because I suspect it is the more conservative choice.

There appear to be more sophisticated approaches, for example:
"Photovoltaic system derived data for determining the solar resource and
for modeling the performance of other photovoltaic systems" by Bill Marion and Benjamin Smith.
https://www.osti.gov/pages/servlets/purl/1349802

In [3]:
solar_filename = 'solar_data.csv'  # Point to your file that you downloaded.

ghi_index = -1
solar_data = {}  # {datetime (standard time, start of hour): summed GHI samples}
with open(solar_filename, newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    for row in csvreader:
        if ghi_index == -1:
            # Skip any metadata lines until the column header row is found.
            if 'GHI' in row:
                ghi_index = row.index('GHI')
        else:
            d = date(int(row[0]), int(row[1]), int(row[2]))
            # We ignore the minutes and just add up every sample within a given hour.
            t = time(int(row[3]), 0)
            dt = datetime.combine(d, t)

            s = float(row[ghi_index])  # GHI sample [W / m^2]

            solar_data[dt] = solar_data.get(dt, 0.0) + s

# Technically, we should take into account leap years and daylight saving
# time, but the government somehow forgot that February 29 exists.
# solar_data has one entry per hour, so a 365-day year has 365 * 24 entries.
num_years_solar_data = len(solar_data) / (365.0 * 24.0)
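The values summed above are GHI samples in W/m^2, so `solar_data` is not literally an energy. That does not matter for the linear coefficient computed next, but if you want kWh/m^2, you have to multiply each sample by its sampling interval. The following is a minimal sketch that assumes 30-minute samples, so check the interval of your own download. `hourly_energy_kwh_per_m2` is a hypothetical helper.

```python
def hourly_energy_kwh_per_m2(ghi_samples_w_per_m2, interval_minutes=30):
    """Convert the GHI samples (W/m^2) that fall within one hour into kWh/m^2.

    Each sample is treated as the average irradiance over its interval.
    """
    hours_per_sample = interval_minutes / 60.0
    return sum(ghi_samples_w_per_m2) * hours_per_sample / 1000.0


# Two half-hour samples of 400 and 600 W/m^2 give 0.5 kWh/m^2 for that hour.
print(hourly_energy_kwh_per_m2([400.0, 600.0]))  # 0.5
```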

We make another approximation. Vendors typically tell us how much energy the system will generate in a year. So we compute a linear coefficient between the energy the vendor promises and the yearly GHI irradiation (kWh / m^2). This lets us estimate the energy generated in any given hour of any day.
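Concretely, the coefficient equals the advertised yearly energy (times the derating factor) divided by the yearly irradiation. The estimated production in a given hour is then that hour's irradiation times the coefficient. The sketch below checks on synthetic data that one year of hourly estimates adds up, by construction, to the advertised energy times the derating factor. The function names and the flat daytime irradiance profile are assumptions for illustration. `yearly_energy_kwh` and `derating` play the roles of `power_adverized` and `vendor_lying_factor` in the next cell.

```python
from datetime import datetime, timedelta


def fit_coefficient(solar_data, yearly_energy_kwh, derating, num_years):
    """Coefficient that maps solar_data values to energy produced [kWh]."""
    yearly_irradiation = sum(solar_data.values()) / num_years
    return yearly_energy_kwh * derating / yearly_irradiation


def estimate_generation(solar_data, coefficient):
    """Estimated energy [kWh] produced in each hour of solar_data."""
    return {dt: value * coefficient for dt, value in solar_data.items()}


# Synthetic 365-day year: 1.0 unit for each hour starting 08:00 to 15:00, else 0.
start = datetime(2019, 1, 1)
synthetic = {}
for h in range(365 * 24):
    dt = start + timedelta(hours=h)
    synthetic[dt] = 1.0 if 8 <= dt.hour < 16 else 0.0

coef = fit_coefficient(synthetic, yearly_energy_kwh=5221, derating=0.8, num_years=1)
generation = estimate_generation(synthetic, coef)
print(round(sum(generation.values()), 6))  # 4176.8
```

Because the model is linear, only the shape of the irradiation profile matters. Its absolute units cancel out in the coefficient.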

In [4]:
power_adverized = 5221  # Advertised yearly production [kWh / year]
print('Average power advertised: %.0f W' % (power_adverized / (365.0 * 24.0) * 1000.0))

# We assume that the vendor is lying about the energy produced, so if
# you put 0.80 below, it means that the system will only deliver 80%
# of what was advertised.
vendor_lying_factor = 0.8  # [dimensionless]

# Sum of GHI samples (W / m^2) per year; only used as a linear scale.
yearly_solar_irradiation = sum(solar_data.values()) / num_years_solar_data

# Converts a solar_data value into estimated production [kWh].
irradiation_to_power = power_adverized * vendor_lying_factor / yearly_solar_irradiation

Average power advertised: 596 W
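The instructions at the top ask for an overlap between the PG&E and NSRDB data. The sketch below is a quick way to verify it, assuming both dictionaries are keyed by `datetime` in standard time as built above. `overlap_summary` is a hypothetical helper, not part of the notebook.

```python
from datetime import datetime


def overlap_summary(usage_data, solar_data):
    """Return (number of common hours, first common hour, last common hour)."""
    common = sorted(set(usage_data) & set(solar_data))
    if not common:
        return 0, None, None
    return len(common), common[0], common[-1]


# Synthetic example: usage for a whole day, solar data only for its second half.
usage = {datetime(2020, 6, 1, h): 0.5 for h in range(24)}
solar = {datetime(2020, 6, 1, h): 100.0 for h in range(12, 24)}
num_common, first, last = overlap_summary(usage, solar)
print(num_common, first, last)  # 12 2020-06-01 12:00:00 2020-06-01 23:00:00
```

With the notebook's data, you would call `overlap_summary(usage_data, solar_data)`. A count of zero means the two downloads cover different periods.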
