In [1]:
import drms
import pandas as pd





The metadata-retrieval code below is based on (1), the documentation for the `drms` package.

- (1) https://docs.sunpy.org/projects/drms/en/stable/
 
Other useful references are listed below. (2) contains several code examples. (3) describes the data produced by the Solar Dynamics Observatory (SDO): how it is organized, how to access it, and how to process it. Section 4.2.2 ("Selecting Records") of (3) explains how to construct the queries used below. (4) gives general information about the SHARP parameters, and (5) gives the meaning of each bit of the QUALITY keyword. (6) and (7) are papers on the pipeline that produces data from SDO's Helioseismic and Magnetic Imager (HMI), including the SHARP parameter data.

- (2) https://github.com/sunpy/drms/tree/main/examples
- (3) https://www.lmsal.com/sdodocs/doc/dcur/SDOD0060.zip/zip/entry/
- (4) http://jsoc.stanford.edu/doc/data/hmi/sharp/sharp.htm
- (5) http://jsoc.stanford.edu/jsocwiki/Lev1qualBits
- (6) https://doi.org/10.1007/s11207-014-0529-3
- (7) https://doi.org/10.1007/s11207-014-0516-8

In [2]:
cli = drms.Client()

There are several SHARP parameter series: `cea` series are remapped to a cylindrical equal-area (CEA) projection, which corrects for the distortion that occurs when representing the 3D Sun in 2D; `dconS` series correct for scattered light; and `nrt` series contain near-real-time data. We use the `cea` series on the suggestion of Ward Manchester. For comparison we also use `nrt` data, although page 3565 of (6) recommends using definitive data. The `series()` method lists all available data series whose names match a given regular expression. For example, we can display all the SHARP parameter data series:

In [3]:
cli.series(regex="hmi\\.sharp")

['hmi.sharp_720s',
 'hmi.sharp_720s_dconS',
 'hmi.sharp_720s_nrt',
 'hmi.sharp_cea_720s',
 'hmi.sharp_cea_720s_dconS',
 'hmi.sharp_cea_720s_nrt']

The `info()` method of the `Client` class returns an object with information about a data series. For example, we can verify that the prime keys of the series are `HARPNUM` and `T_REC` as follows:

In [4]:
cli.info("hmi.sharp_cea_720s").primekeys

['HARPNUM', 'T_REC']
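
The two prime keys shown above are what a record-set query selects on: the series name is followed by one bracketed filter per prime key, in order. The sketch below builds such a query string. The helper `build_query` is hypothetical (it is not part of `drms`), and the HARP number and time range are illustrative values only; the syntax follows the "Selecting Records" section of (3).

```python
def build_query(series, harpnum, t_start, duration):
    """Build a record-set query of the form series[HARPNUM][T_REC/duration].

    Hypothetical helper for illustration; not part of the drms package.
    """
    return f"{series}[{harpnum}][{t_start}/{duration}]"


# Illustrative values: one day of records for a single HARP.
query = build_query("hmi.sharp_cea_720s", 377, "2011.02.14_00:00:00_TAI", "1d")
print(query)  # hmi.sharp_cea_720s[377][2011.02.14_00:00:00_TAI/1d]

# With a client, the string would be passed as the first argument of
# Client.query (requires network access to JSOC, so it is not run here):
# df = cli.query(query, key="T_REC, USFLUX")
```

Keeping the query construction in one small function makes it easy to loop over HARP numbers or time windows without repeating the bracket syntax.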

We next get all the keywords, their descriptions, and their data types for the definitive and `nrt` `cea` series:

In [5]:
keywords_cea = cli.info("hmi.sharp_cea_720s").keywords
keywords_cea_nrt = cli.info("hmi.sharp_cea_720s_nrt").keywords

In both data frames, each entry in the `linkinfo` column is `None`.

In [6]:
print(
    keywords_cea["linkinfo"].apply(lambda x: x is None).all(),
    keywords_cea_nrt["linkinfo"].apply(lambda x: x is None).all()
)

True True


Since the `linkinfo` columns carry no information, we drop them.

In [7]:
keywords_cea.drop("linkinfo", axis=1, inplace=True)
keywords_cea_nrt.drop("linkinfo", axis=1, inplace=True)

Identify the keywords that are in both data frames; of those, identify the ones with different information in the two data frames.

In [9]:
common_keywords = keywords_cea.index.intersection(keywords_cea_nrt.index)
keywords_cea_ = keywords_cea.loc[common_keywords]
keywords_cea_nrt_ = keywords_cea_nrt.loc[common_keywords]
comparison = keywords_cea_.eq(keywords_cea_nrt_)
keywords_w_diffs = comparison[~comparison.all(axis=1)].index
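
One caveat with this approach: `DataFrame.eq` treats a missing value as unequal to everything, including another missing value, so a row whose only "difference" is a cell that is NaN in both frames would still be flagged. The self-contained sketch below uses toy frames (the keyword names `KEY_A` to `KEY_D` and their values are made up) to show the same intersection-and-compare technique alongside a NaN-aware variant. Whether this matters for the keyword tables above depends on how missing default values are stored, which is not checked here.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the two keyword tables (made-up names and values).
a = pd.DataFrame(
    {"units": ["Maxwell", "none", "none"], "defval": [1.0, np.nan, 0.0]},
    index=pd.Index(["KEY_A", "KEY_B", "KEY_C"], name="name"),
)
b = pd.DataFrame(
    {"units": ["cm/s", "none", "none"], "defval": [1.0, np.nan, 0.0]},
    index=pd.Index(["KEY_A", "KEY_B", "KEY_D"], name="name"),
)

# Restrict both frames to the keywords they share.
common = a.index.intersection(b.index)
a_, b_ = a.loc[common], b.loc[common]

# Strict comparison: NaN never equals NaN, so KEY_B is flagged too.
strict = a_.eq(b_)
strict_diffs = strict[~strict.all(axis=1)].index

# NaN-aware comparison: cells missing in both frames count as equal.
nan_aware = strict | (a_.isna() & b_.isna())
nan_aware_diffs = nan_aware[~nan_aware.all(axis=1)].index

print(sorted(strict_diffs))     # ['KEY_A', 'KEY_B']
print(list(nan_aware_diffs))    # ['KEY_A']
```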

Insert columns with the series names to make comparison easier.

In [10]:
keywords_cea.insert(0, "series", "cea")
keywords_cea_nrt.insert(0, "series", "cea_nrt")

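Only four keywords differ between the two series. The differences are minor, though some seem strange. For example, the definitions of `INVVLAVE` are different: the `cea` entry describes a total unsigned flux in Maxwell, while the `cea_nrt` entry describes an average of inverted V_los in cm/s.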

In [11]:
pd.concat([
    keywords_cea.loc[keywords_w_diffs],
    keywords_cea_nrt.loc[keywords_w_diffs]
])


| name | series | type | recscope | defval | units | note | is_time | is_integer | is_real | is_numeric |
|---|---|---|---|---|---|---|---|---|---|---|
| WCSNAME | cea | string | constant | Carrington Heliographic | none | WCS system name | False | False | False | False |
| INVVLAVE | cea | double | variable | | Maxwell | [USFLUXL] Total unsigned flux | False | False | True | True |
| INVBLAVE | cea | double | variable | | Gauss/Mm | [MEANGBL] Mean value of the line-of-sight fiel... | False | False | True | True |
| INVNPRCS | cea | int | variable | -1 | number | [CMASKL] Number of pixels that contributed to ... | False | True | False | True |
| WCSNAME | cea_nrt | string | constant | Helioprojective-cartesian | none | WCS system name | False | False | False | False |
| INVVLAVE | cea_nrt | double | variable | | cm/s | avarage of inverted V_los over processed pixels | False | False | True | True |
| INVBLAVE | cea_nrt | double | variable | | gauss | avarage of inverted B_los over processed pixels | False | False | True | True |
| INVNPRCS | cea_nrt | int | variable | -1 | none | Numer of pixels processed | False | True | False | True |


Every keyword in the `cea_nrt` series is also in the `cea` series.

In [12]:
keywords_cea_nrt.index.difference(keywords_cea.index)

Index([], dtype='object', name='name')

We examine the keywords that are in both series.

In [13]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
keywords_cea_


| name | type | recscope | defval | units | note | is_time | is_integer | is_real | is_numeric |
|---|---|---|---|---|---|---|---|---|---|
| cparms_sg000 | string | variable | compress Rice | none | | False | False | False | False |
| magnetogram_bzero | double | variable | 0 | none | | False | False | True | True |
| magnetogram_bscale | double | variable | 0.1 | none | | False | False | True | True |
| cparms_sg001 | string | variable | | none | | False | False | False | False |
| bitmap_bzero | double | variable | 0 | none | | False | False | True | True |
| bitmap_bscale | double | variable | 1 | none | | False | False | True | True |
| cparms_sg002 | string | variable | compress Rice | none | | False | False | False | False |
| Dopplergram_bzero | double | variable | 0 | none | | False | False | True | True |
| Dopplergram_bscale | double | variable | 0.5 | none | | False | False | True | True |
| cparms_sg003 | string | variable | compress Rice | none | | False | False | False | False |


Based on the information above, we shall extract data for the following keywords:

In [14]:
keywords = [
    # SHARP parameters
    "USFLUX", "MEANGAM", "MEANGBT", "MEANGBZ", "MEANGBH", "MEANJZD", "TOTUSJZ", "MEANALP",
    "MEANJZH", "TOTUSJH", "ABSNJZH", "SAVNCPP", "MEANPOT", "TOTPOT", "MEANSHR", "SHRGT45",
    # Carrington quantities
    "CRLN_OBS", "CRLT_OBS", "CAR_ROT",
    # Data quality information
    "QUALITY", "QUAL_S", "QUALLEV1",
    # History and commentary
    "HISTORY", "COMMENT",
    # HARP merger and faintness indicators
    "H_MERGE", "H_FAINT",
    # Patch areas and pixel counts
    "NPIX", "SIZE", "AREA", "NACR", "SIZE_ACR", "AREA_ACR",
    # Patch latitudes and longitudes
    "LAT_MIN", "LON_MIN", "LAT_MAX", "LON_MAX", "LAT_FWT", "LON_FWT",
    "LAT_FWTPOS", "LON_FWTPOS", "LAT_FWTNEG", "LON_FWTNEG",
    # First and last HARP detection times
    "T_FRST1", "T_LAST1",
    # NOAA AR information
    "NOAA_AR", "NOAA_NUM", "NOAA_ARS"
]
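
Among the selected keywords, `QUALITY` is a bit field, so its value is easier to interpret once it is broken into individual bit positions. The sketch below lists which bits are set in a given value. The helper `set_bits` is hypothetical and the sample values are made up; the meaning of each bit position still has to be looked up in (5). It accepts either an integer or a hexadecimal string, on the assumption that the value may arrive in either form.

```python
def set_bits(value):
    """Return the positions (0 = least significant) of the bits set in value.

    Hypothetical helper for illustration; value may be a non-negative int
    or a string such as "0x00000401".
    """
    # int(text, 0) infers the base from the prefix, so "0x..." is read as hex.
    number = int(value, 0) if isinstance(value, str) else int(value)
    return [pos for pos in range(number.bit_length()) if (number >> pos) & 1]


print(set_bits(0x00000401))    # [0, 10]
print(set_bits("0x00000000"))  # []
```

An empty list means no quality flags are raised for that record.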

We put the selected keywords in a file that the data downloading script will use.

In [15]:
with open("keywords.txt", "w") as file:
    for keyword in keywords:
        file.write(keyword + "\n")
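
For completeness, a minimal sketch of how a downloading script might read the file back is shown below. The helper `read_keywords` is hypothetical, and the commented-out query assumes that `Client.query` accepts a list of keyword names through its `key` argument; the actual downloading script may do this differently.

```python
from pathlib import Path


def read_keywords(path):
    """Return the non-empty lines of a keyword file as a list of strings.

    Hypothetical helper for illustration.
    """
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


# Example use (requires keywords.txt and network access, so it is not run here):
# import drms
# cli = drms.Client()
# df = cli.query("hmi.sharp_cea_720s[377]", key=read_keywords("keywords.txt"))
```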
