# Getting Started with socstatspy

Welcome to **socstatspy** - a Python wrapper for Socialstyrelsen's Statistics Database API!

This interactive tutorial will guide you through:
1. Basic setup and initialization
2. Exploring available data
3. Using the new list-based filter syntax
4. Fetching and analyzing data
5. Working with enriched DataFrames
6. Real-world examples

Let's get started! 🚀

## 1. Installation and Setup

First, make sure socstatspy is installed in your environment. Then import the client and create an instance:

In [6]:
# Import the client
from socstatspy import SocstatsClient

# Create a client instance
client = SocstatsClient(language='sv')

print("✅ socstatspy is ready to use!")

✅ socstatspy is ready to use!


## 2. Exploring Available Data

Let's see what data is available in Socialstyrelsen's database.

In [7]:
# Get subjects as a DataFrame
df_subjects = client.list_subjects(as_dataframe=True)

# Display the first few subjects
df_subjects[['namn', 'text']].head(10)

Unnamed: 0,namn,text
0,amning,Amning
1,diagnoserislutenoppenvard,Diagnoser i sluten och/eller specialiserad öpp...
2,diagnoserislutenvard,Diagnoser i sluten vård
3,diagnoserioppenvard,Diagnoser i öppen vård
4,drgstatistikislutenvard,DRG-statistik i sluten vård
5,dodsorsaker,Dödsorsaker
6,dodsorsaker_manad,"Dödsorsaker, månadsuppgifter"
7,graviditeterforlossningarochnyfodda,"Graviditeter, förlossningar och nyfödda"
8,lakemedel,Läkemedel
9,operationerislutenvard,Operationer i sluten vård


## 3. Understanding Filter Variables

Each subject has different "distribution variables" (fördelningsvariabler) you can filter by.
Let's explore the 'dodsorsaker' (causes of death) subject.

In [3]:
# Get all distribution variables for 'dodsorsaker'
variables = client.get_subject_variables('dodsorsaker', as_dataframe=True)

variables[['namn', 'text']]

Unnamed: 0,namn,text
0,region,Region
1,alder,Ålder
2,kon,Kön
3,matt,Mått
4,ar,År
5,diagnos,Diagnos


### Let's explore measures

In [3]:
# Get available measures
measures = client.get_variable_values('dodsorsaker',
                                      variable='matt',
                                      as_dataframe=True)

# Show all available measures
measures

Unnamed: 0,id,text
0,1,Antal döda
1,2,Antal döda per 100 000


### Let's explore available years

In [4]:
# Get available years
years = client.get_variable_values('dodsorsaker',
                                   variable='ar',
                                   as_dataframe=True)

# Print latest available years
years.tail(10)

Unnamed: 0,id,text
18,2015,2015
19,2016,2016
20,2017,2017
21,2018,2018
22,2019,2019
23,2020,2020
24,2021,2021
25,2022,2022
26,2023,2023
27,2024,2024


### And available regions

In [5]:
# Get available regions
regions = client.get_variable_values('dodsorsaker',
                                     variable='region',
                                     as_dataframe=True)

# Print available regions
regions

Unnamed: 0,id,kod,text
0,0,0,Riket
1,1,1,Stockholms län
2,3,3,Uppsala län
3,4,4,Södermanlands län
4,5,5,Östergötlands län
5,6,6,Jönköpings län
6,7,7,Kronobergs län
7,8,8,Kalmar län
8,9,9,Gotlands län
9,10,10,Blekinge län
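
Note that ids in a listing are not always consecutive: the region listing above goes from 1 straight to 3. Before passing a list or `range()` as a filter, it can help to check the requested ids against the listing. The helper below is a hypothetical sketch and not part of socstatspy. It assumes the list-of-dicts form returned when `as_dataframe` is not set, and the sample rows are copied from the output above.

```python
def unknown_ids(requested, available_rows):
    """Return the requested ids that do not appear in a variable listing."""
    known = {str(row["id"]) for row in available_rows}
    return [value for value in requested if str(value) not in known]


# First rows of the region listing shown above.
regions_sample = [
    {"id": 0, "text": "Riket"},
    {"id": 1, "text": "Stockholms län"},
    {"id": 3, "text": "Uppsala län"},
    {"id": 4, "text": "Södermanlands län"},
]

missing = unknown_ids(range(1, 5), regions_sample)
print(missing)  # [2]
```

Ids are compared as strings so that `1` and `'1'` are treated the same, since the examples in this tutorial use both forms.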


## 4. Getting Data as a DataFrame ✨

You can build a query by filtering on variable **ids**.

socstatspy supports three ways to specify filter values:

1. **Single values**: `ar=2020`
2. **Lists**: `ar=[2020, 2021, 2022]`
3. **Ranges**: `ar=range(2018, 2024)`

Let's try each one!
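
To make the three forms concrete, here is a minimal sketch of how such values could be normalized into one list of string ids. The function `normalize_filter` is hypothetical and only illustrates the idea; it is not socstatspy's actual implementation.

```python
def normalize_filter(value):
    """Turn a single value, list, tuple, or range into a list of string ids."""
    if isinstance(value, (list, tuple, range)):
        return [str(item) for item in value]
    return [str(value)]


print(normalize_filter(2020))                # ['2020']
print(normalize_filter([2020, 2021, 2022]))  # ['2020', '2021', '2022']
print(normalize_filter(range(2018, 2024)))   # ['2018', '2019', '2020', '2021', '2022', '2023']
```

Keep in mind that Python's `range()` excludes its end value, so `range(2018, 2024)` covers 2018 through 2023.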

### Example 1: Single Value

In [None]:
# Get data for a single year
df_single = client.get_data_as_dataframe(
    subject='dodsorsaker',
    matt=1,          # Measure ID 1 (In this case "Number of deaths")
    ar=2020,         # Single year
    region=1,        # Stockholm    
    max_pages=1      # Limit to 1 page for this example
)

print(f"📊 Fetched {len(df_single)} records for year 2020")
df_single.tail()

INFO:socstatspy.client:Reached maximum page limit of 1
INFO:socstatspy.client:Fetched 5000 total records across 1 pages


📊 Fetched 5000 records for year 2020


Unnamed: 0,diagnosId,regionId,alderId,konId,mattId,ar,varde
4995,R99,1,15,2,1,2020,20
4996,R99,1,16,2,1,2020,19
4997,R99,1,17,2,1,2020,28
4998,R99,1,18,2,1,2020,20
4999,R99,1,19,2,1,2020,12


### Example 2: List of Values (RECOMMENDED)

In [14]:
# Get data for multiple years using a list
df_list = client.get_data_as_dataframe(
    subject='dodsorsaker',
    matt=1,
    ar=[2020, 2021, 2022],  # ✅ List of years
    region=['1', '12'],     # ✅ List of regions (Stockholm and Skåne)
    max_pages=2
)

print(f"📊 Fetched {len(df_list)} records for years 2020-2022")
df_list.tail()

INFO:socstatspy.client:Fetching page 2...
INFO:socstatspy.client:Reached maximum page limit of 2
INFO:socstatspy.client:Fetched 10000 total records across 2 pages


📊 Fetched 10000 records for years 2020-2022


Unnamed: 0,diagnosId,regionId,alderId,konId,mattId,ar,varde
9995,E10,12,15,1,1,2020,2
9996,E10,12,16,1,1,2020,1
9997,E10,12,17,1,1,2020,2
9998,E10,12,18,1,1,2020,1
9999,E11,12,12,1,1,2020,1


### Example 3: Range (BEST for consecutive values!)

In [12]:
# Get data for consecutive years using range()
df_range = client.get_data_as_dataframe(
    subject='dodsorsaker',
    matt=1,
    ar=range(2017, 2021),  # ✅ Generates: 2017, 2018, 2019, 2020
    region=1,              # Stockholm
    max_pages=8
)

print(f"📊 Fetched {len(df_range)} records for years 2017-2020")
print(f"\nYears in data: {sorted(df_range['ar'].unique())}")
df_range.tail()

INFO:socstatspy.client:Fetching page 2...
INFO:socstatspy.client:Fetching page 3...
INFO:socstatspy.client:Fetching page 4...
INFO:socstatspy.client:Fetching page 5...
INFO:socstatspy.client:Fetching page 6...
INFO:socstatspy.client:Fetching page 7...
INFO:socstatspy.client:Fetched 34682 total records across 7 pages


📊 Fetched 34682 records for years 2017-2020

Years in data: [np.int64(2017), np.int64(2018), np.int64(2019), np.int64(2020)]


Unnamed: 0,diagnosId,regionId,alderId,konId,mattId,ar,varde
34677,Y86,1,13,3,1,2020,1
34678,Y86,1,16,3,1,2020,3
34679,Y86,1,18,3,1,2020,1
34680,Y86,1,19,3,1,2020,2
34681,Y88,1,18,3,1,2020,2
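
The logs in these three examples suggest that each page holds up to 5000 records: one page gave 5000 rows, two pages gave 10000, and 34682 rows needed 7 pages. The sketch below uses that inferred page size (an assumption, not a documented constant) to estimate how many pages a query needs and to flag results that were probably cut off by `max_pages`. Both helper names are hypothetical.

```python
import math

PAGE_SIZE = 5000  # assumption: inferred from the log output above


def pages_needed(total_records, page_size=PAGE_SIZE):
    """Number of pages required to hold total_records."""
    return math.ceil(total_records / page_size)


def probably_truncated(n_records, max_pages, page_size=PAGE_SIZE):
    """True when the result exactly fills the page limit, which hints at more data."""
    return n_records >= max_pages * page_size


print(pages_needed(34682))            # 7
print(probably_truncated(5000, 1))    # True  (Example 1 hit the page limit)
print(probably_truncated(34682, 8))   # False (Example 3 fetched everything)
```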


## 5. Enriched DataFrames with Labels

Enriched DataFrames include human-readable labels for all ID columns!

In [42]:
# Get enriched data with labels
df_enriched = client.get_data_as_dataframe(
    subject='dodsorsaker',
    matt=1,
    region=['1', '3'],     # Stockholm and Uppsala
    ar=2020,
    max_pages=1,
    include_metadata=True   # Set this parameter to True to get labels
)

print(f"📊 Fetched {len(df_enriched)} enriched records")
print(f"\nColumns: {df_enriched.columns.tolist()}")
print("\nNotice the new '_label' columns! 👇")
df_enriched[['diagnosId', 'diagnos_label', 'regionId', 'region_label', 'ar', 'varde']].head()

INFO:socstatspy.client:Reached maximum page limit of 1
INFO:socstatspy.client:Fetched 5000 total records across 1 pages


📊 Fetched 5000 enriched records

Columns: ['diagnosId', 'regionId', 'alderId', 'konId', 'mattId', 'ar', 'varde', 'ar_label', 'diagnos_label', 'region_label', 'alder_label', 'kon_label', 'matt_label']

Notice the new '_label' columns! 👇


Unnamed: 0,diagnosId,diagnos_label,regionId,region_label,ar,varde
0,1,Vissa infektionssjukdomar och parasitsjukdomar,1,Stockholms län,2020,1
1,1,Vissa infektionssjukdomar och parasitsjukdomar,1,Stockholms län,2020,1
2,1,Vissa infektionssjukdomar och parasitsjukdomar,1,Stockholms län,2020,2
3,1,Vissa infektionssjukdomar och parasitsjukdomar,1,Stockholms län,2020,3
4,1,Vissa infektionssjukdomar och parasitsjukdomar,1,Stockholms län,2020,5
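
Conceptually, enrichment is a lookup from each `...Id` column into that variable's id-to-label table. The sketch below shows the idea on plain dictionaries. It is an illustration only, not how socstatspy implements `include_metadata`; `add_labels` is a hypothetical helper and the two sample rows are made up, with region labels taken from the listing in section 3.

```python
def add_labels(records, lookups):
    """Return copies of records with a '<variable>_label' field per lookup table."""
    enriched = []
    for record in records:
        row = dict(record)  # copy so the input rows stay untouched
        for variable, labels in lookups.items():
            id_key = f"{variable}Id"
            if id_key in row:
                row[f"{variable}_label"] = labels.get(str(row[id_key]))
        enriched.append(row)
    return enriched


# Made-up sample rows for illustration.
records = [
    {"regionId": 1, "ar": 2020, "varde": 1},
    {"regionId": 3, "ar": 2020, "varde": 2},
]
lookups = {"region": {"1": "Stockholms län", "3": "Uppsala län"}}

for row in add_labels(records, lookups):
    print(row["regionId"], row["region_label"])
```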


## 6. Searching for Diagnoses

You can search for specific diagnoses and use the results in your filters!

In [28]:
# Search for heart-related diagnoses
heart_diagnoses = client.get_variable_values('dodsorsaker',
                                             variable='diagnos',
                                             text_filter='hjärt')

print(f"💓 Found {len(heart_diagnoses)} heart-related diagnoses\n")
print("First 5 results:")
for diag in heart_diagnoses[:5]:
    print(f"• {diag['id']:8s} - {diag['text']}")

💓 Found 33 heart-related diagnoses

First 5 results:
• C38      - Malign tumör i hjärtat, mediastinum (lungmellanrummet) och lungsäcken
• I00      - Akut reumatisk feber utan uppgift om hjärtsjukdom
• I01      - Akut reumatisk feber med hjärtsjukdom
• 0902     - Kroniska reumatiska hjärtsjukdomar
• I09      - Andra reumatiska hjärtsjukdomar
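
The results above are consistent with a case-insensitive substring match on the `text` field: "I00" matches even though its label says the patient has no heart disease. The sketch below reproduces that behavior locally on a small sample. Whether socstatspy filters this way internally is an assumption, and `filter_by_text` is a hypothetical helper.

```python
def filter_by_text(values, needle):
    """Keep entries whose 'text' contains needle, ignoring case."""
    needle = needle.casefold()
    return [v for v in values if needle in v["text"].casefold()]


# Labels copied from outputs shown in this tutorial.
diagnoses_sample = [
    {"id": "I00", "text": "Akut reumatisk feber utan uppgift om hjärtsjukdom"},
    {"id": "I09", "text": "Andra reumatiska hjärtsjukdomar"},
    {"id": "1", "text": "Vissa infektionssjukdomar och parasitsjukdomar"},
]

matches = filter_by_text(diagnoses_sample, "HJÄRT")
print([m["id"] for m in matches])  # ['I00', 'I09']
```

Because the match is purely textual, it is worth reviewing the returned labels before using the ids as a filter.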


### Now use those diagnosis codes in a filter!

In [46]:
# Extract the codes from search results
heart_ids = [d['id'] for d in heart_diagnoses]
print(f"Using ids: {heart_ids}\n")

# Get data for these specific diagnoses
df_heart = client.get_data_as_dataframe(
    subject='dodsorsaker',
    matt=1,
    diagnos=heart_ids,    # ✅ List from search results!
    ar=2020,
    region=1,              # Stockholm
    include_metadata=True
)

print(f"📊 Fetched {len(df_heart)} records for heart-related diagnoses")
df_heart[['diagnosId', 'diagnos_label', 'regionId', 'region_label', 'alder_label', 'ar', 'varde']].head(n=10)

Using ids: ['C38', 'I00', 'I01', '0902', 'I09', 'I11', 'I13', '0904', 'I21', 'I22', 'I23', 'I24', 'I25', '0906', 'I30', 'I31', 'I32', 'I33', 'I38', 'I39', 'I42', 'I43', 'I46', 'I49', 'I50', 'I51', 'I52', 'Q20', 'Q21', 'Q24', 'R00', 'R01', 'Y52']

📊 Fetched 411 records for heart-related diagnoses


Unnamed: 0,diagnosId,diagnos_label,regionId,region_label,alder_label,ar,varde
0,902,Kroniska reumatiska hjärtsjukdomar,1,Stockholms län,30-34,2020,1
1,902,Kroniska reumatiska hjärtsjukdomar,1,Stockholms län,60-64,2020,1
2,902,Kroniska reumatiska hjärtsjukdomar,1,Stockholms län,75-79,2020,1
3,902,Kroniska reumatiska hjärtsjukdomar,1,Stockholms län,85-89,2020,3
4,904,Ischemiska hjärtsjukdomar (sjukdomar orsakade ...,1,Stockholms län,15-19,2020,1
5,904,Ischemiska hjärtsjukdomar (sjukdomar orsakade ...,1,Stockholms län,40-44,2020,1
6,904,Ischemiska hjärtsjukdomar (sjukdomar orsakade ...,1,Stockholms län,45-49,2020,5
7,904,Ischemiska hjärtsjukdomar (sjukdomar orsakade ...,1,Stockholms län,50-54,2020,19
8,904,Ischemiska hjärtsjukdomar (sjukdomar orsakade ...,1,Stockholms län,55-59,2020,39
9,904,Ischemiska hjärtsjukdomar (sjukdomar orsakade ...,1,Stockholms län,60-64,2020,58


## 7. Working with Age Groups

Analyze specific age demographics using range().

In [20]:
# Get all age groups first
age_groups = client.get_variable_values('dodsorsaker', 'alder')

print(f"📊 Total age groups: {len(age_groups)}\n")
print("First 10 age groups:")
for age in age_groups[:10]:
    print(f"• ID {age['id']:3d}: {age['text']}")

📊 Total age groups: 20

First 10 age groups:
• ID   1: 0-4
• ID   2: 5-9
• ID   3: 10-14
• ID   4: 15-19
• ID   5: 20-24
• ID   6: 25-29
• ID   7: 30-34
• ID   8: 35-39
• ID   9: 40-44
• ID  10: 45-49
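
Since the filter takes age group ids rather than ages, it can be convenient to derive the ids from the labels. The helper below is a hypothetical sketch that parses labels of the form `lo-hi` and skips anything else (for example an open-ended top group, if one exists). The sample rebuilds the ten groups listed above.

```python
def ids_for_age_span(age_groups, low, high):
    """Return ids of groups whose 'lo-hi' label lies entirely within [low, high]."""
    selected = []
    for group in age_groups:
        parts = group["text"].split("-")
        if len(parts) != 2 or not all(p.strip().isdigit() for p in parts):
            continue  # skip labels that are not plain 'lo-hi' bands
        lo, hi = int(parts[0]), int(parts[1])
        if lo >= low and hi <= high:
            selected.append(group["id"])
    return selected


# The first ten age groups from the output above: id 1 is '0-4', id 10 is '45-49'.
age_groups_sample = [
    {"id": i, "text": f"{(i - 1) * 5}-{(i - 1) * 5 + 4}"} for i in range(1, 11)
]

print(ids_for_age_span(age_groups_sample, 20, 49))  # [5, 6, 7, 8, 9, 10]
```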


In [8]:
# Analyze working-age population (example: age group IDs 5-15, i.e. ages 20-24 through 70-74)
df_age = client.get_data_as_dataframe(
    subject='dodsorsaker',
    matt=1,
    alder=list(range(5, 16)),  # ✅ Age group IDs 5 through 15
    ar=2020,
    max_pages=1,
    include_metadata=True
)

print(f"📊 Fetched {len(df_age)} records for working age groups")
print(f"\nAge group IDs: {sorted(df_age['alderId'].unique())}")
df_age[['diagnosId', 'diagnos_label', 'regionId', 'region_label', 'alder_label', 'ar', 'varde']].head(n=10)

INFO:socstatspy.client:Reached maximum page limit of 1
INFO:socstatspy.client:Fetched 5000 total records across 1 pages
INFO:socstatspy.data_fetcher:Fetching metadata for subject: dodsorsaker


📊 Fetched 5000 records for working age groups

Age group IDs: [np.int64(5), np.int64(6), np.int64(7), np.int64(8), np.int64(9), np.int64(10), np.int64(11), np.int64(12), np.int64(13), np.int64(14), np.int64(15)]


Unnamed: 0,diagnosId,diagnos_label,regionId,region_label,alder_label,ar,varde
0,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,30-34,2020,1
1,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,35-39,2020,2
2,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,40-44,2020,3
3,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,45-49,2020,6
4,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,50-54,2020,11
5,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,55-59,2020,13
6,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,60-64,2020,22
7,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,65-69,2020,54
8,1,Vissa infektionssjukdomar och parasitsjukdomar,0,Riket,70-74,2020,116
9,101,Infektionssjukdomar utgående från mag-tarmkanalen,0,Riket,35-39,2020,1


## 8. Exploring Other Subjects

Let's try a different subject - breastfeeding statistics!

In [22]:
# Explore the 'amning' (breastfeeding) subject
variables_amning = client.get_subject_variables('amning')

print("🍼 Available variables for 'amning' (breastfeeding):\n")
for var in variables_amning:
    print(f"• {var['namn']:12s} - {var['text']}")

🍼 Available variables for 'amning' (breastfeeding):

• region       - Län
• alder        - Barnets ålder
• matt         - Mått
• ar           - Barnets födelseår
• variabel     - Variabel


In [24]:
# Get breastfeeding data
df_breastfeeding = client.get_data_as_dataframe(
    subject='amning',
    matt=1,
    ar=range(2015, 2020),  # ✅ Years 2015-2019
    max_pages=1,
    include_metadata=True
)

print(f"📊 Fetched {len(df_breastfeeding)} breastfeeding records")
df_breastfeeding.head()

INFO:socstatspy.data_fetcher:Fetching metadata for subject: amning


📊 Fetched 1410 breastfeeding records


Unnamed: 0,variabelId,regionId,alderId,mattId,ar,varde,ar_label,variabel_label,region_label,alder_label,matt_label
0,D,0,1,1,2015,19771,2015,Delvis ammade barn,Riket,1 vecka,Antal
1,D,0,2,1,2015,23779,2015,Delvis ammade barn,Riket,2 månader,Antal
2,D,0,4,1,2015,26097,2015,Delvis ammade barn,Riket,4 månader,Antal
3,D,0,6,1,2015,54308,2015,Delvis ammade barn,Riket,6 månader,Antal
4,E,0,1,1,2015,88405,2015,Enbart ammade barn,Riket,1 vecka,Antal


## 9. Advanced: Complex Multi-Filter Query

Combine everything you've learned!

In [26]:
# Complex query with multiple filters
df_complex = client.get_data_as_dataframe(
    subject='diagnoserislutenvard',  # Diagnoses in inpatient care
    matt=6,
    region=['0', '1', '3'],          # ✅ Multiple regions
    alder=list(range(1, 11)),        # ✅ Age groups 1-10
    kon=['1', '2'],                  # ✅ Both genders
    ar=range(2018, 2021),            # ✅ Years 2018-2020
    max_pages=1,
    include_metadata=True
)

print(f"📊 Fetched {len(df_complex)} records with complex filters")
print(f"\nShape: {df_complex.shape}")
print(f"Columns: {len(df_complex.columns)} columns")
df_complex.head()

INFO:socstatspy.client:Reached maximum page limit of 1
INFO:socstatspy.client:Fetched 5000 total records across 1 pages
INFO:socstatspy.data_fetcher:Fetching metadata for subject: diagnoserislutenvard


📊 Fetched 5000 records with complex filters

Shape: (5000, 13)
Columns: 13 columns


Unnamed: 0,diagnosId,regionId,alderId,konId,mattId,ar,varde,ar_label,diagnos_label,region_label,alder_label,kon_label,matt_label
0,1,0,1,1,6,2018,412,2018,Vårdtillfällen som saknar diagnos\r\n,Riket,0-4,Män,Antal patienter
1,1,0,1,2,6,2018,372,2018,Vårdtillfällen som saknar diagnos\r\n,Riket,0-4,Kvinnor,Antal patienter
2,1,0,1,1,6,2019,369,2019,Vårdtillfällen som saknar diagnos\r\n,Riket,0-4,Män,Antal patienter
3,1,0,1,2,6,2019,297,2019,Vårdtillfällen som saknar diagnos\r\n,Riket,0-4,Kvinnor,Antal patienter
4,1,0,1,1,6,2020,252,2020,Vårdtillfällen som saknar diagnos\r\n,Riket,0-4,Män,Antal patienter
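
Once the data is fetched, a typical next step is to aggregate `varde` over one of the dimensions. The sketch below does this in plain Python on the five rows shown above, so it runs without any API call; `total_by` is a hypothetical helper. On the real DataFrame, the pandas equivalent would be `df_complex.groupby('ar')['varde'].sum()`.

```python
from collections import defaultdict

# The five rows displayed above, reduced to the columns used here.
rows = [
    {"ar": 2018, "kon_label": "Män", "varde": 412},
    {"ar": 2018, "kon_label": "Kvinnor", "varde": 372},
    {"ar": 2019, "kon_label": "Män", "varde": 369},
    {"ar": 2019, "kon_label": "Kvinnor", "varde": 297},
    {"ar": 2020, "kon_label": "Män", "varde": 252},
]


def total_by(rows, key):
    """Sum 'varde' for each distinct value of the given key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["varde"]
    return dict(totals)


print(total_by(rows, "ar"))         # {2018: 784, 2019: 666, 2020: 252}
print(total_by(rows, "kon_label"))  # {'Män': 1033, 'Kvinnor': 669}
```

The 2020 total covers only the single row visible in the preview, so it is not comparable with the other years.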


## 10. Error Handling

It is good practice to handle potential errors gracefully.

In [27]:
from socstatspy.exceptions import SocstatsAPIError, SocstatsNotFoundError

try:
    # Try to get data for a non-existent subject
    df_error = client.get_data_as_dataframe(
        subject='nonexistent_subject',
        matt=1
    )
except SocstatsNotFoundError as e:
    print(f"❌ Not found: {e}")
except SocstatsAPIError as e:
    print(f"❌ API error: {e}")
else:
    print("✅ Request successful!")

❌ Not found: Resource not found: https://sdb.socialstyrelsen.se/api/v1/sv/nonexistent_subject/resultat/matt/1
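
The URL in the error message hints at how a query maps to a request: base URL, language, subject, `resultat`, then alternating variable names and values. The sketch below rebuilds that path, which can help when debugging a failing query. `build_result_url` is a hypothetical helper, and joining multiple values with commas is an assumption that the output above does not confirm.

```python
BASE_URL = "https://sdb.socialstyrelsen.se/api/v1"


def build_result_url(subject, language="sv", **filters):
    """Build a result URL of the form <base>/<lang>/<subject>/resultat/<var>/<values>/..."""
    parts = [BASE_URL, language, subject, "resultat"]
    for variable, value in filters.items():
        values = value if isinstance(value, (list, tuple, range)) else [value]
        parts.append(variable)
        # Assumption: multiple values are sent as a comma-separated list.
        parts.append(",".join(str(v) for v in values))
    return "/".join(parts)


print(build_result_url("nonexistent_subject", matt=1))
# https://sdb.socialstyrelsen.se/api/v1/sv/nonexistent_subject/resultat/matt/1
```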


## 11. Next Steps

Now that you've completed this tutorial, you can:

1. ✅ **Explore more subjects** - Try `diagnoserioppenvard`, `lakemedel`, etc.
2. ✅ **Build your own analysis** - Combine what you've learned
3. ✅ **Read the documentation** - Check README.md and Getting_started.ipynb
4. ✅ **Experiment** - The API is flexible and powerful!

### Useful Resources

- **API Documentation**: https://sdb.socialstyrelsen.se/api
- **Package README**: Complete guide to all features

**Happy analyzing! 📊**