# PxStatsPy (beta) - the Python wrapper for PxAPI 2.0

A comprehensive Python wrapper for the Statistics Sweden PxAPI-2 REST API, providing easy access to Swedish statistical data.

## Setup and configuration
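Configure the client by passing the base URL of the PxAPI 2.0 beta and your preferred language to `PxAPIConfig`, then create a `PxAPI` client. Call `get_config()` to retrieve the API configuration, including supported languages, data formats, and usage limits.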

```python
# Import pxstatspy
from pxstatspy import PxAPI, PxAPIConfig, OutputFormat, OutputFormatParam

# Set up client
config = PxAPIConfig(
    base_url="https://api.scb.se/OV0104/v2beta/api/v2",  # Base URL for the PxAPI 2.0 beta
    language="en"  # or "sv" for Swedish
)
client = PxAPI(config)
```

```python
# Get API configuration
client.get_config()
```

```
{'apiVersion': '2.0.0-beta.10',
 'appVersion': '1.0.0',
 'languages': [{'id': 'sv', 'label': 'Svenska'},
  {'id': 'en', 'label': 'English'}],
 'defaultLanguage': 'en',
 'maxDataCells': 150000,
 'maxCallsPerTimeWindow': 30,
 'timeWindow': 10,
 'license': 'https://creativecommons.org/share-your-work/public-domain/cc0/',
 'sourceReferences': [{'language': 'sv', 'text': 'Källa: SCB'},
  {'language': 'en', 'text': 'Source: Statistics Sweden'}],
 'defaultMetadataFormat': 0,
 'defaultDataFormat': 'px',
 'dataFormats': ['json-stat2',
  'csv',
  'px',
  'xlsx',
  'html',
  'json-px',
  'parquet'],
 'features': [{'id': 'CORS', 'params': [{'key': 'enabled', 'value': 'True'}]}]}
```
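The `maxCallsPerTimeWindow` and `timeWindow` fields describe the API's rate limit (here, at most 30 calls per window of 10 units). PxStatsPy handles this for you, but if you ever call the REST API directly, a client-side limiter keeps you under the limit. The sketch below is a minimal sliding-window limiter. It is not PxStatsPy's own implementation, the class name `SlidingWindowLimiter` is hypothetical, and it assumes `timeWindow` is expressed in seconds.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `window`-second span (hypothetical helper)."""

    def __init__(self, max_calls: int, window: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def _evict(self, now: float) -> None:
        # Forget calls that have fallen out of the current window
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()

    def acquire(self) -> float:
        """Block until a call is allowed; return the total time spent waiting."""
        now = self._clock()
        self._evict(now)
        waited = 0.0
        while len(self._calls) >= self.max_calls:
            # Wait until the oldest call in the window expires
            delay = self.window - (now - self._calls[0])
            self._sleep(delay)
            waited += delay
            now = self._clock()
            self._evict(now)
        self._calls.append(now)
        return waited
```

For example, build `SlidingWindowLimiter(max_calls=30, window=10)` from the values returned by `get_config()` and call `acquire()` before each request. Injecting `clock` and `sleep` lets you test the limiter without real waiting.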

## Search and navigate database
The PxStatsPy wrapper offers multiple ways to search for tables in the database.

### Search tables in database
Use the `find_tables()` or `find_tables_as_dataframe()` functions to search for tables matching a keyword. Use the `past_days` parameter to restrict the results to recently updated tables.

These functions are useful in automated data pipelines: you can collect the IDs of recently updated tables, compare them against your own reference table, and load new data into your application or database only when a table has changed.

```python
# Search for specific tables with query
tables = client.find_tables(
    query="Population",
    page_number=1,  # Use to view other pages
    past_days=5,    # Tables updated in the past 5 days
    page_size=3,    # Number of results per page
    display=True    # True prints formatted results, False returns the raw API response
)
```

```
Found 42 tables matching 'Population' updated in the last 5 days (Page 1 of 14)

ID         Updated      First    Last     Title
----------------------------------------------------------------------
TAB1625    2025-02-21  2000M01 2024M12 Population statistics by region and sex. Month 2000M01-2024M12
           Variables: region, population changes, sex, observations, month

TAB5169    2025-02-21  2000    2024    Population statistics per quarter by region and sex. Year 2000-2024
           Variables: region, population changes, period, sex, observations, year

TAB4365    2025-02-21  1749    2024    Population and population changes in Sweden by sex. Year 1749-2024
           Variables: sex, observations, year

Page 1 of 14. Use page_number parameter to view other pages.
```


#### Get results as a Pandas DataFrame
Use the `find_tables_as_dataframe()` function to retrieve table search results as a Pandas DataFrame.

```python
# Search for specific tables with query and return as Pandas DataFrame
tables_df = client.find_tables_as_dataframe(
    query="CPI",
    past_days=15,
    all_pages=True
)

tables_df
```

```
Found 7 tables matching 'CPI' updated in the last 15 days
```

|   | id | label | updated | first_period | last_period | variables | category |
|---|----|-------|---------|--------------|-------------|-----------|----------|
| 0 | TAB3673 | Consumer Price Index (CPI), monthly and annual... | 2025-02-18 07:00:00+00:00 | 2014M01 | 2025M01 | [economic indicator, observations, month] | public |
| 1 | TAB5737 | Consumer Price Index (CPI), total 1980=100. Mo... | 2025-02-18 07:00:00+00:00 | 1980M01 | 2025M01 | [observations, month] | public |
| 2 | TAB5921 | Consumer Price Index at constant taxes (CPI-CT... | 2025-02-18 07:00:00+00:00 | 1980M01 | 2025M01 | [observations, month] | public |
| 3 | TAB5512 | Consumer Price Index (CPI) by product group (C... | 2025-02-18 07:00:00+00:00 | 1980M01 | 2025M01 | [Product group, observations, month] | public |
| 4 | TAB674 | Consumer Price Index (CPI) Year, Weights and I... | 2025-02-18 07:00:00+00:00 | 2005 | 2023 | [Product group, observations, year] | public |
| 5 | TAB5160 | Consumer Price Index (CPI) by product group, 1... | 2025-02-18 07:00:00+00:00 | 1980M01 | 2025M01 | [product group, observations, month] | public |
| 6 | TAB2075 | Consumer Price Index (CPI)/Living Cost Index, ... | 2025-02-18 07:00:00+00:00 | 1914M01 | 2025M01 | [observations, month] | public |
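For the pipeline use case described earlier, the `id` and `updated` columns of this DataFrame can be compared against your own bookkeeping to decide which tables to reload. The sketch below assumes a hypothetical `reference` DataFrame with `id` and `last_loaded` columns, and assumes both timestamp columns are timezone-aware UTC datetimes, as the `+00:00` suffix in the output suggests. The function name `tables_to_refresh` is illustrative, not part of PxStatsPy.

```python
import pandas as pd


def tables_to_refresh(search_df: pd.DataFrame, reference: pd.DataFrame) -> list:
    """Return ids of tables that are new, or were updated after our last load.

    search_df: search results with `id` and `updated` columns.
    reference: hypothetical bookkeeping table with `id` and `last_loaded` columns.
    """
    merged = search_df[["id", "updated"]].merge(reference, on="id", how="left")
    is_new = merged["last_loaded"].isna()                  # never loaded before
    is_newer = merged["updated"] > merged["last_loaded"]   # changed since last load
    return merged.loc[is_new | is_newer, "id"].tolist()


# Hypothetical bookkeeping data for illustration
search = pd.DataFrame({
    "id": ["TAB3673", "TAB5737", "TAB674"],
    "updated": pd.to_datetime(["2025-02-18 07:00:00+00:00"] * 3, utc=True),
})
reference = pd.DataFrame({
    "id": ["TAB3673", "TAB5737"],
    "last_loaded": pd.to_datetime(["2025-02-18 07:00:00", "2025-01-15 07:00:00"], utc=True),
})
print(tables_to_refresh(search, reference))  # ['TAB5737', 'TAB674']
```

In a real pipeline you would pass `tables_df` as `search_df` and then fetch only the returned table IDs.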


### Navigate the database
The PxStatsPy wrapper provides a navigator for browsing the database's folder hierarchy: start at the root, move into a folder by its ID, and step back again.

```python
# Create navigator
navigator = client.navigator

# Start at root
navigator.get_root()
navigator.print_current_location()
```

```
Current folder: 

Folders:
  - Agriculture, forestry and fishery (ID: JO)
  - Business activities (ID: NV)
  - Democracy (ID: ME)
  - Education and research (ID: UF)
  - Energy (ID: EN)
  - Environment (ID: MI)
  - Financial markets (ID: FM)
  - General statistics (ID: AA)
  - Household finances (ID: HE)
  - Housing, construction and building (ID: BO)
  - Labour market (ID: AM)
  - Living conditions (ID: LE)
  - National accounts (ID: NR)
  - Population (ID: BE)
  - Prices and Consumption (ID: PR)
  - Public finances (ID: OE)
  - Trade in goods and services (ID: HA)
  - Transport and communications (ID: TK)
```


```python
# Navigate to a specific folder
contents = navigator.navigate_to("BE")  # Folder ID for Population
navigator.print_current_location()
```

```
Current folder: Population

Folders:
  - Population statistics (ID: BE0101)
  - Demographic Analysis (Demography) (ID: BE0701)
  - Name statistics (ID: BE0001)
  - Population projections (ID: BE0401)
```


```python
# Go back one step
navigator.go_back()
navigator.print_current_location()
```

```
Current folder: 

Folders:
  - Agriculture, forestry and fishery (ID: JO)
  - Business activities (ID: NV)
  - Democracy (ID: ME)
  - Education and research (ID: UF)
  - Energy (ID: EN)
  - Environment (ID: MI)
  - Financial markets (ID: FM)
  - General statistics (ID: AA)
  - Household finances (ID: HE)
  - Housing, construction and building (ID: BO)
  - Labour market (ID: AM)
  - Living conditions (ID: LE)
  - National accounts (ID: NR)
  - Population (ID: BE)
  - Prices and Consumption (ID: PR)
  - Public finances (ID: OE)
  - Trade in goods and services (ID: HA)
  - Transport and communications (ID: TK)
```


## Get data from database table

To retrieve data from a table, you need its unique table ID.

### Print table variables
To view the variables available in a table, use the `print_table_variables()` function.

```python
# Get data for a specific table
table_id = "TAB1267"

# Print available variables
client.print_table_variables(table_id, max_values=1)  # Use "*" to show all values
```

```
Table: Population 1 November, number by region, age, sex, observations and year

Available variables:

region (Region):
First 1 values:
  - Sweden (Code: 00)
... and 311 more values

age (Alder):
First 1 values:
  - 0 years (Code: 0)
... and 101 more values

sex (Kon):
First 1 values:
  - men (Code: 1)
... and 1 more values

observations (ContentsCode):
First 1 values:
  - Number (Code: BE0101A9)

year (Tid):
First 1 values:
  - 2002 (Code: 2002)
... and 22 more values
```


```python
# Show specific variable
client.print_table_variables(table_id, variable_id="Region")  # Specify variable_id
```

```
Table: Population 1 November, number by region, age, sex, observations and year

Values for region (Region):
First 10 values:
  - 00 Sweden (Code: 00)
  - 01 Stockholm county (Code: 01)
  - 0114 Upplands Väsby (Code: 0114)
  - 0115 Vallentuna (Code: 0115)
  - 0117 Österåker (Code: 0117)
  - 0120 Värmdö (Code: 0120)
  - 0123 Järfälla (Code: 0123)
  - 0125 Ekerö (Code: 0125)
  - 0126 Huddinge (Code: 0126)
  - 0127 Botkyrka (Code: 0127)

... and 302 more values
```


### Get data as Pandas DataFrame
Use the `get_data_as_dataframe()` function to get data as a Pandas DataFrame.

```python
# Get data as Pandas DataFrame
df = client.get_data_as_dataframe(
    table_id=table_id,
    value_codes={
        "Tid": ["FROM(2020)"],         # FROM(start) selects all values from the start value onward
        "Region": ["01"],
        "ContentsCode": ["BE0101A9"]
    },
    clean_colnames=True     # Get dataframe with cleaner column names
)

df.head()
```

```
Retrieving data from table TAB1267
Calculated data cells: 5

Successfully retrieved 5 rows of data
```

|   | region_code | region | year | number |
|---|-------------|--------|------|--------|
| 0 | 1 | Stockholm county | 2020 | 2391841 |
| 1 | 1 | Stockholm county | 2021 | 2411859 |
| 2 | 1 | Stockholm county | 2022 | 2437158 |
| 3 | 1 | Stockholm county | 2023 | 2455914 |
| 4 | 1 | Stockholm county | 2024 | 2471773 |


The PxStatsPy wrapper automatically respects the API limits, so requests exceed neither the maximum number of data cells nor the allowed number of calls per time window. If the calculated number of data cells exceeds the API limit (currently 150,000 data cells, the `maxDataCells` value returned by `get_config()`), the data is retrieved in batches.
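The number of data cells in a request is the product of the number of selected values for each variable. The sketch below shows one plausible way to plan batches: split along the variable with the most selected values so that each batch stays under the cell limit. It is an illustration, not PxStatsPy's actual algorithm, and `plan_batches` is a hypothetical name. The example counts assume `"14*"` matches 50 region codes (Västra Götaland county and its municipalities), which is consistent with the 234,600 cells reported for the request that follows.

```python
import math
from typing import Dict, Optional, Tuple


def plan_batches(value_counts: Dict[str, int],
                 max_cells: int = 150_000) -> Tuple[Optional[str], int, int]:
    """Return (variable to split on, values per batch, number of batches).

    value_counts maps each variable code to its number of selected values.
    A result of (None, 0, 1) means the request fits in a single call.
    """
    total = math.prod(value_counts.values())
    if total <= max_cells:
        return None, 0, 1
    # Split along the variable with the most selected values
    split_var = max(value_counts, key=value_counts.get)
    cells_per_value = total // value_counts[split_var]
    values_per_batch = max_cells // cells_per_value
    if values_per_batch == 0:
        raise ValueError("A single value of the split variable exceeds max_cells")
    n_batches = math.ceil(value_counts[split_var] / values_per_batch)
    return split_var, values_per_batch, n_batches


counts = {"Region": 50, "Alder": 102, "Kon": 2, "ContentsCode": 1, "Tid": 23}
print(math.prod(counts.values()))  # 234600
print(plan_batches(counts))        # ('Alder', 65, 2)
```

Splitting on the variable with the most values gives the finest granularity, which tends to keep each batch close to the limit. Here it happens to agree with the two-part split on `Alder` that PxStatsPy reports.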

```python
# Run large API call
df_batches = client.get_data_as_dataframe(
    table_id=table_id,
    value_codes={
        "Tid": ["*"],           # Use the "*" wildcard to get all values
        "Region": ["14*"],      # Use "14*" wildcard to get all values starting with 14
        "Alder": ["*"],
        "Kon": ["*"],
        "ContentsCode": ["BE0101A9"]
    }
)

df_batches
```

```
Retrieving data from table TAB1267
Calculated data cells: 234.6K
Request will be split into 2 parts using variable 'Alder'
Processing data in batches... . Done!

Successfully retrieved 234,600 rows of data
```

|   | region_code | region | age | sex | year | Number |
|---|-------------|--------|-----|-----|------|--------|
| 0 | 14 | Västra Götaland county | 0 years | men | 2002 | 6926 |
| 1 | 14 | Västra Götaland county | 0 years | men | 2003 | 7311 |
| 2 | 14 | Västra Götaland county | 0 years | men | 2004 | 7536 |
| 3 | 14 | Västra Götaland county | 0 years | men | 2005 | 7627 |
| 4 | 14 | Västra Götaland county | 0 years | men | 2006 | 7962 |
| ... | ... | ... | ... | ... | ... | ... |
| 234595 | 1499 | Falköping | total | women | 2020 | 16545 |
| 234596 | 1499 | Falköping | total | women | 2021 | 16494 |
| 234597 | 1499 | Falköping | total | women | 2022 | 16518 |
| 234598 | 1499 | Falköping | total | women | 2023 | 16395 |


#### Selection Criteria

The API supports selection expressions that filter values by pattern, range, or position. Pass them through the `value_codes` parameter of `get_data_as_dataframe()` and the other data retrieval methods.

Selection expression rules:

- A wildcard (`*`) can be used at the start, end, or middle of a pattern.
- A question mark (`?`) matches exactly one character.
- `RANGE` requires both start and end values: `RANGE(start,end)`.
- `FROM`/`TO` expressions take a single value: `FROM(start)` or `TO(end)`.
- `BOTTOM`/`TOP` expressions take a single value: `BOTTOM(5)` shows the last five values and `TOP(5)` shows the first five.

All expressions are case-insensitive, and a pattern may contain at most two wildcards.
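To make these rules concrete, the sketch below applies them locally to a list of value codes. It is a hypothetical helper (`select_codes` is not part of PxStatsPy), and the real filtering happens on the API server. It follows the semantics exactly as listed above (including `TOP` returning the first values and `BOTTOM` the last) and assumes the codes are given in the table's own order and that range endpoints exist in the list.

```python
import re
from typing import List


def _wildcard_regex(pattern: str) -> "re.Pattern[str]":
    # '*' matches any run of characters, '?' exactly one; everything else is literal
    parts = [".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern]
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def select_codes(codes: List[str], expr: str) -> List[str]:
    """Apply one selection expression to value codes listed in table order."""
    m = re.fullmatch(r"([A-Za-z]+)\((.*)\)", expr.strip())
    if m:
        name = m.group(1).upper()  # expressions are case-insensitive
        args = [a.strip() for a in m.group(2).split(",")]
        if name == "RANGE":
            return codes[codes.index(args[0]):codes.index(args[1]) + 1]
        if name == "FROM":
            return codes[codes.index(args[0]):]
        if name == "TO":
            return codes[:codes.index(args[0]) + 1]
        if name == "TOP":
            return codes[:int(args[0])]
        if name == "BOTTOM":
            return codes[max(0, len(codes) - int(args[0])):]
        raise ValueError(f"Unsupported expression: {expr}")
    if expr.count("*") > 2:
        raise ValueError("At most two wildcards are allowed per pattern")
    regex = _wildcard_regex(expr)
    return [c for c in codes if regex.match(c)]


years = [str(y) for y in range(2002, 2025)]  # the 23 years in TAB1267
print(select_codes(years, "RANGE(2020,2024)"))         # ['2020', '2021', '2022', '2023', '2024']
print(select_codes(["00", "01", "0114", "03"], "0?"))  # ['00', '01', '03']
```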

```python
# Combine multiple selection criteria
df_multiple = client.get_data_as_dataframe(
    table_id=table_id,
    value_codes={
        "Region": ["01*", "12*"],            # Regions starting with 01 or 12
        "Tid": ["RANGE(2020,2024)"],         # Years 2020-2024
        "ContentsCode": ["BE0101A9"]
    }
)

df_multiple
```

```
Retrieving data from table TAB1267
Calculated data cells: 305

Successfully retrieved 305 rows of data
```

|   | region_code | region | year | Number |
|---|-------------|--------|------|--------|
| 0 | 01 | Stockholm county | 2020 | 2391841 |
| 1 | 01 | Stockholm county | 2021 | 2411859 |
| 2 | 01 | Stockholm county | 2022 | 2437158 |
| 3 | 01 | Stockholm county | 2023 | 2455914 |
| 4 | 01 | Stockholm county | 2024 | 2471773 |
| ... | ... | ... | ... | ... |
| 300 | 1293 | Hässleholm | 2020 | 52086 |
| 301 | 1293 | Hässleholm | 2021 | 52290 |
| 302 | 1293 | Hässleholm | 2022 | 52366 |
| 303 | 1293 | Hässleholm | 2023 | 52245 |


```python
# Match exactly one character
df_match = client.get_data_as_dataframe(
    table_id=table_id,
    value_codes={
        "Region": ["0?", "1?", "2?"],  # Two-digit codes starting with 0, 1 and 2 (Sweden plus all counties)
        "Tid": ["2024"],
        "ContentsCode": ["BE0101A9"]
    }
)

df_match
```

```
Retrieving data from table TAB1267
Calculated data cells: 22

Successfully retrieved 22 rows of data
```

|   | region_code | region | year | Number |
|---|-------------|--------|------|--------|
| 0 | 0 | Sweden | 2024 | 10587140 |
| 1 | 1 | Stockholm county | 2024 | 2471773 |
| 2 | 3 | Uppsala county | 2024 | 407698 |
| 3 | 4 | Södermanland county | 2024 | 301723 |
| 4 | 5 | Östergötland county | 2024 | 472723 |
| 5 | 6 | Jönköping county | 2024 | 369857 |
| 6 | 7 | Kronoberg county | 2024 | 203445 |
| 7 | 8 | Kalmar county | 2024 | 246408 |
| 8 | 9 | Gotland county | 2024 | 61000 |
| 9 | 10 | Blekinge county | 2024 | 157365 |


#### Output format parameters
The API can label values with texts, codes, or both. Pass a member of the `OutputFormatParam` class as `output_format_param` to choose:

- `USE_TEXTS` (default)
- `USE_CODES`
- `USE_CODES_AND_TEXTS`

```python
# Include both codes and text labels
df_detailed = client.get_data_as_dataframe(
    table_id="TAB1278",
    value_codes={
        "Tid": ["2023", "2024"],
        "Region": ["00", "01", "0180"],
        "Fordonsslag": ["10"],
        "ContentsCode": ["TK1001AC"]
    },
    output_format_param=OutputFormatParam.USE_CODES_AND_TEXTS,
    clean_colnames=True
)

df_detailed
```

```
Retrieving data from table TAB1278
Calculated data cells: 6

Successfully retrieved 6 rows of data
```

|   | region_code | region | type_of_vehicle | year | number |
|---|-------------|--------|-----------------|------|--------|
| 0 | 0 | Sweden | 10 - passenger cars | 2023 | 4977163 |
| 1 | 0 | Sweden | 10 - passenger cars | 2024 | 4977791 |
| 2 | 1 | Stockholm county | 10 - passenger cars | 2023 | 967916 |
| 3 | 1 | Stockholm county | 10 - passenger cars | 2024 | 983506 |
| 4 | 180 | Stockholm | 10 - passenger cars | 2023 | 353523 |
| 5 | 180 | Stockholm | 10 - passenger cars | 2024 | 352934 |
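With `USE_CODES_AND_TEXTS`, the vehicle type arrives as a single string such as `10 - passenger cars`. If you need the code and the label in separate columns, a pandas string split is enough. This sketch assumes the `" - "` separator seen in the output above and rebuilds two of its rows by hand; the new column names `vehicle_code` and `vehicle_label` are illustrative.

```python
import pandas as pd

# Two rows copied from the output above
df = pd.DataFrame({
    "type_of_vehicle": ["10 - passenger cars", "10 - passenger cars"],
    "year": ["2023", "2024"],
    "number": [4977163, 4977791],
})

# Split only on the first " - " so labels that contain " - " stay intact
parts = df["type_of_vehicle"].str.split(" - ", n=1, expand=True)
df["vehicle_code"] = parts[0]
df["vehicle_label"] = parts[1]

print(df[["vehicle_code", "vehicle_label", "year", "number"]])
```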


#### Region types

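For tables that contain DeSO and RegSO regions, PxStatsPy can filter by region type. By default, `get_data_as_dataframe()` returns all regions matched by your selection, including both DeSO and RegSO areas; set `region_type` to `"deso"` or `"regso"` to keep only one type. In the example below, 35 data cells are requested, but only the 12 RegSO rows are returned.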

```python
# Filter for specific region types (DeSO/RegSO)
df_regso = client.get_data_as_dataframe(
    table_id="TAB6258",
    value_codes={
        "Tid": ["2023"],
        "Region": ["0114*"],
        "ContentsCode": ["000006OC"]
    },
    region_type="regso"         # Use "deso" or "regso"
)

df_regso.head()
```

```
Retrieving data from table TAB6258
Calculated data cells: 35

Successfully retrieved 12 rows of data
```

|    | region_code | region | year | Number |
|----|-------------|--------|------|--------|
| 23 | 0114R001 | Upplands Väsby (Bollstanäs) | 2023 | 2208 |
| 24 | 0114R002 | Upplands Väsby (Hammarby-Vaxmyra) | 2023 | 1148 |
| 25 | 0114R003 | Upplands Väsby (Odenslunda norra-Frestaby-Ekeby) | 2023 | 1738 |
| 26 | 0114R004 | Upplands Väsby (Odenslunda södra-Bredden) | 2023 | 1154 |
| 27 | 0114R005 | Upplands Väsby (Runby norra) | 2023 | 899 |
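The RegSO codes in this output follow the pattern municipality code + `R` + three digits (for example `0114R001`). If you fetch both region types and want to separate them yourself, a check on the code pattern is enough. This is a hypothetical helper, not PxStatsPy's implementation. The RegSO pattern is taken from the output above, while the DeSO pattern (municipality code + `A`, `B` or `C` + four digits, e.g. `0114A0010`) is an assumption you should verify against your table's codes.

```python
import re
from typing import Iterable, List, Optional

REGSO = re.compile(r"^\d{4}R\d{3}$")     # e.g. 0114R001 (seen in the output above)
DESO = re.compile(r"^\d{4}[ABC]\d{4}$")  # e.g. 0114A0010 (assumed format)


def region_type(code: str) -> Optional[str]:
    """Classify a region code as 'regso', 'deso', or None (country, county, municipality)."""
    if REGSO.match(code):
        return "regso"
    if DESO.match(code):
        return "deso"
    return None


def filter_region_type(codes: Iterable[str], kind: str) -> List[str]:
    """Keep only the codes of the requested region type."""
    return [c for c in codes if region_type(c) == kind]


codes = ["0114", "0114A0010", "0114R001", "0114R002"]
print(filter_region_type(codes, "regso"))  # ['0114R001', '0114R002']
```

On a DataFrame, the same check can be applied with `df[df["region_code"].map(region_type) == "regso"]`.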


#### Raw Data Formats

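Use `get_table_data()` to retrieve data in one of the raw formats supported by the API. Pass a member of the `OutputFormat` class as `output_format`:

- `JSON_STAT2`
- `JSON_PX`
- `PX`
- `CSV`
- `XLSX`
- `HTML`
- `PARQUET`

These correspond to the `dataFormats` listed by `get_config()`.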

```python
# Get JSON-stat2 format
json_data = client.get_table_data(
    table_id=table_id,
    value_codes={
        "Tid": ["2023"],
        "Region": ["00", "01"],
        "ContentsCode": ["BE0101A9"]
    },
    output_format=OutputFormat.JSON_STAT2  # Use OutputFormat class
)

json_data
```

```
Retrieving data from table TAB1267
Calculated data cells: 2
```

```
{'version': '2.0',
 'class': 'dataset',
 'label': 'Population 1 November, number by region, observations and year',
 'source': 'Statistics Sweden',
 'updated': '2024-12-10T07:00:00Z',
 'note': ['The population is reported according to the division into administrative areas that applied on 1 January each Year.'],
 'role': {'time': ['Tid'], 'metric': ['ContentsCode']},
 'id': ['Region', 'ContentsCode', 'Tid'],
 'size': [2, 1, 1],
 'dimension': {'Region': {'label': 'region',
   'category': {'index': {'00': 0, '01': 1},
    'label': {'00': 'Sweden', '01': 'Stockholm county'}},
   'extension': {'elimination': True,
    'eliminationValueCode': '00',
    'show': 'code_value'}},
  'ContentsCode': {'label': 'observations',
   'category': {'index': {'BE0101A9': 0},
    'label': {'BE0101A9': 'Number'},
    'unit': {'BE0101A9': {'base': 'number', 'decimals': 0}}},
   'extension': {'elimination': False,
    'refperiod': {'BE0101A9': '1 November each year'},
    'show': 'value'}},
  'Tid': {'label':
```
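JSON-stat 2.0 stores all observations in one flat `value` array, ordered row-major over the dimensions listed in `id` (the last dimension varies fastest), with the dimension sizes given in `size`. The sketch below turns such a dataset into one record per cell. It mirrors the structure shown above, but `jsonstat2_to_records` is a hypothetical name, and the `value` entries in the example are placeholders rather than real statistics, because the output above is cut off before the values.

```python
import itertools
from typing import Any, Dict, List


def _ordered_codes(category: Dict[str, Any]) -> List[str]:
    """Return category codes in storage order."""
    index = category.get("index")
    if index is None:            # index may be omitted for a single-category dimension
        return list(category["label"])
    if isinstance(index, list):  # index may be an array of codes...
        return list(index)
    return sorted(index, key=index.get)  # ...or an object mapping code -> position


def jsonstat2_to_records(ds: Dict[str, Any]) -> List[Dict[str, Any]]:
    dims = ds["id"]
    code_lists = [_ordered_codes(ds["dimension"][d]["category"]) for d in dims]
    values = ds["value"]
    records = []
    # itertools.product varies the last dimension fastest, matching JSON-stat order
    for pos, combo in enumerate(itertools.product(*code_lists)):
        row = dict(zip(dims, combo))
        # value may be a dense array or a sparse object keyed by position
        row["value"] = values[pos] if isinstance(values, list) else values.get(str(pos))
        records.append(row)
    return records


example = {
    "id": ["Region", "ContentsCode", "Tid"],
    "size": [2, 1, 1],
    "dimension": {
        "Region": {"category": {"index": {"00": 0, "01": 1}}},
        "ContentsCode": {"category": {"index": {"BE0101A9": 0}}},
        "Tid": {"category": {"index": {"2023": 0}}},
    },
    "value": [111, 222],  # placeholder values, not real statistics
}
for record in jsonstat2_to_records(example):
    print(record)
```

`pd.DataFrame(jsonstat2_to_records(json_data))` would give a flat table, although `get_data_as_dataframe()` already covers that case.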

### Get metadata
Use the `get_table_metadata()` function to retrieve all available metadata for a table.

```python
# Get metadata
metadata = client.get_table_metadata(
    table_id,
    output_format="json-stat2"  # output_format is either "json-stat2" or "json-px"
)

metadata
```

```
{'version': '2.0',
 'class': 'dataset',
 'href': 'https://api.scb.se/OV0104/v2beta/api/v2/tables/TAB1267/metadata?lang=en&outputFormat=json-stat2',
 'label': 'Population 1 November, number by region, age, sex, observations and year',
 'source': 'Statistics Sweden',
 'updated': '2024-12-10T07:00:00Z',
 'link': {'data': [{'href': 'https://api.scb.se/OV0104/v2beta/api/v2/tables/TAB1267/data?lang=en&outputFormat=px'}]},
 'note': ['The population is reported according to the division into administrative areas that applied on 1 January each Year.'],
 'role': {'time': ['Tid'], 'metric': ['ContentsCode']},
 'id': ['Region', 'Alder', 'Kon', 'ContentsCode', 'Tid'],
 'size': [312, 102, 2, 1, 23],
 'dimension': {'Region': {'label': 'region',
   'category': {'index': {'00': 0,
     '01': 1,
     '0114': 2,
     '0115': 3,
     '0117': 4,
     '0120': 5,
     '0123': 6,
     '0125': 7,
     '0126': 8,
     '0127': 9,
     '0128': 10,
     '0136': 11,
     '0138': 12,
     '0139': 13,
     '0140': 14
```
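The `id` and `size` fields of the metadata tell you how large the full table is before you request any data: the total number of cells is the product of the dimension sizes. Using the values shown above, the small helper below (a hypothetical name, not part of PxStatsPy) finds that TAB1267 has 1,463,904 cells, far above the 150,000-cell limit from `get_config()`, so fetching the whole table needs at least ten requests.

```python
import math

# Fields copied from the metadata output above
metadata = {
    "id": ["Region", "Alder", "Kon", "ContentsCode", "Tid"],
    "size": [312, 102, 2, 1, 23],
}


def table_cell_count(metadata: dict) -> int:
    """Total number of data cells: the product of all dimension sizes."""
    return math.prod(metadata["size"])


print(dict(zip(metadata["id"], metadata["size"])))  # values per dimension
cells = table_cell_count(metadata)
max_cells = 150_000  # maxDataCells from get_config()
print(cells)                         # 1463904
print(math.ceil(cells / max_cells))  # 10
```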

## Error Handling

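The library raises a `PxAPIError` for errors returned by the API, such as an invalid table ID, so you can handle API errors separately from other exceptions.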

Common error scenarios:
- Invalid table ID or other invalid parameters
- Invalid URL
- Network connectivity issues
- A bug in the wrapper (please report an issue on GitHub)

When reporting an issue, please enable debug mode with `client.debug = True` and include the complete error message so the problem can be reproduced.

```python
# Create a client and enable debug mode to troubleshoot errors
client = PxAPI(PxAPIConfig(
    base_url="https://api.scb.se/OV0104/v2beta/api/v2",
    language="en"
))
client.debug = True
```

```python
from pxstatspy import PxAPIError

try:
    # Attempt to get data for an invalid table
    data = client.get_table_data("INVALID_ID")
except PxAPIError as e:
    print(f"API Error: {e}")
    # Handle API errors (e.g. invalid table ID)
except Exception as e:
    print(f"Unexpected error: {e}")
    # Handle other errors
```

```
DEBUG: Making API request
Method: GET
URL: https://api.scb.se/OV0104/v2beta/api/v2/tables/INVALID_ID/metadata
Parameters: {'outputFormat': None, 'defaultSelection': False}

Error Response:
Status code: 404
Error details: 404 Client Error: Not Found for url: https://api.scb.se/OV0104/v2beta/api/v2/tables/INVALID_ID/metadata?defaultSelection=False&lang=en
API Error: HTTP error: 404 Client Error: Not Found for url: https://api.scb.se/OV0104/v2beta/api/v2/tables/INVALID_ID/metadata?defaultSelection=False&lang=en
```
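A 404 like the one above will not go away on its own, but network connectivity issues often do. A small retry helper with exponential backoff can wrap calls that might fail transiently. This is a minimal sketch, not part of PxStatsPy: `with_retries` is a hypothetical name, and which exceptions count as transient (the `retry_on` argument, defaulting here to the built-in `ConnectionError` and `TimeoutError`) is an assumption you should adapt to the errors you actually see.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T],
                 attempts: int = 3,
                 base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise ValueError("attempts must be at least 1")
```

For example, `metadata = with_retries(lambda: client.get_table_metadata(table_id))`. Exceptions not listed in `retry_on`, such as a `PxAPIError` for an invalid table ID, propagate immediately.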
