# EuStatsPy
## A Python wrapper for the Eurostat API

A Python wrapper for the Eurostat statistics and catalogue APIs, providing easy access to European statistical data.

## Setup and configuration
Initialize the client with caching enabled (`cache_enabled=True`). Caching is recommended. By default, the cache is stored in your current working directory. Use the `cache_dir` parameter to set a different directory.

It is also recommended to pre-load the metabase for optimal performance. The metabase provides comprehensive, up-to-date information about the dimensions and variables of each dataset and table.
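As an illustration of the lookup the metabase enables, here is a minimal sketch that groups metabase-style rows into a per-dataset mapping of dimensions to codes. It assumes the three-column, tab-separated layout (dataset code, dimension, value code) of Eurostat's bulk metabase file. `parse_metabase` is a hypothetical helper, not part of EuStatsPy, and the sample rows are illustrative rather than a real extract.

```python
from collections import defaultdict


def parse_metabase(lines):
    """Group metabase rows into {dataset: {dimension: [codes]}}.

    Assumes each line has the layout 'dataset<TAB>dimension<TAB>code'.
    """
    metabase = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        dataset, dimension, code = line.split("\t")
        metabase[dataset][dimension].append(code)
    return metabase


# Illustrative sample rows (not a real metabase extract)
sample = [
    "nama_10_gdp\tgeo\tSE",
    "nama_10_gdp\tgeo\tDK",
    "nama_10_gdp\tunit\tCP_MEUR",
    "nama_10_gdp\tna_item\tB1GQ",
]

mb = parse_metabase(sample)
print(sorted(mb["nama_10_gdp"]))  # ['geo', 'na_item', 'unit']
print(mb["nama_10_gdp"]["geo"])   # ['SE', 'DK']
```

Once the file is parsed, listing a dataset's dimensions is a dictionary lookup rather than an API call, which is the kind of saving a pre-loaded metabase offers.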

```python
import eustatspy as est

# Initialize the client with caching enabled
client = est.EurostatClient(cache_enabled=True)

# Pre-load metabase for optimal performance (one-time cost)
metabase = client.preload_metabase()
```

```text
🚀 Pre-loading Eurostat metabase...
✅ Metabase loaded successfully!
   📊 7,501 datasets available
```


```python
# Clear cache when needed
client.clear_cache()
```

```text
Cache cleared successfully.
```
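Caching stores API responses on disk so that repeated requests for the same data skip the network. The sketch below is a generic illustration of that idea, not EuStatsPy's implementation: the hypothetical `ResponseCache` class keys each entry on the dataset code plus the filter parameters, stores it as a JSON file in a cache directory, and clears only the files it created.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class ResponseCache:
    """Hypothetical file-based response cache (not EuStatsPy's implementation)."""

    PREFIX = "eurostat_"  # only files with this prefix are ever read or deleted

    def __init__(self, cache_dir="."):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, dataset_code, params):
        # sort_keys=True makes the key independent of keyword order
        key = json.dumps({"dataset": dataset_code, "params": params}, sort_keys=True)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{self.PREFIX}{digest}.json"

    def get(self, dataset_code, params):
        """Return the cached payload, or None on a cache miss."""
        path = self._path(dataset_code, params)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        return None

    def put(self, dataset_code, params, payload):
        """Store a JSON-serialisable payload for this request."""
        self._path(dataset_code, params).write_text(json.dumps(payload), encoding="utf-8")

    def clear(self):
        """Delete every cache file created by this class."""
        for path in self.cache_dir.glob(f"{self.PREFIX}*.json"):
            path.unlink()


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    cache = ResponseCache(tmp)
    cache.put("nama_10_gdp", {"geo": "SE", "unit": "CP_MEUR"}, {"rows": 5})
    # Same filters in a different order hit the same entry
    hit = cache.get("nama_10_gdp", {"unit": "CP_MEUR", "geo": "SE"})
    cache.clear()
    after_clear = cache.get("nama_10_gdp", {"geo": "SE", "unit": "CP_MEUR"})

print(hit, after_clear)  # {'rows': 5} None
```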


## Search and browse database
The EuStatsPy wrapper offers two ways to find datasets and tables: keyword search and browsing the database's folder hierarchy.

### Search for datasets
Use the `search_datasets()` function to search for datasets and tables matching a keyword. Use the `updated_since` parameter to restrict the results to datasets and tables updated since a given date.

This function is useful both for exploring the database and for automated data pipelines: you can get the codes of recently updated datasets and tables, check them against your own reference table, and only then load the new data into your application or database.
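The reference-table check described above can be sketched as a comparison between search results and the date you last loaded each code. This is a minimal sketch under stated assumptions: `codes_to_refresh` is a hypothetical helper, the results are plain `(code, last_update)` pairs, and the reference table is a dict of ISO dates; adapt it to however you store pipeline state.

```python
from datetime import date


def codes_to_refresh(results, reference):
    """Return codes whose last_update is newer than the date recorded in `reference`.

    results:   iterable of (code, last_update) pairs
    reference: dict mapping code -> date you last loaded it, as 'YYYY-MM-DD'
    """
    to_refresh = []
    seen = set()
    for code, last_update in results:
        if code in seen:
            continue  # the same code can appear more than once in search results
        seen.add(code)
        # Keep only the 'YYYY-MM-DD' part, so '2025-07-07 00:00:00' also works
        updated = date.fromisoformat(str(last_update)[:10])
        loaded = reference.get(code)
        if loaded is None or updated > date.fromisoformat(loaded):
            to_refresh.append(code)
    return to_refresh


# Codes and dates as in the search output; the reference dates are hypothetical
results = [("une_rt_m", "2025-07-07"), ("teilm010", "2025-07-07"), ("une_rt_m", "2025-07-07")]
reference = {"une_rt_m": "2025-06-02", "teilm010": "2025-07-07"}

print(codes_to_refresh(results, reference))  # ['une_rt_m']
```

With the DataFrame returned by `search_datasets()`, the pairs could come from `zip(df["code"], df["last_update"])`.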

```python
# Basic search
client.search_datasets("GDP", max_results=5)
```

| | code | title | type | last_update | last_modified | data_start | data_end | values_count | short_description | unit | source |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | namq_10_gdp | Gross domestic product (GDP) and main componen... | dataset | 2025-07-10 | 2025-04-29 | 1975-Q1 | 2025-Q1 | 7893568.0 | | | |
| 1 | namq_10_pc | Gross domestic product (GDP) and main componen... | dataset | 2025-07-10 | 2025-04-29 | 1980-Q1 | 2025-Q1 | 364931.0 | | | |
| 2 | teina110 | GDP deflator | table | 2025-07-10 | 2025-04-29 | 2022-Q2 | 2025-Q1 | 456.0 | | | |
| 3 | sdg_10_10 | Purchasing power adjusted GDP per capita | table | 2025-07-10 | 2025-06-18 | 2000 | 2024 | 2216.0 | | | |
| 4 | namq_10_gdp | Gross domestic product (GDP) and main componen... | dataset | 2025-07-10 | 2025-04-29 | 1975-Q1 | 2025-Q1 | 7893568.0 | | | |


```python
# Search with date filter (data updated since specific date)
recent_data = client.search_datasets(
    query="unemployment",
    updated_since="2025-07-07",
    max_results=5
)

recent_data
```

| | code | title | type | last_update | last_modified | data_start | data_end | values_count | short_description | unit | source |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ei_lmhu_m | Unemployment (1 000) - monthly data | table | 2025-07-07 | 2025-07-07 | 1983-01 | 2025-06 | 247943 | | | |
| 1 | ei_lmhr_m | Unemployment rate (%) - monthly data | table | 2025-07-07 | 2025-07-07 | 1983-01 | 2025-06 | 244492 | | | |
| 2 | ei_lm_m_vtg | Unemployment - monthly data - vintages from 20... | dataset | 2025-07-07 | 2025-07-07 | 1983-01 | 2025-05 | 1277404 | | | |
| 3 | une_rt_m | Unemployment by sex and age - monthly data | dataset | 2025-07-07 | 2025-07-07 | 1983-01 | 2025-06 | 719647 | | | |
| 4 | teilm010 | Unemployment by sex | table | 2025-07-07 | 2025-07-07 | 2024-07 | 2025-06 | 1179 | | | |


### Browse the database
Use `browse_database()` to navigate the Eurostat database hierarchy, from the main themes down to individual datasets and tables.

```python
# Start at the root to see main themes
client.browse_database()
```

```text
Eurostat Database - Main Themes:
📊 Themes:
  📁 general: General and regional statistics (8 items)
  📁 economy: Economy and finance (7 items)
  📁 popul: Population and social conditions (14 items)
  📁 icts: Industry, trade and services (4 items)
  📁 agric: Agriculture, forestry and fisheries (3 items)
  📁 external: International trade (2 items)
  📁 transp: Transport (9 items)
  📁 envir: Environment and energy (2 items)
  📁 science: Science, technology, digital society (2 items)
  📁 tb_eu: Tables on EU policy (6 items)
  📁 cc: Cross cutting topics (9 items)

Showing all 11 items in this folder.

Use browse_database('folder_code') to explore subfolders or describe_dataset('dataset_code') for dataset details.
```


```python
# Explore specific themes
client.browse_database('general')  # General statistics
```

```text
📁 general: General and regional statistics
📁 Folders:
  📁 euroind: European and national indicators for short-term analysis (10 items)
  📁 reg: Regional statistics by NUTS classification (17 items)
  📁 reg_typ: Regional statistics by typology (3 items)
  📁 degurb: Degree of urbanisation (7 items)
  📁 urb: City statistics (3 items)
  📁 reg_nat: Other sub-national statistics (3 items)
  📁 lan: Land cover and land use, landscape (LUCAS) (6 items)
  📁 noneu: Non EU countries (3 items)

Showing all 8 items in this folder.

Use browse_database('folder_code') to explore subfolders or describe_dataset('dataset_code') for dataset details.
```


```python
# Find datasets and tables
client.browse_database('reg_demfer')  # Fertility
```

```text
📁 reg_demfer: Fertility

📄 Datasets and Tables:
  📄 demo_r_births: Live births (total) by NUTS 3 region (Updated: 2025-04-01, 58,272 values)
  📄 demo_r_fagec3: Live births by age group of the mothers and NUTS 3 region (Updated: 2025-03-07, 242,464 values)
  📄 demo_r_fagec: Live births by mother's age and NUTS 2 region (Updated: 2025-04-01, 525,487 values)
  📄 demo_r_frate2: Fertility rates by age and NUTS 2 region (Updated: 2025-07-03, 475,594 values)
  📄 demo_r_find2: Fertility indicators by NUTS 2 region (Updated: 2025-07-03, 39,350 values)
  📄 demo_r_find3: Fertility indicators by NUTS 3 region (Updated: 2025-07-03, 59,035 values)

Showing all 6 items in this folder.

Use browse_database('folder_code') to explore subfolders or describe_dataset('dataset_code') for dataset details.
```
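`browse_database()` steps through the hierarchy one folder at a time, and, as the `reg_demfer` call shows, you can also open a deeper folder directly by its code. If you keep a copy of the hierarchy locally, finding where a code sits is a depth-first search. The nested dict below is a simplified, hypothetical excerpt (intermediate folders may be omitted), and `find_path` is a hypothetical helper, not part of EuStatsPy.

```python
def find_path(tree, target, path=()):
    """Depth-first search for `target` in a nested {code: children} tree.

    Folders map to dicts; datasets and tables map to None.
    Returns the list of codes from the root to `target`, or None if absent.
    """
    for code, children in tree.items():
        current = path + (code,)
        if code == target:
            return list(current)
        if isinstance(children, dict):
            found = find_path(children, target, current)
            if found is not None:
                return found
    return None


# Simplified, hypothetical excerpt of the theme hierarchy
toc = {
    "general": {
        "reg": {
            "reg_demfer": {"demo_r_births": None, "demo_r_find2": None},
        },
        "urb": {},
    },
    "economy": {},
}

print(find_path(toc, "demo_r_find2"))  # ['general', 'reg', 'reg_demfer', 'demo_r_find2']
```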


## Get data from database

To get data from a dataset or table, you need its unique dataset or table code.

### View dataset dimensions and variables
To view the available dimensions and variables in a dataset or table, use the `describe_dataset()` function. The `max_values_per_dimension` argument limits how many values are listed for each dimension.

```python
# Code of the dataset to inspect (annual GDP and main components)
dataset_code = "nama_10_gdp"

# See all available dimensions and variables for a specific dataset or table
client.describe_dataset(dataset_code, max_values_per_dimension=5)
```

```text
Dataset: nama_10_gdp
Title: Gross domestic product (GDP) and main components (output, expenditure and income)
Type: dataset
Last Updated: 2025-07-08 00:00:00
Data Period: 1975 - 2024
Number of Values: 1,050,045

Available Dimensions and Filters:
-----------------------------------
(Found 4 dimensions in metabase)

geo:
  - EU27_2020
  - EA
  - EA20
  - EA19
  - EA12
  ... and 40 more values
  (Use show_all_for_dimension='geo' to see all 45 values)

na_item:
  - B1GQ
  - B1G
  - P3
  - P3_S13
  - P31_S13
  ... and 34 more values
  (Use show_all_for_dimension='na_item' to see all 39 values)

time:
  - 2024
  - 2023
  - 2022
  - 2021
  - 2020
  ... and 45 more values
  (Use show_all_for_dimension='time' to see all 50 values)

unit:
  - CLV_I20
  - CLV_I15
  - CLV_I10
  - CLV_I05
  - PC_GDP
  ... and 27 more values
  (Use show_all_for_dimension='unit' to see all 32 values)

Tip: Use show_all_for_dimension='dimension_name' to see all available values for a specific dimension.
     Example: 
```

### Get data as Pandas DataFrame
Use the `get_data_as_dataframe()` function to get data as a pandas DataFrame. Pass the dataset code followed by any dimension filters as keyword arguments.

```python
# Get data as Pandas DataFrame
df = client.get_data_as_dataframe(
    dataset_code,
    geo='SE',
    unit='CP_MEUR',
    lastTimePeriod=5
)

df.head()
```

| | freq | unit | na_item | geo | time | value | freq_label | unit_label | na_item_label | geo_label | time_label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A | CP_MEUR | B1GQ | SE | 2020 | 478106.9 | Annual | Current prices, million euro | Gross domestic product at market prices | Sweden | 2020 |
| 1 | A | CP_MEUR | B1GQ | SE | 2021 | 533953.6 | Annual | Current prices, million euro | Gross domestic product at market prices | Sweden | 2021 |
| 2 | A | CP_MEUR | B1GQ | SE | 2022 | 547190.4 | Annual | Current prices, million euro | Gross domestic product at market prices | Sweden | 2022 |
| 3 | A | CP_MEUR | B1GQ | SE | 2023 | 535176.8 | Annual | Current prices, million euro | Gross domestic product at market prices | Sweden | 2023 |
| 4 | A | CP_MEUR | B1GQ | SE | 2024 | 559138.7 | Annual | Current prices, million euro | Gross domestic product at market prices | Sweden | 2024 |
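The DataFrame is in long format: one row per observation, with the figure in the `value` column. As a small worked example, here is a year-on-year growth calculation over `(time, value)` pairs, using the Swedish GDP figures printed above. `yoy_growth` is a hypothetical helper written in plain Python so that it runs without pandas.

```python
def yoy_growth(observations):
    """Return {period: % change vs previous period} for (time, value) pairs."""
    ordered = sorted(observations)  # sort by period string, e.g. '2020' < '2021'
    growth = {}
    for (_, prev_value), (period, value) in zip(ordered, ordered[1:]):
        growth[period] = round((value / prev_value - 1) * 100, 1)
    return growth


# Sweden, GDP at market prices, current prices, million euro (from the output above)
gdp_se = [
    ("2020", 478106.9),
    ("2021", 533953.6),
    ("2022", 547190.4),
    ("2023", 535176.8),
    ("2024", 559138.7),
]

print(yoy_growth(gdp_se))  # {'2021': 11.7, '2022': 2.5, '2023': -2.2, '2024': 4.5}
```

With the DataFrame itself, `df.sort_values("time")["value"].pct_change() * 100` should give the same figures (unrounded) via pandas' built-in `pct_change()`.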


#### Selection Criteria

The Eurostat API supports selection expressions for filtering data. Pass them as keyword arguments to the `get_data_as_dataframe()` function.

Selection Expression Examples:

- **geo**: Geographic areas - `'SE'`, `['SE', 'DK']`, or `'all'`
- **time**: Time periods - `'2020'`, `['2020', '2021']`, `'2020-Q1'`
- **geoLevel**: Geographic level - `'country'`, `'nuts1'`, `'nuts2'`, `'nuts3'`, `'city'`, `'aggregate'`
- **lastTimePeriod**: Number of latest periods - `1`, `5`, `10`
- **sinceTimePeriod**: Start period - `'2020'`, `'2020-Q1'`, `'2020-01'`
- **untilTimePeriod**: End period - `'2023'`, `'2023-Q4'`, `'2023-12'`

In addition, you can filter on dataset-specific dimensions such as `unit`, `na_item`, `sex` and `age`.
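To show how these selections can map onto an HTTP request: the Eurostat Statistics API accepts them as query parameters, and a list is expressed by repeating the parameter. The sketch below builds such a URL with the standard library. The base URL is assumed to be Eurostat's public dissemination endpoint (verify it against the current API documentation), whether EuStatsPy builds its requests exactly this way is an assumption, and `build_data_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL of the Eurostat Statistics API data endpoint
BASE_URL = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"


def build_data_url(dataset_code, **filters):
    """Build a data request URL from selection expressions.

    List values become repeated parameters (geo=SE&geo=DK); scalars are sent once.
    """
    params = {"format": "JSON", "lang": "EN"}
    params.update(filters)
    # doseq=True expands list values into repeated key=value pairs
    query = urlencode(params, doseq=True)
    return f"{BASE_URL}/{dataset_code}?{query}"


url = build_data_url("nama_10_gdp", geo=["SE", "DK"], unit="CP_MEUR", lastTimePeriod=5)
print(url)
```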

```python
# By geographic level
df_geoLevel = client.get_data_as_dataframe(
    dataset_code,
    geoLevel='country', # Get data for countries
    unit='CP_MEUR',
    lastTimePeriod=1
)

df_geoLevel.head()
```

| | freq | unit | na_item | geo | time | value | status | freq_label | unit_label | na_item_label | geo_label | time_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A | CP_MEUR | B1GQ | BE | 2024 | 613983.9 | p | Annual | Current prices, million euro | Gross domestic product at market prices | Belgium | 2024 |
| 1 | A | CP_MEUR | B1GQ | BG | 2024 | 103723.0 | p | Annual | Current prices, million euro | Gross domestic product at market prices | Bulgaria | 2024 |
| 2 | A | CP_MEUR | B1GQ | CZ | 2024 | 320741.7 | | Annual | Current prices, million euro | Gross domestic product at market prices | Czechia | 2024 |
| 3 | A | CP_MEUR | B1GQ | DK | 2024 | 392400.7 | | Annual | Current prices, million euro | Gross domestic product at market prices | Denmark | 2024 |
| 4 | A | CP_MEUR | B1GQ | DE | 2024 | 4305260.0 | p | Annual | Current prices, million euro | Gross domestic product at market prices | Germany | 2024 |
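The `status` column carries Eurostat's observation flags; `p`, for example, marks a provisional value that may still be revised. When you only want settled figures, you can split rows on that flag. A minimal sketch over plain dict rows holding the values printed above; `split_by_flag` is a hypothetical helper, and on the DataFrame a boolean mask such as `df[df["status"] != "p"]` does a similar job.

```python
def split_by_flag(rows, flag="p"):
    """Split rows into (unflagged, flagged) based on the 'status' field.

    Rows with no status (None or empty string) count as unflagged.
    """
    unflagged, flagged = [], []
    for row in rows:
        status = row.get("status") or ""
        # Membership test, so it also matches a multi-letter status string should one occur
        (flagged if flag in status else unflagged).append(row)
    return unflagged, flagged


# Values from the geoLevel='country' output above
rows = [
    {"geo": "BE", "value": 613983.9, "status": "p"},
    {"geo": "BG", "value": 103723.0, "status": "p"},
    {"geo": "CZ", "value": 320741.7, "status": None},
    {"geo": "DK", "value": 392400.7, "status": None},
    {"geo": "DE", "value": 4305260.0, "status": "p"},
]

final, provisional = split_by_flag(rows)
print([r["geo"] for r in final])        # ['CZ', 'DK']
print([r["geo"] for r in provisional])  # ['BE', 'BG', 'DE']
```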


```python
# Time ranges
df_timeRange = client.get_data_as_dataframe(
    'ei_isbr_m',
    geo='SE',
    sinceTimePeriod='2024-01',  # From date
    untilTimePeriod='2024-12'   # To date
)

df_timeRange.head()
```

| | freq | unit | nace_r2 | indic | geo | time | value | status | freq_label | unit_label | nace_r2_label | indic_label | geo_label | time_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M | RT1-SCA | B-D_F | IS-IP | SE | 2024-01 | 2.0 | i | Monthly | Percentage change (t/t-1) - seasonally and cal... | Mining and quarrying; manufacturing; electrici... | Production index | Sweden | 2024-01 |
| 1 | M | RT1-SCA | B-D_F | IS-IP | SE | 2024-02 | -1.9 | i | Monthly | Percentage change (t/t-1) - seasonally and cal... | Mining and quarrying; manufacturing; electrici... | Production index | Sweden | 2024-02 |
| 2 | M | RT1-SCA | B-D_F | IS-IP | SE | 2024-03 | 0.1 | i | Monthly | Percentage change (t/t-1) - seasonally and cal... | Mining and quarrying; manufacturing; electrici... | Production index | Sweden | 2024-03 |
| 3 | M | RT1-SCA | B-D_F | IS-IP | SE | 2024-04 | -3.9 | i | Monthly | Percentage change (t/t-1) - seasonally and cal... | Mining and quarrying; manufacturing; electrici... | Production index | Sweden | 2024-04 |
| 4 | M | RT1-SCA | B-D_F | IS-IP | SE | 2024-05 | 1.7 | p | Monthly | Percentage change (t/t-1) - seasonally and cal... | Mining and quarrying; manufacturing; electrici... | Production index | Sweden | 2024-05 |
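`sinceTimePeriod` and `untilTimePeriod` take period strings whose format follows the dataset's frequency (`'2024'`, `'2024-Q1'`, `'2024-01'`). Within one frequency these strings already sort correctly as text, but if you filter periods yourself, a small parser also catches typos and mixed frequencies. A minimal sketch; `parse_period` and `in_window` are hypothetical helpers that only handle the three formats listed under Selection Criteria.

```python
import re

# Annual 'YYYY', quarterly 'YYYY-Qn', monthly 'YYYY-MM'
_PATTERNS = [
    (re.compile(r"^(\d{4})$"), "A"),
    (re.compile(r"^(\d{4})-Q([1-4])$"), "Q"),
    (re.compile(r"^(\d{4})-(0[1-9]|1[0-2])$"), "M"),
]


def parse_period(period):
    """Return (frequency, year, sub_period) for a period string."""
    for pattern, freq in _PATTERNS:
        match = pattern.match(period)
        if match:
            year = int(match.group(1))
            sub = int(match.group(2)) if freq != "A" else 0
            return freq, year, sub
    raise ValueError(f"Unrecognised period format: {period!r}")


def in_window(period, since, until):
    """True if since <= period <= until; all three must share one frequency."""
    p, s, u = parse_period(period), parse_period(since), parse_period(until)
    if not (p[0] == s[0] == u[0]):
        raise ValueError("Period and bounds must use the same frequency")
    return s <= p <= u


print(in_window("2024-05", "2024-01", "2024-12"))  # True
print(in_window("2025-01", "2024-01", "2024-12"))  # False
```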


```python
# Complex filtering with multiple dimensions
df_multiDim = client.get_data_as_dataframe(
    'lfst_r_lfsd2pop',
    geo=['SE11', 'SE12'],                   # Specific NUTS2 regions
    age='Y25-64',                           # Age group
    isced11=['ED0-2', 'ED3_4', 'ED5-8'],    # Educational level
    sex=['M', 'F'],                         # Males and Females
    lastTimePeriod=3                        # Last three time periods
)

df_multiDim.head()
```

| | freq | isced11 | sex | age | unit | geo | time | value | freq_label | isced11_label | sex_label | age_label | unit_label | geo_label | time_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A | ED0-2 | M | Y25-64 | THS_PER | SE11 | 2022 | 76.0 | Annual | Less than primary, primary and lower secondary... | Males | From 25 to 64 years | Thousand persons | Stockholm | 2022 |
| 1 | A | ED0-2 | M | Y25-64 | THS_PER | SE11 | 2023 | 76.3 | Annual | Less than primary, primary and lower secondary... | Males | From 25 to 64 years | Thousand persons | Stockholm | 2023 |
| 2 | A | ED0-2 | M | Y25-64 | THS_PER | SE11 | 2024 | 74.4 | Annual | Less than primary, primary and lower secondary... | Males | From 25 to 64 years | Thousand persons | Stockholm | 2024 |
| 3 | A | ED0-2 | M | Y25-64 | THS_PER | SE12 | 2022 | 62.6 | Annual | Less than primary, primary and lower secondary... | Males | From 25 to 64 years | Thousand persons | Östra Mellansverige | 2022 |
| 4 | A | ED0-2 | M | Y25-64 | THS_PER | SE12 | 2023 | 68.7 | Annual | Less than primary, primary and lower secondary... | Males | From 25 to 64 years | Thousand persons | Östra Mellansverige | 2023 |
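Each list-valued filter multiplies the number of series returned. With two regions, three education levels, two sexes and one age group, you can expect at most 2 × 3 × 2 × 1 = 12 series, each with up to three periods, so up to 36 rows, assuming the unfiltered dimensions (`freq`, `unit`) have one value each. A minimal sketch of that count with `itertools.product`; `expected_combinations` is a hypothetical helper, handy as a sanity check on the size of a result.

```python
from itertools import product


def expected_combinations(**filters):
    """List every combination of the given filter values (scalars count as one value)."""
    dims = sorted(filters)
    values = [v if isinstance(v, (list, tuple)) else [v] for v in (filters[d] for d in dims)]
    return [dict(zip(dims, combo)) for combo in product(*values)]


combos = expected_combinations(
    geo=["SE11", "SE12"],
    age="Y25-64",
    isced11=["ED0-2", "ED3_4", "ED5-8"],
    sex=["M", "F"],
)
print(len(combos))      # 12 series
print(len(combos) * 3)  # up to 36 rows for lastTimePeriod=3
```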


## Error Handling

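The library raises its own exception types, which you can catch individually: `est.DatasetNotFoundError`, `est.InvalidParameterError` and `est.EurostatAPIError` for other API failures.

Common error scenarios:
- Invalid dataset/table code or other parameters
- Network connectivity issues
- Bugs in the wrapper (please report an issue on GitHub)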

```python
try:
    df = client.get_data_as_dataframe('invalid_dataset')
except est.DatasetNotFoundError:
    print("Dataset not found")
except est.InvalidParameterError as e:
    print(f"Invalid parameters: {e}")
except est.EurostatAPIError as e:
    print(f"API error: {e}")
```

```text
API error: Failed to get data: Dataset not found: ERR_NOT_FOUND_4: INVALID_DATASET (DATA_FLOW:ALL,1.0) is not available for dissemination.
```

In this run, the unknown code was reported through `EurostatAPIError` rather than `DatasetNotFoundError`, so keep the broader `EurostatAPIError` handler as a fallback.
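For transient network failures, retrying the call with exponential backoff is a common pattern. The sketch below is generic: `with_retries` is a hypothetical helper, and `ConnectionError` stands in for whatever your setup raises on network failures. Passing `est.EurostatAPIError` as retryable would also retry the permanent "dataset not found" error shown above, so a narrower exception type is usually the better choice.

```python
import time


def with_retries(func, retryable=(ConnectionError,), attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a retryable exception wait base_delay * 2**n seconds and retry.

    Re-raises the last exception once all attempts are used.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the error
            sleep(base_delay * 2 ** attempt)


# Usage with the client (not executed here):
# df = with_retries(lambda: client.get_data_as_dataframe("nama_10_gdp", geo="SE"))

# Self-contained demo: a function that fails twice, then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)  # ok [1.0, 2.0]
```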
