# How to use 'id_query'?

The 'id_query' column is used in the input GeoPandas GeoDataFrame as a key to distinguish the different requests passed through the same GeoDataFrame.

The exact same query given twice with distinct 'id_query' values returns exactly the same results, stored twice in the output DataFrame.

The 'id_query' is also stored in the output DataFrame as 'id_original_query'.

In [None]:
import geopandas as gpd
import numpy as np
import cdsodatacli.query as qr
def example():
    """
    Run a query with two sets of parameters that are identical except for 'id_query'.
    """
    gdf_multi_id = gpd.GeoDataFrame(
        {
            "start_datetime": [
                np.datetime64("2022-05-03 00:00:00"),
                np.datetime64("2022-05-03 00:00:00"),
            ],
            "end_datetime": [
                np.datetime64("2022-05-03 00:02:00"),
                np.datetime64("2022-05-03 00:02:00"),
            ],
            "geometry": [None, None],
            "collection": ["SENTINEL-2", "SENTINEL-2"],
            "name": [None, None],
            "sensormode": [None, None],
            "producttype": [None, None],
            "Attributes": [None, None],
            "id_query": ["test1", "test2"],
        }
    )
    result = qr.fetch_data(gdf=gdf_multi_id, top=1000)
    # count the rows returned for each id_original_query; each product
    # is expected once per id_query, i.e. twice in total
    counts = result["id_original_query"].value_counts()
    return result, counts


In [None]:
result, counts = example()
assert all(counts[counts > 1].index.isin(["test1", "test2"]))
result

In [None]:
counts

## We can check that the product IDs are exactly the same

In [None]:
# product IDs returned for each of the two id_query values
ids_test1 = result.loc[result["id_original_query"] == "test1", "Id"]
ids_test2 = result.loc[result["id_original_query"] == "test2", "Id"]
# True for each 'test1' product that is also present in 'test2'
is_common = ids_test1.isin(ids_test2)
total = len(ids_test1)
cpt_common = int(is_common.sum())
cpt_uncommon = total - cpt_common
print("cpt_common", cpt_common, "/", total)
print("cpt_uncommon", cpt_uncommon, "/", total)

Running the same query with a single 'id_query' returns a DataFrame with 78 rows instead of 156.

In [None]:
gdf_unique_id = gpd.GeoDataFrame(
    {
        "start_datetime": [
            np.datetime64("2022-05-03 00:00:00"),
        ],
        "end_datetime": [
            np.datetime64("2022-05-03 00:02:00"),
        ],
        "geometry": [None],
        "collection": ["SENTINEL-2"],
        "name": [None],
        "sensormode": [None],
        "producttype": [None],
        "Attributes": [None],
        "id_query": ["test1"],
    }
)
result_2 = qr.fetch_data(gdf=gdf_unique_id, top=1000)
result_2

When there are several distinct "id_query" values, the input GeoDataFrame is split (group-by) into smaller GeoDataFrames, each containing a single "id_query" value.
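
As an illustration of this grouping step, the sketch below applies a plain pandas `groupby` to the `id_query` column. It is an assumption about what the split could look like, not the actual code of `cdsodatacli`. The `queries` table and its values are made up for the example, and a pandas DataFrame stands in for the GeoDataFrame.

```python
import pandas as pd

# Hypothetical input table: three rows, two distinct 'id_query' values.
queries = pd.DataFrame(
    {
        "collection": ["SENTINEL-2", "SENTINEL-2", "SENTINEL-1"],
        "id_query": ["test1", "test2", "test1"],
    }
)

# One sub-table per distinct 'id_query' value (illustrative only).
sub_tables = {key: group for key, group in queries.groupby("id_query")}

for key, group in sub_tables.items():
    print(key, len(group))
```

Each sub-table contains only rows sharing the same "id_query", so the number of distinct values sets the number of groups to process.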

Setting the "id_query" column carefully in the input GeoDataFrame is important because it determines the number of queries and the content of the returned output.

In most cases the 'id_query' column can be filled with a single value shared by all rows. In some cases, however, for instance when co-locating a set of moored buoys with Sentinel products, the user needs the exhaustive list of Sentinel products for each location and must also be able to separate the results per location (i.e. per buoy).
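
A minimal sketch of the per-location separation, assuming each buoy was given its own 'id_query'. The `result_demo` table below is invented for the example: only the column names 'Id' and 'id_original_query' come from the output shown in this notebook, while the buoy names and product IDs are hypothetical.

```python
import pandas as pd

# Hypothetical output table: the same product may appear for several buoys.
result_demo = pd.DataFrame(
    {
        "Id": ["prod-a", "prod-b", "prod-a", "prod-c"],
        "id_original_query": ["buoy_1", "buoy_1", "buoy_2", "buoy_2"],
    }
)

# Collect the list of product IDs found for each buoy.
products_per_buoy = (
    result_demo.groupby("id_original_query")["Id"].apply(list).to_dict()
)
print(products_per_buoy)
```

With one 'id_query' per buoy, a product that covers two buoys is kept once for each of them, which a single shared 'id_query' would not allow.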