TDSCatalog does not include base path in access_url #724

Open
jm-cook opened this issue Aug 24, 2023 · 2 comments

Comments


jm-cook commented Aug 24, 2023

TDSCatalog does not construct access_urls correctly when the service declares a base path.

from siphon.catalog import TDSCatalog

# Hyrax catalog whose services declare base="/opendap/hyrax"
cat_url = "https://opendap1-test.nodc.no/opendap/hyrax/DSG/Physics/project/SmartOcean/Austevoll-Nord/catalog.xml"
cat = TDSCatalog(cat_url)
# Iterating over cat.datasets yields the dataset names
for cds in cat.datasets:
    url = cat.datasets[cds].access_urls['dap']
    print(f'url = {url}')

The result is

url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202205.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202206.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202207.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202209.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202210.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202211.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202212.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202301.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202302.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202303.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202304.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202305.nc
url = https://opendap1-test.nodc.no/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202306.nc

but 'opendap/hyrax' is not in the path.

The correct path should be:

https://opendap1-test.nodc.no/opendap/hyrax/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202205.nc
... etc

In catalog.xml the base path is given as base="/opendap/hyrax" for the dap service:

<thredds:catalog xmlns:thredds="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:bes="http://xml.opendap.org/ns/bes/1.0#">
  <thredds:service name="dap" serviceType="OPeNDAP" base="/opendap/hyrax"/>
  <thredds:service name="file" serviceType="HTTPServer" base="/opendap/hyrax"/>
  <thredds:service name="WCS-coads" serviceType="WCS" base="/opendap/wcs"/>
  <thredds:dataset name="/DSG/Physics/project/SmartOcean/Austevoll-Nord" ID="/opendap/hyrax/DSG/Physics/project/SmartOcean/Austevoll-Nord/">
    <thredds:dataset name="NMDC_AR_MO_Austevoll-Nord_202205.nc" ID="/opendap/hyrax/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202205.nc">
      <thredds:dataSize units="bytes">1145638</thredds:dataSize>
      <thredds:date type="modified">2023-06-23T09:28:13</thredds:date>
      <thredds:access serviceName="dap" urlPath="/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202205.nc"/>
      <thredds:access serviceName="file" urlPath="/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202205.nc"/>
    </thredds:dataset>
...

Python version: 3.8.8
Siphon version: 0.9


jm-cook commented Aug 25, 2023

This might be related to #114.

However, wasn't that issue fixed?


jm-cook commented Aug 28, 2023

I looked into this further, since I was curious why the fix for #114 does not solve my issue. I ran my small test against the oceandata OPeNDAP catalog linked in #114 (now updated to https://oceandata.sci.gsfc.nasa.gov/opendap/SeaWiFS/L3SMI/2000/0101/catalog.xml) and see the same issue, i.e. opendap/hyrax is not inserted into the URL.

What happens is this: when access_urls is built in make_access_urls(), server_base is correctly constructed as 'https://oceandata.sci.gsfc.nasa.gov/opendap/hyrax', and url_path is obtained as the absolute path /SeaWiFS/L3SMI/2000/0101/SEASTAR_SEAWIFS_GAC.20000101.L3m.DAY.CHL.chlor_a.9km.nc. But then, on the next line,

                    access_urls[subservice.service_type] = urljoin(server_base,  self.url_path)

url_path begins with a slash, so urljoin treats it as an absolute path and replaces the entire path of server_base. As a result, the opendap/hyrax part of server_base is lost.
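The behaviour can be reproduced with the standard library alone. This is a minimal sketch using the two values quoted above; the variable names `joined_absolute` and `joined_relative` are only for illustration.

```python
from urllib.parse import urljoin

# Values as reported above for the oceandata catalog.
server_base = "https://oceandata.sci.gsfc.nasa.gov/opendap/hyrax"
url_path = "/SeaWiFS/L3SMI/2000/0101/SEASTAR_SEAWIFS_GAC.20000101.L3m.DAY.CHL.chlor_a.9km.nc"

# A reference starting with "/" is an absolute path, so it replaces
# the whole path of the base URL, including "/opendap/hyrax".
joined_absolute = urljoin(server_base, url_path)
print(joined_absolute)

# Stripping the leading slash alone is not enough: without a trailing
# slash on the base, urljoin drops the last base segment ("hyrax").
joined_relative = urljoin(server_base, url_path.lstrip("/"))
print(joined_relative)
```

Neither result contains the full /opendap/hyrax prefix, which is consistent with the URLs printed by the test above.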

This seems to be how Hyrax presents the paths (i.e. with a leading slash).

According to the documentation here: https://docs.unidata.ucar.edu/tds/5.2/userguide/basic_client_catalog.html, the service base, urlPath, and dataset should be concatenated together.
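To illustrate what concatenation would give for the catalog in my first comment, here is a rough sketch. The helper `build_access_url` is hypothetical (it is not part of siphon and not a proposed patch), and it assumes the service base should be resolved against the server root.

```python
from urllib.parse import urljoin


def build_access_url(server_url, service_base, url_path):
    """Hypothetical helper: concatenate service base and urlPath.

    The two parts are joined with exactly one slash between them, so a
    leading slash on url_path no longer discards the service base.
    """
    combined = service_base.rstrip("/") + "/" + url_path.lstrip("/")
    # Resolve the combined path against the server URL (assumption).
    return urljoin(server_url, combined)


access_url = build_access_url(
    "https://opendap1-test.nodc.no/",
    "/opendap/hyrax",
    "/DSG/Physics/project/SmartOcean/Austevoll-Nord/NMDC_AR_MO_Austevoll-Nord_202205.nc",
)
print(access_url)
```

With these inputs the result matches the URL I expected in my first comment, with opendap/hyrax preserved.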

The code below reproduces the incorrectly constructed access_urls (the resulting URLs cannot be accessed):

from siphon.catalog import TDSCatalog
cat_url = "https://oceandata.sci.gsfc.nasa.gov/opendap/SeaWiFS/L3SMI/2000/0101/catalog.xml"
cat = TDSCatalog(cat_url)
for cds in cat.datasets:
    url = cat.datasets[cds].access_urls['dap']
    print(f'access url = {url}')
