Issue regarding the use of "dods:" in DODSNetcdfFile #161

Open
lesserwhirls opened this issue Dec 5, 2019 · 4 comments

Comments

@lesserwhirls
Collaborator

TL;DR;

An HTTP GET request to http://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.dods?time works, and so does https://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.dods?time[0:1:1], but http://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.dods?time[0:1:1] fails with a 403 (request too big).

This is perhaps a server-side issue, but netCDF-Java should be able to make things work by using the proper protocol (https in this case).

Details

In ucar.nc2.dods.DODSNetcdfFile, any dataset url that starts with dods: is changed to use http:

    // canonicalize name
    String urlName = datasetURL; // actual URL uses http:
    this.location = datasetURL; // canonical name uses "dods:"
    if (datasetURL.startsWith("dods:")) {
      urlName = "http:" + datasetURL.substring(5);
    } else if (datasetURL.startsWith("http:")) {
      this.location = "dods:" + datasetURL.substring(5);
    } else if (datasetURL.startsWith("https:")) {
      this.location = "dods:" + datasetURL.substring(6);
    } else if (datasetURL.startsWith("file:")) {
      this.location = datasetURL;
    } else {
      throw new java.net.MalformedURLException(datasetURL + " must start with dods: or http: or file:");
    }

Of course, that's not always the correct thing to do, but if redirects are handled properly, and the server responds properly, it should all just work. For certain code paths, everything does work. For example, if we look at the following dataset url:

dods://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml

We can open the file using NetcdfDataset.acquireFile(), and we can successfully read the dds and das because redirects work and the server behaves well. However, if we try to open with NetcdfDataset.openDataset(), we fail because the OPeNDAP server returns a 403 when reading a slice (in this case, trying to get http://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.dods?time[0:1:108082]). It's the "reading a slice" part that seems to be the key.

A GET request on http://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.dods?time works, but once I introduce the constraint, I run into problems. For example, if I try an HTTP GET on http://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.dods?time[0:1:1], I get:

Status = 403 HTTP/1.1 403 Forbidden
Status Line = HTTP/1.1 403 Forbidden
Response Headers = 
  Date: Thu, 05 Dec 2019 19:45:27 GMT
  Server: Apache-Coyote/1.1
  Strict-Transport-Security: max-age=31536000
  XDODS-Server: opendap/3.7
  Content-Description: dods-error
  Content-Type: text/plain
  Access-Control-Allow-Origin: *
  Access-Control-Allow-Headers: X-Requested-With, Content-Type
  Connection: close
  Transfer-Encoding: chunked

ResponseBody---------------
Error {
    code = 403;
    message = "Request too big=1.1117421067232E7 Mbytes, max=500.0";
};

If I change the same request to use https:, it works. It's almost like the entire query (after the ?) is being dropped after a redirect when requesting a slice of data from a variable.
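
One way to check what happens to the query across the redirect is to issue the request without following redirects and compare the query string in the Location header with the one that was sent. The sketch below is only an illustration: `RedirectProbe`, `queryPreserved`, and `fetchLocation` are hypothetical names, and it uses the JDK's `HttpURLConnection` rather than whatever HTTP client netCDF-Java uses internally.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Objects;

final class RedirectProbe {

  // Raw (still percent-encoded) query of a URL, or null if there is none.
  // Assumes the input is a syntactically valid URI.
  static String rawQuery(String url) {
    return URI.create(url).getRawQuery();
  }

  // True if the redirect target carries exactly the query that was requested.
  static boolean queryPreserved(String requestedUrl, String location) {
    return Objects.equals(rawQuery(requestedUrl), rawQuery(location));
  }

  // Sends one GET without following redirects; returns the Location header
  // for a 3xx response, or null for any other status.
  static String fetchLocation(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
    conn.setInstanceFollowRedirects(false);
    conn.setRequestMethod("GET");
    try {
      int status = conn.getResponseCode();
      return (status >= 300 && status < 400) ? conn.getHeaderField("Location") : null;
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java RedirectProbe.java <url>");
      return;
    }
    String location = fetchLocation(args[0]);
    if (location == null) {
      System.out.println("No redirect returned");
    } else {
      System.out.println("Location: " + location);
      System.out.println("Query preserved: " + queryPreserved(args[0], location));
    }
  }
}
```

Running `main` needs network access; `queryPreserved` is pure string handling and can be checked offline.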

This behavior is also seen in the latest netCDF-Java 4.6.x code (current master branch over at https://github.com/unidata/thredds). The ability to handle dods: as a dataset url through NetcdfDataset used to work, at least as recently as 4.6.12-SNAPSHOT (from February of this year), so it's a somewhat recent change affecting both 4.6.x and 5.0.x.

It seems to me that, regardless of whether this is a server-side issue (it likely is), netCDF-Java could handle this by making the right choice when mapping dods: in DODSNetcdfFile.
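
As a rough sketch of what "making the right choice" might look like, the helper below returns the transport URLs that could be tried, in order, for a given dataset URL. Everything here is hypothetical: `DodsUrlCandidates` and `candidateUrls` are not part of netCDF-Java, and preferring https: before http: for a dods: URL is just one assumed policy, not a decision made in this issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DodsUrlCandidates {

  // Returns the transport URLs to try, in order, for a dataset URL.
  // Assumption: for "dods:" we prefer https and fall back to http.
  static List<String> candidateUrls(String datasetURL) {
    if (datasetURL.startsWith("dods:")) {
      String rest = datasetURL.substring("dods:".length());
      return Arrays.asList("https:" + rest, "http:" + rest);
    }
    if (datasetURL.startsWith("http:")
        || datasetURL.startsWith("https:")
        || datasetURL.startsWith("file:")) {
      // An explicit scheme is used exactly as given.
      return Collections.singletonList(datasetURL);
    }
    throw new IllegalArgumentException(
        datasetURL + " must start with dods:, http:, https: or file:");
  }

  public static void main(String[] args) {
    String dataset = "dods://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml";
    for (String url : candidateUrls(dataset)) {
      System.out.println(url);
    }
  }
}
```

A caller would try each candidate in turn and keep the first one that opens, so servers without https would still work at the cost of one failed attempt.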

@lesserwhirls
Collaborator Author

Tagging @DennisHeimbigner

@lesserwhirls
Collaborator Author

Ok, I think I've found the root issue here, so I'll cut right to the chase. The NCEI server does not deal with redirects of encoded urls properly.

Let's follow the steps of a request to http://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.ascii?time[0:1:2]

We can immediately see an issue if we make a request, but encode the brackets:

curl -G -v "http://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.ascii?time%5b0:1:2%5d"
*   Trying 205.167.25.171...
* TCP_NODELAY set
* Connected to www.ncei.noaa.gov (205.167.25.171) port 80 (#0)
> GET /thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.ascii?time%5b0:1:2%5d HTTP/1.1
> Host: www.ncei.noaa.gov
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Fri, 24 Jan 2020 20:21:09 GMT
< Server: Apache
< Location: https://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.ascii?time%255b0:1:2%255d
< Content-Length: 310
< Connection: close
< Content-Type: text/html; charset=iso-8859-1
<
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.ascii?time%255b0:1:2%255d">here</a>.</p>
</body></html>
* Closing connection 0

Notice that the Location field in the response header has been double-encoded. Specifically, the initial request uses time%5b0:1:2%5d as the query, whereas the redirect location shows time%255b0:1:2%255d.
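
The double encoding can be reproduced offline: percent-encoding a query that is already encoded turns each % into %25. In the sketch below, `encodeOnce` is a deliberately tiny, hypothetical encoder that only handles %, [ and ]; it is not the code that Apache or netCDF-Java actually runs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DoubleEncodingDemo {

  // Percent-encodes only '%', '[' and ']', which is enough for this query.
  // '%' must be handled first so the escapes added for brackets stay intact.
  static String encodeOnce(String s) {
    return s.replace("%", "%25").replace("[", "%5b").replace("]", "%5d");
  }

  // Applies a single round of percent-decoding.
  static String decodeOnce(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  public static void main(String[] args) {
    String raw = "time[0:1:2]";
    String once = encodeOnce(raw);   // what the client sent
    String twice = encodeOnce(once); // what the Location header contained
    System.out.println(once);
    System.out.println(twice);
    System.out.println(decodeOnce(twice)); // brackets are still encoded
  }
}
```

If the server decodes the query only once, it would see time%5b0:1:2%5d rather than time[0:1:2], which might be why the constraint appears to be ignored. That is a guess and has not been confirmed here.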

Now if we try to use that location, we get:

curl -G -v "https://www.ncei.noaa.gov/thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.ascii?time%255b0:1:2%255d"
*   Trying 205.167.25.178...
  <snip>
> GET /thredds/dodsC/cdr/gridsat/GridSat-Aggregation.ncml.ascii?time%255b0:1:2%255d HTTP/1.1
> Host: www.ncei.noaa.gov
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Date: Fri, 24 Jan 2020 20:31:51 GMT
< Server: Apache-Coyote/1.1
< Strict-Transport-Security: max-age=31536000
< XDODS-Server: opendap/3.7
< Content-Description: dods-error
< Content-Type: text/plain
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: X-Requested-With, Content-Type
< Connection: close
< Transfer-Encoding: chunked
<
Error {
    code = 403;
    message = "Request too big=1.1117421067232E7 Mbytes, max=50.0";
};
* Closing connection 0

@oxelson - this smells like a mod_* gotcha (where mod_* is not mod_jk). Have you run into this before?

We also see in the 403 response header that the Server field is set as Apache-Coyote/1.1, which also makes me think there are proxy configuration issues at play on their end.

@oxelson
Member

oxelson commented Jan 28, 2020

@lesserwhirls no, I've not seen this before with mod_jk, but that doesn't mean it's not a mod_* proxy issue. That said, I'd really need to know what their environment looks like (apache/tomcat configs) to see if the server environment is causing the encoding issues.

@lesserwhirls
Collaborator Author

I'll reach out to NCEI to see what's going on with the double encoding on the http-to-https redirect.
