static legacy files that potentially need to be served #205

Closed
baskaufs opened this issue Oct 11, 2018 · 11 comments

@baskaufs

@MattBlissett had given me a link to the vhosts redirect file, which is no longer available online. However, I had been working on setting up redirects for basically every static file found at https://github.com/tdwg/dwc/tree/gh-pages.

Perhaps a better way to get at this is to look at the list of the requests to the server that did not result in 404s (i.e. documents actually delivered). This is a better indication of what users are actually asking for. The numbers in the first column indicate the number of requests (requests for DwC term and guide dereferencing are omitted).

   4223 /dwc/DarwinCore_files/default.css
   3995 /dwc/DarwinCore_files/TDWGlogo_Twiki.gif
   3947 /dwc/DarwinCore_files/default.js
   2603 /dwc/text/tdwg_dwc_text.xsd
    324 /dwc/xsd/tdwg_dwc_class_terms.xsd
    175 /dwc/xsd/tdwg_dwc_simple.xsd
    163 /dwc/xsd/tdwg_dwcterms.xsd
    120 /dwc/xsd/tdwg_basetypes.xsd
     93 /dwc/xsd/tdwg_dwc_extensions.xsd
     61 /dwc/rdf/dwctermshistory.rdf
     52 /dwc/rdf/dwcterms.rdf
     39 /dwc/tdwg_dw_geospatial.xsd
     35 /dwc/xsd/tdwg_dwc_classes.xsd
     30 /dwc/tdwg_dw_curatorial.xsd
     25 /dwc/tdwg_dw_core.xsd
     18 /dwc/examples/text/example_text_simpledwc_complete.xml
      2 /dwc/./xsd/tdwg_dwcterms.xsd
      2 /dwc/./xsd/tdwg_dwc_simple.xsd
      2 /dwc/./xsd/tdwg_dwc_class_terms.xsd
      2 /dwc/xsd/simpledarwincore/
      2 /dwc/trackback/
      2 /dwc/./text/tdwg_dwc_text.xsd
      2 /dwc/./rdf/dwcterms.rdf
      2 /dwc/./rdf/dwctermshistory.rdf
      2 /dwc/examples/text/simplemetafile.xml
      2 /dwc/./examples/text/example_text_simpledwc_complete.xml
      1 /dwc/tdwg_dw_record.xsd
      1 /dwc/tdwg_dw_record_tapir.xsd

As you can see, some files seem to be pretty important, as they were requested hundreds or thousands of times. @tucotuco would have better insight into what these files are being used for and what would be likely to break if we no longer provided them.
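
For anyone who wants to reproduce a tally like the one above from their own logs, here is a minimal sketch. It assumes the server writes Common Log Format lines; the regular expression and the function names `tally_delivered` and `format_tally` are illustrative and are not taken from the actual server setup.

```python
import re
from collections import Counter

# Assumed log layout (Common Log Format), for example:
# 1.2.3.4 - - [11/Oct/2018:10:00:00 +0000] "GET /dwc/xsd/tdwg_dwc_simple.xsd HTTP/1.1" 200 5120
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3}) ')


def tally_delivered(lines):
    """Count requests per path, skipping any that returned 404."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a GET/HEAD request line in the assumed format
        path, status = match.group(1), match.group(2)
        if status != "404":
            counts[path] += 1
    return counts


def format_tally(counts):
    """Render counts as a right-aligned count followed by the path."""
    return [f"{n:7d} {path}" for path, n in counts.most_common()]
```

Term and guide dereferencing requests would still need to be filtered out separately, as was done for the list above.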

ping @peterdesmet

@tucotuco
Member

The first three are just files for page styling. Those should no longer be necessary.

The file /dwc/text/tdwg_dwc_text.xsd is the XML Schema file for Darwin Core archives - essential. This now lives at /standards/documents/text.

The .xsd files in /dwc/xsd/ are the XML Schema files for Darwin Core as XML - still used by some.

The file /dwc/rdf/dwctermshistory.rdf is the previous normative Darwin Core. The normative document is now /vocabulary/term_versions.csv.

The file /dwc/rdf/dwcterms.rdf was an extract of normative Darwin Core with only the currently recommended terms in it. This is deprecated.

The three files /dwc/tdwg_dw_geospatial.xsd, /dwc/tdwg_dw_curatorial.xsd and /dwc/tdwg_dw_core.xsd are XML Schema files for Darwin Core in DiGIR. This is not part of the Darwin Core standard, but there remain some DiGIR servers still in operation.

The files /dwc/tdwg_dw_record.xsd and /dwc/tdwg_dw_record_tapir.xsd are XML Schema files for Darwin Core in TAPIR. This is not part of the Darwin Core standard, but there remain some TAPIR servers still in operation as well.

All the references to /dwc/. look redundant and shouldn't exist.
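
If those `/dwc/./...` requests should be treated the same as their plain equivalents, the paths can be normalized before they are matched against any redirect rule. A minimal sketch, assuming POSIX-style URL paths; the helper name `normalize_request_path` is hypothetical:

```python
import posixpath


def normalize_request_path(path):
    """Collapse '.' (and '..') segments, keeping a trailing slash if present."""
    normalized = posixpath.normpath(path)
    # normpath drops a trailing slash, so restore it for directory-style URLs.
    if path.endswith("/") and not normalized.endswith("/"):
        normalized += "/"
    return normalized
```

With this, `/dwc/./xsd/tdwg_dwcterms.xsd` and `/dwc/xsd/tdwg_dwcterms.xsd` resolve to the same path.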

The two .xml files in /dwc/examples/text are examples of what Darwin Core Archive XML should look like.

The content at /dwc/xsd/simpledarwincore/ is now at /standard/documents/simple.

/dwc/trackback/ - no idea.

@baskaufs
Author

OK, after some effort, I've gone through @MattBlissett 's not-404 list and eliminated all of the stuff that I don't think should produce a 200. Here is what I have left:

   hits URL
      1 /abcd2/terms/dataset-copyright-uri
      1 /abcd2/terms/datasets
     37 /abcd2/terms/gathering-siteimage-license-details
     18 /dwc/examples/text/example_text_simpledwc_complete.xml
      2 /dwc/examples/text/simplemetafile.xml
     17 /dwc/examples/xml/example_simple.xml
     52 /dwc/rdf/dwcterms.rdf
     61 /dwc/rdf/dwctermshistory.rdf
      6 /dwc/tdwg_basetypes.xsd
     25 /dwc/tdwg_dw_core.xsd
     30 /dwc/tdwg_dw_curatorial.xsd
     39 /dwc/tdwg_dw_geospatial.xsd
      1 /dwc/tdwg_dw_record_tapir.xsd
      1 /dwc/tdwg_dw_record.xsd
     20 /dwc/terms/history/dwctoabcd/
    133 /dwc/terms/history/dwctoabcd/index.htm
     11 /dwc/terms/history/versions/
     92 /dwc/terms/history/versions/index.htm
    120 /dwc/xsd/tdwg_basetypes.xsd
    324 /dwc/xsd/tdwg_dwc_class_terms.xsd
     35 /dwc/xsd/tdwg_dwc_classes.xsd
     93 /dwc/xsd/tdwg_dwc_extensions.xsd
    175 /dwc/xsd/tdwg_dwc_simple.xsd
    163 /dwc/xsd/tdwg_dwcterms.xsd
      4 /tapir/1.0/anyOutputModels
      4 /tapir/1.0/mappedConcept
      4 /tapir/1.0/maxElementRepetitions
      4 /tapir/1.0/outputModel
      4 /tapir/1.0/response
      9 /tapir/1.0/schema/tdwg_tapir.xsd
      4 /tapir/1.0/summary
      4 /tapir/1.0/template
      2 /UBIF/2006

Here are some notes about items on this list:

  1. The original vhosts redirect file redirected /tapir/1.0/* files to https://raw.githubusercontent.com/tdwg/tapir/1.0/*, and I have set that redirect as well, but there doesn't seem to be an actual "1.0" directory in the tapir repo, so the request gets a 404 from GitHub.
  2. I'm wondering if the two .rdf files in /dwc/rdf/* should continue to be served for historical purposes (perhaps with a deprecation comment). I'm a bit reluctant to just redirect to the new CSV, since the requested file is RDF/XML and the returned file would be text/csv.
  3. Based on what @tucotuco said, the /dwc/xsd/* directory should be served. I'm currently redirecting to https://dwc.tdwg.org/xsd/* but am getting a 404 from GitHub Pages. The individual TAPIR XSD files are being redirected to URLs like https://dwc.tdwg.org/tdwg_dw_curatorial.xsd, but aren't being served by GitHub Pages.

So the bottom line is that I think all of the redirects are probably working correctly in my script. If any of the .xsd files become available from GitHub Pages, the problem will correct itself without any change to the script. So @MattBlissett, if you want to load the restxq.xqm file from https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/html/restxq.xqm into your labs.gbif.org BaseX server instance, we can see how it behaves there.
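
To make the redirect behavior described in the notes above concrete, here is a rough sketch of the mapping in Python. It is not the contents of restxq.xqm: the rule table and the name `resolve_redirect` are illustrative, and the set of root-level schema files is an assumption that extends the single tdwg_dw_curatorial.xsd example to the other DiGIR and TAPIR schemas on the list.

```python
# Hypothetical rule table reconstructed from the notes above;
# it is NOT a copy of the rules in restxq.xqm.
PREFIX_RULES = [
    ("/tapir/1.0/", "https://raw.githubusercontent.com/tdwg/tapir/1.0/"),
    ("/dwc/xsd/", "https://dwc.tdwg.org/xsd/"),
]

# Assumption: every individual schema file under /dwc/ is handled like the
# tdwg_dw_curatorial.xsd example.
ROOT_FILES = {
    "tdwg_dw_core.xsd",
    "tdwg_dw_curatorial.xsd",
    "tdwg_dw_geospatial.xsd",
    "tdwg_dw_record.xsd",
    "tdwg_dw_record_tapir.xsd",
}


def resolve_redirect(path):
    """Return the redirect target for a legacy path, or None if no rule applies."""
    for prefix, target in PREFIX_RULES:
        if path.startswith(prefix):
            return target + path[len(prefix):]
    if path.startswith("/dwc/"):
        name = path[len("/dwc/"):]
        if name in ROOT_FILES:
            return "https://dwc.tdwg.org/" + name
    return None
```

The sketch only computes the target URL. Whether that target then returns a 200 or a 404 depends on what the destination actually serves, which is the situation described in notes 1 and 3.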

@tucotuco
Member

tucotuco commented Sep 5, 2020

Given that we have not had any reports of issues in the last two years, I suspect that this issue can be closed. Opinions @peterdesmet @baskaufs ?

tucotuco added the task label Sep 5, 2020

@peterdesmet
Member

Agree to close.

@baskaufs
Author

baskaufs commented Sep 5, 2020

I think that this issue has been fixed and as far as I know everything is getting redirected to the right place with no complaints.

There is only one technical issue that should probably be considered before closing this. Refer to this section of code and beyond, which handles the legacy redirects. When I first set up these redirects, I used 301 redirects for the URLs that really shouldn't be used any more. However, I think it was @timrobertson100 who noted that 301 (moved permanently) redirects are hard to undo and that, at least for the time being, 307s (temporary redirect) would be better. So I changed all 301s to 307s.

Should some or all of the 307s be changed to 301s now that things seem stable?

@tucotuco
Member

tucotuco commented Sep 5, 2020 via email

@baskaufs
Author

baskaufs commented Sep 5, 2020

It probably doesn't make any difference, but I'm not expert enough on these matters to say for sure. I think that if you use a 301, then Google will stop indexing the old URL and just use the new one. So that may be significant.

@tucotuco
Member

tucotuco commented Sep 5, 2020

Maybe we can get an opinion from @timrobertson100 and/or @MattBlissett and then move this issue toward closure.

@MattBlissett
Member

301 is probably OK now, although it does introduce a risk in case of future changes to whatever code this is implemented in.

If it's complicated, I would leave it as 307s. If it's straightforward and would be difficult to accidentally mess up (e.g. a list of those URLs) then 301s would be safe.

@tucotuco
Member

tucotuco commented Sep 7, 2020

Sounds like either route is OK. Who would actually make the changes? If someone has the time to make the change to 301s, let's do it. If not, let me know and we'll officially stick with 307s and close the issue.

@baskaufs
Author

baskaufs commented Sep 8, 2020

There actually are not a lot of 301s and they are mostly to handle specific URLs, like redirecting the pattern

/dwc/terms/history/decisions

which doesn't work any more (and never will) to

/decisions

which does work.

The redirects for patterns of URLs are generally not 301s. Since I think there is consensus to make the changes, I will just do it as part of the next release of the rs.tdwg.org repo and close this issue.
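
As an illustration of that split, here is a minimal sketch in which exact URLs get a 301 and URL patterns get a 307. It is written in Python rather than the XQuery used in the rs.tdwg.org repo; the `/dwc/xsd/` pattern entry and the name `redirect_for` are hypothetical, and only the `/dwc/terms/history/decisions` mapping comes from this thread.

```python
from http import HTTPStatus

# Exact legacy URLs whose replacement is not expected to change: 301.
EXACT_REDIRECTS = {
    "/dwc/terms/history/decisions": "/decisions",
}

# Prefix patterns, kept as 307 so they stay easy to change later.
# Hypothetical example entry, not taken from the actual script.
PATTERN_REDIRECTS = [
    ("/dwc/xsd/", "/xsd/"),
]


def redirect_for(path):
    """Return (status, target) for a legacy path, or None if no rule matches."""
    if path in EXACT_REDIRECTS:
        return HTTPStatus.MOVED_PERMANENTLY, EXACT_REDIRECTS[path]
    for prefix, target in PATTERN_REDIRECTS:
        if path.startswith(prefix):
            return HTTPStatus.TEMPORARY_REDIRECT, target + path[len(prefix):]
    return None
```

Keeping the permanent redirects in a plain lookup table follows the suggestion earlier in the thread that a simple list of URLs is hard to mess up accidentally.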
