define machine-readable? #2
Comments
Totally agree, and I support your point of view. I would, however, add one more point:
Well, but that's basically your first point. The only difference is in the public statement.
Well, even if a dataset is proprietary, this does not prevent one from implementing an (access-restricted) API. How should we proceed? Should I make a pull request? And would you like to keep "machine-readable" as a basic requirement, or would you rather provide a "machine-friendly" sticker that highlights those entries which make an effort to be machine-readable?
Let's keep the |
Fine!
Here it is not really clear to me what this means... Some of the databases in the list can be downloaded, so that's fine. Some may have documented APIs for automated querying. But several offer neither, or am I missing something? In essence, what I am looking for is the set of criteria that led you to the choice of the databases in the list (so that I know how to add to it).
OK, let me try to formulate...
…e answer on the mass downloads for data mining (relevant for #2)
@ltalirz I thought about your suggestion and ended up with the following. Any database is machine-readable by design. Only the access policies matter (and they aren't necessarily FAIR!). For instance, under a private agreement, one may be granted unrestricted access to a conservative, otherwise HTML-only data source. After contacting some of the uncertain participants in my list, I received explicit or implicit requests for deletion. So why shouldn't we follow the canary principle? We just include anything we know was or would be of use for the mentioned or similar software frameworks, and delete it immediately upon request.
Agreed.
Do I understand correctly that you are proposing to include any potentially useful database, as long as its maintainers do not explicitly state (publicly or to us) that it is not open for machine-based data mining? In this case, however, I would suggest two things:
Great!
There's already a proprietary label. Its absence implies that the data are open.
OK, makes sense.
Since you are restricting the list to machine-readable datasets (and rightfully so, I would say), it would be very helpful to explain what this means, perhaps best using a few examples.
In practical terms: many of these materials science efforts provide an HTML form, which connects to a database and spits out another HTML page with search results (possibly paginated). Should this count as machine-readable?
In principle, of course, all information made available in digital form can be considered machine readable, but then we can drop the requirement in the first place.
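To make the distinction concrete, here is a minimal sketch of what "reading" such an HTML result page involves, compared with reading the same records from a documented API. Everything in it is hypothetical: the page layout, the field names (`formula`, `band_gap_ev`) and the illustrative values are assumptions, not taken from any database in the list.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical search-result page, as an HTML form might return it.
HTML_PAGE = """
<table>
  <tr><th>Formula</th><th>Band gap (eV)</th></tr>
  <tr><td>Si</td><td>1.12</td></tr>
  <tr><td>GaAs</td><td>1.42</td></tr>
</table>
"""

# The same hypothetical records, as a documented API might return them.
JSON_BODY = (
    '[{"formula": "Si", "band_gap_ev": 1.12},'
    ' {"formula": "GaAs", "band_gap_ev": 1.42}]'
)


def records_from_html(page):
    """Scrape records from the table; column meaning is guessed from the layout."""
    parser = TableRows()
    parser.feed(page)
    parser.close()
    return [{"formula": f, "band_gap_ev": float(g)} for f, g in parser.rows]


def records_from_json(body):
    """Read records whose field names and types are given by the format itself."""
    return json.loads(body)
```

Both functions return the same records here, but the HTML route depends on the column order, the units in the header and the absence of pagination, none of which the page promises to keep stable. That is roughly the gap between "available in digital form" and "easy to query automatically".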
In my view:
What did you have in mind?
In the end, perhaps it is best to drop the requirement and instead put something like a FAIR sticker (or similar) on those entries that actually make it easy to query the data automatically.