define machine-readable? #2

ltalirz · 2018-02-16T19:08:28Z

Since you are restricting the list to machine-readable datasets (and rightfully so, I would say), it would be very helpful to explain what this means, perhaps best using a few examples.

In practical terms: many of these materials science efforts provide an HTML form, which connects to a database and spits out another HTML page with search results (possibly paginated). Should this count as machine-readable?
In principle, of course, all information made available in digital form can be considered machine-readable, but then we might as well drop the requirement altogether.

In my view:

if the whole database can be downloaded, basically in whatever format, it's machine readable
if there is documented API for automated requests, it's machine readable
if there is just a web form that allows one to query the database... it kind of makes things unnecessarily difficult
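To make the three cases above concrete, here is a minimal Python sketch of how a list maintainer might record and check them. The `DatabaseEntry` record, its fields, and the `classify` / `is_machine_readable` helpers are hypothetical illustrations, not part of any existing tool; the rule they encode is only the one proposed above (a full download or a documented API qualifies, a web form alone does not).

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """Hypothetical access levels, mirroring the three cases listed above."""
    FULL_DOWNLOAD = "full download"
    DOCUMENTED_API = "documented API"
    WEB_FORM_ONLY = "web form only"


@dataclass
class DatabaseEntry:
    # Hypothetical fields describing how a listed database exposes its data.
    name: str
    full_download: bool = False
    documented_api: bool = False


def classify(entry: DatabaseEntry) -> Access:
    """Return the most convenient access mode the entry offers."""
    if entry.full_download:
        return Access.FULL_DOWNLOAD
    if entry.documented_api:
        return Access.DOCUMENTED_API
    return Access.WEB_FORM_ONLY


def is_machine_readable(entry: DatabaseEntry) -> bool:
    """Under the proposed rule, a bulk download or a documented API qualifies."""
    return classify(entry) is not Access.WEB_FORM_ONLY


if __name__ == "__main__":
    # Placeholder entries, not real databases.
    examples = [
        DatabaseEntry("dump-only example", full_download=True),
        DatabaseEntry("api example", documented_api=True),
        DatabaseEntry("form-only example"),
    ]
    for e in examples:
        print(f"{e.name}: {classify(e).value}, machine-readable={is_machine_readable(e)}")
```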

What did you have in mind?

In the end, perhaps it is best to drop the requirement and instead attach something like a FAIR sticker (or similar) to those entries that actually make it easy to query the data automatically.

blokhin · 2018-02-17T00:06:08Z

Totally agree and support your point of view. I'd add one more point, though:

if the authors provide their full dataset privately (e.g. when they are unable to implement any APIs)

blokhin · 2018-02-17T00:08:23Z

Well, but that's basically your first point. The only difference is the public statement.

ltalirz · 2018-02-17T21:32:14Z

Well, even if a dataset is proprietary, this does not prevent one from implementing an (access-restricted) API.
But even without such an API, if the whole database can be downloaded, that's fine from my point of view.

How should we proceed? Should I make a pull request?
I could rename "contributing" to "guidelines" and include a section there describing the "machine-readable" requirement.

And would you like to keep "machine-readable" as a basic requirement or would you rather provide a "machine-friendly" sticker that highlights those entries which make an effort to be machine-readable?

blokhin · 2018-02-18T00:43:30Z

Let's keep the machine-readable criterion as a basic requirement? I think it is crucial. Moreover, to my knowledge, all the datasets mentioned are (or were) investigated with data science methods.

ltalirz · 2018-02-18T00:58:36Z

Let's keep the machine-readable criterion as a basic requirement

Fine!

to my knowledge, all the datasets mentioned are (or were) investigated with data science methods

Here it is not really clear to me what this means...

Some of the databases in the list can be downloaded, so that's fine. Some may have documented APIs for automated querying. But several have neither, or am I missing something?
What about the Zeolite Structures Database, WURM, the phonon database, NREL, ...?
I guess you can reverse-engineer the web forms quite easily, but where does one draw the line?

In essence, what I am looking for is the set of criteria that led you to the choice of the databases in the list (so that I know how to add to it).

blokhin · 2018-02-18T01:11:41Z

OK, let me try to formulate...

blokhin · 2018-03-10T13:54:11Z

@ltalirz I thought about your suggestion and ended up with the following. Any database is machine-readable by design. Only the access policies matter (and they aren't necessarily FAIR!). For instance, upon a private agreement, one may be granted unrestricted access to an otherwise conservative, HTML-only data source.

After contacting some of the uncertain entries on my list, I received explicit or implicit requests for deletion. So why not follow the canary principle? We just include anything we know was or would be of use for the mentioned or similar software frameworks and delete immediately by request.

ltalirz · 2018-03-10T14:17:50Z

Any database is machine-readable by design. Only the access policies matter (and they aren't necessarily FAIR!).

Agreed.

For instance, upon a private agreement, one may be granted unrestricted access to an otherwise conservative, HTML-only data source.

We just include anything we know was or would be of use for the mentioned or similar software frameworks and delete immediately by request.

Do I understand correctly that you are proposing to include any potentially useful database, as long as its maintainers do not explicitly state (publicly or to us) that it is not open for machine-based data mining?
I think this is a reasonable approach.

In this case, however, I would suggest two things:

Define a set of symbols (they can even be just words for the moment) that identify the data-mining openness of each entry in the list (free / commercial / unknown)
Somewhere (it doesn't need to be on the main page), keep a list of databases that have been explicitly excluded (new proposals will be checked against this list)

blokhin · 2018-03-10T16:40:14Z

Great!

Define a set of symbols (they can even be just words for the moment) that identify the data-mining openness of each entry in the list (free / commercial / unknown)

There's a proprietary label already. Its absence implies that the data are open.

Somewhere (it doesn't need to be on the main page), keep a list of databases that have been explicitly excluded (new proposals will be checked against this list)

OK, makes sense.

ltalirz mentioned this issue Feb 18, 2018

category for simulation platforms? #3 (closed)

blokhin added a commit that referenced this issue Feb 26, 2018

Contacted RRUFF, WURM, and Zeolites and received eventually a negative answer on the mass downloads for data mining (relevant for #2)

114b5d5

blokhin closed this as completed Mar 10, 2018

ltalirz mentioned this issue Jan 3, 2019

icsd not machine-readable? #10 (closed)
