This repository has been archived by the owner on Nov 10, 2022. It is now read-only.

Fetch qualifiers #72

Open
wetneb opened this issue Jan 29, 2020 · 19 comments

Comments

@wetneb
Owner

wetneb commented Jan 29, 2020

There is currently no way to fetch qualifiers in the data extension API (or to refine during reconciliation). A syntax for such qualifiers should be picked and implemented.

@antoine2711

Yes, that would be great. And we should be able to link the column holding the fetched data in the WD schema, so that the data can be pushed back.

@wetneb
Owner Author

wetneb commented Jan 30, 2020

I am not sure what you mean by "link the column". Do you mean using column groups? I don't see how column groups can be relied on in the WD schema.

@antoine2711

What I meant is that if I could query qualifiers and references, then they could also be pushed back. This makes a round trip (get the data, fill in the blanks, push the data back).

Right now this can't be done, since qualifiers and references can't be imported in the first place.

@hughlilly

Would this be why I'm having this issue? Sorry if the terminology is off -- perhaps I should have said "qualifier" instead of "flag" in the subject line…

@wetneb
Owner Author

wetneb commented Sep 9, 2021

No, your issue is not linked to qualifiers, but it's also an interesting one. I replied there :)

@wetneb
Owner Author

wetneb commented Nov 23, 2021

Use case mentioned here by @mshd:

> I would like to reconcile Wikidata with a certain qualifier. Is that possible? If not, could you implement it?
>
> Example:
>
> [Screenshot from 2021-11-23 14-47-56]
> Set the qualifier property to North Sumatera III, or give me all people who ever had a candidacy in this district.

@Pluralog

I would love this. In my use case I have annual data like "total revenue", and without fetching qualifiers it is really difficult to update only the items that have no data for a certain year.

@wetneb
Owner Author

wetneb commented Feb 16, 2022

Let me expand on the design questions that need to be resolved before this can be implemented.
This issue can be understood in multiple ways:

  1. I want to fetch the qualifier values on all statements of a given property. For instance: give me all the years for which the total revenue is available on Wikidata.
  2. I want to fetch the qualifier values on statements of a given property with a given value. For instance, give me the "member of political party" qualifier of the "candidacy in election: 2014 Indonesian People's Representative Council election" statement.
  3. I want to fetch main statement values, but select the ones I care about by specifying qualifier values. Example: give me the total revenue of this company in 2018 (so, filtering all "total revenue" statements to only keep the ones with a "point in time":2018 qualifier).
  4. I want to fetch "candidacy in election" statements, fetching simultaneously the main statement value and the qualifier values, representing them in OpenRefine with a record-like structure. This seems difficult to implement in a natural way with the current protocol.

Possible syntaxes we could add to support these use cases (where P3602 is candidacy in election, P1111 is votes received and P768 is electoral district):

  1. P3602#P1111 (all P1111 qualifiers on all P3602 statements)
  2. P3602=Q108816797#P1111 (all P1111 qualifiers on P3602=Q108816797 statements)
  3. P3602[P768=Q96984689] (all main statement values on P3602 statements with P768=Q96984689 qualifier)
  4. I do not see a clean way to implement this given the existing API.
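To make the three proposed forms concrete, here is a minimal Python sketch of how such strings might be parsed. Nothing here is implemented in the service: the `QualifierQuery` class, the `parse_query` function and the regular expression are hypothetical names introduced for illustration, and the grammar only covers item-valued (Q-id) values as in the examples above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar for the three syntaxes proposed in this thread:
#   P3602#P1111             -> fetch qualifier P1111 on all P3602 statements
#   P3602=Q108816797#P1111  -> same, restricted to one main statement value
#   P3602[P768=Q96984689]   -> main values of statements with that qualifier
_PATTERN = re.compile(
    r"^(?P<prop>P\d+)"
    r"(?:"
    r"(?:=(?P<value>Q\d+))?#(?P<fetch>P\d+)"
    r"|"
    r"\[(?P<fprop>P\d+)=(?P<fvalue>Q\d+)\]"
    r")$"
)


@dataclass(frozen=True)
class QualifierQuery:
    prop: str                               # main property, e.g. P3602
    value: Optional[str] = None             # required main value (case 2)
    fetch_qualifier: Optional[str] = None   # qualifier to return (cases 1, 2)
    filter_qualifier: Optional[str] = None  # qualifier to filter on (case 3)
    filter_value: Optional[str] = None      # required qualifier value (case 3)


def parse_query(text: str) -> QualifierQuery:
    """Parse one of the three proposed forms, or raise ValueError."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported qualifier syntax: {text!r}")
    return QualifierQuery(
        prop=match["prop"],
        value=match["value"],
        fetch_qualifier=match["fetch"],
        filter_qualifier=match["fprop"],
        filter_value=match["fvalue"],
    )
```

Under these assumptions, `parse_query("P3602[P768=Q96984689]")` yields a query that filters on P768 and fetches main values, while a plain `P3602` is rejected because it carries no qualifier part. Non-item qualifier values (dates, quantities) would need an extended grammar.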

Do you see other use cases not covered by these points? Which of those use cases would be useful to you?

wetneb pinned this issue Feb 16, 2022
@Pluralog

> Do you see other use cases not covered by these points? Which of those use cases would be useful to you?

Looks good to me.

Case 3 could get more complicated when the qualifiers are not items themselves. Take point in time, for example: it can be just a year, but sometimes it is a specific date. In Wikidata I would use FILTER for the qualifier. As a workaround we could use case 1 and do the filtering in OpenRefine later.

@wetneb
Owner Author

wetneb commented Feb 16, 2022

> As a workaround we could use case 1 and do the filtering in OpenRefine later.

The problem with 1. is that it would only fetch the qualifier values, not the main statement values, so it is not clear to me how you can use it to reimplement 2 or 3 by adding local filtering afterwards.

@antoine2711

>   3. P3602[P768=Q96984689] (all main statement values on P3602 statements with P768=Q96984689 qualifier)
>   4. I do not see a clean way to implement this given the existing API.
>
> Do you see other use cases not covered by these points? Which of those use cases would be useful to you?

@wetneb: fine for 1. and 2. But why not P3602#P768=Q96984689 for 3.? And for 4., why not Pxxx#*?

Regards, Antoine

@Pluralog

> > As a workaround we could use case 1 and do the filtering in OpenRefine later.
>
> The problem with 1. is that it would only fetch the qualifier values, not the main statement values, so it is not clear to me how you can use it to reimplement 2 or 3 by adding local filtering afterwards.

I thought it would only work in multiple steps. In my case (total revenue and point in time) I would try:

  1. fetch all point in time values for total revenue
  2. in OpenRefine, filter all point in time values between 2017-00-00 and 2018-00-00
  3. fetch all main statements for those

But you are right. It would only work if I could use the values of a column as qualifiers in my query.
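For the non-item case discussed here (point in time values that may be a year or a full date), the local filtering step could look roughly like the Python sketch below, assuming the statement JSON is already available in the Wikibase format. The function names are hypothetical, and P585 is used as the point in time property, which is an assumption not stated in this thread.

```python
from typing import Any, Dict, List

Statement = Dict[str, Any]


def qualifier_years(statement: Statement, qualifier_pid: str = "P585") -> List[int]:
    """Return the years of all time-valued qualifiers with the given property.

    Assumes Wikibase time strings such as "+2018-00-00T00:00:00Z", which
    always start with an explicit sign.
    """
    years = []
    for snak in statement.get("qualifiers", {}).get(qualifier_pid, []):
        if snak.get("snaktype") != "value":
            continue  # skip "novalue" / "somevalue" snaks
        time_string = snak["datavalue"]["value"]["time"]
        sign = -1 if time_string[0] == "-" else 1
        # The year is everything between the sign and the first "-".
        year = int(time_string[1:].split("-", 1)[0])
        years.append(sign * year)
    return years


def statements_in_years(
    statements: List[Statement],
    first_year: int,
    last_year: int,
    qualifier_pid: str = "P585",
) -> List[Statement]:
    """Keep statements that have at least one qualifier year in the range."""
    return [
        statement
        for statement in statements
        if any(
            first_year <= year <= last_year
            for year in qualifier_years(statement, qualifier_pid)
        )
    ]
```

Because the whole statement is kept, the main value stays attached to its qualifiers, which is what makes this kind of after-the-fact filtering possible; it does not work if only the qualifier values were fetched.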

@wetneb
Owner Author

wetneb commented Feb 16, 2022

@antoine2711 for 4., the problem is not to find a syntax for it, but rather to see how it would fit in the protocol. At the moment, when the user requests a property, we can only return one column for it.

@wetneb
Owner Author

wetneb commented Feb 16, 2022

I guess one hacky workaround would be to let the user fetch the full JSON of the statements, and we would let them manipulate that themselves in OpenRefine. After all, there are a ton more fields we are not exposing (ranks, references…) and it is unlikely we can find a satisfactory syntax to fetch all those fields, so it would be good to have this fallback option for power users.

It would still be more convenient than having to query the Wikibase API directly.
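As an illustration of what manipulating the statement JSON could involve, here is a small Python sketch that pulls qualifier values out of a list of statements in the Wikibase JSON format (`mainsnak`, `qualifiers`, `datavalue`). The helper names `snak_value` and `qualifier_values` are hypothetical, and the simplification of values (item id, quantity amount, time string) is an assumption about what a user would want, not something the service does.

```python
from typing import Any, Dict, List, Optional

Statement = Dict[str, Any]


def snak_value(snak: Dict[str, Any]) -> Optional[Any]:
    """Return a simplified value for a snak, or None if it has no value."""
    if snak.get("snaktype") != "value":
        return None  # "novalue" / "somevalue" snaks carry no datavalue
    datavalue = snak["datavalue"]
    value = datavalue["value"]
    if datavalue["type"] == "wikibase-entityid":
        return value["id"]      # e.g. "Q96984689"
    if datavalue["type"] == "quantity":
        return value["amount"]  # e.g. "+1000" (kept as a string)
    if datavalue["type"] == "time":
        return value["time"]    # e.g. "+2018-00-00T00:00:00Z"
    return value                # strings and other types are left unchanged


def qualifier_values(
    statements: List[Statement],
    qualifier_pid: str,
    main_value: Optional[str] = None,
) -> List[Any]:
    """Collect the values of one qualifier property across statements.

    With main_value=None this corresponds to use case 1; with a main value
    given, only statements with that main value are considered (use case 2).
    """
    results = []
    for statement in statements:
        if main_value is not None and snak_value(statement["mainsnak"]) != main_value:
            continue
        for snak in statement.get("qualifiers", {}).get(qualifier_pid, []):
            value = snak_value(snak)
            if value is not None:
                results.append(value)
    return results
```

In OpenRefine itself the same traversal would more likely be written as a GREL or Jython expression on the JSON cell; the Python version is only meant to show how little structure needs to be walked.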

@antoine2711

antoine2711 commented Feb 16, 2022

> The problem with 1. is that it would only fetch the qualifier values, not the main statement values

Oh! I see, @wetneb. So the problem is bringing the structure into OR? Why couldn't 2 columns be brought in at the same time? I understand it requires creating rows at 2 levels, the outer statements and the inner qualifiers. But still, is that so complicated?

Also, OR has a (not very functional) grouping of columns, like what you get from importing XML or JSON. Could that mechanism be reused?

I write that because, for me, in all 4 scenarios, I would like the statement value AND the qualifier's property AND the value of the qualifier's property.
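To picture the two-level layout being asked for, here is a rough Python sketch that flattens statements into record-like rows of (statement value, qualifier property, qualifier value), leaving the statement value blank on continuation rows in the way OpenRefine's records mode does. This only illustrates the target shape: the function names are hypothetical, and it says nothing about how the reconciliation protocol could transport such rows.

```python
from typing import Any, Dict, List, Optional, Tuple

Statement = Dict[str, Any]
Row = Tuple[Optional[Any], Optional[str], Optional[Any]]


def simple_value(snak: Dict[str, Any]) -> Optional[Any]:
    """Reduce a Wikibase snak to an item id, an amount, or its raw value."""
    if snak.get("snaktype") != "value":
        return None
    value = snak["datavalue"]["value"]
    if isinstance(value, dict) and "id" in value:
        return value["id"]
    if isinstance(value, dict) and "amount" in value:
        return value["amount"]
    return value


def statement_records(statements: List[Statement]) -> List[Row]:
    """Flatten statements into rows, one row per qualifier snak.

    The statement value appears only on the first row of each record;
    following rows leave it as None, like blank cells in records mode.
    """
    rows: List[Row] = []
    for statement in statements:
        main = simple_value(statement["mainsnak"])
        qualifier_snaks = [
            (pid, snak)
            for pid, snaks in statement.get("qualifiers", {}).items()
            for snak in snaks
        ]
        if not qualifier_snaks:
            rows.append((main, None, None))  # statement without qualifiers
            continue
        for index, (pid, snak) in enumerate(qualifier_snaks):
            rows.append((main if index == 0 else None, pid, simple_value(snak)))
    return rows
```

A statement with two qualifiers therefore produces two rows, and only the first one carries the main value.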

Regards, Antoine

@wetneb
Owner Author

wetneb commented Feb 16, 2022

All I can say is that I do not know how that should be implemented. Again, proposals and pull requests are welcome.

@antoine2711

> I guess one hacky workaround would be to let the user fetch the full JSON of the statements, and we would let them manipulate that themselves in OpenRefine.

That would be great in many ways, because we could expand the syntax to add @ and the source property, with the same logic.

As for accessing that data: since all those queries start from a recon column, maybe add fields to the recon...

Or, in the new column, save the data as a new recondata object. It would hold either a recon or values, plus the cell of the initial recon column (the element of the statement).

By the same logic, we might want columns of reconciled properties that could replace properties in the Wikidata schema.

So the recondata could have a type of statement value, statement property, qualifier property, qualifier value, source property, or source value.

Expanding this logic seems quite in line with the Wikibase generalisation (though that is another topic).

Sorry @wetneb and the others if I am OT with too much OpenRefine; it's just that here the two are so linked to and dependent on each other, in my view.

Regards,
Antoine

@trnstlntk

I have just received a request via email from another user who would find this very helpful.

It would be very useful for data extension for Wikimedia Commons' structured data, as P170 is usually described with several qualifiers there.

@VojtechDostal

> I have just received a request via email from another user who would find this very helpful.
>
> It would be very useful for data extension for Wikimedia Commons' structured data, as P170 is usually described with several qualifiers there.

That user is me :-).
I like @wetneb's solution to enable loading full statement JSONs. This would solve many possible feature requests in one go :)

wetneb unpinned this issue Nov 10, 2022

6 participants