
Explore the creation of graphical timelines for donors, donees, and donor-donee pairs based on intended_funding_timeframe_in_months #125

Closed
vipulnaik opened this issue Sep 17, 2019 · 24 comments

vipulnaik (Owner) commented Sep 17, 2019

I'm thinking of a timeline where time is plotted horizontally and each grant is shown as a bar spanning the time period it is intended to fund.

I think this is most helpful for donor-donee pairs, e.g., https://donations.vipulnaik.com/donorDonee.php?donor=Open+Philanthropy+Project&donee=Machine+Intelligence+Research+Institute In particular, it allows us to visually check if there is any "gap" in funding (a time period when Open Phil wasn't funding MIRI) and if there are periods of double-funding (a second grant was made while an existing one was still within timeframe).

We'll have to figure out how to deal with grants with no intended_funding_timeframe_in_months. We could show them as dots, or as a dot with a question mark to indicate an unknown timeframe.
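
A minimal sketch of the gap and double-funding check described above, assuming each grant is available as a hypothetical `(start_date, intended_funding_timeframe_in_months)` pair whose timeframe may be `None`. It treats the donation date as the start of the funded period and shifts by whole months (clamping the day to 28, a simplification); grants with an unknown timeframe are skipped here rather than drawn as dots.

```python
from datetime import date
from typing import List, Optional, Tuple

Interval = Tuple[date, date]


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the day is clamped to 28 so the
    result is always a valid date (a deliberate simplification)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def funding_intervals(grants: List[Tuple[date, Optional[int]]]) -> List[Interval]:
    """Turn (start_date, timeframe_in_months) pairs into sorted intervals,
    skipping grants whose timeframe is unknown (None)."""
    return sorted(
        (start, add_months(start, months))
        for start, months in grants
        if months is not None
    )


def gaps_and_overlaps(intervals: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Sweep the sorted intervals, tracking how far funding is covered so far.
    A start after the covered date is a gap; a start before it is double-funding."""
    gaps: List[Interval] = []
    overlaps: List[Interval] = []
    covered_until: Optional[date] = None
    for start, end in intervals:
        if covered_until is not None:
            if start > covered_until:
                gaps.append((covered_until, start))
            elif start < covered_until:
                overlaps.append((start, min(end, covered_until)))
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps, overlaps
```

For example, a 12-month grant starting January 2016 followed by a 24-month grant starting June 2017 produces a gap from January to June 2017; a further 12-month grant starting January 2018 is reported as double-funding through January 2019.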

Here's a super-complicated graphical timeline; I'm obviously looking for something similar but much simpler (in particular, the various bars won't be connected to or branching out from one another).

https://commons.wikimedia.org/wiki/File:Timeline_of_web_browsers.svg

riceissa (Collaborator) commented Sep 17, 2019

> Here's a super-complicated graphical timeline

Was this part supposed to link to something?

vipulnaik (Owner, Author) commented Sep 18, 2019

@riceissa -- oops! I edited to add the link

riceissa (Collaborator) commented Sep 18, 2019

The browser timeline reminded me of https://en.wikipedia.org/wiki/Ubuntu_version_history#Version_timeline

I'm wondering if you're imagining the vertical dimension being used for anything (e.g. maybe each new grant gets bumped up to its own level, like the Ubuntu releases), or if you're imagining just a 1-dimensional plot with the grant timeframes marked as intervals on the line.

vipulnaik (Owner, Author) commented Sep 19, 2019

@riceissa great question!

There are a few ways we could use the vertical dimension:

  • For cases where we are doing a plot for a single donor and multiple donees, or a single donee and multiple donors, the vertical dimension could store the identity of the donee (respectively, donor). However, we would still have the problem of the same donor making multiple grants with overlapping time periods.
  • For single donor/donee pairs, I don't have good ideas yet (without adding more metadata to individual grants, that is).
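
One way to handle overlapping grants without extra metadata, sketched here as an assumption rather than anything decided in this thread, is greedy interval partitioning: sort grants by start, put each one in the lane that freed up earliest, and open a new lane only when every existing lane is still busy. This uses the minimum possible number of lanes, and for a single donor-donee pair it gives the "bump each overlapping grant to its own level" layout. The `assign_lanes` name is hypothetical; intervals are plain comparable `(start, end)` values, so dates work as well as numbers.

```python
import heapq
from typing import List, Tuple, TypeVar

T = TypeVar("T")  # any comparable type: numbers, datetime.date, ...


def assign_lanes(intervals: List[Tuple[T, T]]) -> List[int]:
    """Return a lane index for each (start, end) interval, in input order,
    such that intervals sharing a lane never overlap."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    lanes = [0] * len(intervals)
    free_at: List[Tuple[T, int]] = []  # min-heap of (end of last interval, lane)
    next_lane = 0
    for i in order:
        start, end = intervals[i]
        if free_at and free_at[0][0] <= start:
            # The lane that frees up earliest is already free: reuse it.
            _, lane = heapq.heappop(free_at)
        else:
            # Every lane is still busy at this start time: open a new one.
            lane = next_lane
            next_lane += 1
        lanes[i] = lane
        heapq.heappush(free_at, (end, lane))
    return lanes
```

With this scheme, back-to-back grants stay on one line and only genuinely overlapping grants are pushed up, so the number of rows equals the maximum number of grants active at the same time.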

riceissa (Collaborator) commented Sep 19, 2019

Here is a quick and dirty version https://gist.github.com/riceissa/016664523d6edc72e83649220830eeec

[ETA: It looks like the large number of donors/donees is a problem (the plot labels just become unreadable if we do e.g. every single Open Phil donee for single_donor_multiple_donees). So we need to either restrict the number of donors/donees (which is what I did by filtering for cause area or amount) or we need to give up labeling who owns which line.]

Example plots

single_donor_single_donee("Open Philanthropy Project", "Machine Intelligence Research Institute")

[Screenshot of the resulting plot, taken 2019-09-18 22:17:50]

single_donor_single_donee("Open Philanthropy Project", "Center for Applied Rationality")

[Screenshot of the resulting plot, taken 2019-09-18 22:18:17]

single_donor_multiple_donees("Open Philanthropy Project") (note: this is restricted to AI safety cause area)

[Screenshot of the resulting plot, taken 2019-09-18 22:18:45]

single_donee_multiple_donors("Machine Intelligence Research Institute") (note: this is restricted to donations over 100,000)

[Screenshot of the resulting plot, taken 2019-09-18 22:19:12]

vipulnaik (Owner, Author) commented Nov 9, 2019

@riceissa, I think we should go ahead with this. Here are my thoughts regarding what versions to go ahead with:

  • single_donor_single_donee looks perfect -- use as is
  • single_donor_multiple_donees -- restrict to top 10 donees within the filters (so e.g. if the page is using a cause area filter, restrict to donations within that cause area filter)
  • single_donee_multiple_donors -- restrict to top ten donors

I'm coming up with the number 10 arbitrarily, but you can choose a slightly different number based on your experimentation. I think capping by number of donees (or donors) rather than by an amount threshold (on the amount paid by donor to donee) is better, because we may not find a good amount threshold that works consistently across donor-donee combinations.
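
A minimal sketch of "top N within the filters", assuming donations arrive as hypothetical `(donee, amount, cause_area)` rows; the same ranking could equally be pushed into the MySQL query. The filter is applied before totals are computed, so a cause-area page ranks donees only by money given within that cause area.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row shape: (donee, amount, cause_area)
Donation = Tuple[str, float, str]


def top_donees(donations: Iterable[Donation], n: int = 10,
               cause_area: Optional[str] = None) -> List[str]:
    """Return the n donees with the largest total amount received,
    counting only donations that pass the optional cause-area filter."""
    totals: Dict[str, float] = defaultdict(float)
    for donee, amount, area in donations:
        if cause_area is None or area == cause_area:
            totals[donee] += amount
    # Sort by total descending; ties are broken alphabetically for stable output.
    ranked = sorted(totals, key=lambda d: (-totals[d], d))
    return ranked[:n]
```

The single_donee_multiple_donors case is symmetric: key the totals by donor instead of donee.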

riceissa (Collaborator) commented Jan 8, 2020

> within the filters (so e.g. if the page is using a cause area filter, restrict to donations within that cause area filter)

This phrasing seems to imply that there are filters other than cause area, but I can't recall any other filters. Did you have something else in mind that the plot should handle?

riceissa (Collaborator) commented Jan 9, 2020

Ok, I'm done with this (modulo potential non-cause-area filters; see previous comment), and my work is in this branch: https://github.com/vipulnaik/donations/tree/timeframe-plot

Some notes:

  • you will need to create a python/login.py file with MySQL login info, because I decided to query MySQL directly from python (instead of doing the queries via PHP and writing to a CSV file like with our previous graphs)
  • you will need to set the variable $generateTimeframeGraphCmdBase in access-portal/backend/globalVariables/passwordFile.inc to point to the new python script
  • you may need to uncomment the import matplotlib/matplotlib.use('Agg') lines in the python script to get plotting working on your server
  • I am currently placing the plots under the full donations list section on each page, but you may prefer a different spot or an entirely new section
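
A minimal sketch of a timeframe plot along these lines, not the code in the `timeframe-plot` branch: each grant becomes a horizontal bar on its own row, grants with no timeframe become dots (as suggested at the top of the thread), and the `Agg` backend is selected before `pyplot` is imported so no display is needed on a server. The `plot_timeframes` name and the `(label, start, end)` input shape are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # select a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
from datetime import date
from typing import List, Optional, Tuple

Grant = Tuple[str, date, Optional[date]]  # (label, start, end or None if unknown)


def plot_timeframes(grants: List[Grant], path: str) -> None:
    """Draw one row per grant: a thick bar for known timeframes, a dot otherwise."""
    fig, ax = plt.subplots(figsize=(8, 0.5 * len(grants) + 1.5))
    for row, (label, start, end) in enumerate(grants):
        if end is None:
            # Unknown timeframe: mark only the donation date.
            ax.plot([start], [row], marker="o", color="gray")
        else:
            ax.plot([start, end], [row, row], linewidth=6, solid_capstyle="butt")
    ax.set_yticks(range(len(grants)))
    ax.set_yticklabels([label for label, _, _ in grants])
    ax.invert_yaxis()  # first grant in the list appears at the top
    fig.autofmt_xdate()  # rotate date labels so they don't collide
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Combined with a lane assignment, the row index would come from the lane rather than from the grant's position in the list.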

vipulnaik (Owner, Author) commented Jan 21, 2020

> you will need to create a python/login.py file with MySQL login info, because I decided to query MySQL directly from python (instead of doing the queries via PHP and writing to a CSV file like with our previous graphs)

What's the content of this login.py supposed to be? Do you have a sample somewhere (a dummy file) @riceissa? Sorry if I missed this; I didn't see it in the repo or in the comments on this GitHub issue.

riceissa (Collaborator) commented Jan 21, 2020

> > you will need to create a python/login.py file with MySQL login info, because I decided to query MySQL directly from python (instead of doing the queries via PHP and writing to a CSV file like with our previous graphs)
>
> What's the content of this login.py supposed to be? Do you have a sample somewhere (a dummy file) @riceissa? Sorry if I missed this; I didn't see it in the repo or in the comments on this GitHub issue.

It should look like:

```python
# python/login.py: MySQL credentials read by timeframe_plot.py
USER = "yourname"
DATABASE = "donations"
PASSWORD = "secret"
```

(The documentation for this is printed to stderr if you run timeframe_plot.py without a login.py file present. I guess I should have included it in my comment too.)
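
A sketch of how a script could load these settings and print the expected format to stderr when `login.py` is missing, mirroring the behavior described above. This is not the actual code in `timeframe_plot.py`; the `load_login` helper and the returned dict are hypothetical, and the values would then be passed to whichever MySQL client library the script uses.

```python
import importlib
import sys
from typing import Dict, Optional

LOGIN_TEMPLATE = '''USER = "yourname"
DATABASE = "donations"
PASSWORD = "secret"
'''


def load_login(module_name: str = "login") -> Optional[Dict[str, str]]:
    """Import the login module (e.g. python/login.py) and return its settings,
    or print the expected file format to stderr and return None."""
    try:
        login = importlib.import_module(module_name)
    except ImportError:
        sys.stderr.write(
            f"{module_name}.py not found; create it with contents like:\n{LOGIN_TEMPLATE}"
        )
        return None
    return {"user": login.USER, "database": login.DATABASE, "password": login.PASSWORD}
```

Keeping the credentials in a separate, untracked module means the plotting code itself can be committed without secrets.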

vipulnaik (Owner, Author) commented Dec 15, 2020

Thanks @riceissa, this is awesome!!! I am currently running this code on donations.vipulnaik.com (even though it is not yet merged to master). I had to commit a small fix to make it work (adding an extra space). I also made a PR #131 out of your branch so that it's easier to explore the diff on GitHub.

I have a couple of pieces of feedback:

  1. Is it possible to add a unit of margin at the top and bottom? Currently the top and bottom lines graze the axes and it's hard to read them. For instance, https://donations.vipulnaik.com/images/5d75ab304db048e4c7f2778e0b31a451-timeframe.png (Open Phil as donor), https://donations.vipulnaik.com/images/924c43104d9cb7f646fc8e8de00cd907-timeframe.png (MIRI as donee), https://donations.vipulnaik.com/images/9adcb7b3e0b7c3e6d51a2a31f041dfa7-timeframe.png (Open Phil to MIRI)
  2. I think there's something fishy with the Open Phil graph https://donations.vipulnaik.com/images/5d75ab304db048e4c7f2778e0b31a451-timeframe.png particularly the Johns Hopkins line that goes all the way to 2024, even though there is no grant going all that far? Can you check?

[Embedded images: 9adcb7b3e0b7c3e6d51a2a31f041dfa7-timeframe.png, 924c43104d9cb7f646fc8e8de00cd907-timeframe.png, 5d75ab304db048e4c7f2778e0b31a451-timeframe.png]

riceissa (Collaborator) commented Dec 16, 2020

@vipulnaik Can you please give the URLs/queries that produced those plots? I can't reproduce the error without seeing which plots I am supposed to be looking at in my local DLW instance.

For example at http://localhost:8000/donor.php?donor=Open+Philanthropy+Project#donorDonationList I see the following, which looks fine to me:

[Screenshot of the plot on the local instance]

vipulnaik (Owner, Author) commented Dec 17, 2020

Ah @riceissa this brings me to another bug similar to #121; I think the key being used to hash doesn't have the right incorporation of cause area filters. That's why you are seeing a completely different list of top donees than the true, canonical one (I think you are seeing a list filtered to the AI safety cause area). So please first fix that bug, using the idea of #121 as a guide.

For a simpler example of a donor with far fewer donees (and therefore less subject to the bug described in the preceding paragraph), see https://donations.vipulnaik.com/donor.php?donor=Vipul+Naik#donorDonationList

https://donations.vipulnaik.com/images/ae1d3d74a5d4f2679e251d384f61393b-timeframe.png

[Embedded image: ae1d3d74a5d4f2679e251d384f61393b-timeframe.png]

riceissa (Collaborator) commented Dec 19, 2020

@vipulnaik

> I think there's something fishy with the Open Phil graph https://donations.vipulnaik.com/images/5d75ab304db048e4c7f2778e0b31a451-timeframe.png particularly the Johns Hopkins line that goes all the way to 2024, even though there is no grant going all that far? Can you check?

I'm really confused by this part.

https://www.openphilanthropy.org/focus/global-catastrophic-risks/biosecurity/johns-hopkins-center-health-security-masters-phd-program-support has 48 months (=4 years) starting in early 2020, so that grant should go to early 2024, which is exactly what is shown on the graph.

riceissa (Collaborator) commented Dec 19, 2020

> Is it possible to add a unit of margin at the top and bottom? Currently the top and bottom lines graze the axes and it's hard to read them.

I'm not able to reproduce this bug. Here's what it looks like on my machine:

[Screenshot of the plot on my machine]

Since I can't reproduce it, it's pretty annoying to debug and fix. I suspect there is something different about your version of matplotlib.

riceissa (Collaborator) commented Dec 19, 2020

> I had to commit a small fix to make it work (adding an extra space).

Sorry about that. It turns out I had a space at the end of my $generateTimeframeGraphCmdBase so the bug wasn't affecting me.

riceissa (Collaborator) commented Dec 19, 2020

> Ah @riceissa this brings me to another bug similar to #121; I think the key being used to hash doesn't have the right incorporation of cause area filters. That's why you are seeing a completely different list of top donees than the true, canonical one (I think you are seeing a list filtered to the AI safety cause area). So please first fix that bug, using the idea of #121 as a guide.

I was actually loading a subset of the SQL files so didn't have all of the grants in the database when generating the plot. I get the same list of grantees now so I don't think the bug you are talking about was the cause.

vipulnaik (Owner, Author) commented Dec 19, 2020

@riceissa:

> I'm really confused by this part

I was comparing the graphical timeline against the data shown on the page below it. This master's and Ph.D. grant showed a timeframe of 1 month, not 48 months, hence my confusion. However, after your poke, I checked and found that the bug was in the table display! Specifically, a missing parenthesis. I've fixed that now: 5c0d4ee

So, at least as far as I can make out, the durations in the graphical timeline display are correct.

vipulnaik (Owner, Author) commented Dec 19, 2020

@riceissa:

> I was actually loading a subset of the SQL files so didn't have all of the grants in the database when generating the plot. I get the same list of grantees now so I don't think the bug you are talking about was the cause.

Nonetheless, that bug does exist. Just now, I reproduced it by following steps analogous to those in #121 (comment), so it was a fortuitous accident that it surfaced here.

vipulnaik (Owner, Author) commented Dec 19, 2020

@riceissa I think the bug can be fixed by tweaking these two lines https://github.com/vipulnaik/donations/pull/131/files#diff-9cb3f3c9dbff485308195cdc97afac16ab50fa087cfd9f68816aca98cd3eb283R19-R20 to include cause area filter in the hash key -- do you want to give it a try? If not, I should be able to do it too.
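
The image filenames above look like 32-hex-digit MD5 digests followed by `-timeframe.png`. Assuming that, the fix amounts to feeding every parameter that changes the plot, including the cause area filter, into the hashed key. The sketch below is in Python for consistency with the plotting script; the real key is built in the site's PHP code, and the function and parameter names here are hypothetical.

```python
import hashlib
import json
from typing import Optional


def timeframe_image_name(plot_type: str, donor: Optional[str] = None,
                         donee: Optional[str] = None,
                         cause_area_filter: Optional[str] = None) -> str:
    """Build a cache filename from every input that affects the plot.
    Serializing with sort_keys=True keeps the key stable across calls."""
    key = json.dumps(
        {"plot_type": plot_type, "donor": donor, "donee": donee,
         "cause_area_filter": cause_area_filter},
        sort_keys=True,
    )
    return hashlib.md5(key.encode("utf-8")).hexdigest() + "-timeframe.png"
```

Leaving `cause_area_filter` out of the key is exactly what lets a filtered page and an unfiltered page share one cached image.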

vipulnaik (Owner, Author) commented Dec 20, 2020

@riceissa: I just merged master into the branch, FYI.

vipulnaik added a commit that referenced this issue Jun 4, 2021
* Add initial timeframe plotting code

* Rename to timeframe_plot.py

* Add arguments

* Pass in all the arguments

* Add timeframe plot for donorDonee

* Fix query for single_donor_multiple_donees plot

* Fix query for single_donee_multiple_donors

* Add timeframe graph python command

* Add timeframe plots to the website code

* Use better input to hash for filename

* Encode arguments to python using base64

* Uncomment import statements

* Add space before '--base64' so that graphical timeline generation actually works

* Fix bug noted in #125 (comment)

Co-authored-by: Issa Rice <riceissa@gmail.com>
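
Two of the commits above concern how the PHP side invokes the Python script: arguments are base64-encoded, and a missing space before `--base64` broke the command unless the base command string happened to end in a space. A Python sketch of the same idea, where the JSON payload and the `encode_args`/`decode_args`/`build_command` helpers are assumptions rather than the project's code: building the command from separate tokens makes the separator explicit, so a missing trailing space no longer matters.

```python
import base64
import json
import shlex
from typing import Any, Dict


def encode_args(args: Dict[str, Any]) -> str:
    """Serialize arguments to JSON, then base64, so donor/donee names with
    spaces or quotes become shell-safe characters only."""
    return base64.b64encode(json.dumps(args).encode("utf-8")).decode("ascii")


def decode_args(encoded: str) -> Dict[str, Any]:
    """Inverse of encode_args, run inside the plotting script."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def build_command(cmd_base: str, args: Dict[str, Any]) -> str:
    # Plain concatenation like "python3 timeframe_plot.py" + "--base64 ..."
    # fuses two tokens; joining stripped parts with " " cannot.
    return " ".join([cmd_base.strip(), "--base64", shlex.quote(encode_args(args))])
```

The result is the same command whether or not the configured base command ends in a space.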

vipulnaik (Owner, Author) commented Jun 4, 2021

I fixed the bug and merged the PR (#131) into master so you can cross this out from your TODO list @riceissa!

riceissa (Collaborator) commented Jun 9, 2021

> Is it possible to add a unit of margin at the top and bottom? Currently the top and bottom lines graze the axes and it's hard to read them.

This was bugging me so I looked into it more, and you might be able to fix it by using the following line:

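```python
# Pad the y-limits by 10% of the data range so the outermost bars don't touch the axes.
plt.margins(y=0.1)
```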

You can put this line right before the plt.savefig calls (there should be three such occurrences in the timeframe_plot.py file). You might need to adjust the parameter from 0.1 to something else to get a good spacing.
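
A quick way to see what `plt.margins(y=0.1)` does (it applies `margins` to the current axes): matplotlib pads the autoscaled y-limits by 10% of the data range on each side. One possible explanation for the difference between machines, offered only as a guess, is the matplotlib version. Since matplotlib 2.0 the default margins are 0.05, whereas earlier versions defaulted to 0, which would leave the outermost bars sitting on the axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as in the plotting script
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Three horizontal bars at y = 0, 1, 2, standing in for three grants.
for row in range(3):
    ax.plot([0, 1], [row, row], linewidth=6)

ax.margins(y=0.1)  # same effect as plt.margins(y=0.1) on the current axes
ymin, ymax = ax.get_ylim()
# The data span is 2, so 0.2 is added below y=0 and above y=2.
plt.close(fig)
```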

vipulnaik (Owner, Author) commented Jun 10, 2021

Thanks @riceissa! This worked so I pushed the change.

@vipulnaik vipulnaik changed the title Explore the creation of graphical timeline for donors, donees, and donor-donee pairs based on intended_funding_timeframe_in_months Explore the creation of graphical timelines for donors, donees, and donor-donee pairs based on intended_funding_timeframe_in_months Jul 6, 2021