ubc-library-rc/future-waters-project


Future Waters Visualizations

This project documents the technologies, scripts, and workflow for visualizing data about a research cluster using open-source tools.

Most of the data is publicly available on Wikidata. We use Crossref, OpenRefine, and Scholia to gather publication data about a list of authors, upload that data to Wikidata, and visualize it, respectively.


  • Sample visualization of topics investigated by cluster members
  • Sample visualization of relationships between topics and cluster members
  • Sample visualization of publications per year per author for the research cluster

Setup

  1. Download and install Docker
  2. Download and install Python 3.7 or higher
    • You only need Python locally for small tasks such as clearing cached data
  3. Download and install OpenRefine, preferably the stable version (OpenRefine 3.3)

Instructions

  1. Run the data-gathering scripts to fetch data
  2. Upload and clean the data with OpenRefine
  3. View visualizations via the future-waters-viz Docker container

Docker

The bulk of the project runs in a self-contained environment, namely a Docker container. Instructions for running Docker are given below and in the Python scripts.

Data Gathering

  1. Input: provide a cluster-members.csv file and copy it into the /data-gathering/resources folder

An example of the key columns in your CSV is shown below:


| Full Name | Affiliation | Position | Department | Faculty | Campus | wikidata |
|---|---|---|---|---|---|---|
| Ali Ameli | University of British Columbia | Assistant Professor | Earth Ocean and Atmospheric Sciences | Sciences | Vancouver | |
| Alice Guimaraes | University of British Columbia | PhD Student | Norman B Keevil Institute of Mining Engineering | Applied Sciences | Vancouver | Q27980222 |
| Gunilla Öberg | University of British Columbia | Professor | Institute for Resources Environment and Sustainability | Sciences | Vancouver | |
| John Janmaat | University of British Columbia | Associate Professor | Economics,Philosophy and Political Science | Arts | Okanagan | |

Note that the scripts are case-sensitive, so the input column names must match those in the example exactly.
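Because a mismatched header is easy to miss, the column names can be checked before the container is run. The following is a minimal sketch and not part of the project's scripts: the function name `missing_columns` is hypothetical, and the required list assumes that the seven columns in the example above are the ones the scripts expect.

```python
import csv
import io

# Column names copied from the example above. The comparison is exact
# (case-sensitive), mirroring the note about the scripts.
# Assumption: these seven columns are the ones the scripts read.
REQUIRED_COLUMNS = [
    "Full Name", "Affiliation", "Position",
    "Department", "Faculty", "Campus", "wikidata",
]


def missing_columns(csv_file):
    """Return the required columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


if __name__ == "__main__":
    # Hypothetical in-memory sample: "Wikidata" is capitalised on purpose,
    # so the lowercase "wikidata" column is reported as missing.
    sample = io.StringIO(
        "Full Name,Affiliation,Position,Department,Faculty,Campus,Wikidata\n"
    )
    print(missing_columns(sample))
```

To check a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `missing_columns`; an empty list means every expected column is present.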


  2. Build the base Docker image by running:
cd data-gathering
docker build -t libraryrc/future-waters .

  3. Run the container

First, get the path where you downloaded the project:

pwd

The output will be similar to /home/msarthur/Workspace/future-waters-project

Use that path in the volume argument of the command below, e.g.: -v /home/msarthur/Workspace/future-waters-project/data-gathering/resources:/tmp/src/resources

docker run --name=future-waters -v !!your path!!/data-gathering/resources:/tmp/src/resources libraryrc/future-waters

For example, with the output path shown above, the command reads:

docker run --name=future-waters -v /home/msarthur/Workspace/future-waters-project/data-gathering/resources:/tmp/src/resources libraryrc/future-waters
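Assembling the volume argument by hand is the step most prone to typos, so it can help to generate the command from the project path. This is a minimal sketch under stated assumptions: the helper `docker_run_command` is hypothetical, the host is assumed to use POSIX-style paths, and the image name, container name, and mount point are copied from the commands in this README.

```python
import shlex
from pathlib import PurePosixPath

IMAGE = "libraryrc/future-waters"            # image tag from the build step
CONTAINER_NAME = "future-waters"             # container name used in this README
CONTAINER_RESOURCES = "/tmp/src/resources"   # mount point inside the container


def docker_run_command(project_root):
    """Build the `docker run` argument list for a given project root path."""
    # Assumption: the input CSV lives in <project root>/data-gathering/resources.
    host_resources = PurePosixPath(project_root) / "data-gathering" / "resources"
    return [
        "docker", "run",
        f"--name={CONTAINER_NAME}",
        "-v", f"{host_resources}:{CONTAINER_RESOURCES}",
        IMAGE,
    ]


if __name__ == "__main__":
    cmd = docker_run_command("/home/msarthur/Workspace/future-waters-project")
    # Print a shell-ready command line; quoting protects paths with spaces.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can also be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues altogether.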

IMPORTANT: The scripts write their results to the resources folder (in various subfolders).


  4. Change the ownership of the output folder by running the following command:
sudo chown -R $USER:$USER resources

  5. Run the container again after its creation

Container names must be unique, so first remove the previous container named future-waters, then start a new one:

docker rm future-waters && docker run --name=future-waters -v !!your path!!/data-gathering/resources:/tmp/src/resources libraryrc/future-waters

For example:

docker rm future-waters && docker run --name=future-waters -v /home/msarthur/Workspace/future-waters-project/data-gathering/resources:/tmp/src/resources libraryrc/future-waters

  6. Important

If the Python scripts are updated, you must build a new image so that the container reflects the changes. To rebuild the entire pipeline, run:

docker rm future-waters && \
docker build -t libraryrc/future-waters . && \
docker run --name=future-waters -v !!your path!!/data-gathering/resources:/tmp/src/resources libraryrc/future-waters

For example:

docker rm future-waters && \
docker build -t libraryrc/future-waters . && \
docker run --name=future-waters -v /home/msarthur/Workspace/future-waters-project/data-gathering/resources:/tmp/src/resources libraryrc/future-waters
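The remove, build, and run chain can also be driven from Python, which makes the stop-on-first-failure behaviour of `&&` explicit. This is a minimal sketch, not part of the project: `rebuild_steps` and `run_chain` are hypothetical helper names, and the commands are copied from the example above. It is assumed to be run from the data-gathering folder, since the build context is `.`.

```python
import subprocess


def rebuild_steps(resources_path):
    """Return the three commands of the rebuild pipeline, in order."""
    return [
        ["docker", "rm", "future-waters"],
        ["docker", "build", "-t", "libraryrc/future-waters", "."],
        ["docker", "run", "--name=future-waters",
         "-v", f"{resources_path}:/tmp/src/resources",
         "libraryrc/future-waters"],
    ]


def run_chain(steps, run=subprocess.call):
    """Run steps in order and stop at the first non-zero exit code, like `&&`."""
    for step in steps:
        code = run(step)
        if code != 0:
            return code  # later steps are skipped, as in a shell `&&` chain
    return 0


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    path = "/home/msarthur/Workspace/future-waters-project/data-gathering/resources"
    for step in rebuild_steps(path):
        print(" ".join(step))
```

Calling `run_chain(rebuild_steps(path))` executes the commands for real. As with the shell version, the chain stops at `docker rm` if no container named future-waters exists yet.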

Data Visualization

IMPORTANT: The other Docker commands are executed inside a specific subfolder, but the commands in this section must be run from the project root folder.

  1. Build the image
docker build -t libraryrc/future-waters-viz .

  2. Run the container
docker run --name=future-waters-viz -p 8100:8100 libraryrc/future-waters-viz

  3. Subsequent executions:
docker rm future-waters-viz && \
docker run --name=future-waters-viz -p 8100:8100 libraryrc/future-waters-viz

  4. Helpful for a development environment:

Remove the last container, then build and run the new version in a single command:

docker rm future-waters-viz && \
docker build -t libraryrc/future-waters-viz . && \
docker run --name=future-waters-viz -p 8100:8100 libraryrc/future-waters-viz

  5. Check the visualizations in your local browser at http://localhost:8100

    • Find details on how we create each visualization here
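The container may take a moment to start serving pages, so a small polling helper can confirm that the address is reachable before you open the browser. This is a minimal sketch using only the standard library: the function name `wait_for_server` and the default timeout values are assumptions, and the URL is the one given in the step above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8100", timeout=30.0, interval=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry after a short pause
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` after starting the future-waters-viz container returns `True` once the page responds, or `False` if nothing answers within the timeout.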

Troubleshooting

  • Check the known issues on the project's GitHub page

  • If you encounter problems while trying to replicate this project, please submit a new issue. When submitting an issue, the maintainers would appreciate it if you could include:

    • What is your Operating system?

    • What version of Python do you have installed?

    • Is there a stack trace or error log in the application console?
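The first two items can be collected with a few lines of Python and pasted into the issue. This is a minimal sketch using only the standard library; the function name `environment_report` and the report layout are illustrative, not a format required by the maintainers.

```python
import platform
import sys


def environment_report():
    """Collect the OS and Python details requested for issue reports."""
    return "\n".join([
        f"Operating system: {platform.platform()}",
        f"Python version: {platform.python_version()}",
        f"Python executable: {sys.executable}",
    ])


if __name__ == "__main__":
    print(environment_report())
```

Any stack trace or error log from the application console still needs to be copied by hand.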