# Globalization of Science: data, calculation, and preparation for the web

## Phase 1: Downloading Scopus data

* Currently not working due to API request issues

* Writes to the `180802_1611_AllJournals_ArReCp_2001_2017.sqlite` database, which is available in the root directory

* Included so that the Python code can be verified, not to rerun the actual download, which takes two weeks

* Requires an API key with a sufficient limit (> 500 000 requests), which needs a special permit from Scopus

In [None]:
MY_API_KEY = 'e3fd43198781e92e0e07b7f543064003'
from DownloadData import Download
Download.downloadAll(MY_API_KEY)
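
Because the download cannot currently be rerun, one way to sanity-check the database shipped in the root directory is to list its tables and their row counts. The following is a minimal sketch using only the standard library. The helper `list_tables` is hypothetical and not part of `DownloadData`, and the table names inside the file are not documented here, so the sketch discovers them at run time.

```python
import sqlite3

def list_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Hypothetical inspection helper; it only reads from the database.
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `list_tables('180802_1611_AllJournals_ArReCp_2001_2017.sqlite')` should then show whether the file contains data at all. Note that `sqlite3.connect` creates an empty file if the path does not exist, so an empty result may simply mean a wrong path.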

## Phase 2: Calculate raw globalizations

Calculates globalization values from the journal-level data in the SQLite DB (see above) for all countries and all years, with disciplines taken at the chosen level: 4 top disciplines plus All (`'TOP'`), or 27 narrow disciplines (`'bottom'`).


* Outputs into the CSV files specified in `topPath` and `botPath`

* The results are already available in the following files:

    1. Narrow disciplines: `20181218_AllFieldsCountriesMethods_bot_all.csv`

    2. Top disciplines: `20181218_AllFieldsCountriesMethods_TOP.xlsx`

In [None]:
topPath = '20190415_AllFieldsCountriesMethods_TOP.csv'
botPath = '20190415_AllFieldsCountriesMethods_bot.csv'

from CalculateGlobalization import InternationalityCalculations as calc
calc.CalculateEverything(topPath,'TOP')
calc.CalculateEverything(botPath,'bottom')
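
After a run, it can help to confirm that a result file was written and is not empty before moving on to the next phase. The sketch below reads only the header and counts the data rows. The helper `summarize_csv` is hypothetical, and the comma delimiter and UTF-8 encoding are assumptions about what `CalculateEverything` writes.

```python
import csv

def summarize_csv(path, delimiter=","):
    """Return (header, number_of_data_rows) for a CSV file."""
    # Assumption: the file is UTF-8 encoded and its first row is the header.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows
```

For example, `summarize_csv(botPath)` returns the column names together with the number of rows, which is enough to spot an empty or truncated output.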

## Phase 3: Transform data for web

Transforms the raw globalization data so that it can be used in the interactive application.

* The data for the database are saved into the directory given by `csvDir` (`InteractiveWeb/DataForDB/` in the cell below; the import instructions later in this document call it `AWS_Import`). These files should subsequently be imported into the database

* The data for dropdown lists in the application are stored in `controls_data.js` in the root directory. This file should be copied to `public/javascripts`



In [2]:
from TransformToWeb import transform
topData = '20181218_AllFieldsCountriesMethods_TOP.xlsx' 
bottomData = '20181218_AllFieldsCountriesMethods_bot_all.csv'
additionalData = 'populateAmazon.xlsx'
csvDir = 'InteractiveWeb/DataForDB/'
ddlPath = 'controls_data.js'

transform.processDataForWeb(topData,bottomData,additionalData,csvDir,ddlPath)

Data from previous calculation succesfully loaded ...
Excluded all globalizations from countries and disciplines that contribute to less than 30 journals ...
Succesfully calculated group averages ...


Results normalized between 0 and 1
Data for database saved in the InteractiveWeb/DataForDB/ directory
Data for dropdown lists in interactive webpage saved into controls_data.js
Processing for web finished!
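
The log above mentions two transformations: dropping groups that contribute to fewer than 30 journals, and normalizing results between 0 and 1. The sketch below illustrates both in plain Python. It assumes linear min-max scaling and a per-row `journals` count. Both are assumptions for illustration, since the actual logic lives in `TransformToWeb` and is not shown here.

```python
def drop_small_groups(rows, min_journals=30):
    """Keep rows whose 'journals' count is at least min_journals."""
    # 'journals' is a hypothetical key name, not taken from the real data.
    return [row for row in rows if row["journals"] >= min_journals]

def min_max_normalize(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so the range is zero: map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

With these definitions, `min_max_normalize([2, 4, 6])` gives `[0.0, 0.5, 1.0]`.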


## Phase 4: Import DB and deploy server

srvAdr = `ec2-18-188-88-0.us-east-2.compute.amazonaws.com`
srvUser = `ubuntu`
sshKey = ''

dbAdr = `science-internationality-dbinstance.c3aa5fkeiz2h.us-east-2.rds.amazonaws.com:5432`
dbUsr = `root`
dbPass = `IDEA_Science2018`


Procedure:

1) Copy the contents of the `InteractiveWeb` directory to the server

2) Import the database (the psql commands are in `InteractiveWeb/DataForDB`)

3) Set up Node.js and install dependencies

4) Start Prisma using the contents of the `InteractiveWeb/prisma` directory and `prisma.yml`; the database connection info is presumably also in `docker-compose`

5) Run `node bin/www > stdout.txt 2>stderr.txt  &` in the root directory
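
The last step starts the server in the background and redirects its output to two files. If the deployment is ever scripted, the same behaviour can be approximated from Python with `subprocess.Popen`. This is a sketch under assumptions: the helper `start_in_background` is hypothetical, and the log files are written into the working directory passed as `cwd`, as the shell redirection would do.

```python
import os
import subprocess

def start_in_background(command, cwd=".", stdout_name="stdout.txt",
                        stderr_name="stderr.txt"):
    """Start command without waiting for it, sending its output to files in cwd."""
    out = open(os.path.join(cwd, stdout_name), "w")
    err = open(os.path.join(cwd, stderr_name), "w")
    try:
        return subprocess.Popen(command, cwd=cwd, stdout=out, stderr=err)
    finally:
        # The child process holds its own copies of the file descriptors,
        # so the parent can close its handles right away.
        out.close()
        err.close()
```

A call such as `start_in_background(["node", "bin/www"], cwd=app_root)`, where `app_root` is the directory containing `bin/www`, returns a `Popen` object whose `pid` can be recorded for stopping the server later.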


## App description

* The backend runs on an AWS EC2 instance, `ubuntu@ec2-18-188-88-0.us-east-2.compute.amazonaws.com`

* The backend communicates with an AWS RDS instance hosting a PostgreSQL database, `science-internationality-dbinstance.c3aa5fkeiz2h.us-east-2.rds.amazonaws.com`

* Communication between EC2 and RDS goes through two channels, `Prisma` and the `pg` module in Node.js; see the `main/routes` directory


### Backend
* Main routes are described in the `main/routes` directory. There are only two routes, and both handle POST requests:

1) POST route at `/prisma` handles Prisma queries

2) POST route at `/map` channels map data via the `pg` module in Node.js

* The frontend is stored in the `main/public` directory

* The Node.js server is set up in the `main/bin/www` file

Backend requirements:
* Node.js
* npm
* psql
* prisma
* docker
* pg module

### Setting up Postgres and Amazon RDS

1) Postgres is hosted on Amazon RDS at `science-internationality-dbinstance.c3aa5fkeiz2h.us-east-2.rds.amazonaws.com`


## Importing a database

1) Run `CalculateEverything` in `InternationalityIndex.InternationalityCalculations.py`

2) Copy the output xlsx file into the same folder as this notebook.

3) Edit the first two rows in the following cell and run all cells in the notebook

4) After the computation finishes, copy all files from the `AWS_Import` directory using WinSCP

    a. Connect to AWS EC2 IDEA (ubuntu@ec2-18-188-88-0.us-east-2.compute.amazonaws.com)

    b. Copy the CSV files from the `AWS_Import` directory to `/home/ubuntu/db-admin/csv`
    
5) Using PuTTY, run the import to Postgres

    a. Connect to AWS EC2 IDEA (ubuntu@ec2-18-188-88-0.us-east-2.compute.amazonaws.com)

    b. Go to the `db-admin` directory

    c. Run: `psql --host=science-internationality-dbinstance.c3aa5fkeiz2h.us-east-2.rds.amazonaws.com --port=5432 --username=root --password --dbname=scienceInternationalitydb -f drop_generate_schema.sql`

    d. Run: `psql --host=science-internationality-dbinstance.c3aa5fkeiz2h.us-east-2.rds.amazonaws.com --port=5432 --username=root --password --dbname=scienceInternationalitydb -f psql-import-csvs.txt`
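
The two `psql` invocations above differ only in the script passed to `-f`. If the import is scripted, the argument list can be assembled once, as in the following sketch. The helper `build_psql_command` is hypothetical; the flags are the same ones used in the commands above, and `--password` still makes `psql` prompt for the password interactively.

```python
def build_psql_command(host, dbname, script, username="root", port=5432):
    """Return the argument list for running a SQL script with psql."""
    return [
        "psql",
        f"--host={host}",
        f"--port={port}",
        f"--username={username}",
        "--password",  # psql prompts for the password
        f"--dbname={dbname}",
        "-f", script,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on the EC2 instance, first with `drop_generate_schema.sql` and then with `psql-import-csvs.txt`.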
    
    
In case of problems, check:

a. Variable names: they must match from the original Excel file in `additionalData`, through the table schema in `drop_generate_schema.sql`, to the variable names in `psql-import-csvs.txt`

b. Data validity in the CSVs.

c. The Prisma query in `fetcher.js` must also contain valid variable names. If they change, Prisma should be rerun as follows:

    1. `docker-compose down`
    2. change the `datamodel.yml`
    3. `docker-compose up -d prisma`
    4. `prisma deploy`
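
The variable-name check in point a can be partly automated by comparing a CSV header against the column names expected by the schema. The sketch below assumes the expected names are supplied by hand. The helper `missing_columns` is hypothetical, and it does not parse `drop_generate_schema.sql` itself.

```python
import csv

def missing_columns(csv_path, expected_columns):
    """Return the expected column names that are absent from the CSV header."""
    # Assumption: the file is UTF-8 encoded and its first row is the header.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in expected_columns if name not in present]
```

An empty list means every expected column is present; any returned name points to a mismatch worth fixing before rerunning the import.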