# MAG Sample: Get Partner Data

## Prerequisites

Complete these tasks before you begin this tutorial:

- Set up provisioning of Microsoft Academic Graph to an Azure blob storage account. See [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning).
- Set up the Azure Databricks service. See [Set up Azure Databricks](https://docs.microsoft.com/academic-services/graph/get-started-setup-databricks).

## Gather the information

Before you begin, you should have these items of information:

- The name of your Azure Storage (AS) account containing the MAG dataset, from [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning#note-azure-storage-account-name-and-primary-key).
- The access key of your Azure Storage (AS) account, from [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning#note-azure-storage-account-name-and-primary-key).
- The name of the container in your Azure Storage (AS) account that contains the MAG dataset.

## Import notebooks

- [Import](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html#import-a-notebook) `samples/pyspark/MagClass.py` from the MAG dataset into your working folder.
- [Import](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html#import-a-notebook) this notebook into the same folder.

### Initialize storage account and container details

  | Variable  | Value | Description  |
  | --------- | --------- | --------- |
  | AzureStorageAccount | Replace **`<AzureStorageAccount>`** | The name of the Azure Storage account containing the MAG dataset. |
  | AzureStorageAccessKey | Replace **`<AzureStorageAccessKey>`** | The access key of the Azure Storage account. |
  | MagContainer | Replace **`<MagContainer>`** | The name of the container in the Azure Storage account that contains the MAG dataset, usually in the form mag-yyyy-mm-dd. |
  | OutputContainer | Replace **`<OutputContainer>`** | The name of the container in the Azure Storage account where the output is written. Create this container before running this notebook. |

In [0]:
AzureStorageAccount = '<AzureStorageAccount>'
AzureStorageAccessKey = '<AzureStorageAccessKey>'
MagContainer = '<MagContainer>'
OutputContainer = '<OutputContainer>'

### Define MicrosoftAcademicGraph class

Run the MagClass notebook to define the MicrosoftAcademicGraph class.

In [0]:
%run "./MagClass"

### Create a MicrosoftAcademicGraph instance to access the MAG dataset
Use account=AzureStorageAccount, key=AzureStorageAccessKey, container=MagContainer.

In [0]:
mag = MicrosoftAcademicGraph(account=AzureStorageAccount, key=AzureStorageAccessKey, container=MagContainer)

### Create an AzureStorageUtil instance to access other Azure Storage files
Use account=AzureStorageAccount, key=AzureStorageAccessKey, container=OutputContainer.

In [0]:
asu = AzureStorageUtil(account=AzureStorageAccount, key=AzureStorageAccessKey, container=OutputContainer)
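
Both helper objects take the same account and key and differ only in the container they point at. As a rough illustration of what that means for file locations, the sketch below builds a `wasbs://` URL, the scheme commonly used to address Azure Blob storage from Spark. Whether `MicrosoftAcademicGraph` and `AzureStorageUtil` build paths exactly this way is an assumption; `wasbs_url`, the account name, the container name, and the file path are all hypothetical and used only for illustration.

```python
def wasbs_url(account: str, container: str, path: str) -> str:
    """Build a wasbs:// URL for a blob path (hypothetical helper, not part of MagClass)."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Illustrative values only; substitute your own account, container, and path.
example_url = wasbs_url("myaccount", "mag-2019-01-01", "mag/Authors.txt")
print(example_url)
```

The same relative path resolves to a different blob depending on the container, which is why the notebook keeps one object for the MAG container and another for the output container.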

### Load previously generated data

In [0]:
# Get all paper details for the input organization from previous output
orgPapers = asu.load('Affiliation/Paper.tsv')

# Get all Paper-Author-Affiliation relationships for the input organization from previous output
orgPaperAuthorAffiliation = asu.load('Affiliation/PaperAuthorAffiliationRelationship.tsv')

# Get all paper-author-affiliation relationships
paperAuthorAffiliations = mag.getDataframe('PaperAuthorAffiliations')

# Get all affiliation details
affiliations = mag.getDataframe('Affiliations')

# Get all author details
authors = mag.getDataframe('Authors')

### Get partner Paper-Author-Affiliation relationships

In [0]:
# Get all Paper-Author-Affiliation relationships for papers published by the input organization
orgAllPaperAuthorAffiliations = paperAuthorAffiliations \
    .join(orgPapers, paperAuthorAffiliations.PaperId == orgPapers.PaperId, 'inner') \
    .select(orgPapers.PaperId, paperAuthorAffiliations.AuthorId, \
            paperAuthorAffiliations.AffiliationId, paperAuthorAffiliations.AuthorSequenceNumber)

# Get partner Paper-Author-Affiliation relationships by excluding those relationships of the input organization
partnerPaperAuthorAffiliation = orgAllPaperAuthorAffiliations.subtract(orgPaperAuthorAffiliation)
display(partnerPaperAuthorAffiliation.head(5))

PaperId,AuthorId,AffiliationId,AuthorSequenceNumber
2760379373,2136372366,126520041,5
2760406857,2740302016,35440088,2
2760379373,2726568720,126520041,1
2760406857,2563709617,35440088,1
2760406857,2776066402,35440088,4
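
To make the `subtract` step easier to reason about, here is a minimal plain-Python sketch of its semantics: it keeps the distinct rows of the left side that do not appear on the right side, comparing whole rows (in Spark, `DataFrame.subtract` corresponds to SQL `EXCEPT DISTINCT`). This means both DataFrames are expected to have the same columns in the same order. The rows and the `subtract_rows` helper below are made up for illustration and are not MAG data.

```python
# Toy rows: (PaperId, AuthorId, AffiliationId, AuthorSequenceNumber). Values are made up.
all_rows = [
    (1, 10, 100, 1),   # author from the input organization
    (1, 11, 200, 2),   # co-author from a partner organization
    (2, 12, 100, 1),   # author from the input organization
    (2, 13, None, 2),  # co-author with no known affiliation
    (1, 11, 200, 2),   # duplicate row, dropped by the distinct semantics
]
org_rows = [
    (1, 10, 100, 1),
    (2, 12, 100, 1),
]


def subtract_rows(left, right):
    """Return the distinct rows of `left` that are absent from `right`, keeping first-seen order."""
    right_set = set(right)
    seen = set()
    result = []
    for row in left:
        if row not in right_set and row not in seen:
            seen.add(row)
            result.append(row)
    return result


partner_rows = subtract_rows(all_rows, org_rows)
print(partner_rows)
```

Rows with a missing affiliation survive this step, which is why the later partner-affiliation cell filters out null `AffiliationId` values.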


### Save PartnerPaperAuthorAffiliationRelationship.tsv

In [0]:
asu.save(partnerPaperAuthorAffiliation, 'Affiliation/PartnerPaperAuthorAffiliationRelationship.tsv')

### Get partner affiliations

In [0]:
# Get all partner affiliation Ids
partnerAffiliationIds = partnerPaperAuthorAffiliation \
    .where(partnerPaperAuthorAffiliation.AffiliationId.isNotNull()) \
    .select(partnerPaperAuthorAffiliation.AffiliationId) \
    .distinct()

# Get all partner affiliation details
partnerAffiliations = affiliations \
    .join(partnerAffiliationIds, affiliations.AffiliationId == partnerAffiliationIds.AffiliationId, 'inner') \
    .select(partnerAffiliationIds.AffiliationId, affiliations.DisplayName.alias('AffiliationName'))

display(partnerAffiliations.head(5))

AffiliationId,AffiliationName
61057504,Fujian Agriculture and Forestry University
8408910,Cardiff Metropolitan University
177156846,South Dakota State University
145608581,University of Miami
161103922,University of Marburg
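
The cell above does three things: drops rows without an affiliation, reduces the rest to distinct affiliation Ids, and inner-joins those Ids with the affiliation table to pick up display names. The plain-Python sketch below mirrors those steps on made-up data; the Ids, names, and variable names are hypothetical and only meant to show the shape of the logic.

```python
# Toy partner rows: (PaperId, AuthorId, AffiliationId, AuthorSequenceNumber). Values are made up.
partner_rows = [
    (1, 11, 200, 2),
    (2, 13, None, 2),  # no affiliation: excluded by the not-null filter
    (3, 14, 200, 1),   # repeated affiliation: collapsed by distinct
    (3, 15, 300, 2),
]

# Toy affiliation table: AffiliationId -> DisplayName.
affiliation_names = {100: "Org A", 200: "Org B", 300: "Org C", 400: "Org D"}

# Steps 1 and 2: filter out missing affiliations and keep distinct Ids.
partner_affiliation_ids = {aff for (_, _, aff, _) in partner_rows if aff is not None}

# Step 3: inner join, keeping only Ids present in the affiliation table.
partner_affiliations = sorted(
    (aff_id, affiliation_names[aff_id])
    for aff_id in partner_affiliation_ids
    if aff_id in affiliation_names
)
print(partner_affiliations)
```

The sort is only there to make the toy output deterministic; the Spark result has no guaranteed row order.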


### Save PartnerAffiliation.tsv

In [0]:
asu.save(partnerAffiliations, 'Affiliation/PartnerAffiliation.tsv')

### Get partner authors

In [0]:
# Get all partner author Ids
partnerAuthorIds = partnerPaperAuthorAffiliation.select(partnerPaperAuthorAffiliation.AuthorId).distinct()

# Get all partner author details
partnerAuthors = authors.join(partnerAuthorIds, partnerAuthorIds.AuthorId == authors.AuthorId, 'inner') \
                        .select(partnerAuthorIds.AuthorId, authors.DisplayName.alias('AuthorName'))

display(partnerAuthors.head(5))

AuthorId,AuthorName
1998877,Max Wintermark
16214300,Yassin Refahi
20502297,Iadine Chadès
36625720,David Bodoff
166615085,Nico Görnitz


### Save PartnerAuthor.tsv

In [0]:
asu.save(partnerAuthors, 'Affiliation/PartnerAuthor.tsv')