# MAG Sample: Extract Affiliation

## Prerequisites

Complete these tasks before you begin this tutorial:

- Set up provisioning of Microsoft Academic Graph (MAG) to an Azure blob storage account. See [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning).
- Set up the Azure Databricks service. See [Set up Azure Databricks](https://docs.microsoft.com/academic-services/graph/get-started-setup-databricks).

## Gather the information

Before you begin, you should have these items of information:

- The name of your Azure Storage (AS) account containing the MAG dataset, from [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning#note-azure-storage-account-name-and-primary-key).
- The access key of your Azure Storage (AS) account, from [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning#note-azure-storage-account-name-and-primary-key).
- The name of the container in your Azure Storage (AS) account that holds the MAG dataset.

## Import notebooks

- [Import](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html#import-a-notebook) `samples/pyspark/MagClass.py` from the MAG dataset into your working folder.
- [Import](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html#import-a-notebook) this notebook into the same folder.

### Initialize storage account and container details

  | Variable  | Value | Description  |
  | --------- | --------- | --------- |
  | AzureStorageAccount | Replace **`<AzureStorageAccount>`** | The name of the Azure Storage account containing the MAG dataset. |
  | AzureStorageAccessKey | Replace **`<AzureStorageAccessKey>`** | The access key of the Azure Storage account. |
  | MagContainer | Replace **`<MagContainer>`** | The name of the container in the Azure Storage account that holds the MAG dataset, usually of the form `mag-yyyy-mm-dd`. |
  | OutputContainer | Replace **`<OutputContainer>`** | The name of the container in the Azure Storage account that receives the output. Create this container before running this notebook. |

In [0]:
AzureStorageAccount = '<AzureStorageAccount>'
AzureStorageAccessKey = '<AzureStorageAccessKey>'
MagContainer = '<MagContainer>'
OutputContainer = '<OutputContainer>'
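
Before running the remaining cells, it can help to confirm that no placeholder was left unreplaced. The following is a minimal sketch and not part of the original sample: the helper name `find_unreplaced` is hypothetical, and the example values are made up for illustration.

```python
import re

# Hypothetical helper (not part of MagClass): list the settings that are
# empty or still hold a '<Placeholder>' style value.
def find_unreplaced(settings):
    placeholder = re.compile(r'^<[^<>]+>$')
    return [name for name, value in settings.items()
            if not value or placeholder.match(value)]

# Made-up example values; in the notebook you would pass the four
# variables defined in the cell above instead.
example = {
    'AzureStorageAccount': 'mystorageaccount',
    'AzureStorageAccessKey': '<AzureStorageAccessKey>',
    'MagContainer': 'mag-2020-01-01',
    'OutputContainer': '',
}
unreplaced = find_unreplaced(example)
print(unreplaced)
```

An empty list means every setting has been filled in.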

### Define MicrosoftAcademicGraph class

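Run the MagClass notebook to define the `MicrosoftAcademicGraph` class. The `AzureStorageUtil` class used later in this notebook also comes from MagClass, since nothing else is imported.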

In [0]:
%run "./MagClass"

### Create a MicrosoftAcademicGraph instance to access the MAG dataset

Use `account=AzureStorageAccount`, `key=AzureStorageAccessKey`, and `container=MagContainer`.

In [0]:
mag = MicrosoftAcademicGraph(account=AzureStorageAccount, key=AzureStorageAccessKey, container=MagContainer)

### Create an AzureStorageUtil instance to access other Azure Storage files

Use `account=AzureStorageAccount`, `key=AzureStorageAccessKey`, and `container=OutputContainer`.

In [0]:
asu = AzureStorageUtil(account=AzureStorageAccount, key=AzureStorageAccessKey, container=OutputContainer)
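
Both helper objects locate files by storage account, container, and path. On Azure Databricks, Blob storage locations are commonly written as `wasbs://` URLs, and the sketch below shows that general format. Whether MagClass builds its paths exactly this way is an assumption. The function name `wasbs_url`, the account name, the container date, and the path `mag/Affiliations.txt` are all illustrative, not taken from this notebook.

```python
# Hypothetical helper: compose a wasbs:// URL for a blob in Azure Storage.
def wasbs_url(account, container, path):
    return 'wasbs://{}@{}.blob.core.windows.net/{}'.format(
        container, account, path.lstrip('/'))

# Illustrative values only (assumed account, container, and file path).
url = wasbs_url('mystorageaccount', 'mag-2020-01-01', 'mag/Affiliations.txt')
print(url)
```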

#### Load Affiliations data

In [0]:
affiliations = mag.getDataframe('Affiliations')

# Peek at the first five rows
display(affiliations.head(5))

| AffiliationId | Rank | NormalizedName | DisplayName | GridId | OfficialPage | WikiPage | PaperCount | PaperFamilyCount | CitationCount | Iso3166Code | Latitude | Longitude | CreatedDate |
| --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- | --------- |
| 20455151 | 9877 | air liquide | Air Liquide | grid.476009.c | https://web.archive.org/web/20100205175402/http://airliquide.com/en/home.html | http://en.wikipedia.org/wiki/Air_Liquide | 7692 | 5632 | 60056 | GB | 52.50359344482422 | -1.8051600456237795 | 2016-06-24 |
| 24386293 | 13932 | hellenic national meteorological service | Hellenic National Meteorological Service | | http://www.hnms.gr/hnms/english/index_html | http://en.wikipedia.org/wiki/Hellenic_National_Meteorological_Service | 86 | 86 | 1975 | GR | 37.97613906860352 | 23.736400604248047 | 2016-06-24 |
| 32956416 | 12969 | catholic university of the west | Catholic University of the West | grid.448708.7 | http://www.uco.fr/ | http://en.wikipedia.org/wiki/Catholic_University_of_the_West | 363 | 354 | 4316 | FR | 47.4647216796875 | -0.5486099720001221 | 2016-06-24 |
| 35926432 | 11668 | mackay medical college | Mackay Medical College | grid.452449.a | http://www.mmc.edu.tw/ | http://en.wikipedia.org/wiki/Mackay_Medical_College | 1510 | 1504 | 14671 | TW | 25.25436019897461 | 121.49508666992188 | 2016-06-24 |
| 37448385 | 11875 | chinese people s public security university | Chinese People's Public Security University | | http://www.ppsuc.edu.cn/ | http://en.wikipedia.org/wiki/People's_Public_Security_University_of_China | 1792 | 1786 | 2613 | CN | 39.90468978881836 | 116.40717315673828 | 2016-06-24 |


#### Extract AffiliationId and DisplayName for Microsoft

In [0]:
# Keep only the Microsoft row, then select its AffiliationId and DisplayName
microsoft = affiliations.where(affiliations.NormalizedName == 'microsoft').select(affiliations.AffiliationId, affiliations.DisplayName)

# Peek at the result
display(microsoft)

| AffiliationId | DisplayName |
| --------- | --------- |
| 1290206253 | Microsoft |
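
The `where(...).select(...)` chain in the cell above is a row filter followed by a column projection. As an illustration only, the same logic in plain Python (no Spark) looks roughly like the sketch below. The two rows are reduced to three columns, with values taken from the outputs shown in this notebook; the `microsoft` normalized name is the one used in the filter.

```python
# Two rows reduced to three columns; values come from this notebook's outputs.
rows = [
    {'AffiliationId': 20455151, 'NormalizedName': 'air liquide',
     'DisplayName': 'Air Liquide'},
    {'AffiliationId': 1290206253, 'NormalizedName': 'microsoft',
     'DisplayName': 'Microsoft'},
]

# Filter on NormalizedName (the "where"), then keep two columns (the "select").
selected = [
    {'AffiliationId': row['AffiliationId'], 'DisplayName': row['DisplayName']}
    for row in rows
    if row['NormalizedName'] == 'microsoft'
]
print(selected)
```

Filtering on `NormalizedName` rather than `DisplayName` avoids mismatches caused by capitalization or punctuation in the display form.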


#### Save Affiliation.tsv

In [0]:
# Output result
asu.save(microsoft, 'Affiliation/Affiliation.tsv', coalesce=True)
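
For reference, a TSV file is plain text with one record per line and a tab character between fields. The sketch below builds such text in memory with Python's standard `csv` module, using the single result row shown above. It does not reproduce what `asu.save` does internally; in particular, whether the saved file includes a header row is an assumption here.

```python
import csv
import io

# Write the result row as tab-separated text into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(['AffiliationId', 'DisplayName'])  # header row: an assumption
writer.writerow([1290206253, 'Microsoft'])

tsv_text = buffer.getvalue()
print(tsv_text)
```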