# MAG Sample: Get Venues

## Prerequisites

Complete these tasks before you begin this tutorial:

- Set up provisioning of Microsoft Academic Graph to an Azure Blob Storage account. See [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning).
- Set up the Azure Databricks service. See [Set up Azure Databricks](https://docs.microsoft.com/academic-services/graph/get-started-setup-databricks).

## Gather the information

Before you begin, have the following information ready:

- The name of your Azure Storage (AS) account containing the MAG dataset, from [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning#note-azure-storage-account-name-and-primary-key).
- The access key of your Azure Storage (AS) account, from [Get Microsoft Academic Graph on Azure storage](https://docs.microsoft.com/academic-services/graph/get-started-setup-provisioning#note-azure-storage-account-name-and-primary-key).
- The name of the container in your Azure Storage (AS) account that holds the MAG dataset.

## Import notebooks

- [Import](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html#import-a-notebook) `samples/pyspark/MagClass.py` from the MAG dataset into your working folder.
- [Import](https://docs.databricks.com/user-guide/notebooks/notebook-manage.html#import-a-notebook) this notebook into the same folder.

### Initialize storage account and container details

  | Variable  | Value | Description  |
  | --------- | --------- | --------- |
  | AzureStorageAccount | Replace **`<AzureStorageAccount>`** | The Azure Storage account containing the MAG dataset. |
  | AzureStorageAccessKey | Replace **`<AzureStorageAccessKey>`** | The access key of the Azure Storage account. |
  | MagContainer | Replace **`<MagContainer>`** | The container in the Azure Storage account that holds the MAG dataset, usually named in the form mag-yyyy-mm-dd. |
  | OutputContainer | Replace **`<OutputContainer>`** | The container in the Azure Storage account that receives the output. Create this container before running this notebook. |

```python
AzureStorageAccount = '<AzureStorageAccount>'
AzureStorageAccessKey = '<AzureStorageAccessKey>'
MagContainer = '<MagContainer>'
OutputContainer = '<OutputContainer>'
```
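Because the placeholders must be replaced by hand, a quick check before going further can save a failed run. The plain-Python sketch below is purely illustrative: `parse_mag_container_date` and `unreplaced_placeholders` are hypothetical helpers, not part of MagClass, and `mag-2020-01-23` is only an example of the mag-yyyy-mm-dd form. Since the name is only *usually* in that form, a `None` result is a hint to double-check, not proof of an error.

```python
import re
from datetime import date

# Usual MAG container name shape, e.g. "mag-2020-01-23" (illustrative value).
MAG_CONTAINER_PATTERN = re.compile(r"mag-(\d{4})-(\d{2})-(\d{2})")


def parse_mag_container_date(container_name):
    """Return the date in a mag-yyyy-mm-dd container name, or None for any other shape."""
    match = MAG_CONTAINER_PATTERN.fullmatch(container_name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Right shape but not a real calendar date, e.g. month 13.
        return None


def unreplaced_placeholders(**settings):
    """Return the names of settings whose value is still a '<...>' placeholder."""
    return [name for name, value in settings.items()
            if value.startswith("<") and value.endswith(">")]


# Hypothetical usage with example values.
print(parse_mag_container_date("mag-2020-01-23"))  # 2020-01-23
print(unreplaced_placeholders(AzureStorageAccount="<AzureStorageAccount>",
                              MagContainer="mag-2020-01-23"))  # ['AzureStorageAccount']
```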

### Define the MicrosoftAcademicGraph class

Run the MagClass notebook to define the MicrosoftAcademicGraph and AzureStorageUtil classes.

```python
%run "./MagClass"
```

### Create a MicrosoftAcademicGraph instance to access the MAG dataset
Use `account=AzureStorageAccount`, `key=AzureStorageAccessKey`, `container=MagContainer`.

```python
mag = MicrosoftAcademicGraph(account=AzureStorageAccount, key=AzureStorageAccessKey, container=MagContainer)
```
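The `MicrosoftAcademicGraph` instance resolves stream names such as `ConferenceSeries` to files in `MagContainer`. How MagClass does this internally is not shown here; the sketch below only illustrates the standard `wasbs://<container>@<account>.blob.core.windows.net/<path>` address format for Azure Blob Storage. The `mag/<StreamName>.txt` layout and the helper names `wasbs_url` and `mag_stream_url` are assumptions for illustration. The access key is not part of the URL; Spark typically receives it through the `fs.azure.account.key.<account>.blob.core.windows.net` configuration setting.

```python
def wasbs_url(account, container, path=""):
    """Format a wasbs:// URL for a blob path (hypothetical helper)."""
    return "wasbs://{}@{}.blob.core.windows.net/{}".format(
        container, account, path.lstrip("/"))


def mag_stream_url(account, container, stream_name):
    """URL of a MAG stream file, ASSUMING a mag/<StreamName>.txt layout."""
    return wasbs_url(account, container, "mag/{}.txt".format(stream_name))


# Hypothetical account and container names.
print(mag_stream_url("myaccount", "mag-2020-01-23", "ConferenceSeries"))
# wasbs://mag-2020-01-23@myaccount.blob.core.windows.net/mag/ConferenceSeries.txt
```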

### Create an AzureStorageUtil instance to access other Azure Storage files
Use `account=AzureStorageAccount`, `key=AzureStorageAccessKey`, `container=OutputContainer`.

```python
asu = AzureStorageUtil(account=AzureStorageAccount, key=AzureStorageAccessKey, container=OutputContainer)
```

### Load ConferenceSeries data

```python
conferences = mag.getDataframe('ConferenceSeries')

# Peek result
display(conferences.limit(5))
```

| ConferenceSeriesId | Rank | NormalizedName | DisplayName | PaperCount | PaperFamilyCount | CitationCount | CreatedDate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1134804816 | 12797 | ICIDS | International Conference on Interactive Digital Storytelling | 608 | 607 | 2696 | 2016-06-24 |
| 1165160117 | 14799 | SWAT4LS | Semantic Web Applications and Tools for Life Sciences | 81 | 81 | 197 | 2016-06-24 |
| 1192093291 | 12251 | TRIDENTCOM | Testbeds and Research Infrastructures for the DEvelopment of NeTworks and COMmunities | 570 | 570 | 5047 | 2016-06-24 |
| 1199066382 | 10256 | BIOINFORMATICS | International Conference on Bioinformatics | 9226 | 9226 | 14451 | 2016-06-24 |
| 1201746639 | 15536 | AIS | Autonomous and Intelligent Systems | 165 | 165 | 963 | 2016-06-24 |
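`getDataframe` turns a raw stream file into typed columns. The plain-Python sketch below illustrates that idea on two of the rows shown above, assuming the stream files are tab-separated with no header row. The column names come from the output above; the Python types, the empty-field-as-`None` rule, and the helper `read_stream` are assumptions, and the authoritative schema lives in MagClass.

```python
import csv
import io
from datetime import date

# Column names as in the ConferenceSeries output; the types are assumptions.
CONFERENCE_SERIES_SCHEMA = [
    ("ConferenceSeriesId", int),
    ("Rank", int),
    ("NormalizedName", str),
    ("DisplayName", str),
    ("PaperCount", int),
    ("PaperFamilyCount", int),
    ("CitationCount", int),
    ("CreatedDate", date.fromisoformat),
]


def read_stream(text, schema):
    """Parse header-less, tab-separated rows into dicts typed by schema."""
    # QUOTE_NONE: treat quote characters inside names as ordinary text.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != len(schema):
            raise ValueError("expected {} fields, got {}".format(len(schema), len(fields)))
        # An empty field becomes None instead of failing the type conversion.
        rows.append({name: (convert(value) if value != "" else None)
                     for (name, convert), value in zip(schema, fields)})
    return rows


sample = (
    "1134804816\t12797\tICIDS\tInternational Conference on Interactive Digital Storytelling"
    "\t608\t607\t2696\t2016-06-24\n"
    "1165160117\t14799\tSWAT4LS\tSemantic Web Applications and Tools for Life Sciences"
    "\t81\t81\t197\t2016-06-24\n"
)
rows = read_stream(sample, CONFERENCE_SERIES_SCHEMA)
print(rows[0]["NormalizedName"], rows[0]["PaperCount"])  # ICIDS 608
```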


### Load Journals data

```python
journals = mag.getDataframe('Journals')

# Peek result
journals.show(5)
```

### Union ConferenceSeries and Journals as Venues

```python
# Project both sources onto the same three venue columns, in the same order
conferences = conferences.selectExpr(
    'ConferenceSeriesId as VId', 'DisplayName as VenueName', 'NormalizedName as VenueShortName')

journals = journals.selectExpr(
    'JournalId as VId', 'DisplayName as VenueName', 'NormalizedName as VenueShortName')

# union() matches columns by position and keeps duplicate rows
venue = conferences.union(journals)

# Peek result
display(venue.limit(5))

# Count number of rows in result
print('Number of rows in venue: {}'.format(venue.count()))
```

| VId | VenueName | VenueShortName |
| --- | --- | --- |
| 1134804816 | International Conference on Interactive Digital Storytelling | ICIDS |
| 1165160117 | Semantic Web Applications and Tools for Life Sciences | SWAT4LS |
| 1192093291 | Testbeds and Research Infrastructures for the DEvelopment of NeTworks and COMmunities | TRIDENTCOM |
| 1199066382 | International Conference on Bioinformatics | BIOINFORMATICS |
| 1201746639 | Autonomous and Intelligent Systems | AIS |
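Spark's `DataFrame.union` resolves columns by position rather than by name and keeps duplicate rows (like SQL `UNION ALL`), so both projections must list `VId`, `VenueName`, `VenueShortName` in the same order; `unionByName` is the by-name alternative. The plain-Python sketch below mirrors those semantics on small row lists and adds a duplicate-`VId` check, since the combined table is only useful if each `VId` identifies one venue. The helper names and the journal row are hypothetical; in Spark the same check could be written as `venue.groupBy('VId').count().filter('count > 1')`.

```python
from collections import Counter

# Sample rows: the conference row is taken from the output above,
# the journal row is made up for illustration.
conference_rows = [
    {"ConferenceSeriesId": 1134804816, "NormalizedName": "ICIDS",
     "DisplayName": "International Conference on Interactive Digital Storytelling"},
]
journal_rows = [
    {"JournalId": 123, "NormalizedName": "example journal",
     "DisplayName": "Example Journal"},
]


def to_venues(rows, id_column):
    """Project rows onto (VId, VenueName, VenueShortName), in that order."""
    return [(row[id_column], row["DisplayName"], row["NormalizedName"]) for row in rows]


def union_all(*parts):
    """Concatenate row lists by position, keeping duplicates (like UNION ALL)."""
    result = []
    for part in parts:
        result.extend(part)
    return result


def duplicate_ids(venues):
    """Return the VId values that occur more than once, sorted."""
    counts = Counter(vid for vid, _, _ in venues)
    return sorted(vid for vid, n in counts.items() if n > 1)


venues = union_all(to_venues(conference_rows, "ConferenceSeriesId"),
                   to_venues(journal_rows, "JournalId"))
print(len(venues), duplicate_ids(venues))  # 2 []
```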


### Save Venue.tsv

```python
asu.save(venue, 'Venue.tsv', coalesce=True)
```
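Spark writes one part file per partition into an output directory, so producing a single `Venue.tsv` presumably relies on coalescing the DataFrame to one partition, which is what the `coalesce=True` flag suggests. Check `AzureStorageUtil.save` in MagClass for the actual behavior, including whether a header row is written. The plain-Python sketch below only shows the tab-separated serialization itself; `to_tsv` is a hypothetical helper.

```python
import csv
import io


def to_tsv(rows, columns=None):
    """Serialize rows as tab-separated text; write a header row only if columns is given."""
    buffer = io.StringIO()
    # The csv module's default minimal quoting wraps any field that contains
    # a tab, a double quote, or a newline in double quotes.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    if columns is not None:
        writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Two venue rows taken from the output above.
venue_rows = [
    (1134804816, "International Conference on Interactive Digital Storytelling", "ICIDS"),
    (1165160117, "Semantic Web Applications and Tools for Life Sciences", "SWAT4LS"),
]
print(to_tsv(venue_rows, columns=("VId", "VenueName", "VenueShortName")), end="")
```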