# User Migration

Administrators may need to migrate users' content from one authentication directory (also called a "datasource") to another, for example from on-premises Windows Active Directory to Azure Active Directory (AAD). Because AAD usernames are somewhat obfuscated alphanumeric sequences, administrators cannot easily create AAD accounts ahead of time in order to perform the migration. It is easier to migrate the old account's content to the new account _after_ the new account has been created, which happens when the user logs in for the first time.

This script scans all users in the old and new datasources. For each pair of accounts it can match, it migrates the content from the old account to the new one and then deletes the old account. From the user's standpoint, they log in with their new account, and some time later (say, 15 minutes) all of their folders, analyses, topics, and projects appear in the **My Folder** section, and all of their access control is "restored" to their new account.

Generally, you will want to run this script on a schedule so that each user's content is migrated within, say, 15 minutes of their first login to the new account. See the final cell below for instructions on how to do this.

⚠️ Only Seeq Administrators can run this notebook.

⚠️ This notebook makes irreversible changes to the system: matched old accounts are deleted after their content is transferred. After making any changes to the configuration or implementation of this notebook, set `dry_run = True` and review the output before running the migration.

In [1]:
from collections import namedtuple

from pandas import DataFrame

from seeq import sdk, spy

# Set the compatibility option so that you maximize the chance that SPy will remain compatible with your notebook/script
spy.options.compatibility = 193

## Authentication

Log in to Seeq Server if you're not using Seeq Data Lab.

In [None]:
spy.login(url='http://localhost:34216', credentials_file='../credentials.key', force=False)

## Configuration

- `dry_run`: When `True` the migration details will be displayed, but the migration will NOT be executed.
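- `criteria`: Users whose values for ANY of these properties match between the two datasources will be migrated. `email` and `fullname` (first and last name) are compared after trimming whitespace and lowercasing; `username` is compared exactly.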
- `old_datasource_name`: The name of the datasource to migrate users from.
- `new_datasource_name`: The name of the datasource to migrate users to.
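Because each criterion is looked up as an attribute of the normalized user record, a misspelled entry in `criteria` would only fail partway through the user scan, with an `AttributeError`. A minimal sketch of an upfront check, assuming the same field names as the `NormalizedUser` record defined under Implementation; `validate_criteria` is a hypothetical helper, not part of SPy:

```python
from collections import namedtuple

# Same field layout as the notebook's NormalizedUser record
NormalizedUser = namedtuple('NormalizedUser', ['id', 'email', 'fullname', 'username'])


def validate_criteria(criteria):
    """Hypothetical helper: reject criteria that are not matchable user fields."""
    if not criteria:
        raise ValueError('At least one criterion is required')
    # 'id' is excluded because IDs are unique per account and never match across datasources
    allowed = set(NormalizedUser._fields) - {'id'}
    unknown = [c for c in criteria if c not in allowed]
    if unknown:
        raise ValueError(f'Unknown criteria {unknown}; choose from {sorted(allowed)}')
    return list(criteria)


validate_criteria(['email', 'fullname', 'username'])  # passes silently
try:
    validate_criteria(['e-mail'])
except ValueError as e:
    print(e)  # Unknown criteria ['e-mail']; choose from ['email', 'fullname', 'username']
```

Running such a check right after the configuration cell would surface a typo before any user is scanned or migrated.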

In [None]:
dry_run = True
criteria = ['email', 'fullname', 'username']
old_datasource_name = 'Old SSO Provider'
new_datasource_name = 'New SSO Provider'

## Implementation

In [None]:
users_api = sdk.UsersApi(spy.client)
NormalizedUser = namedtuple('NormalizedUser', ['id', 'email', 'fullname', 'username'])
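
The normalization applied in the next cell makes matching insensitive to case and surrounding whitespace for `email` and `fullname`, while `username` is kept verbatim. A standalone sketch of that effect, using `types.SimpleNamespace` as a hypothetical stand-in for the SDK's user objects (only the attributes the notebook reads are modeled; the sample accounts are invented):

```python
from collections import namedtuple
from types import SimpleNamespace

NormalizedUser = namedtuple('NormalizedUser', ['id', 'email', 'fullname', 'username'])


def normalize_str(s):
    # Trim and lowercase strings; pass None (or other non-strings) through unchanged
    return s.strip().lower() if isinstance(s, str) else s


def normalize_user(user):
    return NormalizedUser(
        id=user.id,
        email=normalize_str(user.email),
        fullname=(normalize_str(user.first_name), normalize_str(user.last_name)),
        username=user.username,
    )


# Hypothetical accounts for the same person in the old and new datasources
old = SimpleNamespace(id='A1', email=' Jane.Doe@Example.com', first_name='Jane',
                      last_name='Doe', username='CORP\\jdoe')
new = SimpleNamespace(id='B7', email='jane.doe@example.com ', first_name='JANE',
                      last_name='doe', username='3f2a9c71-e8d4')

o, n = normalize_user(old), normalize_user(new)
print(o.email == n.email)        # True
print(o.fullname == n.fullname)  # True: ('jane', 'doe') on both sides
print(o.username == n.username)  # False: usernames are compared verbatim
```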

In [None]:
def normalize_str(s):
    return s.strip().lower() if isinstance(s, str) else s


def normalize_user(user):
    return NormalizedUser(
        id=user.id,
        email=normalize_str(user.email),
        fullname=(normalize_str(user.first_name), normalize_str(user.last_name)),
        username=user.username,
    )


def get_user_map(datasource_name):
    users = {}
    limit = 100
    offset = 0
    while True:
        page = users_api.get_users(datasource_name_search=datasource_name, offset=offset, limit=limit)
        for user in page.users:
            # datasource_name_search can result in unwanted results when the user datasource name
            # being searched for is a subset of another user datasource name on the server
            if user.datasource_name == datasource_name:
                normalized_user = normalize_user(user)
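                for criterion in criteria:
                    value = getattr(normalized_user, criterion)
                    parts = value if isinstance(value, tuple) else (value,)
                    # Skip missing values: otherwise every user lacking, e.g., an email would
                    # collide on the key ('email', None) and could be matched to the wrong account
                    if any(part in (None, '') for part in parts):
                        continue
                    users[(criterion, value)] = normalized_user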
        if not page.next:
            return users
        offset += limit
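
`get_user_map` pages through `UsersApi.get_users` with `offset`/`limit` until a page has no `next` link, and it discards users whose datasource name merely *contains* the search term. A minimal offline sketch of that loop, where `fake_get_users` and its page objects are hypothetical stand-ins for the SDK call and its response (only `users`, `next`, and `datasource_name` are modeled):

```python
from types import SimpleNamespace

# Hypothetical server-side user list; note the two datasource names that share a substring
ALL_USERS = [SimpleNamespace(id=f'u{i}', datasource_name=name)
             for i, name in enumerate(['Old SSO Provider'] * 5 + ['Old SSO Provider (Legacy)'] * 2)]


def fake_get_users(datasource_name_search, offset, limit):
    # Substring search, mimicking the over-matching the notebook's comment warns about
    hits = [u for u in ALL_USERS if datasource_name_search in u.datasource_name]
    page = hits[offset:offset + limit]
    has_more = offset + limit < len(hits)
    return SimpleNamespace(users=page, next='more' if has_more else None)


def list_users(datasource_name, limit=3):
    found, offset = [], 0
    while True:
        page = fake_get_users(datasource_name_search=datasource_name, offset=offset, limit=limit)
        # Keep only exact datasource matches
        found.extend(u for u in page.users if u.datasource_name == datasource_name)
        if not page.next:
            return found
        offset += limit


users = list_users('Old SSO Provider')
print(len(users))  # 5: the two 'Old SSO Provider (Legacy)' users are filtered out
```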

In [None]:
old_user_map = get_user_map(old_datasource_name)
new_user_map = get_user_map(new_datasource_name)

matches = sorted(dict(
    (old_user_map[common_key], new_user_map[common_key])
    for common_key in old_user_map.keys() & new_user_map.keys()
).items())

df = DataFrame([{
    'Old ID': old.id,
    'Old Username': old.username,
    'Old Email': old.email,
    'Old Name': old.fullname,
    '': '->',
    'New ID': new.id,
    'New Username': new.username,
    'New Email': new.email,
    'New Name': new.fullname,
} for old, new in matches])

dfStyler = df.style.set_properties(**{'text-align': 'left', 'white-space': 'nowrap'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left'), ('white-space', 'nowrap')])])
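
The matching above works by intersecting the `(criterion, value)` keys of the two maps, so two accounts pair up as soon as ANY criterion agrees, and keying the resulting dict by the old user keeps one new user per old user. A self-contained sketch of the same technique on hypothetical, already-normalized users; it also skips missing values, because a key such as `('email', None)` would otherwise pair up unrelated accounts that simply lack an email:

```python
from collections import namedtuple

NormalizedUser = namedtuple('NormalizedUser', ['id', 'email', 'fullname', 'username'])
criteria = ['email', 'fullname', 'username']


def key_map(users):
    """Map each (criterion, value) pair to the user that has it."""
    keys = {}
    for user in users:
        for criterion in criteria:
            value = getattr(user, criterion)
            parts = value if isinstance(value, tuple) else (value,)
            # Skip missing values so users without, e.g., an email don't all share ('email', None)
            if any(part in (None, '') for part in parts):
                continue
            keys[(criterion, value)] = user
    return keys


# Hypothetical users; the new usernames imitate opaque AAD-style identifiers
old_users = [
    NormalizedUser('A1', 'jane@example.com', ('jane', 'doe'), 'corp\\jdoe'),
    NormalizedUser('A2', None, ('raj', 'patel'), 'corp\\rpatel'),
    NormalizedUser('A3', None, (None, None), 'corp\\svc'),
]
new_users = [
    NormalizedUser('B1', 'jane@example.com', ('jane', 'doe'), 'x91f'),
    NormalizedUser('B2', None, ('raj', 'patel'), 'k27c'),
    NormalizedUser('B3', None, (None, None), 'q03a'),
]

old_map, new_map = key_map(old_users), key_map(new_users)
matches = sorted({old_map[k]: new_map[k] for k in old_map.keys() & new_map.keys()}.items())
print([(old.id, new.id) for old, new in matches])  # [('A1', 'B1'), ('A2', 'B2')]
```

A3 and B3 share only missing values, so they are left unmatched rather than merged.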

In [None]:
results = []

for old, new in matches:
    result_args = [old.id, '->', new.id]

    if dry_run:
        results.append([*result_args, 'SKIPPED', None])
    else:
        try:
            users_api.delete_user(id=old.id, new_owner_id=new.id, transfer_acl_and_group_membership=True)
        except sdk.rest.ApiException as e:
            results.append([*result_args, 'FAILURE', e.body])
        else:
            results.append([*result_args, 'SUCCESS', None])

df = DataFrame(results, columns=['Old ID', '', 'New ID', 'Status', 'Error'])

dfStyler = df.style.set_properties(**{'text-align': 'left', 'white-space': 'nowrap'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left'), ('white-space', 'nowrap')])])
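
The migration loop follows a dry-run pattern: the same list of matches is walked either way, the destructive call is made only when `dry_run` is `False`, and `try`/`except`/`else` records exactly one status per pair. A minimal sketch with a hypothetical `fake_delete_user` standing in for `UsersApi.delete_user` and a local `ApiError` standing in for `sdk.rest.ApiException`:

```python
class ApiError(Exception):
    """Hypothetical stand-in for sdk.rest.ApiException, which carries a response body."""

    def __init__(self, body):
        super().__init__(body)
        self.body = body


def fake_delete_user(id, new_owner_id):
    # Hypothetical: pretend the server rejects one of the transfers
    if id == 'A2':
        raise ApiError('{"statusMessage": "example failure"}')


def migrate(pairs, dry_run):
    results = []
    for old_id, new_id in pairs:
        row = [old_id, '->', new_id]
        if dry_run:
            results.append([*row, 'SKIPPED', None])
            continue
        try:
            fake_delete_user(id=old_id, new_owner_id=new_id)
        except ApiError as e:
            results.append([*row, 'FAILURE', e.body])
        else:
            # Runs only if no exception was raised
            results.append([*row, 'SUCCESS', None])
    return results


pairs = [('A1', 'B1'), ('A2', 'B2')]
print([r[3] for r in migrate(pairs, dry_run=True)])   # ['SKIPPED', 'SKIPPED']
print([r[3] for r in migrate(pairs, dry_run=False)])  # ['SUCCESS', 'FAILURE']
```

Catching the exception per pair means one failed transfer does not stop the remaining migrations, and each failure's error body ends up in the results table.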

## Scheduling

You will generally want this script to run on a schedule so that it repeatedly tries to migrate users as they log in to their new accounts. This reduces the time between a user logging in with their new credentials and seeing all of the content from their old account.

To schedule, remove the '#' comment character from the cell below and execute it.

In [1]:
# spy.jobs.schedule('every 15 minutes')