<a href="https://colab.research.google.com/github/totvslabs/carol-notebooks/blob/main/notebooks/DatamodelFromStaging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Carol Create Datamodel from Staging table
`This script creates a datamodel from the fields of a Staging table. Fields that already exist in the platform with a different data type are not created in the datamodel.`
`When executed, the script asks for a JSON configuration file in the following format:`

```json
{
    "authentication_config" : {
        "username": "username@totvs.com.br",
        "password": "password",
        "organization": "YourOrganization",
        "tenantName": "YourTenantName"
    },
    "script_config" : {
        "staging": "stg_myconnector_mystaging",
        "datamodel": "TargetDatamodelName",
        "profile_title": "DatamodelPK"
    }
}
```
`You need TENANT ADMIN permissions to run this script.`
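
Because the script reads every key shown above, it can help to check the file before connecting. The following is a minimal sketch, not part of the notebook or of pyCarol: the helper name `missing_config_keys` and the `REQUIRED_KEYS` table are assumptions introduced here for illustration.

```python
# Hypothetical helper: checks that a parsed configuration has every key
# shown in the example configuration above.
REQUIRED_KEYS = {
    "authentication_config": ("username", "password", "organization", "tenantName"),
    "script_config": ("staging", "datamodel", "profile_title"),
}


def missing_config_keys(config):
    """Return the missing entries as 'section' or 'section.key' strings."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            # The whole section is absent or is not a JSON object.
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in values)
    return missing


example_config = {
    "authentication_config": {
        "username": "username@totvs.com.br",
        "password": "password",
        "organization": "YourOrganization",
        "tenantName": "YourTenantName",
    },
    "script_config": {
        "staging": "stg_myconnector_mystaging",
        "datamodel": "TargetDatamodelName",
    },
}

# "profile_title" was left out of example_config on purpose.
print(missing_config_keys(example_config))
```

An empty list means the file has the expected shape; anything else names what to add before running the script.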

#### REQUIREMENTS
`These are the packages the script needs before execution.`

In [28]:
%%capture
!pip install --quiet pycarol=="2.54.18"
import json, os, sys
from google.api_core import exceptions
from google.cloud import bigquery
from google.oauth2.service_account import Credentials
import pycarol
from pycarol.tools.data_model_generator import DataModelGenerator

#### CAROL LOGIN FUNCTIONS
`These functions log in to Carol. Ideally they are the same for all notebooks, and they use pyCarol.`

[pyCarol reference](https://github.com/totvslabs/pyCarol)

In [29]:
def carol_connect(username, password, organization, tenantName):
    print(f"Connecting to Carol tenant {tenantName}... ", end="\n")

    return pycarol.Carol(domain=tenantName,
                auth=pycarol.PwdAuth(username, password), organization=organization)

#### SCRIPT FUNCTIONS
`Any additional functions the script needs are defined here.`

In [30]:
def get_data_bq(carolObject, staging):
    # Get the BigQuery service-account credentials from Carol.
    credentials = pycarol.bigquery.TokenManager(carolObject).get_token().to_dict()

    # The project and dataset names are derived from the tenant's environment ID.
    environment = carolObject.get_current()["env_id"]
    project = f"carol-{environment[0:20]}"
    dataset = f"{project}.{environment}"

    service_account = Credentials.from_service_account_info(credentials['service_account'])
    bq = bigquery.Client(project=project, credentials=service_account)
    config = bigquery.QueryJobConfig(priority="BATCH", default_dataset=dataset)

    # Fetch a single sample record, leaving out Carol's internal metadata columns.
    sql = f"""
    SELECT * EXCEPT(mdmCounterForEntity__DATETIME__,mdmCounterForEntity,mdmId,mdmCreated,mdmLastUpdated,mdmTenantId,mdmEntityType,mdmDeleted,mdmConnectorId,_ingestionDatetime),
    FROM `{dataset}`.{staging}
    LIMIT 1
    """

    try:
        result = (
            bq.query(sql, job_config=config)
            .result()
            .to_dataframe(
                create_bqstorage_client=True,
            )
        )
        # Save the sample record so the execution cell can read it back.
        result.to_json(f'./{staging}.json', orient='records')
        print(f'result of table `{dataset}`.{staging} extracted to ./{staging}.json')
        return result

    except exceptions.ClientError as err:
        # Report the BigQuery error and return it instead of raising.
        print({"error": str(err)})
        return {"error": str(err)}
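
The naming scheme used in `get_data_bq` can be tried without any connection to Carol or BigQuery. The sketch below only repeats the string formatting from the function; the helper name `bq_names` and the environment ID are made up for illustration.

```python
def bq_names(env_id, staging):
    """Build the project, dataset and table reference the same way get_data_bq does."""
    project = f"carol-{env_id[:20]}"      # "carol-" plus the first 20 characters
    dataset = f"{project}.{env_id}"       # project-qualified dataset
    table = f"`{dataset}`.{staging}"      # reference used in the FROM clause
    return project, dataset, table


# Made-up environment ID, not a real tenant.
env_id = "0123456789abcdef0123456789abcdef"
project, dataset, table = bq_names(env_id, "stg_myconnector_mystaging")

print(project)
print(dataset)
print(table)
```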

#### CONFIGURATION FILE
`Upload the configuration file in the format given above.`

In [31]:
try:
    # In Google Colab, prompt for an upload and parse the first uploaded file.
    from google.colab import files
    uploaded = files.upload()
    config_json = json.loads(uploaded[next(iter(uploaded))].decode("utf-8"))
except Exception:
    # Outside Colab (or if the upload fails), fall back to a local ./carol.json.
    with open('./carol.json', encoding="utf-8") as config_file:
        config_json = json.load(config_file)

# Print a copy of the configuration without the password.
config_json_print = json.loads(json.dumps(config_json))
del config_json_print['authentication_config']['password']
print(json.dumps(config_json_print, indent=2))

Saving carol.json to carol (1).json
{
  "authentication_config": {
    "username": "breno.zipoli@totvs.com.br",
    "organization": "datascience",
    "tenantName": "brenopapa3"
  },
  "script_config": {
    "staging": "stg_myconnector_pcnfsaid",
    "datamodel": "pcnfsaid",
    "profile_title": "numtransvenda"
  }
}
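
The printed configuration omits the password, while the login step still needs it, so the key has to be removed from a copy rather than from the parsed configuration itself. A minimal sketch of that idea, using a hypothetical helper name `redacted` that is not part of the notebook:

```python
import copy
import json


def redacted(config, secret_keys=("password",)):
    """Return a deep copy of config with the secret keys removed from every section."""
    safe = copy.deepcopy(config)
    for section in safe.values():
        if isinstance(section, dict):
            for key in secret_keys:
                section.pop(key, None)  # no error if the key is absent
    return safe


config = {
    "authentication_config": {"username": "username@totvs.com.br", "password": "password"},
    "script_config": {"staging": "stg_myconnector_mystaging"},
}

print(json.dumps(redacted(config), indent=2))
# The original still holds the password for the login step.
print("password" in config["authentication_config"])
```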


#### SCRIPT EXECUTION
`The main execution of the script will happen here.`

In [32]:
Carol = carol_connect(
    config_json['authentication_config']['username'],
    config_json['authentication_config']['password'],
    config_json['authentication_config']['organization'],
    config_json['authentication_config']['tenantName'])

print('If you are using Google Colaboratory, remember to refresh the contents of working folder after script execution!')

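# Extract one sample record from the staging table into ./<staging>.json.
bq = get_data_bq(Carol, config_json["script_config"]["staging"])

la = DataModelGenerator(Carol)

# Read the sample record back from the file written by get_data_bq.
with open(f'./{config_json["script_config"]["staging"]}.json', encoding="utf-8") as d:
    data = json.loads(d.read())[0]
print(data)

# Generate the datamodel from the fields of the sample record.
la.start(
    data,
    dm_name=config_json["script_config"]["datamodel"],
    profile_title=config_json["script_config"]["profile_title"],
    publish=True,
    overwrite=True,
)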



Connecting to Carol tenant brenopapa3... 
If you are using Google Colaboratory, remember to refresh the contents of working folder after script execution!
{'nfatualwms': None, 'codfiscaloutrasdesp': None, 'telefone': None, 'numseriesat': None, 'vldesconto': 3.93, 'iesubsttribut': None, 'vltotbrutoprodajuste': None, 'cnpjfornec': None, 'totvolume': None, 'datahoraregistroepec': None, 'numprevenda': None, 'numtab': None, 'recibonfe': None, 'percofins': None, 'prazoponderado': None, 'numecf': None, 'idparceiro': None, 'vlcustofin': None, 'vlicmfreteauton': None, 'numnftransf': None, 'cartaodotz': None, 'sulframa': None, 'numtransvendatv13': None, 'numnotaorigem': None, 'vlicmsantecipado': None, 'vlpunitmed': None, 'nomearquivodotz': None, 'percicmfrete': None, 'codpraca': None, 'valorpedagio': None, 'transpautonomo': None, 'datahoraemissaosat': None, 'chavecte': None, 'codmedicoprescrit': None, 'tipoemissao': None, 'codsupervisor': None, 'comissaomot': None, 'nfbrinde': None, 'deduzirdeso

KeyboardInterrupt: ignored
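
Most values in the sample record printed above are `None`. Since the query fetches a single row (`LIMIT 1`), a null value in that row says nothing about the field's type in the JSON file. How `DataModelGenerator` treats such fields is not shown in this notebook, so it may be worth listing them before generating the datamodel. A minimal sketch with a hypothetical helper, `split_null_fields`, that is not part of pyCarol:

```python
def split_null_fields(record):
    """Separate fields that carry a value from fields that are null in the sample."""
    typed = {key: type(value).__name__ for key, value in record.items() if value is not None}
    nulls = sorted(key for key, value in record.items() if value is None)
    return typed, nulls


# Three fields taken from the sample output above.
sample = {"telefone": None, "vldesconto": 3.93, "numtab": None}
typed, nulls = split_null_fields(sample)

print(typed)  # fields with a Python type inferred from the value
print(nulls)  # fields whose type cannot be read from this record
```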