# Get and Extract Signals from Signavio sources: API calls to workspaces and load from saved JSON files 
This notebook collects cells that extract Signals from several workspaces via Signavio API backend calls.
You need Signavio credentials; check that you can access:

https://editor.signavio.com/g/statics/pi/areas

Create a *.env* file with your credentials:
```js
MY_SIGNAVIO_PASSWORD=*****
MY_SIGNAVIO_NAME=alexey.streltsov@sap.com
```
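The notebook does not show how the *.env* values reach the process environment; `text2signal.authenticator` presumably reads them from environment variables (an assumption). Below is a minimal stdlib sketch of loading and checking such a file. `load_env_file` is a hypothetical helper; the `python-dotenv` package's `load_dotenv` is the usual ready-made alternative.

```python
import os
from pathlib import Path

# Variable names taken from the .env example above
REQUIRED_VARS = ("MY_SIGNAVIO_NAME", "MY_SIGNAVIO_PASSWORD")


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: parse KEY=VALUE lines and export them to os.environ."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    # setdefault: variables already set in the shell take precedence
    for key, value in values.items():
        os.environ.setdefault(key, value)
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing Signavio credentials: {', '.join(missing)}")
    return values
```

Failing early on missing variables gives a clearer error than a rejected login later in the authentication cell.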


# Credentials / API initialization

In [1]:
import sys
sys.path.append("./")
print(sys.path)

['/Users/d071211/projects/signavio/text2SIGNAL/notebooks', '/Users/d071211/.pyenv/versions/3.11.6/lib/python311.zip', '/Users/d071211/.pyenv/versions/3.11.6/lib/python3.11', '/Users/d071211/.pyenv/versions/3.11.6/lib/python3.11/lib-dynload', '', '/Users/d071211/Library/Caches/pypoetry/virtualenvs/text2signal-_sxAEIlr-py3.11/lib/python3.11/site-packages', '/Users/d071211/projects/signavio/text2SIGNAL', './']


In [2]:
from text2signal.authenticator import initialize_signavio_client, WORKSPACES

# Create one authenticated Signavio client per configured workspace
auth_clients = {}

for workspace_name, workspace_id in WORKSPACES.items():
    auth_clients[workspace_name] = initialize_signavio_client(workspace_id)


In [3]:
! echo $MY_SIGNAVIO_NAME

darko.velkoski@sap.com


## Example: inspect process schemas for running manually constructed SIGNALs against the workspace DB

In [4]:
workspace_name = "Solutions Demo Workspace"
signavio_client = auth_clients[workspace_name]

# List every process in the workspace and the column names of its SIGNAL schema
for process in signavio_client.pi.subjects():
    print(process.id, process.name)
    schema = signavio_client.signal.schema(process.id)
    columns = [field.column_name for field in schema.fields]
    print(columns)

sl-itc-dashboard-test-1 000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)
['case_id', 'event_name', 'end_time', 'Invoice_u0020_Posting_u0020_Gross_u0020_Amount', 'Event_u0020_Created_u0020_By_u0020_User_u0020_Id', 'Event_u0020_Created_u0020_By_u0020_User_u0020_Type', 'Change_u0020_Old_u0020_Value', 'Change_u0020_New_u0020_Value', 'Changed_u0020_Table', 'Changed_u0020_Field', 'Changed_u0020_Object_u0020_Id', 'Change_u0020_Type', 'Change_u0020_Number', 'Invoice_u0020_Line_u0020_Item', 'Accounting_u0020_Document_u0020_Segment_u0020_Primary_u0020_Key', 'Invoice_u0020_Due_u0020_Date', 'Transaction_u0020_Code', 'Reverse_u0020_Document_u0020_Number', 'Reverse_u0020_Document_u0020_Fiscal_u0020_Year', 'Reversal_u0020_Indicator', 'Document_u0020_Currency', 'Amount_u0020_in_u0020_Document_u0020_Currency', 'Converted_u0020_USD_u0020_Amount', 'Amount_u0020_Eligible_u0020_for_u0020_Cash_u0020_Discount_u0020_in_u0020_Document_u0020_Currency', 'Converted_u0020_USD_u0020_Amount_u0020_Eligible_u0020_
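The column names above escape spaces as `_u0020_`, i.e. `_u` plus a four-digit hex code point plus `_`. This convention is inferred from the examples, not from Signavio documentation. A small sketch that decodes such names for readability (`decode_column_name` is a hypothetical helper):

```python
import re

# Matches escape sequences such as "_u0020_" (hex code point 0x0020 = space)
_ESCAPE = re.compile(r"_u([0-9A-Fa-f]{4})_")


def decode_column_name(column_name: str) -> str:
    """Hypothetical helper: replace each _uXXXX_ escape with its character."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), column_name)


print(decode_column_name("Invoice_u0020_Posting_u0020_Gross_u0020_Amount"))
```

This only recovers the raw name; the display `name` in the schema can differ in casing (e.g. `last_u0020_gr_u0020_supplier_u0020_id` decodes to `last gr supplier id`, while its display name is `Last GR Supplier Id`), so queries should keep using `column_name` as returned by the API.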

## Step 1. Select a Signavio workspace and get the list of processes in it
Later we will extract all Dashboards, Investigations, and Metrics from the selected workspace.

The selection of the workspace is controlled by:

```python
# Select workspace
workspace_name="Solutions Demo Workspace"
#workspace_name="Process AI"
```
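Because the client is looked up by name in `auth_clients`, a typo in `workspace_name` surfaces as a bare `KeyError`. A minimal sketch of a lookup that reports the valid names instead; `select_client` is a hypothetical helper, not part of `text2signal`:

```python
from typing import Mapping, TypeVar

T = TypeVar("T")


def select_client(clients: Mapping[str, T], workspace_name: str) -> T:
    """Hypothetical helper: return the client for workspace_name or list valid choices."""
    try:
        return clients[workspace_name]
    except KeyError:
        available = ", ".join(sorted(clients))
        raise KeyError(
            f"Unknown workspace {workspace_name!r}; available: {available}"
        ) from None
```

Usage in the notebook would be `signavio_client = select_client(auth_clients, workspace_name)`.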

In [6]:
# Workspace selection 
workspace_name="Solutions Demo Workspace"
#workspace_name="SAP Signavio Suite Power Challenge"
#workspace_name="Process AI"

In [6]:
# Get the list of available processes in each workspace:
for workspace_name in ["Solutions Demo Workspace", "Process AI"]:
    signavio_client = auth_clients[workspace_name]
    list_processes = signavio_client.pi.subjects()
    print(f"In Workspace Name: {workspace_name}. Number of processes: {len(list_processes)}")


In Workspace Name: Solutions Demo Workspace. Number of processes: 30
In Workspace Name: Process AI. Number of processes: 7


In [7]:
list_processes

[PISubject(
     id='demo01-1',
     name='demo01',
     business_area='ba-3',
     currency_code=None,
     preprocessing_status=<PISubjectPreprocessingStatus.PROCESSING_FINISHED: 'PROCESSING_FINISHED'>,
     changed_at=datetime.datetime(2023, 10, 20, 8, 35, 59, 373000, tzinfo=TzInfo(UTC)),
     source_system=None,
     process_types=None,
     event_log_updated_at=datetime.datetime(2023, 10, 20, 8, 35, 57, 484000, tzinfo=TzInfo(UTC)),
     investigation_count=0,
     dashboard_count=0
 ),
 PISubject(
     id='dvtest-1',
     name='dv_test',
     business_area='ba-3',
     currency_code=None,
     preprocessing_status=<PISubjectPreprocessingStatus.PROCESSING_FINISHED: 'PROCESSING_FINISHED'>,
     changed_at=datetime.datetime(2023, 11, 9, 17, 14, 55, 609000, tzinfo=TzInfo(UTC)),
     source_system=None,
     process_types=None,
     event_log_updated_at=datetime.datetime(2023, 11, 9, 17, 14, 53, 385000, tzinfo=TzInfo(UTC)),
     investigation_count=1,
     dashboard_count=0
 ),
 PISubj
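Many processes have no dashboards or investigations (e.g. `demo01-1` above), so there is nothing to extract from them. A sketch that keeps only processes with content, relying on the `investigation_count` and `dashboard_count` fields shown in the output. `SubjectStub` is a hypothetical stand-in for `PISubject` so the example runs without API access:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SubjectStub:
    """Hypothetical stand-in exposing the PISubject fields printed above."""
    id: str
    name: str
    investigation_count: int
    dashboard_count: int


def processes_with_content(subjects: Iterable) -> list[tuple[str, str, int, int]]:
    """Return (id, name, dashboards, investigations) for processes that have any."""
    return [
        (s.id, s.name, s.dashboard_count, s.investigation_count)
        for s in subjects
        # Treat missing counts (None) as zero
        if (s.dashboard_count or 0) + (s.investigation_count or 0) > 0
    ]


example = [
    SubjectStub("demo01-1", "demo01", investigation_count=0, dashboard_count=0),
    SubjectStub("dvtest-1", "dv_test", investigation_count=1, dashboard_count=0),
]
print(processes_with_content(example))  # [('dvtest-1', 'dv_test', 0, 1)]
```

Since the function only reads attributes, `processes_with_content(list_processes)` should work on the real `PISubject` objects as well.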

## Step 2. Get the lists of Dashboards, Investigations, and Metrics for each process from Step 1 and store them in the all_json JSON

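In this step the workspaces to process are not taken from `workspace_name`: they are read from the `workspaces` section of the YAML configuration file `../text2signal/configs/workspaces_temp.yaml`, and the output directory from its `root_dir` entry (default `../data/temp`).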

First, we run a series of Signavio API calls to retrieve information about the Dashboards and Investigations of each process in the selected workspace.

Then we save everything to the filesystem: the combined data as **"all_json_WORKSPACENAME_DATE.json"**, plus the respective dashboards, investigations, metrics, and views as separate files.
For example, for workspace_name="Solutions Demo Workspace" the cell reports:

`Dumping all extracted Dashboards, Investigations, Metrics data to ../data/all_json_Solutions_Demo_Workspace_2023-12-01T07_54_49.json`

```js
data
├── [0]_Dashboards_ITP\ Process
│   │   ├── dashboards
│   │   │   ├── 00_Empty\ Template_2023-12-01.json
│   │   │   ├── 0[SL]-\ Payment\ Term\ Optimization\ (for\ Final\ dashboard)_2023-12-01.json
│   │   │   └── [SL]\ Payment\ Term\ Analysis\ -\ Use\ case\ 3\ -\ SOL-\ 2727_2023-12-01.json
│   │   ├── investigations
│   │   │   └── Event\ lvl\ metrics\ test_2023-12-01.json
│   │   ├── metrics
│   │   │   └── metrics.json
│   │   └── views
│   │       └── views.json
│   ├── [NK]\ Show&Tell\ -\ ITP\ Dashboards
│   │   ├── dashboards
│   │   │   └── [04]\ Payment\ Behaviour_2023-12-01.json
│   │   ├── metrics
│   │   │   └── metrics.json
│   │   └── views
│   │       └── views.json
│   └── [SIGPIA-2936]\ Dashboard_PTP_Ariba
│       ├── dashboards
│       │   ├── 00_Overview\ Dashboard_2023-12-01.json
│       │   ├── Automation\ Dashboard_2023-12-01.json
│       │   ├── Cycle\ Time\ Dashboard_2023-12-01.json
│       │   ├── Payment\ Behaviour_2023-12-01.json
│       │   └── Spend\ Analysis_2023-12-01.json
│       ├── investigations
│       │   ├── SAP_ARIBA_PTP_2023-12-01.json
│       │   └── Test_2023-12-01.json
│       ├── metrics
│       │   └── metrics.json
│       └── views
│           └── views.json
└── all_json_Solutions_Demo_Workspace_2023-12-01T07_54_49.json
```
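The combined file names follow the pattern `all_json_<workspace with spaces replaced by underscores>_<timestamp>.json`, with the timestamp formatted as `%Y-%m-%dT%H_%M_%S` (the same format the signal-extraction cell in Step 3 uses). The sketch below reproduces that naming as inferred from the file names; `all_json_filename` is hypothetical and not necessarily how `PIDataManager` builds them:

```python
import datetime


def all_json_filename(workspace_name: str, when: datetime.datetime) -> str:
    """Hypothetical helper: e.g. all_json_Solutions_Demo_Workspace_2023-12-01T07_54_49.json."""
    # Spaces become underscores; underscores instead of colons keep the timestamp filesystem-safe
    safe_name = workspace_name.replace(" ", "_")
    timestamp = when.strftime("%Y-%m-%dT%H_%M_%S")
    return f"all_json_{safe_name}_{timestamp}.json"


print(all_json_filename("Process AI", datetime.datetime(2024, 2, 11, 23, 9, 18)))
# all_json_Process_AI_2024-02-11T23_09_18.json
```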

In [13]:
from text2signal.data.utils import load_configurations
from text2signal.data.pi_extractor import PIDataManager

config = load_configurations('../text2signal/configs/workspaces_temp.yaml')
workspaces = config.get('workspaces', {})
root_dir = config.get('root_dir', '../data/temp')

manager = PIDataManager(workspaces)

for workspace_name in workspaces.keys():
    print(f"Processing workspace: {workspace_name}")
    all_json = manager.fetch_data(workspace_name, root_dir)
    manager.save_all_jsons(root_dir, workspace_name, all_json)


Processing workspace: Process AI
Failed to perform request for entity askdata-poc-1: 403 Client Error: Forbidden for url: https://editor.signavio.com/g/api/pi-graphql/investigations/askdata-poc-1/export
Failed to perform request for entity q1-1: 403 Client Error: Forbidden for url: https://editor.signavio.com/g/api/pi-graphql/investigations/q1-1/export
Dumping all extracted Dashboards, Investigations, Metrics data to data/temp/all_json_Process_AI_2024-02-11T23_09_18.json
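The log shows that a `403 Forbidden` on an investigation export does not abort the run; the remaining entities are still fetched and dumped. A generic sketch of that pattern, where `fetch` stands in for whatever call performs the export (an assumption; this is not `PIDataManager`'s actual code):

```python
from typing import Callable, Iterable


def fetch_each(
    entity_ids: Iterable[str], fetch: Callable[[str], dict]
) -> tuple[dict[str, dict], dict[str, str]]:
    """Fetch every entity, recording failures instead of stopping at the first one."""
    results: dict[str, dict] = {}
    failures: dict[str, str] = {}
    for entity_id in entity_ids:
        try:
            results[entity_id] = fetch(entity_id)
        except Exception as exc:  # e.g. an HTTP 403 for an export we may not access
            failures[entity_id] = str(exc)
            print(f"Failed to perform request for entity {entity_id}: {exc}")
    return results, failures
```

Returning the failures separately makes it easy to report which investigations were skipped for lack of permissions.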


## Step 3. Extract Signals from the all_json JSONs and the saved JSON files

Here we use the all_json JSON files to extract the Signals they contain.

Additionally, we can extract Signals from the metrics, dashboards, and investigations JSON files stored on your filesystem.

The catch is that we have to specify the view (`view_id`), which is needed to run API calls against the real event-log DB; we typically need this to
validate Signals, whether written by humans or generated by an LLM.
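Each entry in the extraction configuration must therefore provide a `workspace`, a `view_id`, and a `path`, which are the keys the extraction cell reads. A sketch that checks entries before running the extractor; `check_configurations` is hypothetical, and nothing else about the YAML layout is assumed:

```python
from pathlib import Path

# Keys read by the signal-extraction cell for every configuration entry
REQUIRED_KEYS = ("workspace", "view_id", "path")


def check_configurations(configurations: list[dict]) -> list[str]:
    """Hypothetical helper: return readable problems found in the configuration entries."""
    problems = []
    for index, entry in enumerate(configurations):
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
        elif not Path(entry["path"]).exists():
            problems.append(f"entry {index}: path does not exist: {entry['path']}")
    return problems
```

An empty list means every entry has a view to validate against and a directory to read from.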

```js
data/
├── ContentPackage_November2023
│   ├── KFFI000960\ -\ Supplier\ payment\ without\ purchasing
│   ├── KFFI001060\ -\ Cash\ collection\ for\ non-sales\ (invoices)
│   ├── KPLE000360\ -\ Delivery\ processing\ for\ subcontracting\ wit
...
├── NLP\ POC\ (JustAsk)
│   ├── 01\ Introduction
│   ├── 02\ OTC\ Data\ Set
│   ├── 03\ Mining\ Questions\ and\ SIGNAL\ Queries
│   ├── 04\ PI\ Investigation
│   └── JustAsk\ -\ Signavio\ Widget\ Builder.pptx
├── OPAL
│   ├── DRAFT_1.json
│   └── DRAFT_2.json
data/From_API/
├── Process\ AI
│   ├── Test
│   ├── demo01
...
├── Solutions\ Demo\ Workspace
│   ├── 000_DEMO✨\ -\ Dashboards\ ITC\ ECC\ &\ S4H\ (DO\ NOT\ CHANGE)
│   ├── 000_DEMO✨\ -\ Dashboards\ ITP\ ECC\ &\ S4H\ (DO\ NOT\ CHANGE)
...
├── all_extracted_signals_2023-12-01T09_59_45.json
├── all_json_Process_AI_2023-12-01T09_53_37.json
└── all_json_Solutions_Demo_Workspace_2023-12-01T09_53_24.json
...
```

### Extract Signals and save them to all_extracted_signals_DATE.json

In [4]:
import datetime
from pathlib import Path

import pandas as pd

from text2signal.data.signal_extraction import SignalExtractor
from text2signal.data.utils import load_configurations, save_json_to_file

# Extract signals from the JSON files listed in the configuration.
yaml_config = load_configurations('../text2signal/configs/signal_extraction_config_temp.yaml')
# Note: the key is spelled 'ouput_dir' in the YAML config, so it must be read with that spelling.
output_dir = Path(yaml_config.get('ouput_dir'))
print(f"Loaded configuration: {output_dir}")
configurations = yaml_config.get('configurations', [])
all_signals = []

for config in configurations:
    extractor = SignalExtractor(
        workspace_name=config["workspace"],
        view_id=config["view_id"],
        base_path=Path(config["path"]),
    )
    signals = extractor.process_jsons()
    all_signals.extend(signals)

print(f"Total signals extracted: {len(all_signals)}")

# Save the extracted signals as JSON and CSV, tagged with a timestamp
date_str = datetime.datetime.now().strftime("%Y-%m-%dT%H_%M_%S")
output_dir.mkdir(parents=True, exist_ok=True)
file_path = output_dir / f"all_extracted_signals_{date_str}.json"
print(f"Dumping {len(all_signals)} extracted signals to {file_path}")
save_json_to_file(all_signals, file_path)

df = pd.DataFrame(all_signals)
signals_file_path = output_dir / f"signals_{date_str}.csv"
print(f"Saving CSV file with Signals: {signals_file_path}")
df.to_csv(signals_file_path)


Loaded configuration: data/temp
None: ../data/From_API/Solutions Demo Workspace/000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)/metrics/metrics.json : 1
['dashboard', 'metrics']: ../data/From_API/Solutions Demo Workspace/000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)/dashboards/Cycle Times_2024-02-01.json : 20
['dashboard', 'metrics']: ../data/From_API/Solutions Demo Workspace/000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)/dashboards/Automation_2024-02-01.json : 30
['dashboard', 'metrics']: ../data/From_API/Solutions Demo Workspace/000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)/dashboards/00_Overview_2024-02-01.json : 28
['dashboard', 'metrics']: ../data/From_API/Solutions Demo Workspace/000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)/dashboards/Optimize Your Payment Term Usage_2024-02-01.json : 22
['dashboard', 'metrics']: ../data/From_API/Solutions Demo Workspace/000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)/dashboards/Improve Days Sales Outstanding

# Pandas and Data Analysis

Here we:
- *vdf*: load the column names of each involved view from all_json and all_json1;
- *df*: combine Signals from different sources into a pandas DataFrame for statistical analysis.
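A sketch of the *vdf* idea: turn a schema table with `view` and `column_name` columns (as produced by `fetch_and_prepare_schema_data`) into a lookup from view to its columns, then flag column names that a Signal references but the view does not have. Both helpers are hypothetical, and extracting the referenced columns from a Signal is not shown:

```python
import pandas as pd


def view_columns(df_schema: pd.DataFrame) -> dict[str, set[str]]:
    """Map each view id to the set of column names it exposes."""
    return {
        view: set(columns)
        for view, columns in df_schema.groupby("view")["column_name"]
    }


def unknown_columns(
    referenced: list[str], view: str, lookup: dict[str, set[str]]
) -> list[str]:
    """Return the referenced columns that do not exist in the given view."""
    known = lookup.get(view, set())
    return [column for column in referenced if column not in known]
```

For example, `unknown_columns(["case_id", "Typo"], "defaultview-529", view_columns(df_schema))` would report only `"Typo"`.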

## Schemas to Data Frames

In [17]:

from text2signal.data.utils import load_json
from text2signal.data.fetch_schema_info import fetch_and_prepare_schema_data

workspace_name = "Solutions Demo Workspace"
signavio_client = auth_clients[workspace_name]

json_filepath = "../data/From_API/all_json_Solutions_Demo_Workspace_2024-02-01T08_17_38.json"
all_json = load_json(json_filepath)

df_schema = fetch_and_prepare_schema_data(signavio_client, all_json)
df_schema


Fetching schema for process: sl-itc-dashboard-test-1, 000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)
Found views ['defaultview-529'] for process: 000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT CHANGE)
Fetching schema for process: 00demo---sap-s4h-itp-dashboar-1, 000_DEMO✨ - Dashboards ITP ECC & S4H (DO NOT CHANGE)
Found views ['defaultview-320'] for process: 000_DEMO✨ - Dashboards ITP ECC & S4H (DO NOT CHANGE)
Fetching schema for process: 00demo---sap-s4h-otc-dashboar-1, 000_DEMO✨ - Dashboards OTC ECC & S4H (DO NOT CHANGE) 
Found views ['defaultview-383'] for process: 000_DEMO✨ - Dashboards OTC ECC & S4H (DO NOT CHANGE) 
Fetching schema for process: sademo-dashboardss4h-p2p-proc-1, 000_DEMO✨ - Dashboards PTP ECC & S4H (DO NOT CHANGE)
Found views ['defaultview-124'] for process: 000_DEMO✨ - Dashboards PTP ECC & S4H (DO NOT CHANGE)
Fetching schema for process: lg-00demo---ato-sap-s4-hana-d-1, 00_DEMO - Acquire-to-Onboard for  SAP ECC & S/4 HANA (DO NOT CHANGE)
Found views ['defaultvi

|  | column_name | name | column_role | dataType | view | process | process_id |
|---|---|---|---|---|---|---|---|
| 0 | case_id | case_id | DIMENSION | TEXT | defaultview-529 | 000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT C... | sl-itc-dashboard-test-1 |
| 1 | event_name | event_name | DIMENSION | LIST_TEXT | defaultview-529 | 000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT C... | sl-itc-dashboard-test-1 |
| 2 | end_time | end_time | DIMENSION | LIST_TIMESTAMP | defaultview-529 | 000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT C... | sl-itc-dashboard-test-1 |
| 3 | Invoice_u0020_Posting_u0020_Gross_u0020_Amount | Invoice Posting Gross Amount | DIMENSION | LIST_NUMBER | defaultview-529 | 000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT C... | sl-itc-dashboard-test-1 |
| 4 | Event_u0020_Created_u0020_By_u0020_User_u0020_Id | Event Created By User Id | DIMENSION | LIST_TEXT | defaultview-529 | 000_DEMO✨ - Dashboards ITC ECC & S4H (DO NOT C... | sl-itc-dashboard-test-1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3876 | last_u0020_gr_u0020_shipment_u0020_notice_u002... | Last GR Shipment Notice Reference | DIMENSION | TEXT | defaultview-545 | [SIGPIA-2936] Dashboard_PTP_Ariba | sm-ptpariba-1 |
| 3877 | last_u0020_gr_u0020_supplier_u0020_id | Last GR Supplier Id | DIMENSION | TEXT | defaultview-545 | [SIGPIA-2936] Dashboard_PTP_Ariba | sm-ptpariba-1 |
| 3878 | last_u0020_gr_u0020_supplier_u0020_location_u0... | Last GR Supplier Location Id | DIMENSION | TEXT | defaultview-545 | [SIGPIA-2936] Dashboard_PTP_Ariba | sm-ptpariba-1 |
| 3879 | last_u0020_gr_u0020_item_u0020_processed_u0020... | Last GR Item Processed State | DIMENSION | TEXT | defaultview-545 | [SIGPIA-2936] Dashboard_PTP_Ariba | sm-ptpariba-1 |
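With close to 3,900 schema rows, a per-process breakdown by data type gives a quicker overview than scrolling the table. A sketch using the `process_id` and `dataType` columns of `df_schema`; `summarize_schema` is a hypothetical helper:

```python
import pandas as pd


def summarize_schema(df_schema: pd.DataFrame) -> pd.DataFrame:
    """Count schema columns per process (rows) and data type (columns)."""
    return (
        df_schema.groupby(["process_id", "dataType"])
        .size()
        # Move dataType into columns; combinations that never occur become 0
        .unstack(fill_value=0)
    )
```

In the notebook, `summarize_schema(df_schema)` would show, for example, how many `LIST_*` (event-level) versus `TEXT` (case-level) columns each process exposes.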


In [18]:
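from pathlib import Path

# date_str comes from the signal-extraction cell above, so both outputs share a timestamp
filepath = Path(f'data/temp/schemas_from_views_{date_str}.csv')
filepath.parent.mkdir(parents=True, exist_ok=True)
df_schema.to_csv(filepath)
print(f"CSV files with Schemas for Signals: {filepath}")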


CSV files with Schemas for Signals: data/temp/schemas_from_views_2024-02-11T23_24_11.csv
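`DataFrame.to_csv` writes the index by default, so reading such a file back with plain `pd.read_csv` adds an `Unnamed: 0` column. A sketch of two hypothetical helpers that avoid this, either by not writing the index or by reading it back with `index_col=0`:

```python
from pathlib import Path

import pandas as pd


def save_schema_csv(df: pd.DataFrame, filepath: Path) -> None:
    """Write the table without pandas' positional index column."""
    filepath.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the RangeIndex out of the file, so no "Unnamed: 0" on reload
    df.to_csv(filepath, index=False)


def load_schema_csv(filepath: Path, has_index_column: bool = False) -> pd.DataFrame:
    """Read a schema CSV; pass has_index_column=True for files saved with their index."""
    return pd.read_csv(filepath, index_col=0 if has_index_column else None)
```

Files already written by the cell above would be read with `load_schema_csv(filepath, has_index_column=True)`.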
