# Pull raw event sample

In a cosmos notebook, use the maglev kernel to run this code.
Original reference notebook:
- https://github.paypal.com/DART/eve-builder-util/blob/master/bizlog-pull/src/main/resources/pull-event-bizlog-example-pyspark.ipynb

- For raw data paths, refer to `/sys/pp_dm/dm_hdp_batch/kafka_data/RISK/BLOGGING/riskdataprocessmsgd/`, which contains one folder per event. Note that the raw data is downsampled; the sample rate is usually 0.0002 (about 1 in 5,000 events). Raw data files are partitioned by hour. For example, `/data/file/2023/05/01/00` holds events published from 2023-05-01 00:00:00 to 00:59:59.
  You can use Spark wildcard patterns to load multiple data files at once.
- For wildcard pattern syntax, see https://hail.is/docs/0.2/hadoop_glob_patterns.html and https://man7.org/linux/man-pages/man7/glob.7.html
- For ordinary elements, use dot notation. For list elements, use `.{index}` to access an element by position. In the example below, `user_cc` is a list, so `user_cc.0` gets its first element.
- If a path segment contains a dash (`-`), wrap that segment in single quotes: `path.to.'some-dash-linked-schema'`.
- To pull a value from a key-value pair list, use a path like: `"findInKVPairs(request.body.context.data_set.k_v_pairs, 'ucc_sender_email_address')"`
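The path conventions above can be made concrete with a small pure-Python sketch that resolves such a path against one decoded event record made of nested dicts and lists. This is a hypothetical illustration, not the implementation used by `EventLogTool` or EVE. `split_path`, `resolve`, and `find_in_kv_pairs` are made-up names. The sketch also assumes each `k_v_pairs` entry is an object with `key` and `value` fields, which may not match the real schema.

```python
import re
from typing import Any, List, Optional

# Hypothetical helpers that mimic the path conventions described above on a
# plain dict/list record. They are NOT part of bizlog_pull_py or EVE.

# A segment is either a single-quoted name (may contain '-') or a dot-free run.
_SEGMENT = re.compile(r"'([^']*)'|([^.]+)")


def split_path(path: str) -> List[str]:
    """Split "a.b.0.'some-key'" into ['a', 'b', '0', 'some-key']."""
    return [quoted or plain for quoted, plain in _SEGMENT.findall(path)]


def resolve(record: Any, path: str) -> Optional[Any]:
    """Walk the record; numeric segments index into lists. Missing -> None."""
    current = record
    for segment in split_path(path):
        if isinstance(current, list):
            if not segment.isdigit() or int(segment) >= len(current):
                return None
            current = current[int(segment)]
        elif isinstance(current, dict):
            if segment not in current:
                return None
            current = current[segment]
        else:
            return None
    return current


def find_in_kv_pairs(pairs: Any, key: str) -> Optional[Any]:
    """Hypothetical analogue of findInKVPairs.

    ASSUMPTION: pairs looks like [{"key": ..., "value": ...}, ...].
    """
    if not isinstance(pairs, list):
        return None
    for pair in pairs:
        if isinstance(pair, dict) and pair.get("key") == key:
            return pair.get("value")
    return None


# Made-up sample record for illustration only.
sample = {
    "payment_attempt": {
        "user_cc": [{"first_name": "Ada", "last_name": "Lovelace"}],
        "some-dash-linked-schema": "dashed value",
    },
    "request": {"body": {"context": {"data_set": {"k_v_pairs": [
        {"key": "ucc_sender_email_address", "value": "ada@example.com"},
    ]}}}},
}

print(resolve(sample, "payment_attempt.user_cc.0.first_name"))  # Ada
print(resolve(sample, "payment_attempt.'some-dash-linked-schema'"))  # dashed value
pairs = resolve(sample, "request.body.context.data_set.k_v_pairs")
print(find_in_kv_pairs(pairs, "ucc_sender_email_address"))  # ada@example.com
```

Returning `None` for a missing key or an out-of-range index is a design choice of this sketch. The real tool may instead produce a null column or raise an error.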

```python
from automation_utils.spark.session import get_spark
from bizlog_pull_py.bom_bizlog_tool import BomBizlogTool
from bizlog_pull_py.event_log_tool import EventLogTool

# TODO: replace with a SparkSession (e.g. obtained via get_spark, imported above).
spark = None

# Raw message data dir for one event, e.g.
# /sys/pp_dm/dm_hdp_batch/kafka_data/RISK/BLOGGING/riskdataprocessmsgd/<event folder>
data_dir = ''

event_tool = EventLogTool(spark)

# Load one hour of data: events published 2023-05-10 10:00:00 to 10:59:59.
event_tool.eventPath(f"{data_dir}/2023/05/10/10/*")
# The default event type works for most events, but for some events the raw
# data is in JSON format. In that case, specify the data type here.
event_tool.eventType('JSON')

# List the schema paths of the elements you want to extract.
event_tool.elements("payment_attempt.user_cc.0.first_name",
                    "payment_attempt.user_cc.0.last_name",
                    "payment_attempt.account_number",
                    "payment_attempt.time_event_published",
                    "payment_attempt.counterparty",
                    "payment_attempt.usd_amount.amount",
                    "payment_attempt.sender_account.first_name",
                    "payment_attempt.sender_account.last_name"
                    )

df = event_tool.pull()

df.show(10)
```
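Two small helpers show, as a sketch, how to work with the hourly partitions and the sampling described at the top. `hours_glob` builds a Hadoop glob with brace alternation that covers several hours of one day. It assumes `eventPath` passes the string through to Spark unchanged. `estimate_full_count` scales a sample count back up by the sample rate, assuming events are sampled uniformly at random. Both names are hypothetical, and the estimate is only rough.

```python
from datetime import date
from typing import Iterable


def hours_glob(data_dir: str, day: date, hours: Iterable[int]) -> str:
    """Build '<data_dir>/YYYY/MM/DD/{HH,HH,...}/*' for the given hours of one day."""
    hours = list(hours)
    if not hours or any(not 0 <= h <= 23 for h in hours):
        raise ValueError("hours must be a non-empty list of values in 0..23")
    hh = ",".join(f"{h:02d}" for h in hours)
    # {{ and }} emit literal braces; {hh} is substituted in between.
    return f"{data_dir}/{day:%Y/%m/%d}/{{{hh}}}/*"


def estimate_full_count(sample_count: int, sample_rate: float = 0.0002) -> float:
    """Rough full-volume estimate, assuming uniform random sampling at sample_rate."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return sample_count / sample_rate


print(hours_glob("/data/file", date(2023, 5, 10), range(10, 13)))
# /data/file/2023/05/10/{10,11,12}/*
```

In the notebook, usage would look like `event_tool.eventPath(hours_glob(data_dir, date(2023, 5, 10), range(10, 13)))` to load 10:00 through 12:59, and `estimate_full_count(df.count())` for a ballpark event volume.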

# EVE builder code snippets

You can validate your expressions in EVE: https://gds.paypalinc.com/evebuilder/express_validation

## Filter out null or empty values

Keep events where the first name or the last name is both non-null and non-empty:

```
(ACEEvent.firstName != null && ACEEvent.firstName != '') || (ACEEvent.lastName != null && ACEEvent.lastName != '')
```
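Before you paste an expression into EVE, it can help to check its logic on a few rows of the pulled sample. Below is a minimal Python mirror of the filter above. It assumes EVE's `null` corresponds to Python `None`, and the function names and rows are made up. To try it on real data, you could collect a few rows with `df.limit(100).collect()`, convert each with `row.asDict()`, and use whatever column names `pull()` produced.

```python
from typing import Optional


def has_text(value: Optional[str]) -> bool:
    """Mirror of `x != null && x != ''`: False for None and for ''."""
    return value is not None and value != ""


def has_any_name(first_name: Optional[str], last_name: Optional[str]) -> bool:
    """Same boolean structure as the EVE filter above."""
    return has_text(first_name) or has_text(last_name)


# Made-up rows standing in for pulled sample data.
rows = [
    {"first_name": "Ada", "last_name": None},
    {"first_name": "", "last_name": None},
    {"first_name": None, "last_name": "Lovelace"},
    {"first_name": "   ", "last_name": ""},
]
kept = [row for row in rows if has_any_name(row["first_name"], row["last_name"])]
print(len(kept))  # 3
```

Note that a whitespace-only value such as `"   "` passes both checks, because it is neither null nor the empty string.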



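## Convert a string to lower case

Check for null before calling `toLowerCase()`, so that both branches of the expression return a boolean:

```
PaymentAttempt.transactionType == null ? false : PaymentAttempt.transactionType.toLowerCase() == 'a'
```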
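The null-guarded, lower-cased comparison can be mirrored in Python to check expected results on sample values. This is a sketch with made-up function names, and it treats `None` as EVE's `null`. Python's `str.lower()` is not locale-sensitive, while Java's no-argument `toLowerCase()` uses the default locale. The two agree for plain ASCII codes such as `'a'`.

```python
from typing import Optional


def to_lower_or_empty(value: Optional[str]) -> str:
    """Lower-case a possibly-null string; None becomes ''."""
    return "" if value is None else value.lower()


def lower_equals(value: Optional[str], expected: str) -> bool:
    """Null-safe, case-insensitive match: None never matches."""
    return value is not None and value.lower() == expected.lower()


print(lower_equals("A", "a"))  # True
print(lower_equals(None, "a"))  # False
print(repr(to_lower_or_empty(None)))  # ''
```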



## Concatenate strings

Append `phoneNumber` to `phoneCountryCode`:

```
PaymentAttempt.phoneCountryCode.concat(PaymentAttempt.phoneNumber)
```
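Because `concat` is called on `phoneCountryCode`, the expression depends on that field being non-null. In Java, for example, `String.concat` throws a `NullPointerException` when either the receiver or the argument is null. Whether EVE behaves the same way is an assumption you should check in the validator. Below is a Python sketch of the null-safe behavior you may want, using a made-up helper name and made-up phone values.

```python
from typing import Optional


def null_safe_concat(*parts: Optional[str]) -> str:
    """Join the parts in order, treating None as an empty string."""
    return "".join(part for part in parts if part is not None)


print(null_safe_concat("1", "4085551234"))  # 14085551234
print(null_safe_concat(None, "4085551234"))  # 4085551234
```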