# Machine Learning Model - Cloud Computing and AWS

## Preparation



- If the environment shuts down while your notebook is still open, you can copy the notebook cell by cell into another notebook, for example in your own Jupyter notebook environment. That way you at least save your scripts, though without the output.


Use the following code to create a Spark session, which later steps use for querying and processing.

In [1]:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("PySparkApp")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.2")
    .getOrCreate()
)
spark


:: loading settings :: url = jar:file:/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/pyspark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml


Ivy Default Cache set to: /home/ec2-user/.ivy2/cache
The jars for the packages stored in: /home/ec2-user/.ivy2/jars
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-0e2eb87d-88d7-4a99-a1b3-78654279e2e8;1.0
	confs: [default]
	found org.apache.hadoop#hadoop-aws;3.2.2 in central
	found com.amazonaws#aws-java-sdk-bundle;1.11.563 in central
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.2/hadoop-aws-3.2.2.jar ...
	[SUCCESSFUL ] org.apache.hadoop#hadoop-aws;3.2.2!hadoop-aws.jar (40ms)
downloading https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.563/aws-java-sdk-bundle-1.11.563.jar ...
	[SUCCESSFUL ] com.amazonaws#aws-java-sdk-bundle;1.11.563!aws-java-sdk-bundle.jar (1691ms)
:: resolution report :: resolve 2961ms :: artifacts dl 1737ms
	:: modules in use:
	com.amazonaws#aws-java-sdk-bundle;1.11.563 from central in [default]
	org.apache.hadoop#hadoop-aws;3.2.2 from central in [

23/11/20 03:23:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In [2]:
import warnings, requests, zipfile, io
warnings.simplefilter('ignore')
import pandas as pd
from scipy.io import arff

import os
import boto3
import sagemaker
from sagemaker.image_uris import retrieve
from sklearn.model_selection import train_test_split

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


## Using SageMaker to Train and Deploy a Machine Learning Model

The abalone dataset has been used to predict the age of abalone from physical measurements. The age of an abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope, which is a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability), may be required to solve the problem. More information about the dataset is at: https://archive.ics.uci.edu/ml/datasets/abalone

A copy of the data is available through the SageMaker sample data files at `s3://sagemaker-sample-files/datasets/tabular/uci_abalone/abalone.csv`.

In this part, we use Spark to prepare the data locally, then use SageMaker's built-in XGBoost algorithm to train an XGBoost regression model and deploy it.


1\. First download the dataset using `aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/abalone.csv  abalone.csv`

In [3]:
!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/abalone.csv abalone.csv


download: s3://sagemaker-sample-files/datasets/tabular/uci_abalone/abalone.csv to ./abalone.csv


2\. Load `abalone.csv` into a Spark DataFrame, using the schema provided below.

```
schema = "sex String,length Double,diameter Double,height Double,whole_weight Double,shucked_weight Double,viscera_weight Double,shell_weight Double,rings Double"
```

- Then, view 10 rows from this dataset

In [4]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField("sex", StringType(), True),
    StructField("length", DoubleType(), True),
    StructField("diameter", DoubleType(), True),
    StructField("height", DoubleType(), True),
    StructField("whole_weight", DoubleType(), True),
    StructField("shucked_weight", DoubleType(), True),
    StructField("viscera_weight", DoubleType(), True),
    StructField("shell_weight", DoubleType(), True),
    StructField("rings", DoubleType(), True)
])

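# NOTE: abalone.csv has no header row, so header=True consumes the first record
# (see the CSVHeaderChecker warning in the output); header=False would keep it.
df = spark.read.csv("/home/ec2-user/SageMaker/abalone.csv", schema=schema, header=True)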
df.show(10)


                                                                                

23/11/20 03:24:35 WARN CSVHeaderChecker: CSV header does not conform to the schema.
 Header: M, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15
 Schema: sex, length, diameter, height, whole_weight, shucked_weight, viscera_weight, shell_weight, rings
Expected: sex but found: M
CSV file: file:///home/ec2-user/SageMaker/abalone.csv
+---+------+--------+------+------------+--------------+--------------+------------+-----+
|sex|length|diameter|height|whole_weight|shucked_weight|viscera_weight|shell_weight|rings|
+---+------+--------+------+------------+--------------+--------------+------------+-----+
|  M|  0.35|   0.265|  0.09|      0.2255|        0.0995|        0.0485|        0.07|  7.0|
|  F|  0.53|    0.42| 0.135|       0.677|        0.2565|        0.1415|        0.21|  9.0|
|  M|  0.44|   0.365| 0.125|       0.516|        0.2155|         0.114|       0.155| 10.0|
|  I|  0.33|   0.255|  0.08|       0.205|        0.0895|        0.0395|       0.055|  7.0|
|  I| 0.425|     0.3| 0.095|
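
The `CSVHeaderChecker` warning above reports that the first line of `abalone.csv` is a data record (`M, 0.455, ...`) rather than column names, so a read with `header=True` skips that record. The following is a minimal sketch of a check that could guard against this: it treats a line as a header only if none of its fields parses as a number. The helper name `looks_like_header` is hypothetical, and the heuristic is an assumption that suits this dataset, not a general rule.

```python
import csv
import io


def looks_like_header(first_line: str) -> bool:
    """Guess whether a CSV line holds column names rather than data.

    Heuristic (an assumption for this dataset): a header row has no
    field that parses as a float.
    """
    fields = next(csv.reader(io.StringIO(first_line)))
    for field in fields:
        try:
            float(field.strip())
            return False  # a numeric field means this is a data record
        except ValueError:
            continue
    return True


# First line of abalone.csv as reported in the Spark warning above
data_line = "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15"
# What a header row would look like for the schema used here
header_line = (
    "sex,length,diameter,height,whole_weight,"
    "shucked_weight,viscera_weight,shell_weight,rings"
)

print(looks_like_header(data_line))    # False
print(looks_like_header(header_line))  # True
```

The result could then be passed to Spark, for example `spark.read.csv(path, schema=schema, header=looks_like_header(first_line))`.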

3\. Build a data engineering pipeline to prepare the data:
- Convert `sex` to numeric values with `StringIndexer`.
- Make `rings` (i.e. the age of the abalone) the first column, because the SageMaker algorithm requires the label to be the first column of the training data.
- Keep the indexed sex column but not the original `sex` column.
- Keep the remaining columns.
- Randomly split the DataFrame into `train_df` (80%) and `validation_df` (20%).
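
By default, `StringIndexer` orders labels by descending frequency, so the most common value of `sex` receives index 0.0. The pure-Python sketch below illustrates that ordering on a toy sample. The function `index_labels` is hypothetical, and the alphabetical tie-break is an assumption made for the illustration.

```python
from collections import Counter


def index_labels(values):
    """Map each distinct label to a float index, most frequent label first."""
    counts = Counter(values)
    # Sort by descending count; break ties alphabetically (assumed tie rule)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


sexes = ["M", "F", "M", "I", "I", "M"]  # toy sample, not the real data
mapping = index_labels(sexes)
print(mapping)  # {'M': 0.0, 'I': 1.0, 'F': 2.0}

# Apply the mapping to every value, as the transform step would
indexed = [mapping[s] for s in sexes]
```

Because the mapping depends on the label frequencies in the data the indexer was fitted on, the same fitted pipeline model should be reused to transform any new data so that the codes stay consistent.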

In [5]:
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer
from pyspark.ml import Pipeline


# Load the dataset and define the schema
# ... (load the data as shown previously)

# Create a StringIndexer to convert 'sex' column to numeric values
indexer = StringIndexer(inputCol="sex", outputCol="sex_indexed")

# Build the pipeline
pipeline = Pipeline(stages=[indexer])

# Fit the pipeline to the dataset and transform the data
model = pipeline.fit(df)
transformed_df = model.transform(df)

# Reorder the columns
required_columns = ["rings"] + [col for col in df.columns if col not in ["rings", "sex"]] + ["sex_indexed"]
final_df = transformed_df.select(required_columns)

# Split the DataFrame
train_df, validation_df = final_df.randomSplit([0.8, 0.2])


23/11/20 03:25:54 WARN CSVHeaderChecker: CSV header does not conform to the schema.
 Header: M
 Schema: sex
Expected: sex but found: M
CSV file: file:///home/ec2-user/SageMaker/abalone.csv


                                                                                

In [6]:
train_df.show(5)

23/11/20 03:26:02 WARN CSVHeaderChecker: CSV header does not conform to the schema.
 Header: M, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15
 Schema: sex, length, diameter, height, whole_weight, shucked_weight, viscera_weight, shell_weight, rings
Expected: sex but found: M
CSV file: file:///home/ec2-user/SageMaker/abalone.csv
+-----+------+--------+------+------------+--------------+--------------+------------+-----------+
|rings|length|diameter|height|whole_weight|shucked_weight|viscera_weight|shell_weight|sex_indexed|
+-----+------+--------+------+------------+--------------+--------------+------------+-----------+
|  2.0|  0.15|     0.1| 0.025|       0.015|        0.0045|         0.004|       0.005|        1.0|
|  3.0|  0.13|     0.1|  0.03|       0.013|        0.0045|         0.003|       0.004|        1.0|
|  3.0|  0.14|   0.105| 0.035|       0.014|        0.0055|        0.0025|       0.004|        1.0|
|  3.0| 0.155|    0.11|  0.04|      0.0155|        0.0065|         0.00

4\. Write dataframes as csv:
- Write `train_df` and `validation_df` to local directories `train` and `validation` respectively, using the csv format.
- Also write a copy of validation_df without the `rings` column to a local directory `test`, also using the csv format.
- verify the content of these files using bash commands, e.g. `head train/*.csv`
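
Two details are worth keeping in mind here. Spark writes a directory of `part-*.csv` files, one per partition, so a glob such as `train/*.csv` matches a single file only when the DataFrame has one partition. In addition, the AWS documentation for the built-in XGBoost algorithm describes CSV training input as having no header record. The sketch below, which uses only the standard library, merges part files into one header-less CSV. The function name `merge_part_files` and the demo file names are hypothetical.

```python
import glob
import os
import tempfile


def merge_part_files(src_dir: str, dest_path: str, has_header: bool = True) -> int:
    """Concatenate Spark part-*.csv files into one CSV without a header.

    Returns the number of data rows written.
    """
    rows = 0
    with open(dest_path, "w", newline="") as out:
        for part in sorted(glob.glob(os.path.join(src_dir, "part-*.csv"))):
            with open(part, newline="") as f:
                lines = f.read().splitlines()
            if has_header and lines:
                lines = lines[1:]  # every part file repeats the header
            for line in lines:
                if line:
                    out.write(line + "\n")
                    rows += 1
    return rows


# Small demonstration with two fake part files in a temporary directory
demo_dir = tempfile.mkdtemp()
for name, body in [
    ("part-00000-demo.csv", "rings,length\n2.0,0.15\n3.0,0.13\n"),
    ("part-00001-demo.csv", "rings,length\n3.0,0.14\n"),
]:
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write(body)

merged_path = os.path.join(demo_dir, "train.csv")
n_rows = merge_part_files(demo_dir, merged_path)
with open(merged_path) as f:
    merged = f.read()
print(n_rows)  # 3
```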


In [7]:

test_df = validation_df.drop("rings")

In [8]:
# Write train_df to a local directory in CSV format
train_df.write.csv("/home/ec2-user/SageMaker/train/", header=True)

23/11/20 03:26:16 WARN CSVHeaderChecker: CSV header does not conform to the schema.
 Header: M, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15
 Schema: sex, length, diameter, height, whole_weight, shucked_weight, viscera_weight, shell_weight, rings
Expected: sex but found: M
CSV file: file:///home/ec2-user/SageMaker/abalone.csv


In [9]:
!head -n 5 train/*.csv

rings,length,diameter,height,whole_weight,shucked_weight,viscera_weight,shell_weight,sex_indexed
2.0,0.15,0.1,0.025,0.015,0.0045,0.004,0.005,1.0
3.0,0.13,0.1,0.03,0.013,0.0045,0.003,0.004,1.0
3.0,0.14,0.105,0.035,0.014,0.0055,0.0025,0.004,1.0
3.0,0.155,0.11,0.04,0.0155,0.0065,0.003,0.005,0.0


In [10]:
validation_df.write.csv("/home/ec2-user/SageMaker/validation/", header=True)
test_df.write.csv("/home/ec2-user/SageMaker/test/", header=True)

23/11/20 03:26:22 WARN CSVHeaderChecker: CSV header does not conform to the schema.
 Header: M, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15
 Schema: sex, length, diameter, height, whole_weight, shucked_weight, viscera_weight, shell_weight, rings
Expected: sex but found: M
CSV file: file:///home/ec2-user/SageMaker/abalone.csv
23/11/20 03:26:22 WARN CSVHeaderChecker: CSV header does not conform to the schema.
 Header: M, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15
 Schema: sex, length, diameter, height, whole_weight, shucked_weight, viscera_weight, shell_weight, rings
Expected: sex but found: M
CSV file: file:///home/ec2-user/SageMaker/abalone.csv


In [46]:
!head -n 5 validation/*.csv

rings,length,diameter,height,whole_weight,shucked_weight,viscera_weight,shell_weight,sex_indexed
3.0,0.13,0.1,0.03,0.013,0.0045,0.003,0.004,1.0
3.0,0.18,0.125,0.05,0.023,0.0085,0.0055,0.01,0.0
3.0,0.18,0.13,0.045,0.0275,0.0125,0.01,0.009,1.0
3.0,0.195,0.15,0.045,0.0375,0.018,0.006,0.011,1.0


In [47]:
!head -n 5 test/*.csv

length,diameter,height,whole_weight,shucked_weight,viscera_weight,shell_weight,sex_indexed
0.13,0.1,0.03,0.013,0.0045,0.003,0.004,1.0
0.18,0.125,0.05,0.023,0.0085,0.0055,0.01,0.0
0.18,0.13,0.045,0.0275,0.0125,0.01,0.009,1.0
0.195,0.15,0.045,0.0375,0.018,0.006,0.011,1.0


5\. Run the following to obtain prefix and bucket names, then copy csv files (for training, validation, and test) to corresponding folders on s3.

For example, the following command copies your training CSV to S3:

`aws s3 cp train/*.csv s3://$bucket/$prefix/train/train.csv`

- then verify your uploaded file using aws s3 cp and head combined (via pipe)

`aws s3 cp s3://$bucket/$prefix/train/train.csv - | head`

(you can ignore the "broken pipe" error; the command still works)

In [11]:
import sagemaker
import boto3
from sagemaker import image_uris
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

prefix = 'xgboost-builtin-algo'
bucket = sagemaker.Session().default_bucket()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [12]:
!aws s3 cp train/*.csv s3://$bucket/$prefix/train/train.csv



upload: train/part-00000-e1a61c67-d43f-48ba-9797-32a5022b6a5a-c000.csv to s3://sagemaker-us-east-1-380875404404/xgboost-builtin-algo/train/train.csv


In [13]:
!aws s3 cp s3://$bucket/$prefix/train/train.csv - | head



rings,length,diameter,height,whole_weight,shucked_weight,viscera_weight,shell_weight,sex_indexed
2.0,0.15,0.1,0.025,0.015,0.0045,0.004,0.005,1.0
3.0,0.13,0.1,0.03,0.013,0.0045,0.003,0.004,1.0
3.0,0.14,0.105,0.035,0.014,0.0055,0.0025,0.004,1.0
3.0,0.155,0.11,0.04,0.0155,0.0065,0.003,0.005,0.0
3.0,0.16,0.11,0.025,0.018,0.0065,0.0055,0.005,1.0
3.0,0.165,0.12,0.03,0.0215,0.007,0.005,0.005,1.0
3.0,0.165,0.12,0.05,0.021,0.0075,0.0045,0.014,1.0
3.0,0.18,0.125,0.05,0.023,0.0085,0.0055,0.01,0.0
3.0,0.18,0.13,0.045,0.0275,0.0125,0.01,0.009,1.0
download failed: s3://sagemaker-us-east-1-380875404404/xgboost-builtin-algo/train/train.csv to - [Errno 32] Broken pipe


In [14]:
!aws s3 cp validation/*.csv s3://$bucket/$prefix/validation/validation.csv
!aws s3 cp test/*.csv s3://$bucket/$prefix/test/test.csv

upload: validation/part-00000-39b75db2-57b6-48a6-bce4-fc61001703b7-c000.csv to s3://sagemaker-us-east-1-380875404404/xgboost-builtin-algo/validation/validation.csv
upload: test/part-00000-540320d9-1665-4bda-9da5-968b7cdf34ad-c000.csv to s3://sagemaker-us-east-1-380875404404/xgboost-builtin-algo/test/test.csv


6\. Train an XGBoost model on the training and validation datasets using SageMaker's built-in XGBoost algorithm. 

- Create a `xgboost-container` 
    - `sagemaker.image_uris.retrieve("xgboost", boto3.Session().region_name, "1.5-1")` retrieves the image URI to use for the container (here for XGBoost version 1.5-1; the version argument is required).
- Use the following hyperparameters (the objective `reg:squarederror` indicates that XGBoost is used for numerical prediction with squared error as the loss):

        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"reg:squarederror",
        "num_round":"50"

- construct a SageMaker `estimator` using the `xgboost-container`, using the following configurations:
    
        sagemaker.estimator.Estimator(image_uri=xgboost_container, 
                                      hyperparameters=hyperparameters,
                                      role=sagemaker.get_execution_role(),
                                      instance_count=1, 
                                      instance_type='ml.m4.xlarge', 
                                      output_path=output_path)  


- build two input sources, `train_input` and `validation_input`, using the `TrainingInput` class 
    - `TrainingInput(s3_folder, content_type='text/csv')` returns a CSV input channel. For text input, the built-in XGBoost algorithm of SageMaker works with the CSV and libsvm formats. 
    - s3_folder (in the form of s3://bucket/prefix/folder) should be the folder you upload your data to. One example is `s3://sagemaker-us-east-1-308934269464/xgboost-builtin-algo/train/`
    
- Train the xgboost `estimator` using both train and validation inputs. This will take several minutes.

More details at: https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html


In [15]:
xgboost_container = sagemaker.image_uris.retrieve("xgboost", boto3.Session().region_name, "1.5-1")

In [16]:
hyperparams={"max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"reg:squarederror",
        "num_round":"50"
}

s3_output_location="s3://{}/{}/output/".format(bucket,prefix)

In [17]:
xgb_model = sagemaker.estimator.Estimator(
    image_uri=xgboost_container,
    hyperparameters=hyperparams,
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type='ml.m4.xlarge',
    output_path=s3_output_location,
)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [18]:
bucket

'sagemaker-us-east-1-380875404404'

In [19]:
from sagemaker.inputs import TrainingInput

# Define the S3 paths to the training and validation data from bucket and prefix
# rather than hard-coding an account-specific bucket name
s3_training_data_path = "s3://{}/{}/train/".format(bucket, prefix)
s3_validation_data_path = "s3://{}/{}/validation/".format(bucket, prefix)

train_input = TrainingInput(s3_data=s3_training_data_path, content_type='text/csv')


validation_input = TrainingInput(s3_data=s3_validation_data_path, content_type='text/csv')


In [20]:
data_channels = {'train': train_input, 'validation': validation_input}


In [21]:
xgb_model.fit(inputs=data_channels, logs=False)


2023-11-20 03:27:09 Starting - Starting the training job...
2023-11-20 03:27:33 Starting - Preparing the instances for training...............
2023-11-20 03:28:55 Downloading - Downloading input data...........
2023-11-20 03:29:55 Training - Downloading the training image...
2023-11-20 03:30:15 Training - Training image download completed. Training in progress....
2023-11-20 03:30:31 Uploading - Uploading generated training model..
2023-11-20 03:30:50 Completed - Training job completed


7\. Deploy the trained model to an HTTP endpoint  (save as variable `predictor`)

Using the following deployment parameters:

   - initial_instance_count=1, 
   - instance_type="ml.m4.xlarge"
   
This will take a few minutes.

In [22]:
predictor = xgb_model.deploy(initial_instance_count=1,
                serializer = sagemaker.serializers.CSVSerializer(),
                instance_type='ml.m4.xlarge')

-------!

8\. Test the deployment endpoint

- Read one line from the test data. Note that it should not have the label column and should be in the same CSV format.
- Pass the line to the endpoint using code such as the snippet below, where
    - `predictor` is the variable returned from `deploy()` in the previous step. 
    - `payload` is the string-type input data (CSV formatted).

The response from the endpoint is a dictionary whose `Body` stream contains the predicted value. 


```
runtime_client = sagemaker_session.sagemaker_runtime_client
response = runtime_client.invoke_endpoint(
    EndpointName=predictor.endpoint_name, ContentType="text/csv", Body=payload
)
result = response["Body"].read().decode("ascii")
```
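
The `payload` is a single comma-separated line of feature values, in the same column order as the training data and without the label. The sketch below shows one way to build such a line from a record and to turn the decoded response text into a number. The names `FEATURE_COLUMNS`, `make_payload`, and `parse_prediction` are hypothetical, and the sample record is taken from the `test/*.csv` preview shown earlier.

```python
FEATURE_COLUMNS = [
    "length", "diameter", "height", "whole_weight",
    "shucked_weight", "viscera_weight", "shell_weight", "sex_indexed",
]


def make_payload(row: dict) -> str:
    """Serialize one record as a CSV line in the training column order."""
    return ",".join(str(row[name]) for name in FEATURE_COLUMNS)


def parse_prediction(body_text: str) -> float:
    """Convert the endpoint's text response for a single record to a float."""
    return float(body_text.strip())


row = {
    "length": 0.13, "diameter": 0.1, "height": 0.03, "whole_weight": 0.013,
    "shucked_weight": 0.0045, "viscera_weight": 0.003, "shell_weight": 0.004,
    "sex_indexed": 1.0,
}
payload = make_payload(row)
print(payload)  # 0.13,0.1,0.03,0.013,0.0045,0.003,0.004,1.0
```

Keeping the column order in one list makes it harder to send features in a different order from the one the model was trained on.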

In [68]:
first_row_df

Unnamed: 0,length,diameter,height,whole_weight,shucked_weight,viscera_weight,shell_weight,sex_indexed
0,0.075,0.055,0.01,0.002,0.001,0.0005,0.0015,1.0


In [35]:
import pandas as pd


first_row_dict = test_df.head(1)[0].asDict()
first_row_df = pd.DataFrame([first_row_dict])

payload = first_row_df.to_csv(header=False, index=False).strip('\n')




23/11/20 03:47:06 WARN CSVHeaderChecker: CSV header does not conform to the schema.
 Header: M, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15
 Schema: sex, length, diameter, height, whole_weight, shucked_weight, viscera_weight, shell_weight, rings
Expected: sex but found: M
CSV file: file:///home/ec2-user/SageMaker/abalone.csv


In [37]:
# Assuming you have already deployed your model and have the predictor
sagemaker_session = sagemaker.Session()
runtime_client = sagemaker_session.sagemaker_runtime_client

# Invoking the endpoint
response = runtime_client.invoke_endpoint(
    EndpointName=predictor.endpoint_name, 
    ContentType="text/csv", 
    Body=payload
)

# Decoding the response
result = response["Body"].read().decode("ascii")
print("Predicted value:", result)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
Predicted value: 3.395925521850586



9\. (optional challenge) Extending the previous step by reading from validation dataset (so you know the ground truth)

- read 10 rows from the validation csv.
- split it so that you obtain the label as well as the string-type payload as input for the endpoint.
- feed each payload to the predictor, obtain the result and print both the label and the predicted value side by side.

In [60]:
import pandas as pd
data = pd.read_csv("s3://{}/{}/validation/validation.csv".format(bucket,prefix), nrows = 10)

In [61]:
cols = data.columns.to_list()
cols.remove("rings")


In [62]:
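# Separate the label column from the feature columns
val_data = data[["rings"]].copy()  # copy so adding a column does not modify a view of `data`
no_target_data = data[cols]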

In [65]:
val_data["pred_rings"] = ""
for index, row in no_target_data.iterrows():
    val_data.loc[index, "pred_rings"] = float(predictor.predict(row.values))

In [66]:
val_data

Unnamed: 0,rings,pred_rings
0,1.0,3.395926
1,3.0,3.395926
2,3.0,4.333157
3,3.0,4.333157
4,4.0,3.395926
5,4.0,4.537944
6,4.0,3.689819
7,4.0,4.333157
8,4.0,4.333157
9,4.0,4.623317
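
To condense the side-by-side comparison into a single number, one option is the root mean squared error (RMSE) between labels and predictions. The following is a minimal sketch using only the standard library. The function `rmse` is hypothetical, and the values are copied from the table above. Ten rows are far too few to judge the model, so this only illustrates the calculation.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between two equal-length sequences."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Values copied from the table above
rings = [1.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
pred_rings = [3.395926, 3.395926, 4.333157, 4.333157, 3.395926,
              4.537944, 3.689819, 4.333157, 4.333157, 4.623317]
print(round(rmse(rings, pred_rings), 3))
```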


Clean up the endpoint

When you’re done using the endpoint, please run the cell below to delete the hosted endpoint and avoid any additional charges.



In [67]:
predictor.delete_model()
predictor.delete_endpoint()

10\. Use the test folder on S3 as input for batch inference with the trained model. 

- output should go to a `testout` folder on s3 (with the same bucket and prefix)
- Use the following transformer parameters:
       instance_count=1,
       instance_type='ml.m4.xlarge',
       strategy='MultiRecord',
       assemble_with='Line'
       
This will take several minutes.
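
Roughly speaking, splitting the input by line treats each line as one record, the `MultiRecord` strategy lets several records travel in one request, and assembling with `Line` joins the outputs with newlines. The sketch below mimics that flow locally to show the idea. It is not SageMaker's implementation: `batch_lines`, `fake_model`, and the `max_bytes` limit are all hypothetical stand-ins.

```python
def batch_lines(lines, max_bytes):
    """Greedily group lines into batches whose joined size stays within max_bytes."""
    batches, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and size + line_size > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        batches.append(current)
    return batches


def fake_model(batch):
    """Stand-in for the model: returns the field count of each record."""
    return [str(len(record.split(","))) for record in batch]


records = ["0.13,0.1,0.03", "0.18,0.125,0.05", "0.18,0.13,0.045"]
batches = batch_lines(records, max_bytes=32)
outputs = [pred for batch in batches for pred in fake_model(batch)]
assembled = "\n".join(outputs)  # one output line per input line
```

The point of the illustration is that, however the records are grouped into requests, the assembled output still has one line per input record.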

In [47]:
batch_X_file = 'test.csv'

In [54]:
batch_output = "s3://{}/{}/testout/".format(bucket,prefix)

batch_pre_process_input = "s3://{}/{}/test/test.csv".format(bucket,prefix)
batch_input = "s3://{}/{}/testout/test.csv".format(bucket,prefix)

pd.read_csv(batch_pre_process_input).to_csv(batch_input, index = False, header = False)

In [55]:


# Initialize the transformer object
xgb_transformer = xgb_model.transformer(
    instance_count=1,
    instance_type='ml.m4.xlarge',
    strategy='MultiRecord',
    assemble_with='Line',
    output_path=batch_output
)


xgb_transformer.transform(
    data=batch_input,
    content_type='text/csv',
    split_type='Line'
)


xgb_transformer.wait()


.......................................
[2023-11-20:04:52:10:INFO] No GPUs detected (normal if no gpus installed)
[2023-11-20:04:52:10:INFO] No GPUs detected (normal if no gpus installed)
[2023-11-20:04:52:10:INFO] nginx config: 
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr;
worker_rlimit_nofile 4096;
events {
  worker_connections 2048;
}
http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /dev/stdout combined;
  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }
  server {
    listen 8080 deferred;
    client_max_body_size 0;
    keepalive_timeout 3;
    location ~ ^/(ping|invocations|execution-parameters) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;
      proxy_read_timeout 60s;
      proxy_pass http://gunicorn;
    }
 

11\. Inspect the s3 folder using the technique introduced in step 5.

In [59]:
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key="{}/testout/{}".format(prefix,'test.csv.out'))
target_predicted = pd.read_csv(io.BytesIO(obj['Body'].read()),sep=',',names=['pred_rings'])
target_predicted.head(10)

Unnamed: 0,pred_rings
0,3.395926
1,3.395926
2,4.333157
3,4.333157
4,3.395926
5,4.537944
6,3.689819
7,4.333157
8,4.333157
9,4.623317


In [57]:
!aws s3 ls s3://$bucket/$prefix/testout/

2023-11-20 04:45:32      37904 test.csv
2023-11-20 04:52:18      15299 test.csv.out


In [58]:
!aws s3 cp s3://$bucket/$prefix/testout/test.csv.out - | head

3.395925521850586
3.395925521850586
4.333157062530518
4.333157062530518
3.395925521850586
4.5379438400268555
3.6898193359375
4.333157062530518
4.333157062530518
4.623316764831543
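
The first ten values in `test.csv.out` match the predictions obtained from the endpoint in step 9, which suggests that the output holds one prediction per input line in the same order. Under that assumption, predictions can be matched back to input records by position. The sketch below does so on plain strings. The function `pair_predictions` is hypothetical, and the sample values are made up for the illustration.

```python
def pair_predictions(input_text: str, output_text: str):
    """Zip input CSV lines with prediction lines, checking the counts match."""
    inputs = [line for line in input_text.splitlines() if line]
    preds = [float(line) for line in output_text.splitlines() if line]
    if len(inputs) != len(preds):
        raise ValueError(f"{len(inputs)} inputs but {len(preds)} predictions")
    return list(zip(inputs, preds))


# Made-up sample: two input records and two placeholder predictions
sample_input = "0.1,0.2,0.3\n0.4,0.5,0.6\n"
sample_output = "1.5\n2.5\n"
pairs = pair_predictions(sample_input, sample_output)
for record, prediction in pairs:
    print(record, "->", prediction)
```

The length check catches the case where the input and output files fall out of step, for example if a header row were left in the input.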


# Download and Clean up

