# Identify fake job postings! - part 3

Problem statement:

> My friend is on the job market. However, they keep wasting time applying for fraudulent job postings. They have asked me to use my data skills to filter out fake postings and save them effort.
> They have mentioned that job postings are abundant, so they would prefer my solution to risk filtering out real posts if it decreases the number of fraudulent posts they apply to.
> I have access to a dataset consisting of approximately 18'000 job postings, containing both real and fake jobs.


## Story published with Jupyter2Hashnode

Have you ever struggled to convert a Jupyter Notebook into a compelling Hashnode story? If so, you're not alone. It can be a daunting task, but fortunately, there's a tool that can simplify the process: Jupyter2Hashnode.

With Jupyter2Hashnode, you can convert Jupyter Notebooks into Hashnode stories with just a single command. The tool compresses images, uploads them to the Hashnode server, updates image URLs in the markdown file, and finally, publishes the story article. It's an effortless way to transform your data analysis or code tutorials into a polished and engaging format.

If you're interested in learning more about Jupyter2Hashnode, there's a detailed guide available on Hashnode (https://tiagopatriciosantos.hashnode.dev/jupyter2hashnode-an-effortless-way-to-convert-jupyter-notebooks-to-hashnode-stories). It's a game-changing tool that can save you time and energy while helping you create high-quality content for your audience. Give it a try and see the difference for yourself!


# Part 3

This end-to-end ML (Machine Learning) project is divided into a three-part series.

- Part 1 - is all about getting to know the dataset through exploratory analysis, cleaning the data, choosing the metrics and running the first model prediction experiments.
- Part 2 - is about setting up DagsHub, DVC and MLFlow to create a version-controlled data science project, as well as tracking experiment parameters and metrics, and comparing experiments.
- Part 3 - is all about deployment: using MLFlow and FastAPI, we will deploy the model as a web API and serve it with Mogenius, a Virtual DevOps platform.

You can check the GitHub project [here](https://github.com/tiagopatriciosantos/FastApiFakeJobPost).


❗⚠ At the time of writing, Mogenius has suspended new registrations. Check the [registration page](https://studio.mogenius.com/user/registration) to see whether they have opened again:
>The public registration is currently suspended due to exceptionally high demand.
>We will offer it again soon. Contact us if you would like to see a demo of mogenius.

## Tools

For this part I will use Git for version control and VS Code as the editor.

Follow the instructions to install:
- [Git](https://github.com/git-guides/install-git)
- [VS Code](https://code.visualstudio.com/download)

I assume a working Python 3 installation on the local system.

I also assume that we have already logged a model to the DagsHub MLflow tracking server.


## What is Mogenius?

https://mogenius.com/

Mogenius is the single layer between your application and the cloud. You can deploy and run any application with mogenius and get it up and running in no time on a hyper-scalable and automated cloud infrastructure. Most application types and services are supported, like web applications, databases, background workers and of course static websites. 

Read more about [supported services here](https://docs.mogenius.com/services/service-overview).


## For free 

With the Community plan we can:
- Run our personal projects and prototypes on mogenius.
- Auto-deployment on Kubernetes
- Hyperscaling cloud resources on AWS or Azure
- CI/CD pipeline
- CDN, cybersecurity protection, SSL management
- Access to the mogenius developer community with monthly free cloud resources and more benefits

We can compare plans in detail on the [pricing page](https://mogenius.com/pricing).


I will show the steps that I used to set up the project, but feel free to follow the Mogenius tutorials for a broader understanding:
1. https://docs.mogenius.com/getting-started/quickstart
2. https://docs.mogenius.com/tutorials/how-to-deploy-python-in-the-cloud
3. https://docs.mogenius.com/tutorials/how-to-deploy-fastapi-in-the-cloud



## Joining Mogenius...

First, [sign up](https://studio.mogenius.com/user/registration) by entering our email address and choosing a password. Next, verify our email address and phone number to secure our mogenius account. Once completed, we are ready to create our first cloudspace.


[![](https://api.mogenius.com/file/id/48f657d6-2032-4b79-95f5-2f15f02e7e4e)](https://studio.mogenius.com/user/registration)



## Create a Cloudspace

Start our first project on mogenius by creating a cloudspace. Give it a name with a maximum of 24 characters, no spaces, or special characters. Click "Create now" and our cloudspace will be created using the mogenius Community Plan.

![](https://api.mogenius.com/file/id/d9210359-7406-42f4-8d8f-854205294ce8)

🥳 Congratulations on creating our first cloudspace on mogenius!

## Add your first service, FastAPI
One of the initial tasks is to add services to our cloudspace (e.g. application, database). When we first start, we'll see the pop-up window below. Alternatively, we can add services from our cloudspace dashboard, where we'll also see the available resources in our cloudspace. There are three ways to add services to our cloudspace; we will use a pre-configured service template to create our FastAPI service:

![](https://api.mogenius.com/file/id/1d25d25c-2715-4a3e-8201-ec8ceac94cef)


With this option, mogenius will automatically create and add a boilerplate FastAPI template to our Git repository, allowing us to start coding in the newly created repo or to use existing code. Browse the service library or use the search function to find the FastAPI service, then click "Add service."

![](https://i.imgur.com/6saPc3S.png)

Next, if this is the first time we are deploying a service, we need to connect our cloudspace to our repository. Click on “Continue with GitHub,” which will prompt us to grant permission to access our GitHub repositories. We only need to do this once, as our mogenius cloudspace will now be connected to our GitHub account and can access our repositories.

Next, we create a new repository by clicking “+ Add repository.” Select a name for the new repository and create it. By default, this will also be the name of our service, but we can also change it to a different name.

We can leave all settings at default for now, as we can change them at any point later when the service is up and running.

Now, simply click "Create Service." Our FastAPI boilerplate template will be built, added to the specified Git repository, and deployed to our cloudspace, allowing us to start using it almost immediately. Once the setup routines, build, and deployment process are complete (usually only a few minutes), we can start coding in our repository and access our FastAPI service at the specified hostname. Every time we commit changes to our repository, a new build-deploy process is triggered automatically (CI/CD).

We can find all the details on our service's overview page, view metrics, access service logs, add resources, and add additional instances for our service (Kubernetes pods).

That's it! We have created the FastAPI service, and other services can reach it via the internal hostname that has been assigned to it, e.g. fastapi-template-8b4tp5:3000. Since we choose to expose this service, we also get an external hostname that can be accessed from outside our cloudspace; it looks like this: fastapi-template-prod-myaccount-afooyl.mo2.mogenius.io:80

If we go to the GitHub repository we can see the result of this creation:
![](https://i.imgur.com/rBdT99U.png)





## MLFlow changing stage to production


Now let's put our model in the "Production" stage. The model in this stage is the one our web API will load and serve.

Open the MLFlow UI:

![](https://i.imgur.com/yaJNfXf.png)


Open the model:

![](https://i.imgur.com/7C1dsFD.png)


Set into production:

![](https://i.imgur.com/97G507c.png)

Say "OK" and voilà.

![](https://i.imgur.com/yW9kOnn.png)
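
The same stage change can also be scripted instead of clicked through. The sketch below is only an illustration, assuming MLflow 2.x, where `MlflowClient.transition_model_version_stage` is available. The helper name `promote_to_production` and the version number `"1"` are hypothetical; the model name `main` and the tracking URI are the ones used later in `app/main.py`.

```python
def promote_to_production(client, model_name: str, version: str):
    """Move one registered model version to the "Production" stage.

    `client` is expected to be an mlflow.MlflowClient (MLflow 2.x). It is
    passed in so the helper can be exercised without a tracking server.
    """
    return client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        # Assumption: we want any previous Production version archived.
        archive_existing_versions=True,
    )


def main():
    # Requires `pip install mlflow` plus the MLFLOW_TRACKING_USERNAME and
    # MLFLOW_TRACKING_PASSWORD environment variables for DagsHub.
    import mlflow

    mlflow.set_tracking_uri(
        "https://dagshub.com/tiagopatriciosantos/FakeJobPostsProject.mlflow"
    )
    # "1" is a hypothetical version number; use the one shown in the UI.
    promote_to_production(mlflow.MlflowClient(), "main", "1")
```

Calling `main()` performs the transition against the tracking server; nothing runs on import.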

## Cloning the FastAPI Project

Let's now clone the repository to our local machine, copying the clone URL from the GitHub repository.

![](https://i.imgur.com/rBdT99U.png)


Execute these commands in the command line:
```console
cd path/to/folder
git clone https://github.com/tiagopatriciosantos/FastApiFakeJobPost.git
cd FastApiFakeJobPost
```

With VS Code already installed we can now run:
```console
code .
```

That will open the VS Code editor.

## Creating a virtual python environment

To create and activate our virtual python environment using venv, type the following commands into your terminal (still in the project folder):


Linux/Mac
```console
python3 -m venv .venv
echo .venv/ >> .gitignore
source .venv/bin/activate
```
Windows
```powershell
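# use `python` or `py` instead if `python3` is not on the PATH
python3 -m venv .venv
echo .venv/ >> .gitignore
# PowerShell activation script; in cmd.exe run .venv\Scripts\activate.bat instead
.venv\Scripts\Activate.ps1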
```


The first command creates the virtual environment: a directory named `.venv`, located inside the project directory, where all the Python packages used by the project will be installed without affecting the rest of the computer.

The second command adds `.venv/` to `.gitignore` so the environment is never committed.

The third command activates the virtual environment, which ensures that any Python packages we install don't contaminate our global Python installation.

The rest of this tutorial should be executed in the same shell session.
If we exit the shell session or open another one, we need to activate the virtual environment in that session first.
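
If in doubt whether the environment is active in the current shell, a quick check from Python can help. This is a small sketch, not part of the project code, and the function name `in_virtualenv` is made up for illustration. It relies on `sys.prefix` and `sys.base_prefix` differing inside an environment created with `venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```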



## Installing requirements
To install the requirements, open `requirements.txt` and paste in these direct dependencies:
```plaintext
pydantic>=1.8.0,<2.0.0
uvicorn==0.20.0
fastapi==0.89.1
pandas==1.5.3
scikit-learn==1.2.0
rich==13.3.0
mlflow==2.1.1
python-multipart==0.0.5
python-dotenv==0.21.1
```

Now, to install them, type:
```console
pip install -r requirements.txt
```
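
To confirm that the environment ended up with the pinned versions, the pins can be compared against what is installed. The following is a minimal sketch using only the standard library. The helper names `parse_pins` and `find_mismatches` are hypothetical, and only exact `==` pins are checked, so range specifiers such as the `pydantic` line are skipped.

```python
from importlib import metadata


def parse_pins(requirements_text: str) -> dict:
    """Collect the exact `name==version` pins from requirements text."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: dict, get_version=metadata.version) -> dict:
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

With the virtual environment active, `find_mismatches(parse_pins(open("requirements.txt").read()))` should return an empty dictionary when every pin is satisfied.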

## Load and serve the model

### `app/main.py`

Open the `app/main.py` file and put the following code into it.

This Python code defines a FastAPI application that loads a pre-trained ML model and uses it to make predictions on input data provided by a user through a CSV file.

The code imports the following modules:

- `FastAPI`: A web framework for building APIs quickly and easily.
- `File` and `UploadFile` from FastAPI: These are used for handling file uploads in the application.
- `HTTPException` from FastAPI: This is used to raise HTTP exceptions when there are errors in the application.
- `mlflow`: A machine learning platform for managing the ML lifecycle, including experiment tracking, packaging code into reproducible runs, and sharing and deploying models.
- `pandas`: A library for data manipulation and analysis.
- `print` from `rich`: A library for pretty-printing information to the console.
- `load_dotenv` from `dotenv`: Loads environment variables from a `.env` file.

The code defines a `Model` class to store the pre-trained model and use it for prediction. The `__init__` method of this class loads the deployed model using `mlflow.pyfunc.load_model()` and the `predict` method uses the loaded model to make predictions on new data. The `get_schema` and `get_columns` methods return information about the input schema of the model.

The code defines a `POST` endpoint with the path `/predict` that accepts a CSV file and returns a JSON object containing the predictions of the model on the input data. If the file is not a CSV file, the application raises an HTTP 400 Exception indicating that only CSV files are accepted.

The code also defines two `GET` endpoints with the paths `/schema` and `/info` that return information about the input schema and model information, respectively.

Finally, the code creates an instance of the `Model` class using the `main` model name and a tracking URI, sets up the FastAPI application with the initialized `Model` instance, and prints a message indicating that the setup is complete.

```python
from fastapi import FastAPI, File, UploadFile, HTTPException
import mlflow
import pandas as pd
from rich import print

# Load environment variables from the .env file
from dotenv import load_dotenv
load_dotenv() 

# Initialize the FastAPI application
app = FastAPI(docs_url="/")

# Create a class to store the deployed model & use it for prediction
class Model:
    def __init__(self, model_name: str, tracking_uri ):
        """
        To initialize the model
        model_name: Name of the registered model
        tracking_uri: URI of the MLflow tracking server
        """
        # Load the deployed model 
        self.model_name = model_name
        mlflow.set_tracking_uri(tracking_uri)
        uri =f"models:/{self.model_name}/Production"
        
        self.model = mlflow.pyfunc.load_model(uri)
        
    def predict(self, data):
        """
        To use the loaded model to make predictions on the data
        data: Pandas DataFrame to perform predictions
        """
        predictions = self.model.predict(data)
        return {  str(k): str(v) for k, v in enumerate(predictions) }

    def get_schema(self, to_dtypes=False):
        schema = self.model.metadata.signature.inputs.to_dict()
        if to_dtypes:
            schema = {r["name"]:  ( r["type"] if r["type"] !="string" else "object" )  for r in schema  }
        return schema

    def get_columns(self):
        schema = self.model.metadata.signature.inputs.to_dict()
        return [ r["name"]  for r in schema  ]

    def get_info(self):
        client = mlflow.MlflowClient()
        # search_model_versions takes a filter string, e.g. "name='main'"
        mv = [mv for mv in client.search_model_versions(f"name='{self.model_name}'") if mv.current_stage == 'Production']
        return dict(mv[0])

model = Model("main","https://dagshub.com/tiagopatriciosantos/FakeJobPostsProject.mlflow")
print("All setup!")

# Create the POST endpoint with path '/predict'
@app.post("/predict", tags=["Fake Job"])
async def create_upload_file(file: UploadFile = File(...)):
    # Handle the file only if it is a CSV
    if file.filename.endswith(".csv"):
        # CSV file to load the data into a pandas Dataframe
        data = pd.read_csv(file.file, dtype=model.get_schema(True), usecols=model.get_columns())
        
        # Return a JSON object containing the model predictions
        labels ={
            "Labels": model.predict(data)
        }
        return  labels
    else:
        # Raise a HTTP 400 Exception, indicating Bad Request 
        raise HTTPException(status_code=400, detail="Invalid file format. Only CSV Files accepted.")


@app.get("/schema", tags=["Fake Job"])
async def get_schema():
    return model.get_schema()

@app.get("/info", tags=["Fake Job"])
async def get_info():
    return model.get_info()
```
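
To make the schema handling more concrete, the sketch below mirrors what `get_schema(to_dtypes=True)` and `get_columns()` do, but on a plain list instead of a loaded model. The column specs are illustrative assumptions, not the real model signature, and `schema_to_dtypes` is a hypothetical stand-alone helper.

```python
def schema_to_dtypes(schema: list) -> dict:
    """Mirror Model.get_schema(to_dtypes=True) on a plain list of column specs."""
    # pandas stores text as "object", so the MLflow type "string" is renamed;
    # every other type name is passed through unchanged.
    return {
        col["name"]: ("object" if col["type"] == "string" else col["type"])
        for col in schema
    }


# Illustrative column specs (hypothetical, not the real model signature).
sample_schema = [
    {"name": "title", "type": "string"},
    {"name": "has_company_logo", "type": "long"},
]

print(schema_to_dtypes(sample_schema))
# The equivalent of get_columns(): just the column names, in order.
print([col["name"] for col in sample_schema])
```

The resulting dictionary is what `/predict` passes to `pd.read_csv` as `dtype`, while the list of names is passed as `usecols`, so extra columns in the uploaded CSV are ignored.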


To test the code we need to connect to the DagsHub MLFlow server. We can set the environment variables in a `.env` file, since `load_dotenv` is set up, or we can set them in our command line.


### `.env`

This file stores the necessary environment variables, which are loaded when `load_dotenv()` is called.

```plaintext
MLFLOW_TRACKING_USERNAME=tiagopatriciosantos
MLFLOW_TRACKING_PASSWORD=<secret>
```


🚩🚨 Don't forget to include this file in the `.gitignore` file; you don't want to push your secrets to your public repository.
```console
echo .env >> .gitignore
```
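
MLflow reads these two variables to authenticate against the tracking server, so a missing one typically only shows up as a failed model load. A small fail-fast check could look like the sketch below. The helper `missing_vars` is hypothetical and not part of the project code.

```python
import os

REQUIRED_VARS = ("MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD")


def missing_vars(env=None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("MLflow credentials found.")
```

In `app/main.py` such a check would have to run after `load_dotenv()` and before the `Model` instance is created.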


We can get the necessary MLFlow values from the DagsHub repository:

![](https://i.imgur.com/yaJNfXf.png)


### Test the API

We can now test the API using the following command:
    
`uvicorn app.main:app`

```console
All setup!
INFO:     Started server process [28372]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 
```

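We can now open [http://127.0.0.1:8000](http://127.0.0.1:8000) and test our API. Because the application is created with `docs_url="/"`, the interactive documentation is served at the root path.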
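
Besides the browser, the `/predict` endpoint can be called from a script. The sketch below builds the multipart upload with the standard library only, so nothing extra needs to be installed. The helper names `build_predict_request` and `send`, and the file name `jobs.csv` in the usage note, are assumptions for illustration.

```python
import urllib.request
import uuid


def build_predict_request(base_url: str, filename: str, csv_bytes: bytes):
    """Build a multipart/form-data POST for the /predict endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # The form field must be called "file" to match the endpoint parameter.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: text/csv\r\n\r\n",
        csv_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request) -> str:
    """Send the request and return the raw JSON text of the response."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(build_predict_request("http://127.0.0.1:8000", "jobs.csv", open("jobs.csv", "rb").read()))` returns the JSON text with the predictions. The file name has to end in `.csv`, otherwise the endpoint answers with HTTP 400.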


    

## The Dockerfile


We don't need to change anything in this file, as it has already been set up by Mogenius. The Dockerfile is used to build a Docker image that runs a Python web application using the Uvicorn web server.

Here is a breakdown of the file:

- This line specifies the base image for the Docker image, which is the official Python 3.9 image from Docker Hub.

`FROM python:3.9`

- This line sets the working directory for the container to `/code`.

`WORKDIR /code`

- These lines copy the `requirements.txt` file from the local file system to the container's `/code` directory, and then install the Python dependencies it lists using pip.

```dockerfile
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
```

- This line copies the `app` directory from the local file system to the container's `/code/app` directory.

`COPY ./app /code/app`

- This line specifies that the container will listen on port 8080. However, this does not actually publish the port; it just documents that the container will use it.

`EXPOSE 8080`

- This line sets the user that the container runs as to UID 1000. This is useful for security purposes, as it helps to ensure that the container runs with minimal privileges.

`USER 1000`

- This line specifies the command that should be run when the container starts. It runs the Uvicorn web server with the `app.main:app` module as the application, and binds to the container's port 8080.

`CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]`


The complete `Dockerfile`:
```dockerfile
FROM python:3.9
 
WORKDIR /code
 
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
COPY ./app /code/app

EXPOSE 8080

USER 1000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
```



## Committing progress to Git

Create the file `.gitignore` (or open it, if the earlier commands already created it) and make sure it contains this text:
```
.venv
__pycache__
.env
```

Let's check the Git status of our project:

```console
$ git status -s
 M app/main.py
 M requirements.txt
?? .gitignore
```

Now let's commit this to Git using the command line:

```console
git add .
git commit -m "Added MLFlow, serve the model logic and endpoint"
git push -u origin main
```



## Mogenius Environment variables & secrets

We can define environment variables and secrets using the Mogenius UI.
Each secret is encrypted and then stored in the key vault.
To use a particular secret, we refer to it by its name to get the encrypted key. This way, a secret is never written in code but retrieved from the key vault in a secure way.

We need to create these environment variables so our code can run:
```plaintext
MLFLOW_TRACKING_USERNAME=tiagopatriciosantos
MLFLOW_TRACKING_PASSWORD=<secret>
```

Go to Mogenius studio and add them to the service environment variables:

![](https://i.imgur.com/e7Y6ANO.png)




## Checking the final result


We can now go to the external address of our service and use our API...

![](https://i.imgur.com/XevmHKS.png)

And test the example file available [here](https://github.com/tiagopatriciosantos/FastApiFakeJobPost/blob/6b7ee11a9fc92ed3b642b67b4bfec69429fbcf83/fake_job_postings_test.csv)

![](https://i.imgur.com/Ze8knw3.png)
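
The response maps each CSV row index to its predicted label, both as strings. To turn it into a list of postings to skip, something like the sketch below could be used. It assumes that the label `"1"` marks a fraudulent posting, which depends on how the model was trained. The helper `flagged_rows` and the example response are made up for illustration.

```python
import json


def flagged_rows(response_text: str, fake_label: str = "1") -> list:
    """Return the CSV row indices whose predicted label equals `fake_label`."""
    labels = json.loads(response_text)["Labels"]
    # Keys and values are strings because Model.predict() stringifies both.
    return sorted(int(row) for row, label in labels.items() if label == fake_label)


# Made-up response body in the shape returned by /predict.
example = '{"Labels": {"0": "0", "1": "1", "2": "0", "3": "1"}}'
print(flagged_rows(example))  # prints [1, 3]
```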





# Conclusion

This end-to-end ML project was split into a three-part series. Part 1 covered the exploratory analysis, data cleaning, metric selection, and initial model prediction experiments. Part 2 focused on setting up DagsHub, DVC, and MLFlow for version control, tracking experiment parameters and metrics, and comparing experiments. Finally, Part 3 focused on deployment, where MLFlow and FastAPI were used to deploy the model as a web API and serve it with Mogenius, a Virtual DevOps platform. Together, the three parts give a comprehensive overview of end-to-end ML development, from data exploration to deployment.