As a senior data scientist in a corporate setting, I'll outline the steps to build a regression model for a pharmaceutical company. Let's assume the business problem is to predict the efficacy of a new drug based on various factors such as patient demographics, medical history, and genetic information.

**Data Collection and Storage**

The pharmaceutical company uses a combination of electronic health records (EHRs) and clinical trial management systems (CTMS) to store patient data. The EHRs are stored in an on-premises relational database management system (RDBMS) such as MySQL, while the CTMS data is stored on a cloud-based platform like Amazon Redshift.

To consolidate the data from the client's systems, I would use Amazon Web Services (AWS) services such as:

1. **AWS Database Migration Service (DMS)**: To migrate the EHR data from the on-premises MySQL database to Amazon Redshift.
2. **AWS Glue**: To extract, transform, and load (ETL) the CTMS data into the Amazon Redshift tables used for modeling, aligning it with the migrated EHR data.
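As a rough sketch of the DMS side: a replication task selects its source tables through a JSON table-mapping document. The helper below builds one selection rule per schema using the DMS `%` wildcard. The schema name `ehr`, the task identifier and the endpoint ARNs are hypothetical placeholders. The boto3 `create_replication_task` call needs AWS credentials, so it is defined here but not run.

```python
import json
from typing import Dict, List


def build_table_mappings(schemas: List[str]) -> str:
    """Return a DMS table-mapping JSON string that includes every table in each schema."""
    rules: List[Dict] = []
    for i, schema in enumerate(schemas, start=1):
        rules.append(
            {
                "rule-type": "selection",
                "rule-id": str(i),
                "rule-name": str(i),
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        )
    return json.dumps({"rules": rules})


def create_ehr_migration_task(source_arn: str, target_arn: str, instance_arn: str) -> dict:
    """Create a full-load DMS task from MySQL to Redshift (requires AWS credentials)."""
    import boto3  # imported lazily so build_table_mappings works without boto3 installed

    dms = boto3.client("dms")
    return dms.create_replication_task(
        ReplicationTaskIdentifier="ehr-mysql-to-redshift",  # hypothetical task name
        SourceEndpointArn=source_arn,  # MySQL source endpoint (placeholder)
        TargetEndpointArn=target_arn,  # Redshift target endpoint (placeholder)
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load",  # "full-load-and-cdc" would also stream ongoing changes
        TableMappings=build_table_mappings(["ehr"]),
    )
```

Creating the task does not start it. A separate `start_replication_task` call does that once the endpoints have been tested.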

**Data Lake/Data Warehouse**

For this project, I would use Amazon Redshift as the data warehouse. It is a columnar, massively parallel warehouse that scales to large datasets and supports encryption and access controls, which matter for patient data.

**Data Extraction and Model Building**

To extract the data from Amazon Redshift and build the regression model, I would use:

1. **Amazon SageMaker**: A fully managed service for building, training, and deploying machine learning models.
2. **Amazon SageMaker notebooks**: Managed Jupyter notebook environments where data scientists explore data and prototype, train, and deploy models.
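Inside a SageMaker notebook, extraction can be as simple as running a SQL query over a DB-API connection and loading the rows into pandas. This sketch makes several assumptions:

- It uses AWS's `redshift_connector` driver.
- It queries a hypothetical `patient_features` table with a hypothetical `efficacy` target column.
- The password is read from a `REDSHIFT_PASSWORD` environment variable.

Because `load_training_frame` accepts any DB-API connection, an in-memory SQLite database can stand in for Redshift when testing.

```python
import os

import pandas as pd

# Hypothetical query: table and column names are placeholders for the real schema
FEATURE_QUERY = """
SELECT patient_id, age, sex, baseline_score, efficacy
FROM patient_features
"""


def load_training_frame(conn, query: str = FEATURE_QUERY) -> pd.DataFrame:
    """Run a query on any DB-API 2.0 connection and return the rows as a DataFrame."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        columns = [col[0] for col in cursor.description]
        return pd.DataFrame(cursor.fetchall(), columns=columns)
    finally:
        cursor.close()


def connect_to_redshift():
    """Open a Redshift connection (needs network access and credentials; not run here)."""
    import redshift_connector  # AWS's Python driver for Amazon Redshift

    return redshift_connector.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
        database="dev",  # placeholder
        user="analyst",  # placeholder
        password=os.environ["REDSHIFT_PASSWORD"],
    )
```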

**Choosing the Right Regression Model**

To choose the right regression model, I would use a combination of domain expertise, data analysis, and experimentation. I would loop over various regression models such as:

1. **Linear Regression**: A simple and interpretable model that assumes a linear relationship between the features and target variable.
2. **Ridge Regression**: A linear regression model with L2 regularization to reduce overfitting.
3. **Lasso Regression**: A linear regression model with L1 regularization, which reduces overfitting and can shrink some coefficients to exactly zero, effectively selecting features.
4. **Random Forest Regressor**: An ensemble that averages the predictions of many decision trees, each trained on a bootstrap sample of the data.
5. **Gradient Boosting Regressor**: An ensemble that builds decision trees sequentially, with each new tree fitted to the errors (the negative gradient of the loss) of the trees before it.

I would evaluate each model using metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared. Rather than relying on a single train/test split, I would compute these metrics with k-fold cross-validation. Each model is then scored on data it was not trained on, which exposes overfitting and gives a more reliable estimate of how well the model generalizes to unseen data.
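The comparison loop described above can be sketched with scikit-learn's `cross_validate`. This is a minimal sketch under a few assumptions:

- Synthetic data from `make_regression` stands in for the patient features.
- The regularization strengths are illustrative, not tuned.
- The linear models are standardized first, because L1/L2 penalties depend on feature scale.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Candidate models; linear ones are scaled because L1/L2 penalties depend on feature scale
CANDIDATES = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

SCORING = {"mse": "neg_mean_squared_error", "mae": "neg_mean_absolute_error", "r2": "r2"}


def compare_models(X, y, n_splits: int = 5) -> dict:
    """Return the mean cross-validated MSE, MAE and R^2 for each candidate model."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    results = {}
    for name, model in CANDIDATES.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
        results[name] = {
            # scikit-learn maximizes scores, so the error metrics come back negated
            "mse": -np.mean(scores["test_mse"]),
            "mae": -np.mean(scores["test_mae"]),
            "r2": np.mean(scores["test_r2"]),
        }
    return results


if __name__ == "__main__":
    # Synthetic data stands in for the real patient features
    X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: kv[1]["mse"])
    for name, m in ranked:
        print(f"{name:>18}: MSE={m['mse']:.1f} MAE={m['mae']:.1f} R2={m['r2']:.3f}")
```

`cross_validate` clones each estimator for every fold, so the same `CANDIDATES` dictionary can be reused safely. Sorting by cross-validated MSE gives a quick shortlist for tuning.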

**Tuning the Model**

To tune the model, I would combine random search and grid search. A random search samples a wide hyperparameter space cheaply, and a grid search then exhaustively checks a small region around the best candidates. I would keep the tuning code in a separate, modular Python script so it can be reused for each candidate model.
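Here is a minimal sketch of such a tuning script, assuming the random forest is the model being tuned. It runs in two stages:

1. A coarse `RandomizedSearchCV` over a broad space.
2. A `GridSearchCV` over a narrow neighborhood of the best result.

The search spaces and function names are illustrative, not prescriptive.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def coarse_random_search(X, y, n_iter: int = 10, seed: int = 42):
    """Stage 1: sample a broad hyperparameter space at random."""
    space = {
        "n_estimators": [50, 100, 200, 400],
        "max_depth": [None, 5, 10, 20],
        "min_samples_leaf": [1, 2, 5, 10],
        "max_features": [1.0, "sqrt", 0.5],
    }
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        space,
        n_iter=n_iter,
        scoring="neg_mean_squared_error",
        cv=3,
        random_state=seed,
    )
    return search.fit(X, y)


def refine_with_grid(X, y, best: dict, seed: int = 42):
    """Stage 2: exhaustively search a small neighborhood around the stage-1 winner."""
    leaf = best["min_samples_leaf"]
    grid = {
        "n_estimators": [best["n_estimators"]],
        "max_depth": [best["max_depth"]],
        "max_features": [best["max_features"]],
        "min_samples_leaf": sorted({max(1, leaf - 1), leaf, leaf + 1}),
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        grid,
        scoring="neg_mean_squared_error",
        cv=3,
    )
    return search.fit(X, y)
```

The grid stage keeps the winner's other settings fixed and only varies `min_samples_leaf`. Widening the grid trades run time for a finer search.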

**Design Pattern**

To implement the regression model, I would use a design pattern such as **Model-View-Controller (MVC)**. MVC splits the code into three components:

- the model (the regression algorithm);
- the view (data visualization and reporting);
- the controller (data processing and coordination between the model and the view).

Because each component can change without touching the others, the code is easier to maintain and update.
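Here is a minimal sketch of how the three MVC roles could map onto this project. The class names (`EfficacyModel`, `ReportView`, `PipelineController`) are hypothetical. Mean imputation stands in for the real preprocessing, and `LinearRegression` stands in for whichever model wins the comparison.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


class EfficacyModel:
    """Model: owns the regression algorithm and nothing else."""

    def __init__(self, estimator=None):
        self.estimator = estimator if estimator is not None else LinearRegression()

    def fit(self, X: np.ndarray, y: np.ndarray) -> "EfficacyModel":
        self.estimator.fit(X, y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.estimator.predict(X)


class ReportView:
    """View: turns predictions into something people read; plots would also live here."""

    def render(self, predictions: np.ndarray) -> str:
        return "\n".join(
            f"patient {i}: predicted efficacy {p:.2f}" for i, p in enumerate(predictions)
        )


class PipelineController:
    """Controller: prepares the data and coordinates the model and the view."""

    def __init__(self, model: EfficacyModel, view: ReportView):
        self.model = model
        self.view = view

    @staticmethod
    def _impute(X: np.ndarray, means: np.ndarray) -> np.ndarray:
        # Replace missing values with the supplied column means
        return np.where(np.isnan(X), means, X)

    def train_and_report(self, X_train, y_train, X_new) -> str:
        # Placeholder preprocessing: learn column means on the training data only
        means = np.nanmean(X_train, axis=0)
        self.model.fit(self._impute(X_train, means), y_train)
        predictions = self.model.predict(self._impute(X_new, means))
        return self.view.render(predictions)
```

With this split, swapping the estimator or changing the report format touches only one class.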

**ZenML**

To build a pipeline and automate the workflow, I would use **ZenML**, an open-source Python framework for defining and running machine learning pipelines. In ZenML, each stage is a Python function decorated as a step, and a pipeline function wires the steps together. The pipeline can then be run on demand or, with an orchestrator that supports schedules, on a schedule.

**Pipeline**

The pipeline would consist of the following steps:

1. **Data Ingestion**: Extract data from Amazon Redshift and load it into Amazon SageMaker.
2. **Data Preprocessing**: Clean and preprocess the data, using code developed in a SageMaker notebook.
3. **Model Building**: Build and train the regression model using Amazon SageMaker.
4. **Model Evaluation**: Evaluate the performance of the model using metrics such as MSE, MAE, and R-squared.
5. **Model Deployment**: Deploy the model to a production environment, such as a SageMaker endpoint.
6. **Prediction**: Use the deployed model to make predictions on new data and store the results in Amazon Redshift.
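Steps 3 and 5 are often implemented with SageMaker's script mode. In this mode, the managed scikit-learn container runs a training script that:

- reads its input from the directory in the `SM_CHANNEL_TRAIN` environment variable;
- writes the model to `SM_MODEL_DIR`, whose contents SageMaker packages as the model artifact.

The sketch below assumes a channel named `train` containing a hypothetical `train.csv` with a `target` column. `model_fn` is the hook the SageMaker scikit-learn serving container calls to load the model when it is deployed to an endpoint.

```python
import argparse
import os
from pathlib import Path

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def train(train_dir: str, model_dir: str, n_estimators: int = 100) -> str:
    """Fit the regressor on train.csv and save it where SageMaker collects model artifacts."""
    data = pd.read_csv(Path(train_dir) / "train.csv")  # hypothetical file name
    X, y = data.drop(columns=["target"]), data["target"]
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=42).fit(X, y)
    out_path = Path(model_dir) / "model.joblib"
    joblib.dump(model, out_path)
    return str(out_path)


def model_fn(model_dir: str):
    """Loading hook called by the SageMaker scikit-learn serving container."""
    return joblib.load(Path(model_dir) / "model.joblib")


# Only train when running inside a SageMaker training job, where SM_MODEL_DIR is set
if __name__ == "__main__" and "SM_MODEL_DIR" in os.environ:
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments; paths arrive as environment variables
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "."))
    parser.add_argument("--model-dir", default=os.environ["SM_MODEL_DIR"])
    args = parser.parse_args()
    train(args.train, args.model_dir, args.n_estimators)
```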

**Automation**

To automate the pipeline, I would use **Amazon EventBridge** (formerly CloudWatch Events) to run the pipeline on a regular schedule (e.g., daily or weekly). I would also use **AWS Lambda** to run the pipeline on demand, in response to events such as new data arriving or a model update.
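Here is a minimal sketch of the Lambda side, assuming a single function receives both triggers:

- an EventBridge schedule rule (for example with the schedule expression `rate(1 day)`);
- an S3 notification when new data is uploaded.

The `incoming/` prefix and the `trigger_pipeline` hook are hypothetical. What the hook actually calls depends on where the ZenML pipeline runs.

```python
from typing import Callable, Optional
from urllib.parse import unquote_plus

# Hypothetical S3 prefix where new raw data lands; other prefixes are ignored
INCOMING_PREFIX = "incoming/"


def describe_trigger(event: dict) -> Optional[str]:
    """Return why the pipeline should run, or None if the event should be ignored."""
    # EventBridge scheduled rules deliver events with this source and detail-type
    if event.get("source") == "aws.events" and event.get("detail-type") == "Scheduled Event":
        return "scheduled"
    # S3 event notifications arrive as a list of records with URL-encoded object keys
    for record in event.get("Records", []):
        if record.get("eventSource") == "aws:s3":
            key = unquote_plus(record["s3"]["object"]["key"])
            if key.startswith(INCOMING_PREFIX):
                return f"new-data:{key}"
    return None


def trigger_pipeline(reason: str) -> None:
    """Hypothetical hook: replace with whatever launches the ZenML pipeline run."""
    print(f"starting regression pipeline ({reason})")


def lambda_handler(event: dict, context, launch: Callable[[str], None] = trigger_pipeline) -> dict:
    """Lambda entry point; `launch` is injectable so the routing logic can be tested."""
    reason = describe_trigger(event)
    if reason is not None:
        launch(reason)
    return {"started": reason is not None, "reason": reason}
```

Filtering on the key prefix keeps the pipeline from re-triggering itself if it writes files back to the same bucket.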

**Tools, Software, and Services**

The tools, software, and services used in this project are:

1. **AWS**: Amazon Web Services, including Amazon Redshift, Amazon SageMaker, AWS DMS, AWS Glue, and AWS Lambda.
2. **Python**: The programming language used for data analysis, model building, and pipeline automation.
3. **ZenML**: A Python framework for building and running machine learning pipelines.
4. **Amazon EventBridge (formerly CloudWatch Events)**: A service for scheduling tasks and routing events to targets such as Lambda functions.
5. **AWS Lambda**: A service for running code on-demand, in response to events.

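Here's a sample code snippet that demonstrates the pipeline using the step-based API of recent ZenML releases:
```python
from typing import Annotated, Tuple

import joblib
import pandas as pd
from sklearn.base import RegressorMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from zenml import pipeline, step


@step
def ingest_data() -> pd.DataFrame:
    # Data ingestion: a local CSV stands in for the Redshift extract
    return pd.read_csv("data.csv")


@step
def split_data(data: pd.DataFrame) -> Tuple[
    Annotated[pd.DataFrame, "X_train"],
    Annotated[pd.DataFrame, "X_test"],
    Annotated[pd.Series, "y_train"],
    Annotated[pd.Series, "y_test"],
]:
    # Data preprocessing: separate the features from the target, hold out 20% for testing
    X = data.drop(columns=["target"])
    y = data["target"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    return X_train, X_test, y_train, y_test


@step
def train_model(X_train: pd.DataFrame, y_train: pd.Series) -> RegressorMixin:
    # Model building
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    return model


@step
def evaluate_model(model: RegressorMixin, X_test: pd.DataFrame, y_test: pd.Series) -> float:
    # Model evaluation with MSE, MAE and R-squared
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    print(f"MSE: {mse:.2f}, MAE: {mae:.2f}, R^2: {r2_score(y_test, y_pred):.2f}")
    return mse


@step
def deploy_and_predict(model: RegressorMixin, X_train: pd.DataFrame) -> pd.Series:
    # "Deployment" here means persisting the model; scikit-learn estimators have no .save()
    joblib.dump(model, "model.pkl")
    # Prediction: new data has no target column, so reuse the training feature columns
    new_data = pd.read_csv("new_data.csv")
    return pd.Series(model.predict(new_data[X_train.columns]), name="prediction")


@pipeline
def regression_pipeline():
    data = ingest_data()
    X_train, X_test, y_train, y_test = split_data(data)
    model = train_model(X_train, y_train)
    evaluate_model(model, X_test, y_test)
    deploy_and_predict(model, X_train)


if __name__ == "__main__":
    # Recent ZenML versions run a pipeline by calling it (older versions used .run())
    regression_pipeline()
```
This code defines a ZenML pipeline whose steps do the following:

- ingest the data;
- split it into training and test sets;
- train a random forest regressor;
- evaluate it with MSE, MAE, and R-squared;
- persist the model;
- make predictions on new data.

Each stage is a `@step` function, and the `@pipeline` function wires their inputs and outputs together. Calling `regression_pipeline()` runs the pipeline on the active ZenML stack. Saving `model.pkl` locally only stands in for deployment. In the setup described above, ingestion would read from Amazon Redshift, and deployment would target a SageMaker endpoint.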