<a href="https://www.kaggle.com/code/yutodennou/tips-open-interpreter-titanic?scriptVersionId=154829503" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<a id="1"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 1. Purpose🎉 </b></div>

**Open Interpreter makes below codes automatically by simple prompt.  
Here is an easy way to analyze using Open Interpreter, even if someone is not familiar with preprocessing or models.** 

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Encode categorical variables
label_encoder = LabelEncoder()
train_data['Sex'] = label_encoder.fit_transform(train_data['Sex'])
test_data['Sex'] = label_encoder.transform(test_data['Sex']) 

# Fill missing values
imputer = SimpleImputer(strategy='mean')
train_data['Age'] = imputer.fit_transform(train_data[['Age']])
test_data['Age'] = imputer.transform(test_data[['Age']])
train_data['Fare'] = imputer.fit_transform(train_data[['Fare']])
test_data['Fare'] = imputer.transform(test_data[['Fare']])

# Split data into features and target
X_train = train_data.drop(['Survived'], axis=1)
y_train = train_data['Survived']
X_test = test_data 



from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier
import pickle

# Define the models
decision_tree = DecisionTreeClassifier()
random_forest = RandomForestClassifier()
xgboost = xgb.XGBClassifier()

# Create the ensemble model
ensemble_model = VotingClassifier(estimators=[('dt', decision_tree), ('rf', random_forest), ('xgb', xgboost)], voting='hard')

# Perform cross-validation
decision_tree_scores = cross_val_score(decision_tree, X_train, y_train, cv=4)
random_forest_scores = cross_val_score(random_forest, X_train, y_train, cv=4)
xgboost_scores = cross_val_score(xgboost, X_train, y_train, cv=4)
ensemble_scores = cross_val_score(ensemble_model, X_train, y_train, cv=4)

# Calculate average accuracy
decision_tree_avg_accuracy = decision_tree_scores.mean()
random_forest_avg_accuracy = random_forest_scores.mean()
xgboost_avg_accuracy = xgboost_scores.mean()
ensemble_avg_accuracy = ensemble_scores.mean()

# Save the best model as a pickle file
best_model = max(decision_tree_avg_accuracy, random_forest_avg_accuracy, xgboost_avg_accuracy, ensemble_avg_accuracy)
if best_model == decision_tree_avg_accuracy:
    pickle.dump(decision_tree, open('decision_tree_model.pkl', 'wb'))
elif best_model == random_forest_avg_accuracy:
    pickle.dump(random_forest, open('random_forest_model.pkl', 'wb'))
elif best_model == xgboost_avg_accuracy:
    pickle.dump(xgboost, open('xgboost_model.pkl', 'wb'))
else:
    pickle.dump(ensemble_model, open('ensemble_model.pkl', 'wb'))


```

<a id="2"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 2. Import Library🗂️ </b></div>

In [2]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/titanic/train.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/gender_submission.csv


In [3]:
!pip install open-interpreter

Collecting open-interpreter
  Obtaining dependency information for open-interpreter from https://files.pythonhosted.org/packages/37/44/a57ad8ab7fcdaefb812c3c11eede170d94922b63bf56e3801bae71c91b44/open_interpreter-0.1.17-py3-none-any.whl.metadata
  Downloading open_interpreter-0.1.17-py3-none-any.whl.metadata (14 kB)
Collecting astor<0.9.0,>=0.8.1 (from open-interpreter)
  Downloading astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting git-python<2.0.0,>=1.0.3 (from open-interpreter)
  Downloading git_python-1.0.3-py2.py3-none-any.whl (1.9 kB)
Collecting html2image<3.0.0.0,>=2.0.4.3 (from open-interpreter)
  Obtaining dependency information for html2image<3.0.0.0,>=2.0.4.3 from https://files.pythonhosted.org/packages/89/b9/1dc02a535c71ceed5b71bd0f82e5d31fba100b59d43a35d5c40d836a8525/html2image-2.0.4.3-py3-none-any.whl.metadata
  Downloading html2image-2.0.4.3-py3-none-any.whl.metadata (14 kB)
Collecting inquirer<4.0.0,>=3.1.3 (from open-interpreter)
  Obtaining dependency information fo

<a id="3"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 3. Set Up⚙️</b></div>


**If you run the following code and it does not work, please stop the kernel("Restart & clear cell output") and run again from here**

In [4]:
import interpreter
interpreter.auto_run = True

# Use GPT-3.5-turbo．
interpreter.model = "gpt-3.5-turbo"

In [5]:
# Your API key from here -> https://platform.openai.com/api-keys
interpreter.api_key = "YOUR API KEY"

<a id="4"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 4. Generate Codes 👉</b></div>


In [7]:
text = """
# Read csv data as dataset from '/kaggle/input/titanic/train.csv' as train data and '/kaggle/input/titanic/test.csv' as test data.
# Using the dataset, test the Decision Tree, Random Forest, XGBoost and an ensemble of the three models to see which is the most accurate. 
# The validation should be based on the average of the 4-fold cross-validation results.
# Save model as pickle file
# Save codes as notebook file
"""

Results = interpreter.chat(text)

Output()

Output()

Output()

Output()

Output()

Output()

Output()

**By above simple code lines, what I put in "1.Purpose" generats as one result.
This is really easy way to get fundamental models!!**

<a id="5"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> Bonus: Bad Example🙅 </b></div>

In [None]:
text = """
# Read csv data as dataset from '/kaggle/input/titanic/train.csv' as train data and '/kaggle/input/titanic/test.csv' as test data.
# Using the dataset, make one accuraet model. 
# The validation method is k-fold cross-validation.
# show the scores by print()
# Save model and processing code
"""

results = interpreter.chat(text)

**Be specific about conditions that can be determined because it may not generate well, such as trying only one model or other performance issues, or making corrections to errors over and over again.**

**Even with the same prompt, the approach may be slightly different, and you may not get to the final result. Therefore, if the model does not work, try again in the following way**

1. Re-run without changing anything.
2. Write in detail where the error occurs and re-run.