# Study Project → "Gold" ✨

Prepare a prototype machine learning model for «Цифра». The company develops solutions that help industrial enterprises run efficiently.  
The model must predict the recovery rate of gold from gold-bearing ore. You have data on extraction and purification parameters at your disposal.  
The model will help optimize production so that an enterprise with unprofitable characteristics is not launched.  
You need to:

- Prepare the data;
- Carry out exploratory data analysis;
- Build and train a model.

Recovery (the enrichment efficiency) is calculated as follows:

$$
\text{recovery} = \frac{C \times (F - T)}{F \times (C - T)} \times 100\%
$$

where:

- C is the share of gold in the concentrate after flotation/cleaning;
- F is the share of gold in the feed/concentrate before flotation/cleaning;
- T is the share of gold in the tails after flotation/cleaning.
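
As a quick illustration, the formula can be written as a small Python function. This is only a minimal sketch: the function name `recovery` and the sample values are assumptions made for the example and do not come from the dataset.

```python
def recovery(c: float, f: float, t: float) -> float:
    """Return the recovery in percent.

    c: share of gold in the concentrate after flotation/cleaning
    f: share of gold in the feed/concentrate before flotation/cleaning
    t: share of gold in the tails after flotation/cleaning
    """
    return c * (f - t) / (f * (c - t)) * 100


# Hypothetical values, chosen only to illustrate the arithmetic.
example = recovery(c=20.0, f=5.0, t=1.0)
print(round(example, 2))
```

Note that the expression is undefined when `f` is zero or when `c` equals `t`, so rows with such values would need separate handling.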

## Data description

The data is stored in three files:

- `gold_recovery_train_new.csv`: the training set;
- `gold_recovery_test_new.csv`: the test set;
- `gold_recovery_full_new.csv`: the source data.

### Technological process

`Rougher feed`: the raw feed  
`Rougher additions` (or reagent additions): the flotation reagents Xanthate, Sulphate and Depressant  
- `Xanthate`: xanthate (a promoter, or flotation activator);  
- `Sulphate`: sulphate (sodium sulphide at this plant);  
- `Depressant`: depressant (sodium silicate).  

`Rougher process` (literally "rough process"): flotation  
`Rougher tails`: the tailings  
`Float banks`: the flotation unit  
`Cleaner process`: cleaning  
`Rougher Au`: the rougher gold concentrate  
`Final Au`: the final gold concentrate  

# Loading the data

Import the libraries 🎒

In [1]:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from numpy.random import RandomState


from matplotlib import rcParams

from math import sqrt

from statistics import mean
from statistics import stdev

from scipy.stats import t
from scipy.stats import bootstrap

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
# from sklearn.model_selection import GridSearchCV
# from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.utils import shuffle
from sklearn.pipeline import make_pipeline

Let's make the plots look nice ✨

In [2]:
%config InlineBackend.figure_formats = ['svg']

In [3]:
# style MATPLOTLIBRC
custom_params = {
                'figure.figsize': (10, 6),
                'figure.facecolor': '#232425',
                'figure.dpi': 240,

                'legend.frameon': False,
                'legend.borderpad': 1.4,
                'legend.labelspacing': 0.7,
                'legend.handlelength': 0.7,
                'legend.handleheight': 0.7,

                'axes.facecolor': '#232425',
                'axes.labelcolor': '#EEEEEE',
                'axes.labelpad': 17,
                'axes.spines.left': False,
                'axes.spines.bottom': False,
                'axes.spines.right': False,
                'axes.spines.top': False,
                'axes.grid': False,

                'contour.linewidth': 0.0,

                'xtick.color': '#AAAAAA',
                'ytick.color': '#AAAAAA',
                'xtick.bottom': True,
                'xtick.top': False,
                'ytick.left': True,
                'ytick.right': False,
    
                "lines.color": '#EEEEEE',

                'text.color': '#EEEEEE',
    
                'font.family': 'sans-serif',
                # 'font.sans-serif': [
                #     'Helvetica',
                #     'Verdana',
                #     'Tahoma',
                #     'Trebuchet MS',
                #     'Arial',
                #     'Chevin'
                #     ]
                }

# rcParams.update(custom_params)

Constants.

In [4]:
random_seed = 108108108
random_np = RandomState(128) 
dpi_k = custom_params['figure.dpi'] / rcParams['figure.dpi']
px = 1/custom_params['figure.dpi']

Functions.

In [5]:
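def baisic_df_info(data_df, title='Basic Info'):
    """Print the title and duplicate count, then show info, a sample and summary stats."""
    print(title, end='\n\n')
    # duplicated() marks repeated rows; summing the boolean mask counts them.
    print('Duplicates:', data_df.duplicated().sum(), end='\n\n')

    # Note: info() prints its report and returns None, so display() also shows `None`.
    display(
        data_df.info(),
        data_df.sample(5),
        data_df.describe(),
    )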

## 1. Preparing the data

### 1.1. Load the files and examine them.

> The data is indexed by the date and time the information was received (the `date` feature)

In [15]:
try:
    gold_recovery_train = pd.read_csv(
        './datasets/gold_recovery_train_new.csv',
        index_col='date',
    )
    gold_recovery_test = pd.read_csv(
        './datasets/gold_recovery_test_new.csv',
        index_col='date',
    )
    gold_recovery_full = pd.read_csv(
        './datasets/gold_recovery_full_new.csv',
        index_col='date',
    )
    
except FileNotFoundError:
    gold_recovery_train = pd.read_csv(
        'https://code.s3.yandex.net/datasets/gold_recovery_train_new.csv',
        index_col='date',
    )
    gold_recovery_test = pd.read_csv(
        'https://code.s3.yandex.net/datasets/gold_recovery_test_new.csv',
        index_col='date',
    )
    gold_recovery_full = pd.read_csv(
        'https://code.s3.yandex.net/datasets/gold_recovery_full_new.csv',
        index_col='date',
    )
    print('FYI datasets loaded via url')

In [17]:
gold_recovery_train.index = pd.to_datetime(gold_recovery_train.index)
gold_recovery_test.index = pd.to_datetime(gold_recovery_test.index)
gold_recovery_full.index = pd.to_datetime(gold_recovery_full.index)

In [22]:
baisic_df_info(gold_recovery_train, 'Training set')

Training set

Duplicates: 0

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14149 entries, 2016-01-15 00:00:00 to 2018-08-18 10:59:59
Data columns (total 86 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   final.output.concentrate_ag                         14148 non-null  float64
 1   final.output.concentrate_pb                         14148 non-null  float64
 2   final.output.concentrate_sol                        13938 non-null  float64
 3   final.output.concentrate_au                         14149 non-null  float64
 4   final.output.recovery                               14149 non-null  float64
 5   final.output.tail_ag                                14149 non-null  float64
 6   final.output.tail_pb                                14049 non-null  float64
 7   final.output.tail_sol                               14144 non-null  float64
 8   final.o

None

date,final.output.concentrate_ag,final.output.concentrate_pb,final.output.concentrate_sol,final.output.concentrate_au,final.output.recovery,final.output.tail_ag,final.output.tail_pb,final.output.tail_sol,final.output.tail_au,primary_cleaner.input.sulfate,...,secondary_cleaner.state.floatbank4_a_air,secondary_cleaner.state.floatbank4_a_level,secondary_cleaner.state.floatbank4_b_air,secondary_cleaner.state.floatbank4_b_level,secondary_cleaner.state.floatbank5_a_air,secondary_cleaner.state.floatbank5_a_level,secondary_cleaner.state.floatbank5_b_air,secondary_cleaner.state.floatbank5_b_level,secondary_cleaner.state.floatbank6_a_air,secondary_cleaner.state.floatbank6_a_level
2016-02-15 00:00:00,5.993936,11.219458,6.816658,42.43942,61.881954,10.867237,1.261555,13.270269,2.539981,118.284616,...,17.976637,-533.106078,16.029773,-509.245609,16.080797,-457.01796,12.103695,-499.926782,25.014208,-490.584026
2018-05-18 11:59:59,6.578948,7.984402,8.21453,43.465774,60.428917,9.80813,1.380813,10.878882,2.766377,102.187986,...,30.000564,-387.574231,20.844773,-237.840775,24.987691,-477.851088,22.979703,-498.538986,25.027242,-499.691657
2016-02-17 09:00:00,5.980593,9.785551,6.111711,43.38543,65.115619,10.823257,1.376406,14.262405,2.449873,122.706501,...,18.000236,-501.472698,15.977133,-504.309112,15.970154,-500.286338,11.972989,-500.447159,27.023439,-500.091087
2016-05-09 17:59:59,6.315692,10.326579,8.592155,41.023749,63.074531,6.706009,1.956455,12.647087,2.60796,133.930858,...,16.06859,-501.250144,13.909998,-500.336056,11.09302,-498.721237,6.004939,-500.448148,20.005331,-498.725086
2017-03-02 09:59:59,5.765678,10.574901,8.399307,44.456015,72.705002,10.82878,3.802091,7.058426,3.306791,95.009449,...,25.025558,-401.298225,23.015848,-400.014997,23.744413,-449.465961,19.993038,-450.067355,24.994203,-500.109754


Unnamed: 0,final.output.concentrate_ag,final.output.concentrate_pb,final.output.concentrate_sol,final.output.concentrate_au,final.output.recovery,final.output.tail_ag,final.output.tail_pb,final.output.tail_sol,final.output.tail_au,primary_cleaner.input.sulfate,...,secondary_cleaner.state.floatbank4_a_air,secondary_cleaner.state.floatbank4_a_level,secondary_cleaner.state.floatbank4_b_air,secondary_cleaner.state.floatbank4_b_level,secondary_cleaner.state.floatbank5_a_air,secondary_cleaner.state.floatbank5_a_level,secondary_cleaner.state.floatbank5_b_air,secondary_cleaner.state.floatbank5_b_level,secondary_cleaner.state.floatbank6_a_air,secondary_cleaner.state.floatbank6_a_level
count,14148.0,14148.0,13938.0,14149.0,14149.0,14149.0,14049.0,14144.0,14149.0,14129.0,...,14143.0,14148.0,14148.0,14148.0,14148.0,14148.0,14148.0,14148.0,14147.0,14148.0
mean,5.142034,10.13296,9.202849,44.003792,66.518832,9.607035,2.597298,10.512122,2.918421,133.320659,...,19.985454,-478.696836,15.487065,-460.229416,16.775136,-483.956022,13.06459,-483.966564,19.577539,-506.79848
std,1.369586,1.65493,2.790516,4.905261,10.295402,2.319069,0.971843,3.003617,0.903712,39.431659,...,5.657723,50.736021,5.255655,58.843586,5.831906,37.892788,5.765617,39.207913,5.764417,37.079249
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003112,...,0.0,-799.709069,0.0,-799.889113,-0.372054,-797.142475,0.646208,-800.00618,0.195324,-809.398668
25%,4.21162,9.297355,7.484645,43.276111,62.545817,7.997429,1.905973,8.811324,2.368607,107.006651,...,14.990775,-500.628656,11.894558,-500.149,11.08398,-500.363177,8.994405,-500.105994,14.989304,-500.745104
50%,4.994652,10.297144,8.845462,44.872436,67.432775,9.48027,2.592022,10.514621,2.851025,133.018328,...,20.001789,-499.68145,14.975536,-499.388738,17.932223,-499.702452,11.997547,-499.914556,19.984175,-500.061431
75%,5.85954,11.170603,10.487508,46.166425,72.346428,11.003707,3.241723,11.933009,3.434764,159.825396,...,24.990826,-477.472413,20.059375,-400.039008,21.34655,-487.712108,17.982903,-453.186936,24.991623,-499.536466
max,16.001945,17.031899,18.124851,52.756638,100.0,19.552149,5.639565,22.31773,8.197408,250.127834,...,30.115735,-245.239184,24.007913,-145.071088,43.709931,-275.073125,27.926001,-157.396071,32.188906,-104.427459


In [9]:
baisic_df_info(gold_recovery_test, 'Test set')

Test set

Duplicates: 0

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5290 entries, 0 to 5289
Data columns (total 53 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   date                                        5290 non-null   object 
 1   primary_cleaner.input.sulfate               5286 non-null   float64
 2   primary_cleaner.input.depressant            5285 non-null   float64
 3   primary_cleaner.input.feed_size             5290 non-null   float64
 4   primary_cleaner.input.xanthate              5286 non-null   float64
 5   primary_cleaner.state.floatbank8_a_air      5290 non-null   float64
 6   primary_cleaner.state.floatbank8_a_level    5290 non-null   float64
 7   primary_cleaner.state.floatbank8_b_air      5290 non-null   float64
 8   primary_cleaner.state.floatbank8_b_level    5290 non-null   float64
 9   primary_cleaner.state.floatbank8_c_air      5290 non

None

Unnamed: 0,date,primary_cleaner.input.sulfate,primary_cleaner.input.depressant,primary_cleaner.input.feed_size,primary_cleaner.input.xanthate,primary_cleaner.state.floatbank8_a_air,primary_cleaner.state.floatbank8_a_level,primary_cleaner.state.floatbank8_b_air,primary_cleaner.state.floatbank8_b_level,primary_cleaner.state.floatbank8_c_air,...,secondary_cleaner.state.floatbank4_a_air,secondary_cleaner.state.floatbank4_a_level,secondary_cleaner.state.floatbank4_b_air,secondary_cleaner.state.floatbank4_b_level,secondary_cleaner.state.floatbank5_a_air,secondary_cleaner.state.floatbank5_a_level,secondary_cleaner.state.floatbank5_b_air,secondary_cleaner.state.floatbank5_b_level,secondary_cleaner.state.floatbank6_a_air,secondary_cleaner.state.floatbank6_a_level
2140,2016-12-11 00:59:59,170.362721,11.97889,7.53,1.094923,1511.924778,-500.012699,1508.475453,-499.173149,1500.232078,...,17.056135,-500.551062,15.004442,-500.423358,16.13373,-499.965725,11.999298,-492.509245,20.995931,-500.469487
1438,2016-11-10 15:59:59,7.329299,0.335485,6.99,0.054684,1600.116175,-500.149965,1600.960498,-498.449058,1601.817586,...,15.93835,-503.15789,14.002322,-501.582806,15.077042,-499.250256,13.993569,-499.955098,21.985435,-503.318844
854,2016-10-16 15:59:59,214.091444,19.404826,7.57,1.019044,1603.8737,-498.44475,1701.334198,-500.376752,1699.336117,...,19.982961,-500.796811,14.984332,-499.034149,7.880064,-483.482277,14.048928,-281.588062,16.967569,-497.025051
1687,2016-11-21 09:59:59,182.224262,5.923422,7.01,1.201413,1592.7789,-500.452403,1640.65937,-499.907762,1648.795263,...,18.002312,-499.980764,16.001364,-502.578984,14.871928,-499.08242,11.989468,-495.574619,21.991911,-500.347853
5133,2017-12-24 03:59:59,76.95901,8.092793,8.29,0.6882,1602.781533,-379.919951,1599.060886,-419.917169,1389.537223,...,19.995416,-498.131112,14.823352,-499.44727,11.001331,-497.747672,8.960964,-499.949095,11.98486,-639.729381


Unnamed: 0,primary_cleaner.input.sulfate,primary_cleaner.input.depressant,primary_cleaner.input.feed_size,primary_cleaner.input.xanthate,primary_cleaner.state.floatbank8_a_air,primary_cleaner.state.floatbank8_a_level,primary_cleaner.state.floatbank8_b_air,primary_cleaner.state.floatbank8_b_level,primary_cleaner.state.floatbank8_c_air,primary_cleaner.state.floatbank8_c_level,...,secondary_cleaner.state.floatbank4_a_air,secondary_cleaner.state.floatbank4_a_level,secondary_cleaner.state.floatbank4_b_air,secondary_cleaner.state.floatbank4_b_level,secondary_cleaner.state.floatbank5_a_air,secondary_cleaner.state.floatbank5_a_level,secondary_cleaner.state.floatbank5_b_air,secondary_cleaner.state.floatbank5_b_level,secondary_cleaner.state.floatbank6_a_air,secondary_cleaner.state.floatbank6_a_level
count,5286.0,5285.0,5290.0,5286.0,5290.0,5290.0,5290.0,5290.0,5290.0,5290.0,...,5290.0,5290.0,5290.0,5290.0,5290.0,5290.0,5290.0,5290.0,5290.0,5290.0
mean,174.839652,8.683596,7.266339,1.383803,1539.494,-497.665883,1545.174,-500.273098,1527.272,-498.33068,...,16.32007,-505.14457,13.73544,-463.349858,12.804186,-501.329122,9.881145,-495.663398,17.304935,-501.793193
std,43.02708,3.07205,0.610219,0.643474,116.7979,19.952431,122.2246,32.968307,122.538,21.964876,...,3.493583,31.427337,3.430484,86.189107,3.026591,17.951495,2.868205,34.535007,4.536544,39.044215
min,2.566156,0.003839,5.65,0.004984,5.44586e-32,-795.316337,6.647490000000001e-32,-799.997015,4.033736e-32,-799.960571,...,1.079872e-16,-799.798523,2.489718e-17,-800.836914,0.069227,-797.323986,0.528083,-800.220337,-0.079426,-809.741464
25%,147.121401,6.489555,6.89,0.907623,1498.936,-500.357298,1498.971,-500.703002,1473.23,-501.018117,...,14.03618,-500.868258,12.02862,-500.323028,10.914838,-500.726841,8.036719,-500.194668,13.997317,-500.690984
50%,177.828489,8.052207,7.25,1.19761,1585.129,-499.969164,1595.622,-500.028514,1549.595,-500.017711,...,17.00847,-500.115727,14.96486,-499.576513,12.954182,-499.990332,10.004301,-499.990535,16.014935,-500.007126
75%,208.125438,10.027764,7.6,1.797819,1602.077,-499.568951,1602.324,-499.293257,1601.144,-498.99413,...,18.03862,-499.404224,15.96213,-400.933805,15.097528,-499.283191,11.997467,-499.719913,21.020013,-499.373018
max,265.983123,40.0,15.5,4.102454,2103.104,-57.195404,1813.084,-142.527229,1715.054,-150.937035,...,30.0518,-401.565212,31.26971,-6.506986,25.258848,-244.483566,14.086866,-137.740004,26.705889,-123.307487


In [10]:
baisic_df_info(gold_recovery_full, 'Source data')

Source data

Duplicates: 0

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19439 entries, 0 to 19438
Data columns (total 87 columns):
 #   Column                                              Non-Null Count  Dtype  
---  ------                                              --------------  -----  
 0   date                                                19439 non-null  object 
 1   final.output.concentrate_ag                         19438 non-null  float64
 2   final.output.concentrate_pb                         19438 non-null  float64
 3   final.output.concentrate_sol                        19228 non-null  float64
 4   final.output.concentrate_au                         19439 non-null  float64
 5   final.output.recovery                               19439 non-null  float64
 6   final.output.tail_ag                                19438 non-null  float64
 7   final.output.tail_pb                                19338 non-null  float64
 8   final.output.tail_sol                       

None

Unnamed: 0,date,final.output.concentrate_ag,final.output.concentrate_pb,final.output.concentrate_sol,final.output.concentrate_au,final.output.recovery,final.output.tail_ag,final.output.tail_pb,final.output.tail_sol,final.output.tail_au,...,secondary_cleaner.state.floatbank4_a_air,secondary_cleaner.state.floatbank4_a_level,secondary_cleaner.state.floatbank4_b_air,secondary_cleaner.state.floatbank4_b_level,secondary_cleaner.state.floatbank5_a_air,secondary_cleaner.state.floatbank5_a_level,secondary_cleaner.state.floatbank5_b_air,secondary_cleaner.state.floatbank5_b_level,secondary_cleaner.state.floatbank6_a_air,secondary_cleaner.state.floatbank6_a_level
12390,2017-09-01 00:59:59,3.551036,11.90795,8.892675,45.761391,69.23257,6.532799,4.225786,12.890968,2.556721,...,14.004499,-497.999058,10.924549,-399.109726,14.005582,-499.589225,11.049347,-499.329125,15.020557,-500.800247
2726,2016-06-02 10:59:59,3.986132,11.016959,8.73016,43.687579,72.120379,7.808862,1.904848,10.859785,2.343354,...,10.048825,-496.3717,8.069776,-498.837151,7.958285,-500.23183,6.071655,-500.05716,18.023341,-502.190461
13543,2017-10-26 01:59:59,4.87557,10.961129,8.735236,45.967852,71.352693,8.478994,4.052048,13.864887,3.047853,...,21.040038,-499.872918,19.030293,-400.100233,15.033259,-499.366763,10.988044,-499.924652,15.99119,-499.614931
19,2016-01-15 19:00:00,5.658943,10.436002,5.942418,42.509402,70.3483,10.232392,1.174845,19.498566,2.27246,...,14.001632,-509.808331,12.087669,-510.782166,10.032043,-499.128183,8.01614,-499.806844,13.94979,-600.140776
12142,2017-08-21 02:59:59,4.429346,9.193434,8.760518,48.233556,69.994726,6.914917,1.665086,11.261351,1.772211,...,14.494627,-498.7124,10.881682,-399.471984,13.966469,-499.996212,11.010379,-499.717726,15.000438,-499.508973


Unnamed: 0,final.output.concentrate_ag,final.output.concentrate_pb,final.output.concentrate_sol,final.output.concentrate_au,final.output.recovery,final.output.tail_ag,final.output.tail_pb,final.output.tail_sol,final.output.tail_au,primary_cleaner.input.sulfate,...,secondary_cleaner.state.floatbank4_a_air,secondary_cleaner.state.floatbank4_a_level,secondary_cleaner.state.floatbank4_b_air,secondary_cleaner.state.floatbank4_b_level,secondary_cleaner.state.floatbank5_a_air,secondary_cleaner.state.floatbank5_a_level,secondary_cleaner.state.floatbank5_b_air,secondary_cleaner.state.floatbank5_b_level,secondary_cleaner.state.floatbank6_a_air,secondary_cleaner.state.floatbank6_a_level
count,19438.0,19438.0,19228.0,19439.0,19439.0,19438.0,19338.0,19433.0,19439.0,19415.0,...,19433.0,19438.0,19438.0,19438.0,19438.0,19438.0,19438.0,19438.0,19437.0,19438.0
mean,5.16847,9.978895,9.501224,44.076513,67.050208,9.688589,2.705795,10.583728,3.042467,144.624774,...,18.987674,-485.894516,15.010366,-461.078636,15.694452,-488.684065,12.198224,-487.149827,18.959024,-505.436305
std,1.372348,1.66924,2.787537,5.129784,10.12584,2.328642,0.949077,2.868782,0.922808,44.464071,...,5.411058,47.75857,4.890228,67.405524,5.510974,34.533396,5.333024,38.347312,5.550498,37.689057
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003112,...,0.0,-799.798523,0.0,-800.836914,-0.372054,-797.323986,0.528083,-800.220337,-0.079426,-809.741464
25%,4.25124,9.137262,7.72282,43.402215,63.299712,8.062878,2.040119,8.938947,2.461138,114.107029,...,14.975734,-500.704892,11.940294,-500.187742,10.988606,-500.458467,8.971105,-500.129462,14.983037,-500.728067
50%,5.066094,10.102433,9.218961,45.011244,68.172738,9.743623,2.74873,10.622456,2.984909,143.232103,...,18.017481,-499.837793,14.971014,-499.459786,15.000036,-499.802605,11.019433,-499.935317,19.960541,-500.0484
75%,5.895527,11.035769,10.947813,46.275313,72.686642,11.134294,3.333216,12.104271,3.571351,175.075656,...,23.01247,-498.24973,19.034162,-400.118106,18.02619,-498.384187,14.019088,-499.436708,24.00317,-499.495378
max,16.001945,17.031899,19.61572,52.756638,100.0,19.552149,5.804178,22.31773,8.245022,265.983123,...,30.115735,-245.239184,31.269706,-6.506986,43.709931,-244.483566,27.926001,-137.740004,32.188906,-104.427459


### 1.2. Check that the recovery is calculated correctly

#### Calculate the recovery on the training set for the `rougher.output.recovery` feature
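
One possible way to do this is sketched below on a tiny made-up frame. The input column names `rougher.output.concentrate_au`, `rougher.input.feed_au` and `rougher.output.tail_au` are assumptions (they are not visible in the truncated output above), and so are the numbers and the `recovery_calc` column name.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for gold_recovery_train; column names are assumed.
toy = pd.DataFrame({
    'rougher.output.concentrate_au': [20.0, 18.0, 0.0],
    'rougher.input.feed_au': [5.0, 6.0, 0.0],
    'rougher.output.tail_au': [1.0, 2.0, 0.0],
})

c = toy['rougher.output.concentrate_au']
f = toy['rougher.input.feed_au']
t = toy['rougher.output.tail_au']

# Vectorised form of the recovery formula; a 0/0 row yields NaN instead of raising.
with np.errstate(divide='ignore', invalid='ignore'):
    toy['recovery_calc'] = c * (f - t) / (f * (c - t)) * 100

print(toy['recovery_calc'].round(2).tolist())
```

The all-zero row is included on purpose: it shows that degenerate measurements produce `NaN` and should be filtered out before any comparison.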

#### Find the MAE between our calculations and the feature values
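
A minimal sketch of the comparison, again on made-up numbers: MAE is the mean of the absolute differences between the recalculated values and the stored `rougher.output.recovery` feature. The variable names and values here are hypothetical.

```python
import pandas as pd

# Hypothetical values: the stored feature vs. our own recalculation.
stored = pd.Series([84.21, 75.00, 66.50], name='rougher.output.recovery')
recalculated = pd.Series([84.21, 75.02, 66.48], name='recovery_calc')

# Mean absolute error; rows where either value is NaN are skipped by mean().
mae = (stored - recalculated).abs().mean()
print(f'MAE: {mae:.4f}')
```

An MAE close to zero would suggest that the stored feature follows the same formula; a large value would point to a discrepancy worth investigating.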

#### Conclusions

Some parameters are unavailable because they are measured and/or calculated much later. As a result, the test set lacks some features that may be present in the training set. The test set also contains no target features.  
The source dataset contains the training and test sets with all features.  
You have raw data at your disposal: it was simply exported from the warehouse. Before building the model, check the data for correctness by following our instructions.
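
The relationship between the three sets can be sketched as follows. The frames `train`, `test` and `full` are toy stand-ins with made-up values; the sketch assumes all three are indexed by date, as in the loading code above.

```python
import pandas as pd

# Toy stand-ins for the real frames; values are made up.
train = pd.DataFrame(
    {'primary_cleaner.input.sulfate': [118.3, 102.2],
     'final.output.recovery': [61.9, 60.4],
     'final.output.tail_au': [2.5, 2.8]},
    index=pd.to_datetime(['2016-02-15 00:00:00', '2016-02-17 09:00:00']),
)
test = pd.DataFrame(
    {'primary_cleaner.input.sulfate': [170.4]},
    index=pd.to_datetime(['2016-12-11 00:59:59']),
)
full = pd.DataFrame(
    {'primary_cleaner.input.sulfate': [118.3, 102.2, 170.4],
     'final.output.recovery': [61.9, 60.4, 70.1],
     'final.output.tail_au': [2.5, 2.8, 3.0]},
    index=pd.to_datetime(['2016-02-15 00:00:00', '2016-02-17 09:00:00',
                          '2016-12-11 00:59:59']),
)

# Columns present in the training set but absent from the test set.
missing_in_test = train.columns.difference(test.columns)
print(list(missing_in_test))

# Targets for the test rows can be looked up in the full data by date.
test_target = full.loc[test.index, 'final.output.recovery']
print(test_target)
```

Looking the targets up by index only works if the dates are unique and parsed the same way in both frames, which should be verified on the real data.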




1.3. Analyze the features that are unavailable in the test set. What are these parameters? What type are they?  
1.4. Preprocess the data.  
2. Analyze the data  
2.1. Look at how the concentration of metals (Au, Ag, Pb) changes across the cleaning stages. Describe your conclusions.  
2.2. Compare the feed particle size distributions in the training and test sets. If the distributions differ considerably, the model evaluation will be incorrect.  
2.3. Examine the total concentration of all substances at different stages: in the feed, in the rougher concentrate and in the final concentrate.  
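
For step 2.3, a rough sketch for the final concentrate is given below. The four `final.output.concentrate_*` column names appear in the data shown earlier, and the `describe()` output reports minima of 0.0 for them, which hints that zero-total rows may exist. The values in the toy frame are made up, and treating a zero total as an anomaly is an assumption rather than a rule taken from the task.

```python
import pandas as pd

# The four column names appear in the data; the values are made up.
final_cols = [
    'final.output.concentrate_ag',
    'final.output.concentrate_pb',
    'final.output.concentrate_sol',
    'final.output.concentrate_au',
]
toy = pd.DataFrame(
    [[5.1, 10.1, 9.2, 44.0],
     [0.0, 0.0, 0.0, 0.0],
     [6.0, 9.8, 8.4, 43.5]],
    columns=final_cols,
)

# Total concentration of all substances in the final concentrate, per row.
total = toy[final_cols].sum(axis=1)

# Rows whose total is exactly zero are treated here as anomalies (an assumption).
cleaned = toy.loc[total > 0]
print(len(toy), '->', len(cleaned))
```

The same pattern could be repeated for the feed and the rougher concentrate once the corresponding column names are confirmed.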