# Where to collect logs
1. Install Docker Desktop
2. Start a container from the ClickHouse [image](https://hub.docker.com/r/yandex/clickhouse-server/)
```
!docker run -d -p 0.0.0.0:8123:8123 --volume=/path/to/some/folder/on/disk/some_clickhouse_database:/var/lib/clickhouse --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
```

The `-p 0.0.0.0:8123:8123` option publishes container port 8123 on the host so the server can be reached from outside the container (sometimes access is not available right away).

If you run the command again and get an error like
```
docker: Error response from daemon: Conflict. The container name "/some-clickhouse-server" is already in use by container "34899ff1c1d78111048b762fb730963adac0b90eedb9751f4c5d62aa4d90c589". You have to remove (or rename) that container to be able to reuse that name.
```
remove the old container with the command below (replace the container ID with your own; `docker rm` also accepts the container name):
```
!docker rm 34899ff1c1d78111048b762fb730963adac0b90eedb9751f4c5d62aa4d90c589
```
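
When the run command is launched from a script, the ID can be pulled out of the error text instead of being copied by hand. The snippet below is a minimal sketch, assuming the message has the same wording as the error shown above; `conflicting_container_id` is a hypothetical helper name, not part of Docker.

```python
import re


def conflicting_container_id(error_message):
    """Return the container ID named in a name-conflict error, or None."""
    # Assumption: the daemon message contains: in use by container "<hex id>"
    match = re.search(r'in use by container "([0-9a-f]+)"', error_message)
    return match.group(1) if match else None


# The error text from above, split across lines for readability
message = (
    'docker: Error response from daemon: Conflict. The container name '
    '"/some-clickhouse-server" is already in use by container '
    '"34899ff1c1d78111048b762fb730963adac0b90eedb9751f4c5d62aa4d90c589". '
    'You have to remove (or rename) that container to be able to reuse that name.'
)
container_id = conflicting_container_id(message)
print(container_id)
```

The returned string can then be passed to `docker rm`.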

How to find the container ID (`docker ps` lists only running containers; add `-a` to include stopped ones):
```
!docker ps
```

How to open a shell inside the container (better done from a terminal than from the notebook):
```
docker exec -it CONTAINER_ID bash
```

Open clickhouse-client:
```
docker run -it --rm --link some-clickhouse-server:clickhouse-server yandex/clickhouse-client --host clickhouse-server
```

3. Check that ClickHouse in the container is reachable: open [localhost:8123](http://localhost:8123) in a browser; you should see `Ok.`

4. Take the open [Yandex.Metrica](https://clickhouse.tech/docs/ru/getting-started/example-datasets/metrica/) dataset.
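
The reachability check from step 3 can also be done from Python rather than a browser. This is a minimal sketch using only the standard library; `clickhouse_is_up` is a hypothetical helper name, and it assumes the server answers the root URL with the same `Ok.` text seen in the browser.

```python
import urllib.error
import urllib.request


def clickhouse_is_up(host='http://localhost:8123', timeout=5):
    """Return True if the HTTP interface at `host` answers with 'Ok.'."""
    try:
        with urllib.request.urlopen(host, timeout=timeout) as resp:
            body = resp.read().decode('utf-8', errors='replace')
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar
        return False
    return body.strip() == 'Ok.'


# Usage: clickhouse_is_up() returns False until the container accepts connections
```

Right after `docker run` the server may need a little time to start, so a `False` result immediately after launch is not necessarily an error.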

In [1]:
import requests
from io import StringIO
import pandas as pd

In [2]:
HOST = 'http://localhost:8123'

In [7]:
def clickhouse_data(query, host=HOST, timeout=60, columns=None):
    # Send the query over the ClickHouse HTTP interface; `columns` is currently unused
    r = requests.post(
        host,
        params={'timeout_before_checking_execution_speed': 120, 'max_execution_time': 6000},
        timeout=timeout,
        data=query,
    )
    if r.status_code == 200:
        return r.text  # tab-separated rows by default
    else:
        print('Something went wrong')
        raise ValueError(r.text)

In [8]:
clickhouse_data('select count(*) from datasets.hits_v1')

'8873898\n'

In [9]:
text = clickhouse_data('select BrowserCountry, count(*) as cnt from datasets.hits_v1 group by BrowserCountry order by cnt desc limit 5')
text

'��\t6048025\nTp\t2131666\nTi\t361055\nI7\t175930\nIP\t32667\n'

In [10]:
df = pd.read_csv(StringIO(text), sep='\t', names=['BrowserCountry', 'cnt'])
df

Unnamed: 0,BrowserCountry,cnt
0,��,6048025
1,Tp,2131666
2,Ti,361055
3,I7,175930
4,IP,32667
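
Passing `names=` by hand means the column list has to be kept in sync with the query. One alternative is to append `FORMAT TSVWithNames` to the query so that ClickHouse sends a header row, and let pandas read the names from it. The sketch below shows only the parsing side on a hand-written sample shaped like such a response (two rows taken from the output above); `tsv_with_names_to_df` is a hypothetical helper name.

```python
from io import StringIO

import pandas as pd


def tsv_with_names_to_df(text):
    """Parse a tab-separated response whose first line holds the column names."""
    return pd.read_csv(StringIO(text), sep='\t')


# Assumed response shape for a query ending in "FORMAT TSVWithNames"
sample = 'BrowserCountry\tcnt\nTp\t2131666\nTi\t361055\n'
df_named = tsv_with_names_to_df(sample)
print(df_named)
```

With the real server the call would look like `tsv_with_names_to_df(clickhouse_data(query + ' FORMAT TSVWithNames'))`.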


In [12]:
%%time

text = clickhouse_data('select EventDate, count(*) from datasets.hits_v1 group by EventDate order by EventDate')
df = pd.read_csv(StringIO(text), sep='\t', names=['EventDate', 'hits'])

CPU times: user 5.77 ms, sys: 3.11 ms, total: 8.88 ms
Wall time: 170 ms


In [13]:
df

Unnamed: 0,EventDate,hits
0,2014-03-17,1406958
1,2014-03-18,1383658
2,2014-03-19,1405797
3,2014-03-20,1353623
4,2014-03-21,1245779
5,2014-03-22,1031592
6,2014-03-23,1046491


In [14]:
text = clickhouse_data('select EventDate, uniq(UserID) from datasets.hits_v1 group by EventDate order by EventDate')
df = pd.read_csv(StringIO(text), sep='\t', names=['EventDate', 'unique_users_approx'])
df

Unnamed: 0,EventDate,unique_users_approx
0,2014-03-17,36613
1,2014-03-18,36531
2,2014-03-19,36940
3,2014-03-20,36462
4,2014-03-21,35447
5,2014-03-22,31555
6,2014-03-23,31200


In [15]:
text = clickhouse_data('select EventDate, uniqExact(UserID) from datasets.hits_v1 group by EventDate order by EventDate')
df = pd.read_csv(StringIO(text), sep='\t', names=['EventDate', 'unique_users_exact'])
df

Unnamed: 0,EventDate,unique_users_exact
0,2014-03-17,36613
1,2014-03-18,36531
2,2014-03-19,36940
3,2014-03-20,36462
4,2014-03-21,35447
5,2014-03-22,31555
6,2014-03-23,31200
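
`uniq` is an approximate aggregate and `uniqExact` an exact one, yet the two outputs above coincide for every day. A quick way to quantify the difference is to request both in one query and compute the relative error per row. The snippet is a sketch: the combined query is an assumption (it is not run here), and the sample rows are copied from the two outputs above.

```python
def relative_error(approx, exact):
    """Relative deviation of the approximate count from the exact one."""
    return abs(approx - exact) / exact


# Assumed combined query returning both estimates side by side
query = ('select EventDate, uniq(UserID), uniqExact(UserID) '
         'from datasets.hits_v1 group by EventDate order by EventDate')

# Two rows in the tab-separated shape such a query would return
sample = '2014-03-17\t36613\t36613\n2014-03-18\t36531\t36531\n'

errors = {}
for line in sample.splitlines():
    date, approx, exact = line.split('\t')
    errors[date] = relative_error(int(approx), int(exact))

print(errors)
```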


In [16]:
%%time
text = clickhouse_data('select TraficSourceID, EventDate, uniqExact(UserID) from datasets.hits_v1 group by TraficSourceID, EventDate')
df = pd.read_csv(StringIO(text), sep='\t', names=['TraficSourceID', 'EventDate', 'unique_users_exact'])

CPU times: user 5.77 ms, sys: 3.21 ms, total: 8.98 ms
Wall time: 572 ms


In [17]:
df

Unnamed: 0,TraficSourceID,EventDate,unique_users_exact
0,0,2014-03-19,31574
1,10,2014-03-21,3824
2,5,2014-03-17,9111
3,7,2014-03-19,56
4,2,2014-03-17,1660
...,...,...,...
65,1,2014-03-20,15660
66,4,2014-03-23,372
67,3,2014-03-21,14521
68,-1,2014-03-22,18890
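
The result above is in long form, one row per (source, date) pair, which is hard to scan. Pivoting it into a date-by-source table is one possible next step. The sketch below uses four rows copied from the output above instead of the full result; `long_df` and `wide_df` are illustrative names.

```python
import pandas as pd

# A few rows copied from the output above, in the same long layout
long_df = pd.DataFrame(
    [(0, '2014-03-19', 31574), (10, '2014-03-21', 3824),
     (5, '2014-03-17', 9111), (7, '2014-03-19', 56)],
    columns=['TraficSourceID', 'EventDate', 'unique_users_exact'],
)

# One row per date, one column per traffic source; missing pairs become NaN
wide_df = long_df.pivot(index='EventDate', columns='TraficSourceID',
                        values='unique_users_exact')
print(wide_df)
```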
