# Installing PyMove

In [1]:
# Only needed on Colab
# %%capture
# !pip install pymove

# Brief Description of the Dataset



The data is available in the following UCI repository: [GPS Trajectories Data Set](https://archive.ics.uci.edu/ml/datasets/GPS+Trajectories)

The data consists of real trajectories collected from users of the Go! Track app (2013). Go! Track continuously collects GPS points along the routes people follow while they are in a car. The dataset used here contains 163 trajectories with more than 18,000 points.

**go_track_trackspoints.csv:** location points of each trajectory.

- _id_: unique key identifying each point.
- _latitude_: latitude of the point.
- _longitude_: longitude of the point.
- _track_id_: identifies the trajectory the point belongs to.
- _time_: date and time at which the point was collected (GMT-3).
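
As a small illustration of the column layout above, the sketch below parses two rows copied from this notebook's output and attaches the GMT-3 offset to the `time` column. The inline CSV and the fixed-offset handling are assumptions made for the example; the notebook itself keeps the timestamps timezone-naive.

```python
import io
from datetime import timedelta, timezone

import pandas as pd

# Two sample rows in the go_track_trackspoints.csv layout
# (values copied from the notebook output).
sample_csv = io.StringIO(
    "id,latitude,longitude,track_id,time\n"
    "1,-10.939341,-37.062742,1,2014-09-13 07:24:32\n"
    "2,-10.939341,-37.062742,1,2014-09-13 07:24:37\n"
)

# parse_dates converts the time column to datetime64 while reading.
points = pd.read_csv(sample_csv, parse_dates=["time"])

# Assumption: the timestamps are local times at a fixed GMT-3 offset.
gmt_minus_3 = timezone(timedelta(hours=-3))
points["time_utc"] = (
    points["time"].dt.tz_localize(gmt_minus_3).dt.tz_convert("UTC")
)

print(points[["track_id", "time", "time_utc"]])
```

Keeping a UTC column next to the local one can help when the points are later joined with data recorded in another timezone.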

# Reading, Identifying, and Converting the Data

In [2]:
import pandas as pd
import pymove as pm

In [3]:
# Path on Colab
# filepath = '/content/drive/My Drive/Colab Notebooks/PyMove/Datas/GPS Trajectory/go_track_trackspoints.csv'

In [4]:
# Local path
filepath = '../Datas/GPS Trajectory/go_track_trackspoints.csv'

In [5]:
df = pd.read_csv(filepath)

In [6]:
df.head()

Unnamed: 0,id,latitude,longitude,track_id,time
0,1,-10.939341,-37.062742,1,2014-09-13 07:24:32
1,2,-10.939341,-37.062742,1,2014-09-13 07:24:37
2,3,-10.939324,-37.062765,1,2014-09-13 07:24:42
3,4,-10.939211,-37.062843,1,2014-09-13 07:24:47
4,5,-10.938939,-37.062879,1,2014-09-13 07:24:53


In [7]:
df.drop(['id'], axis=1, inplace=True)

In [8]:
len(df.track_id.unique())

163

In [9]:
df.shape

(18107, 4)

In [10]:
df.head()

Unnamed: 0,latitude,longitude,track_id,time
0,-10.939341,-37.062742,1,2014-09-13 07:24:32
1,-10.939341,-37.062742,1,2014-09-13 07:24:37
2,-10.939324,-37.062765,1,2014-09-13 07:24:42
3,-10.939211,-37.062843,1,2014-09-13 07:24:47
4,-10.938939,-37.062879,1,2014-09-13 07:24:53


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18107 entries, 0 to 18106
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   latitude   18107 non-null  float64
 1   longitude  18107 non-null  float64
 2   track_id   18107 non-null  int64  
 3   time       18107 non-null  object 
dtypes: float64(2), int64(1), object(1)
memory usage: 566.0+ KB


In [12]:
df.time = pd.to_datetime(df.time)

In [13]:
df.head()

Unnamed: 0,latitude,longitude,track_id,time
0,-10.939341,-37.062742,1,2014-09-13 07:24:32
1,-10.939341,-37.062742,1,2014-09-13 07:24:37
2,-10.939324,-37.062765,1,2014-09-13 07:24:42
3,-10.939211,-37.062843,1,2014-09-13 07:24:47
4,-10.938939,-37.062879,1,2014-09-13 07:24:53


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18107 entries, 0 to 18106
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   latitude   18107 non-null  float64       
 1   longitude  18107 non-null  float64       
 2   track_id   18107 non-null  int64         
 3   time       18107 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(2), int64(1)
memory usage: 566.0 KB


In [15]:
dm = pm.PandasMoveDataFrame(
    df, latitude='latitude', 
    longitude='longitude', 
    datetime='time', 
    traj_id='track_id'
)

dm2 = pm.PandasMoveDataFrame(
    df, latitude='latitude', 
    longitude='longitude', 
    datetime='time', 
    traj_id='track_id'
)

In [16]:
dm.head()

Unnamed: 0,lat,lon,id,datetime
0,-10.939342,-37.06274,1,2014-09-13 07:24:32
1,-10.939342,-37.06274,1,2014-09-13 07:24:37
2,-10.939324,-37.062763,1,2014-09-13 07:24:42
3,-10.939211,-37.062843,1,2014-09-13 07:24:47
4,-10.938939,-37.062878,1,2014-09-13 07:24:53


# Detecting _Stop/Moves_ in the Trajectory

In PyMove, _stop/moves_ can be identified in essentially two ways. The first is graphical, through the _plot_stops_ function described below. The second is through functions that analyze the _dataset_ and generate additional _features_ recording whether each point is a move point or a stop point. Under the hood, _plot_stops_ does something similar so that it can plot these points.

Both ways of identifying the _stop/move_ points of the trajectories are shown below.
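
Both approaches rely on the distance between consecutive points of a trajectory. The sketch below shows one common way to compute such a distance, the haversine formula, on four coordinates copied from this notebook. It is an illustration only, not PyMove's implementation; the function name `haversine_m` and the Earth radius constant are assumptions, and the results can differ slightly from the notebook's _dist_to_prev_ values because the displayed coordinates are rounded.

```python
import numpy as np

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


# First four points of trajectory 1, as displayed in the notebook.
lat = np.array([-10.939342, -10.939342, -10.939324, -10.939211])
lon = np.array([-37.06274, -37.06274, -37.062763, -37.062843])

# The first point has no predecessor, so its distance stays NaN.
dist_to_prev = np.full(lat.shape, np.nan)
dist_to_prev[1:] = haversine_m(lat[:-1], lon[:-1], lat[1:], lon[1:])

print(dist_to_prev)
```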

## Visualizing stop points

The points are visualized with PyMove's _visualization_ module.

In [17]:
import pymove.visualization as vsl

In [18]:
vsl.plot_stops(dm2)


Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance



Generating distance, time and speed features: 100%|██████████| 163/163 [00:00<00:00, 769.29it/s]


...Reset index

..Total Time: 0.28812336921691895

Creating or updating features MOVE and STOPS...


....There are 2580 stops to this parameters



## Identifying _Stop/Moves_

_Stop/Moves_ are identified using what is already implemented in the PyMove library.

First, note that the _dataset_ we will use (_dm_) contains only the initial features; the changes made by _plot_stops_ live in a separate _dataset_ (_dm2_) that will be discarded. Likewise, we store the result of each of the following functions in its own _dataset_ so that their values do not mix.
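
The pattern of keeping the original data untouched and storing each result separately can be sketched with plain pandas. The helper `add_situation` below is hypothetical and only mimics the idea of an `inplace` flag; it is not a PyMove function.

```python
import pandas as pd


def add_situation(data: pd.DataFrame, inplace: bool = False):
    """Hypothetical helper: add a 'situation' column, optionally on a copy."""
    target = data if inplace else data.copy()
    target["situation"] = "move"
    # Mirror the usual pandas convention: in-place calls return None.
    return None if inplace else target


original = pd.DataFrame({"lat": [-10.939342], "lon": [-37.06274]})
labelled = add_situation(original, inplace=False)

print(list(original.columns))
print(list(labelled.columns))
```

With `inplace=False` the original frame keeps only its initial columns, which is why the outputs of different functions can be compared side by side.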

In [19]:
dm.head()

Unnamed: 0,lat,lon,id,datetime
0,-10.939342,-37.06274,1,2014-09-13 07:24:32
1,-10.939342,-37.06274,1,2014-09-13 07:24:37
2,-10.939324,-37.062763,1,2014-09-13 07:24:42
3,-10.939211,-37.062843,1,2014-09-13 07:24:47
4,-10.938939,-37.062878,1,2014-09-13 07:24:53


In [20]:
dm2.head()

Unnamed: 0,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,situation
0,1,-10.939342,-37.06274,2014-09-13 07:24:32,,0.0,,
1,1,-10.939342,-37.06274,2014-09-13 07:24:37,0.0,2.934187,2.934187,stop
2,1,-10.939324,-37.062763,2014-09-13 07:24:42,2.934187,15.475652,18.332698,move
3,1,-10.939211,-37.062843,2014-09-13 07:24:47,15.475652,30.418784,44.653001,move
4,1,-10.938939,-37.062878,2014-09-13 07:24:53,30.418784,44.240489,74.240342,move


In [21]:
dm2[dm2.situation=='stop'].shape

(2580, 8)

The header of _dm2_ shows that several columns have been added to the _dataset_. These columns are produced by the _plot_stops_ function of the _visualization_ module.

Next, we look at what the _stay_point_detection_ module returns.

### _stay_point_detection_

We will use the functions already implemented in the _stay_point_detection_ module, so we import it first.

In [22]:
from pymove.preprocessing import stay_point_detection

#### _create_or_update_move_stop_by_dist_time_

Determines the _stop_ and _move_ points of the _dataframe_ based on distance and time thresholds (the log below reports a maximum distance of 30 between adjacent points and a time of 900 seconds); if these points already exist, they are updated.

This function returns the _dataframe_ with 2 additional features: _segment_stop_ and _stop_.
- _segment_stop_ indicates the trajectory segment the point belongs to.
- _stop_ indicates whether the point represents a stop.
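
To make the idea concrete, here is a minimal pandas sketch of distance-and-time stop detection. It is not PyMove's implementation: the function `label_stops`, the way a segment's duration is measured (last timestamp minus first), and the `>=` comparison are assumptions. Only the thresholds 30 and 900 come from the log printed by the library call in this notebook.

```python
import pandas as pd


def label_stops(points, max_dist=30.0, min_duration=900.0):
    """Sketch: split each trajectory where dist_to_prev exceeds max_dist
    (meters), then flag a segment as a stop when it spans at least
    min_duration seconds."""
    out = points.sort_values(["id", "datetime"]).copy()
    # A new segment starts at the first point of an id or after a long jump.
    new_segment = (out["id"] != out["id"].shift()) | (
        out["dist_to_prev"] > max_dist
    )
    out["segment_stop"] = new_segment.cumsum()
    grouped = out.groupby("segment_stop")["datetime"]
    duration = grouped.transform("max") - grouped.transform("min")
    out["stop"] = duration.dt.total_seconds() >= min_duration
    return out


# Toy trajectory: 20 minutes nearly stationary, then two long jumps.
toy = pd.DataFrame({
    "id": [1, 1, 1, 1, 1],
    "datetime": pd.to_datetime([
        "2014-10-22 10:00:00",
        "2014-10-22 10:10:00",
        "2014-10-22 10:20:00",
        "2014-10-22 10:20:10",
        "2014-10-22 10:20:20",
    ]),
    "dist_to_prev": [float("nan"), 5.0, 3.0, 200.0, 150.0],
})

labelled_stops = label_stops(toy)
print(labelled_stops[["segment_stop", "stop"]])
```

In this toy input the first three points form one segment lasting 1200 seconds, so they are flagged as a stop, while each long jump opens a new single-point segment that is not.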


In [23]:
dmst = stay_point_detection.create_or_update_move_stop_by_dist_time(dm, inplace=False)

Split trajectories by max distance between adjacent points: 30

Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance



Generating distance, time and speed features: 100%|██████████| 163/163 [00:00<00:00, 1118.87it/s]

...Reset index

..Total Time: 0.1930544376373291
...setting id as index



Generating segment_stop:   0%|          | 0/163 [00:00<?, ?it/s]

id: 14 has not point to split


Generating segment_stop:  36%|███▌      | 59/163 [00:00<00:00, 581.06it/s]

id: 71 has not point to split
id: 148 has not point to split
id: 159 has not point to split
id: 171 has not point to split


Generating segment_stop:  76%|███████▌  | 124/163 [00:00<00:00, 599.07it/s]

id: 37982 has not point to split
id: 37990 has not point to split
id: 37993 has not point to split
id: 37998 has not point to split


Generating segment_stop: 100%|██████████| 163/163 [00:00<00:00, 611.90it/s]

id: 38030 has not point to split
... Reseting index
...No trajs with only one point. (18107, 8)

Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance






...Set id as index to increase attribution performance



Generating distance, time and speed features: 100%|██████████| 163/163 [00:00<00:00, 1034.25it/s]

...Reset index

..Total Time: 0.19692063331604004
------------------------------------------


Creating or updating distance, time and speed features in meters by seconds

...Sorting by segment_stop and datetime to increase performance

...Set segment_stop as index to a higher peformance




Generating distance features: 100%|██████████| 8964/8964 [00:03<00:00, 2453.78it/s]


...Reset index...

..Total Time: 3.698
Create or update stop as True or False
...Creating stop features as True or False using 900 to time in seconds
False    16798
True      1309
Name: stop, dtype: int64

Total Time: 4.51 seconds
-----------------------------------------------------



In [24]:
dmst.head()

Unnamed: 0,segment_stop,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,stop
0,1,1,-10.939342,-37.06274,2014-09-13 07:24:32,,0.0,,,,False
1,1,1,-10.939342,-37.06274,2014-09-13 07:24:37,0.0,2.934187,2.934187,5.0,0.0,False
2,1,1,-10.939324,-37.062763,2014-09-13 07:24:42,2.934187,15.475652,18.332698,5.0,0.586837,False
3,1,1,-10.939211,-37.062843,2014-09-13 07:24:47,15.475652,30.418784,44.653001,5.0,3.09513,False
4,2,1,-10.938939,-37.062878,2014-09-13 07:24:53,,44.240489,74.240342,,,False


In [25]:
dmst[dmst.stop].head()

Unnamed: 0,segment_stop,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,stop
2248,963,26,-10.924294,-37.10479,2014-10-22 10:24:27,,29.920119,81.682837,,,True
2249,963,26,-10.924559,-37.104748,2014-10-22 10:24:37,29.920119,0.998859,29.397265,10.0,2.992012,True
2250,963,26,-10.924553,-37.10474,2014-10-22 10:24:48,0.998859,0.0,0.998859,11.0,0.090805,True
2251,963,26,-10.924553,-37.10474,2014-10-22 10:24:58,0.0,4.145351,4.145351,10.0,0.0,True
2252,963,26,-10.92452,-37.104755,2014-10-22 10:25:09,4.145351,12.908755,9.03111,11.0,0.37685,True


In [26]:
dmst[dmst.stop].shape

(1309, 11)

In [27]:
pm.visualization.plot_markers(dmst[dmst.stop])

#### _create_update_move_and_stop_by_radius_

Finds the _stop_ and _move_ points of the _dataframe_ based on a radius; if these points already exist, they are updated.

The function is described as returning the _dataframe_ with 2 additional features: _segment_stop_ and _new_label_.
- _segment_stop_ indicates the trajectory segment the point belongs to.
- _new_label_ indicates whether the point is a stop point or a move point.

In the output below, however, the label is stored in a column named _situation_, next to the distance features, and no _segment_stop_ column appears.

This function is used under the hood by the _plot_stops_ function of the _visualization_ module, so the data it generates is similar to the data generated by the visualization function.
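
A radius rule of this kind can be sketched in a few lines. The function `label_by_radius` below is hypothetical, and the radius of 0 is an assumption chosen because it reproduces the labels of the first rows displayed in this notebook; the library's actual default and comparison are not confirmed here.

```python
import numpy as np
import pandas as pd


def label_by_radius(dist_to_prev: pd.Series, radius: float = 0.0) -> pd.Series:
    """Sketch: 'stop' when the point moved at most `radius` meters from the
    previous one, 'move' otherwise; points without a distance stay missing."""
    labels = pd.Series(
        np.where(dist_to_prev > radius, "move", "stop"),
        index=dist_to_prev.index,
        dtype=object,
        name="situation",
    )
    labels[dist_to_prev.isna()] = np.nan
    return labels


# dist_to_prev values of the first five points, copied from the notebook.
dist_to_prev = pd.Series([np.nan, 0.0, 2.934187, 15.475652, 30.418784])
situation = label_by_radius(dist_to_prev)

print(situation)
```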

In [28]:
dmsr = stay_point_detection.create_update_move_and_stop_by_radius(dm,inplace=False)


Creating or updating features MOVE and STOPS...


Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance



Generating distance, time and speed features: 100%|██████████| 163/163 [00:00<00:00, 1017.37it/s]

...Reset index

..Total Time: 0.19951319694519043






....There are 2580 stops to this parameters



In [29]:
dmsr.head()

Unnamed: 0,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,situation
0,1,-10.939342,-37.06274,2014-09-13 07:24:32,,0.0,,
1,1,-10.939342,-37.06274,2014-09-13 07:24:37,0.0,2.934187,2.934187,stop
2,1,-10.939324,-37.062763,2014-09-13 07:24:42,2.934187,15.475652,18.332698,move
3,1,-10.939211,-37.062843,2014-09-13 07:24:47,15.475652,30.418784,44.653001,move
4,1,-10.938939,-37.062878,2014-09-13 07:24:53,30.418784,44.240489,74.240342,move


In [30]:
dmsr[dmsr.situation=='stop'].shape

(2580, 8)

In [31]:
vsl.plot_markers(dmsr[dmsr.situation=='stop'])

### _compression_

We will use the functions implemented in the _compression_ module, so we import it first.

In [32]:
from pymove.preprocessing import compression

#### _compress_segment_stop_to_point_

Compresses the trajectories using the stay points of the _dataframe_. Each stop segment is compressed to a representative point given by _lat_mean_ and _lon_mean_.

Returns the _dataframe_ with 4 additional features: _segment_stop_, _stop_, _lat_mean_ and _lon_mean_.
- _segment_stop_ indicates the trajectory segment the point belongs to.
- _stop_ indicates whether the point represents a stop.
- _lat_mean_ and _lon_mean_:
 - with the default option, _lat_mean_ and _lon_mean_ are set to the point that repeats most often within the segment.
 - with the centroid option, _lat_mean_ and _lon_mean_ are set to the centroid of all points in the segment.
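
The two options for _lat_mean_ and _lon_mean_ can be illustrated with pandas on the five stop points of segment 963 shown earlier in this notebook. The function `segment_representative` and its `method` values are hypothetical names used only for this sketch; they are not the library's parameters.

```python
import pandas as pd


def segment_representative(segment: pd.DataFrame, method: str = "point"):
    """Sketch: return (lat, lon) representing a stop segment.

    'point' picks the most repeated coordinate pair;
    'centroid' averages all points of the segment.
    """
    if method == "centroid":
        return (segment["lat"].mean(), segment["lon"].mean())
    counts = segment.groupby(["lat", "lon"]).size()
    return counts.idxmax()


# Stop points of segment 963, copied from the notebook output.
segment = pd.DataFrame({
    "lat": [-10.924294, -10.924559, -10.924553, -10.924553, -10.92452],
    "lon": [-37.10479, -37.104748, -37.10474, -37.10474, -37.104755],
})

most_repeated = segment_representative(segment)
centroid = segment_representative(segment, method="centroid")

print(most_repeated)
print(centroid)
```

For this segment the most repeated pair is the one that occurs twice in the sample, which agrees with the _lat_mean_ and _lon_mean_ values the notebook reports for segment 963.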

In [33]:
dmcp = compression.compress_segment_stop_to_point(dm)

Split trajectories by max distance between adjacent points: 30

Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance



Generating distance, time and speed features: 100%|██████████| 163/163 [00:00<00:00, 874.44it/s]


...Reset index

..Total Time: 0.2390732765197754
...setting id as index


Generating segment_stop:  24%|██▍       | 39/163 [00:00<00:00, 389.56it/s]

id: 14 has not point to split
id: 71 has not point to split
id: 148 has not point to split


Generating segment_stop:  58%|█████▊    | 95/163 [00:00<00:00, 428.37it/s]

id: 159 has not point to split
id: 171 has not point to split


Generating segment_stop: 100%|██████████| 163/163 [00:00<00:00, 529.01it/s]

id: 37982 has not point to split
id: 37990 has not point to split
id: 37993 has not point to split
id: 37998 has not point to split
id: 38030 has not point to split
... Reseting index
...No trajs with only one point. (18107, 8)

Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance




Generating distance, time and speed features: 100%|██████████| 163/163 [00:00<00:00, 1091.33it/s]

...Reset index

..Total Time: 0.1874065399169922
------------------------------------------


Creating or updating distance, time and speed features in meters by seconds

...Sorting by segment_stop and datetime to increase performance

...Set segment_stop as index to a higher peformance




Generating distance features: 100%|██████████| 8964/8964 [00:02<00:00, 3046.43it/s]


...Reset index...

..Total Time: 2.982
Create or update stop as True or False
...Creating stop features as True or False using 900 to time in seconds
False    16798
True      1309
Name: stop, dtype: int64

Total Time: 3.82 seconds
-----------------------------------------------------

...setting mean to lat and lon...
...get only segments stop...


Generating segment_stop and stop: 100%|██████████| 8/8 [00:00<00:00, 14.00it/s]

...Dropping 1293 points...
...Shape_before: 18107
...Current shape: 16814
-----------------------------------------------------






In [34]:
dmcp.head()

Unnamed: 0,segment_stop,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,stop,lat_mean,lon_mean
0,1,1,-10.939342,-37.06274,2014-09-13 07:24:32,,0.0,,,,False,,
1,1,1,-10.939342,-37.06274,2014-09-13 07:24:37,0.0,2.934187,2.934187,5.0,0.0,False,,
2,1,1,-10.939324,-37.062763,2014-09-13 07:24:42,2.934187,15.475652,18.332698,5.0,0.586837,False,,
3,1,1,-10.939211,-37.062843,2014-09-13 07:24:47,15.475652,30.418784,44.653001,5.0,3.09513,False,,
4,2,1,-10.938939,-37.062878,2014-09-13 07:24:53,,44.240489,74.240342,,,False,,


In [35]:
dmcp[dmcp.stop].shape

(16, 13)

In [36]:
dmcp[dmcp.stop]

Unnamed: 0,segment_stop,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,stop,lat_mean,lon_mean
2248,963,26,-10.924294,-37.10479,2014-10-22 10:24:27,,29.920119,81.682837,,,True,-10.924553,-37.10474
2318,963,26,-10.92654,-37.101929,2014-10-22 10:45:35,3.078025,36.841701,34.087703,10.0,0.307802,True,-10.924553,-37.10474
2649,1045,27,-10.924294,-37.10479,2014-10-22 10:24:27,,29.920119,81.682837,,,True,-10.924553,-37.10474
2719,1045,27,-10.92654,-37.101929,2014-10-22 10:45:35,3.078025,36.841701,34.087703,10.0,0.307802,True,-10.924553,-37.10474
8357,4291,134,-10.896932,-37.079685,2015-02-14 10:28:37,,7.360138,,,,True,-10.897094,-37.079708
8660,4291,134,-10.897087,-37.07972,2015-02-14 10:55:29,0.0,,,5.0,0.0,True,-10.897094,-37.079708
8661,4292,135,-10.896932,-37.079685,2015-02-14 10:28:37,,7.360138,,,,True,-10.897094,-37.079708
8964,4292,135,-10.897087,-37.07972,2015-02-14 10:55:29,0.0,,,5.0,0.0,True,-10.897094,-37.079708
10117,5018,147,-10.922647,-37.047188,2015-02-23 05:54:44,,14.168456,40.897641,,,True,-10.922691,-37.047279
10322,5018,147,-10.922628,-37.047073,2015-02-23 06:12:56,4.325883,,,5.0,0.865177,True,-10.922691,-37.047279


In [37]:
vsl.plot_markers(dmcp[dmcp.stop])

#### _compress_segment_stop_to_point_optimizer_

Compresses the trajectories using the stop points of the _dataframe_. Each stop segment is compressed to a representative point given by _lat_mean_ and _lon_mean_.

Returns the _dataframe_ with 4 additional features: _segment_stop_, _stop_, _lat_mean_ and _lon_mean_.
- _segment_stop_ indicates the trajectory segment the point belongs to.
- _stop_ indicates whether the point represents a stop.
- _lat_mean_ and _lon_mean_:
 - with the default option, _lat_mean_ and _lon_mean_ are set to the point that repeats most often within the segment.
 - with the centroid option, _lat_mean_ and _lon_mean_ are set to the centroid of all points in the segment.
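
The compression logs in this notebook report 1293 dropped points out of 1309 stop points, leaving 16 rows for 8 stop segments, which is consistent with two points being kept per stop segment. The sketch below assumes those two points are the first and the last of each segment; `keep_segment_endpoints` is a hypothetical helper, not the library's code.

```python
import pandas as pd


def keep_segment_endpoints(points: pd.DataFrame) -> pd.DataFrame:
    """Sketch: keep every move point, but only the first and last point
    of each stop segment."""
    ordered = points.sort_values(["segment_stop", "datetime"])
    by_segment = ordered.groupby("segment_stop")
    from_start = by_segment.cumcount()
    from_end = by_segment.cumcount(ascending=False)
    is_endpoint = (from_start == 0) | (from_end == 0)
    return ordered[~ordered["stop"] | is_endpoint]


# Toy data: one move segment with 2 points, one stop segment with 4 points.
toy = pd.DataFrame({
    "segment_stop": [1, 1, 2, 2, 2, 2],
    "datetime": pd.to_datetime([
        "2014-10-22 10:24:07",
        "2014-10-22 10:24:17",
        "2014-10-22 10:24:27",
        "2014-10-22 10:24:37",
        "2014-10-22 10:24:48",
        "2014-10-22 10:24:58",
    ]),
    "stop": [False, False, True, True, True, True],
})

compressed = keep_segment_endpoints(toy)
print(compressed)
```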

In [38]:
dmco = compression.compress_segment_stop_to_point_optimizer(dm)

Split trajectories by max distance between adjacent points: 30

Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance



Generating distance, time and speed features: 100%|██████████| 163/163 [00:00<00:00, 723.52it/s]

...Reset index

..Total Time: 0.2732658386230469
...setting id as index



Generating segment_stop:  42%|████▏     | 68/163 [00:00<00:00, 672.15it/s]

id: 14 has not point to split
id: 71 has not point to split
id: 148 has not point to split
id: 159 has not point to split
id: 171 has not point to split


Generating segment_stop:  76%|███████▌  | 124/163 [00:00<00:00, 633.11it/s]

id: 37982 has not point to split
id: 37990 has not point to split
id: 37993 has not point to split
id: 37998 has not point to split

Generating segment_stop: 100%|██████████| 163/163 [00:00<00:00, 595.43it/s]



id: 38030 has not point to split
... Reseting index
...No trajs with only one point. (18107, 8)

Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance



Generating distance, time and speed features: 100%|██████████| 163/163 [00:00<00:00, 973.81it/s]

...Reset index

..Total Time: 0.21719932556152344
------------------------------------------


Creating or updating distance, time and speed features in meters by seconds

...Sorting by segment_stop and datetime to increase performance






...Set segment_stop as index to a higher peformance



Generating distance features: 100%|██████████| 8964/8964 [00:03<00:00, 2715.50it/s]


...Reset index...

..Total Time: 3.348
Create or update stop as True or False
...Creating stop features as True or False using 900 to time in seconds
False    16798
True      1309
Name: stop, dtype: int64

Total Time: 4.22 seconds
-----------------------------------------------------

...setting mean to lat and lon...
...get only segments stop...


Generating segment_stop and stop: 100%|██████████| 8/8 [00:00<00:00, 16.46it/s]

...Dropping 1293 points...
...Shape_before: 18107
...Current shape: 16814
-----------------------------------------------------






In [39]:
dmco.head()

Unnamed: 0,segment_stop,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,stop,lat_mean,lon_mean
0,1,1,-10.939342,-37.06274,2014-09-13 07:24:32,,0.0,,,,False,,
1,1,1,-10.939342,-37.06274,2014-09-13 07:24:37,0.0,2.934187,2.934187,5.0,0.0,False,,
2,1,1,-10.939324,-37.062763,2014-09-13 07:24:42,2.934187,15.475652,18.332698,5.0,0.586837,False,,
3,1,1,-10.939211,-37.062843,2014-09-13 07:24:47,15.475652,30.418784,44.653001,5.0,3.09513,False,,
4,2,1,-10.938939,-37.062878,2014-09-13 07:24:53,,44.240489,74.240342,,,False,,


In [40]:
dmco[dmco.stop].shape

(16, 13)

In [41]:
vsl.plot_markers(dmco[dmco.stop])

This was just a brief overview of some of the functions already available in PyMove for detecting _stop/move_ points in a trajectory.