Description
Environment
Using VS Code Remote-SSH to connect to a Debian server and running Jupyter notebooks there.
Debian 12
VS Code: 1.99.0
Python: 3.10
ipykernel: 6.29.5
pandas: 2.2.3
Steps to reproduce
Create a big DataFrame:
import pandas as pd
import numpy as np
np.random.seed(0)
num_rows = 10000000
num_cols = 10
# generate a 10,000,000 x 10 array of random integers
data = np.random.randint(0, 100, size=(num_rows, num_cols))
df = pd.DataFrame(data, columns=[f'col_{i}' for i in range(num_cols)])
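For scale: 10,000,000 rows x 10 int64 columns at 8 bytes each is 800,000,000 bytes, about 763 MiB, which matches the per-rerun increase shown below. pandas' own accounting confirms it:

# 10_000_000 * 10 * 8 bytes = 800_000_000 B ≈ 762.9 MiB
print(df.memory_usage(index=False).sum() / 2**20)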
Monitor memory usage:
%load_ext ipython_memory_usage
%imu_start
When the cell below is rerun, memory usage keeps increasing:
df_temp = df.copy(deep=True)
df_temp.head()
On my machine, the output looks like this; each rerun adds roughly 763 MiB, i.e. one full copy of df:
[Out] In [4] used 763.6 MiB RAM in 0.55s (system mean cpu 40%, single max cpu 100%), peaked 0.0 MiB above final usage, current RAM usage now 1632.0 MiB
[Out] In [5] used 763.0 MiB RAM in 0.55s (system mean cpu 19%, single max cpu 100%), peaked 0.0 MiB above final usage, current RAM usage now 2394.9 MiB
[Out] In [6] used 763.1 MiB RAM in 0.55s (system mean cpu 25%, single max cpu 100%), peaked 0.0 MiB above final usage, current RAM usage now 3158.0 MiB
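My reading of this (an assumption, not confirmed in this report): because df_temp.head() is the last expression in the cell, IPython stores its result in the output-history cache (the Out dict and the _, __, ___ shortcuts), so every rerun leaves one more cached head alive. A quick way to inspect the cache from the same kernel:

# List what IPython's output cache is holding; any DataFrame here stays
# referenced (and its memory pinned) until the cache entry is dropped.
for n, obj in Out.items():
    print(n, type(obj).__name__)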
Attempts to solve
After many tries, I found that the main cause is df_temp.head().
If I rerun only
df_temp = df.copy(deep=True)
the memory used does not increase.
If I change the code to
df_snapshot_test = df_snapshot.copy(deep=True)
df_snapshot_test.head().copy(deep=True)
rerunning does not increase memory either.
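This pattern is consistent with head() returning a frame that shares the parent's underlying block memory (in pre-Copy-on-Write pandas, a 5-row slice is a view of the full array), so a cached head pins the whole 763 MiB, while copy(deep=True) gives the head its own small buffer. A sketch to check the sharing with np.shares_memory (my illustration, not from the report):

import numpy as np
import pandas as pd

big = pd.DataFrame(np.zeros((1_000_000, 10)))
head_view = big.head()                  # 5 rows, but may share big's buffer
head_copy = big.head().copy(deep=True)  # independent 5x10 buffer

print(np.shares_memory(head_view.to_numpy(), big.to_numpy()))  # True here
print(np.shares_memory(head_copy.to_numpy(), big.to_numpy()))  # False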
I have tried
import gc
gc.collect()
and
from IPython.display import clear_output
but neither frees the memory.
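Presumably gc.collect() cannot help because the cached heads are still referenced by Out and the _ shortcuts, so they are not garbage, and clear_output() only clears the display, not the cache. Clearing the output history itself should drop those references (a sketch; %reset is standard IPython, but that it resolves this exact leak is my assumption):

%reset -f out   # clear the output-history cache (Out, _, __, ___)
import gc
gc.collect()

Output caching can also be disabled up front with c.InteractiveShell.cache_size = 0 in the IPython configuration.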