
Rerunning a cell with a DataFrame causes a memory leak #1391

Open
@kelarMai

Description


Environment

Using VS Code Remote-SSH to connect to a Debian server and running Jupyter.

debian=12
vscode=1.99.0
python=3.10
ipykernel=6.29.5
pandas=2.2.3

Reproduce

Create a big DataFrame:

import pandas as pd
import numpy as np

np.random.seed(0)
num_rows = 10_000_000
num_cols = 10
# create random integer data
data = np.random.randint(0, 100, size=(num_rows, num_cols))
df = pd.DataFrame(data, columns=[f'col_{i}' for i in range(num_cols)])
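
For reference, the raw data is num_rows * num_cols * 8 bytes (int64) = 800,000,000 bytes, roughly 763 MiB, which matches the per-rerun growth shown below. A quick sanity check with pandas' built-in accounting:

print(df.memory_usage(deep=True).sum() / 2**20)  # ~763 MiB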

Monitor the memory usage:

%load_ext ipython_memory_usage
%imu_start

When the cell below is rerun, the memory used keeps increasing:

df_temp = df.copy(deep=True)
df_temp.head()

On my machine, the results look like:

[Out] In [4] used 763.6 MiB RAM in 0.55s (system mean cpu 40%, single max cpu 100%), peaked 0.0 MiB above final usage, current RAM usage now 1632.0 MiB

[Out] In [5] used 763.0 MiB RAM in 0.55s (system mean cpu 19%, single max cpu 100%), peaked 0.0 MiB above final usage, current RAM usage now 2394.9 MiB

[Out] In [6] used 763.1 MiB RAM in 0.55s (system mean cpu 25%, single max cpu 100%), peaked 0.0 MiB above final usage, current RAM usage now 3158.0 MiB
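
Each rerun grows usage by almost exactly one full copy of the data (~763 MiB). That is consistent with IPython's output cache: the last expression of the cell, df_temp.head(), is stored in Out (and _, __, ___), and as long as that cached result still references the previous copy's data, rebinding df_temp cannot free it. A quick check in the same session (a sketch based on IPython's standard Out history dict, not part of the original report):

print(len(Out))              # grows by one after each rerun of the cell above
print(type(Out[max(Out)]))   # pandas DataFrame: the cached head() result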

Attempts to solve

After many tries, I found that the trigger is df_temp.head(). If only df_temp = df.copy(deep=True) is rerun, the memory used does not increase.

If the code is changed to

df_temp = df.copy(deep=True)
df_temp.head().copy(deep=True)

then rerunning it does not increase memory either.
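
This pattern would be explained by head() returning a view: df_temp.head() slices the first five rows without copying, so the small cached result still references df_temp's full data block, while the extra .copy(deep=True) gives the output cache a tiny independent object instead. A sketch of how to verify the sharing (np.shares_memory is a standard NumPy helper; the view behavior is my assumption for this pandas version without copy-on-write enabled):

import numpy as np

h = df_temp.head()
# True means the 5-row result still references df_temp's full block,
# so caching h keeps the whole ~763 MiB alive:
print(np.shares_memory(h['col_0'].values, df_temp['col_0'].values))

h2 = df_temp.head().copy(deep=True)
# False: the deep copy owns its own (tiny) data
print(np.shares_memory(h2['col_0'].values, df_temp['col_0'].values))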

I have tried

import gc
gc.collect()  # force a full garbage-collection pass

or

from IPython.display import clear_output
clear_output()

but neither frees the memory.
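
If the output-cache explanation above is right, gc.collect() and clear_output() cannot help, because the cached results are still strongly referenced by Out. A possible workaround (a hedged suggestion, not from the original report; both are standard IPython magics for exactly this situation):

%reset -f out    # clear the output cache (Out, _, __, ___)
%xdel df_temp    # delete a variable from everywhere IPython references it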

Another similar issue
