Bug Report: MemoryError: Unable to allocate 20.0 PiB for an array with shape (2817560004071633,) and data type float64 #1435

Open
3 tasks done
BioComSoftware opened this issue Sep 7, 2023 · 5 comments

Comments

@BioComSoftware

Current Behaviour

Windows 10 Enterprise running as a VM in vSphere
Jupyter Notebook: 6.5.2
pandas==1.5.3
Python: 3.11.1
ydata-profiling==4.5.1

  • I have a dataframe with 76 columns, but I am only running 10 rows of data through ydata-profiling.
  • All of the data has been intentionally converted to datetime, float, dict, or string.
  • There are two very large dict fields, although the error message doesn't imply they are the issue.

After preparing the DF, I run ...

profile = ProfileReport(peaks_method_df.head(10), title="Profiling Report")
profile.to_notebook_iframe()

It fails with the error...

MemoryError: Unable to allocate 20.0 PiB for an array with shape (2817560004071633,) and data type float64
  • If I run it with only 5 rows, it succeeds.
  • I have increased the pagefile to 70GB, running on a separate disk.
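
For scale, the size in the error message follows directly from the reported shape, since each float64 element takes 8 bytes. A quick check of that arithmetic (variable names are illustrative only):

```python
# Shape reported in the MemoryError above.
n_elements = 2817560004071633
# A float64 value occupies 8 bytes.
bytes_per_float64 = 8

total_bytes = n_elements * bytes_per_float64
total_pib = total_bytes / 2**50  # 1 PiB = 2**50 bytes

print(f"{total_bytes} bytes = {total_pib:.1f} PiB")
```

A request of this size is many orders of magnitude larger than a 70GB pagefile, so enlarging the pagefile alone could not satisfy it.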

Expected Behaviour

It should complete with the report fields, which it does if I run only 5 rows instead of 10.

Data Description

Pandas dataframe:

  • 76 columns, 10 rows
  • data is only datetime, float, dict, or string.
  • two very large dict fields

One row of data (i.e. peaks_method_df.iloc[0].to_dict()). Ten identical rows can be recreated from it for testing.

{'revision': '2022-1110-2215-37418',
'name': 'TPH_N2_Guard_221110',
'methodtype': 'AcquisitionMethod',
'created': Timestamp('2022-11-10 22:15:37.416870+0000', tz='UTC'),
'createdby': 'USERNAME (COMPANYNAME USA) (COMPANYNAME\lfs-USERID)',
'lastmodified': Timestamp('2022-11-11 01:57:23.545328+0000', tz='UTC'),
'reportableinformation_json': {'Id': 'ce8196f4-1e88-474e-b945-76e202135dfe',
'Ver': 0,
'MethodDescription': {'ID': None,
'Name': 'Method',
'Tables': [],
'Sections': [{'ID': 'Method_Information',
'Name': 'Method Information',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'ID_Method_Original',
'Name': 'Last Saved As',
'Unit': None,
'Value': '/FILENAME.amx'},
{'ID': 'ID_Method_Modified',
'Name': 'Modified',
'Unit': None,
'Value': '2022-11-10 20:57:23-05:00'},
{'ID': 'ID_Method_ModifiedBy',
'Name': 'Modifier',
'Unit': None,
'Value': 'USERNAME (COMPANYNAME USA) (COMPANYNAME\lfs-USERID)'},
{'ID': 'ID_Method_Created',
'Name': 'Created',
'Unit': None,
'Value': '2022-11-10 17:15:37-05:00'},
{'ID': 'ID_Method_CreatedBy',
'Name': 'Creator',
'Unit': None,
'Value': 'USERNAME (COMPANYNAME USA) (COMPANYNAME\lfs-USERID)'},
{'ID': 'ID_Method_Description',
'Name': 'Description',
'Unit': None,
'Value': ''},
{'ID': 'ID_Method_Version',
'Name': 'Version',
'Unit': None,
'Value': '2022-1110-2215-37418'},
{'ID': 'ID_Method_ApprovalState',
'Name': 'Method Status',
'Unit': None,
'Value': 'Generic'}]},
{'ID': 'GCMethod',
'Name': 'GC',
'Tables': [],
'Sections': [{'ID': 'SUMMARY',
'Name': 'GC Summary',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'RUN_TIME',
'Name': 'Run Time',
'Unit': 'min',
'Value': '2'},
{'ID': 'POSTRUN_TIME',
'Name': 'Post Run Time',
'Unit': 'min',
'Value': '0'},
{'ID': 'CooldownMode',
'Name': 'Cycle Time Optimization',
'Unit': '',
'Value': 'Fast Cool (default)'}]},
{'ID': 'OVEN',
'Name': 'Oven',
'Tables': [],
'Sections': [{'ID': 'TEMPERATURE',
'Name': 'Temperature',
'Tables': [{'ID': 'RAMP',
'Name': 'Program',
'Rows': [{'Parameters': [{'ID': 'RATE',
'Name': 'Rate',
'Unit': '°C/min',
'Value': '250'},
{'ID': 'NEXT', 'Name': 'Value', 'Unit': '°C', 'Value': '350'},
{'ID': 'HOLD',
'Name': 'Hold Time',
'Unit': 'min',
'Value': '0.66'}]}]}],
'Sections': [],
'Parameters': [{'ID': 'STATE',
'Name': 'Setpoint',
'Unit': None,
'Value': 'On'},
{'ID': 'INITIAL', 'Name': '(Initial)', 'Unit': '°C', 'Value': '40'},
{'ID': 'HOLD', 'Name': 'Hold Time', 'Unit': 'min', 'Value': '0.1'},
{'ID': 'POSTRUN', 'Name': 'Post Run', 'Unit': '°C', 'Value': '0'}]}],
'Parameters': [{'ID': 'EQUIL_TIME',
'Name': 'Equilibration Time',
'Unit': 'min',
'Value': '0.01'},
{'ID': 'MAX_TEMPERATURE',
'Name': 'Max Temperature',
'Unit': '°C',
'Value': '400'},
{'ID': 'MaxTempOverride',
'Name': 'Maximum Temperature Override',
'Unit': '',
'Value': 'Disabled'}]},
{'ID': 'ALS',
'Name': 'ALS',
'Tables': [],
'Sections': [{'ID': 'FRONTINJ',
'Name': 'Injector',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'SYRINGE_SIZE',
'Name': 'Syringe Size',
'Unit': 'μL',
'Value': '10'},
{'ID': 'INJ_VOL',
'Name': 'Injection Volume',
'Unit': 'μL',
'Value': '1'},
{'ID': 'SOLV_A_WASH_PRE',
'Name': 'Solvent A Washes (PreInj)',
'Unit': '',
'Value': '2'},
{'ID': 'SOLV_A_WASH_POST',
'Name': 'Solvent A Washes (PostInj)',
'Unit': '',
'Value': '2'},
{'ID': 'SOLV_A_VOL',
'Name': 'Solvent A Volume',
'Unit': 'μL',
'Value': '2'},
{'ID': 'SOLV_B_WASH_PRE',
'Name': 'Solvent B Washes (PreInj)',
'Unit': '',
'Value': '2'},
{'ID': 'SOLV_B_WASH_POST',
'Name': 'Solvent B Washes (PostInj)',
'Unit': '',
'Value': '2'},
{'ID': 'SOLV_B_VOL',
'Name': 'Solvent B Volume',
'Unit': 'μL',
'Value': '2'},
{'ID': 'SAMP_WASH',
'Name': 'Sample Washes',
'Unit': '',
'Value': '0'},
{'ID': 'SAMP_WASH_VOL',
'Name': 'Sample Wash Volume',
'Unit': 'μL',
'Value': '2'},
{'ID': 'SAMP_PUMPS',
'Name': 'Sample Pumps',
'Unit': '',
'Value': '5'},
{'ID': 'DWELL_TIME_PRE',
'Name': 'Dwell Time (PreInj)',
'Unit': 'min',
'Value': '0'},
{'ID': 'DWELL_TIME_POST',
'Name': 'Dwell Time (PostInj)',
'Unit': 'min',
'Value': '0'},
{'ID': 'SOLV_DRAW_SPEED',
'Name': 'Solvent Wash Draw Speed',
'Unit': 'μL/min',
'Value': '300'},
{'ID': 'SOLV_DISP_SPEED',
'Name': 'Solvent Wash Dispense Speed',
'Unit': 'μL/min',
'Value': '3000'},
{'ID': 'SAMP_DRAW_SPEED',
'Name': 'Sample Wash Draw Speed',
'Unit': 'μL/min',
'Value': '300'},
{'ID': 'SAMP_DISP_SPEED',
'Name': 'Sample Wash Dispense Speed',
'Unit': 'μL/min',
'Value': '3000'},
{'ID': 'INJ_DISP_SPEED',
'Name': 'Injection Dispense Speed',
'Unit': 'μL/min',
'Value': '6000'},
{'ID': 'VISCOSITY_DELAY',
'Name': 'Viscosity Delay',
'Unit': 'sec',
'Value': '2'},
{'ID': 'SAMP_DEPTH',
'Name': 'Sample Depth',
'Unit': '',
'Value': 'Disabled'},
{'ID': 'INJ_MODE',
'Name': 'Injection Type',
'Unit': '',
'Value': 'Standard'},
{'ID': 'L1_AIRGAP',
'Name': 'L1 Airgap',
'Unit': 'μL',
'Value': '0.2'},
{'ID': 'SOLV_WASHMODE',
'Name': 'Solvent Wash Mode',
'Unit': '',
'Value': 'A, B'}]},
{'ID': 'SAMP_OVERLAP',
'Name': 'Sample Overlap',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'MODE',
'Name': 'Mode',
'Unit': '',
'Value': 'Sample overlap is not enabled'}]}],
'Parameters': [{'ID': 'ALS_ERRORS',
'Name': 'ALS Errors',
'Unit': '',
'Value': 'Pause for user interaction'}]},
{'ID': 'INLET1',
'Name': 'SS Inlet N2',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'MODE',
'Name': 'Mode',
'Unit': '',
'Value': 'Split'},
{'ID': 'HEATER', 'Name': 'Heater', 'Unit': '°C', 'Value': 'On 400'},
{'ID': 'PRESSURE',
'Name': 'Pressure',
'Unit': 'psi',
'Value': 'On 41.914'},
{'ID': 'TOTAL_FLOW',
'Name': 'Total Flow',
'Unit': 'mL/min',
'Value': 'On 75.387'},
{'ID': 'SEPTUM_PURGE_FLOW',
'Name': 'Septum Purge Flow',
'Unit': 'mL/min',
'Value': 'On 3'},
{'ID': 'INLET_LEAK_TEST',
'Name': 'Pre-Run Flow Test',
'Unit': '',
'Value': 'Off'},
{'ID': 'GAS_SAVER', 'Name': 'Gas Saver', 'Unit': '', 'Value': 'Off'},
{'ID': 'SPLIT_RATIO',
'Name': 'Split Ratio',
'Unit': ':1',
'Value': '2'},
{'ID': 'SPLIT_FLOW',
'Name': 'Split Flow',
'Unit': 'mL/min',
'Value': '48.258'},
{'ID': 'LINER',
'Name': 'Liner',
'Unit': '',
'Value': 'COMPANYNAME 5190-5105: 870 μL (Splitless, UI, Mid-Frit Liner)'}]},
{'ID': 'COLUMN',
'Name': 'Column',
'Tables': [],
'Sections': [{'ID': 'COLUMN1',
'Name': 'Column #1',
'Tables': [],
'Sections': [{'ID': 'FLOW',
'Name': 'Flow',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'STATE',
'Name': 'Setpoint',
'Unit': None,
'Value': 'On'},
{'ID': 'INITIAL',
'Name': '(Initial)',
'Unit': 'mL/min',
'Value': '24.129'},
{'ID': 'POSTRUN',
'Name': 'Post Run',
'Unit': 'mL/min',
'Value': '3'}]}],
'Parameters': [{'ID': 'INFO',
'Name': 'Column Information',
'Unit': None,
'Value': 'COMPANYNAME 123-57J1-INT: US19250232'},
{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': None,
'Value': 'DB-5HT'},
{'ID': 'TEMPERATURERANGE',
'Name': 'Temperature Range',
'Unit': None,
'Value': '-60 °C—400 °C (400 °C)'},
{'ID': 'DIMENSIONS',
'Name': 'Dimensions',
'Unit': None,
'Value': '5 m x 320 μm x 0.1 μm (Uncalibrated)'},
{'ID': 'HeatedBy', 'Name': 'Heater', 'Unit': None, 'Value': 'Oven'},
{'ID': 'COLUMN_LOCKED',
'Name': 'Column lock',
'Unit': '',
'Value': 'Locked'},
{'ID': 'IN', 'Name': 'In', 'Unit': '', 'Value': 'SS Inlet N2'},
{'ID': 'OUT', 'Name': 'Out', 'Unit': '', 'Value': 'Detector 1 FID'},
{'ID': 'INITIAL', 'Name': '(Initial)', 'Unit': '°C', 'Value': '40'},
{'ID': 'PRESSURE',
'Name': 'Pressure',
'Unit': 'psi',
'Value': '41.914'},
{'ID': 'FLOW', 'Name': 'Flow', 'Unit': 'mL/min', 'Value': '24.129'},
{'ID': 'VELOCITY',
'Name': 'Average Velocity',
'Unit': 'cm/sec',
'Value': '180.3'},
{'ID': 'HOLDUPTIME',
'Name': 'Holdup Time',
'Unit': 'min',
'Value': '0.046219'},
{'ID': 'HOLDUPTIME_OVERALL',
'Name': 'System Holdup',
'Unit': 'min',
'Value': '0.052099'},
{'ID': 'CONTROL_MODE',
'Name': 'Control Mode',
'Unit': '',
'Value': 'Constant Flow'}]}],
'Parameters': [{'ID': 'ColumnOutletPressure',
'Name': 'Column Outlet Pressure',
'Unit': 'psi',
'Value': '0'}]},
{'ID': 'Detector 1',
'Name': 'Detector 1 FID',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'MAKEUP',
'Name': 'Makeup',
'Unit': '',
'Value': 'N2'},
{'ID': 'Heater', 'Name': 'Heater', 'Unit': '°C', 'Value': 'On 350'},
{'ID': 'H2 Flow',
'Name': 'H2 Flow',
'Unit': 'mL/min',
'Value': 'On 30'},
{'ID': 'Air Flow',
'Name': 'Air Flow',
'Unit': 'mL/min',
'Value': 'On 400'},
{'ID': 'Makeup Flow (Combined)',
'Name': 'Makeup Flow (Combined)',
'Unit': 'mL/min',
'Value': 'On 30'},
{'ID': 'CARRIER_GAS_CORRECTION',
'Name': 'Carrier Gas Flow Correction',
'Unit': '',
'Value': 'Column + Makeup = Constant'},
{'ID': 'FLAME', 'Name': 'Flame', 'Unit': '', 'Value': 'On'},
{'ID': 'BR_BLANK_RUN_SETPOINTS',
'Name': 'Blank Evaluation Setpoints',
'Unit': '',
'Value': ''},
{'ID': 'BR_PERFORM_BLANK_RUN_TEST',
'Name': 'Perform Blank Evaluation Test',
'Unit': '',
'Value': 'Off'}]},
{'ID': 'AQM_COMPONENTS',
'Name': 'COMPANYNAME Intuvo 9000 GC Components',
'Tables': [],
'Sections': [{'ID': 'AQM_TRAP',
'Name': 'Guard Chip',
'Tables': [],
'Sections': [{'ID': 'TEMPERATURE',
'Name': 'Temperature',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'STATE',
'Name': 'Setpoint',
'Unit': None,
'Value': 'On'},
{'ID': 'INITIAL',
'Name': '(Initial)',
'Unit': '°C',
'Value': '375'},
{'ID': 'POSTRUN',
'Name': 'Post Run',
'Unit': '°C',
'Value': '25'}]}],
'Parameters': [{'ID': 'GUARD',
'Name': 'Model Number',
'Unit': '',
'Value': 'G4587-60565 (Intuvo SSI Guard Chip)'},
{'ID': 'Track Oven',
'Name': 'Track Oven',
'Unit': '',
'Value': 'Off'},
{'ID': 'MAX_TEMPERATURE',
'Name': 'Max Temperature',
'Unit': '°C',
'Value': '450'}]},
{'ID': 'AQM_ISOTHERMALSETPOINTS',
'Name': 'Isothermal Setpoints',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'Bus Temperature',
'Name': 'Bus Temperature',
'Unit': '°C',
'Value': 'On 350'},
{'ID': 'Use Default Bus Temperature',
'Name': 'Use Default Bus Temperature',
'Unit': '',
'Value': 'On'},
{'ID': 'Detector 1 Tail',
'Name': 'Detector 1 Tail',
'Unit': '°C',
'Value': 'On 350'}]}],
'Parameters': []},
{'ID': 'CA_HEADING',
'Name': 'Detector Evaluation',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'CA_PERFORM_ANALYSIS',
'Name': 'Perform Detector Evaluation Test',
'Unit': '',
'Value': 'Off'}]},
{'ID': 'PEAK_EVALUATION',
'Name': 'Peak Evaluation',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'PE_PerformAnalysis',
'Name': 'Perform Peak Evaluation Test',
'Unit': '',
'Value': 'Off'}]},
{'ID': 'VALVE1',
'Name': 'Valve 1',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'Type',
'Name': 'Type',
'Unit': '',
'Value': 'Gas Sampling Valve'},
{'ID': 'GSV_LOOP_VOLUME',
'Name': 'GSV Loop Volume',
'Unit': 'mL',
'Value': '0.1'},
{'ID': 'LOAD_TIME',
'Name': 'Load Time',
'Unit': 'min',
'Value': '0.5'},
{'ID': 'INJECT_TIME',
'Name': 'Inject Time',
'Unit': 'min',
'Value': '0.5'}]},
{'ID': 'VALVE_BOX',
'Name': 'Valve Box',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'HEATER',
'Name': 'Heater',
'Unit': '°C',
'Value': 'On 40'}]},
{'ID': 'SIGNALS',
'Name': 'Signals',
'Tables': [],
'Sections': [{'ID': 'SIGNAL1',
'Name': 'Signal #1: Detector 1 Signal',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': '',
'Value': 'Detector 1 Signal'},
{'ID': 'DETAILS',
'Name': 'Details',
'Unit': '',
'Value': 'Detector 1 Signal (FID)'},
{'ID': 'SAVE', 'Name': 'Save', 'Unit': '', 'Value': 'On'},
{'ID': 'DATA_RATE',
'Name': 'Data Rate',
'Unit': 'Hz',
'Value': '50'}]},
{'ID': 'SIGNAL2',
'Name': 'Signal #2: Guard Chip',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': '',
'Value': 'Guard Chip'},
{'ID': 'DETAILS',
'Name': 'Details',
'Unit': '',
'Value': 'Temperature: Actual'},
{'ID': 'SAVE', 'Name': 'Save', 'Unit': '', 'Value': 'On'},
{'ID': 'DATA_RATE',
'Name': 'Data Rate',
'Unit': 'Hz',
'Value': '50'}]},
{'ID': 'SIGNAL3',
'Name': 'Signal #3: Oven',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': '',
'Value': 'Oven'},
{'ID': 'DETAILS',
'Name': 'Details',
'Unit': '',
'Value': 'Temperature: Actual'},
{'ID': 'SAVE', 'Name': 'Save', 'Unit': '', 'Value': 'On'},
{'ID': 'DATA_RATE',
'Name': 'Data Rate',
'Unit': 'Hz',
'Value': '50'}]},
{'ID': 'SIGNAL4',
'Name': 'Signal #4: Main Bus',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': '',
'Value': 'Main Bus'},
{'ID': 'DETAILS',
'Name': 'Details',
'Unit': '',
'Value': 'Temperature: Actual'},
{'ID': 'SAVE', 'Name': 'Save', 'Unit': '', 'Value': 'On'},
{'ID': 'DATA_RATE',
'Name': 'Data Rate',
'Unit': 'Hz',
'Value': '50'}]},
{'ID': 'SIGNAL5',
'Name': 'Signal #5: Little Bus',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': '',
'Value': 'Little Bus'},
{'ID': 'DETAILS',
'Name': 'Details',
'Unit': '',
'Value': 'Temperature: Actual'},
{'ID': 'SAVE', 'Name': 'Save', 'Unit': '', 'Value': 'On'},
{'ID': 'DATA_RATE',
'Name': 'Data Rate',
'Unit': 'Hz',
'Value': '50'}]},
{'ID': 'SIGNAL6',
'Name': 'Signal #6: Column',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': '',
'Value': 'Column'},
{'ID': 'DETAILS',
'Name': 'Details',
'Unit': '',
'Value': 'Column 1 Flow Actual'},
{'ID': 'SAVE', 'Name': 'Save', 'Unit': '', 'Value': 'On'},
{'ID': 'DATA_RATE',
'Name': 'Data Rate',
'Unit': 'Hz',
'Value': '50'}]},
{'ID': 'SIGNAL7',
'Name': 'Signal #7: ',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': '',
'Value': 'None'}]},
{'ID': 'SIGNAL8',
'Name': 'Signal #8: ',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'DESCRIPTION',
'Name': 'Description',
'Unit': '',
'Value': 'None'}]}],
'Parameters': []}],
'Parameters': [{'ID': 'ModuleDisplayName',
'Name': 'Module Display Name',
'Unit': None,
'Value': 'COMPANYNAME 9000'},
{'ID': 'ModuleType', 'Name': 'Module Type', 'Unit': None, 'Value': 'GC'},
{'ID': 'Order', 'Name': 'Order', 'Unit': None, 'Value': '1'}]},
{'ID': 'METHOD_PROPERTIES',
'Name': 'Method Properties',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'Instrument_Technique',
'Name': 'Instrument Technique',
'Unit': None,
'Value': 'Gas Chromatography'}]},
{'ID': 'SCHEMA_VERSION',
'Name': 'Schema version',
'Tables': [],
'Sections': [],
'Parameters': [{'ID': 'Schema_Version',
'Name': 'Schema version',
'Unit': None,
'Value': '2.3'}]}],
'Parameters': []}},
'audittrail_json': '[]',
'approvalstate': 'Generic',
'instrumenttechnique': 'Gas Chromatography',
'filename': 'None',
'path': '/Intuvo UFGC TPH/Results/N2 Guard DCM_C6 221110/1 ppm Std DCM_C6003.sirslt/TPH_N2_Guard_221110.amx',
'lastsaved': Timestamp('2023-08-22 09:18:34.256356+0000', tz='UTC'),
'peak_area': 0.0,
'peak_area_unit': 'nan',
'peak_areapercent': 0.0,
'peak_asymmetry_10perc': 0.0,
'peak_baselinecode': 'nan',
'peak_baselineend': 0.0,
'peak_baselinemodel': 'nan',
'peak_baselineparameters': 'nan',
'peak_baselineretentionheight': 0.0,
'peak_baselinestart': 0.0,
'peak_begintime': 0.0,
'peak_correxprettime': 0.0,
'peak_drift': 0.0,
'peak_endtime': 0.0,
'peak_height': 0.0,
'peak_height_unit': 'nan',
'peak_heightpercent': 0.0,
'peak_levelend': 0.0,
'peak_levelstart': 0.0,
'peak_noise': 0.0,
'peak_peakvalleyratio': 0.0,
'peak_platespermeter_ep': 0.0,
'peak_platespermeter_jp': 0.0,
'peak_platespermeter_usp': 0.0,
'peak_puritypassed': 'nan',
'peak_relativeretentiontime': 0.0,
'peak_relativeretentiontime_unit': 'nan',
'peak_resolution_ep': 0.0,
'peak_resolution_jp': 0.0,
'peak_resolution_usp': 0.0,
'peak_retentiontime': 0.0,
'peak_retentiontime_unit': 'nan',
'peak_signaltonoise': 0.0,
'peak_symmetry': 0.0,
'peak_tailingfactor': 0.0,
'peak_theoreticalplates_ep': 0.0,
'peak_theoreticalplates_jp': 0.0,
'peak_theoreticalplates_usp': 0.0,
'peak_type': 'nan',
'peak_wander': 0.0,
'resultset_name': 'nan',
'resultsetrevision_revision': 'nan',
'resultsetrevision_lastsaved': NaT,
'resultsetrevision_iscurrent': 0.0}
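
A minimal sketch of how ten identical rows might be rebuilt from a row dict like the one above. The `row` here is a heavily abbreviated stand-in (a handful of the 76 fields, with the nested dict cut down to two keys), and `test_df` is a name chosen for illustration; neither comes from the original code.

```python
import pandas as pd

# Abbreviated stand-in for the full row shown above (assumption: only a few
# representative fields are kept, one per data type mentioned in the report).
row = {
    "revision": "2022-1110-2215-37418",
    "created": pd.Timestamp("2022-11-10 22:15:37.416870", tz="UTC"),
    "reportableinformation_json": {"Id": "ce8196f4-1e88-474e-b945-76e202135dfe", "Ver": 0},
    "peak_area": 0.0,
    "peak_area_unit": "nan",
    "resultsetrevision_lastsaved": pd.NaT,
}

# Repeat the same dict ten times to get ten identical rows.
test_df = pd.DataFrame([row] * 10)

print(test_df.shape)
print(test_df.dtypes)
```

With the full dict substituted for `row`, the resulting frame could be passed to `ProfileReport` in place of `peaks_method_df.head(10)`. Whether identical rows trigger the same error is not confirmed in this report.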

Code that reproduces the bug

from ydata_profiling import ProfileReport
import pandas as pd
peaks_method_df = pd.read_sql(...)
profile = ProfileReport(peaks_method_df.head(10), title="Profiling Report")
profile.to_notebook_iframe()

pandas-profiling version

v4.5.1

Dependencies

pandas==1.5.3
Python: 3.11.1

OS

Windows 10 Enterprise

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
@BioComSoftware
Author

Note: Dropping the two large dict fields, "reportableinformation_json" and "audittrail_json", results in the identical error, including the same array shape of 2817560004071633 elements.

MemoryError: Unable to allocate 20.0 PiB for an array with shape (2817560004071633,) and data type float64

@fabclmnt
Contributor

fabclmnt commented Sep 8, 2023

Hi @BioComSoftware,

It seems that your system is overcommitting memory to the process. Keep in mind that ydata-profiling does not support dict fields. Your dictionary fields are being treated as strings, and that is probably what stresses your system memory while string/text-related metrics are calculated.

Also have a look at this thread and let me know whether it was helpful: https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type

@BioComSoftware
Author

Hi @fabclmnt
Thanks for the input. I did try the run with reportableinformation_json (dict) and audittrail_json (dict) removed, and it still errored (with the exact same element count, BTW), so I think that's not the issue. It seems to be related to one (or more) of the float64 columns. Perhaps it tries to combine all the float columns into an array?
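
One way to test the float64 hypothesis would be to profile the columns one at a time and record which ones raise. The sketch below is hypothetical: `find_failing_columns` and `fake_profile` are illustrative names, the failing column in the demo is chosen arbitrarily, and it assumes the error can be reproduced on a single column, which this thread does not confirm.

```python
from typing import Callable, Iterable, List, Tuple


def find_failing_columns(
    columns: Iterable[str],
    profile_one: Callable[[str], object],
) -> List[Tuple[str, str]]:
    """Call profile_one for each column name; collect those raising MemoryError."""
    failures = []
    for name in columns:
        try:
            profile_one(name)
        except MemoryError as exc:
            failures.append((name, str(exc)))
    return failures


# Stand-in profiler for demonstration only: it pretends one arbitrary column fails.
def fake_profile(name: str) -> None:
    if name == "peak_noise":
        raise MemoryError("Unable to allocate (simulated)")


demo = find_failing_columns(["peak_area", "peak_noise", "peak_wander"], fake_profile)
print(demo)
```

With the real data, `profile_one` could be something like `lambda c: ProfileReport(peaks_method_df[[c]].head(10)).to_html()`, passing `peaks_method_df.columns` as `columns`.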

As for the Stack Overflow thread, I had seen it and tried the steps to increase the pagefile size to 70GB [Windows system], which did not fix it. Is there another way to bypass the overcommit limit on Windows, like there is on Linux?

Thanks!

@BioComSoftware
Author

So, the best I could glean from the internet is that Windows simply doesn't allow memory overcommit. I moved to an Ubuntu 20.04 system, enabled overcommit, and created a VERY large dedicated swap, and it works now.
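
For anyone checking the same setting, Linux exposes its overcommit policy in `/proc/sys/vm/overcommit_memory`. A small sketch for reading it is below. The helper names are illustrative, and the thread does not say which mode was used on the Ubuntu system.

```python
from pathlib import Path
from typing import Optional

# Meaning of the values the Linux kernel accepts for vm.overcommit_memory.
OVERCOMMIT_MODES = {
    "0": "heuristic overcommit (kernel default)",
    "1": "always overcommit",
    "2": "never overcommit beyond the commit limit",
}


def describe_overcommit(raw: str) -> str:
    """Translate the raw file content into a readable description."""
    value = raw.strip()
    return OVERCOMMIT_MODES.get(value, f"unknown mode {value!r}")


def read_overcommit(path: str = "/proc/sys/vm/overcommit_memory") -> Optional[str]:
    """Return the current policy, or None where the file does not exist (e.g. Windows)."""
    p = Path(path)
    if not p.exists():
        return None
    return describe_overcommit(p.read_text())


print(read_overcommit())
```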

@fabclmnt
Contributor

Hi.
@BioComSoftware glad it worked and thank you for your feedback.

Let me know if there is any topic that you need support with!
