
Is there a way, or a best practice, to scan this large number of files for processing? #131

Closed
NguyenHoangMinhkkkk opened this issue May 3, 2024 · 4 comments


@NguyenHoangMinhkkkk

This is my code for getting all the files that already exist in the folder on the initial run of my application. It works well.

But there is a problem with a large number of files. If the folder contains 1M, 2M or more files, performance drops and RAM consumption climbs, because this.processFile cannot keep up with that many paths.

Is there a way, or a best practice, to scan this many files for processing?

this.processFile is a reader function that reads the content of a file.


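import { glob } from 'glob';
import watch from 'node-watch';

// Inside a class method: `watcher` is the node-watch watcher created earlier
// for this.FOLDER (its creation is not shown in this snippet).
watcher.on('ready', async () => {
  this.logger.log('[Watch] Ready, run initial scan');

  // glob patterns must use forward slashes, so Windows backslashes are replaced.
  const patterns = [
    `${this.FOLDER}/**/*.txt`.replace(/\\/g, '/'),
    `${this.FOLDER}/**/*.csv`.replace(/\\/g, '/'),
    `${this.FOLDER}/**/*.TXT`.replace(/\\/g, '/'),
    `${this.FOLDER}/**/*.CSV`.replace(/\\/g, '/'),
  ];

  const stream = glob.stream(patterns);

  let fileCount = 0; // counting files

  stream.on('data', (filePath) => {
    fileCount = fileCount + 1;
    this.logger.log('[Watch] scan: ' + fileCount);

    // Called for every matched path as soon as it is emitted. Nothing here
    // waits for processFile, so if it is async, reads for all paths start at once.
    this.processFile(filePath);
  });
});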
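One way to keep memory bounded, sketched here under assumptions rather than as the project's recommended approach, is to walk the folder lazily and cap how many files are processed at once. To stay dependency-free, this sketch swaps glob for Node's built-in `fs.promises.opendir`; `walk`, `scanWithLimit` and the default limit of 8 are hypothetical names and values, the sketch assumes `processFile` returns a Promise, and error handling is left out.

```javascript
const { opendir } = require('node:fs/promises');
const path = require('node:path');

// Extensions from the original patterns, compared case-insensitively so
// .txt/.TXT and .csv/.CSV are covered by a single check.
const EXTENSIONS = new Set(['.txt', '.csv']);

// Recursively yield matching file paths one at a time. opendir() reads
// directory entries lazily, so the full list of paths is never built in memory.
// Symlinks are skipped because Dirent.isFile() is false for them.
async function* walk(dir) {
  const handle = await opendir(dir);
  for await (const entry of handle) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      yield* walk(full);
    } else if (entry.isFile() && EXTENSIONS.has(path.extname(entry.name).toLowerCase())) {
      yield full;
    }
  }
}

// Run `worker` on every matching path with at most `limit` calls in flight.
async function scanWithLimit(root, worker, limit = 8) {
  const files = walk(root);
  let processed = 0;

  // Each runner pulls the next path only after its previous file is done.
  // Concurrent next() calls on one async generator are queued, so every
  // path is handed to exactly one runner.
  async function runner() {
    for (;;) {
      const { value, done } = await files.next();
      if (done) return;
      await worker(value);
      processed += 1;
    }
  }

  await Promise.all(Array.from({ length: limit }, () => runner()));
  return processed;
}
```

Inside the `ready` handler this would be called as `await scanWithLimit(this.FOLDER, (p) => this.processFile(p), 8)`. Because each runner waits for its file to finish before pulling the next path, at most `limit` files are being processed at a time, and the directory is only read as fast as files are consumed.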
@yuanchuan
Owner

Hi @NguyenHoangMinhkkkk ,

Why do you need to scan all the files in the first place? Would it work to just watch the directory instead?

The ready event is emitted after the watcher has been set up. If you traverse every file yourself, that will naturally get slower as the number of files grows, but it seems to have nothing to do with the watcher.

@NguyenHoangMinhkkkk
Author

Thank you!

My task is to watch the storage folder to see which files are added to it. But if the app stops unexpectedly, files keep piling up in the folder: the longer it stays down, the more files accumulate. I have to find a way to handle these accumulated files in the storage folder when the app starts again.

Normally files are added to the folder at about 20 files/s, which means 1,728,000 files/day.
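A quick back-of-the-envelope check of these numbers, assuming files keep arriving at a steady 20 files/s while the backlog is being drained and that the app processes a steady r files/s (r and the downtime T are hypothetical quantities, not figures from this thread):

```math
\begin{aligned}
\text{arrivals per day} &= 20~\tfrac{\text{files}}{\text{s}} \times 86\,400~\tfrac{\text{s}}{\text{day}} = 1\,728\,000~\text{files} \\
B &= 20\,T \qquad \text{(backlog after the app has been down for } T \text{ seconds)} \\
t_{\text{drain}} &= \frac{B}{r - 20}, \qquad r > 20
\end{aligned}
```

For example, after one day of downtime (B = 1,728,000) and a hypothetical r = 100 files/s, the backlog drains in 1,728,000 / 80 = 21,600 s, i.e. 6 hours. If r is not above 20 files/s, the backlog never shrinks, no matter how the scan is organised.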

@yuanchuan
Copy link
Owner

yuanchuan commented May 27, 2024 via email

@NguyenHoangMinhkkkk
Author

An in-memory database might help?

I'm using a workaround: I split the accumulated files into smaller batches and process them one batch at a time.
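The batching workaround can be sketched roughly as follows. This is an illustration under assumptions, not the author's actual code: `inBatches`, `processInBatches` and the batch size are hypothetical, `handle` stands in for an async `this.processFile`, and the paths can come from any (async) iterable of file paths.

```javascript
// Group items from any sync or async iterable into arrays of at most `size`.
async function* inBatches(source, size) {
  let batch = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // trailing partial batch
}

// Files inside one batch are handled in parallel; the next batch is only
// pulled from the source after the previous one has completely finished.
async function processInBatches(source, size, handle) {
  let done = 0;
  for await (const batch of inBatches(source, size)) {
    await Promise.all(batch.map((item) => handle(item)));
    done += batch.length;
  }
  return done;
}
```

Compared with a fixed pool of workers, batching is simpler to reason about, but one slow file holds up its whole batch before the next one can start.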
