
Possible memory issue with SVC? #1010

Closed

@Stack-it-up

Description
I'm trying to use Intelex to accelerate training of an SVC. My dataset is fairly small (18 MB; I'm attaching it, since it's a publicly available dataset, Universal Dependencies ISDT). I wasn't expecting this task to fill my 16 GB of RAM (plus 16 GB of swap), so I suspect this could be a bug. However, I'm a student, so it may be an error on my part (if so, I'm sorry).

To Reproduce
Steps to reproduce the behavior:

  1. Download the attached files into the same folder
  2. Rename train_parser.txt to train_parser.py
  3. Install NLTK
  4. Run the Python script (a condensed sketch of what it does follows below)
  5. See the out-of-memory error
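
For reference, here is roughly what the script boils down to (a hypothetical condensation of the attached train_parser.py; the real script's CoNLL-U preprocessing may differ):

```python
# Condensed sketch of the attached train_parser.py (hypothetical; the real
# script's preprocessing of the CoNLL-U file may differ).
from sklearnex import patch_sklearn
patch_sklearn()  # enable Intelex acceleration before sklearn is used

from nltk.parse.dependencygraph import DependencyGraph
from nltk.parse.transitionparser import TransitionParser

# it_isdt-ud-train.txt is the attached UD ISDT training treebank;
# sentences are separated by blank lines.
with open("it_isdt-ud-train.txt", encoding="utf-8") as f:
    blocks = [b for b in f.read().split("\n\n") if b.strip()]
graphs = [DependencyGraph(b) for b in blocks]

parser = TransitionParser("arc-standard")
# The OOM happens inside train(), which fits an sklearn SVC internally.
parser.train(graphs, "isdt.model", verbose=True)
```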

Expected behavior
A new file containing the trained model should be created. Instead, an out-of-memory error is raised.

Note on NLTK implementation
The code for the train function is straightforward; see the source here: https://www.nltk.org/_modules/nltk/parse/transitionparser.html#TransitionParser.train
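
For context, the relevant part of that method (paraphrased from the linked source) builds a polynomial-kernel SVC with probability estimates enabled, which already adds an internal cross-validation pass on top of the fit:

```python
# Paraphrased from nltk/parse/transitionparser.py (TransitionParser.train).
from sklearn import svm

model = svm.SVC(
    kernel="poly",
    degree=2,
    coef0=0,
    gamma=0.2,
    C=0.5,
    verbose=True,
    probability=True,
)
```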

Environment:

  • OS: Ubuntu 20.04
  • Intelex (scikit-learn-intelex) 2021.5
  • Python 3.9.11
  • scikit-learn 1.0.2
  • NLTK 3.7
  • conda 4.13.0
  • CPU: i5-10500

Attachments
train_parser.txt
it_isdt-ud-train.txt

EDIT:
The svmlight training file generated by NLTK is actually 62 MB, and memory usage during training with plain scikit-learn (no Intelex patching) is around 1 GB.
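
In case it helps triage, here is a sketch that takes NLTK out of the loop by training directly on that svmlight file (the file name below is a placeholder; NLTK writes the real one to a temp directory):

```python
# Sketch: reproduce the memory blow-up without NLTK.
# "train.svmlight" is a placeholder for the ~62 MB file NLTK generates.
from sklearn.datasets import load_svmlight_file
from sklearn.svm import SVC

X, y = load_svmlight_file("train.svmlight")

# Same hyperparameters NLTK uses; ~1 GB with stock scikit-learn,
# but fills 16 GB of RAM plus swap when Intelex patching is active.
clf = SVC(kernel="poly", degree=2, coef0=0, gamma=0.2, C=0.5, probability=True)
clf.fit(X, y)
```

Running this once as `python script.py` and once as `python -m sklearnex script.py` should show whether the patched SVC alone triggers the blow-up.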
