Description
I'm trying to use Intelex to accelerate training of an SVC. My dataset is fairly small (18 MB; I am attaching it, since it is a publicly available dataset, Universal Dependencies ISDT). I wasn't expecting this task to fill my 16 GB of RAM (plus 16 GB of swap), so I wonder whether this could be a bug. However, I am a student, so it may be an error on my part (if so, I'm sorry).
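For clarity, this is roughly how the script enables Intelex (a sketch; the attached train_parser.txt has the exact code):

```python
# Sketch: enable Intelex by patching scikit-learn. patch_sklearn()
# must run before sklearn (or NLTK, which imports it) so that
# sklearn.svm.SVC is replaced by the accelerated implementation.
from sklearnex import patch_sklearn
patch_sklearn()
```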
To Reproduce
Steps to reproduce the behavior:
- Download the attached files into the same folder
- Rename train_parser.txt to train_parser.py
- Install NLTK
- Run the Python script (a sketch of what it does follows this list)
- See the out-of-memory error
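A rough sketch of what train_parser.py does (the CoNLL-U preprocessing shown here is simplified and the file names are mine; the attached script is authoritative):

```python
from sklearnex import patch_sklearn
patch_sklearn()  # enable Intelex before NLTK imports sklearn

from nltk.parse.dependencygraph import DependencyGraph
from nltk.parse.transitionparser import TransitionParser

# Load the ISDT training treebank. Real CoNLL-U files also contain
# comment lines and multi-word token ranges that need filtering first.
with open("it_isdt-ud-train.txt", encoding="utf-8") as f:
    blocks = f.read().strip().split("\n\n")
gold_sents = [DependencyGraph(b, top_relation_label="root") for b in blocks]

# Train the transition-based parser; internally this fits sklearn.svm.SVC.
parser = TransitionParser("arc-standard")
parser.train(gold_sents, "isdt.model")  # memory blows up during this call
```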
Expected behavior
A new file should be created containing the trained model. Instead, an out-of-memory error is raised.
Note on NLTK implementation
The code of the train function is fairly straightforward; see the source here: https://www.nltk.org/_modules/nltk/parse/transitionparser.html#TransitionParser.train. The core of it is sketched below.
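Paraphrased from the linked source into a standalone sketch (train_from_svmlight is my name, not NLTK's), the training step boils down to:

```python
import pickle

from sklearn import svm
from sklearn.datasets import load_svmlight_file

# Abridged from TransitionParser.train: NLTK writes the training
# examples to a temporary svmlight file, reloads them as a sparse
# matrix, and fits an sklearn SVC with these fixed hyperparameters.
def train_from_svmlight(svmlight_path, modelfile):
    x_train, y_train = load_svmlight_file(svmlight_path)
    model = svm.SVC(
        kernel="poly",
        degree=2,
        coef0=0,
        gamma=0.2,
        C=0.5,
        verbose=True,
        probability=True,  # triggers internal cross-validation (slow)
    )
    model.fit(x_train, y_train)
    with open(modelfile, "wb") as f:
        pickle.dump(model, f)
```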
Environment:
- OS: Ubuntu 20.04
- Intelex 2021.5
- Python 3.9.11
- scikit-learn 1.0.2
- NLTK 3.7
- conda 4.13.0
- CPU: i5-10500
Attachments
train_parser.txt
it_isdt-ud-train.txt
EDIT:
The svmlight file generated by NLTK is actually 62 MB, and the memory used during training with plain scikit-learn (no Intelex) is around 1 GB.
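One guess (unverified, I don't know Intelex internals) is that the patched SVC densifies the sparse input, which would easily explain the blow-up. A quick way to estimate the dense-equivalent size of the training matrix ("train.svmlight" is a stand-in for the temporary file NLTK generates):

```python
from sklearn.datasets import load_svmlight_file

# Load the sparse training matrix and estimate how large it would
# become if converted to a dense float64 array.
x_train, y_train = load_svmlight_file("train.svmlight")

n_rows, n_cols = x_train.shape
dense_bytes = n_rows * n_cols * 8  # 8 bytes per float64 entry
print(f"sparse nnz: {x_train.nnz}, shape: {x_train.shape}")
print(f"dense equivalent: {dense_bytes / 1024**3:.1f} GiB")
```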