Dynamic Pipelines #13833
Unanswered
billziss-gh asked this question in Help: Coding & Implementations
Replies: 0 comments
I need multiple spaCy pipelines that start with the same base language model, but then perform different custom processing. What is the recommended approach for doing so?
For example, consider that my base pipeline is `en_core_web_trf` and I also have custom components `A` and `B`. I would like to have two different pipelines:

- `en_core_web_trf + A`
- `en_core_web_trf + B`
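
For concreteness, the two target pipelines could be sketched as below. This is only an illustration under stated assumptions: `component_a` and `component_b` are hypothetical stand-ins for `A` and `B`, and `spacy.blank("en")` replaces `spacy.load("en_core_web_trf")` so that the sketch runs without downloading a model.

```python
import spacy
from spacy.language import Language


# Hypothetical stand-ins for the custom components A and B.
@Language.component("component_a")
def component_a(doc):
    doc.user_data["processed_by"] = "A"
    return doc


@Language.component("component_b")
def component_b(doc):
    doc.user_data["processed_by"] = "B"
    return doc


def build_pipeline(component_name):
    # Assumption: spacy.blank("en") stands in for
    # spacy.load("en_core_web_trf") to keep the sketch lightweight.
    nlp = spacy.blank("en")
    # Append the registered custom component to the end of the pipeline.
    nlp.add_pipe(component_name)
    return nlp


# Each pipeline gets its own, separately loaded base.
nlp_a = build_pipeline("component_a")
nlp_b = build_pipeline("component_b")

doc_a = nlp_a("Dynamic pipelines start from the same base.")
doc_b = nlp_b("Dynamic pipelines start from the same base.")
```

Note that each call to `build_pipeline` creates an independent `Language` object, which is exactly the duplication the question is concerned about.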
Now I understand that I can call `spacy.load("en_core_web_trf")` twice and then `add_pipe` my `A` or `B` components to create my respective pipelines. However, I am uncertain whether this is the right approach, because:

- I do not understand the memory usage implications of doing so: do two `spacy.load` calls for the same model imply twice (or close to twice) the amount of memory used?
- The `spacy.load` function seems slow regardless of how many times it is called with the same model. This means that it is not an ideal candidate for dynamic pipeline generation.

Another alternative would be to prepare the pipeline `en_core_web_trf + A + B` and then selectively enable the components needed. However, this seems a bit hacky and it would also not work in all my scenarios (e.g. when bootstrapping or training a new component `C`).

In an ideal world I would like to load a prototype `en_core_web_trf` pipeline and then dynamically create new pipelines by adding arbitrary components to it. I can probably achieve this goal by creating my own custom class `DynamicPipeline` that overrides `__call__` and behaves like `spacy.Language`. However, this solution seems overkill and there is likely a far better solution.
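
To make the wrapper idea concrete, here is a minimal sketch of what such a `DynamicPipeline` class might look like. It is not a spaCy API: the method name `with_component` and the dictionary-based `fake_base` are assumptions introduced only so the example runs without spaCy. In real use, the base would be a single loaded `Language` object and each extra component a callable that takes and returns a `Doc`.

```python
from typing import Any, Callable, Iterable, List

Component = Callable[[Any], Any]


class DynamicPipeline:
    """Hypothetical wrapper: one shared base pipeline plus per-instance extras."""

    def __init__(self, base: Callable[[str], Any],
                 components: Iterable[Component] = ()) -> None:
        self.base = base  # shared, loaded only once
        self.components: List[Component] = list(components)

    def with_component(self, component: Component) -> "DynamicPipeline":
        # Return a new wrapper that shares the same base object,
        # so the expensive model is never loaded a second time.
        return DynamicPipeline(self.base, [*self.components, component])

    def __call__(self, text: str) -> Any:
        # Run the shared base first, then this wrapper's own components.
        doc = self.base(text)
        for component in self.components:
            doc = component(doc)
        return doc


# Stand-in for the loaded base pipeline (assumption, not spaCy).
def fake_base(text: str) -> dict:
    return {"text": text, "tokens": text.split(), "tags": []}


def component_a(doc: dict) -> dict:
    doc["tags"].append("A")
    return doc


def component_b(doc: dict) -> dict:
    doc["tags"].append("B")
    return doc


prototype = DynamicPipeline(fake_base)
pipeline_a = prototype.with_component(component_a)
pipeline_b = prototype.with_component(component_b)

result_a = pipeline_a("hello world")
result_b = pipeline_b("hello world")
```

A wrapper like this only covers `__call__`. Batching, serialization, and training a new component such as `C` would all need extra work, which may be part of why it feels like overkill.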