Dynamic Pipelines #13833

billziss-gh · 2025-06-17T12:43:33Z

I need multiple spaCy pipelines that start from the same base language model but then perform different custom processing. What is the recommended approach for doing so?

For example, suppose my base pipeline is en_core_web_trf and I also have custom components A and B. I would like to have two different pipelines:

en_core_web_trf + A
en_core_web_trf + B

I understand that I can call spacy.load("en_core_web_trf") twice and then use add_pipe to add component A or B to the respective pipeline. However, I am not sure this is the right approach, because:

1. I do not understand the memory implications: do two spacy.load calls for the same model use twice (or close to twice) the memory?
2. spacy.load seems slow no matter how many times it is called with the same model, which makes it a poor fit for generating pipelines dynamically.
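To make the question concrete, here is a minimal sketch of the two-loads approach. The names `component_a`, `component_b`, and `build_pipeline` are hypothetical placeholders, and `spacy.blank("en")` stands in for `spacy.load("en_core_web_trf")` so the snippet runs without the model installed. As far as I can tell, each call returns an independent `Language` object with its own `Vocab`, which is why I suspect the memory is duplicated.

```python
import time

import spacy
from spacy.language import Language
from spacy.tokens import Doc


# Hypothetical stand-ins for my custom components A and B.
@Language.component("component_a")
def component_a(doc: Doc) -> Doc:
    doc.user_data["seen_by"] = "A"
    return doc


@Language.component("component_b")
def component_b(doc: Doc) -> Doc:
    doc.user_data["seen_by"] = "B"
    return doc


def build_pipeline(component_name: str) -> Language:
    # Assumption: spacy.blank("en") replaces spacy.load("en_core_web_trf")
    # here only to keep the sketch lightweight.
    nlp = spacy.blank("en")
    nlp.add_pipe(component_name)
    return nlp


start = time.perf_counter()
nlp_a = build_pipeline("component_a")
nlp_b = build_pipeline("component_b")
elapsed = time.perf_counter() - start

print(f"built two pipelines in {elapsed:.3f}s")
print("shared vocab:", nlp_a.vocab is nlp_b.vocab)
```

With the real transformer model, the timing printed here is what I would like to avoid paying for every new pipeline.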

An alternative would be to prepare a single pipeline en_core_web_trf + A + B and then selectively enable only the components needed. However, this seems a bit hacky, and it would not work in all my scenarios (e.g. when bootstrapping or training a new component C).
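A rough sketch of the combined-pipeline alternative, using `nlp.select_pipes` as a context manager to skip components per call. The component names `mark_a` and `mark_b` and the helper `run_without` are hypothetical, and `spacy.blank("en")` again stands in for en_core_web_trf. Note that `select_pipes` temporarily changes the shared `nlp` object, so I assume this is not safe if several threads use the pipeline at once.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc


# Hypothetical components that record which of them processed the Doc.
@Language.component("mark_a")
def mark_a(doc: Doc) -> Doc:
    doc.user_data.setdefault("trace", []).append("A")
    return doc


@Language.component("mark_b")
def mark_b(doc: Doc) -> Doc:
    doc.user_data.setdefault("trace", []).append("B")
    return doc


# Assumption: a blank English pipeline replaces en_core_web_trf.
nlp = spacy.blank("en")
nlp.add_pipe("mark_a")
nlp.add_pipe("mark_b")


def run_without(nlp: Language, text: str, skipped: list) -> Doc:
    # The listed components are disabled only inside the with-block
    # and are restored when it exits.
    with nlp.select_pipes(disable=skipped):
        return nlp(text)


doc_a = run_without(nlp, "Some text", ["mark_b"])  # behaves like base + A
doc_b = run_without(nlp, "Some text", ["mark_a"])  # behaves like base + B
print(doc_a.user_data["trace"], doc_b.user_data["trace"])
```

I use `disable=` rather than `enable=` here because `enable=` would also switch off every base component that is not listed.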

In an ideal world I would load a prototype en_core_web_trf pipeline once and then dynamically create new pipelines by adding arbitrary components to it. I can probably achieve this by writing my own class DynamicPipeline that overrides __call__ and behaves like spacy.Language. However, this seems like overkill, and there is likely a far better solution.
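For illustration, the wrapper I have in mind could look roughly like this. It is only a sketch: `DynamicPipeline`, `with_component`, `tag_a`, and `tag_b` are hypothetical names, `spacy.blank("en")` stands in for the loaded en_core_web_trf prototype, and the class covers only `__call__` (no `nlp.pipe`, training, or serialization).

```python
from typing import Callable, Iterable, List

import spacy
from spacy.language import Language
from spacy.tokens import Doc

Component = Callable[[Doc], Doc]


class DynamicPipeline:
    """Hypothetical wrapper: one shared base pipeline plus extra components."""

    def __init__(self, base: Language, extras: Iterable[Component] = ()):
        self.base = base
        self.extras: List[Component] = list(extras)

    def with_component(self, component: Component) -> "DynamicPipeline":
        # Returns a new wrapper; the base pipeline is shared, not reloaded.
        return DynamicPipeline(self.base, [*self.extras, component])

    def __call__(self, text: str) -> Doc:
        doc = self.base(text)
        for component in self.extras:
            doc = component(doc)
        return doc


# Hypothetical stand-ins for components A and B (plain callables).
def tag_a(doc: Doc) -> Doc:
    doc.user_data.setdefault("trace", []).append("A")
    return doc


def tag_b(doc: Doc) -> Doc:
    doc.user_data.setdefault("trace", []).append("B")
    return doc


# Assumption: the prototype would really be spacy.load("en_core_web_trf").
prototype = DynamicPipeline(spacy.blank("en"))
pipeline_a = prototype.with_component(tag_a)
pipeline_b = prototype.with_component(tag_b)

print(pipeline_a("Some text").user_data["trace"])
print(pipeline_b("Some text").user_data["trace"])
```

The base model is loaded once and shared by every derived pipeline, but the extra components live outside spaCy's own pipeline machinery, which is the part that feels like overkill to me.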

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

Dynamic Pipelines #13833

Uh oh!

billziss-gh Jun 17, 2025

Replies: 0 comments

billziss-gh
Jun 17, 2025