Caikit NLP

Caikit-NLP is a Python library that provides Natural Language Processing (NLP) capabilities built on top of the caikit framework.

Introduction

Caikit-NLP uses the "task" concept from the caikit framework to define (and consume) interfaces for various NLP problems, and implements "modules" that provide the functionality for these tasks.

Capabilities provided by caikit-nlp:

Task	Module(s)	Salient Feature(s)
Text Generation	1. `PeftPromptTuning` 2. `TextGeneration`	1. Prompt tuning, multi-task prompt tuning 2. Fine-tuning. Both modules provide optimized inference capability using Text Generation Inference Server.
Text Classification	1. `SequenceClassification`	1. (Work in progress)
Token Classification	1. `FilteredSpanClassification`	1. (Work in progress)
Tokenization	1. `RegexSentenceSplitter`	1. Demo purposes only
Embedding	[COMING SOON]	[COMING SOON]

Getting Started

To help you get started quickly with Caikit, we have prepared a Jupyter notebook that can be run in Google Colab. Caikit-nlp leverages prompt tuning and fine-tuning to add NLP domain capabilities to caikit.

Contributing

We welcome contributions from the community! If you would like to contribute to caikit-nlp, please read the guidelines in the main project's CONTRIBUTING.md file. It includes information on submitting bug reports, feature requests, and pull requests. Make sure to follow our coding standards, code of conduct, security standards, and documentation guidelines to streamline the contribution process.

License

This project is licensed under the Apache License, Version 2.0 (ASFv2).

Glossary

A list of terms that may be unfamiliar, or whose definitions vary depending on who uses them and where, defined here as they are used in the caikit/caikit-nlp project:

Fine tuning - trains the base model on new data; this changes the base model.
Prompt engineering - (usually) manually crafting text that is added to the input text to make the model do a better job. E.g., if you wanted to do something like sentiment analysis on movie reviews, you might come up with a prompt like The movie was: _____ and replace the _____ with the movie review you're considering, to try to get something like happy/sad out of it.
PEFT - library by Hugging Face containing implementations of different tuning methods that scale well; things like prompt tuning and MPT live there. So PEFT itself isn't an approach, even though parameter-efficient fine-tuning sounds like one.
Prompt tuning - learning soft prompts. This is different from prompt engineering in that you're not trying to learn tokens. Instead, you're trying to learn new embedded representations (sometimes called virtual tokens) that can be concatenated onto your embedded input text to improve performance. This can work well, but it can also be sensitive to initialization.
Multitask prompt tuning (MPT) - tries to fix some of the issues with prompt tuning by allowing you to learn 'source prompts' across different tasks and use them to initialize your prompt tuning. More information on MPT can be found at: https://arxiv.org/abs/2303.02861
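
To make the contrast between prompt engineering and prompt tuning more concrete, here is a minimal pure-Python sketch. It is not caikit-nlp code: `build_prompt`, `embed`, and `prepend_soft_prompt` are hypothetical names, the toy embedding only stands in for a real model's embedding layer, and placing the virtual tokens in front of the input is an assumption made for illustration.

```python
from typing import List

Vector = List[float]


def build_prompt(review: str) -> str:
    """Prompt engineering: hand-written text wrapped around the input text."""
    return f"The movie was: {review}"


def embed(tokens: List[str], dim: int = 4) -> List[Vector]:
    """Toy, deterministic stand-in for a model's embedding layer (hypothetical)."""
    return [[float(len(tok) + i) for i in range(dim)] for tok in tokens]


def prepend_soft_prompt(soft_prompt: List[Vector], embedded: List[Vector]) -> List[Vector]:
    """Prompt tuning: learned vectors (virtual tokens) are concatenated onto
    the embedded input. The input text and its tokens are left untouched."""
    return soft_prompt + embedded


# Prompt engineering operates on text.
text = build_prompt("a delight from start to finish")
tokens = text.split()

# Prompt tuning operates on embeddings. These two virtual tokens are
# made-up values; in practice they would be learned during tuning.
soft_prompt = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
model_input = prepend_soft_prompt(soft_prompt, embed(tokens))
```

In this sketch the engineered prompt changes the tokens the model sees, while the soft prompt only adds extra vectors next to the embedded tokens.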

The important difference between fine tuning and capabilities like prompt tuning/multi-task prompt tuning is that the latter don't change the base model's weights at all. So when you run inference for prompt-tuned models, you can have n prompts to 1 base model, and just inject the prompt tensors you need when they're requested instead of having n separate fine-tuned models.
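
The "n prompts to 1 base model" idea can be sketched as a small lookup that injects the requested prompt vectors at request time. This is an illustrative sketch only, not how caikit-nlp or its runtime implements it: `PromptRegistry`, its methods, the prompt IDs, and the vector values are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


class PromptRegistry:
    """Hypothetical registry: many tuned prompts sharing one base model."""

    def __init__(self, base_model_name: str) -> None:
        # The base model would be loaded once; its weights are never modified.
        self.base_model_name = base_model_name
        self._prompts: Dict[str, List[Vector]] = {}

    def register(self, prompt_id: str, prompt_vectors: List[Vector]) -> None:
        """Store the learned prompt vectors for one tuned task."""
        self._prompts[prompt_id] = prompt_vectors

    def prepare_input(self, prompt_id: str, embedded_input: List[Vector]) -> List[Vector]:
        """Inject the requested prompt vectors in front of the embedded input."""
        return self._prompts[prompt_id] + embedded_input


registry = PromptRegistry("base-model")
registry.register("sentiment", [[0.1, 0.2]])
registry.register("summarize", [[0.3, 0.4], [0.5, 0.6]])

request = [[1.0, 1.0]]  # one embedded input token (made-up values)
sentiment_input = registry.prepare_input("sentiment", request)
summarize_input = registry.prepare_input("summarize", request)
```

Both requests reuse the same base model and differ only in the small set of prompt vectors that is attached, which is what avoids keeping n separate fine-tuned models.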

Runtime Performance Benchmarking

Runtime Performance Benchmarking for tuning various models.

Notes

Currently causal language models and sequence-to-sequence models are supported.
