# Usage Example of TrainWiseBM25Retriever
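
For readers unfamiliar with what a filter-aware BM25 retriever does, here is a minimal pure-Python sketch of the idea: score documents against the query with BM25, skip documents whose metadata does not match the filter, and return the top `k`. It is not the implementation of `TrainWiseBM25Retriever`, whose internals are not shown in this notebook. The function name `bm25_search`, the regex tokenizer, the `(text, metadata)` input shape, the exact-match filter semantics, and the defaults `k1=1.5` and `b=0.75` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-word characters (a simplifying assumption).
    return re.findall(r"\w+", text.lower())


def bm25_search(docs, query, k=5, metadata_filter=None, k1=1.5, b=0.75):
    """Hypothetical helper: docs is a list of (text, metadata) pairs.

    Returns up to k (text, score) pairs, highest BM25 score first.
    """
    if not docs:
        return []
    tokenized = [tokenize(text) for text, _ in docs]
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)

    results = []
    for (text, meta), toks in zip(docs, tokenized):
        # Assumed filter semantics: every key must match exactly.
        if metadata_filter and any(
            meta.get(key) != value for key, value in metadata_filter.items()
        ):
            continue
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        results.append((text, score))

    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:k]
```

In this sketch, corpus statistics (document frequencies and average length) are computed over all documents before the filter is applied. Whether a real retriever filters before or after scoring is a design choice that this notebook does not document.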

In [12]:
from langchain_core.load.load import loads
from bm25retriever import TrainWiseBM25Retriever

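# Each line of chunks.jsonl is expected to hold one serialized LangChain document.
docs = []
with open("../chunks.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:  # skip blank lines, which loads() cannot parse
            docs.append(loads(line))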

bm25_retriever = TrainWiseBM25Retriever.from_documents(docs)
bm25_retriever.k = 5

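query = "What is LoRA?"
# Named metadata_filter to avoid shadowing the built-in filter().
metadata_filter = {"source": "hf"}
retrieved_docs = bm25_retriever.invoke(query, filter=metadata_filter)
# Results are unpacked as (document, score) pairs; only the text is printed.
for doc, _score in retrieved_docs:
    print(doc.page_content)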

 we developed are released under the [HF1BitLLM](https://huggingface.co/HF1BitLLM) organization. Two of these models were fine-tuned on 10B tokens with different training setup, while the third was fine-tuned on 100B tokens. Notably, our models surpass the Llama 1 7B model in MMLU benchmarks.

### How to Use with Transformers

To integrate the BitNet architecture into Transformers, we introduced a new quantization method called "bitnet" ( [PR](https://github.com/huggingface/transformers/pull/33410)). This method involves replacing the standard Linear layers with specialized BitLinear layers that are compatible with the BitNet architecture, with appropriate dynamic quantization of activations, weight unpacking, and matrix multiplication.

Loading and testing the model in Transformers is incredibly straightforward, there are zero changes to the API:

```python
model = AutoModelForCausalLM.from_pretrained(
    "HF1BitLLM/Llama3-8B-1.58-100B-tokens",
    device_map="cuda",
    torch_dtype=