In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

The model initialization is taken from https://github.com/lm-sys/FastChat/blob/0e958b852a14f4bef5f0e9d7a5e7373477329cf2/fastchat/serve/inference.py#L81

The parameter

```
torch_dtype=torch.float16
```

ensures that 16-bit floating-point precision is used instead of 32-bit, which in theory reduces the memory requirement to less than 16 GB of GPU RAM. In practice, however, 16 GB was sometimes not sufficient, perhaps because the input tensors are also loaded onto the GPU. If that happens, move up to 2 T4 GPUs or otherwise increase the total GPU RAM beyond 16 GB. Alternatively, there may be a way to clear GPU memory before generating to make space.
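As a rough check on the memory claim, the weight footprint can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch only: the 7 billion parameter count is a nominal assumption based on the model name, `weight_memory_gib` is a hypothetical helper, and the estimate ignores activations, the KV cache, and CUDA overhead, which is likely why 16 GB can still run out.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

# Assumption: "7b" means roughly 7 billion parameters.
N_PARAMS = 7e9

fp32_gib = weight_memory_gib(N_PARAMS, 4)  # float32 uses 4 bytes per parameter
fp16_gib = weight_memory_gib(N_PARAMS, 2)  # float16 uses 2 bytes per parameter

print(f"float32 weights: {fp32_gib:.1f} GiB")
print(f"float16 weights: {fp16_gib:.1f} GiB")
```

Under these assumptions the float16 weights alone take up most of a 16 GB card, leaving little headroom for long prompts or long generations.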

In [2]:
tokenizer = AutoTokenizer.from_pretrained("/home/jupyter/vicuna-7b", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("/home/jupyter/vicuna-7b", low_cpu_mem_usage=True, 
                                             torch_dtype=torch.float16,
                                             device_map="auto"
                                            )

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The FastChat code does not seem to set many generation parameters besides temperature. However, it may be worth trying the original LLaMA parameters, which are 

```
do_sample=True, top_p=0.95, temperature=0.8
```

The temperature controls the stochasticity, or "creativity", of the model. A higher temperature makes sampling more random, while a lower temperature makes it more deterministic.
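To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax over a toy set of logits. The logits and the helper name `softmax_with_temperature` are made up for illustration; the real sampling happens inside `model.generate`, and this only shows the underlying idea.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

sharp = softmax_with_temperature(toy_logits, 0.5)  # low temperature: peaked distribution
flat = softmax_with_temperature(toy_logits, 2.0)   # high temperature: flatter distribution

print("T=0.5:", [round(p, 3) for p in sharp])
print("T=2.0:", [round(p, 3) for p in flat])
```

At the lower temperature most of the probability mass sits on the highest-scoring token, so samples vary less; at the higher temperature the mass spreads out and less likely tokens are drawn more often.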

In [3]:
def generate(prompt, max_new_tokens=1024, temperature=0.7):
    # Release cached, unused GPU memory before generating.
    torch.cuda.empty_cache()
    # Create the prompt according to the pre-trained format: see https://github.com/lm-sys/FastChat/blob/0e958b852a14f4bef5f0e9d7a5e7373477329cf2/fastchat/conversation.py#L187
    system = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. "
    template = system + "USER: %s "
    input_start = template % prompt
    input_final = input_start + "ASSISTANT:"
    
    inputs = tokenizer(input_final, return_tensors="pt").to("cuda")
    # do_sample=True is needed for temperature to take effect; by default generate() decodes greedily.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, 
                             do_sample=True, temperature=temperature)
    result = tokenizer.batch_decode(outputs, skip_special_tokens=True, spaces_between_special_tokens=False)
    
    # A causal LM returns the prompt followed by the completion, so the prompt is stripped from the output.
    # Slicing at input_start (not input_final) keeps the leading "ASSISTANT:" label in the returned text.
    
    return result[0][len(input_start):]
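
The prompt construction and prompt stripping can also be tried out without a GPU. The sketch below separates them into two small helpers; `build_prompt` and `strip_prompt` are hypothetical names, and the "decoded" string is a hand-written stand-in for model output, not a real generation. Unlike the function above, this variant also removes the "ASSISTANT:" label.

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
)

def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the system / USER / ASSISTANT layout."""
    return f"{SYSTEM}USER: {user_message} ASSISTANT:"

def strip_prompt(decoded: str, prompt: str) -> str:
    """Remove the echoed prompt from decoded text, if it is present."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded  # leave the text untouched if the prompt was not echoed verbatim

prompt = build_prompt("How to make a BLT?")
# Stand-in for tokenizer.batch_decode(...)[0]; fabricated for illustration only.
fake_decoded = prompt + " To make a BLT, you will need bread."
print(strip_prompt(fake_decoded, prompt))
```

Checking `startswith` before slicing guards against the case where decoding does not reproduce the prompt character for character, which plain length-based slicing would silently mishandle.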

In [4]:
print(generate("How to make a BLT?"))

ASSISTANT: To make a BLT, you will need the following ingredients:

* 2 slices of bread
* 8 large bacon strips, cooked and crumbled
* 8 large lettuce leaves
* 8 slices of tomato
* 2 tablespoons of mayonnaise

Instructions:

1. Toast the 2 slices of bread until they are golden brown.
2. In a bowl, mix together the crumbled bacon and mayonnaise.
3. Lay one slice of bread on a plate and spoon a small amount of the bacon mixture onto the bread.
4. Top the bacon mixture with a few lettuce leaves, then a few slices of tomato.
5. Repeat the process with the second slice of bread, placing the bacon mixture on top of the bread.
6. Cut the sandwich into diagonal slices and serve.


In [11]:
article = """Kaiser Permanente, a pioneering integrated health system based on the West Coast, has agreed to acquire a Pennsylvania-based hospital operator in a bid to create a new national player in the rapidly consolidating healthcare market. The combination of Kaiser and Geisinger, which must be reviewed by federal and state agencies, would have more than $100 billion in revenue. The subsidiary, called Risant Health, aims to acquire four or five more hospital systems and get to total revenue of $30 billion to $35 billion over the next five years, the systems said. Geisinger Chief Executive Jaewon Ryu would lead Risant. “Kaiser Permanente, through Risant, is really stepping out, putting a stake in the ground, and saying we’re going to partner with these leading health systems,” Kaiser Chief Executive Greg Adams said. Both systems helped model the integration of health insurance, hospital ownership and other operations. The biggest healthcare companies in the U.S., including UnitedHealth Group Inc. and CVS Health Corp., have been moving in that direction, unveiling a growing roster of deals to pull together doctors, clinics and other operations with health-insurance units. These conglomerates say they aim to implement new payment models that are supposed to bolster quality and reduce costs. “Kaiser knows how to bring those pieces together and has been doing that for its entire history,” said Chas Roades, co-president of Gist Healthcare, a consulting firm owned by industry advisers Kaufman Hall. Mr. Roades said he expects a number of hospital systems would probably be eager to join Risant, because of Kaiser’s resources and expertise in managing patients’ health and the cost of care. “This puts them in the conversation with UnitedHealth and CVS Health in terms of developing a national care platform,” he said. Kaiser, founded in 1945 and based in Oakland, Calif., has long been a bellwether for uniting different healthcare functions. 
It reported $95.4 billion in revenue last year from 39 hospitals, 12.6 million insurance plan members and nearly 24,000 physicians. The nonprofit has most fully developed the integrated setup in its home state, where it has a commanding presence, but it also operates in other regions, including the Pacific Northwest, as well as Colorado and Hawaii. Geisinger, based in Danville, Pa., reported about $6.9 billion in revenue last year. It counts 10 hospitals and about 600,000 health plan enrollees and employs more than 1,700 doctors. 
  """

response = generate("Assume you are an individual investor. Give a summarization of article, include time, event, and entity involved and what immediate actions should you take given this information.\n" + article)
print(response)

ASSISTANT: Summary:

Kaiser Permanente, a healthcare system based on the West Coast, has agreed to acquire a Pennsylvania-based hospital operator, Geisinger, in a bid to create a new national player in the consolidating healthcare market. The combination of the two systems would have more than $100 billion in revenue and aim to acquire four or five more hospital systems over the next five years, with the subsidiary, Risant Health, leading the acquisition efforts. The acquisition would allow Kaiser Permanente to expand its operations and expertise in managing patients' health and the cost of care.

Immediate Action:

As an individual investor, there is no immediate action required as the acquisition is between two entities and not directly impacting the individual investor. However, it is important to keep an eye on the healthcare market and the impact of consolidation on the industry and individual investors. It may be beneficial to review investment portfolios and consider investing i