# LLM Text Summarization

Concepts:
- 10-K Market Risk Disclosure
- HuggingFace transformers APIs
- OpenAI GPT APIs
- Text summarization and evaluation


In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import textwrap
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers.utils import logging
from tqdm import tqdm
from finds.database import SQL, RedisDB
from finds.unstructured import Edgar
from finds.structured import BusDay, CRSP, PSTAT
from finds.readers import Sectoring
from secret import paths, credentials
logging.set_verbosity_error() # logging.set_verbosity_info() #logging.set_verbosity_warning()
VERBOSE = 0
SAVED = False



In [2]:
sql = SQL(**credentials['sql'], verbose=VERBOSE)
user = SQL(**credentials['user'], verbose=VERBOSE)
bd = BusDay(sql)
rdb = RedisDB(**credentials['redis'])
crsp = CRSP(sql, bd, rdb, verbose=VERBOSE)
pstat = PSTAT(sql, bd, verbose=VERBOSE)
ed = Edgar(paths['10X'], zipped=False, verbose=VERBOSE)

Last FamaFrench Date 2024-04-30 00:00:00


In [3]:
# Retrieve universe of stocks
univ = crsp.get_universe(bd.endmo(20231231))

In [4]:
# lookup company names
comnam = crsp.build_lookup(source='permno', target='comnam', fillna="")
univ['comnam'] = comnam(univ.index)

In [5]:
# lookup sic codes from Compustat, and map to FF 10-sector code
sic = pstat.build_lookup(source='lpermno', target='sic', fillna=0)
industry = Series(sic[univ.index], index=univ.index)
industry = industry.where(industry > 0, univ['siccd'])
sectors = Sectoring(sql, scheme='codes10', fillna='')   # supplement from crosswalk
univ['sector'] = sectors[industry]

Quantitative and Qualitative Disclosures about Market Risk item from 10-Ks

In [6]:
# retrieve from 10K's in 1Q 2024
item, form = 'qqr10K', '10-K'
rows = DataFrame(ed.open(form=form, item=item))
found = rows[rows['date'].between(20240101, 20240331)]\
    .drop_duplicates(subset=['permno'], keep='last')\
    .set_index('permno')

In [7]:
# Keep largest decile of stocks
found = found.loc[found.index.intersection(univ.index[univ['decile'] == 1])]

In [8]:
# Keep only documents longer than 2000 characters
docs = {permno: ed[found.loc[permno, 'pathname']].lower()
        for permno in found.index}
permnos = [permno for permno, doc in docs.items() if len(doc)>2000]
found = found.join(Series(docs, name='item').reindex(permnos), how='inner')

## HuggingFace APIs

Command line interface
   
`pip install "huggingface_hub[cli]"`

- To empty cache (in ~/.cache/huggingface/)
    
  `huggingface-cli delete-cache`




### Transformers

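- `AutoTokenizer`
- `AutoModel` classes (here, `AutoModelForCausalLM`)
  - `generate` method: autoregressively generates output token ids from the input token ids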
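
To make the role of `generate` concrete, here is a minimal sketch of greedy autoregressive decoding in plain Python. It is not the transformers implementation: `greedy_generate` and `toy_scores` are hypothetical names, and the "model" is a toy scoring function standing in for a real network. The loop only illustrates the general idea of appending one token at a time until an end-of-sequence token or a token limit is reached.

```python
def greedy_generate(next_token_scores, input_ids, max_new_tokens, eos_token_id):
    """Toy greedy decoding loop: repeatedly append the highest-scoring next token."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)   # one score per vocabulary entry
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_token_id:       # stop once the terminator is produced
            break
    return ids


def toy_scores(ids, vocab_size=5):
    """Hypothetical 'model': always prefers the token id following the last one."""
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


prompt_ids = [0, 1]
output = greedy_generate(toy_scores, prompt_ids, max_new_tokens=10, eos_token_id=4)
print(output)                     # [0, 1, 2, 3, 4]
print(output[len(prompt_ids):])   # newly generated tokens only: [2, 3, 4]
```

The returned sequence contains the prompt followed by the new tokens, which is why the prompt length is sliced off before decoding the response.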


  



In [9]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
save_name = paths['scratch'] / "Llama-3-8B-Instruct"
model_name = save_name if SAVED else model_id   # load from folder if saved locally

### Quantization

https://medium.com/@manuelescobar-dev/implementing-and-running-llama-3-with-hugging-faces-transformers-library-40e9754d8c80

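Quantization reduces the hardware requirements by loading the model weights at lower precision. Instead of loading them in 16 bits (float16), they are loaded in 4 bits, which significantly reduces memory usage: the article linked above reports a drop from ~20GB to ~8GB, and the 4-bit model loaded in this notebook occupies about 6 GB of CUDA memory.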
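
As a rough back-of-the-envelope check, the weight memory can be estimated from the layer shapes printed when the model is loaded. The sketch below rests on several assumptions that are not confirmed by this notebook: an output head (`lm_head`) of the same size as the embedding, only the `Linear4bit` layers being quantized, and a quantization overhead of one 32-bit scaling constant per block of 64 weights. Activations, the KV cache and framework overhead are ignored.

```python
# Layer shapes taken from the printed LlamaForCausalLM architecture
HIDDEN, KV_DIM, FFN, VOCAB, LAYERS = 4096, 1024, 14336, 128256, 32


def estimate_weight_gb(bits_linear, bits_other=16, block_size=64, scale_bits=32):
    """Estimate weight memory in GB for a given precision of the linear layers."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM   # q_proj, o_proj, k_proj, v_proj
    mlp = 3 * HIDDEN * FFN                             # gate_proj, up_proj, down_proj
    linear = LAYERS * (attn + mlp)                     # weights that get quantized
    other = 2 * VOCAB * HIDDEN                         # embedding + assumed lm_head
    # Assumed overhead: one scaling constant per block of quantized weights
    overhead = scale_bits / block_size if bits_linear < 16 else 0.0
    total_bits = linear * (bits_linear + overhead) + other * bits_other
    return total_bits / 8 / 1e9


print(f"float16: {estimate_weight_gb(16):.2f} GB")   # float16: 16.06 GB
print(f"4-bit:   {estimate_weight_gb(4):.2f} GB")    # 4-bit:   6.03 GB
```

Under these assumptions the 4-bit estimate lands close to the CUDA memory reported after loading the model, with the unquantized embedding and output head accounting for roughly a third of the total.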

In [10]:
compute_dtype = torch.float16   # dtype used for computation on the 4-bit weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
)
bnb_config

BitsAndBytesConfig {
  "_load_in_4bit": true,
  "_load_in_8bit": false,
  "bnb_4bit_compute_dtype": "float16",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": false,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes"
}

In [11]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    #torch_dtype=torch.bfloat16,
    device_map="cuda",  # "auto",  'cuda'
)
model

Loading checkpoint shards: 100%|██████████| 4/4 [00:07<00:00,  1.77s/it]


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): Ll

In [12]:
# Maximum context length
print('max context length', model.config.max_position_embeddings)

max context length 8192


In [13]:
# save the model to local disk
if not SAVED:
    model.save_pretrained(save_name)
print(f"CUDA memory: {torch.cuda.memory_allocated()/1e9:.2f} GB")

CUDA memory: 6.26 GB


In [14]:
def generate_response(text, max_char=20000,
                      role="You are a helpful AI assistant",
                      prompt="Write a concise summary of the text."):
    """Generate a response to the prompt about the text, truncated to max_char characters"""
    content = f"""
{prompt}

Text in triple quotes: '''{(text+' ')[:max_char]}'''

Summary:""".strip()
    
    messages = [
        {"role": "system", "content": role},
        {"role": "user", "content": content},
    ]

    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    if VERBOSE:
        print('tokens:', np.prod(input_ids.shape),
              model.config.max_position_embeddings)

    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]

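    outputs = model.generate(
        input_ids,
        max_new_tokens=256,        # cap on the length of the generated response
        eos_token_id=terminators,  # stop at either end-of-sequence token
        temperature=0.01,          # near-zero temperature keeps output close to deterministic
        # alternative sampling settings: do_sample=True, temperature=0.6, top_p=0.9
    )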
    response = outputs[0][input_ids.shape[-1]:]
    return tokenizer.decode(response, skip_special_tokens=True)
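
`generate_response` limits the input by character count (`max_char`), while the model's context length is measured in tokens. As an alternative, the input could be truncated by token count. The sketch below is only an illustration: `truncate_to_tokens` and `WhitespaceTokenizer` are hypothetical names, and the stand-in tokenizer exists only so the example runs without downloading a model. With a HuggingFace tokenizer, decoding a truncated id list may not reproduce the original text exactly, so the result is approximate.

```python
def truncate_to_tokens(text, tokenizer, max_tokens):
    """Truncate text to (approximately) at most max_tokens tokens."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    if len(ids) <= max_tokens:
        return text
    return tokenizer.decode(ids[:max_tokens])


class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""

    def __init__(self):
        self.vocab = []

    def encode(self, text, add_special_tokens=False):
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab.append(word)
            ids.append(self.vocab.index(word))
        return ids

    def decode(self, ids):
        return " ".join(self.vocab[i] for i in ids)


toy = WhitespaceTokenizer()
print(truncate_to_tokens("interest rate risk and foreign currency risk", toy, 4))
# interest rate risk and
```

When budgeting tokens for a real model, room must also be left for the system prompt, the instructions, the chat template tokens and `max_new_tokens`.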

In [15]:
# Sample companies from each industry sector
docs = univ.loc[found.index].groupby('sector').sample(1)

In [16]:
summary = {}
for permno in docs.index:
    print('=====', univ.loc[permno, 'comnam'], '=====')
    summary[permno] = generate_response(found.loc[permno, 'item'])
    print("\n".join([textwrap.fill(s) for s in summary[permno].split('\n')]))
    print()

===== FORD MOTOR CO DEL =====
The text discusses the market risk management practices of a company,
including its exposure to foreign currency exchange rates, commodity
prices, and interest rates. The company uses derivative instruments to
hedge its exposures and has a risk management committee that monitors
and manages these risks. The company also uses economic value
sensitivity analysis and re-pricing gap analysis to evaluate potential
long-term effects of changes in interest rates. The company's market
risk exposure is quantified and disclosed in the text, including its
sensitivity to changes in interest rates and foreign currency exchange
rates.

===== OCCIDENTAL PETROLEUM CORP =====
Here is a concise summary of the text:

Occidental's financial results are sensitive to fluctuations in oil,
natural gas, and commodity prices. The company has implemented risk
management controls to mitigate market risk, including limits on value
at risk, credit limits, and segregation of duties. Occ

### System prompt

The system prompt is the initial set of instructions supplied at the start of a new chat session. It sets the context for the model and helps to focus its capabilities.
It is a good place to assign a role or a well-known persona: the model will adopt that role or persona, including its style.

Writing prompts from the perspective of a particular role can be a useful technique for generating
more relevant and engaging responses from language models.
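
To show where the system prompt ends up, the sketch below renders a list of chat messages into the Llama 3 Instruct prompt layout. It is a hand-written approximation of what `tokenizer.apply_chat_template` produces for this model, not the library's own code. `render_llama3_chat` is a hypothetical helper, and the tokenizer's bundled template remains the authoritative version.

```python
def render_llama3_chat(messages, add_generation_prompt=True):
    """Approximate the Llama 3 Instruct chat layout as a plain string."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += (f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
                 f"{message['content'].strip()}<|eot_id|>")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


messages = [
    {"role": "system", "content": "You are a helpful first-grade teacher."},
    {"role": "user", "content": "Write a simple summary of the text for a first-grader."},
]
print(render_llama3_chat(messages))
```

The system message comes first in the rendered prompt, ahead of the user's request, so the role it defines conditions everything the model generates afterwards.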


In [17]:
role = "You are a helpful first-grade teacher."   # no trailing comma, which would create a tuple
prompt = "Write a simple summary of the text for a first-grader."
simple = {}
for permno in docs.index:
    print('=====', univ.loc[permno, 'comnam'], '=====')
    simple[permno] = generate_response(found.loc[permno, 'item'],
                                       role=role, prompt=prompt)
    print("\n".join([textwrap.fill(s) for s in simple[permno].split('\n')]))
    print()

===== FORD MOTOR CO DEL =====
Hi there! So, you know how sometimes things can change, like the price
of something or the value of money? Well, our company has to deal with
those kinds of changes, and we have to make sure we're prepared.

We have a special team that helps us figure out how to manage those
changes, and they use special tools like computers and math to help us
make good decisions. They also work with other teams to make sure
we're making the best choices for our company.

One of the things they do is use something called "derivatives" to
help us manage our risks. It's like having insurance, but instead of
protecting our stuff, it helps us protect our money.

We also have to think about things like interest rates and currency
exchange rates, which can affect how much money we have. It's like
trying to predict the weather, but instead of rain or sunshine, we're
trying to predict how much money we'll have!

Our team is very good at this, and they work hard to make sure we're

## ChatGPT LLM

TODO

### OpenAI APIs

In [2]:
# TODO: AzureOpenAI 

## Text summarization and evaluation

TODO

### BLEU

### ROUGE

### Readability