In [1]:
from typing import List
from pydantic import BaseModel, Field

from langchain_google_vertexai import ChatVertexAI

from article.sources import sources
from article.utils import flatten_openapi, generate_extract

import mistune
from IPython.display import Markdown
from tqdm.notebook import tqdm

In [2]:
class StyleGuide(BaseModel):
    audience: str
    style: str
    recommendations: List[str]

In [3]:
class Section(BaseModel):
    name: str = Field(title='Name/title of the section')
    points: List[str] = Field(title='List of no more than 3 main points for this section')
    summary: str = Field(title="Summary of the section in 20-50 words")
    experts: List[str] = Field(title="List of 3-5 expert roles/descriptions who can help with the article by providing feedback")

    def __str__(self):
        return f'## {self.name}\n{self.summary}\n\n'+','.join(self.points)

class Outline(BaseModel):
    title: str = Field(title="Title of the article")
    summary: str = Field(title="Summary of the article in 20-50 words")
    sections: List[Section]

    def __str__(self):
        return f'# {self.title}\n{self.summary}\n\n' + '\n'.join([str(x) for x in self.sections])
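
The `__str__` methods above join a section's points with commas, so in Markdown they render as one long line. A minimal sketch of an alternative rendering with one bullet per point follows; `render_section` is a hypothetical helper, not part of the notebook, and it takes plain strings so that it runs without pydantic.

```python
from typing import List


def render_section(name: str, summary: str, points: List[str]) -> str:
    """Render one outline section as Markdown with one bullet per point.

    Hypothetical helper for illustration; the notebook's Section.__str__
    joins the points with commas instead.
    """
    bullets = "\n".join(f"- {p}" for p in points)
    # Trailing newline keeps the next "## heading" on its own line.
    return f"## {name}\n{summary}\n\n{bullets}\n"


print(render_section("Evals", "How to evaluate outputs.", ["Human evaluation", "LLM as a judge"]))
```

The trailing newline matters when sections are concatenated: without it, the next heading is glued to the last point.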

In [4]:
articles = sources()
style_articles = [x[1] for x in articles if x[0].startswith('data/oreilly')]
articles = [x[1] for x in articles]

### Get style guide

Replicate article style

In [5]:
prompt = [
    ('system', 'You are analyzing articles. Your sources are:\n' + '\n'.join(style_articles)),
    ('user', 'Give a concise description of the audience and style of those articles. Write it as recommendations for other authors to follow when writing similar articles. Create sections: Audience, Style, Recommendations. Use Heading 2 for each section.')
]

In [6]:
gen_model = ChatVertexAI(model='gemini-1.5-pro-002', temperature=2, top_k=40, top_p=1)
parse_model = ChatVertexAI(model='gemini-1.5-flash-002')
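
The two models suggest a generate-then-parse pattern: a high-temperature model writes free-form text and a cheaper model maps that text onto a schema. The source of `generate_extract` is not shown here, so the following is only a sketch under that assumption. `generate_then_parse` and the two stand-in callables are hypothetical, and the `(text, structured)` return shape is inferred from how `styles[i][0]` and `styles[i][1]` are used later.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content), the same shape as the prompt lists


def generate_then_parse(
    prompt: List[Message],
    generate: Callable[[List[Message]], str],
    parse: Callable[[str], Dict[str, str]],
) -> Tuple[str, Dict[str, str]]:
    """Two-stage call: free-form generation, then structured extraction.

    `generate` stands in for the creative model and `parse` for the model
    that extracts fields. Both are plain callables here, so the sketch runs
    without any API access.
    """
    text = generate(prompt)     # stage 1: unconstrained Markdown
    structured = parse(text)    # stage 2: extract fields from that text
    return text, structured


# Stand-in callables (assumptions, not real model calls).
def fake_generate(prompt: List[Message]) -> str:
    return "## Audience\nEngineers\n## Style\nPractical"


def fake_parse(text: str) -> Dict[str, str]:
    lines = text.splitlines()
    # Pair each "## Heading" line with the line that follows it.
    return {h[3:].lower(): v for h, v in zip(lines[::2], lines[1::2])}


raw, parsed = generate_then_parse([("user", "Describe the style")], fake_generate, fake_parse)
print(parsed)
```

Keeping the raw text alongside the parsed result makes it possible to inspect what the first model actually wrote when the extraction looks wrong.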

In [7]:
styles = []
for i in tqdm(range(10)):
    style = generate_extract(prompt, gen_model, StyleGuide, parse_model)
    styles.append(style)


In [23]:
Markdown(styles[9][0])

**I. Tactical Advice (Effective LLM Development):**

* **Prompt Engineering:**
    * Start with fundamentals: few-shot, chain-of-thought, and resource integration.
    * Concise and focused prompts are better than complex, multipurpose ones. Decouple prompts where relevant to evaluate test impacts across isolated single cases across user needs, test individual specialized cases
    * Structure I/O formats consider implementation for maintainability interoperability LLM stacks
        *  **Instructor:** Best for language model API access/integration cases. Use SDK to simplify API interfacing LLM generate public corpus and HF repos; prioritize open source dataset use instructor accordingly - based on how intend model' production 

        *   **Outlines:** Effective for locally hosted HuggingFace models deployments to structure testing feedback loop Llms prompts responses evaluation easier locally maintain; create your hugging-face public data. Create also models that simplify structuring for further building blocks based LL stacks if your company wants make it to the marketplace or enable downstream integrations by developer building using this tech

* **Information Retrieval (RAG):**  Combine, leverage keyword and also multimodal search vector emebddings eval benchmark which gives greatest increase based resources versus  simpler keuword where there is negligable improvements by adding RAG to outputs generated relevance and downstream processes

keep minimal ensure largest model sizes fits still generates concise RAG index outputs by focusing text actually added/processed avoid excess verbose data, use multiple scoring RAG including keuword MRR, NGCG as metircs or own production data to avoid overly rely metrics if output dont fit purpose to impro RAG per reduce sizes models used 

 prioritize RAG data optimizeing sizes content reduce context/windows size model requests which improves also token compute response latency overall


 choose/test simpler or bm simple me only implement RAG/ebmedd systems truly adds measurable perf improve on results (instead use always complex and test performance if metrics) prioritizing by metric, test then evalute accordingly on small chuncks to make debug simpler in multi-prompt queries to generate results when chain LL



  ensure balanced retrieval cost


 fine tunes based business compliance not blindly each type application since tuning specialize niche task better result for long rung because model baseline will contin adapt shift. Ensure gains adding to pipelines. Prioritze custom domain models data where pub ones  generate good-outputs but your data too spec so needed by you LL enhance  model apps




 **II. Strategic Decisions**: pretraining general only if re depend clear  gains case compliance



 prioritize use current specialized available products outsource compute costs LL hosted already and implement fine only needed due private setup in productions unless huge gains value justifies build maintena from scratch full ML. dev LL pretrn platform for your own L data


enhance existing systems workflows. Avoid automation LL if dont gen needed output quality use (use instead focused, specialized uses that give benefit over existing process). Centaury type. tools human focus

 evaluate your needs business then dev prior prompt test; user eval; iterate refine


use trend models as costs to evaluate viability timelines projects L generate products when are feasible (game $ now LL powering to generate in games in under$ with pacman ar example cost compute) based trend analysis


 focus release value before generic demos test on various prompts and output evaluation on wide range queries prioritize production and results produced then presentation for



 LLms are soft build invest to manage L stack as it also soft: needs regular upkeep maintanence; testing  e eeach layers refinement refine after production release check outputs feedback users and dont assume prompt one shot "do then deploy" wins cycle test feedback loops when implement  b avoid surprises



* other key takeaways/aspects:* avoid generalize; specialize specific outputs types based use cases prompts prioritize ship deep user valuable interactions enhance work rather replaceing prioritize smaller specialized to release finished functional soft with user interate build feed  add complexity tuning needed for task to increase perf needed if is needed evaluate benchmarks accordingly on each implemented systems with given prod use constraints first and then release prioritizing actual user experience benefits than blindly following just latest developments/method trends

In [28]:
# Compare the parsed style guide of each run with the previous run's
[styles[i][1] == styles[i-1][1] for i in range(1, 10)]

[False, False, False, False, False, False, False, False, False]
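
The check above only compares neighbouring runs for exact equality. A sketch of a slightly broader check counts the distinct results and scores every pair of texts with the standard library's `difflib`. The sample strings are made up for illustration and `pairwise_similarity` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def pairwise_similarity(texts: List[str]) -> List[float]:
    """Similarity ratio in [0, 1] for every unordered pair of texts."""
    return [SequenceMatcher(None, a, b).ratio() for a, b in combinations(texts, 2)]


# Made-up samples standing in for the generated style guides.
samples = ["Write for engineers.", "Write for engineers.", "Use short sentences."]
distinct = len(set(samples))
scores = pairwise_similarity(samples)
print(distinct, [round(s, 2) for s in scores])
```

A ratio close to 1 means two runs are nearly identical, which is more informative than a plain `True`/`False` when the sampling temperature is high.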

### Create outline

In [11]:
initial = """
genai in legacy environment

what is different to startup?
- existing processes, do we want to automate (potentially ineffective process) or optimize
- existing applications - are APIs ready for GenAI?
- old documentation - if I have final1.doc and final2.doc which one has valid info?

even with the challenges there are tremendous benefits of GenAI in legacy env

What approaches work:
- start small, focused. Having 100s of ideas is great but focus is important
- start with business - what outcomes do we want and how to measure them
- translate it to scenarios - input -> output. with this start evals

Evals
Human -> llm as a judge

Prompt engineering
Split to small steps. Easier to manage, easier to evaluate and easier to "explain" = less black box feeling

Logging
log everything, log automatically. Don't trust applications to log on their side

Security
by default assume not-safe
"""
prompt = [
    ('system', 'You are preparing to write a new article. Follow the instructions in the schema. You can change the title and/or the provided summary. Your sources are:\n' + '\n'.join(articles)),
    ('user', f'Prepare outline and plan for new article. Initial thoughts {initial}')]

In [12]:
o_model = ChatVertexAI(model='gemini-1.5-pro-002').with_structured_output(schema=flatten_openapi(Outline.schema()), method='json_mode')
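
Pydantic emits nested models such as `Section` as `$ref` pointers into a shared definitions block, and the schema is flattened before it is handed to the model, presumably because the structured-output call expects a self-contained schema. The source of `flatten_openapi` is not shown, so the following is only a sketch of what such a helper might do. `inline_refs` is a hypothetical name, and it handles only local references, without guarding against recursive schemas.

```python
from typing import Any, Dict


def inline_refs(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a JSON schema with local `$ref` pointers inlined.

    Assumption: this is roughly what a helper like `flatten_openapi` does.
    Only references into the top-level `definitions` / `$defs` are resolved.
    """
    defs = {**schema.get("definitions", {}), **schema.get("$defs", {})}

    def resolve(node: Any) -> Any:
        if isinstance(node, dict):
            if "$ref" in node:
                # "#/definitions/Section" -> "Section"
                name = node["$ref"].split("/")[-1]
                return resolve(defs[name])
            return {key: resolve(value) for key, value in node.items()}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    # Drop the definitions block only at the top level.
    top = {k: v for k, v in schema.items() if k not in ("definitions", "$defs")}
    return resolve(top)


nested = {
    "type": "object",
    "properties": {
        "sections": {"type": "array", "items": {"$ref": "#/definitions/Section"}}
    },
    "definitions": {
        "Section": {"type": "object", "properties": {"name": {"type": "string"}}}
    },
}
flat = inline_refs(nested)
print(flat["properties"]["sections"]["items"]["type"])
```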

In [13]:
response = o_model.invoke(prompt)

In [14]:
oresponse = Outline.parse_obj(response)

In [15]:
Markdown(str(oresponse))

# Generative AI in Legacy Environments: Challenges, Strategies, and Best Practices
This article explores the specific challenges and considerations of implementing generative AI in legacy environments, offering effective strategies, approaches, and best practices for successful integration.

## Introduction
This section sets the stage by highlighting the unique challenges and considerations of integrating GenAI in legacy environments.

What makes generative AI implementation different in a legacy environment compared to a startup?,Existing processes: automate or optimize?,API readiness for GenAI integration in existing applications,Challenges with outdated documentation
## Benefits of GenAI in Legacy Environments
This section emphasizes the potential benefits and return on investment of GenAI implementation despite inherent challenges.

Discuss the potential advantages and value propositions of adopting GenAI within established organizations
## Effective Approaches for GenAI Integration
This section outlines practical strategies for successful GenAI adoption in legacy systems.

Start small and focused: prioritize key areas for initial implementation,Business-driven approach: align GenAI initiatives with desired outcomes and measurable metrics,Scenario-based planning: define specific input-output scenarios for targeted development and evaluation
## Evaluation Strategies for GenAI
This section explores different evaluation methods to ensure quality and reliability of GenAI outputs.

Human evaluation,LLM as a judge for automated assessment
## Prompt Engineering Techniques
This section focuses on prompt engineering best practices to enhance GenAI performance and control.

Splitting complex tasks into smaller, manageable steps,Improving prompt clarity for reduced ambiguity and better explainability
## Logging and Monitoring for GenAI
This section discusses logging strategies to gain insights into GenAI behavior and performance.

Importance of comprehensive logging for debugging and analysis,Automated logging mechanisms for consistent data capture,Building logging systems independent of application-specific logging
## Security Considerations for GenAI
This section highlights the importance of robust security measures in GenAI implementations.

Default to a not-safe assumption: prioritize security measures from the outset,Address potential vulnerabilities and security risks associated with GenAI

In [34]:
# Copy the list so that appending follow-up messages does not mutate `prompt`
prompt2 = list(prompt)
prompt2.append(('ai', str(oresponse)))

In [35]:
prompt2.append(('user', 'Change the first section to focus more on the difference between building an app in a new environment and doing the same thing in a legacy one (existing processes, APIs not ready for such apps, documents not available, etc.)'))
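
The follow-up request is built by appending the model's previous answer and a new user message to the earlier prompt. A small sketch of doing that without touching the original list follows; `extend_conversation` is a hypothetical helper and the messages are shortened placeholders.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def extend_conversation(history: List[Message], *new_messages: Message) -> List[Message]:
    """Return a new message list; the original history is left unchanged."""
    return [*history, *new_messages]


first_turn = [("system", "You are preparing an outline."), ("user", "Draft it.")]
second_turn = extend_conversation(
    first_turn,
    ("ai", "# Draft outline"),
    ("user", "Refocus the first section."),
)
print(len(first_turn), len(second_turn))
```

Keeping each turn as its own list makes it easy to re-run the first request unchanged or to branch the conversation with a different follow-up.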

In [36]:
response2 = o_model.invoke(prompt2)

In [38]:
response2

{'sections': [{'experts': ['Software engineers',
    'Data scientists',
    'IT architects'],
   'name': 'Challenges of Building GenAI Apps in Legacy Environments',
   'points': ['Existing processes and APIs may not be compatible with GenAI applications.',
    'Limited documentation can hinder integration efforts.',
    'Data silos and security concerns pose additional challenges.',
    'Talent gaps in AI/ML expertise can slow down development.'],
   'summary': 'Building GenAI applications in legacy environments differs significantly from new environments. Existing processes, APIs, and documentation may not be ready for GenAI integration, requiring extra effort.'},
  {'experts': ['Cloud architects', 'Data engineers', 'Security specialists'],
   'name': 'Strategies for Successful GenAI Integration',
   'points': ['Modernization and cloud migration are crucial for compatibility.',
    'Data integration and management break down data silos.',
    'Security and compliance must be addressed