# Summarizing text

A common LLM task is extracting information from unstructured or semi-structured data. A common challenge is getting the LLM to return that data in a consistent and predictable structure. We can use LMQL to collect summary information in a predictable JSON format that other code can consume.

In [1]:
import lmql
from bs4 import BeautifulSoup
import html2text
import requests
from rich.pretty import pprint
from dataclasses import dataclass
from dataclasses_json import dataclass_json

In [2]:
page_url = "https://www.meetup.com/ai-ml-engineering-aie-an-rmaiig-subgroup/events/299822840/"

## Get a text representation of the page

The first step is to download the page and convert its HTML into Markdown text.

In [3]:
def get_markdown_for_url(url):
    """Get a markdown representation for a given URL

    Args:
        url (str): The URL to get the markdown for

    Returns:
        str: The markdown representation of the URL
    """
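    # Use a timeout and raise on HTTP error statuses so failures surface early
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # html2text turns the parsed HTML into Markdown
    return html2text.html2text(str(soup))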
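
To make the conversion step easier to picture without a network call, here is a minimal standard-library sketch of the same idea: reduce HTML to its visible text. It is a simplified stand-in, not what `html2text` does internally (it produces plain text, not Markdown), and `TextExtractor`, `html_to_text`, and `sample_html` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and skips script/style content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Made-up HTML, loosely modeled on an event page
sample_html = """
<html><head><title>Demo</title><style>p {color: red}</style></head>
<body><h1>LMQL talk</h1><p>Tuesday at <b>8:30 AM</b></p>
<script>console.log("ignored")</script></body></html>
"""
print(html_to_text(sample_html))
```

Testing against a fixed HTML string like this is also a convenient way to check the extraction step without depending on the live page.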

In [4]:
from IPython.display import Markdown


In [5]:
markdown_text = get_markdown_for_url(page_url)
Markdown(markdown_text)

# An Introduction to Language Model Query Language (LMQL)

[![Photo of Bill McIntyre](https://secure-
content.meetupstatic.com/images/classic-member/317092633/45x45.jpg)Hosted
ByBill M.](https://www.meetup.com/ai-ml-engineering-aie-an-rmaiig-
subgroup/events/299822840//attendees/)

![An Introduction to Language Model Query Language
\(LMQL\)](https://secure.meetupstatic.com/photos/event/8/3/b/c/600_519753724.webp?w=750)

## Details

Join us for a discussion about **Language Model Query Language (LMQL)**

**Why?** LMQL is a programming language designed for interacting with large
language models (LLMs) efficiently and effectively. It supports batch requests
and detailed parameter control, helping with both processing efficiency and
fine-tuning of responses. LMQL can help facilitate automation, efficiency, and
cross-model compatibility in natural language processing tasks.

**What?** Cameron Pope, CTO at [Meaningly.com](http://meaningly.com/), and
fellow LLM/AI/Ollama enthusiast will present on LMQL on Tuesday 3/19 at 8:30
AM at Founder Central / Sweater in Boulder. I'll bring the coffee!

Artificial Intelligence · AI Algorithms · Big Data · Machine Learning · Machine Learning
with Python

[![Photo of AI/ML Engineering \(AIE an RMAIIG Subgroup\)
group](https://secure-content.meetupstatic.com/images/classic-
events/517328402/56x56.jpg?w=56?w=128)AI/ML Engineering (AIE an RMAIIG
Subgroup)See more events](https://www.meetup.com/ai-ml-engineering-aie-an-
rmaiig-subgroup/events/)

[![Photo of AI/ML Engineering \(AIE an RMAIIG Subgroup\)
group](https://secure-content.meetupstatic.com/images/classic-
events/517328402/56x56.jpg?w=56?w=128)AI/ML Engineering (AIE an RMAIIG
Subgroup)public
group![question](https://secure.meetupstatic.com/next/images/QuestionV2.svg?w=48)](https://www.meetup.com/ai-
ml-engineering-aie-an-rmaiig-subgroup/)

Tuesday, March 19, 2024  
8:30 AM to 9:30 AM MDT

[Founder Central by
Sweater](https://www.google.com/maps/search/?api=1&query=40.021038%2C%20-105.21757)

2000 Central Ave #100 · Boulder, CO

[![Google map of the user's next upcoming event's location](https://maps-
googleapis.meetup.com/maps/api/staticmap?center=40.021038%2C%20-105.21757&zoom=17&size=480x300&format=png&scale=1&key=AIzaSyBhcQiQISkjMBwLAugJj8V78nMPfitnr44&markers=icon%3Ahttps%3A%2F%2Fsecure.meetupstatic.com%2Fnext%2Fimages%2Fevent%2Fmup-
custom-google-map-
pin.png%7Ccolor%3A0xF65858%7C40.021038%2C%20-105.21757)](https://www.google.com/maps/search/?api=1&query=40.021038%2C%20-105.21757)

An Introduction to Language Model Query Language (LMQL)

FREE



## Define summary format

We want to summarize this page in a structured way so that we get the information in a consistent format. One of the cool things we can do with LMQL is have it fill out the fields of a dataclass.

In [6]:

@dataclass_json
@dataclass
class PageInformation:
    page_summary: str
    concepts: str
    locations: str
    people: str
    organizations: str
    dates: str
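
As a rough illustration of why a dataclass is a convenient target, the sketch below round-trips an instance through JSON using only the standard library (`dataclasses.asdict` and `json`) in place of the `dataclass_json` decorator. The field values are sample data typed in by hand, not model output, and `info`, `as_json`, and `restored` are illustrative names.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class PageInformation:
    page_summary: str
    concepts: str
    locations: str
    people: str
    organizations: str
    dates: str


# Hand-written sample values, only to show the shape of the data
info = PageInformation(
    page_summary="A short talk about LMQL.",
    concepts="LMQL, large language models",
    locations="Boulder, CO",
    people="Cameron Pope",
    organizations="Meaningly.com",
    dates="Tuesday, March 19, 2024",
)

# The field names define the keys downstream code can rely on
field_names = [f.name for f in fields(PageInformation)]
print(field_names)

# Serialize to JSON and rebuild the same object from it
as_json = json.dumps(asdict(info))
restored = PageInformation(**json.loads(as_json))
print(restored == info)
```

Because every field is a plain `str`, the JSON form is a flat object with a fixed set of keys, which is what makes the result easy to hand to other code.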


## Prompt template

Now we can prompt the model (`openai/gpt-3.5-turbo` in this example) to create an instance of this class based on the information in the page.

In [7]:
@lmql.query
async def summarize_page(markdown_text: str):
    '''lmql
        argmax(max_len=8192)
            "You are a content summarizer. Given a page in Markdown format, you will provide the requested information. Be concise but detailed."
            """This is the page content in markdown format:
            ```markdown
            {markdown_text}
            ```
            """
            "page_summary should be one or two sentences long.\n"
            "concepts should be a list of concepts and ideas from the page separated by commas.\n"
            "locations should be a list of locations on the page separated by commas.\n"
            "people should be a list of people on the page separated by commas.\n"
            "organizations should be a list of organizations on the page separated by commas.\n"
            "dates should be a list of dates on the page separated by commas.\n"
            "Structured: [PAGE_INFORMATION]\n" where type(PAGE_INFORMATION) is PageInformation

            "Summary: {PAGE_INFORMATION.page_summary}\n"
        from "openai/gpt-3.5-turbo"
    '''
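
The per-field instructions in the query repeat the dataclass field names by hand. One possible way to keep that wording in one place is to generate the lines from a mapping, as sketched below. `FIELD_HINTS` and `build_field_instructions` are hypothetical names, not part of LMQL. The resulting string could presumably be passed to the query as an extra argument and interpolated the same way `{markdown_text}` is, but that is an untested assumption.

```python
# Hypothetical mapping from field name to the wording used in the prompt
FIELD_HINTS = {
    "page_summary": "one or two sentences long",
    "concepts": "a list of concepts and ideas from the page separated by commas",
    "locations": "a list of locations on the page separated by commas",
    "people": "a list of people on the page separated by commas",
    "organizations": "a list of organizations on the page separated by commas",
    "dates": "a list of dates on the page separated by commas",
}


def build_field_instructions(hints):
    """Render one 'NAME should be HINT.' line per field, each ending in a newline."""
    return "".join(f"{name} should be {hint}.\n" for name, hint in hints.items())


field_instructions = build_field_instructions(FIELD_HINTS)
print(field_instructions)
```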


In [8]:
lmql_response = await summarize_page(markdown_text=markdown_text)

In [11]:
pprint(lmql_response.variables['PAGE_INFORMATION'])

In [10]:
lmql_response.variables['PAGE_INFORMATION'].to_json()

'{"page_summary": "Join us for a discussion about Language Model Query Language (LMQL) on Tuesday, March 19, 2024 at Founder Central in Boulder.", "concepts": "Language Model Query Language, LMQL, programming language, large language models, LLMs, batch requests, parameter control, processing efficiency, fine-tuning, automation, efficiency, cross-model compatibility, natural language processing tasks", "locations": "Founder Central, Boulder, CO", "people": "Cameron Pope", "organizations": "Meaningly.com", "dates": "Tuesday, March 19, 2024"}'
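
Since the list-like fields come back as comma-separated strings, downstream code will probably want to split them. The sketch below does that with the standard library; `raw_json` is an abbreviated copy of the output above (the summary and concepts are shortened), and `LIST_FIELDS` and `split_list_fields` are hypothetical helper names.

```python
import json

# Abbreviated copy of the JSON string shown above
raw_json = (
    '{"page_summary": "Join us for a discussion about LMQL.", '
    '"concepts": "Language Model Query Language, LMQL", '
    '"locations": "Founder Central, Boulder, CO", '
    '"people": "Cameron Pope", '
    '"organizations": "Meaningly.com", '
    '"dates": "Tuesday, March 19, 2024"}'
)

# Fields the prompt asked for as comma-separated lists
LIST_FIELDS = ("concepts", "locations", "people", "organizations", "dates")


def split_list_fields(record):
    """Return a copy of record with comma-separated fields turned into lists."""
    result = dict(record)
    for name in LIST_FIELDS:
        result[name] = [item.strip() for item in record[name].split(",") if item.strip()]
    return result


parsed = split_list_fields(json.loads(raw_json))
print(parsed["people"])     # ['Cameron Pope']
print(parsed["locations"])  # ['Founder Central', 'Boulder', 'CO']
print(parsed["dates"])      # ['Tuesday', 'March 19', '2024']
```

The last two lines show a limitation of comma-separated lists: values that contain commas themselves, such as "Boulder, CO" or a full date, get split apart. Asking the model for a delimiter that is unlikely to appear in the values, such as a semicolon, may be worth trying.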