Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TCB 60 about gen ai functions #659

Merged
merged 2 commits into from
May 22, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
163 changes: 163 additions & 0 deletions _episodes/60.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,163 @@
---
layout: episode
title: "60: Trino calling AI"
date: 2024-05-22
tags: trino
youtube_id: "iAMOkhkg44A"
wistia_id: "l9eu7zapfb"
sections:
- title: "Welcome"
time: 0:00
- title: Trino releases
desc: Trino 446, 447, and 448
time: 1:44
- title: Trino Gateway 8 and 9
time: 5:20
- title: Jan Was a subproject maintainer
time: 7:39
- title: Cole's recaptures Iceberg Summit
time: 8:39
- title: Introduction of Isa Inalcik and BestSecret
time: 12:17
- title: Trino at BestSecret and Trino functions
time: 14:12
- title: LLM use cases
time: 15:43
- title: Trino Fest news
time: 57:19
---

## Hosts

* [Manfred Moser](https://www.linkedin.com/in/manfredmoser), Director of Trino
Community Leadership at [Starburst](https://starburst.io),
([@simpligility](https://twitter.com/simpligility))
* [Cole Bowden](https://www.linkedin.com/in/cole-m-bowden), Developer Advocate
at [Starburst](https://starburst.io)

## Guest

* [Isa Inalcik](https://www.linkedin.com/in/isainalcik/), Principal Data
Engineer at [BestSecret Group](https://bestsecret.com/)

## Releases and news

[Trino 446](https://trino.io/docs/current/release/release-446.html)

* Add support for the Snowflake catalog in the Iceberg connector.
* Add support for reading S3 objects restored from Glacier storage in Hive
connector.
* Add support for unsupported type handling configuration in the Snowflake
connector.

[Trino 447](https://trino.io/docs/current/release/release-447.html)

* Add support for `SHOW CREATE FUNCTION`.
* Require Java 22.
* Add support for concurrent `DELETE` and `TRUNCATE` in the Delta Lake
mosabua marked this conversation as resolved.
Show resolved Hide resolved
connector.
* Remove support for Phoenix 5.1.x and earlier.

[Trino 448](https://trino.io/docs/current/release/release-448.html)

* Improve performance of reading from Parquet files.
* Add support for caching Glue metadata with the update to use the V2 REST
interface.

[Trino Gateway 8 and 9](https://trinodb.github.io/trino-gateway/release-notes/)

* Add support support for configurable router policies with two new policies available.
* Add a Helm chart for deployment.
* Add new website.

We also had a new Trino Helm chart release 0.20.0.

[Jan Waś](https://github.com/nineinchnick) is now also
[subproject maintainer](https://trino.io/development/roles#subproject-maintainers) of the
[go client](https://github.com/trinodb/trino-go-client) and the
[Helm charts](https://github.com/trinodb/charts).

## Impressions from the Iceberg Summit

mosabua marked this conversation as resolved.
Show resolved Hide resolved
Last week, Cole attended the [Iceberg Summit](https://iceberg-summit.org/) with
a special Trino perspective, and we chat about his impressions and major
take-aways.

## Guest Isa Inalcik from BestSecret

Isa is a highly skilled data expert with over a decade of hands-on experience in
software development lifecycle. He is well versed with many data tools including
Trino/Starburst Enterprise Platform, Snowflake, Airflow, Apache Spark, Hive,
Apache Iceberg, dbt, and others.

## Trino at BestSecret

At BestSecret, a leading online retailer for fashion and lifestyle in Europe,
Isa spearheads the development of efficient and resilient ELT/ETL pipelines and
the implementation of data and AI-driven solutions. We chat in more details
about their setup and use cases, his solutions, and challenges he is facing.

## Generative AI interest and use cases

Isa has been following the waves of interest in AI and sees the following use
cases related to data and Trino:

* Media (Audio,Video,Image): Extract information out of images.
* Object categorization: Categorize objects on images, videos.
* Data masking: For anonymizing sensitive data from unstructured text.
* Data extraction: To pull structured information from unstructured text.
* Sentiment analysis: For gauging the sentiment of textual data.
* Language detection or translation: For language detection or translating.
* Summarization: To generate concise summaries from lengthy texts.

This inspired him to try an integration of the new emerging LLMs with Trino.

## Trino SPI

Trino uses a service provider interface (SPI) to allow developers to create
plugins for features such as connectors, security integrations and custom
functions. This is crucial for business to implement required functionality and
enabled Isa to work on a plugin to support custom functions that call LLMs.

The OpenAI API specification also allowed him to create one function that can be
used with different LLM backends.

## Proof of concept and demo

We look at the concept and implementation that Isa developed with the following
architecture:

<img src="{{site.baseurl}}/assets/episode/60/trino-ai-architecture.png"/>

The [trino-ai repository](https://github.com/alaturqua/trino-ai) contains source
code and more details as mentioned in his post on
[LinkedIn](https://www.linkedin.com/posts/isainalcik_trino-trino-llama3-activity-7187411736587587584-e2WW/).

## Other resources

* Post from Isa: [Maximize Performance: The Secret to Scaling Trino Clusters with KEDA](https://www.linkedin.com/pulse/maximize-performance-secret-scaling-trino-clusters-isa-inalcik-ffo5e/)
* Post from Isa: [Enhancing Security and Observability in Trino with Open Policy Agent and OpenTelemetry](https://www.linkedin.com/pulse/enhancing-security-observability-trino-open-policy-agent-isa-inalcik-zhl9e)

## Rounding out

Trino Fest news:

* [Finalized speaker lineup announced]({% post_url 2024-05-08-trino-fest-lineup-finalized %})
* [Register for event and hotel now](https://www.starburst.io/info/trino-fest-2024/?utm_medium=trino&utm_source=website&utm_campaign=Global-FY25-Q2-EV-Trino-Fest-2024&utm_content=banner)
* Special thanks to our Trino Fest sponsors - Starburst as event host and
Alluxio, Cloudinary, Onehouse, Startree, and Upsolver as event sponsors.
* Contact us to join the Trino Contributor Congregation the next day.

Other topics:

* Trino Contributor Call on 23rd of May.
* Check out upcoming [Trino Community Broadcast episodes and other events]({{site.url}}/community.html#events).

If you want to learn more about Trino, check out the definitive guide from
O'Reilly. You can get [the free PDF from
Starburst](https://www.starburst.io/info/oreilly-trino-guide/) or buy the
[English, Polish, Chinese, or Japanese
edition]({{site.url}}/trino-the-definitive-guide.html).

Music for the show is from the [Megaman 6 Game Play album by Krzysztof
Slowikowski](https://krzysztofslowikowski.bandcamp.com/album/mega-man-6-gp).
Binary file added assets/episode/60/trino-ai-architecture.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
7 changes: 0 additions & 7 deletions broadcast/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,13 +21,6 @@ interesting developments in the ecosystem around Trino.

<dl>

<dt>22 May 2024: Trino Community Broadcast 60 - AI functions</dt>
<dd><a href="https://www.linkedin.com/in/isainalcik/">Isa Inalcik</a>
chats with us a bit about his work with Trino at
<a href="https://bestsecret.com/">BestSecret</a> before we dive into
his exploration of creating Trino functions that use generative AI
systems.</dd>

<dt>20 June 2024: Trino Community Broadcast 61 - Power BI client</dt>
<dd><a href="https://www.linkedin.com/in/patrick-pichler/">Patrick Pichler</a>
joins us to talk about his small project of a PowerBI client for Trino
Expand Down