# Loading CSV Data

In [2]:
from langchain_community.document_loaders import CSVLoader

In [3]:
loader = CSVLoader("extras/01-Data-Connections/some_data/penguins.csv")

In [4]:
data = loader.load()

In [5]:
type(data)

list

In [6]:
type(data[0])

langchain_core.documents.base.Document

In [7]:
data[0]

Document(page_content='species: Adelie\nisland: Torgersen\nbill_length_mm: 39.1\nbill_depth_mm: 18.7\nflipper_length_mm: 181\nbody_mass_g: 3750\nsex: MALE', metadata={'source': 'extras/01-Data-Connections/some_data/penguins.csv', 'row': 0})

In [8]:
print(data[0].page_content)

species: Adelie
island: Torgersen
bill_length_mm: 39.1
bill_depth_mm: 18.7
flipper_length_mm: 181
body_mass_g: 3750
sex: MALE


In [9]:
print(data[1].metadata)

{'source': 'extras/01-Data-Connections/some_data/penguins.csv', 'row': 1}
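The outputs above suggest that each CSV row becomes one document: the text is a newline-joined list of `column: value` pairs, and the metadata keeps the source path and row index. The block below is a minimal standard-library sketch of that mapping, not LangChain's actual implementation. The function name `rows_to_records` and the sample data are hypothetical.

```python
import csv
import io

def rows_to_records(csv_text, source):
    """Turn each CSV row into a dict with 'page_content' and 'metadata' keys."""
    records = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        # One "column: value" line per field, in header order.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        records.append({
            "page_content": content,
            "metadata": {"source": source, "row": i},
        })
    return records

# Illustrative sample only; the real file has more columns and rows.
sample = "species,island,sex\nAdelie,Torgersen,MALE\nAdelie,Torgersen,FEMALE\n"
records = rows_to_records(sample, "penguins.csv")
print(records[0]["page_content"])
print(records[1]["metadata"])
```

Plain dicts stand in for `Document` objects here so the sketch runs without LangChain installed.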


# Loading HTML Pages

In [11]:
from langchain_community.document_loaders import BSHTMLLoader

In [12]:
loader1 = BSHTMLLoader("extras/01-Data-Connections/some_data/some_website.html")

In [13]:
data1 = loader1.load()

In [14]:
data1

[Document(page_content='Heading 1', metadata={'source': 'extras/01-Data-Connections/some_data/some_website.html', 'title': ''})]

# Loading PDF Files

In [16]:
from langchain_community.document_loaders import PyPDFLoader

In [17]:
loader2 = PyPDFLoader("extras/01-Data-Connections/some_data/SomeReport.pdf")

In [18]:
data2 = loader2.load()

In [35]:
data2

[Document(page_content='This\nis\nthe\nfirst\nline\nPDF.\nThis\nis\nthe\nsecond\nline\nin\nthe\nPDF.\nThis\nis\nthe\nthird\nline\nin\nthe\nPDF.', metadata={'source': 'extras/01-Data-Connections/some_data/SomeReport.pdf', 'page': 0})]

In [45]:
print(data2[0].page_content.replace('\n', ' '))

This is the first line PDF. This is the second line in the PDF. This is the third line in the PDF.
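`replace('\n', ' ')` is enough here because every break in the extracted text is a single newline. If a PDF yields blank lines or repeated spaces, collapsing any run of whitespace is a slightly more forgiving variant. The helper `normalize_whitespace` below is hypothetical and not part of LangChain.

```python
import re

def normalize_whitespace(text):
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    return re.sub(r"\s+", " ", text).strip()

# Illustrative input with a blank line and a double space.
raw = "This\nis\n\nthe  first\nline\n"
print(normalize_whitespace(raw))
```

In the notebook this could be applied as `normalize_whitespace(data2[0].page_content)`.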


# External Integration

In [48]:
from langchain_community.document_loaders import HNLoader

In [61]:
loader3 = HNLoader("https://news.ycombinator.com/item?id=39217149")

In [63]:
data3 = loader3.load()

In [69]:
print(data3[0].page_content)

schacon 7 hours ago  
             | next [–] 

For better or worse, my experience as a GitHub cofounder and author of several Git books (Pro Git, etc) is that the Git commit message is a unique vector for code documentation that is highly sub-optimal.The main issue is that most of the tooling (in Git or GitHub or whatever) generally only shows the first line. So in the case of this commit example would be the very simple message of a generic "US-ASCII error" problem. Everything they talk about in this article is what is great about the _rest_ of the commit message, which, given modern tools, is _almost never_ seen by anyone.The main problem is that Git was built so that the commit message is the _email body_, meant to be read by everyone in the project. But for better or worse, that is not generally the role of this text today. Almost nobody ever sees it. Unless it's discussed in a bunch of patch series over a mailing list, nobody reads anything other than the first 50 chars of the he

In [71]:
print(data3[0].metadata)

{'source': 'https://news.ycombinator.com/item?id=39217149', 'title': 'My favourite Git commit (2019)'}


In [75]:
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain.schema import AIMessage, HumanMessage, SystemMessage

# Read the OpenAI API key and create the completion LLM and the chat LLM.
# The key can live in a separate file (as here) or in an environment variable; refer to the docs.
api_key = open('./openai_key.txt').read().strip()  # strip the trailing newline
llm = OpenAI(openai_api_key=api_key)
chat = ChatOpenAI(openai_api_key=api_key)

In [77]:
template = 'Please give a short summary of the following Hacker News comment:\n{comment}'
human_prompt = HumanMessagePromptTemplate.from_template(template)

In [79]:
chat_prompt = ChatPromptTemplate.from_messages([human_prompt])
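With the default template format, filling `{comment}` behaves much like Python's own `str.format`. The plain-Python sketch below shows the text a single human message would carry. It is an approximation of the idea rather than LangChain's code path, and the comment string is a made-up placeholder.

```python
# Same placeholder style as the notebook's template.
template = 'Please give a short summary of the following Hacker News comment:\n{comment}'

# Hypothetical stand-in for data3[0].page_content.
comment = "Example comment text."

prompt_text = template.format(comment=comment)
print(prompt_text)
```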

In [81]:
resp = chat.invoke(chat_prompt.format_prompt(comment=data3[0].page_content).to_messages())


In [85]:
print(resp.content)

The author, who is a GitHub co-founder and author of Git books, argues that Git commit messages are not an effective form of code documentation. The main issue is that most tooling only displays the first line of the message, while the rest of the message, which contains valuable information, is rarely seen. The author believes that Git was designed for the commit message to be read by everyone in the project, but in practice, very few people actually read it. The author also highlights the difficulty of finding and accessing the full commit message, even for experienced Git users. They express frustration with this limitation and suggest that writing detailed commit messages is often a waste of time.
