feat: implement first version of search-gpt
tobiasbueschel committed Mar 4, 2023
0 parents commit 9df67ec
Showing 12 changed files with 553 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .env.example
@@ -0,0 +1,3 @@
OPENAI_API_KEY=
GOOGLE_SEARCH_API_KEY=
GOOGLE_SEARCH_ID=
1 change: 1 addition & 0 deletions .github/FUNDING.yml
@@ -0,0 +1 @@
github: tobiasbueschel
18 changes: 18 additions & 0 deletions .github/workflows/release-please.yml
@@ -0,0 +1,18 @@
on:
  push:
    branches:
      - main
name: release-please
jobs:
  release-please:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: google-github-actions/release-please-action@v3
        id: release
        with:
          release-type: node
          package-name: search-gpt
          pull-request-title-pattern: "chore: release ${version}"
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
node_modules/
.env
.DS_Store
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,8 @@
# Changelog

## 1.0.0 (2023-03-03)


### Features

* implement first version of search-gpt ([d6f9e79](https://github.com/tobiasbueschel/search-gpt/commit/d6f9e79167b887bd81fac0ad5b228da4cfbe7cfe))
55 changes: 55 additions & 0 deletions README.md
@@ -0,0 +1,55 @@
<div align="center">
<br>
<a href="https://github.com/tobiasbueschel/search-gpt/">
<img alt="SearchGPT" src="logo.png" width="100" height="100">
</a>
<h1>SearchGPT</h1>
<p>
<b>Connecting the internet with ChatGPT</b>
</p>
<br>
<br>
</div>

You want to try ChatGPT with Internet connectivity so that you can ask about events beyond 2021, but you don't have access to AI-enabled Bing and don't want to wait for Google's Bard? SearchGPT gives you this functionality today: it searches Google for your question, reads the top result along with the snippets of the other results, and feeds that text to ChatGPT as context.

![SearchGPT Demo](./demo.gif)

## Usage

The easiest way to get started with search-gpt is to run the following:

```sh
export OPENAI_API_KEY=<REPLACE>
export GOOGLE_SEARCH_API_KEY=<REPLACE>
export GOOGLE_SEARCH_ID=<REPLACE>

npx search-gpt
```

Alternatively, you can install the CLI globally:

```sh
npm install --global search-gpt
```

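Before running the CLI, make sure you have your own [Google Search API key](https://developers.google.com/custom-search/v1/introduction), the search engine ID of a [Programmable Search Engine](https://programmablesearchengine.google.com/controlpanel/all) (used as `GOOGLE_SEARCH_ID`) and an [OpenAI API key](https://platform.openai.com/). Instead of exporting the variables, you can also put them in a `.env` file in the directory you run the CLI from (see `.env.example`).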

Once the CLI starts, it will prompt you to enter a question. Simply type in your query, and the AI assistant will search the web and generate a response.
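
Under the hood, the CLI sends your question to Google's Custom Search JSON API. It passes the API key as `key`, the Programmable Search Engine ID as `cx`, and the question as `q`. Below is a minimal sketch of building that request URL safely. It assumes a Node.js version with the global `URL` class, and `buildSearchUrl` is a hypothetical helper, not part of search-gpt:

```javascript
// Hypothetical helper: builds a Custom Search JSON API request URL.
// URLSearchParams percent-encodes every value, so characters such as
// "&", "#" or "?" in the question cannot corrupt the query string.
function buildSearchUrl(apiKey, searchEngineId, query) {
  const url = new URL("https://www.googleapis.com/customsearch/v1");
  url.searchParams.set("key", apiKey); // GOOGLE_SEARCH_API_KEY
  url.searchParams.set("cx", searchEngineId); // GOOGLE_SEARCH_ID
  url.searchParams.set("q", query); // the user's question
  return url.toString();
}

// Placeholder credentials; no request is sent here
const requestUrl = buildSearchUrl("MY_API_KEY", "MY_ENGINE_ID", "Q&A: what's new in Node.js?");
console.log(requestUrl);
```

The resulting URL can be passed to `fetch` as-is. In the JSON response, results are listed under `items`, each with a `link` and a `snippet`, which are the fields `index.js` reads.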

## How it works

This is a proof of concept and is far from a proper implementation such as Microsoft's [Prometheus Model](https://techcrunch.com/2023/02/07/openais-next-generation-ai-model-is-behind-microsofts-new-search). I wanted to experiment with how easily one could query a search engine and feed the results into a large language model (LLM) such as GPT-3.5. Apart from Google Search, other APIs could also be integrated to collect data and feed it into the LLM.

```mermaid
flowchart LR
A[User enters question] --> B[Search Google]
A --> C[Search Twitter, not implemented yet]
A --> D[Search other engines, not implemented yet]
B --> E[Search results handed to ChatGPT]
E --> F[ChatGPT uses this context to provide an answer]
```
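
The last two steps of the flowchart come down to turning search results into chat messages. Below is a minimal sketch of that step that mirrors what `index.js` does: it takes the page text of the first result, appends the snippets of the remaining results, cleans the text up, and cuts it to 10,000 characters. `buildChatMessages` and its parameter names are hypothetical, and `String.prototype.replaceAll` needs Node.js 15 or newer:

```javascript
// Hypothetical helper mirroring the message layout used by index.js.
function buildChatMessages({ pageText, snippets, question, today, maxChars = 10000 }) {
  // Google snippets contain "..." where text was elided; replace it with spaces
  const snippetText = snippets.join(" ").replaceAll("...", " ");
  const context = `${pageText} ${snippetText}`
    .replaceAll("\n", " ") // flatten line breaks from the page text
    .trim()
    .substring(0, maxChars); // crude character budget for the context window

  return [
    {
      role: "system",
      content: `You are my AI assistant and I want you to assume today is ${today}.`,
    },
    // The search context is passed as if the assistant had already said it
    { role: "assistant", content: context },
    {
      role: "user",
      content: `With the information in the assistant's last message, answer this: ${question}`,
    },
  ];
}

// Usage with placeholder search data
const messages = buildChatMessages({
  pageText: "Example page text.\nSecond line",
  snippets: ["First snippet ...", "Second snippet ..."],
  question: "Who won the 2022 World Cup?",
  today: new Date().toDateString(),
});
console.log(messages);
```

The `system` message sets today's date, presumably so the model does not treat its training cutoff as the present.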

## License

This project is licensed under the [MIT license](./license).
Binary file added demo.gif
142 changes: 142 additions & 0 deletions index.js
@@ -0,0 +1,142 @@
"use strict";
import { compile } from "html-to-text";
import readline from "readline";
import chalk from "chalk";
import fetch from "node-fetch";
import * as dotenv from "dotenv";
dotenv.config();

const { env } = process;
const OPENAI_API_KEY = env.OPENAI_API_KEY;
const GOOGLE_SEARCH_API_KEY = env.GOOGLE_SEARCH_API_KEY;
const GOOGLE_SEARCH_ID = env.GOOGLE_SEARCH_ID;

if (!OPENAI_API_KEY || !GOOGLE_SEARCH_API_KEY || !GOOGLE_SEARCH_ID) {
  console.error(
    "Please set OPENAI_API_KEY, GOOGLE_SEARCH_API_KEY and GOOGLE_SEARCH_ID (as environment variables or in a .env file)"
  );
  process.exit(1);
}

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

const convert = compile({
  preserveNewlines: false,
  wordwrap: false,
  // The main content of a website will typically be found in the main element
  baseElements: { selectors: ["main"] },
  selectors: [
    {
      // Keep link text but drop the URLs
      selector: "a",
      options: { ignoreHref: true },
    },
  ],
});

// Messages sent to OpenAI for the current question (reset by searchGPT on every question)
let previousChat = [];

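async function startCli() {
  rl.question(
    chalk.bgHex("#00A67E").white("🧠 Ask me anything:") + " ",
    async (userPrompt) => {
      try {
        await searchGPT(userPrompt);
      } catch (error) {
        // Report the failure and keep the prompt loop alive instead of crashing
        console.error(chalk.red(`> Something went wrong: ${error.message}`));
      }
      startCli();
    }
  );
}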

async function searchGPT(userPrompt) {
  previousChat = [
    {
      role: "system",
      content: `You are my AI assistant and I want you to assume today is ${new Date().toDateString()}.`,
    },
  ];

  // Step 1: perform Google Search
  // We crawl the first page returned in the Google Search as it often contains the result of the query.
  // As a fallback, we also include the snippets of all other search results in case the answer is not
  // included in the first page already.
  const searchResults = await getGoogleSearchResults(userPrompt);
  const items = searchResults?.items ?? [];
  if (items.length === 0) {
    // No results (or the search request failed): nothing to feed into OpenAI
    console.log(chalk.red("> No search results found, please try another question.\n"));
    return;
  }
  const [firstPage, ...remainingPages] = items;
  const urlToCheck = firstPage.link;

  // Fetch raw HTML of first page & get main content
  const pageResponse = await fetch(urlToCheck);
  let context = convert(await pageResponse.text());

  // Get all Google Search snippets, clean them up and add to the text
  context += remainingPages
    .reduce((allPages, currentPage) => `${allPages} ${currentPage.snippet}`, "")
    .replaceAll("...", " "); // Remove "..." from Google snippet results

  // Note: we must stay below the max token amount of OpenAI's API.
  // Max token amount: 4096, 1 token ~= 4 chars in English.
  // Hence, we should roughly ensure we stay below 10,000 characters (~2,500 tokens) for the input
  // and leave the remaining tokens for the answer.
  // - https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
  // - https://platform.openai.com/docs/api-reference/chat/create
  context = context
    .replaceAll("\n", " ") // Remove any new lines from the text of the first page
    .trim()
    .substring(0, 10000);

  // Provide OpenAI with the context from the Google Search
  previousChat.push({
    role: "assistant",
    content: context,
  });

  // Step 2: feed search results into OpenAI and answer original question
  previousChat.push({
    role: "user",
    content: `With the information in the assistant's last message, answer this: ${userPrompt}`,
  });

  const finalResponse = await getOpenAIChatCompletion(previousChat);
  console.log(chalk.green("> ") + chalk.white(finalResponse));

  console.log(chalk.dim(`> Learn more: ${urlToCheck}\n`));

  return finalResponse;
}

async function getGoogleSearchResults(searchTerm) {
  try {
    // Encode the query so characters such as "&", "#" or "?" don't break the URL
    const response = await fetch(
      `https://www.googleapis.com/customsearch/v1?key=${GOOGLE_SEARCH_API_KEY}&cx=${GOOGLE_SEARCH_ID}&q=${encodeURIComponent(searchTerm)}`
    );

    const data = await response.json();
    return data;
  } catch (error) {
    console.error(error);
  }
}

async function getOpenAIChatCompletion(previousChat) {
  try {
    const response = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        model: "gpt-3.5-turbo",
        messages: previousChat,
      }),
    });

    const { choices } = await response.json();
    return choices[0].message.content;
  } catch (error) {
    console.error(error);
  }
}

startCli();
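
The comments in `searchGPT` size the context with a rule of thumb. The short sketch below works through that arithmetic. Like those comments, it assumes a 4,096-token limit shared by prompt and answer and roughly 4 characters per token; `estimateTokens` is a hypothetical helper, not part of the project. Under those assumptions, the 10,000-character cut leaves about 1,600 tokens for everything else:

```javascript
// Back-of-the-envelope token budget for the context sent by searchGPT.
// Uses the rough "1 token ~= 4 English characters" heuristic; real token
// counts depend on the text and on the model's tokenizer.
const MAX_TOKENS = 4096; // limit cited in index.js, shared by prompt and answer
const CHARS_PER_TOKEN = 4;
const MAX_CONTEXT_CHARS = 10000; // the substring(0, 10000) cut in searchGPT

// Hypothetical helper: estimate the tokens used by a number of characters
function estimateTokens(charCount) {
  return Math.ceil(charCount / CHARS_PER_TOKEN);
}

const contextTokens = estimateTokens(MAX_CONTEXT_CHARS); // 2500
// Tokens left for the system prompt, the question and the answer
const leftover = MAX_TOKENS - contextTokens; // 1596

console.log(`context ~ ${contextTokens} tokens, ${leftover} tokens left`);
```

The system prompt and the question also come out of that remainder, so the space left for the answer is somewhat smaller. A real tokenizer would give exact counts.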
9 changes: 9 additions & 0 deletions license
@@ -0,0 +1,9 @@
MIT License

Copyright (c) Tobias Büschel (github.com/tobiasbueschel)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Binary file added logo.png
