In [16]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [17]:
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
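
Since `api_key` is loaded but not used again in this notebook, a quick sanity check can catch a missing or malformed key before the first API call. The following is a minimal sketch; the helper name `check_api_key` is hypothetical, and it only tests for problems that are safe to assume (a missing value or stray whitespace).

```python
def check_api_key(key):
    """Return a short diagnostic message for an API key read from the environment."""
    if not key:
        return "No API key was found - check that your .env file defines OPENAI_API_KEY"
    if key != key.strip():
        return "An API key was found, but it has leading or trailing whitespace"
    return "API key found and looks good so far"


# Example with made-up values (not real keys)
print(check_api_key(None))
print(check_api_key(" abc123 "))
print(check_api_key("abc123"))
```

In the notebook, this could be called as `print(check_api_key(api_key))` right after `load_dotenv()`.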

In [18]:
openai = OpenAI()

In [19]:
class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
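        # A timeout keeps the notebook from hanging on an unresponsive server
        response = requests.get(url, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string can be None (e.g. an empty <title>), so check both
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        # Some pages have no <body>; fall back to empty text instead of raising AttributeError
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""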
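
Separating the parsing from the download makes the cleaning step easy to try on a small HTML string without touching the network. This is a sketch under that assumption; `extract_title_and_text` is a hypothetical helper, not part of the `Website` class above.

```python
from bs4 import BeautifulSoup


def extract_title_and_text(html):
    """Return (title, visible_text) for an HTML string, dropping scripts, styles, images and inputs."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title and soup.title.string else "No title found"
    if soup.body is None:
        # No <body> tag: nothing to extract
        return title, ""
    for irrelevant in soup.body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    return title, soup.body.get_text(separator="\n", strip=True)


sample = """
<html><head><title>Demo</title></head>
<body><h1>Hello</h1><script>var x = 1;</script><p>World</p></body></html>
"""
title, text = extract_title_and_text(sample)
print(title)
print(text)
```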

In [20]:
# tom = Website("https://clearchoice.io")
# print(tom.title)
# print(tom.text)


In [21]:
# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.'

system_prompt = "You are an assistant that analyzes the contents \
of a website to identify and describe its products and services. \
For each product or service, provide a concise summary of its core \
capabilities and features. Ignore any navigation-related text or \
irrelevant content. Present the information in a well-structured \
Markdown format, using headings and bullet points where appropriate." 

In [22]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
Include a matrix of competitive products by name that compares each product's capabilities with the website's. Rate \
each product's capabilities compared with the website's. \
Include a summary describing the best product, and include the URLs for each product \
as well as any news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [23]:
#print(user_prompt_for(tom))

In [24]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
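
To inspect the message structure without fetching a real page, a stand-in object with `title` and `text` attributes is enough. The sketch below uses shortened, made-up prompts and a `SimpleNamespace` in place of a `Website`; the `demo_` names are assumptions for illustration only.

```python
from types import SimpleNamespace

# Shortened stand-in for the system prompt defined earlier (illustrative only)
demo_system_prompt = "You are an assistant that analyzes the contents of a website."


def demo_user_prompt_for(website):
    return f"You are looking at a website titled {website.title}\n\n{website.text}"


def demo_messages_for(website):
    # The chat API takes a list of dicts, each with a "role" and a "content" key
    return [
        {"role": "system", "content": demo_system_prompt},
        {"role": "user", "content": demo_user_prompt_for(website)},
    ]


fake_site = SimpleNamespace(title="Example", text="Welcome to the example page.")
for message in demo_messages_for(fake_site):
    print(message["role"], "->", message["content"][:40])
```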

In [25]:
# Try this out, and then try for a few more websites

#messages_for(tom)

In [26]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content

In [27]:
#summarize("https://docusign.com")

In [28]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
    # Write the summary to a text file
    website = Website(url)
    output_file = f"{website.title}.md"
    try:
        with open(output_file, 'w', encoding='utf-8') as file:
            file.write(summary)
        print(f"Summary successfully written to {output_file}")
    except IOError as e: 
        print(f"An error occurred while writing to the file: {e}")
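
Page titles often contain characters that are not valid in file names on every platform (a colon, for instance, is rejected on Windows). One possible fix, sketched here with a hypothetical helper `safe_filename`, is to replace such characters before building `output_file`.

```python
import re


def safe_filename(title, extension=".md", max_length=100):
    """Turn a page title into a file name that is safe on common file systems."""
    # Replace characters that Windows forbids in file names, plus control characters
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)
    # Collapse whitespace runs and trim leading/trailing spaces and dots
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    if not cleaned:
        # Fall back to a generic name when nothing usable is left
        cleaned = "summary"
    return cleaned[:max_length] + extension


print(safe_filename("Unified identity security: The core of your modern enterprise"))
```

Inside `display_summary`, this could be used as `output_file = safe_filename(website.title)`.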

In [29]:
display_summary("https://sailpoint.com")

# Unified Identity Security Overview

The website focuses on unified identity security as a core aspect of modern enterprise security. SailPoint provides an identity security platform that addresses the growing need for secure access management and risk mitigation in various industries, emphasizing a holistic and intelligent approach.

## Core Products and Services

### Identity Security Cloud
- **Capabilities**: Provides real-time access management to critical data and applications.
- **Features**:
  - Automated, AI-driven processes for efficient identity management.
  - Integration across various data sources and applications.
  - Designed to centralize and unify identity security practices.

### IdentityIQ
- **Capabilities**: Software-based identity security solution.
- **Features**:
  - Cloud Infrastructure Entitlement Management
  - Non-Employee Risk Management
  - Data Access Security
  - Password Management
  - Access Risk Management
  - Machine Identity Security

### SailPoint Atlas
- **Capabilities**: Powers the Identity Security Cloud with foundational services.
- **Features**:
  - Enables complex identity processes to be managed seamlessly.
  - Facilitates onboarding and offboarding of identities.

### Solutions and Use Cases
- **Mitigate Cyber Risk**: Advanced strategies to reduce security vulnerabilities.
- **Improve IT Efficiency**: Automated provisioning and management tools.
- **Embrace Zero Trust**: Strategies designed to meet the zero trust security model.
- **Accelerate Onboarding**: Streamlined processes to decrease time to onboard.
- **Maintain Compliance**: Tools to ensure regulatory compliance with identity governance.

## Competitive Product Matrix

| Product              | Key Features                                                                 | Comparison Rating (1-5) |
|---------------------|-------------------------------------------------------------------------------|-------------------------|
| **SailPoint Identity Security Cloud** | Unified platform, AI capabilities, comprehensive identity lifecycle management | 5                       |
| **Okta**            | Single sign-on (SSO), Multi-factor authentication (MFA), API access management | 4                       |
| **Microsoft Azure AD** | Identity management, integration with Microsoft ecosystem, SSO         | 4                       |
| **IBM Security Verify** | AI-driven intelligence, identity governance, access management             | 3                       |
| **Ping Identity**   | Identity federation, SSO, API security                                       | 3                       |

### Summary of Best Product
**SailPoint Identity Security Cloud** emerges as the leading product in the competitive landscape due to its comprehensive capabilities in providing a unified and intelligent platform for identity security. Its advanced AI features, extensibility, and focus on real-time access management offer a competitive edge that is well-suited for modern enterprise challenges.

## News and Announcements
- **Partnership Announcement**: SailPoint and Imprivata have collaborated to deliver unified identity security and access management specifically tailored for healthcare organizations.
  
  - **Summary**: This partnership enhances SailPoint's offerings by integrating Imprivata's solutions, aimed at improving healthcare organizations' security posture by better managing user identities and access.

## URLs for Each Product
- **SailPoint Identity Security Cloud**: [SailPoint](https://www.sailpoint.com/)
- **Okta**: [Okta](https://www.okta.com/)
- **Microsoft Azure AD**: [Microsoft Azure](https://azure.microsoft.com/en-us/services/active-directory/)
- **IBM Security Verify**: [IBM Security](https://www.ibm.com/security/identity-and-access-management)
- **Ping Identity**: [Ping Identity](https://www.pingidentity.com/)

## Conclusion
SailPoint's commitment to identity security presents a robust solution for enterprises. With its market-leading Identity Security Cloud and strategic partnerships, it is well-positioned to address the complex demands of identity and access management in a rapidly evolving digital landscape.

Summary successfully written to Unified identity security: The core of your modern enterprise.md


In [30]:
#display_summary("https://anthropic.com")

## An extra exercise for those who enjoy web scraping

You may notice that `display_summary("https://openai.com")` doesn't work! That's because OpenAI has a fancy website that uses JavaScript to build the page in the browser, so much of the content is missing from the raw HTML that `requests` downloads. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!).
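
As a starting point for this exercise, here is a rough sketch of fetching rendered HTML with Selenium. It assumes the `selenium` package (version 4) and a local Chrome installation are available; the function name `fetch_rendered_html` and the fixed wait are illustrative choices, not the approach used in the community-contributions folder.

```python
import time


def fetch_rendered_html(url, wait_seconds=2.0):
    """Load a page in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so the rest of the notebook still works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Crude pause to give client-side scripts time to populate the page
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails
        driver.quit()
```

The returned HTML could then be passed to `BeautifulSoup(html, 'html.parser')` in place of `response.content` inside the Website class.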

# Sharing your code

I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation), which you will find in the community-contributions folder. If you'd like to add your changes to that folder, submit a Pull Request with your new versions in that folder and I'll merge your changes.

If you're not an expert with git (and I am not!) then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it the first time it's pretty clear. As a pro-tip: it's best to clear the outputs of your Jupyter notebooks (Edit >> Clear outputs of all cells, and then Save) so that the notebooks you submit are clean.

PR instructions courtesy of an AI friend: https://chatgpt.com/share/670145d5-e8a8-8012-8f93-39ee4e248b4c