# Entity extraction with Claude
Before you begin, make sure you've read through the included [README.md](README.md) file to:
1. Create an Anthropic Console account
2. Create an API key to use in this cookbook
3. Install the `anthropic` Python SDK

Once you've completed these steps, you'll be ready to complete this cookbook recipe. 👨‍🍳

## Streamlining customer service
In this cookbook recipe, we'll use Claude to extract key information from (fictional) voicemail transcriptions. The voicemails are left by customers of a national bank with branches across the United States. With the extracted information, customer support can quickly route each request to the most appropriate support agent.

## Set up the environment

First, confirm that you've installed the Anthropic Python SDK by running the following code:

In [None]:
!pip install anthropic

Next, load the `anthropic` library and set up the Anthropic `client`. Be sure to replace `"<your-api-key>"` with the API key you created during the initial setup for this recipe.

In [1]:
import anthropic

client = anthropic.Anthropic(
    api_key="<your-api-key>",
)
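If you'd rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below assumes the key is stored in `ANTHROPIC_API_KEY`; `get_api_key` is a hypothetical helper written for this example, not part of the SDK.

```python
import os


def get_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises a clear error when the variable is
    missing or empty instead of sending an empty key to the API.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes the `anthropic` package is installed):
# client = anthropic.Anthropic(api_key=get_api_key())
```

This keeps the key out of the notebook file, which matters if you share or commit the notebook.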

## Define the extraction tool

To extract information from the voicemail transcripts, define the `extract_entities` function, which sends Claude a prompt written for this application. Consider the following aspects of the prompt, which you may want to change:
* **JSON output:** This prompt specifically asks for entities to be structured as JSON output, and prefills the Assistant message with `{` to guide the output in this direction. Consider if JSON is the most appropriate output for your project. For additional information on using different output formats, see [prefilling](https://docs.anthropic.com/en/docs/prefill-claudes-response) and [control output format](https://docs.anthropic.com/en/docs/control-output-format).
* **`model`:** The function uses `"claude-3-sonnet-20240229"` by default, which is Claude's maximum utility model that offers dependable and balanced outputs at a lower price. Other options may be more appropriate for your use case. [Learn more about how Opus, Sonnet, and Haiku compare](https://docs.anthropic.com/en/docs/models-overview#model-comparison).

You'll notice that the function takes in three arguments:
1. **`message`:** The text from which you wish to extract entities.
2. **`entities`:** A string which contains a list of the entities that you want Claude to prioritize.
3. **`properties`:** A string which contains a list of the properties that you're interested in knowing about for the extracted entities.

You can modify these inputs to match the type of text you're working with for your particular use case.

In [2]:
def extract_entities(message: str, entities: str, properties: str):
    prompt = f"""
    You are an information extraction system.
    You respond to each message with a JSON formatted summary of all entities in the message. 
    Each named entity appears as an entry in a JSON-formatted list.
    Each named entity in the JSON list should contain relevant properties of that entry.
    
    The types of entities that we are most interested in are:
    <entities>
        {entities}
    </entities>
    
    The properties of the entities we're most interested in are:

    <properties>
        {properties}
    </properties>

    Important: Only include entities that appear in the message. Only include properties that are relevant to each entity.

    Respond with just the formatted JSON summary of the following message:
    
    <message>
        {message}
    </message>
    """
    
    # Send the user message, plus an Assistant prefill of "{" to steer Claude toward JSON
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=4096,
        temperature=0.0,
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": "{"},
        ],
    )
    
    # Extract the text of Claude's response
    result = response.content[0].text.strip()

    # Return the JSON response, adding back the { from the Assistant prefill,
    # which isn't included in the output message, so the result is valid JSON
    return "{\n" + result
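Because the prefilled `{` is not echoed back, the completion has to be stitched together before parsing. The sketch below shows one defensive way to do that, assuming Claude may occasionally add text after the closing brace. `parse_prefilled_json` is a hypothetical helper for illustration. It relies on the standard library's `json.JSONDecoder.raw_decode`, which parses the first JSON value and ignores whatever follows.

```python
import json


def parse_prefilled_json(completion: str, prefill: str = "{") -> dict:
    """Re-attach the prefill and parse the first JSON object in the text.

    Hypothetical helper: trailing text after the JSON object is ignored.
    Raises json.JSONDecodeError if the object itself is malformed.
    """
    text = prefill + completion.strip()
    # raw_decode returns the parsed value and the index where parsing stopped
    parsed, _end = json.JSONDecoder().raw_decode(text)
    return parsed


# The completion starts right after the prefilled "{"
completion = '"entities": [{"type": "person", "name": "Michael Thompson"}]}\nLet me know if you need more.'
print(parse_prefilled_json(completion)["entities"][0]["name"])  # Michael Thompson
```

A truncated or otherwise invalid object still raises an error, so wrap the call in `try`/`except` if you want to retry or log the raw text.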

## Load the voicemails
Let's look at three voicemails that were received recently at the bank:

> "Hi, this message is for the customer support team at Woodlands Bank. My name is Michael Thompson. I'm calling today, June 14th, 2024 at 2:15 pm. I tried to make a transfer between my checking and savings account on your mobile app, but it gave me an error message and the transfer didn't go through. I bank at your West Ridge Mall branch location here in Atlanta. Please let me know how to resolve this issue. You can reach me at 404-555-7823. Thank you."

> "Good afternoon, this is Samantha Davis leaving a voicemail for Woodlands Bank customer service. I'm calling on Friday, June 14th at 4:45 pm. I recently received a new debit card in the mail to replace my expiring one. However, when I tried to activate it online, the website isn't accepting the new card number. I bank at your Downtown Denver branch location. Someone needs to look into this activation issue as soon as possible. My number is 720-555-0192 if you need any other information from me."

> "Hi there, Stephen Garcia here leaving a message for Woodlands Bank on Friday, June 14th around 3:45pm. I recently opened a new savings account at your bank and went to make my first deposit at the ATM, but the machine is showing my account is locked or restricted? I'm not sure what the issue is. My phone number is 917-555-4621 if you can please call me back about this. I bank at your Midtown Manhattan branch in New York City. I appreciate you looking into it."

Run the cell below to load them as the list `voicemails`.

In [3]:
with open('voicemails.txt', 'r') as raw_data:
    voicemails = raw_data.read().split('\n')
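Splitting on `'\n'` produces an empty string for every blank line, including a trailing newline at the end of the file, and each empty string would then be sent to Claude. A minimal sketch of a more forgiving loader follows, assuming (as the cell above does) that `voicemails.txt` holds one transcript per line. `load_voicemails` is a hypothetical name used for this example.

```python
from pathlib import Path


def load_voicemails(path: str) -> list[str]:
    """Read one transcript per line, skipping blank lines.

    Hypothetical helper: assumes one voicemail per line, UTF-8 encoded.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Usage:
# voicemails = load_voicemails("voicemails.txt")
```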

## Defining the parameters for extraction

Each voicemail contains some information about the person who called, some information about their local branch, a date (sometimes with a year) and time of their call, and a brief description of the issue they are having. We'd like to extract these entities from the text, along with any accompanying details for each entity, so we define those entities and properties as comma-separated values in the strings `entities_list` and `properties_list`.

In [4]:
entities_list = "person, bank, date, issue"
properties_list = "name, phone, email, city, state, type, branch location, day, month, time, issue type"
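If your entity types and properties live in Python lists (for example, loaded from a config file), you can build the same comma-separated strings from them. This is a small sketch; `to_prompt_list` is a hypothetical helper, and dropping blanks and case-insensitive duplicates is a design choice made here, not something the prompt requires.

```python
def to_prompt_list(items: list[str]) -> str:
    """Join names into a comma-separated string for the prompt.

    Hypothetical helper: trims whitespace, drops empty entries, and
    removes case-insensitive duplicates while keeping the first spelling.
    """
    seen = set()
    cleaned = []
    for item in items:
        name = item.strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            cleaned.append(name)
    return ", ".join(cleaned)


entities_list = to_prompt_list(["person", "bank", "date", "issue"])
print(entities_list)  # person, bank, date, issue
```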

## Extracting entities

For each voicemail in the `voicemails.txt` file, the code below will use Claude to:
1. Extract the provided entities and properties using `extract_entities`.
2. Parse the JSON string returned by `extract_entities` into a Python object with `json.loads`.
3. Print the original voicemail transcription, followed by the corresponding JSON object.

In [5]:
import json

for voicemail in voicemails:
    entities = extract_entities(voicemail, entities_list, properties_list)
    json_entities = json.loads(entities)
    print(voicemail)
    print(json.dumps(json_entities, indent=2))
    print('='*50)

"Hi, this message is for the customer support team at Woodlands Bank. My name is Michael Thompson. I'm calling today, June 14th, 2024 at 2:15 pm. I tried to make a transfer between my checking and savings account on your mobile app, but it gave me an error message and the transfer didn't go through. I bank at your West Ridge Mall branch location here in Atlanta. Please let me know how to resolve this issue. You can reach me at 404-555-7823. Thank you."
{
  "entities": [
    {
      "name": "Michael Thompson",
      "type": "person",
      "phone": "404-555-7823"
    },
    {
      "name": "Woodlands Bank",
      "type": "bank",
      "branch location": "West Ridge Mall, Atlanta"
    },
    {
      "day": 14,
      "month": 6,
      "year": 2024,
      "time": "2:15 pm",
      "type": "date"
    },
    {
      "type": "issue",
      "issue type": "error transferring between accounts on mobile app"
    }
  ]
}
"Good afternoon, this is Samantha Davis leaving a voicemail for Woodlands Bank

## Conclusion
Claude successfully identified the key information in each voicemail transcription and organized it in a JSON object. This flexible, generic prompt can be adapted for different use cases; however, it does have some limitations. For increased accuracy in a specific application, incorporate information specific to your task into the prompt to provide additional context about your text and the desired structure of your output.
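To show how the extracted JSON might feed a routing step, the sketch below groups the entities by their `type` field and pulls out the details a support agent would need. It reuses the output shown above for the first voicemail. `group_by_type` and `summarize_for_routing` are hypothetical helpers, and the key names assume Claude returns the same structure as in that output, which a generic prompt does not guarantee.

```python
import json
from collections import defaultdict

# Output for the first voicemail, copied from the run above
sample = json.loads("""
{
  "entities": [
    {"name": "Michael Thompson", "type": "person", "phone": "404-555-7823"},
    {"name": "Woodlands Bank", "type": "bank",
     "branch location": "West Ridge Mall, Atlanta"},
    {"day": 14, "month": 6, "year": 2024, "time": "2:15 pm", "type": "date"},
    {"type": "issue",
     "issue type": "error transferring between accounts on mobile app"}
  ]
}
""")


def group_by_type(payload: dict) -> dict:
    """Index the extracted entities by their "type" field."""
    grouped = defaultdict(list)
    for entity in payload.get("entities", []):
        grouped[entity.get("type", "unknown")].append(entity)
    return dict(grouped)


def summarize_for_routing(payload: dict) -> dict:
    """Collect the fields an agent needs; missing values become None."""
    grouped = group_by_type(payload)
    person = grouped.get("person", [{}])[0]
    bank = grouped.get("bank", [{}])[0]
    issue = grouped.get("issue", [{}])[0]
    return {
        "caller": person.get("name"),
        "phone": person.get("phone"),
        "branch": bank.get("branch location"),
        "issue": issue.get("issue type"),
    }


print(summarize_for_routing(sample)["phone"])  # 404-555-7823
```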

## If only...

If I had additional time to perfect this recipe I would have tried to:
* **Increase example variety:** It would have been great to showcase different use cases and output formats to highlight the flexibility that Claude can provide for entity extraction.
* **Improve prompt accuracy:** I would have liked to work alongside a prompt engineer to improve the prompt I used in this recipe, as it was a bit difficult to get this prompt to work consistently across all the sample voicemail transcriptions I generated.
* **Acknowledge limitations:** I also would have liked to show some of the limitations of this generic prompt, and how to overcome them with stronger prompting techniques.
* **Utilize advanced Claude features:** Lastly, I would have loved to showcase how you can use the [tool use feature of Claude to force a JSON output](https://docs.anthropic.com/en/docs/tool-use-examples#json-mode) as an extra section of the recipe. This additional complexity would have been a nice extension for more experienced users.
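As a rough sketch of that last idea, the function below forces Claude to call a single tool so that the structured result arrives as the tool's `input` rather than as free text. The tool name `record_entities`, its schema, and the prompt wording are assumptions made up for this example. The function takes the `client` as an argument, so it expects an `anthropic.Anthropic` instance like the one created earlier.

```python
# Hypothetical tool definition: the name and schema are illustrative only
ENTITY_TOOL = {
    "name": "record_entities",
    "description": "Record the entities extracted from a voicemail transcription.",
    "input_schema": {
        "type": "object",
        "properties": {
            "entities": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {"type": {"type": "string"}},
                    "required": ["type"],
                },
            }
        },
        "required": ["entities"],
    },
}


def extract_entities_with_tool(
    client, message: str, model: str = "claude-3-sonnet-20240229"
) -> dict:
    """Ask Claude to return entities through a forced tool call."""
    response = client.messages.create(
        model=model,
        max_tokens=4096,
        temperature=0.0,
        tools=[ENTITY_TOOL],
        # Force Claude to call this specific tool
        tool_choice={"type": "tool", "name": ENTITY_TOOL["name"]},
        messages=[{
            "role": "user",
            "content": (
                "Extract the entities from this voicemail:\n"
                f"<message>\n{message}\n</message>"
            ),
        }],
    )
    # The structured result is the input of the tool_use content block
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    raise ValueError("The response did not contain a tool_use block.")
```

With this approach there is no prefill to re-attach and no string to parse, although the schema above only requires a `type` on each entity, so the other properties would still vary between calls.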