# Parse Resumes with Tensorlake
*Author: [Antaripa Saha](https://x.com/doesdatmaksense)*

In this example, you will learn how to extract structured data from candidate resumes using Tensorlake DocumentAI. To learn more about resume parsing, [check out the Tensorlake docs](https://docs.tensorlake.ai/use-cases/hr-and-onboarding/resume-parsing).

Resume parsing is crucial in modern hiring workflows where recruiters deal with hundreds or thousands of resumes.
Automating the extraction of key information (skills, experience, education) saves time and enables efficient candidate screening.
It also powers recommendation engines, applicant tracking systems (ATS), and helps maintain structured databases of talent profiles.

# Why Traditional Resume Parsers Struggle

Resume parsing sounds simple until you come across the real-world diversity of resumes. Here's a quick overview of why conventional tools often fail.

**Format Variety**

| Resume Type           | Format Description                        | Common Issues                                  |
|------------------------|--------------------------------------------|------------------------------------------------|
| Creative Designer      | PDF with columns, graphics, custom fonts   | Can't parse columns, OCR fails on fonts        |
| Standard Corporate     | Clean Word doc                             | Date inconsistencies, skills in text blocks     |
| Academic CV            | LaTeX, multi-page PDF                      | Complex tables, publication lists               |
| International Resume   | Non-US formats                             | Date/address confusion, terminology mismatch    |
| Scanned Resume         | Image-based PDFs                           | OCR errors, layout detection issues             |

Even clean resumes can trip up parsers due to inconsistent phrasing or document structure.



## Step 0: Prerequisites

1. Install the [Tensorlake SDK](https://pypi.org/project/tensorlake/)
2. Import necessary packages
3. Set your [Tensorlake API Key](https://docs.tensorlake.ai/platform/authentication)

**Note:** Learn more with the [Tensorlake docs](https://docs.tensorlake.ai/).

In [None]:
!pip install -q --upgrade tensorlake

In [None]:
# Import libraries
from tensorlake.documentai import DocumentAI
from tensorlake.documentai.models import (
    ParsingOptions,
    StructuredExtractionOptions,
    ParseStatus,
    ChunkingStrategy
)
import time
import json

In [None]:
%env TENSORLAKE_API_KEY=your_tensorlake_api_key

## Step 1: Specify Structured Data Extraction

Create a simple JSON schema that specifies the structured data you want extracted from the document.

In [None]:
# JSON schema to extract relevant data from resumes in a structured format
structured_schema = {
  "title": "ResumeInfo",
  "type": "object",
  "properties": {
    "candidateName":   { "type": "string" },
    "email":           { "type": "string" },
    "phone":           { "type": "string" },
    "address":         { "type": "string" },

    "professionalSummary": { "type": "string" },
    "skills": {
      "type": "array",
      "items": { "type": "string" }
    },
    "workExperience": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "jobTitle":     { "type": "string" },
          "companyName":  { "type": "string" },
          "location":     { "type": "string" },
          "startDate":    { "type": "string" },
          "endDate":      { "type": "string" },
          "description":  { "type": "string" }
        }
      }
    },
    "education": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "degree":        { "type": "string" },
          "fieldOfStudy":  { "type": "string" },
          "institution":   { "type": "string" },
          "location":      { "type": "string" },
          "graduationDate":{ "type": "string" }
        }
      }
    }
  }
}
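
Once you have a schema like this, a lightweight type check can confirm that a record matches it before you load the record into a database or ATS. The sketch below is a minimal illustration using only the standard library. `find_type_mismatches`, `demo_schema`, and `demo_record` are hypothetical names and not part of the Tensorlake SDK. The helper only understands the `string`, `array`, and `object` types used above, and it assumes `null` is acceptable for any field.

```python
# Hypothetical helper, not part of the Tensorlake SDK.
# Maps the JSON Schema types used in this notebook to Python types.
_JSON_TYPES = {"string": str, "array": list, "object": dict}


def find_type_mismatches(value, schema, path="$"):
    """Return the paths whose values do not match the schema's `type`.

    Assumption: None (JSON null) is accepted everywhere, because a resume
    may simply not contain a given field.
    """
    if value is None:
        return []
    expected = _JSON_TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(value, expected):
        return [path]

    mismatches = []
    if isinstance(value, dict):
        # Check only the keys the schema declares; extra keys are ignored.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                mismatches += find_type_mismatches(value[key], sub_schema, f"{path}.{key}")
    elif isinstance(value, list):
        item_schema = schema.get("items", {})
        for i, item in enumerate(value):
            mismatches += find_type_mismatches(item, item_schema, f"{path}[{i}]")
    return mismatches


# Small stand-in schema and record, shaped like the resume schema above.
demo_schema = {
    "type": "object",
    "properties": {
        "candidateName": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
}
demo_record = {"candidateName": "Jake Ryan", "skills": ["Python", 42]}

print(find_type_mismatches(demo_record, demo_schema))  # ['$.skills[1]']
```

For full JSON Schema validation you would likely reach for a dedicated validator; this sketch only covers the subset of the schema language used in this notebook.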

## Step 2: Parse the Document
To use the Tensorlake Python SDK, you need to:

1. Create a Tensorlake Client
2. Specify a file, in this case a candidate's resume
3. Specify Parsing Options, such as page numbers and a chunking strategy for parsing
4. Specify Structured Extraction Options, which must include at least a schema name and a schema
5. Initiate the parsing job and wait until it completes successfully

In [None]:
# Create a Tensorlake client; it reads the `TENSORLAKE_API_KEY` environment variable you set above
doc_ai = DocumentAI()

# Reference to a resume that you want to parse
file_path = 'https://pub-226479de18b2493f96b64c6674705dd8.r2.dev/jakes-resume.pdf'

# Configure parsing with structured schema
parsing_options = ParsingOptions(
    chunking_strategy=ChunkingStrategy.PAGE
)

structured_extraction_options = StructuredExtractionOptions(
    schema_name="Candidate Resume",
    json_schema=structured_schema  # schema for structured extraction
)

# Parse the document with the specified extraction options for structured data
parse_id = doc_ai.parse(
    file_path,
    parsing_options=parsing_options,
    structured_extraction_options=[structured_extraction_options],
)

print(f"Parse job submitted with ID: {parse_id}")

# Wait for completion
result = doc_ai.wait_for_completion(parse_id)

Parse job submitted with ID: parse_mLpNjGGr8NHFkTzfrGgfW
waiting 5 s…
parse status: processing
waiting 5 s…
parse status: processing
waiting 5 s…
parse status: processing
waiting 5 s…
parse status: processing
waiting 5 s…
parse status: processing
waiting 5 s…
parse status: successful
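
The log above shows a poll-and-wait pattern: check the job status, sleep for a few seconds, and repeat until the job is no longer processing. If you ever need the same pattern around another long-running job, it can be sketched roughly as follows. This is not the SDK's implementation of `wait_for_completion`. `poll_until_done`, its parameters, and the fake status source are hypothetical, and the only status strings used are the two that appear in the log above.

```python
import time


def poll_until_done(get_status, interval_s=5.0, timeout_s=300.0, sleep=time.sleep):
    """Call get_status() repeatedly until it stops returning "processing".

    Hypothetical sketch: `get_status` is any zero-argument callable that
    returns a status string. Raises TimeoutError after roughly `timeout_s`
    seconds of waiting.
    """
    waited = 0.0
    while True:
        status = get_status()
        print(f"parse status: {status}")
        if status != "processing":
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job still processing after {waited:g} s")
        sleep(interval_s)
        waited += interval_s


# Demo with a fake status source and a no-op sleep, so it finishes instantly.
fake_statuses = iter(["processing", "processing", "successful"])
final_status = poll_until_done(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(f"final status: {final_status}")
```

Passing `sleep` in as a parameter keeps the helper easy to test, and the timeout prevents the loop from waiting forever on a stuck job.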


# Understanding Tensorlake Parsing Output

In a single DocumentAI API call, Tensorlake returns both the full markdown content of the document and the structured data in JSON format.


## Review the Structured Data Output

In [None]:
# Print the structured data output
print(json.dumps(result.structured_data[0].data, indent=2))

{
  "address": null,
  "candidateName": "Jake Ryan",
  "education": [
    {
      "degree": "Bachelor of Arts in Computer Science, Minor in Business",
      "fieldOfStudy": "Computer Science",
      "graduationDate": "May 2021",
      "institution": "Southwestern University",
      "location": "Georgetown, TX"
    },
    {
      "degree": "Associate's in Liberal Arts",
      "fieldOfStudy": "Liberal Arts",
      "graduationDate": "May 2018",
      "institution": "Blinn College",
      "location": "Bryan, TX"
    }
  ],
  "email": "jake@su.edu",
  "phone": "123-456-7890",
  "professionalSummary": null,
  "skills": [
    "Java",
    "Python",
    "C/C++",
    "SQL (Postgres)",
    "JavaScript",
    "HTML/CSS",
    "R",
    "React",
    "Node.js",
    "Flask",
    "JUnit",
    "WordPress",
    "Material-UI",
    "FastAPI",
    "Git",
    "Docker",
    "TravisCI",
    "Google Cloud Platform",
    "VS Code",
    "Visual Studio",
    "PyCharm",
    "IntelliJ",
    "Eclipse",
    "pandas",
  

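Fields that the resume does not contain, such as `address` and `professionalSummary` in the output above, come back as `null`. Before storing a record, it can help to list which fields are empty so you can decide whether to flag the profile for manual review. The following is a minimal sketch. `find_empty_fields` is a hypothetical helper, and `sample` is an abbreviated copy of the values shown above rather than the full result.

```python
def find_empty_fields(value, path=""):
    """Return the paths of fields that are None, empty strings, or empty lists."""
    if value is None or value == "" or value == []:
        return [path or "<root>"]

    empty = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            empty += find_empty_fields(child, child_path)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            empty += find_empty_fields(child, f"{path}[{i}]")
    return empty


# Abbreviated copy of the structured output shown above.
sample = {
    "address": None,
    "candidateName": "Jake Ryan",
    "email": "jake@su.edu",
    "professionalSummary": None,
    "skills": ["Java", "Python"],
}

print(find_empty_fields(sample))  # ['address', 'professionalSummary']
```

With the real result, you would pass `result.structured_data[0].data` instead of `sample`, assuming it is a plain dictionary as the `json.dumps` call above suggests.
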
## Review the Markdown Chunks Output

In [None]:
# Get the markdown from extracted data
for index, chunk in enumerate(result.chunks):
    print(f"Chunk {index}:")
    print(chunk.content)

Chunk 0:
Jake Ryan
123-456-7890 | jake@su.edu | linkedin.com/in/jake | github.com/jake

## EDUCATION


<table>
<tr>
<th>Southwestern University</th>
<th>Georgetown, TX</th>
</tr>
<tr>
<td>Bachelor of Arts in Computer Science, Minor in Business</td>
<td>Aug. 2018 - May 2021</td>
</tr>
<tr>
<td>Blinn College</td>
<td>Bryan, TX</td>
</tr>
<tr>
<td>Associate's in Liberal Arts</td>
<td>Aug. 2014 - May 2018</td>
</tr>
</table>


## EXPERIENCE


### Undergraduate Research Assistant

Texas A&M University
June 2020 - Present College Station, TX
· Developed a REST API using FastAPI and PostgreSQL to store data from learning management systems
· Developed a full-stack web application using Flask, React, PostgreSQL and Docker to analyze GitHub data
· Explored ways to visualize GitHub collaboration in a classroom setting

### Information Technology Support Specialist Southwestern University

Sep. 2018 - Present Georgetown, TX
· Communicate with managers to set up campus computers used on campus
· A
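
Notice that tables in the markdown chunks are emitted as HTML `<table>` markup. If you want those rows as Python lists, for example to pair each institution with its location, a small parser built on the standard library's `html.parser` is usually enough. The sketch below is one possible approach. `TableRowParser` and `extract_table_rows` are hypothetical names, the sample string is copied from part of the chunk above, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_table_rows(chunk_text):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(chunk_text)
    parser.close()
    return parser.rows


# Part of the EDUCATION table from the chunk above.
sample_chunk = """
<table>
<tr>
<th>Southwestern University</th>
<th>Georgetown, TX</th>
</tr>
<tr>
<td>Bachelor of Arts in Computer Science, Minor in Business</td>
<td>Aug. 2018 - May 2021</td>
</tr>
</table>
"""

for row in extract_table_rows(sample_chunk):
    print(row)
```

In the notebook, you could call `extract_table_rows(chunk.content)` inside the loop over `result.chunks`.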

# Next Steps

Now that you have the basics down, check out one of these other resources to dive deeper into document parsing with Tensorlake:
- [Python SDK and API Docs](https://docs.tensorlake.ai/)
- [Blog](https://tensorlake.ai/blog)
- [YouTube Channel](https://tensorlake.ai/blog)
- [Community Slack](https://tensorlakecloud.slack.com/)