
CV Knowledge Engine Solution Accelerator

About this accelerator

The CV Knowledge Engine solution accelerator provides a quick way to create an intelligent search engine for searching and filtering through CV and resume documents. It leverages Knowledge Mining and Cognitive Services technologies to extract valuable information and insights from the CV documents, such as names, contact information, years of experience, skills, and qualifications. It also creates an intuitive, easy-to-navigate user interface that offers a full search experience, with capabilities like search terms, customized filters, and informative result cards. Moreover, the extracted information can be exported to Power BI to create dashboards that give a high-level overview of the extracted data.

Business Use Case

Screening and searching through CV documents submitted by job seekers is a long and costly process: recruiting teams usually have to go through each submitted CV manually to find the candidates best suited for the job opening. This accelerator helps extract the essential information from CVs and resumes and simplifies searching and filtering through applicants, which significantly reduces costs and time to insights.

Resources and Architecture

The accelerator deploys the following Azure resources:

  • Storage Account
  • Cognitive Search
  • Cognitive Services
  • Azure Function
  • App Service

[Architecture diagram]

Sample Documents

The sample documents used to demo this accelerator are 223 dummy CV documents, acquired from the Resume Krafts website.

Extracted Information

The information and insights extracted from the CV documents can be grouped into three categories:

Personal Information (PII)

  • Name
  • Email
  • Phone number
  • Location
  • LinkedIn

Professional Information

  • Years of experience
  • Qualifications
  • Languages
  • Organizations
  • Skills

Other Insights

  • Key phrases

Web App Interface

The first interface created to display the extracted insights is a website that can be used to search and filter through the CV documents.
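Under the hood, the interface issues queries against the Cognitive Search index through its REST API. As a rough sketch of the kind of query involved, here is a minimal Python example (the filter field yearsOfExperience and the printed field are assumptions; use whatever your index actually defines):

```python
import requests

SERVICE = "<SEARCH_SERVICE_NAME>"       # Cognitive Search service name
API_KEY = "<SEARCH_SERVICE_ADMIN_KEY>"  # query or admin key
INDEX = "<INDEX_NAME>"                  # index created in Step 4 below

# Full-text search for "machine learning", filtered to CVs with at
# least five years of experience ("yearsOfExperience" is hypothetical).
resp = requests.post(
    f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs/search"
    "?api-version=2020-06-30",
    headers={"api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "search": "machine learning",
        "filter": "yearsOfExperience ge 5",
        "top": 10,
    },
)
resp.raise_for_status()
for doc in resp.json()["value"]:
    print(doc.get("metadata_storage_name"))  # blob file name, if indexed
```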

Home Page

[Screenshot: home page]

Search Results

[Screenshot: search results]

Power BI Dashboard

COMING SOON

Deployment Process

Deploying the accelerator takes eight simple steps (Step 0 through Step 7), covering everything from deploying the resources and creating the search service elements to connecting the web interface.

Prerequisites

In order to deploy the accelerator, clone or download this repository, and make sure the following requirements are met:

  • Azure Subscription
  • Visual Studio 2019 or later
  • VS Code with Azure Functions extension
  • Sample CV documents
  • Postman

Step 0: Deploy the resources

Using the provided ARM template, create all the required Azure resources by clicking on this button:

Deploy to Azure

Ensure resources are properly deployed before continuing.

Step 1: Set Up the Environment

Navigate to the newly created Storage Account in Azure, and upload the sample documents to a new blob container. The sample documents can be found in the Assets/Sample Documents folder.

Next, navigate to the Assets/Postman Script folder to find the Postman collection that will be used to create the Search Service elements.

In Postman, import the collection and fill in the global variables with the proper values. To do that, click on the collection's name and navigate to the "Variables" tab. Modify the values in the "CURRENT VALUE" column according to the following table:

(Note that some resources have not been configured yet and will be deployed later. At this point, all variables should be filled in except those from CUSTOM_SKILL_URL_ONE to LOOKUP_TABLE_URL_TWO, which will be completed in Step 3.)

CURRENT VALUE                           Replace with
<SEARCH_SERVICE_NAME>                   Name of the Search Service
<SEARCH_SERVICE_ADMIN_KEY>              Admin key of the Search Service
<COGNATIVE_SERVICE_KEY>                 Key of the Cognitive Services resource
<STORAGE_ACCOUNT_NAME>                  Name of the Storage Account
<STORAGE_ACCOUNT_CONTAINER_NAME>        Name of the storage container (create the container in the Storage Account)
<STORAGE_CONTAINER_FOLDER_NAME>         Name of the storage folder, if used; otherwise leave it empty
<STORAGE_ACCOUNT_CONNECTION_STRING>     Connection string of the Storage Account
<CUSTOM_SKILL_URL_ONE>                  Azure Function URL of the Text Extraction skill
<CUSTOM_SKILL_URL_TWO>                  Azure Function URL of the Years of Experience skill
<LOOKUP_TABLE_URL_ONE>                  Lookup table URL for Qualifications
<LOOKUP_TABLE_URL_TWO>                  Lookup table URL for Languages
<DATASOURCE_NAME>                       Name of the Datasource
<INDEX_NAME>                            Name of the Index
<SKILLSET_NAME>                         Name of the Skillset
<INDEXER_NAME>                          Name of the Indexer
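If you prefer scripting these REST calls instead of (or alongside) Postman, the same variables can live in a small Python module that the sketches in the following steps reuse. All values below are placeholders you fill in from your own resources:

```python
# config.py -- mirrors the Postman collection variables above
SEARCH_SERVICE_NAME = "<SEARCH_SERVICE_NAME>"
SEARCH_SERVICE_ADMIN_KEY = "<SEARCH_SERVICE_ADMIN_KEY>"
COGNITIVE_SERVICE_KEY = "<COGNATIVE_SERVICE_KEY>"  # spelled as in the collection
STORAGE_ACCOUNT_CONNECTION_STRING = "<STORAGE_ACCOUNT_CONNECTION_STRING>"
STORAGE_ACCOUNT_CONTAINER_NAME = "<STORAGE_ACCOUNT_CONTAINER_NAME>"
DATASOURCE_NAME = "<DATASOURCE_NAME>"
INDEX_NAME = "<INDEX_NAME>"
SKILLSET_NAME = "<SKILLSET_NAME>"
INDEXER_NAME = "<INDEXER_NAME>"

# Shared pieces for the Cognitive Search REST API
SEARCH_ENDPOINT = f"https://{SEARCH_SERVICE_NAME}.search.windows.net"
HEADERS = {"api-key": SEARCH_SERVICE_ADMIN_KEY, "Content-Type": "application/json"}
API_VERSION = "2020-06-30"
```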

Step 2: Create the Datasource

In Postman, navigate to the Create Datasource request and run it.

This will create a Datasource in the Search Service from the container that holds the sample documents.
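For reference, roughly the same call sketched in Python using the config module from Step 1 (the body the Postman request sends may differ in detail):

```python
import requests
from config import (API_VERSION, DATASOURCE_NAME, HEADERS, SEARCH_ENDPOINT,
                    STORAGE_ACCOUNT_CONNECTION_STRING,
                    STORAGE_ACCOUNT_CONTAINER_NAME)

# PUT creates the datasource, or updates it if it already exists.
resp = requests.put(
    f"{SEARCH_ENDPOINT}/datasources/{DATASOURCE_NAME}?api-version={API_VERSION}",
    headers=HEADERS,
    json={
        "name": DATASOURCE_NAME,
        "type": "azureblob",
        "credentials": {"connectionString": STORAGE_ACCOUNT_CONNECTION_STRING},
        "container": {"name": STORAGE_ACCOUNT_CONTAINER_NAME},
    },
)
resp.raise_for_status()
print("Datasource ready:", resp.status_code)
```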

Step 3: Create the Skillset

Step 3a: Custom Skill

In VS Code, create an HTTP Trigger Azure Function in Python, and replace the code in the "__init__.py" file with the code provided in Assets/Function Script.

This process should be done twice to create two functions: one for Text Extraction and the other for Years of Experience.
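Whatever the logic, both functions must follow the custom web API skill contract: a JSON body with a values array of records comes in, and a matching array of recordId/data pairs goes out. Below is a minimal skeleton with a stand-in heuristic for Years of Experience (the accelerator's actual logic lives in Assets/Function Script):

```python
import json
import re

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    body = req.get_json()
    results = {"values": []}
    for record in body.get("values", []):
        text = record.get("data", {}).get("text", "") or ""
        # Stand-in heuristic: take the first "N years" mention in the CV.
        match = re.search(r"(\d+)\s*\+?\s*years?", text, re.IGNORECASE)
        results["values"].append({
            "recordId": record["recordId"],  # must echo the incoming recordId
            "data": {"yearsOfExperience": int(match.group(1)) if match else None},
            "errors": None,
            "warnings": None,
        })
    return func.HttpResponse(json.dumps(results), mimetype="application/json")
```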

In the Text Extraction function, make sure to add the values for the Cognitive Services key and endpoint in the script.

In the requirements.txt file, add "requests".

To deploy the functions, follow the instructions here: Develop Azure Functions by using Visual Studio Code.

After deploying both custom skill functions, we can proceed to create the Skillset.

Step 3b: Built-in Skills

For the "Custom Entity Lookup" skills, we need to provide the URL for the CSV lookup tables.

In the Storage Account, create a new blob container (e.g., "lookuptables"). Upload the two files in Assets/Lookup Tables to this container, and get their SAS URLs to use in the skill definition. Once you have both URLs, add them to the Postman variables <LOOKUP_TABLE_URL_ONE> and <LOOKUP_TABLE_URL_TWO>.
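The SAS URLs can be copied from the Azure portal, or generated with the azure-storage-blob package; a sketch assuming the container name above and hypothetical blob names:

```python
from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

ACCOUNT = "<STORAGE_ACCOUNT_NAME>"
ACCOUNT_KEY = "<STORAGE_ACCOUNT_KEY>"
CONTAINER = "lookuptables"


def lookup_table_url(blob_name: str) -> str:
    # Read-only SAS, valid for one year so the skillset can keep using it.
    token = generate_blob_sas(
        account_name=ACCOUNT,
        container_name=CONTAINER,
        blob_name=blob_name,
        account_key=ACCOUNT_KEY,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(days=365),
    )
    return f"https://{ACCOUNT}.blob.core.windows.net/{CONTAINER}/{blob_name}?{token}"


print(lookup_table_url("qualifications.csv"))  # hypothetical file name
print(lookup_table_url("languages.csv"))       # hypothetical file name
```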

In Postman, navigate to the Create Skillset request and run it. This will create a Skillset in the Search Service that identifies all the information to be extracted from the CVs.
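For orientation, the lookup URLs end up inside Custom Entity Lookup skill definitions in the skillset body. A trimmed sketch of one such definition as a Python dict (the name, inputs, and outputs here are illustrative; the Postman collection holds the authoritative version):

```python
qualifications_skill = {
    "@odata.type": "#Microsoft.Skills.Text.CustomEntityLookupSkill",
    "name": "qualifications-lookup",                    # hypothetical name
    "context": "/document",
    "entitiesDefinitionUri": "<LOOKUP_TABLE_URL_ONE>",  # SAS URL from Step 3b
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "entities", "targetName": "qualifications"}],
}
```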

Step 4: Create the Index

In Postman, navigate to the Create Index request and run it.

This will create an Index in the Search Service for the information to be extracted from the CVs, as described earlier.
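A trimmed sketch of what the request does, again in Python (the field list here is an assumption based on the extracted insights above; the Postman collection defines the full schema):

```python
import requests
from config import API_VERSION, HEADERS, INDEX_NAME, SEARCH_ENDPOINT

index_body = {
    "name": INDEX_NAME,
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        # Hypothetical insight fields; the real index defines many more.
        {"name": "personName", "type": "Edm.String", "searchable": True},
        {"name": "yearsOfExperience", "type": "Edm.Int32", "filterable": True},
        {"name": "skills", "type": "Collection(Edm.String)",
         "filterable": True, "facetable": True},
    ],
}
resp = requests.put(
    f"{SEARCH_ENDPOINT}/indexes/{INDEX_NAME}?api-version={API_VERSION}",
    headers=HEADERS, json=index_body)
resp.raise_for_status()
```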

Step 5: Create the Indexer

In Postman, navigate to the Create Indexer request and run it.

This will create an Indexer in the Search Service that will extract the defined information from the CVs.
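The indexer starts running as soon as it is created. A small sketch for checking its progress from Python:

```python
import requests
from config import API_VERSION, HEADERS, INDEXER_NAME, SEARCH_ENDPOINT

# Fetch the result of the indexer's most recent run.
resp = requests.get(
    f"{SEARCH_ENDPOINT}/indexers/{INDEXER_NAME}/status?api-version={API_VERSION}",
    headers=HEADERS,
)
resp.raise_for_status()
last = resp.json().get("lastResult") or {}
print(last.get("status"), "-",
      last.get("itemsProcessed"), "processed,",
      last.get("itemsFailed"), "failed")
```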

Step 6: Create the Web App Interface

In Assets/Website Template, open the solution file "CognitiveSearch.Template.sln" in Visual Studio.

Navigate to the "appsettings.json" file, and change the values according to the following table:

Placeholder                    Replace with
<SEARCH_SERVICE_NAME>          Name of the Cognitive Search service
<SEARCH_SERVICE_KEY>           Admin key of the Cognitive Search service
<INDEX_NAME>                   Index name in the Search Service
<INDEXER_NAME>                 Indexer name in the Search Service
<STORAGE_ACCOUNT_NAME>         Name of the Storage Account that stores the documents
<STORAGE_ACCOUNT_KEY>          Key of the Storage Account
<CONTAINER_NAME>               Container in the Storage Account that stores the documents
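After substitution, appsettings.json looks roughly like the sketch below. The key names shown are assumptions; keep the keys the template file actually defines and replace only the placeholder values:

```json
// Sketch only: key names are hypothetical, values are the placeholders above.
{
  "SearchServiceName": "<SEARCH_SERVICE_NAME>",
  "SearchApiKey": "<SEARCH_SERVICE_KEY>",
  "SearchIndexName": "<INDEX_NAME>",
  "SearchIndexerName": "<INDEXER_NAME>",
  "StorageAccountName": "<STORAGE_ACCOUNT_NAME>",
  "StorageAccountKey": "<STORAGE_ACCOUNT_KEY>",
  "StorageContainerAddress": "https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/<CONTAINER_NAME>"
}
```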

You can test the website locally by running the solution in Visual Studio, or publish the website to Azure by following the instructions found here: Quickstart: Publish an ASP.NET web app.

Step 7: Create the Power BI Dashboard

COMING SOON

References

This accelerator was inspired by the Knowledge Mining Solution Accelerator.

License

For all licensing information refer to LICENSE.
