
Name Extraction with PHP Text Analysis

yooper edited this page Mar 2, 2017 · 1 revision

Name extraction is backed by an SQLite database built from 2010 Census data and surnames provided by the Social Security Administration (SSA). The NameCorpus class provides several methods that help identify whether a token is a first name or a last name (also called a surname).
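
To make the database-backed lookup concrete, here is a minimal sketch of how a first-name or last-name check against SQLite can work. It assumes the pdo_sqlite extension is enabled. The function `isKnownName()` and the `first_names`/`last_names` tables are hypothetical: they are not part of NameCorpus and do not reflect the actual schema of the us_names database.

```php
<?php
// Hypothetical sketch of a SQLite-backed name check.
// The table and column names are invented for illustration and are
// NOT the schema used by the us_names package.

function isKnownName(PDO $pdo, string $table, string $name): bool
{
    // Table names cannot be bound as parameters, so whitelist them.
    if (!in_array($table, ['first_names', 'last_names'], true)) {
        throw new InvalidArgumentException("Unknown table: $table");
    }
    $stmt = $pdo->prepare("SELECT 1 FROM $table WHERE name = :name LIMIT 1");
    // Normalize to lower case so 'Mike' and 'mike' match the same row.
    $stmt->execute([':name' => strtolower($name)]);
    // fetchColumn() returns false when no row matched.
    return $stmt->fetchColumn() !== false;
}

// In-memory stand-in database with a few sample rows (hypothetical data).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE first_names (name TEXT PRIMARY KEY)');
$pdo->exec('CREATE TABLE last_names (name TEXT PRIMARY KEY)');
$pdo->exec("INSERT INTO first_names (name) VALUES ('mike'), ('brad')");
$pdo->exec("INSERT INTO last_names (name) VALUES ('williamson')");

var_dump(isKnownName($pdo, 'first_names', 'Mike')); // bool(true)
var_dump(isKnownName($pdo, 'last_names', 'Mike'));  // bool(false)
```

Storing names in lower case and normalizing the input before the query keeps the lookup case-insensitive without relying on SQLite collation settings.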

To use this functionality, you must first run the following command:

php textconsole pta:package:install us_names

This command downloads and unpacks the database for you. You can then use the following calls to check whether a token is a known name.

<?php
use TextAnalysis\Corpus\NameCorpus;

$corpus = new NameCorpus();

// Returns a boolean: true if the name exists. The name is normalized to lower case internally.
$corpus->isFirstName('Mike');
$corpus->isLastName('Williamson');

// Returns a single record, although the underlying dataset holds multiple records per name
// because it stores the frequency count of persons born with that name since 1915
$firstName = $corpus->getFirstName('Mike');

// $lastName is an array with additional frequency counts and population statistics
// associated with the given last name
$lastName = $corpus->getLastName('Williamson');
var_dump($lastName);


// Checks whether the first token is a first name and the last token is a last name;
// only the first and last tokens are checked
$corpus->isFullName('Brad Von Williamson');

// Get the raw PDO connection so you can issue your own SQL statements
$pdo = $corpus->getPdo();
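
The behaviour described for `isFullName()` can be approximated as follows. This is a hypothetical sketch, not the library's implementation: `looksLikeFullName()` is an invented helper, and small in-memory arrays stand in for the us_names database. It assumes names are split on whitespace and lower-cased before lookup, and it treats a single-token input as not a full name, which is also an assumption.

```php
<?php
// Hypothetical sketch mimicking the documented isFullName() behaviour.
// In-memory arrays replace the us_names SQLite database.

/**
 * Returns true when the first token is a known first name and the
 * last token is a known last name. Middle tokens are ignored.
 */
function looksLikeFullName(string $fullName, array $firstNames, array $lastNames): bool
{
    // Split on any run of whitespace and drop empty pieces.
    $tokens = preg_split('/\s+/', trim($fullName), -1, PREG_SPLIT_NO_EMPTY);
    if (count($tokens) < 2) {
        return false; // assumption: a full name needs at least two tokens
    }

    // Lower-case both tokens, matching the lower-cased lookup keys.
    $first = strtolower($tokens[0]);
    $last = strtolower($tokens[count($tokens) - 1]);

    return isset($firstNames[$first]) && isset($lastNames[$last]);
}

// Illustrative lookup tables keyed by lower-cased name (hypothetical data).
$firstNames = ['brad' => true, 'mike' => true];
$lastNames = ['williamson' => true];

var_dump(looksLikeFullName('Brad Von Williamson', $firstNames, $lastNames)); // bool(true)
var_dump(looksLikeFullName('Williamson Brad', $firstNames, $lastNames));     // bool(false)
```

With the real corpus you would call `$corpus->isFullName()` instead; the sketch only illustrates why the middle token "Von" does not affect the result.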