Build Word Cloud #18
Comments
Hello, sorry for my delayed response. To make the word cloud you will need D3 word cloud: https://github.com/jasondavies/d3-cloud
As for the PHP code, here is what I have done in the past:

use TextAnalysis\Analysis\Keywords\Rake;
use TextAnalysis\Documents\TokensDocument;
use TextAnalysis\Tokenizers\WhitespaceTokenizer;
use StopWordFactory;
use TextAnalysis\Filters;

class WordCloud
{
    const NGRAM_SIZE = 3;

    /**
     * @var \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    protected $tokenFilters = [];

    /**
     * @var \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    protected $contentFilters = [];

    /**
     * The keyword scores are not set up in a way that is compatible with
     * what D3 cloud expects, so scale them so that they sum to 1.
     * @param array $keywordScores
     * @return array
     */
    public function getScaledScores($keywordScores)
    {
        $total = array_sum($keywordScores);
        // getKeywordScores() can return an empty array; avoid dividing by zero
        if($total == 0) {
            return $keywordScores;
        }
        $scaleFactor = 1 / $total;
        array_walk($keywordScores,
            function(&$value, $key) use ($scaleFactor){
                $value = round($value * $scaleFactor, 5);
            });
        return $keywordScores;
    }

    /**
     * Filters for cleaning raw content. Note that getKeywordScores()
     * below does not apply them in this example.
     * @return \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    public function getContentFilters()
    {
        if(empty($this->contentFilters)) {
            // replace non-printable characters with a space
            $lambdaFunc = function($word){
                return preg_replace('/[^[:print:]]/', ' ', $word);
            };
            $this->contentFilters = [
                new Filters\StripTagsFilter(),
                new Filters\LowerCaseFilter(),
                new Filters\NumbersFilter(),
                new Filters\EmailFilter(),
                new Filters\UrlFilter(),
                new Filters\PossessiveNounFilter(),
                new Filters\QuotesFilter(),
                new Filters\PunctuationFilter(),
                new Filters\CharFilter(),
                new Filters\LambdaFilter($lambdaFunc),
                new Filters\WhitespaceFilter()
            ];
        }
        return $this->contentFilters;
    }

    /**
     *
     * @return \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    public function getTokenFilters()
    {
        if(empty($this->tokenFilters)) {
            $stopwords = StopWordFactory::get('stop-words-fox.txt');
            $this->tokenFilters = [
                new Filters\StopWordsFilter($stopwords),
            ];
        }
        return $this->tokenFilters;
    }

    /**
     *
     * @param string $content
     * @return array
     */
    public function getKeywordScores($content)
    {
        $tokens = (new WhitespaceTokenizer())->tokenize($content);
        $tokenDoc = new TokensDocument(array_map('strval', $tokens));
        unset($tokens);
        foreach($this->getTokenFilters() as $filter)
        {
            $tokenDoc->applyTransformation($filter, false);
        }
        // filtered-out tokens are left in the array as null values
        $size = count($tokenDoc->toArray());
        if($size < self::NGRAM_SIZE || !array_filter($tokenDoc->toArray())) {
            return [];
        }
        $rake = new Rake($tokenDoc, self::NGRAM_SIZE);
        return $rake->getKeywordScores();
    }
}

$cloud = new WordCloud();
$scores = $cloud->getKeywordScores("YOUR CONTENT GOES HERE");
// scales the scores for the D3 cloud library
$scaledScores = $cloud->getScaledScores($scores);

You must use $scaledScores with the D3 cloud library. Sorry for the incomplete example. Please post your completed solution and I will use it to update the documentation.
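For anyone picking this up later, here is a rough sketch of one way to hand $scaledScores to the browser. It is not from the maintainer's code. It assumes the scores are a phrase => score array, and that the front end wants a list of objects with "text" and "size" keys, as in the d3-cloud README example. The helper name scoresToCloudWords and the 12 to 72 pixel range are made up for illustration.

```php
<?php
// Hypothetical helper, not part of php-text-analysis or d3-cloud.
// Maps phrase => score pairs onto a pixel range and returns a list of
// ['text' => ..., 'size' => ...] entries (key names are an assumption
// based on the d3-cloud README example).
function scoresToCloudWords(array $scaledScores, int $minPx = 12, int $maxPx = 72): array
{
    if (empty($scaledScores)) {
        return [];
    }
    $low = min($scaledScores);
    $high = max($scaledScores);
    $range = $high - $low;
    $words = [];
    foreach ($scaledScores as $phrase => $score) {
        // When every score is equal, fall back to the midpoint size.
        $ratio = $range > 0 ? ($score - $low) / $range : 0.5;
        $words[] = [
            // Cast because PHP turns numeric-looking array keys into ints.
            'text' => (string) $phrase,
            'size' => (int) round($minPx + $ratio * ($maxPx - $minPx)),
        ];
    }
    return $words;
}

// Made-up scores standing in for the output of getScaledScores().
$scaledScores = ['word cloud' => 0.5, 'php' => 0.3, 'text analysis' => 0.2];
echo json_encode(scoresToCloudWords($scaledScores)), PHP_EOL;
```

The JSON can then be embedded in the page or served from an endpoint, and the JavaScript side can read the size value in its font-size accessor. Linear scaling is just the simplest choice here; a square-root or log scale may look better when a few phrases dominate.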
No problem, thank you for this! I'll report back after I try this. I did get a working prototype with jQCloud and the
Sounds good. I am closing this issue.
Saw this package and noticed on the wiki page it mentions building a word cloud, but the page is empty. https://github.com/yooper/php-text-analysis/wiki/PHP-Keyword-Phrases-Word-Cloud
How could I potentially go about building a word cloud with this package?
Thanks!