uprush/reader

Reader

Reader reads documents and extracts important keywords in real time. The example chart below, created by Reader, shows the top 25 keywords from Jeff's blog. Reader computes keyword scores with AWS Lambda as soon as a document is uploaded to S3.

Example Reader Chart

Technologies

  • TF.IDF (Term Frequency times Inverse Document Frequency) *1
  • AWS Services
    • S3 notification
    • AWS Lambda
    • DynamoDB

Architecture

The architecture relies on AWS managed services. No servers or EC2 instances are required to run the application.

  • Clients send a text document to S3
  • An S3 notification triggers a Lambda function called Reader
  • Reader gets the text from S3 and calculates TF
  • Reader gets IDF from DynamoDB
  • Reader updates DynamoDB with the new IDF
  • Reader extracts important keywords using TF.IDF
  • Reader stores the top 25 keywords in DynamoDB
  • Reader-dashboard gets the keywords from DynamoDB and draws the charts
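
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's code: `FAKE_S3`, `DOC_FREQ`, `STATE`, and `TOP_KEYWORDS` are hypothetical in-memory stand-ins for the S3 bucket and the DynamoDB tables, and `tokenize`, `term_frequency`, and `handler` are names chosen for this example. Only the shape of the S3 notification event (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with a URL-encoded key) follows the real AWS format.

```python
import heapq
import math
import re
from collections import Counter
from urllib.parse import unquote_plus

# Hypothetical in-memory stand-ins for S3 and the DynamoDB tables.
FAKE_S3 = {}            # (bucket, key) -> document text
DOC_FREQ = Counter()    # term -> number of documents containing the term
STATE = {"doc_count": 0}
TOP_KEYWORDS = {}       # object key -> list of (term, score) pairs


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(terms):
    """Return each term's count divided by the document length."""
    counts = Counter(terms)
    total = len(terms)
    return {term: n / total for term, n in counts.items()}


def handler(event, context=None, top_n=25):
    """Score every object named in an S3 notification event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        terms = tokenize(FAKE_S3[(bucket, key)])
        if not terms:
            continue
        tf = term_frequency(terms)

        # Count this document before scoring so that n_i is never zero.
        STATE["doc_count"] += 1
        DOC_FREQ.update(tf.keys())

        n_docs = STATE["doc_count"]
        scores = {
            term: freq * math.log2(n_docs / DOC_FREQ[term])
            for term, freq in tf.items()
        }
        # Keep only the top_n highest-scoring terms for the dashboard.
        TOP_KEYWORDS[key] = heapq.nlargest(
            top_n, scores.items(), key=lambda item: item[1]
        )
    return TOP_KEYWORDS
```

In this sketch the document counts are updated before scoring, so the very first document scores zero for every term; scores become meaningful once several documents have been processed. A real deployment would replace the dictionaries with S3 reads and DynamoDB reads and writes.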

Reader Architecture

Code

Sample code on GitHub:

Sample AWS Lambda metrics: AWS Lambda Metrics

*1 IDF_i = log2(N / n_i), where N is the total number of documents and n_i is the number of documents containing term i. The IDF calculation needs term-existence data from other documents, which this sample does not implement. The idea is to store that data in DynamoDB.
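
As a small worked example of the formula in the footnote, the function below evaluates log2(N / n_i) for given counts. It is only a sketch: the function name `idf` and the sample counts are assumptions made for illustration, and the counts would come from DynamoDB in the design described above.

```python
import math


def idf(total_docs, docs_with_term):
    """Return log2(N / n_i) for N total documents and n_i containing the term."""
    if docs_with_term <= 0 or total_docs < docs_with_term:
        raise ValueError("need 0 < docs_with_term <= total_docs")
    return math.log2(total_docs / docs_with_term)


# With 8 documents in total:
rare = idf(8, 1)     # term found in 1 document  -> 3.0
shared = idf(8, 2)   # term found in 2 documents -> 2.0
common = idf(8, 8)   # term found in every document -> 0.0
```

A term that appears in every document gets an IDF of zero, so it drops out of the keyword ranking no matter how often it occurs in a single document.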
