
SmartDocAnalyzer

SmartDoc Website

Log in to the website or sign up
[Screenshots: Login, Signup]

Dashboard once you log in
[Screenshots: Dashboard, DashboardLower]

Upload a file, and view your upload history
[Screenshot: Documents]

Run your Analysis

Choose summary mode
[Screenshots: AnlysisOne, AnlysisTwo]

Choose paragraph mode
[Screenshot: AnlysisThree]

Search by a specific paragraph index or keyword
[Screenshot: AnalysisFour]

Settings (zoomed out)
[Screenshot: Settings]

Activate virtual environment

  • From the root directory, run .\smartDoc\Scripts\activate (Windows)
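  • On macOS/Linux, assuming the virtual environment is likewise named smartDoc, the equivalent command is source smartDoc/bin/activate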

Requirements in requirements.txt

  • These requirements are only for the Flask API and are listed in requirements.txt (generated with pip freeze)
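  • To install them into an activated environment, run pip install -r requirements.txt from the root directory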

Start Application

Frontend Application (React)

  • Go to the frontend directory and type npm start

Backend Application (Flask API)

  • Go to the root directory and run flask run or python run.py
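The contents of run.py are not shown here; below is a minimal sketch of what such an entry point usually looks like, assuming the common Flask app-factory layout (the create_app name is an assumption, not necessarily what the app package exports).

```python
# run.py -- hypothetical sketch, assuming an app-factory layout.
from app import create_app  # create_app is assumed, not confirmed

app = create_app()

if __name__ == "__main__":
    # `python run.py` starts the development server directly;
    # `flask run` discovers the app via the FLASK_APP env var instead.
    app.run(host="0.0.0.0", port=5000, debug=True)
```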

Project Architecture

The frontend lives entirely in the frontend/ directory

  • The src directory contains all the source code
  • package-lock.json & package.json link the source code to its dependencies
  • node_modules contains all the dependencies downloaded for the frontend

The backend spans the root directory and the app/ directory

  • run.py is the Python file in the root directory used to start the API
  • The app directory contains all the code for the API
  • User and document metadata is stored in MongoDB
  • Document and image files are stored using the Dropbox API
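To illustrate that split, here is a hedged sketch (not the project's actual code; the database/collection names, Dropbox token, and folder layout are all assumptions):

```python
# Hypothetical sketch of the metadata/file split described above.
import datetime

import dropbox
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")
documents = mongo["smartdoc"]["documents"]     # database/collection names assumed
dbx = dropbox.Dropbox("DROPBOX_ACCESS_TOKEN")  # placeholder token

def save_document(username: str, filename: str, data: bytes) -> None:
    # The raw bytes go to the user's Dropbox folder...
    dbx.files_upload(data, f"/{username}/{filename}")
    # ...while MongoDB keeps only the metadata.
    documents.insert_one({
        "owner": username,
        "filename": filename,
        "size_bytes": len(data),
        "uploaded_at": datetime.datetime.utcnow(),
    })
```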

API Routes app/routes.py

/users

  • Get all the users in the collection

/user/create

  • Create a user, save the user's metadata, and create a storage folder for them

/user/login

  • Verify that the credentials entered by the user are correct

/user/<username>

  • Get metadata about a specific user

/user/<username>/upload

  • Upload a file to MongoDB and the storage API

/user/<username>/<filename>/download

  • Download a file from cloud storage to the local machine

/documents

  • Get all documents in the collection

/user/<username>/documents

  • Get all the documents from a specific user

/document/<filename>

  • Get file by filename

/user/<username>/<filename>/text

  • Extract all the text from a file

/user/<username>/<filename>/summary

  • Receive the summary, sentiment, and keywords of the whole file

/user/<username>/<filename>/paragraphs

  • Receive the summary, sentiment, and keywords for each paragraph in the file

/user/<username>/<filename>/paragraph/<int:paragraph_number>

  • Receive the summary, sentiment, and keywords for the specified paragraph

/user/<username>/<filename>/delete

  • Delete the file from document storage along with its linked metadata

/user/<username>/<filename>/paragraphs/<keyword>

  • Receive the summary, sentiment, and keywords for all paragraphs containing the matching keyword

/user/<username>/<filename>/<keyword>/definition

  • Receive the definition of the given keyword

/user/<username>/delete_all_files

  • Delete all the files and metadata connected to the user

/user/<username>/delete_all

  • Delete all of the user's information and documents
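For a sense of how routes like these look in Flask, here is a hedged sketch of two of them (handler names and response bodies are illustrative; the real definitions live in app/routes.py):

```python
# Hypothetical sketch of two of the routes above; the data layer
# is stubbed out rather than backed by MongoDB.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the MongoDB users collection.
USERS = [{"username": "alice"}, {"username": "bob"}]

@app.route("/users", methods=["GET"])
def get_users():
    # Get all the users in the collection.
    return jsonify(USERS)

@app.route("/user/<username>/<filename>/paragraph/<int:paragraph_number>")
def paragraph_analysis(username, filename, paragraph_number):
    # Summary, sentiment, and keywords for one paragraph; the values
    # here are placeholders for the real NLP pipeline's output.
    return jsonify({
        "user": username,
        "file": filename,
        "paragraph": paragraph_number,
        "summary": "...",
        "sentiment": "neutral",
        "keywords": [],
    })
```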

Test the API

test/api_test

  • pytest_api.py tests the API using pytest; run it with pytest pytest_api.py
  • test_api.py tests the API using Python's unittest; run it with python -m unittest discover -s tests
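As a minimal illustration of what one of these tests can look like (a sketch assuming the API is already running on localhost:5000; not the actual contents of pytest_api.py):

```python
# Hypothetical pytest sketch; assumes the Flask API is running locally.
import requests

BASE_URL = "http://localhost:5000"

def test_users_returns_json_list():
    response = requests.get(f"{BASE_URL}/users")
    assert response.status_code == 200
    # /users is expected to return the users collection as JSON.
    assert isinstance(response.json(), list)
```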

Containers

Frontend

To build the container:

  • cd frontend
  • docker build -t my-frontend:latest .

To run the container:

  • docker run -d -p 3000:3000 my-frontend:latest

Backend

To build the container:

  • docker build -t my-backend:latest .

To run the container:

  • docker run -d -p 5000:5000 my-backend:latest

Build and Run both together

  • docker-compose up -d
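docker-compose up -d builds any missing images and starts both containers in the background; docker-compose down stops and removes them.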

Task Queue Implementation (No longer implemented)

  • To run a Celery worker, make sure the application is already running, then run celery -A app.celery worker --loglevel=info

Project Expectations

Text summary and NLP

  • For the summarization, keyword extraction, and sentiment analysis of text from images and documents, I utilized an array of open-source libraries, due to the restrictions Google and OpenAI place on their APIs. A sketch of how a few of these fit together follows this list.
  • For extracting text from PDFs and Word documents, PyPDF2 & python-docx were used
  • For extracting text from images, pytesseract & Tesseract were used
  • For summarizing text, spaCy was used
  • For sentiment analysis of text, NLTK was used
  • For keyword identification of text, scikit-learn was used
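The sketch below combines three of those libraries end to end. The library calls are standard; the glue code and file name are illustrative, not the project's actual pipeline:

```python
# Illustrative sketch combining PyPDF2, NLTK, and scikit-learn.
# Requires: pip install PyPDF2 nltk scikit-learn
# and a one-time nltk.download("vader_lexicon").
from PyPDF2 import PdfReader
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer

# 1. Extract text from a PDF with PyPDF2.
reader = PdfReader("example.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# 2. Score sentiment with NLTK's VADER analyzer.
sentiment = SentimentIntensityAnalyzer().polarity_scores(text)

# 3. Pull keywords with a scikit-learn TF-IDF vectorizer.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5)
vectorizer.fit_transform([text])
keywords = list(vectorizer.get_feature_names_out())

print(sentiment, keywords)
```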

Guidelines

  • I should log in to a secure service to upload my content
  • I should be able to upload documents, PDFs, or images. The application should translate my documents to text
  • I want the service to tag all my documents, and the paragraphs within every document, with keywords, and to know the topics each document covers
  • I should be able to access different paragraphs of different documents based on keywords
  • I should be able to find all positive, neutral, and negative paragraphs and sentences
  • Keywords within paragraphs should be searchable in government open data, Wikipedia, and media organizations, e.g., NYTimes
  • I should find definitions of keywords using open services (e.g., OpenAI)
  • I should be able to get summaries of each document
  • I want to discover content from the web to enhance my story
  • I want to know all names, locations, institutions, and addresses in my documents.

