
SmartDocAnalyzer

SmartDoc Website

Log in to the website or sign up
[Screenshots: Login, Signup]

Dashboard once you log in
[Screenshots: Dashboard, DashboardLower]

Upload a file, and view your upload history
[Screenshot: Documents]

Run your Analysis

Choose summary mode
[Screenshots: AnlysisOne, AnlysisTwo]

Choose paragraph mode
[Screenshot: AnlysisThree]

Search by a specific paragraph index or keyword
[Screenshot: AnalysisFour]

Settings (zoomed out)
[Screenshot: Settings]

Activate virtual environment

  • From the root directory, run .\smartDoc\Scripts\activate (Windows)
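  • On macOS/Linux, assuming the virtual environment is likewise named smartDoc, the equivalent command is source smartDoc/bin/activate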

Requirements in requirements.txt

  • These requirements are only for the Flask API and are listed in requirements.txt (generated with pip freeze)
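  • To install them into an activated environment, run pip install -r requirements.txt from the root directory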

Start Application

Frontend Application (React)

  • Go to the frontend directory and type npm start

Backend Application (Flask API)

  • Go to the root directory and run flask run or python run.py
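The contents of run.py are not shown here; below is a minimal sketch of what such an entry point usually looks like, assuming the common Flask app-factory layout (the create_app name is an assumption, not necessarily what the app package exports).

```python
# run.py -- hypothetical sketch, assuming an app-factory layout.
from app import create_app  # create_app is assumed, not confirmed

app = create_app()

if __name__ == "__main__":
    # `python run.py` starts the development server directly;
    # `flask run` discovers the app via the FLASK_APP env var instead.
    app.run(host="0.0.0.0", port=5000, debug=True)
```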

Project Architecture

The frontend lives entirely in the frontend/ directory

  • The src directory contains all the source code
  • package-lock.json & package.json link the source code to its dependencies
  • node_modules contains all the dependencies downloaded for the frontend

The backend spans the root directory and the app/ directory

  • run.py is the Python file in the root directory used to start the API
  • The app directory contains all the code for the API
  • User and document metadata is stored in MongoDB
  • Document and image files are stored using the Dropbox API
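To illustrate that split, here is a hedged sketch (not the project's actual code; the database/collection names, Dropbox token, and folder layout are all assumptions):

```python
# Hypothetical sketch of the metadata/file split described above.
import datetime

import dropbox
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")
documents = mongo["smartdoc"]["documents"]     # database/collection names assumed
dbx = dropbox.Dropbox("DROPBOX_ACCESS_TOKEN")  # placeholder token

def save_document(username: str, filename: str, data: bytes) -> None:
    # The raw bytes go to the user's Dropbox folder...
    dbx.files_upload(data, f"/{username}/{filename}")
    # ...while MongoDB keeps only the metadata.
    documents.insert_one({
        "owner": username,
        "filename": filename,
        "size_bytes": len(data),
        "uploaded_at": datetime.datetime.utcnow(),
    })
```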

API Routes app/routes.py

/users

  • Get all the users in the collection

/user/create

  • Create a user, save the user's metadata, and create a storage folder for them

/user/login

  • Verify that the credentials entered by the user are correct

/user/<username>

  • Get metadata about a specific user

/user/<username>/upload

  • Upload a file to MongoDB and the storage API

/user/<username>/<filename>/download

  • Download a file from cloud storage to the local machine

/documents

  • Get all documents in the collection

/user/<username>/documents

  • Get all the documents from a specific user

/document/<filename>

  • Get file by filename

/user/<username>/<filename>/text

  • Extract all the text from a file

/user/<username>/<filename>/summary

  • Receive the summary, sentiment, and keywords of the whole file

/user/<username>/<filename>/paragraphs

  • Receive the summary, sentiment, and keywords for each paragraph in the file

/user/<username>/<filename>/paragraph/<int:paragraph_number>

  • Receive the summary, sentiment, and keywords for the specified paragraph

/user/<username>/<filename>/delete

  • Delete the file from document storage along with its linked metadata

/user/<username>/<filename>/paragraphs/<keyword>

  • Receive the summary, sentiment, and keywords for all paragraphs containing the matching keyword

/user/<username>/<filename>/<keyword>/definition

  • Receive the definition of the given keyword

/user/<username>/delete_all_files

  • Delete all the files and metadata connected to the user

/user/<username>/delete_all

  • Delete all of the user's information and documents
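For a sense of how routes like these look in Flask, here is a hedged sketch of two of them (handler names and response bodies are illustrative; the real definitions live in app/routes.py):

```python
# Hypothetical sketch of two of the routes above; the data layer
# is stubbed out rather than backed by MongoDB.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the MongoDB users collection.
USERS = [{"username": "alice"}, {"username": "bob"}]

@app.route("/users", methods=["GET"])
def get_users():
    # Get all the users in the collection.
    return jsonify(USERS)

@app.route("/user/<username>/<filename>/paragraph/<int:paragraph_number>")
def paragraph_analysis(username, filename, paragraph_number):
    # Summary, sentiment, and keywords for one paragraph; the values
    # here are placeholders for the real NLP pipeline's output.
    return jsonify({
        "user": username,
        "file": filename,
        "paragraph": paragraph_number,
        "summary": "...",
        "sentiment": "neutral",
        "keywords": [],
    })
```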

Test the API

test/api_test

  • pytest_api.py tests the API using pytest; run it with pytest pytest_api.py
  • test_api.py tests the API using Python's unittest; run it with python -m unittest discover -s tests
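As a minimal illustration of what one of these tests can look like (a sketch assuming the API is already running on localhost:5000; not the actual contents of pytest_api.py):

```python
# Hypothetical pytest sketch; assumes the Flask API is running locally.
import requests

BASE_URL = "http://localhost:5000"

def test_users_returns_json_list():
    response = requests.get(f"{BASE_URL}/users")
    assert response.status_code == 200
    # /users is expected to return the users collection as JSON.
    assert isinstance(response.json(), list)
```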

Containers

Frontend

To build the container:

  • cd frontend
  • docker build -t my-frontend:latest .

To run the container:

  • docker run -d -p 3000:3000 my-frontend:latest

Backend

To build the container:

  • docker build -t my-backend:latest .

To run the container:

  • docker run -d -p 5000:5000 my-backend:latest

Build and Run both together

  • docker-compose up -d
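docker-compose up -d builds any missing images and starts both containers in the background; docker-compose down stops and removes them.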

Task Queue Implementation (No longer implemented)

  • To run a Celery worker, make sure the application is already running, then run celery -A app.celery worker --loglevel=info

Project Expectations

Text summary and NLP

  • For the summarization, keyword extraction, and sentiment analysis of text from images and documents, I utilized an array of open-source libraries, due to the restrictions Google and OpenAI place on their APIs. A sketch of how a few of these fit together follows this list.
  • For extracting text from PDFs and Word documents, PyPDF2 & python-docx were used
  • For extracting text from images, pytesseract & Tesseract were used
  • For summarizing text, spaCy was used
  • For sentiment analysis of text, NLTK was used
  • For keyword identification of text, scikit-learn was used
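The sketch below combines three of those libraries end to end. The library calls are standard; the glue code and file name are illustrative, not the project's actual pipeline:

```python
# Illustrative sketch combining PyPDF2, NLTK, and scikit-learn.
# Requires: pip install PyPDF2 nltk scikit-learn
# and a one-time nltk.download("vader_lexicon").
from PyPDF2 import PdfReader
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer

# 1. Extract text from a PDF with PyPDF2.
reader = PdfReader("example.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# 2. Score sentiment with NLTK's VADER analyzer.
sentiment = SentimentIntensityAnalyzer().polarity_scores(text)

# 3. Pull keywords with a scikit-learn TF-IDF vectorizer.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5)
vectorizer.fit_transform([text])
keywords = list(vectorizer.get_feature_names_out())

print(sentiment, keywords)
```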

Guidelines

  • I should log in to a secure service to upload my content
  • I should be able to upload documents, PDFs, or images. The application should translate my documents to text
  • I want the service to tag all my documents, and the paragraphs within every document, with keywords, and to know the topics each document covers
  • I should be able to access different paragraphs of different documents based on keywords
  • I should be able to find all positive, neutral, and negative paragraphs and sentences
  • Keywords within paragraphs should be searchable in government open data, Wikipedia, and media organizations, e.g., NYTimes
  • I should find definitions of keywords using open services (e.g., OpenAI)
  • I should be able to get summaries of each document
  • I want to discover content from the web to enhance my story
  • I want to know all names, locations, institutions, and addresses in my documents.

