- From the root directory, activate the virtual environment by typing
.\smartDoc\Scripts\activate
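- These requirements are only for the Flask API and are listed in requirements.txt. Install them with
pip install -r requirements.txt
- To list the packages currently installed in the environment, type
pip freeze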
- Go to the frontend directory and type
npm start
- Go to the root directory and type
flask run
or
python run.py
- SRC: directory that contains all the source code
- package-lock.json & package.json: declare the dependencies the source code relies on (package-lock.json pins their exact versions)
- node_modules: contains all the dependencies downloaded for the frontend
- run.py: the Python file in the root directory, used to start the API
- app: directory that contains all the code for the API
- User and document metadata is stored in MongoDB
- Document and image files are stored using the Dropbox API
/users
- Get all the users in the collection
/user/create
- Create a user, save the user metadata, and create a folder for storage
/user/login
- Verify that the login information entered by the user is correct
/user/<username>
- Get metadata about a specific user
/user/<username>/upload
- Upload a file: store its metadata in MongoDB and the file itself through the storage API
/user/<username>/<filename>/download
- Download a file from cloud storage to the local machine
/documents
- Get all documents in the collection
/user/<username>/documents
- Get all the documents from a specific user
/document/<filename>
- Get a file by filename
/user/<username>/<filename>/text
- Extract all the text from a file
/user/<username>/<filename>/summary
- Receive the summary, sentiment, and keywords of the whole file
/user/<username>/<filename>/paragraphs
- Receive the summary, sentiment, and keywords for each paragraph in the file
/user/<username>/<filename>/paragraph/<int:paragraph_number>
- Receive the summary, sentiment, and keywords for a specific paragraph
/user/<username>/<filename>/delete
- Delete the file from document storage along with its linked metadata
/user/<username>/<filename>/paragraphs/<keyword>
- Receive the summary, sentiment, and keywords for all paragraphs matching the keyword
/user/<username>/<filename>/<keyword>/definition
- Receive the definition of the given keyword
/user/<username>/delete_all_files
- Delete all the files and metadata connected to the user
/user/<username>/delete_all
- Delete all information and documents belonging to the user
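Every route above is a path under the API's base URL, so a client mainly needs to join path segments correctly. The following is a minimal stdlib sketch, assuming the API listens on Flask's default address http://localhost:5000 (the port the backend container publishes) and that the read-only routes answer GET requests with JSON. BASE_URL, endpoint and get_json are hypothetical helpers, not part of the project.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: Flask's default development address; change it for other deployments.
BASE_URL = "http://localhost:5000"


def endpoint(*segments: object) -> str:
    """Build a route URL from path segments, percent-encoding each one.

    Encoding keeps spaces and characters such as '#' or '?' in a username
    or filename from being read as URL syntax.
    """
    return BASE_URL + "/" + "/".join(quote(str(s), safe="") for s in segments)


def get_json(url: str) -> object:
    """Send a GET request and decode the JSON response body (assumes a JSON route)."""
    with urlopen(url) as response:
        return json.load(response)
```

For example, endpoint("user", "alice", "report.pdf", "paragraph", 2) gives http://localhost:5000/user/alice/report.pdf/paragraph/2, and get_json(endpoint("users")) would fetch the user list while the API is running ("alice" and "report.pdf" are placeholder values).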
test/api_test
- pytest_api.py: tests the API using pytest; run it with
pytest pytest_api.py
- test_api.py: tests the API using Python's unittest; run it with
python -m unittest discover -s tests
To build the frontend container:
cd frontend
docker build -t my-frontend:latest .
To run the frontend container:
docker run -d -p 3000:3000 my-frontend:latest
To build the backend container, return to the root directory and type:
docker build -t my-backend:latest .
To run the backend container:
docker run -d -p 5000:5000 my-backend:latest
- To start all the services defined in the Docker Compose file in the background, type
docker-compose up -d
- To run a Celery worker, make sure the application is already running and type
celery -A app.celery worker --loglevel=info
- For text summarization, keyword extraction, and sentiment analysis of text from images and documents, I used a range of open-source libraries because of the restrictions Google and OpenAI place on their APIs
- For extracting text from PDFs and Word documents, PyPDF2 and python-docx were used
- For extracting text from images, pytesseract and Tesseract were used
- For summarizing text, spaCy was used
- For sentiment analysis of text, NLTK was used
- For keyword identification in text, scikit-learn was used
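The README does not say how scikit-learn was configured for keyword identification. A common approach is TF-IDF scoring, which scikit-learn provides as TfidfVectorizer. The stdlib-only sketch below computes TF-IDF by hand so the scoring is visible; the tokenizer, the tiny stopword list, and the names tokenize and top_keywords are illustrative assumptions. The IDF term follows TfidfVectorizer's default smoothing (smooth_idf=True): idf = ln((1 + n) / (1 + df)) + 1.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real setup would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            # Raw term count times smoothed IDF (TfidfVectorizer's default formula).
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results
```

Terms that appear in many documents (such as "energy" across several articles) get a lower IDF and drop down the ranking, which is why TF-IDF favors words that distinguish one document from the rest. With scikit-learn itself, TfidfVectorizer(stop_words="english") combined with get_feature_names_out() yields a similar per-document ranking, though its default tokenizer differs from the one sketched here.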
- I should log in to a secure service to upload my content
- I should be able to upload documents, PDFs, or images. The application should convert my documents to text
- I want the service to tag all my documents, and the paragraphs within every document, with keywords and to identify the topics each document covers
- I should be able to access different paragraphs of different documents based on keywords
- I should be able to find all positive, neutral, and negative paragraphs and sentences
- Keywords within paragraphs should be searchable in government open data, Wikipedia, and media organizations, e.g., NYTimes
- I should be able to find definitions of keywords using open services (e.g., OpenAI)
- I should be able to get summaries of each document
- I want to discover content from the web to enhance my story
- I want to know all the names, locations, institutions, and addresses in my documents
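Accessing paragraphs of different documents by keyword, as in the /user/<username>/<filename>/paragraphs/<keyword> route, comes down to mapping each keyword to the paragraphs tagged with it. Below is a minimal sketch of such an inverted index. It assumes paragraphs are separated by blank lines, that keywords have already been extracted per paragraph, and that paragraphs are numbered from 0. split_paragraphs and build_keyword_index are hypothetical names, not the project's code.

```python
import re
from collections import defaultdict


def split_paragraphs(text: str) -> list[str]:
    """Split text on blank lines (assumed paragraph separator) and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def build_keyword_index(
    documents: dict[str, list[list[str]]],
) -> dict[str, list[tuple[str, int]]]:
    """Map each keyword to the (filename, paragraph_number) pairs tagged with it.

    `documents` maps a filename to a list of keyword lists, one per paragraph.
    Paragraph numbers start at 0 here; that numbering is an assumption.
    """
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for filename, paragraph_keywords in documents.items():
        for number, keywords in enumerate(paragraph_keywords):
            # Lowercase for case-insensitive lookup; the set avoids duplicate hits.
            for keyword in {k.lower() for k in keywords}:
                index[keyword].append((filename, number))
    return dict(index)
```

A lookup is then index.get(keyword.lower(), []). Because every hit records both the filename and the paragraph number, one query returns matching paragraphs across all of a user's documents.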