This Python script extracts text from PDF files, splits it into chunks, and saves the chunks as both JSON and HTML files. It is useful for processing large documents and preparing text for downstream use, such as creating social media content from books.
- Extracts text from PDF files 📄
- Saves text as JSON and HTML files 📊
- Accept input from a file explorer.
- Add a web interface.
- Improve chunking with AI models.
- Python 3.6+
- PyPDF2 library
- PyQt5
1. Clone this repository:

   ```shell
   git clone https://github.com/thethmuu/book2socialfeed.git
   ```

2. Navigate to the project directory:

   ```shell
   cd book2socialfeed
   ```

3. Install the required packages:

   ```shell
   pip install -r requirements.txt
   ```

4. Run the script:

   ```shell
   python main.py
   ```

5. Answer the following prompts:
   - PDF file name 📁
   - Number of pages to skip (default is 1) ⏭️
   - Chunk size (default is 50) 📏
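The chunking step can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name `chunk_text` and the assumption that `chunk_size` counts sentences are ours.

```python
# Illustrative sketch of the chunking step; the real main.py may differ,
# and whether chunk_size counts sentences or words is an assumption here.
import re


def chunk_text(text, chunk_size=50):
    """Split text into chunks of roughly chunk_size sentences each."""
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + chunk_size])
        for i in range(0, len(sentences), chunk_size)
    ]
```

For example, `chunk_text("One. Two. Three.", chunk_size=2)` yields two chunks: the first two sentences joined together, then the remainder.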
The script generates:

- `output.json`: an array of the extracted text chunks.
- `{input_filename}_output.html`: a basic styled representation of the chunks, where `{input_filename}` is the name of the input PDF file (truncated to 20 characters if necessary) so the output is easy to match to its source.
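Writing the two output files can be sketched like this. `save_chunks` is a hypothetical helper; the actual HTML styling and file handling in `main.py` may differ.

```python
# Illustrative sketch of the output step; the exact HTML produced by
# main.py is an assumption here.
import html
import json
from pathlib import Path


def save_chunks(chunks, input_filename, out_dir="."):
    """Write chunks to output.json and {input_filename}_output.html."""
    out_dir = Path(out_dir)
    # output.json holds the chunks as a plain JSON array.
    (out_dir / "output.json").write_text(
        json.dumps(chunks, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    # The HTML file name uses the PDF's base name, truncated to 20 chars.
    base = Path(input_filename).stem[:20]
    body = "\n".join(f"<p>{html.escape(c)}</p>" for c in chunks)
    doc = f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"
    (out_dir / f"{base}_output.html").write_text(doc, encoding="utf-8")
    return base
```

Escaping each chunk with `html.escape` keeps any `<` or `&` characters in the extracted text from breaking the generated page.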
Modify `chunk_size` and `skip_pages` in the script to change the defaults.
Contributions and feature requests are welcome! Check the issues page.
This project is MIT licensed.