
GitHub Repository: https://github.com/Riotcoke123/IP2Ditch
IP2Ditch is a Python-based web application designed to automatically fetch media (videos and images) from specified communities.win forums, upload them to Fileditch for backup, and provide a simple web interface to browse the archived content. It runs a background process to periodically check for new media and can also be triggered manually.
- Automated Media Fetching: Monitors multiple communities.win API endpoints (e.g., new/hot posts from specified communities).
- Broad Media Support: Handles common video (`.mp4`) and image (`.jpg`, `.jpeg`, `.gif`, `.png`, `.webp`) formats.
- Concurrent API Fetching: Uses threading to fetch data from multiple API URLs simultaneously for efficiency.
- Fileditch Integration: Securely backs up media to Fileditch.
- Local Metadata Storage: Saves information about archived posts (title, author, original link, Fileditch link, media type) in a local `data.json` file.
- Web Interface: A Flask-powered web UI to view the collection of archived media, displaying the newest items first.
- Background Processing: Continuously checks for new content at a configurable interval.
- Manual Trigger: Option to trigger the processing cycle via a POST request to an API endpoint.
- Highly Configurable: Uses environment variables for API URLs, Fileditch settings, API credentials, data storage paths, and operational parameters.
- Robust Logging: Detailed logging of operations, errors, and system status, including thread names and UTC timestamps.
- Duplicate Prevention: Avoids reprocessing and re-uploading media that has already been archived by checking post title and author.
- Safe File Handling: Implements atomic writes for the data file to prevent corruption.
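
The Concurrent API Fetching feature can be illustrated with `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch under stated assumptions, not the project's actual code: `fetch_all` and the injected `fetch_one` callable are hypothetical names, and a real `fetch_one` would perform the authenticated HTTP request against one API URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch_one, max_workers=8):
    """Call fetch_one(url) for every URL using a pool of worker threads.

    Returns a dict mapping each URL to the list of posts it produced.
    A URL whose fetch raises is recorded as an empty list, so one failing
    endpoint does not stop the others. (Hypothetical helper.)
    """
    results = {}
    if not urls:
        return results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the URL it is fetching.
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                results[url] = []
    return results
```

Passing the fetch function in as an argument keeps the threading logic separate from the HTTP details, which also makes it easy to exercise with a stub.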
- Initialization:
  - Loads configuration from environment variables (API keys, URLs, etc.).
  - Starts a background thread for periodic processing.
  - Launches a Flask web server (using Waitress) to serve the UI and API endpoints.
- Processing Cycle (Background or Manual):
  - Fetch Data: Concurrently queries the configured `COMMUNITIES_API_URLS` for new posts.
    - Requires valid `CW_API_KEY`, `CW_API_SECRET`, and `CW_XSRF_TOKEN` for authentication.
  - Filter Posts:
    - Identifies posts containing direct links to media files with supported extensions (`.mp4`, `.jpg`, etc.).
    - Checks against a local list (`existing_post_ids`, derived from `data.json`) to skip already processed posts (based on title and author).
  - Download & Upload:
    - For each new, supported media post:
      - Downloads the media file from its original URL.
      - Uploads the downloaded file to the configured `FILEDITCH_UPLOAD_URL`.
  - Store Metadata:
    - If the upload to Fileditch is successful, a new entry is created containing:
      - Title of the post
      - Author of the post
      - Fileditch link (the new backup URL)
      - Original media link
      - Type of media ("video" or "image")
    - This new entry is appended to the `data.json` file.
- Web Interface:
  - The Flask application serves an `index.html` page that reads `data.json` and displays the archived items in a table, with the most recent entries shown first.
  - Provides direct links to the media on Fileditch.
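
The Filter Posts step can be sketched as a pure function. This is an illustrative sketch only: the function names and the post field names (`link`, `title`, `author`) are assumptions rather than the project's confirmed schema, and the extension sets simply mirror the formats listed in this README.

```python
import os
from urllib.parse import urlparse

VIDEO_EXTENSIONS = {".mp4"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".webp"}


def media_type(url):
    """Return "video", "image", or None based on the URL path's extension."""
    # Parse first so a query string such as ?size=1 does not hide the extension.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    return None


def select_new_media_posts(posts, archived):
    """Keep posts that link to supported media and are not archived yet.

    Duplicates are detected by (title, author), as described above.
    """
    seen = {(item.get("title"), item.get("author")) for item in archived}
    selected = []
    for post in posts:
        kind = media_type(post.get("link", ""))
        key = (post.get("title"), post.get("author"))
        if kind is None or key in seen:
            continue
        seen.add(key)  # also guards against duplicates within one batch
        selected.append({**post, "type": kind})
    return selected
```

Because the key is title plus author, two different media files posted by the same author under the same title would be treated as one item; that is a consequence of the duplicate rule itself, not of this sketch.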
- Clone the Repository:

  ```bash
  git clone https://github.com/Riotcoke123/IP2Ditch.git
  cd IP2Ditch
  ```

- Install Dependencies:
  Ensure you have Python 3.x installed. Then, install the required packages. It's recommended to use a virtual environment.

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install Flask requests python-dotenv waitress
  ```

  Alternatively, if a `requirements.txt` file is provided:

  ```bash
  pip install -r requirements.txt
  ```

- Create `.env` File:
  Create a file named `.env` in the root directory of the project and populate it with the necessary environment variables. Sensitive credentials should never be hardcoded.

  ```
  # Required Communities.win API Credentials
  CW_API_KEY="YOUR_CW_API_KEY"
  CW_API_SECRET="YOUR_CW_API_SECRET"
  CW_XSRF_TOKEN="YOUR_CW_XSRF_TOKEN_FROM_HEADERS_OR_COOKIES"
  CW_API_URLS="https://communities.win/api/v2/post/newv2.json?community=somecommunity,https://communities.win/api/v2/post/hotv2.json?community=anothercommunity"
  APP_FILEDITCH_URL="https://up1.fileditch.com/upload.php"
  ```

  Critical: `CW_API_KEY`, `CW_API_SECRET`, and `CW_XSRF_TOKEN` are mandatory for the application to fetch data from communities.win. The application will exit if these are not set.

- Ensure Templates and Static Directories:
  The application expects an `index.html` file in a `templates` directory. The script attempts to create `templates` and `static` directories if they don't exist. Make sure your `templates/index.html` is correctly placed. Example `templates/index.html` (basic structure):

  ```html
  <!DOCTYPE html>
  <html>
  <head>
    <title>Archived Media</title>
    <!-- Add styles here -->
  </head>
  <body>
    <h1>Archived Media ({{ item_count }} items)</h1>
    <form action="/process" method="POST">
      <button type="submit">Run Processor Manually</button>
    </form>
    <table border="1">
      <thead>
        <tr>
          <th>Title</th>
          <th>Author</th>
          <th>Type</th>
          <th>Fileditch Link</th>
          <th>Original Link</th>
        </tr>
      </thead>
      <tbody>
        {% for item in items %}
        <tr>
          <td>{{ item.title }}</td>
          <td>{{ item.author }}</td>
          <td>{{ item.type }}</td>
          <td><a href="{{ item.fileditch_link }}" target="_blank">View on Fileditch</a></td>
          <td><a href="{{ item.original_link }}" target="_blank">Original</a></td>
        </tr>
        {% else %}
        <tr><td colspan="5">No items found.</td></tr>
        {% endfor %}
      </tbody>
    </table>
  </body>
  </html>
  ```

- Run the Application:
  Execute the main Python script (e.g., `main.py` or the name of your script file).

  ```bash
  python your_script_name.py
  ```

  The application will start, initiate the background processor, and the web server will be accessible at `http://<APP_HOST>:<APP_PORT>` (e.g., `http://0.0.0.0:5000` or `http://localhost:5000`).
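
The background processor started at launch can be pictured as a daemon thread that runs one cycle, sleeps for the configured interval, and survives exceptions. The following is a hedged sketch, not the project's code: `run_periodically`, `start_background_processor`, the thread name, and the logger name are all hypothetical.

```python
import logging
import threading

logger = logging.getLogger("ip2ditch.sketch")  # hypothetical logger name


def run_periodically(task, interval_seconds, stop_event):
    """Call task() every interval_seconds until stop_event is set."""
    while not stop_event.is_set():
        try:
            task()
        except Exception:
            # Log and keep looping so one bad cycle does not kill the thread.
            logger.exception("Processing cycle failed")
        # wait() returns early as soon as stop_event is set.
        stop_event.wait(interval_seconds)


def start_background_processor(task, interval_seconds):
    """Start the loop in a daemon thread; return (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=run_periodically,
        args=(task, interval_seconds, stop_event),
        name="BackgroundProcessor",
        daemon=True,
    )
    thread.start()
    return thread, stop_event
```

Using `Event.wait` instead of `time.sleep` lets the loop stop promptly on shutdown rather than finishing a full sleep interval first.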
The application is configured using environment variables. These can be set directly in your system or placed in a `.env` file in the project's root directory (which will be loaded automatically).
Variable Name | Description | Default Value | Required |
---|---|---|---|
`CW_API_KEY` | communities.win API Key. | None | Yes |
`CW_API_SECRET` | communities.win API Secret. | None | Yes |
`CW_XSRF_TOKEN` | communities.win XSRF Token. This is usually obtained from browser cookies or request headers when interacting with the site. | None | Yes |
`CW_API_URLS` | Comma-separated list of communities.win API URLs to fetch posts from. | `https://communities.win/api/v2/post/newv2.json?community=ip2always`, `https://communities.win/api/v2/post/hotv2.json?community=ip2always`, `https://communities.win/api/v2/post/newv2.json?community=spictank`, `https://communities.win/api/v2/post/hotv2.json?community=spictank`, `https://communities.win/api/v2/post/newv2.json?community=freddiebeans2`, `https://communities.win/api/v2/post/hot2.json?community=freddiebeans2` | No |
`APP_FILEDITCH_URL` | The upload URL for Fileditch. | `https://up1.fileditch.com/upload.php` | No |
`APP_DATA_FILE_PATH` | Path to the JSON file where archived media metadata is stored. The directory will be created if it doesn't exist. | `data.json` (relative to script location) | No |
`APP_HOST` | Host address for the Flask application to listen on. | `0.0.0.0` | No |
`APP_PORT` | Port number for the Flask application. | `5000` | No |
`WAITRESS_THREADS` | Number of worker threads for the Waitress WSGI server. | `8` | No |
`PROCESSING_INTERVAL_SECONDS` | Interval in seconds for the background processing thread to fetch and process new posts. | `120` (2 minutes) | No |
`REQUEST_TIMEOUT` | Timeout in seconds for general network requests (fetching API data, downloading files). | `30` | No |
`UPLOAD_TIMEOUT` | Timeout in seconds for uploading files to Fileditch. | `300` (5 minutes) | No |
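
To make the table concrete, the sketch below reads these variables with the defaults listed above. It is an assumption-laden illustration: `load_config` and the returned key names are hypothetical, the long default for `CW_API_URLS` is omitted for brevity, and it raises `RuntimeError` where the real application is described as exiting.

```python
import os

REQUIRED = ("CW_API_KEY", "CW_API_SECRET", "CW_XSRF_TOKEN")


def load_config(environ=None):
    """Build a config dict from environment variables, applying defaults."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    # Split the comma-separated URL list and drop empty entries.
    urls = [u.strip() for u in env.get("CW_API_URLS", "").split(",") if u.strip()]
    return {
        "api_key": env["CW_API_KEY"],
        "api_secret": env["CW_API_SECRET"],
        "xsrf_token": env["CW_XSRF_TOKEN"],
        "api_urls": urls,
        "fileditch_url": env.get(
            "APP_FILEDITCH_URL", "https://up1.fileditch.com/upload.php"
        ),
        "data_file_path": env.get("APP_DATA_FILE_PATH", "data.json"),
        "host": env.get("APP_HOST", "0.0.0.0"),
        "port": int(env.get("APP_PORT", "5000")),
        "waitress_threads": int(env.get("WAITRESS_THREADS", "8")),
        "processing_interval": int(env.get("PROCESSING_INTERVAL_SECONDS", "120")),
        "request_timeout": int(env.get("REQUEST_TIMEOUT", "30")),
        "upload_timeout": int(env.get("UPLOAD_TIMEOUT", "300")),
    }
```

Accepting an optional mapping instead of always reading `os.environ` makes the loader easy to test without touching the real environment.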
`GET /`
- Description: Displays the main HTML page with a table of archived media.
- Response: HTML page.

`POST /process`
- Description: Manually triggers one full processing cycle (fetch, download, upload, save).
- Response: JSON object indicating the outcome of the processing.

```json
{
  "message": "Processing complete.",
  "new_items_added": 2,
  "total_items_in_file": 10,
  "posts_checked_this_cycle": 50
}
```

`GET /data`
- Description: Returns the raw JSON data of all archived items.
- Response: JSON array of archived media objects.

```json
[
  {
    "title": "Cool Video Title",
    "author": "User123",
    "fileditch_link": "https://fileditch.com/...",
    "original_link": "https://example.com/video.mp4",
    "type": "video"
  },
  ...
]
```
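
A client can call these endpoints with nothing but the standard library. The helper names below are hypothetical and the timeouts are arbitrary choices; the sketch assumes the server is reachable at the base URL you pass in and returns the JSON shapes shown above.

```python
import json
import urllib.request


def build_process_request(base_url):
    """Return a POST request object for the manual-trigger endpoint."""
    url = base_url.rstrip("/") + "/process"
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_processing(base_url, timeout=600):
    """POST /process and return the decoded JSON summary."""
    request = build_process_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_archive(base_url, timeout=30):
    """GET /data and return the list of archived media objects."""
    url = base_url.rstrip("/") + "/data"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `trigger_processing("http://localhost:5000")` would block until the cycle finishes, which is why its default timeout is much longer than the one used for `/data`.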
- Python 3.x
- Flask
- Requests
- python-dotenv
- Waitress
- `concurrent.futures` (standard Python library)
- `mimetypes` (standard Python library)
The application employs Python's built-in `logging` module. Logs are output to standard output.
- Format: `YYYY-MM-DDTHH:MM:SSZ - LEVELNAME - [ThreadName] - Message`
- Timestamp: UTC.
- Level: Primarily INFO, with DEBUG for more verbose output, WARNING for recoverable issues, ERROR for significant problems, and CRITICAL for fatal errors (like missing essential configs).
Check the console output where the script is running to monitor its activity and troubleshoot issues.
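
A log format like the one above can be produced with the standard `logging` module by setting a UTC time converter on the formatter. This is a sketch of one way to get that output, not necessarily how the project configures it; `build_formatter` and `configure_logging` are hypothetical names.

```python
import logging
import sys
import time


def build_formatter():
    """Formatter emitting 'YYYY-MM-DDTHH:MM:SSZ - LEVEL - [Thread] - Message'."""
    formatter = logging.Formatter(
        fmt="%(asctime)s - %(levelname)s - [%(threadName)s] - %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    # Render timestamps in UTC instead of local time.
    formatter.converter = time.gmtime
    return formatter


def configure_logging(level=logging.INFO):
    """Send root-logger output to standard output in the format above."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(build_formatter())
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
```

Calling `configure_logging(logging.DEBUG)` once at startup would enable the more verbose output mentioned above.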
- Network Issues: Timeouts and request exceptions are caught for API fetching, file downloading, and uploading. The application will log the error and typically skip the problematic item or API, continuing with others.
- API Errors: HTTP errors (like 401/403 for bad credentials or 404 for not found) from communities.win or Fileditch are logged. Specific warnings are issued for credential-related errors.
- Data File: Uses an atomic write process (save to a temporary file, then replace) to minimize data corruption in `data.json` during saves. If the data file is missing, empty, or malformed, it starts with an empty list.
- Background Thread: The main loop of the background processing thread is wrapped in a try-except block to catch unexpected errors and log them, preventing the thread from crashing silently.
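
The data-file behaviour described above (write to a temporary file, then replace; fall back to an empty list on a missing or malformed file) can be sketched as follows. The function names are hypothetical and the project's own implementation may differ in detail.

```python
import json
import os
import tempfile


def load_items(path):
    """Return the archived items, or [] if the file is missing or invalid."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        # An empty file also raises JSONDecodeError, so it lands here too.
        return []
    return data if isinstance(data, list) else []


def save_items_atomically(path, items):
    """Write items to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(items, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace swaps the file in one step, so readers never observe
        # a half-written data file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The temporary file is created in the same directory as the target so that the final replace stays on one filesystem.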
The communities.win API credentials (`x-api-key`, `x-api-secret`, `x-xsrf-token`) are loaded from environment variables. Ensure these are correctly set for the API requests to succeed.
This project is licensed under the GNU General Public License v3.0.
You may obtain a copy of the license at https://www.gnu.org/licenses/gpl-3.0.en.html.