This repository was archived by the owner on Jun 6, 2025. It is now read-only.

IP2Ditch - IP2Always Media Backup & Archiver

GitHub Repository: https://github.com/Riotcoke123/IP2Ditch

Overview

IP2Ditch is a Python-based web application designed to automatically fetch media (videos and images) from specified communities.win forums, upload them to Fileditch for backup, and provide a simple web interface to browse the archived content. It runs a background process to periodically check for new media and can also be triggered manually.

Features

  • Automated Media Fetching: Monitors multiple communities.win API endpoints (e.g., new/hot posts from specified communities).
  • Broad Media Support: Handles common video (.mp4) and image (.jpg, .jpeg, .gif, .png, .webp) formats.
  • Concurrent API Fetching: Uses threading to fetch data from multiple API URLs simultaneously for efficiency.
  • Fileditch Integration: Backs up each media file by uploading it to Fileditch over HTTPS (default endpoint: https://up1.fileditch.com/upload.php).
  • Local Metadata Storage: Saves information about archived posts (title, author, original link, Fileditch link, media type) in a local data.json file.
  • Web Interface: A Flask-powered web UI to view the collection of archived media, displaying the newest items first.
  • Background Processing: Continuously checks for new content at a configurable interval.
  • Manual Trigger: Option to trigger the processing cycle via a POST request to an API endpoint.
  • Highly Configurable: Uses environment variables for API URLs, Fileditch settings, API credentials, data storage paths, and operational parameters.
  • Robust Logging: Detailed logging of operations, errors, and system status, including thread names and UTC timestamps.
  • Duplicate Prevention: Avoids reprocessing and re-uploading media that has already been archived by checking post title and author.
  • Safe File Handling: Implements atomic writes for the data file to prevent corruption.
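
The concurrent fetching feature can be pictured with a minimal sketch. It assumes a thread pool from concurrent.futures (listed under Dependencies); the names fetch_all and fake_fetch are hypothetical and are not taken from the project's source. The fetcher is injected so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List


def fetch_all(urls: Iterable[str],
              fetch_one: Callable[[str], List[dict]],
              max_workers: int = 6) -> Dict[str, List[dict]]:
    """Call fetch_one(url) for every URL in parallel; failed URLs map to []."""
    url_list = list(urls)
    results: Dict[str, List[dict]] = {}
    if not url_list:
        return results
    with ThreadPoolExecutor(max_workers=min(max_workers, len(url_list))) as pool:
        futures = {pool.submit(fetch_one, url): url for url in url_list}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing endpoint is reported and skipped; the others continue.
                print(f"fetch failed for {url}: {exc}")
                results[url] = []
    return results


def fake_fetch(url: str) -> List[dict]:
    """Stand-in for a real HTTP GET (hypothetical, for demonstration only)."""
    if "broken" in url:
        raise RuntimeError("simulated network error")
    return [{"title": f"post from {url}", "author": "someone"}]


demo = fetch_all(
    ["https://example.invalid/new", "https://example.invalid/broken"],
    fake_fetch,
)
```

In a real run, fetch_one would issue the authenticated request to one communities.win API URL and return the decoded list of posts.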

How It Works

  1. Initialization:
    • Loads configuration from environment variables (API keys, URLs, etc.).
    • Starts a background thread for periodic processing.
    • Launches a Flask web server (using Waitress) to serve the UI and API endpoints.
  2. Processing Cycle (Background or Manual):
    1. Fetch Data: Concurrently queries the configured COMMUNITIES_API_URLS for new posts.
      • Requires valid CW_API_KEY, CW_API_SECRET, and CW_XSRF_TOKEN for authentication.
    2. Filter Posts:
      • Identifies posts containing direct links to media files with supported extensions (.mp4, .jpg, etc.).
      • Checks against a local list (existing_post_ids derived from data.json) to skip already processed posts (based on title and author).
    3. Download & Upload:
      • For each new, supported media post:
        • Downloads the media file from its original URL.
        • Uploads the downloaded file to the configured FILEDITCH_UPLOAD_URL.
    4. Store Metadata:
      • If the upload to Fileditch is successful, a new entry is created containing:
        • Title of the post
        • Author of the post
        • Fileditch link (the new backup URL)
        • Original media link
        • Type of media ("video" or "image")
      • This new entry is appended to the data.json file.
  3. Web Interface:
    • The Flask application serves an index.html page that reads data.json and displays the archived items in a table, with the most recent entries shown first.
    • Provides direct links to the media on Fileditch.
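
The filter step of the processing cycle might look roughly like the sketch below. The supported extensions and the title/author duplicate check come from this README; the post field name "link" and the function names are assumptions made for illustration, since the actual API response shape is not documented here.

```python
from typing import Iterable, List, Optional, Set, Tuple
from urllib.parse import urlparse

VIDEO_EXTENSIONS = (".mp4",)
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".webp")


def media_type(url: str) -> Optional[str]:
    """Return "video" or "image" for a supported direct media link, else None."""
    path = urlparse(url).path.lower()  # ignore query strings and letter case
    if path.endswith(VIDEO_EXTENSIONS):
        return "video"
    if path.endswith(IMAGE_EXTENSIONS):
        return "image"
    return None


def select_new_media(posts: Iterable[dict],
                     seen: Set[Tuple[str, str]]) -> List[dict]:
    """Keep posts with supported media whose (title, author) pair is new."""
    selected = []
    for post in posts:
        key = (post.get("title", ""), post.get("author", ""))
        # "link" is a hypothetical field name for the post's media URL.
        kind = media_type(post.get("link", ""))
        if kind is None or key in seen:
            continue
        seen.add(key)  # also guards against duplicates within one cycle
        selected.append({
            "title": key[0],
            "author": key[1],
            "original_link": post["link"],
            "type": kind,
        })
    return selected


posts = [
    {"title": "Clip", "author": "User123", "link": "https://example.com/a.MP4?x=1"},
    {"title": "Clip", "author": "User123", "link": "https://example.com/a.mp4"},
    {"title": "Text", "author": "User456", "link": "https://example.com/page.html"},
    {"title": "Pic", "author": "User456", "link": "https://example.com/b.webp"},
]
new_items = select_new_media(posts, seen=set())
```

Each selected entry would then be downloaded and uploaded, and the fileditch_link field added once the upload succeeds.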

Setup and Running

  1. Clone the Repository:
    git clone https://github.com/Riotcoke123/IP2Ditch.git
    cd IP2Ditch
  2. Install Dependencies:

    Ensure you have Python 3.x installed. Then, install the required packages. It's recommended to use a virtual environment.

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install Flask requests python-dotenv waitress

    Alternatively, if a requirements.txt file is provided:

    pip install -r requirements.txt
  3. Create .env File:

    Create a file named .env in the root directory of the project and populate it with the necessary environment variables. Sensitive credentials should never be hardcoded.

    # Required Communities.win API Credentials
    CW_API_KEY="YOUR_CW_API_KEY"
    CW_API_SECRET="YOUR_CW_API_SECRET"
    CW_XSRF_TOKEN="YOUR_CW_XSRF_TOKEN_FROM_HEADERS_OR_COOKIES"
    

    # Optional: Override default API URLs (comma-separated)

    # Optional: Override default Fileditch upload URL

    # Optional: Override default data file path
    APP_DATA_FILE_PATH="data/my_archive.json"

    # Optional: Server Configuration
    APP_HOST="0.0.0.0"
    APP_PORT="5000"
    WAITRESS_THREADS="8"

    # Optional: Processing Configuration
    PROCESSING_INTERVAL_SECONDS="120" # How often to check for new posts (in seconds)
    REQUEST_TIMEOUT="30" # Timeout for network requests (seconds)
    UPLOAD_TIMEOUT="300" # Timeout for file uploads (seconds)

    Critical: CW_API_KEY, CW_API_SECRET, and CW_XSRF_TOKEN are mandatory for the application to fetch data from communities.win. The application will exit if these are not set.
  4. Ensure Templates and Static Directories:

    The application expects an index.html file in a templates directory. The script attempts to create templates and static directories if they don't exist. Make sure your templates/index.html is correctly placed. Example templates/index.html (basic structure):

    <!DOCTYPE html>
    <html>
      <head>
        <title>Archived Media</title>
        <!-- Add styles here -->
      </head>
      <body>
        <h1>Archived Media ({{ item_count }} items)</h1>
        <form action="/process" method="POST">
          <button type="submit">Run Processor Manually</button>
        </form>
        <table border="1">
          <thead>
            <tr>
              <th>Title</th>
              <th>Author</th>
              <th>Type</th>
              <th>Fileditch Link</th>
              <th>Original Link</th>
            </tr>
          </thead>
          <tbody>
            {% for item in items %}
            <tr>
              <td>{{ item.title }}</td>
              <td>{{ item.author }}</td>
              <td>{{ item.type }}</td>
              <td><a href="{{ item.fileditch_link }}" target="_blank">View on Fileditch</a></td>
              <td><a href="{{ item.original_link }}" target="_blank">Original</a></td>
            </tr>
            {% else %}
            <tr><td colspan="5">No items found.</td></tr>
            {% endfor %}
          </tbody>
        </table>
      </body>
    </html>
    

  5. Run the Application:

    Execute the main Python script (e.g., main.py or the name of your script file).

    python your_script_name.py

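    The application starts the background processor and then the web server, which listens on http://<APP_HOST>:<APP_PORT>. With the default APP_HOST of 0.0.0.0 the server accepts connections on all network interfaces, so on the same machine you can open http://localhost:5000 in a browser.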
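
As an illustration of the background processor mentioned above, here is a minimal sketch of a periodic worker thread. It assumes a threading.Event for shutdown and wraps each cycle in try/except so that one failure does not end the loop; run_periodically, start_background, and flaky_cycle are hypothetical names, not the project's own.

```python
import logging
import threading
import time
from typing import Callable

logger = logging.getLogger("background_sketch")


def run_periodically(cycle: Callable[[], None],
                     interval_seconds: float,
                     stop_event: threading.Event) -> None:
    """Run cycle() repeatedly until stop_event is set, surviving exceptions."""
    while not stop_event.is_set():
        try:
            cycle()
        except Exception:
            # Log the traceback and keep the thread alive for the next interval.
            logger.exception("Processing cycle failed; retrying after the interval")
        # wait() sleeps for the interval but returns early when stop is requested.
        stop_event.wait(interval_seconds)


def start_background(cycle: Callable[[], None],
                     interval_seconds: float,
                     stop_event: threading.Event) -> threading.Thread:
    thread = threading.Thread(
        target=run_periodically,
        args=(cycle, interval_seconds, stop_event),
        name="BackgroundProcessor",
        daemon=True,  # do not block interpreter exit
    )
    thread.start()
    return thread


calls = []


def flaky_cycle() -> None:
    """Demo cycle: fails on the first call, succeeds afterwards."""
    calls.append(len(calls))
    if len(calls) == 1:
        raise RuntimeError("simulated failure")


stop = threading.Event()
worker = start_background(flaky_cycle, 0.01, stop)
time.sleep(0.2)
stop.set()
worker.join(timeout=2)
```

In the application the interval would come from PROCESSING_INTERVAL_SECONDS (120 by default) rather than the short demo value used here.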

Configuration (Environment Variables)

The application is configured using environment variables. These can be set directly in your system or placed in a .env file in the project's root directory (which will be loaded automatically).

Variable Name | Description | Default Value | Required
--- | --- | --- | ---
CW_API_KEY | communities.win API Key. | None | Yes
CW_API_SECRET | communities.win API Secret. | None | Yes
CW_XSRF_TOKEN | communities.win XSRF Token. This is usually obtained from browser cookies or request headers when interacting with the site. | None | Yes
CW_API_URLS | Comma-separated list of communities.win API URLs to fetch posts from. | Six URLs, listed below the table | No
APP_FILEDITCH_URL | The upload URL for Fileditch. | https://up1.fileditch.com/upload.php | No
APP_DATA_FILE_PATH | Path to the JSON file where archived media metadata is stored. The directory will be created if it doesn't exist. | data.json (relative to script location) | No
APP_HOST | Host address for the Flask application to listen on. | 0.0.0.0 | No
APP_PORT | Port number for the Flask application. | 5000 | No
WAITRESS_THREADS | Number of worker threads for the Waitress WSGI server. | 8 | No
PROCESSING_INTERVAL_SECONDS | Interval in seconds for the background processing thread to fetch and process new posts. | 120 (2 minutes) | No
REQUEST_TIMEOUT | Timeout in seconds for general network requests (fetching API data, downloading files). | 30 | No
UPLOAD_TIMEOUT | Timeout in seconds for uploading files to Fileditch. | 300 (5 minutes) | No

Default value of CW_API_URLS:

  • https://communities.win/api/v2/post/newv2.json?community=ip2always
  • https://communities.win/api/v2/post/hotv2.json?community=ip2always
  • https://communities.win/api/v2/post/newv2.json?community=spictank
  • https://communities.win/api/v2/post/hotv2.json?community=spictank
  • https://communities.win/api/v2/post/newv2.json?community=freddiebeans2
  • https://communities.win/api/v2/post/hot2.json?community=freddiebeans2
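
To make the defaults and the required/optional split concrete, the sketch below reads these variables from a mapping such as os.environ. The variable names and default values are those documented above; the Settings class and load_settings function are hypothetical, and the real script may organize its configuration differently (for example, it loads a .env file through python-dotenv first).

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("CW_API_KEY", "CW_API_SECRET", "CW_XSRF_TOKEN")


@dataclass(frozen=True)
class Settings:
    api_key: str
    api_secret: str
    xsrf_token: str
    data_file_path: str
    host: str
    port: int
    waitress_threads: int
    interval_seconds: int
    request_timeout: int
    upload_timeout: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env; exit when a required credential is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise SystemExit("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        api_key=env["CW_API_KEY"],
        api_secret=env["CW_API_SECRET"],
        xsrf_token=env["CW_XSRF_TOKEN"],
        data_file_path=env.get("APP_DATA_FILE_PATH", "data.json"),
        host=env.get("APP_HOST", "0.0.0.0"),
        port=int(env.get("APP_PORT", "5000")),
        waitress_threads=int(env.get("WAITRESS_THREADS", "8")),
        interval_seconds=int(env.get("PROCESSING_INTERVAL_SECONDS", "120")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "30")),
        upload_timeout=int(env.get("UPLOAD_TIMEOUT", "300")),
    )


example = load_settings({
    "CW_API_KEY": "k",
    "CW_API_SECRET": "s",
    "CW_XSRF_TOKEN": "t",
    "APP_PORT": "8080",
})
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.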

API Endpoints

  • GET /
    • Description: Displays the main HTML page with a table of archived media.
    • Response: HTML page.
  • POST /process
    • Description: Manually triggers one full processing cycle (fetch, download, upload, save).
    • Response: JSON object indicating the outcome of the processing.
      {
      "message": "Processing complete.",
      "new_items_added": 2,
      "total_items_in_file": 10,
      "posts_checked_this_cycle": 50
      }
  • GET /data
    • Description: Returns the raw JSON data of all archived items.
    • Response: JSON array of archived media objects.
      [
      {
      "title": "Cool Video Title",
      "author": "User123",
      "fileditch_link": "https://fileditch.com/...",
      "original_link": "https://example.com/video.mp4",
      "type": "video"
      },
      ...
      ]
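
A small client for these endpoints could be sketched as follows, using only the standard library. The base URL assumes a local instance on the default port, and call_json and count_by_type are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Any, Dict, List
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000"  # assumption: local instance, default APP_PORT


def call_json(path: str, method: str = "GET", timeout: float = 30) -> Any:
    """Send a request to the running application and decode the JSON reply."""
    request = Request(BASE_URL + path, method=method)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def count_by_type(items: List[dict]) -> Dict[str, int]:
    """Count archived items per media type ("video" or "image")."""
    return dict(Counter(item.get("type", "unknown") for item in items))


sample = [
    {"title": "Cool Video Title", "author": "User123", "type": "video"},
    {"title": "Another", "author": "User456", "type": "image"},
    {"title": "Third", "author": "User789", "type": "video"},
]
summary = count_by_type(sample)
```

With the server running, call_json("/process", method="POST") triggers one processing cycle and returns the outcome object shown above, and count_by_type(call_json("/data")) summarizes the archive.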

Dependencies

  • Python 3.x
  • Flask
  • Requests
  • python-dotenv
  • Waitress
  • concurrent.futures (Standard Python library)
  • mimetypes (Standard Python library)

Logging

The application employs Python's built-in logging module. Logs are output to standard output.

  • Format: YYYY-MM-DDTHH:MM:SSZ - LEVELNAME - [ThreadName] - Message
  • Timestamp: UTC.
  • Level: Primarily INFO, with DEBUG for more verbose output, WARNING for recoverable issues, ERROR for significant problems, and CRITICAL for fatal errors (like missing essential configs).

Check the console output where the script is running to monitor its activity and troubleshoot issues.
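
The log format described above can be reproduced with the standard logging module. This is a sketch of one way to do it, not necessarily the project's exact setup; make_formatter, build_logger, and the logger name are hypothetical.

```python
import logging
import sys
import time


def make_formatter() -> logging.Formatter:
    """Formatter producing: YYYY-MM-DDTHH:MM:SSZ - LEVELNAME - [ThreadName] - Message"""
    formatter = logging.Formatter(
        fmt="%(asctime)s - %(levelname)s - [%(threadName)s] - %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    formatter.converter = time.gmtime  # render timestamps in UTC, not local time
    return formatter


def build_logger(name: str = "ip2ditch_sketch",
                 level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes formatted records to standard output."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(make_formatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger


log = build_logger()
log.info("Background processor started")
```

Setting the level to logging.DEBUG in build_logger would surface the more verbose messages.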

Error Handling and Resilience

  • Network Issues: Timeouts and request exceptions are caught for API fetching, file downloading, and uploading. The application will log the error and typically skip the problematic item or API, continuing with others.
  • API Errors: HTTP errors (like 401/403 for bad credentials or 404 for not found) from communities.win or Fileditch are logged. Specific warnings are issued for credential-related errors.
  • Data File: Uses an atomic write process (save to a temporary file then replace) to minimize data corruption in data.json during saves. If the data file is missing, empty, or malformed, it starts with an empty list.
  • Background Thread: The main loop of the background processing thread is wrapped in a try-except block to catch unexpected errors and log them, preventing the thread from crashing silently.
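
The data file behavior described above (atomic save, tolerant load) can be sketched with the standard library. The approach shown, a temporary file in the same directory followed by os.replace, is one common way to achieve it; load_items and save_items are hypothetical names.

```python
import json
import os
import tempfile
from typing import List


def load_items(path: str) -> List[dict]:
    """Return the archived items, or [] if the file is missing, empty, or malformed."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    return data if isinstance(data, list) else []


def save_items(path: str, items: List[dict]) -> None:
    """Write items to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(items, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach the disk
        os.replace(temp_path, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "data", "data.json")
    before = load_items(data_path)  # file does not exist yet
    save_items(data_path, [{"title": "Cool Video Title", "type": "video"}])
    after = load_items(data_path)
    with open(data_path, "w", encoding="utf-8") as broken:
        broken.write("{not json")
    recovered = load_items(data_path)  # malformed content falls back to []
```

Creating the temporary file in the target directory matters because os.replace is only atomic when source and destination are on the same filesystem.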

Note on Communities.win Headers: The script uses a specific set of HTTP headers, including a User-Agent string, when making requests to the communities.win API. Sensitive parts of these headers (x-api-key, x-api-secret, x-xsrf-token) are loaded from environment variables. Ensure these are correctly set for the API requests to succeed.
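
As a sketch of how such headers might be assembled, the example below reads the three credentials from the environment and provides a helper that masks them before logging. The header names and environment variables come from this README; the User-Agent and Accept values are placeholders, and build_cw_headers and redact are hypothetical names.

```python
import os
from typing import Dict, Mapping

SENSITIVE_HEADERS = {"x-api-key", "x-api-secret", "x-xsrf-token"}


def build_cw_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers for the communities.win API from environment values."""
    return {
        "x-api-key": env["CW_API_KEY"],
        "x-api-secret": env["CW_API_SECRET"],
        "x-xsrf-token": env["CW_XSRF_TOKEN"],
        # Placeholder values; the real script sends its own header set.
        "User-Agent": "ip2ditch-example/0.1",
        "Accept": "application/json",
    }


def redact(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy that is safe to log, with credential values masked."""
    return {
        name: ("***" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


headers = build_cw_headers({
    "CW_API_KEY": "key-value",
    "CW_API_SECRET": "secret-value",
    "CW_XSRF_TOKEN": "token-value",
})
safe_to_log = redact(headers)
```

The resulting dictionary can be passed as the headers argument of a requests.get call; logging only the redacted copy keeps credentials out of the console output.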

License

This project is licensed under the GNU General Public License v3.0.
You may obtain a copy of the license at https://www.gnu.org/licenses/gpl-3.0.en.html.