URLS-MTHRFCKR

Clean your bookmark files of non-ASCII characters with cleanr.py. (I use this because my collection of exported bookmarks always seems to have some bad Unicode characters, and this script fixes that.) If you don't have these issues, don't use it; if you do use it, make sure to run it first.
Dedupe, fetch URL descriptions and URL images for CSV files using csvrl.py
Dedupe and fetch URL descriptions for bookmark.html files using htmrl.py
Dedupe and fetch URL descriptions for Raindrop.io exported HTML files using rdurl.py
Dedupe and fetch URL descriptions for any Markdown file using mdurl.py

CLEANR

Overview

The File Cleanup Utility is a Python script that removes non-ASCII characters from various types of files including CSV, HTML, and Markdown. It normalizes non-ASCII characters to their closest ASCII equivalents or removes them entirely.

Features

Support for multiple file types: CSV, HTML, and Markdown.
Logging: Detailed logging to a .log file.
User-Friendly: Interactive command-line interface for easy usage.
Statistics: Shows the total number of changes made during the process.

Dependencies

Python 3.x
BeautifulSoup4 (pip install beautifulsoup4)

Installation

Clone the repository or download the script to your local machine.
Make sure Python is installed.
Install required Python packages.

pip install beautifulsoup4

Usage

Run the script in your terminal:

python cleanr.py

You will be prompted to enter the file path and specify the file type (csv, html, md).

Functions

remove_non_ascii(text: str) -> Tuple[str, int] - Replaces non-ASCII characters in a given text string with their closest ASCII equivalent or removes them. Returns a tuple with the new text and the number of changes made.
process_csv(file_path: str) -> int - Processes a CSV file to remove non-ASCII characters. Returns the total number of changes made.
process_html(file_path: str) -> int - Processes an HTML file to remove non-ASCII characters using BeautifulSoup. Returns the total number of changes made.
process_md(file_path: str) -> int - Processes a Markdown file to remove non-ASCII characters. Returns the total number of changes made.
main() - The main function which orchestrates the file cleanup process based on user input.
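
As a rough illustration of what remove_non_ascii does, here is a minimal sketch that uses only the standard library. It assumes "closest ASCII equivalent" means Unicode NFKD decomposition with the non-ASCII remainder dropped; the real cleanr.py may use a different mapping.

```python
import unicodedata
from typing import Tuple


def remove_non_ascii(text: str) -> Tuple[str, int]:
    """Return (ascii_text, changes): fold characters to ASCII where possible, drop the rest."""
    out = []
    changes = 0
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Assumption: NFKD splits e.g. "é" into "e" plus a combining accent;
        # encoding with errors="ignore" keeps only the ASCII part (possibly nothing).
        folded = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
        out.append(folded)
        changes += 1
    return "".join(out), changes


print(remove_non_ascii("Café naïve"))  # ('Cafe naive', 2)
```

Characters with no decomposition (an en dash, for example) are simply removed under this rule, and each one still counts as a change.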

Logs

Logs are saved in a file named file_cleanup.log in the same directory as the script.

CSVRL

This script reads a CSV file containing bookmark information, processes the URLs, and writes the updated data to a new CSV file. It performs the following tasks:

Normalizes the URLs to remove redundant protocols.
Removes duplicate URLs.
Fetches a description for each URL using BeautifulSoup.
Fetches an image for each URL using BeautifulSoup.
Writes the updated data to a new CSV file.
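
The normalize-and-dedupe part of the tasks above can be sketched as follows. This is not the code from csvrl.py: the column names (title, url), the sample rows, and the exact normalization rule are assumptions made for illustration.

```python
import csv
import io


def normalize_url(url: str) -> str:
    """Lowercase, drop the http(s):// prefix and trailing slashes (an assumed rule)."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    return url.rstrip("/")


def dedupe_rows(rows, url_field="url"):
    """Keep the first row seen for each normalized URL."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize_url(row[url_field])
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


# Hypothetical sample data; a real export may use different column names.
sample = io.StringIO(
    "title,url\n"
    "Example,https://example.com/\n"
    "Example again,http://EXAMPLE.com\n"
    "Docs,https://docs.python.org/3/\n"
)
unique_rows = dedupe_rows(list(csv.DictReader(sample)))

# Write the surviving rows back out as CSV (to a string here, a file in practice).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["title", "url"])
writer.writeheader()
writer.writerows(unique_rows)
print(out.getvalue())
```

The normalized form is used only as a comparison key, so the rows that are kept still carry their original URLs. Lowercasing the whole URL can merge two addresses that differ only in path case, which is a trade-off of this simple rule.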

Instructions

To use the script, follow these steps:

Save the script wherever you wish.
Open a terminal or command prompt.
Navigate to the directory where the script is saved.
Run the script with the following command:

python csvrl.py

The script will prompt you to enter the path to the CSV file.
After you provide the file path, the script processes the URLs, fetches descriptions and images, and writes the updated data to a new CSV file.
Progress is logged to the console in real time while the script runs.
The script also creates a log file called url_processing.log.

HTMRL

This script processes an HTML bookmark file to deduplicate URLs and fetch missing descriptions.

Usage

To use this script:

Ensure you have Python 3 and the required modules installed:
- BeautifulSoup
- Requests
- Logging (part of the Python standard library, so nothing to install)
Save the htmrl.py script and run:
```
python htmrl.py
```
Enter the path to your HTML bookmark file when prompted.
The script will process the file, removing duplicates and fetching descriptions.
Updated output will be saved to a new HTML file.
Progress and statistics will be logged to the console and bookmark_processing.log.

What it Does

Normalizes URLs to remove duplicate protocols
Removes duplicate bookmark URLs
Fetches missing descriptions using Requests
Writes updated bookmarks to a new HTML file
Provides statistics and logging

Requirements

Python 3
BeautifulSoup
Requests
Logging (part of the Python standard library)

MDURL

mdurl.py is a Python script that helps you manage URLs in your markdown files. It can fetch the description of a URL and normalize the URL for consistency.

Features

Fetch Description - This script can fetch the description of a URL by sending a GET request to the URL and parsing the HTML response to find the meta description tag.
Normalize URL - This script can normalize a URL by converting it to lowercase and removing the 'http://' or 'https://' prefix and trailing slashes.

Functions

fetch_description(url) - This function sends a GET request to the provided URL and parses the HTML response to find the meta description tag. If the description is found, it is returned. If any error occurs during this process, it is logged and None is returned.
normalize_url(url) - This function normalizes the provided URL by converting it to lowercase, removing the 'http://' or 'https://' prefix, and removing any trailing slashes.
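
To show the parsing half of fetch_description, here is a minimal sketch that pulls the meta description out of HTML that has already been downloaded. mdurl.py itself uses requests and BeautifulSoup; this version swaps in the standard library's html.parser so it runs without extra packages, and the names extract_description and _MetaDescriptionParser are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class _MetaDescriptionParser(HTMLParser):
    """Records the content of the first non-empty <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        # Attribute values can be None (valueless attributes), hence the "or ''".
        if (attr_map.get("name") or "").lower() == "description":
            self.description = (attr_map.get("content") or "").strip() or None


def extract_description(html: str) -> Optional[str]:
    """Return the page's meta description, or None if there is none."""
    parser = _MetaDescriptionParser()
    parser.feed(html)
    return parser.description


page = '<html><head><meta name="description" content="A test page."></head></html>'
print(extract_description(page))  # A test page.
```

In the real script the HTML would come from the body of a GET request, with errors logged and None returned as described above.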

Dependencies

This script depends on the following Python libraries:

requests - For sending HTTP requests.
BeautifulSoup - For parsing HTML responses.

Make sure to install these dependencies using pip (the packages are named requests and beautifulsoup4).

RDURL

This Python script is designed to process an HTML bookmark file, removing duplicate bookmarks and fetching missing descriptions for bookmarks. The script uses the following Python modules:

bs4 (BeautifulSoup) - For parsing the HTML bookmark file
requests - For fetching descriptions for bookmarks
logging - For logging errors and information
os - For file handling

Usage

To use this script, follow these steps:

Ensure that the required third-party modules are installed (bs4 and requests); logging and os ship with Python.
Run the script.
When prompted, enter the path to the HTML bookmark file to be processed.
The script will remove duplicate bookmarks and fetch missing descriptions, and save the updated HTML to a new file in the same directory as the original file.

Code Overview

The script begins by importing the required Python modules (bs4, requests, logging, and os).
Next, the script initializes logging, statistics counters, and a function to normalize URLs.
The user is prompted for the path to the HTML bookmark file to be processed.
The script then reads the HTML file using BeautifulSoup, and initializes a dictionary to hold unique URLs.
The script iterates through all <DT> tags containing bookmarks, and extracts the URL and description (if available) for each bookmark.
Duplicate URLs are removed, and missing descriptions are fetched using requests, if possible.
Finally, the updated HTML is saved to a new file in the same directory as the original file, and statistics are displayed to the user.
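
The read, iterate, and dedupe steps above can be sketched as follows. rdurl.py uses BeautifulSoup for this; the sketch swaps in the standard library's html.parser so it runs without extra packages. The class and function names, the sample bookmarks, and the dedupe key (lowercased URL without trailing slashes) are all assumptions.

```python
from html.parser import HTMLParser


class BookmarkCollector(HTMLParser):
    """Collects (href, title) pairs from <A> tags in a bookmark export."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append((self._href, "".join(self._text).strip()))
            self._href = None


def unique_bookmarks(html):
    """Return (kept bookmarks, number of duplicates dropped)."""
    collector = BookmarkCollector()
    collector.feed(html)
    seen = {}  # assumed key: lowercased URL without trailing slashes
    duplicates = 0
    for href, title in collector.bookmarks:
        key = href.lower().rstrip("/")
        if key in seen:
            duplicates += 1
        else:
            seen[key] = (href, title)
    return list(seen.values()), duplicates


# Hypothetical three-entry export in the usual <DT><A HREF=...> layout.
sample = """<DL><p>
<DT><A HREF="https://example.com/">Example</A>
<DT><A HREF="https://example.com">Example (copy)</A>
<DT><A HREF="https://www.python.org/">Python</A>
</DL><p>"""

kept, dupes = unique_bookmarks(sample)
print(kept, dupes)
```

The first occurrence of each URL wins, and the duplicate counter plays the role of the statistics the script reports at the end.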

URL2MD

This Python script is designed to fetch the description and title of a list of URLs and save them in a markdown file. The script uses the following Python modules:

requests - For sending HTTP requests and fetching the raw HTML content of the URLs
re - For extracting URLs from user input using regular expressions
bs4 (BeautifulSoup) - For parsing the HTML content and extracting the description and title

Usage

To use this script, follow these steps:

Ensure that the required third-party modules are installed (requests and bs4); re ships with Python.
Run the script.
When prompted, enter the URLs you want to fetch descriptions and titles for. URLs can be provided in plain format or in markdown format.
The script will extract the URLs using regular expressions and fetch the description and title for each URL.
The fetched information will be saved in a markdown file named link_descriptions.md in the same directory as the script.

Code Overview

The script begins by importing the required Python modules (requests, re, and bs4).

Next, the script defines a function fetch_description_and_title(url) to fetch the description and title of a given URL. This function sends a GET request to fetch the raw HTML content, parses the HTML using BeautifulSoup, and extracts the description and title using specific meta tags and title tags.

The script then defines the main() function. This function prompts the user for a bulk list of URLs, extracts the URLs using regular expressions, and iterates through each URL to fetch the description and title using the fetch_description_and_title() function. The fetched information is then written to a markdown file.

Finally, the main() function is called if the script is run directly, executing the entire process.
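
The URL extraction and markdown output steps can be sketched as follows. The regular expression, the helper names extract_urls and to_markdown_entry, and the line format are assumptions; the pattern and layout that url2md.py actually uses for link_descriptions.md may differ.

```python
import re

# Assumed pattern: an http(s) URL runs until whitespace or a closing ), ], > or quote,
# which covers both plain URLs and markdown links like [text](url).
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(text):
    """Return URLs in order of first appearance, without duplicates."""
    found = []
    for url in URL_PATTERN.findall(text):
        if url not in found:
            found.append(url)
    return found


def to_markdown_entry(url, title, description):
    """Format one entry; fall back to the URL when no title was found."""
    heading = f"- [{title or url}]({url})"
    return f"{heading}: {description}" if description else heading


pasted = "See https://example.com and [Python](https://www.python.org/) plus https://example.com"
for url in extract_urls(pasted):
    # Title and description would come from fetch_description_and_title(url) in the script.
    print(to_markdown_entry(url, None, None))
```

A pattern this simple keeps trailing punctuation such as a full stop that directly follows a URL, so pasted prose may need a little cleanup first.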
