Git Reporter Hook #330
It sounds like a custom reporter subclass in The There's a simple example in the |
Thanks, I will give it a try. My Python skills just crit to -10 |
Did you make any headway with this? This is exactly what I'd like to achieve: output to a script action! |
Nothing so far. I didn't find time to dig in. |
class GitReport(reporters.ReporterBase):
    __kind__ = 'gitreport'

    def submit(self):
        for job_state in self.report.get_filtered_job_states(self.job_states):
            pass

@lancelon With this code snippet you can probably achieve it. At the pass statement you only get jobs that are new or changed. html2txt.py has an example of how to execute a shell command and read its output, in case you want the diff of the changes. Information about job_state is in handler.py; job_state.job comes from jobs.py. I am stumbling through it by trial and error whenever I find a bit of time. My goal has changed a bit, though: the initial idea was just to call a script or execute a command, but now I am trying it with gitpython, without using the shell. Once I have a working version I will probably make a pull request for the hooks example, or just post it in this ticket if it is not good enough. |
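As a rough illustration of the "execute a shell command and read its output" idea from the comment above, here is a minimal sketch that uses only the standard library. The helper name run_hook and the arguments passed to it are hypothetical and not part of urlwatch; the stand-in command simply invokes the Python interpreter so the example runs without any external script.

```python
import subprocess
import sys


def run_hook(command, *args):
    """Run an external command with extra arguments and return its stdout.

    run_hook is a hypothetical helper for illustration, not a urlwatch API.
    """
    completed = subprocess.run(
        [*command, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output as text
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


# Stand-in for a real script: echo the arguments back, one per line.
echo_script = [sys.executable, '-c',
               'import sys; print("\\n".join(sys.argv[1:]))']
output = run_hook(echo_script, 'my job name', 'https://example.org/')
```

Passing the command as a list avoids going through a shell, so job names or URLs containing spaces or quotes do not need escaping.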
So, I have a running version of my gitreport done. To enable it, add
to your urlwatch config; it also requires gitpython. It creates a subfolder in the repository for each domain and, for each job, a text file with the new content. The file is named <job name>.<job guid>.txt. All kinds of feedback are welcome. (But please remember "My Python skills just crit to -10".) In the future I plan to add remote support, fix all the bugs that I haven't found yet, and maybe add whatever is suggested.

import logging
import os
import unicodedata
import string
from urllib.parse import urlparse

from appdirs import AppDirs

import urlwatch
from urlwatch import filters
from urlwatch import reporters

logger = logging.getLogger(__name__)


# Custom Git Reporter
class GitReport(reporters.ReporterBase):
    """Create a file for each job and commit it to a Git repository"""

    __kind__ = 'gitreport'

    def submit(self):
        if self.config.get('enabled', False) is False:
            return

        from git import Repo

        # Use the Git path from the config, or fall back to the cache directory
        urlwatch_cache_dir = AppDirs(urlwatch.pkgname).user_cache_dir
        fallback = os.path.join(urlwatch_cache_dir, 'git')
        git_path = self.config.get('path', fallback)
        if git_path == '':
            logger.info('Git path is empty. Using: ' + os.path.abspath(fallback))
            git_path = fallback

        # Create the folder if it is not present
        if not os.path.exists(git_path):
            logger.debug('Create folder: ' + git_path)
            # makedirs also creates missing parent folders (e.g. the cache dir)
            os.makedirs(git_path)
            # Because it is a new folder, create a new repository
            repo = Repo.init(os.path.abspath(git_path))
        else:
            repo = Repo(os.path.abspath(git_path))

        # Abort if there are untracked files
        assert repo.untracked_files == []

        # If there is a remote, fetch its changes before adding or changing files
        if repo.remotes != []:
            remote = True
            repo.remotes.origin.fetch()
            repo.remotes.origin.pull()
        else:
            remote = False

        # Write all changes
        for job_state in self.report.get_filtered_job_states(self.job_states):
            # Use the domain as subdirectory
            parsed_uri = urlparse(job_state.job.get_location())
            result = '{uri.netloc}'.format(uri=parsed_uri)

            # Create the job_path if it does not exist
            job_path = os.path.join(git_path, result)
            if not os.path.exists(job_path):
                os.mkdir(job_path)

            # Generate a safe filename
            filename = self.clean_filename(job_state.job.pretty_name())
            filename = filename + '.' + job_state.job.get_guid() + '.txt'

            # Create the file or overwrite the old one
            with open(os.path.join(job_path, filename), 'w+', encoding='utf-8') as writer:
                writer.write(job_state.new_data)

            # One commit per job
            repo.index.add([os.path.join(job_path, filename)])
            repo.index.commit(job_state.job.pretty_name() + ' \n' + result + ' \n' + job_state.job.get_location())

        # If there is a remote, push the changes
        if remote:
            repo.remotes.origin.push()

    # This function is from https://gist.github.com/wassname/1393c4a57cfcbf03641dbc31886123b8
    @staticmethod
    def clean_filename(filename, replace=' '):
        whitelist = "-_.() %s%s" % (string.ascii_letters, string.digits)
        char_limit = 210  # Leaves room for the SHA-1 hash and the file extension

        # Replace spaces
        for r in replace:
            filename = filename.replace(r, '_')

        # Keep only valid ASCII chars
        cleaned_filename = unicodedata.normalize('NFKD', filename).encode('ASCII', 'ignore').decode()

        # Keep only whitelisted chars
        cleaned_filename = ''.join(c for c in cleaned_filename if c in whitelist)
        if len(cleaned_filename) > char_limit:
            logger.info("Warning, filename truncated because it was over {}. Filenames may no longer be unique".format(char_limit))
        return cleaned_filename[:char_limit]
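The repository layout described in this comment (one subfolder per domain, one file per job) can be sketched in isolation as follows. This is only an illustration under assumptions: job_file_path is a hypothetical helper, the space replacement is a simplified stand-in for clean_filename, and the example URL, name, and guid are made up.

```python
import os
from urllib.parse import urlparse


def job_file_path(repo_root, location, name, guid):
    """Return <repo_root>/<domain>/<name>.<guid>.txt for one job.

    Hypothetical helper; it mirrors the naming scheme of the reporter
    but is not part of urlwatch.
    """
    domain = urlparse(location).netloc
    safe_name = name.replace(' ', '_')  # simplified stand-in for clean_filename
    return os.path.join(repo_root, domain, '{}.{}.txt'.format(safe_name, guid))


path = job_file_path('repo', 'https://example.org/news', 'Example News', 'abc123')

# A location without a scheme (for example a shell command) has no netloc,
# so such a job would end up directly in the repository root.
no_domain = urlparse('ls -l').netloc
```

Because the guid is part of the filename, two jobs with the same name on the same domain would still get separate files.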
You mean |
Thank you. I guess they are generated in JobBase.__init__(), so I didn't find them and didn't understand that code. And is there an unfiltered version of the request? I did try |
The unfiltered data is not saved in the current implementation. |
I changed it a bit. I moved all changes into a single commit, and I added a pseudo filter to provide a job-based subfolder in the repository. On my to-do list is the ability to clone an existing repository using a URL from the urlwatch config.

import logging
import os
import unicodedata
import string
from urllib.parse import urlparse

from appdirs import AppDirs
import lxml.html

import urlwatch
from urlwatch import filters
from urlwatch import reporters

# The logger was missing in this version but is used below
logger = logging.getLogger(__name__)


class GitSubPath(filters.FilterBase):
    """This is a dummy filter for git-report.

    Its only purpose is to provide a subfilter string as path for the git reporter
    """

    __kind__ = 'git-path'

    def filter(self, data, subfilter=None):
        if subfilter is None:
            raise ValueError('git-path needs a name for a subfolder in the Git repository')
        return data


class bUnicodeDummy(filters.FilterBase):
    """This is a dummy filter for git-report.

    If you use non-ASCII characters in your job name, it switches the filename whitelist to a blacklist
    """

    __kind__ = 'bUnicode'

    def filter(self, data, subfilter=None):
        if subfilter is None:
            subfilter = True
        return data


# Custom Git Reporter
class GitReport(reporters.ReporterBase):
    """Create a file for each job and commit it to a Git repository"""

    __kind__ = 'gitreport'

    def submit(self):
        if self.config.get('enabled', False) is False:
            return

        from git import Repo

        # Use the Git path from the config, or fall back to the cache directory
        urlwatch_cache_dir = AppDirs(urlwatch.pkgname).user_cache_dir
        fallback = os.path.join(urlwatch_cache_dir, 'git')
        git_path = self.config.get('path', fallback)
        if git_path == '':
            logger.info('Git path is empty. Using: ' + os.path.abspath(fallback))
            git_path = fallback

        # Create the folder if it is not present
        if not os.path.exists(git_path):
            logger.debug('Create folder: ' + git_path)
            # makedirs also creates missing parent folders (e.g. the cache dir)
            os.makedirs(git_path)
            # Because it is a new folder, create a new repository
            repo = Repo.init(os.path.abspath(git_path))
        else:
            repo = Repo(os.path.abspath(git_path))

        # If there is a remote, fetch its changes before adding or changing files
        if repo.remotes != []:
            print("Fetch and Pull from Git Repository")
            remote = True
            repo.remotes.origin.fetch()  # These two steps need some time
            repo.remotes.origin.pull()
        else:
            remote = False

        commit_message = ""

        # Write all changes
        for job_state in self.report.get_filtered_job_states(self.job_states):
            # There is nothing to write for unchanged or error states
            if job_state.verb in ('unchanged', 'error'):
                continue

            # Build a dict of filters that carry a parameter, so that the
            # git-path parameter can be read if it is present.
            # (Named job_filters to avoid shadowing the imported filters module.)
            job_filters = {}
            if job_state.job.filter is not None:
                for key in job_state.job.filter.split(','):
                    parts = key.split(':', 1)
                    if len(parts) == 2:
                        job_filters[parts[0]] = parts[1]

            parsed_uri = urlparse(job_state.job.get_location())
            result = '{uri.netloc}'.format(uri=parsed_uri)

            if job_filters.get('git-path', None) is not None:
                job_path = os.path.join(git_path, job_filters['git-path'])
            else:
                # No git-path given: use the domain as subfolder
                job_path = os.path.join(git_path, result)
            # Create the job_path (including nested folders) if it does not exist
            os.makedirs(job_path, exist_ok=True)

            # Generate a safe filename
            # bUnicode is a dummy filter; it only provides a boolean
            if job_filters.get('bUnicode', False):
                filename = self.clean_filename2(job_state.job.pretty_name())
            else:
                filename = self.clean_filename(job_state.job.pretty_name())
            filename = filename + '.' + job_state.job.get_guid() + '.txt'

            # Create the file or overwrite the old one
            with open(os.path.join(job_path, filename), 'w+', encoding='utf-8') as writer:
                writer.write(job_state.new_data)

            repo.index.add([os.path.join(job_path, filename)])
            message = "%s\n%s \n%s\n\n" % (job_state.job.pretty_name(), result, job_state.job.get_location())
            commit_message += message

        # Add all changes in one commit, but only if something was written
        if commit_message:
            repo.index.commit(commit_message)

        # If there is a remote, push the changes
        if remote:
            print("Push Changes to the Repository ...")
            repo.remotes.origin.push()
            print("Done.")

    # This function is from https://gist.github.com/wassname/1393c4a57cfcbf03641dbc31886123b8
    @staticmethod
    def clean_filename(filename, replace=' '):
        whitelist = "-_.() %s%s" % (string.ascii_letters, string.digits)
        char_limit = 210  # Leaves room for the SHA-1 hash and the file extension

        # Replace spaces
        for r in replace:
            filename = filename.replace(r, '_')

        # Keep only valid ASCII chars
        cleaned_filename = unicodedata.normalize('NFKD', filename).encode('ASCII', 'ignore').decode()

        # Keep only whitelisted chars
        cleaned_filename = ''.join(c for c in cleaned_filename if c in whitelist)
        if len(cleaned_filename) > char_limit:
            logger.info("Warning, filename truncated because it was over {}. Filenames may no longer be unique".format(char_limit))
        return cleaned_filename[:char_limit]

    # This function is from https://gist.github.com/wassname/1393c4a57cfcbf03641dbc31886123b8
    # I changed this to a blacklist to fit my needs with Asian filenames
    @staticmethod
    def clean_filename2(filename, replace=' '):
        blacklist = "|*/\\%&$§!?=<>:\""
        char_limit = 210  # Leaves room for the SHA-1 hash and the file extension

        # Replace spaces
        for r in replace:
            filename = filename.replace(r, '_')

        # Normalize the Unicode form, but keep non-ASCII chars
        cleaned_filename = unicodedata.normalize('NFKD', filename)

        # Remove blacklisted chars
        cleaned_filename = ''.join(c for c in cleaned_filename if c not in blacklist)
        if len(cleaned_filename) > char_limit:
            logger.info("Warning, filename truncated because it was over {}. Filenames may no longer be unique".format(char_limit))
        return cleaned_filename[:char_limit]

EDIT: Fixed an error if no filter is set. |
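The filter lookup in the reporter above boils down to splitting the job's comma-separated filter string into name and parameter pairs. A standalone sketch of that step, assuming the same "name:parameter" format, could look like this. parse_filter_spec is a hypothetical helper name, and the whitespace stripping is an addition that the posted code does not do.

```python
def parse_filter_spec(spec):
    """Turn 'a,b:x,c:y' into {'b': 'x', 'c': 'y'}.

    Filters without a parameter (no colon) are skipped, as in the
    reporter above. Hypothetical helper, not part of urlwatch.
    """
    params = {}
    if not spec:
        return params
    for entry in spec.split(','):
        # partition splits on the first colon only, like split(':', 1)
        name, sep, value = entry.strip().partition(':')
        if sep:
            params[name] = value
    return params


params = parse_filter_spec('html2text,git-path:news/tech,bUnicode:1')
empty = parse_filter_spec(None)
```

One consequence of this format is that every parameter is a string, so a value such as bUnicode:0 would still count as true in a plain truthiness check; only leaving the filter out turns the option off.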
I'm not really happy with the email notification and how to see changes.
Is there a way to trigger a simple external script with some parameters, such as the name of the urls.yaml, the job name, and the URL? My intention is to pipe everything into a text/html/xml file and commit it to a personal git/svn/hg repository. This way I have a nice history and can easily see the changes on different devices.
I would use the name of the urls.yaml as the initial folder name (I want to use this to group jobs), the job name as the filename, and the URL in the header of the text/html/xml file so that I can quickly open the site again. Maybe even the complete entry from the urls.yaml.
Something like this might be helpful for #53 as well.
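The file layout requested here (a folder per urls.yaml, a file per job name, the URL as a header line) could be sketched roughly as below. This is only an illustration with the standard library: write_snapshot, the "URL:" header format, and the example values are assumptions, and committing the result to a repository is left out.

```python
import os
import tempfile


def write_snapshot(root, group, name, url, content):
    """Write content to <root>/<group>/<name>.txt with the URL as first line.

    Hypothetical helper; group stands for the name of the urls.yaml file.
    """
    folder = os.path.join(root, group)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name + '.txt')
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write('URL: {}\n\n'.format(url))  # header to reopen the site quickly
        handle.write(content)
    return path


# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as root:
    path = write_snapshot(root, 'urls', 'example', 'https://example.org/', 'page text\n')
    with open(path, encoding='utf-8') as handle:
        first_line = handle.readline()
```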