
Initial

0 parents commit 3e248db216d302a1b80c81744697afe7d21f5fbc Tom Link committed Jun 7, 2009
Showing with 3,899 additions and 0 deletions.
  1. +112 −0 History.txt
  2. +88 −0 Makefile
  3. +100 −0 Makefile.config
  4. +780 −0 README.txt
  5. +31 −0 Rakefile
  6. +43 −0 bin/websitary
  7. +30 −0 index.txt
  8. +555 −0 lib/websitary.rb
  9. +39 −0 lib/websitary/applog.rb
  10. +1,903 −0 lib/websitary/configuration.rb
  11. +58 −0 lib/websitary/filemtimes.rb
  12. +160 −0 lib/websitary/htmldiff.rb
@@ -0,0 +1,112 @@
+= 0.6
+
+* RSS attachments: Source title is preferred to the channel's title.
+* body_html: If there is no body tag, use the document as is.
+* rss: also scan items without descriptions with :rss_find_enclosure
+
+= 0.5
+
+* mailto: and javascript: hrefs are now handled via the exclude option
+* rewrite absolute URLs sans host correctly
+* strip href and image src tags in order to prevent parser errors
+* some scaffolding for mechanize
+* global proxy option (currently only used for mechanize)
+* use -nolist for lynx
+* catch errors in Websitary::App#execute_downdiff
+* :rss_find_enclosure => LAMBDA: Extract the enclosure URL from the item
+ description
+* :rss_format_local_copy => STRING|BLOCK/2: Format the display of the
+ local copy.
+
+
+= 0.4
+
+* Sources may have a :timeout option.
+* exclude: Argument can be a string or a regexp.
+* htmldiff: :ignore option to exclude certain nodes from the diff.
+* Left-mouse clicks make items collapse/expand.
+* iconv: Support for converting encodings (requires the per-url iconv
+ option to be set).
+* exclude mailto urls.
+
+
+= 0.3
+
+* Renamed the global option :downloadhtml to :download_html.
+* The downloader for robots and rss enclosures should now be properly
+ configurable via the global options :download_robots and
+ :download_rss_enclosure (default: :openuri).
+* Respect rel="nofollow" on hyperreferences.
+* :wdays, :mdays didn't work.
+* --exclude command line options, exclude configuration command
+* Check for robots.txt-compliance after testing if the URL is
+ appropriate.
+* htmldiff.rb can now also highlight differences à la websec's webdiff.
+* configuration.rb: Ignore pubDate and certain other non-essential fields (tags
+ etc.) when constructing rss item IDs.
+
+
+= 0.2.1
+
+* Use URI.merge for constructing robots.txt uri.
+* Fixed minor show-stopper.
+
+
+= 0.2.0
+
+* Renamed the project from websitiary to websitary (without the
+ additional "i")
+* The default output filename is now constructed on basis of the profile
+ names joined with a comma.
+* Apply rewrite-rules to URLs in text output.
+* Set user-agent (:body_html)
+* Exit with 1 if differences were found
+* Command line options have slightly changed: -e now is the short form
+ for --execute
+* Commands that can be triggered by the -e command-line switch: downdiff
+ (default), configuration (list currently configured urls), latest
+ (show the current version of all urls), review (show the latest
+ report)
+* Protect against filenames being too long (max size can be configured
+ via: <tt>option :global, :filename_size => N</tt>)
+* Try to migrate local copies from the older flat to the new
+ hierarchical cache layout
+* Disabled -E/--edit, --review command-line options (use -e instead)
+* Try to maintain file atime/mtime when copying/moving files
+* FIX: Problem with loading robots.txt
+* Respect meta tag: robots="nofollow" (noindex is only checked in
+ conjunction with :download => :website*)
+* quicklist profile: register urls via the -eadd command-line switch;
+ see "Usage" for an example
+* Temporarily save diffs so that they can be reused if websitary exits
+ ungracefully.
+* Renamed :inner_html to :body_html
+* New shortcuts: :ftp, :ftp_recursive, :img, :rss, :opml (rudimentary)
+* New experimental commands: aggregate, show ... can be used to
+ periodically check for changes (e.g. of rss feeds) but to review these
+ changes only once in a while
+* Experimental --timer command-line option to re-run websitary every X
+ seconds.
+* The :rss differ has an option :rss_enclosure (true or directory name)
+ that will be used for automatically saving new enclosures (e.g. mp3
+ files in podcasts); in theory, one should thus be able to use
+ websitary as pod catcher etc.
+* Cache mtimes in order to reduce disk access.
+* Special profile "__END__": The section in the script file after the
+ __END__ line. This seems useful in some situations when employing a
+ single script.
+* Don't follow javascript links.
+* New date constraint for sources:
+ :daily => true ... Once a day
+ :days_of_month => BEGIN..END ... download URL only once per month
+ within this range of days.
+ :days_of_week => BEGIN..END ... download URL only once per week
+ within this range of days.
+ :months => N (calculated on basis of the calendar month, not the
+ number of days)
+
+
+== 0.1.0 / 2007-07-16
+
+* Initial release
+
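The date constraints listed under 0.2.0 (`:days_of_month`, `:days_of_week`) can be read as "download at most once per period, and only on days inside the range". The following is a minimal Ruby sketch of that reading. It is not websitary's implementation: the helper name `due?`, the `last_download` argument, and the use of ISO weeks for "once per week" are assumptions made for illustration, and `:daily` and `:months` are left out.

```ruby
require 'date'

# Hypothetical helper, NOT part of websitary: decide whether a source with a
# :days_of_month or :days_of_week constraint should be downloaded today.
# `last_download` is the date of the previous download, or nil if none.
def due?(today, last_download, opts)
  if (range = opts[:days_of_month])
    return false unless range.include?(today.day)
    # Assumption: "once per month" means once per calendar month.
    return last_download.nil? ||
           [last_download.year, last_download.month] != [today.year, today.month]
  end

  if (range = opts[:days_of_week])
    # Date#wday counts from 0 (Sunday) to 6 (Saturday).
    return false unless range.include?(today.wday)
    # Assumption: "once per week" means once per ISO week (Monday-based).
    return last_download.nil? ||
           [last_download.cwyear, last_download.cweek] != [today.cwyear, today.cweek]
  end

  true # no date constraint given
end
```

With `:days_of_month => 1..7`, for example, a source already downloaded on 1 June would not be fetched again on 3 June, but would be fetched on 2 July.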
@@ -0,0 +1,88 @@
+include Makefile.config
+
+all: dbk html pdf tex text man
+
+dvi: ${BASE}.dvi
+dbk: ${BASE}.dbk
+html: ${BASE}.html
+pdf:
+ make DFLAGS="${DFLAGS} --pdf" "${BASE}.pdf"
+php: ${BASE}.php
+tex: ${BASE}.tex
+text: ${BASE}.text
+man: ${BASE}.1
+
+pdfclean: pdf cleantex
+dviclean: dvi cleantex
+
+makefile:
+ ${DEPLATE} -m makefile ${DFLAGS} ${BASE}.txt ${OTHER}
+
+website:
+ make prepare_website
+ ${DEPLATE} ${DFLAGS} ${WEBSITE_DFLAGS} ${FILE} ${OTHER}
+ echo ${WEBSITE_DIR}/${BASE}.html > .last_output
+
+%.html: %.txt
+ make prepare_html
+ ${DEPLATE} ${DFLAGS} ${HTML_DFLAGS} $< ${OTHER}
+ echo ${HTML_DIR}/$@ > .last_output
+
+%.text: %.txt
+ make prepare_text
+ ${DEPLATE} ${DFLAGS} ${TEXT_DFLAGS} $< ${OTHER}
+ echo ${TEXT_DIR}/$@ > .last_output
+
+%.php: %.txt
+ make prepare_php
+ ${DEPLATE} ${DFLAGS} ${PHP_DFLAGS} $< ${OTHER}
+ echo ${PHP_DIR}/$@ > .last_output
+
+%.dbk: %.txt
+ make prepare_dbk
+ ${DEPLATE} ${DFLAGS} ${DBK_DFLAGS} $< ${OTHER}
+ echo ${DBK_DIR}/$@ > .last_output
+
+%.tex: %.txt
+ make prepare_tex
+ ${DEPLATE} ${DFLAGS} ${TEX_DFLAGS} $< ${OTHER}
+ echo ${TEX_DIR}/$@ > .last_output
+
+%.ref: %.txt
+ make prepare_ref
+ ${DEPLATE} ${DFLAGS} ${REF_DFLAGS} -o $@ $< ${OTHER}
+ echo ${REF_DIR}/$@ > .last_output
+
+%.dvi: %.tex
+ make prepare_dvi
+ cd ${TEX_DIR}; \
+ latex ${LATEX_FLAGS} $<; \
+ bibtex ${BIBTEX_FLAGS} $*; \
+ latex ${LATEX_FLAGS} $<; \
+ latex ${LATEX_FLAGS} $<;
+ echo ${TEX_DIR}/$@ > .last_output
+
+%.pdf: %.tex
+ make prepare_pdf
+ cd ${TEX_DIR}; \
+ pdflatex ${PDFLATEX_FLAGS} $<; \
+ bibtex ${BIBTEX_FLAGS} $*; \
+ pdflatex ${PDFLATEX_FLAGS} $<; \
+ pdflatex ${PDFLATEX_FLAGS} $<
+ echo ${TEX_DIR}/$@ > .last_output
+
+%.1: %.ref
+ cd ${REF_DIR}; \
+ xmlto man $<
+ echo ${REF_DIR}/$@ > .last_output
+
+view: show
+show:
+ cygstart `cat .last_output`
+
+cleantex:
+ cd ${TEX_DIR}; \
+ rm -f *.toc *.aux *.log *.cp *.fn *.tp *.vr *.pg *.ky \
+ *.blg *.bbl *.out *.lot *.ind *.4tc *.4ct \
+ *.ilg *.idx *.idv *.lg *.xref || echo Nothing to be done!
+
@@ -0,0 +1,100 @@
+FILE=index.txt
+BASE=$(basename ${FILE})
+OTHER=
+
+DEPLATE=deplate
+SCP=scp
+# SCP=pscp
+
+ # --css deplate \
+ # -t html-tabbar-top.html
+DFLAGS=-m code-gvim -D noSwallow=1
+
+HTML_DIR=html
+HTML_PLUS=-m html-obfuscate-email -m html-deplate-button \
+ --css deplate \
+ -t html-tabbar-right.html -m navbar-png
+HTML_DFLAGS=${HTML_PLUS} -d ${HTML_DIR} -f html
+
+WEBSITE_DIR=website
+WEBSITE_DFLAGS=${HTML_PLUS} -d ${WEBSITE_DIR} -f htmlsite
+
+PHP_DIR=php
+PHP_DFLAGS=${HTML_PLUS} -d ${PHP_DIR} -f phpsite -m html-obfuscate-email
+
+TEX_DIR=tex
+TEX_DFLAGS=-d ${TEX_DIR} -f latex
+
+TEXT_DIR=plain
+TEXT_DFLAGS=-d ${TEXT_DIR} -f plain
+
+DBK_DIR=docbook
+DBK_DFLAGS=-d ${DBK_DIR} -f dbk-article
+
+REF_DIR=${DBK_DIR}
+REF_DFLAGS=-d ${REF_DIR} -f dbk-ref
+
+LATEX_FLAGS=-interaction=nonstopmode
+PDFLATEX_FLAGS=${LATEX_FLAGS}
+BIBTEX_FLAGS=
+
+copy_images=if ls *.{jpg,jpeg,png,gif} 2> /dev/null; then cp -uv *.{jpg,jpeg,png,gif} $(1); fi
+copy_css=if ls *.css 2> /dev/null; then cp -vu *.css $(1); fi
+
+.PHONY: view show cleantex website makefile pdfclean dviclean dbk html pdf tex text man prepare_website prepare_html prepare_text prepare_php prepare_dbk prepare_tex prepare_ref prepare_dvi prepare_pdf
+
+default: website
+
+manual.pdf:
+ make FILE=websitary.txt pdf
+
+manual: website manual.pdf
+
+upload:
+ ${SCP} website/* tex/websitary.pdf tlink@rubyforge.org:/var/www/gforge-projects/websitiary/
+
+docs:
+ rake docs
+ ${SCP} docs/* tlink@rubyforge.org:/var/www/gforge-projects/websitiary/websitary/
+
+prepare_website:
+ mkdir -p ${WEBSITE_DIR}
+ $(call copy_images,"${WEBSITE_DIR}")
+ $(call copy_css,"${WEBSITE_DIR}")
+
+prepare_html:
+ mkdir -p ${HTML_DIR}
+ $(call copy_images,"${HTML_DIR}")
+ $(call copy_css,"${HTML_DIR}")
+
+prepare_text:
+ mkdir -p ${TEXT_DIR}
+
+prepare_php:
+ mkdir -p ${PHP_DIR}
+ $(call copy_images,"${PHP_DIR}")
+ $(call copy_css,"${PHP_DIR}")
+
+prepare_dbk:
+ mkdir -p ${DBK_DIR}
+ $(call copy_images,"${DBK_DIR}")
+
+prepare_tex:
+ mkdir -p ${TEX_DIR}
+ $(call copy_images,"${TEX_DIR}")
+
+prepare_ref:
+ mkdir -p ${REF_DIR}
+
+prepare_dvi:
+
+prepare_pdf:
+
+ctags:
+ rm -f tags
+ ctags -R bin lib
+
+files:
+ find bin lib -name "*.rb" > files.lst
+
+# vi: ft=make:tw=72:ts=4
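Each pattern rule in the Makefile records its result in `.last_output`, which the `show` target later opens. The path is derived from `FILE` via `BASE=$(basename ${FILE})` and the per-format `*_DIR` variables. Below is a small Ruby sketch of that derivation, intended only as a reading aid. The helper `last_output_path` and the `OUTPUT_DIRS` table are hypothetical names; the directory values are copied from Makefile.config, and the `website`, `ref` and `man` targets are omitted.

```ruby
# Hypothetical helper for illustration only; it mirrors the
# `echo ${..._DIR}/$@ > .last_output` lines of the pattern rules.
# Directory names are taken from Makefile.config.
OUTPUT_DIRS = {
  html: 'html',     # HTML_DIR
  text: 'plain',    # TEXT_DIR
  php:  'php',      # PHP_DIR
  dbk:  'docbook',  # DBK_DIR
  tex:  'tex',      # TEX_DIR
  dvi:  'tex',      # the dvi rule writes into TEX_DIR
  pdf:  'tex'       # the pdf rule writes into TEX_DIR
}.freeze

def last_output_path(file, format)
  # Like GNU make's $(basename ...): strip the last suffix, if any.
  base = file.sub(/\.[^.\/]*\z/, '')
  dir = OUTPUT_DIRS.fetch(format)
  "#{dir}/#{base}.#{format}"
end

puts last_output_path('index.txt', :html)
```

Under these assumptions, `make FILE=websitary.txt pdf` would leave `tex/websitary.pdf` in `.last_output`, which matches the file the `upload` target copies.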