
Fix some of rdoc's worst HTML typesetting issues:

* The ' character will be converted into an apostrophe, opening single quote,
  or closing single quote correctly in most cases.
* The " character will be converted into an opening double quote or a closing
  double quote in most cases.

Note, however, that tons of issues with HTML typesetting remain.  Fixing these
properly will require a rewrite of the markup engine, which I will look at next.
This is a short term fix intended to ameliorate the worst of the issues.
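
The quote handling described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not RDoc's actual convert_string_fancy: the helper name smart_quotes is hypothetical, and the patterns are modelled on the ones visible in this commit's diff.

```ruby
# Hypothetical helper (not part of RDoc's API): a reduced smart-quote pass.
def smart_quotes(text)
  out = text.dup
  # A ' that follows a non-space, non-opening-bracket character is treated
  # as a closing single quote or apostrophe.
  out = out.gsub(/([^ \t\r\n\[\{\(])'/, '\1&#8217;')
  # Any ' still left is treated as an opening single quote.
  out = out.gsub(/'/, '&#8216;')
  # A " after a non-space character and before a non-word character is
  # treated as a closing double quote.
  out = out.gsub(/([^ \t\r\n\[\{\(])"(?=\W)/, '\1&#8221;')
  # Any " still left is treated as an opening double quote.
  out.gsub(/"/, '&#8220;')
end

puts smart_quotes("\"cats\".")
puts smart_quotes("'cats'.")
puts smart_quotes("cat's-")
```

In this sketch a closing " at the very end of the string has no following character, so the lookahead fails and it falls through to the opening-quote rule. That is one example of the "most cases" caveat.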
1 parent 05fb19b commit 272eee1a6bf0c964884f6d0ac74c715873158ca4 designingpatts committed Aug 12, 2008
5 History.txt
@@ -12,6 +12,11 @@
* Fixed main page for frameless template. Patch by Marcin Raczkowski.
* Fixed missing stylesheet in generated chm. Patch by Gordon Thiesfeld.
* Fixed the parsing of module names starting with '::'.
+ * Fixed some (but not all!) of the issues with RDoc's HTML typesetting:
+ ** RDoc now correctly converts ' characters to apostrophes, opening single
+ quotes, and closing single quotes in most cases (smart single quotes).
+ ** RDoc now correctly converts " characters to opening double quotes and
closing double quotes in most cases (smart double quotes).
=== 2.1.0 / 2008-07-20
4 lib/rdoc/markup/to_html.rb
@@ -318,10 +318,10 @@ def convert_string_fancy(item)
gsub(/'/, '&#8216;').
# convert double closing quote
- gsub(%r{([^ \t\r\n\[\{\(])\'(?=\W)}, '\1&#8221;'). # }
+ gsub(%r{([^ \t\r\n\[\{\(])\"(?=\W)}, '\1&#8221;'). # }
# convert double opening quote
- gsub(/'/, '&#8220;').
+ gsub(/"/, '&#8220;').
# convert copyright
gsub(/\(c\)/, '&#169;').
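
The effect of this two-line change can be illustrated with a reduced version of the hunk. The method names below are hypothetical and only the three substitutions shown in the hunk are modelled. The assumption is that the single-opening-quote pass has already replaced every ' by the time the double-quote patterns run, so patterns that look for ' again can never match.

```ruby
# Hypothetical reduction of the hunk above; not the full convert_string_fancy.
def double_quotes_before_fix(text)
  out = text.gsub(/'/, '&#8216;')                          # single opening quote
  out = out.gsub(/([^ \t\r\n\[\{\(])'(?=\W)/, '\1&#8221;') # looks for ' again
  out.gsub(/'/, '&#8220;')                                 # looks for ' again
end

def double_quotes_after_fix(text)
  out = text.gsub(/'/, '&#8216;')                          # single opening quote
  out = out.gsub(/([^ \t\r\n\[\{\(])"(?=\W)/, '\1&#8221;') # closing double quote
  out.gsub(/"/, '&#8220;')                                 # opening double quote
end

puts double_quotes_before_fix("\"cats\".")
puts double_quotes_after_fix("\"cats\".")
```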
64 lib/rdoc/markup/to_html_crossref.rb
@@ -25,6 +25,43 @@ class RDoc::Markup::ToHtmlCrossref < RDoc::Markup::ToHtml
CLASS_REGEXP_STR = '\\\\?((?:\:{2})?[A-Za-z]\w*(?:\:\:\w+)*)'
METHOD_REGEXP_STR = '(\w+[!?=]?)(?:\([\.\w+\*\/\+\-\=\<\>]*\))?'
+ # Regular expressions matching text that should potentially have
+ # cross-reference links generated are passed to add_special.
+ # Note that these expressions are meant to pick up text for which
+ # cross-references have been suppressed, since the suppression
+ # characters are removed by the code that is triggered.
+ CROSSREF_REGEXP = /(
+ # A::B::C.meth
+ #{CLASS_REGEXP_STR}[\.\#]#{METHOD_REGEXP_STR}
+
+ # Stand-alone method (preceded by a #)
+ | \\?\##{METHOD_REGEXP_STR}
+
+ # A::B::C
+ # The stuff after CLASS_REGEXP_STR is a
+ # nasty hack. CLASS_REGEXP_STR unfortunately matches
+ # words like dog and cat (these are legal "class"
+ # names in Fortran 95). When a word is flagged as a
+ # potential cross-reference, limitations in the markup
+ # engine suppress other processing, such as typesetting.
+ # This is particularly noticeable for contractions.
+ # In order that words like "can't" not
+ # be flagged as potential cross-references, only
+ # flag potential class cross-references if the character
+ # after the cross-reference is a space or sentence
+ # punctuation.
+ | #{CLASS_REGEXP_STR}(?=[\s\)\.\?\!\,\;]|\z)
+
+ # Things that look like filenames
+ # The key thing is that there must be at least
+ # one special character (period, slash, or
+ # underscore).
+ | \w+[_\/\.][\w\/\.]+
+
+ # Things that have markup suppressed
+ | \\[^\s]
+ )/x
+
##
# We need to record the html path of our caller so we can generate
# correct relative paths for any hyperlinks that we find
@@ -33,32 +70,7 @@ def initialize(from_path, context, show_hash)
raise ArgumentError, 'from_path cannot be nil' if from_path.nil?
super()
- # Regular expressions matching text that should potentially have
- # cross-reference links generated are passed to add_special.
- # Note that these expressions are meant to pick up text for which
- # cross-references have been suppressed, since the suppression
- # characters are removed by the code that is triggered.
-
- @markup.add_special(/(
- # A::B::C.meth
- #{CLASS_REGEXP_STR}[\.\#]#{METHOD_REGEXP_STR}
-
- # Stand-alone method (proceeded by a #)
- | \\?\##{METHOD_REGEXP_STR}
-
- # A::B::C
- | #{CLASS_REGEXP_STR}
-
- # Things that look like filenames
- # The key thing is that there must be at least
- # one special character (period, slash, or
- # underscore).
- | \w*[_\/\.][\w\/\.]*
-
- # Things that have markup suppressed
- | \\[^\s]
- )/x,
- :CROSSREF)
+ @markup.add_special(CROSSREF_REGEXP, :CROSSREF)
@from_path = from_path
@context = context
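
The lookahead added to the class branch of CROSSREF_REGEXP can be tried in isolation. The sketch below is an assumption-laden reduction: WORD is a simplified stand-in for CLASS_REGEXP_STR (no leading backslash or leading "::" handling), and OLD_CLASS, NEW_CLASS, and flagged are hypothetical names used only for illustration.

```ruby
# Simplified stand-in for CLASS_REGEXP_STR (assumption, not the real constant).
WORD = '[A-Za-z]\w*(?:::\w+)*'

# Before the commit: any word-like token is a candidate class reference.
OLD_CLASS = /#{WORD}/
# After the commit: the token must be followed by whitespace, sentence
# punctuation, a closing parenthesis, or the end of the string.
NEW_CLASS = /#{WORD}(?=[\s\)\.\?\!\,\;]|\z)/

# Returns every substring the pattern would flag as a potential cross-reference.
def flagged(pattern, text)
  text.scan(pattern)
end

p flagged(OLD_CLASS, "cats'")
p flagged(NEW_CLASS, "cats'")
p flagged(NEW_CLASS, "cat.")
```

With the lookahead, "cats'" is no longer flagged because the character after the word is an apostrophe, which is why the updated unit tests use "cats'" and "dogs'". Note that in this sketch the letters after an apostrophe (the "t" in "can't") can still match on their own. Whether that matters in RDoc depends on the rest of the markup engine.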
28 test/test_rdoc_markup_attribute_manager.rb
@@ -1,5 +1,6 @@
require "test/unit"
require "rdoc/markup/inline"
+require "rdoc/markup/to_html_crossref"
class TestRDocMarkupAttributeManager < Test::Unit::TestCase
@@ -201,24 +202,23 @@ def test_protect
end
def test_special
- # class names, variable names, file names, or instance variables
- @am.add_special(/(
- \b([A-Z]\w+(::\w+)*)
- | \#\w+[!?=]?
- | \b\w+([_\/\.]+\w+)+[!?=]?
- )/x,
- :CROSSREF)
+ @am.add_special(RDoc::Markup::ToHtmlCrossref::CROSSREF_REGEXP, :CROSSREF)
- assert_equal(["cat"], @am.flow("cat"))
+ #
+ # The apostrophes in "cats'" and "dogs'" suppress the flagging of these
+ # words as potential cross-references, which is necessary for the unit
+ # tests. Unfortunately, the markup engine right now does not actually
+ # check whether a cross-reference is valid before flagging it.
+ #
+ assert_equal(["cats'"], @am.flow("cats'"))
- assert_equal(["cat ", crossref("#fred"), " dog"].flatten,
- @am.flow("cat #fred dog"))
+ assert_equal(["cats' ", crossref("#fred"), " dogs'"].flatten,
+ @am.flow("cats' #fred dogs'"))
- assert_equal([crossref("#fred"), " dog"].flatten,
- @am.flow("#fred dog"))
+ assert_equal([crossref("#fred"), " dogs'"].flatten,
+ @am.flow("#fred dogs'"))
- assert_equal(["cat ", crossref("#fred")].flatten, @am.flow("cat #fred"))
+ assert_equal(["cats' ", crossref("#fred")].flatten, @am.flow("cats' #fred"))
end
end
-
16 test/test_rdoc_markup_to_html.rb
@@ -10,11 +10,23 @@ def setup
end
def test_tt_formatting
- assert_equal "<p>\n<tt>--</tt> &#8212; <tt>(c)</tt> &#169;\n</p>\n",
- util_format("<tt>--</tt> -- <tt>(c)</tt> (c)")
+ assert_equal "<p>\n<tt>--</tt> &#8212; <tt>cats'</tt> cats&#8217;\n</p>\n",
+ util_format("<tt>--</tt> -- <tt>cats'</tt> cats'")
assert_equal "<p>\n<b>&#8212;</b>\n</p>\n", util_format("<b>--</b>")
end
+ def test_convert_string_fancy
+ #
+ # The HTML typesetting is broken in a number of ways, but I have fixed
+ # the most glaring issues for single and double quotes. Note that
+ # "strange" symbols (periods or dashes) need to be at the end of the
+ # test case strings in order to suppress cross-references.
+ #
+ assert_equal "<p>\n&#8220;cats&#8221;.\n</p>\n", util_format("\"cats\".")
+ assert_equal "<p>\n&#8216;cats&#8217;.\n</p>\n", util_format("\'cats\'.")
+ assert_equal "<p>\ncat&#8217;s-\n</p>\n", util_format("cat\'s-")
+ end
+
def util_fragment(text)
RDoc::Markup::Fragment.new 0, nil, nil, text
end
4 test/test_rdoc_markup_to_html_crossref.rb
@@ -160,8 +160,8 @@ def verify_invariant_crossrefs(xref)
# The hyphen character is not a valid class/method separator character, so
# rdoc just generates a class cross-reference (perhaps it should not
# generate anything?).
- result = "<a href=\"../classes/Ref_Class2/Ref_Class3.html\">Ref_Class2::Ref_Class3</a>-method(*)"
- verify_convert xref, "Ref_Class2::Ref_Class3-method(*)", result
+ result = "<a href=\"../classes/Ref_Class2/Ref_Class3.html\">Ref_Class2::Ref_Class3</a>;method(*)"
+ verify_convert xref, "Ref_Class2::Ref_Class3;method(*)", result
# There is one Ref_Class3 nested in Ref_Class2 and one defined in the
# top-level namespace; regardless, ::Ref_Class3 (Ref_Class3 relative
