Commits on Feb 11, 2015
  1. Increment to v3.29_4.

  2. Timestamp v3.29_3.

  3. Sync version.

Commits on Feb 10, 2015
  1. @khwilliamson

    Fix cp1252 for early Perls

    khwilliamson authored
    Perls prior to v5.8 did not work well with UTF-8.  This presents
    problems for the CP1252 character encoding, as some of its code points,
    when converted to Unicode, require UTF-8 to represent.  This patch
    instead uses ASCII approximations for them.  Prior to this patch, they
    were left alone, which would show up as C1 control characters.
    On Perls that do work with UTF-8, the code points are still properly
    converted to their Unicode equivalents.
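
As a rough illustration of the fallback this commit describes, the Python sketch below replaces a few CP1252 bytes in the 0x80-0x9F range with ASCII stand-ins. The table `ASCII_APPROX` and the function `approximate_cp1252` are hypothetical names, and the substitutions are illustrative only; this log does not show which approximations the patch actually uses.

```python
# Hypothetical table: a few CP1252 bytes whose Unicode equivalents lie above
# U+00FF, paired with illustrative ASCII stand-ins (not the patch's choices).
ASCII_APPROX = {
    0x85: "...",  # horizontal ellipsis
    0x91: "'",    # left single quotation mark
    0x92: "'",    # right single quotation mark
    0x93: '"',    # left double quotation mark
    0x94: '"',    # right double quotation mark
    0x96: "-",    # en dash
    0x97: "--",   # em dash
}


def approximate_cp1252(raw: bytes) -> str:
    """Decode CP1252 bytes without producing any character above U+00FF."""
    out = []
    for byte in raw:
        if byte in ASCII_APPROX:
            out.append(ASCII_APPROX[byte])
        else:
            # Bytes outside the table are passed through as Latin-1 here.
            out.append(chr(byte))
    return "".join(out)


print(approximate_cp1252(b"It\x92s \x93fine\x94 \x97 really\x85"))
```

Bytes in 0x80-0x9F that are missing from the table still come out as C1 controls in this sketch, which is the pre-patch behaviour the commit message mentions.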
Commits on Feb 9, 2015
  1. @khwilliamson

    encod04.t: Skip a test on early Perls

    khwilliamson authored
    This test's passing relies on a function that isn't available in v5.6
Commits on Feb 8, 2015
  1. @khwilliamson

    t/encod04.t: Fix so will run on Perls < v5.8

    khwilliamson authored
    This unconditionally called a function not available until part way
    through v5.7.  This is now avoided for early versions
  2. @khwilliamson

    t/ Don't crash on early Perl EBCDICs

    khwilliamson authored
    These would otherwise call an undefined function.  Just return instead
    of doing that, which leads to incorrect results, but it's better than
Commits on Feb 5, 2015
  1. @khwilliamson Fix inadvertently modified regex

    khwilliamson authored
    I had trouble applying the patch to make CP1252 the default; in trying
    to work around that manually, I inadvertently overwrote this recent
    change.  Given that I had trouble, I should have tested before
    submitting the patch, and hopefully will learn my lesson from this.
  2. @khwilliamson Fix bad regex pattern

    khwilliamson authored
    The :^ascii: should be part of a bracketed character class.  I missed
    this in code review.  There is code in regcomp.c to warn on something
    like this, but it didn't get triggered; I'll look into that.  And I
    didn't add a test for this.  It's not critical if such characters don't
    get dropped.
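
The pitfall described in this commit can be reproduced in a small Python sketch. Python's `re` module has no POSIX classes, so the negated range `[^\x00-\x7f]` is an assumed stand-in for Perl's `[[:^ascii:]]`. The sample string is made up for illustration.

```python
import re

text = "na\u00efve: basic"

# Without the outer brackets this is an ordinary character class that
# matches any of the literal characters ':', '^', 'a', 's', 'c', 'i'.
wrong = re.compile(r"[:^ascii:]")

# Stand-in for Perl's [[:^ascii:]]: anything outside the ASCII range.
right = re.compile(r"[^\x00-\x7f]")

stripped_wrong = wrong.sub("", text)
stripped_right = right.sub("", text)

print(repr(stripped_wrong))
print(repr(stripped_right))
```

The unbracketed form deletes ordinary ASCII letters and leaves the non-ASCII character in place, the opposite of what was intended.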
  3. @khwilliamson

    Default to CP1252 instead of ISO 8859-1

    khwilliamson authored
    When there is no =encoding line and the file isn't UTF-8, the encoding
    is now presumed to be CP 1252 instead of Latin1.
    This was discussed in pod-people starting with
    I will submit a patch to perlpodspec once this is out.
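
A minimal Python sketch of the fallback rule described in this commit: try UTF-8 first, and presume CP1252 if the bytes are not valid UTF-8. The function name `guess_decode` is hypothetical, and the sketch ignores any =encoding line; it is not the module's actual detection code.

```python
from typing import Tuple


def guess_decode(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding name) using a UTF-8-then-CP1252 guess."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # CP1252 leaves five byte values undefined; errors="replace"
        # turns those into U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace"), "cp1252"


print(guess_decode("caf\u00e9".encode("utf-8")))
print(guess_decode(b"caf\xe9 \x80"))
```

The second call shows why the default matters: byte 0x80 is the euro sign in CP1252 but a C1 control character in ISO 8859-1.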
  4. @khwilliamson

    Bump VERSION to 3.29_3

    khwilliamson authored
  5. @khwilliamson

    Clarify docs

    khwilliamson authored
  6. @khwilliamson
  7. @khwilliamson

    encod04.t: White-space only

    khwilliamson authored
    This properly indents blocks newly formed by the previous commit
  8. @khwilliamson
  9. @khwilliamson

    encod04.t: Fix-up two tests

    khwilliamson authored
    One test has been failing because it was testing that illegal UTF-8 was
    considered to be UTF-8.  This commit fixes that.
    The other test is made a TODO.  It is passed genuinely ambiguous text
    that could either be CP1252 or UTF-8.  This commit makes the text passed
    actually more plausible than previously.  The fact that it was hard to
    get a plausible example gives me hope that real-world examples will be
    quite unlikely to be guessed wrong.  The first byte must be between C2
    and DF, otherwise it would be a 3 byte sequence in UTF-8, and even
    harder to find a likely CP1252 equivalent sequence.  That means that the
    first byte is one of 1) an uppercase accented character, 2) the
    multiplication sign, or 3) the German sharp s 'ß'.  The second byte is
    in the range 80 to 9F.  Most of these in CP1252 are various punctuation
    characters or symbols such as a dagger.  These are mostly unlikely to
    immediately follow an uppercase letter, multiplication sign, or the sharp
    s.  One that could is a right single quote used as an apostrophe in
    English.  But there are no accents in English except in borrowed words.
    Since it must be a capital, it's likely the whole word is in caps, like
    in a heading.  I came up with what looks like "JOSÉ'S" in CP1252, which
    looks like legal UTF-8 as well.
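
The ambiguity in this example can be checked directly. The Python sketch below encodes the text with the typographic apostrophe (U+2019) as CP1252 and then decodes the same bytes as UTF-8; the variable names are illustrative.

```python
# "JOSE'S" with an accented E and a right single quote, encoded as CP1252.
raw = "JOS\u00c9\u2019S".encode("cp1252")
print(" ".join(f"{b:02X}" for b in raw))

# The accented letter and the apostrophe become the byte pair C9 92.
lead, second = raw[3], raw[4]
print(0xC2 <= lead <= 0xDF, 0x80 <= second <= 0x9F)

# The same bytes are also well-formed UTF-8: C9 92 is one 2-byte sequence.
as_utf8 = raw.decode("utf-8")
print(ascii(as_utf8))
```

Read as UTF-8, the pair C9 92 collapses into the single code point U+0252, so both interpretations are legal and only plausibility can decide between them.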
  10. @khwilliamson

    Generalize XHTML name detection for non-ASCII platforms

    khwilliamson authored
    This commit takes two identical regular expression patterns and makes
    them into a single qr//.  And it rewrites the revised one so it is
    platform-independent on sufficiently modern Perls.
    I think the pattern is wrong to exclude the digit '9', but I don't have
    time now to develop the expertise to delve into it, so am leaving it
    as-is.  I compiled the two versions under -Dr (one using hard-coded
    characters, and the other using [:posix:] classes) to verify that the
    new one generates the exact same code points as the original on ASCII
  11. @khwilliamson

    corpus.t: Skip on EBCDIC

    khwilliamson authored
    Until Encode is fixed to work on EBCDIC, this can't.
  12. @khwilliamson

    corpus.t: Allow to work on platforms without -u diff option

    khwilliamson authored
    This whole thing probably should be fixed to not call 'diff' at all, but
    for now, there is no real need for the '-u' option to diff, and some
    platforms don't have that option, so just remove it.
  13. @khwilliamson
  14. @khwilliamson EBCDIC enhancement

    khwilliamson authored
    Prior to this commit weird characters were dropped on ASCII platforms
    but not EBCDIC.  Now, on Perls of at least v5.6, they are dropped on
    EBCDIC platforms as well.
  15. @khwilliamson

    Fix encoding guessing to work on EBCDIC platforms

    khwilliamson authored
    When no =encoding line is present, the encoding is checked to see if it
    is UTF-8, and if not, currently ISO 8859-1 is chosen instead.  This
    wasn't working well on EBCDIC platforms prior to this commit.
    It is planned to change things so that CP 1252 is chosen instead of
    8859-1, and this code will have to be revised to handle that, but in
    case that doesn't work out, this commit can be fallen back to.
  16. @khwilliamson

    Fix some escapes to work on non-ASCII platforms

    khwilliamson authored
    This same code is repeated in multiple places.  I chose to not
    consolidate it.  The comments indicate that it was known it would work
    only on ASCII, but since v5.8 it has been easy to make it work on
    non-ASCII platforms as well, using the translation functions available
    starting in that release.
  17. @khwilliamson Generalize BOM handling for non-ASCII platforms

    khwilliamson authored
    For Perls starting in v5.8, this allows BOM detection on all platforms
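
For readers unfamiliar with BOM detection, here is a small Python sketch that compares raw byte values, so the check does not depend on the platform's native character set. The function `detect_bom` and the set of BOMs it checks are assumptions; this log does not show which BOMs the module recognizes.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(raw: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0


print(detect_bom(b"\xef\xbb\xbf=pod"))
print(detect_bom(b"=pod"))
```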
  18. @khwilliamson

    Generalize NBSP and SHY handling for non-ASCII platforms

    khwilliamson authored
    The No-Break Space and Soft Hyphen are used in 6 modules.  This
    generalizes so they can be handled fully on non-ASCII platforms.  A
    recent patch had already fixed this for one area of code, but it
    turns out that they are used in more than one place.  In most of those
    places, they were handled somewhat gracefully for non-ASCII platforms,
    but this patch makes them work completely correctly.
    I used global scalar variables in the base module to store what the
    native characters are for these code points, as the calculation of
    what they should be is not obvious, and so should be done in a single
    place.  An unlikely pitfall is that these scalars are not read-only; I
    suppose a subroutine could be used instead, but I thought that this
    was adequate.
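
To see why the native values differ by platform, the Python sketch below encodes the two code points under Latin-1 and under `cp037`, one EBCDIC code page that ships with Python. Using `cp037` to stand in for "an EBCDIC platform" is an assumption, and `native_bytes` is a hypothetical helper, not the module's code.

```python
NBSP = "\u00a0"  # No-Break Space
SHY = "\u00ad"   # Soft Hyphen


def native_bytes(codec: str):
    """Return the byte each code point occupies in the given code page."""
    return NBSP.encode(codec), SHY.encode(codec)


for codec in ("latin-1", "cp037"):
    nbsp, shy = native_bytes(codec)
    print(codec, nbsp.hex(), shy.hex())
```

Computing the pair once and sharing it, as the commit does with its global scalars, avoids repeating this lookup in every module that needs it.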
  19. @khwilliamson

    Generalize the t/search* fcns for non-ASCII platforms

    khwilliamson authored
    These tests fail on EBCDIC platforms because the expected sort order is
    hard-coded.  This introduces a helper .pl file which contains two
    functions to make the sort order come out ASCII (hence to the expected
    value) no matter what the current platform's character set is.
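
The sort-order problem can be simulated in Python by sorting on encoded bytes. In the sketch below, `cp037` stands in for an EBCDIC platform's native order; the function names and sample data are hypothetical and are not the helper functions added by this commit.

```python
def ebcdic_sort(words):
    """Sort as an EBCDIC platform would: by cp037 byte values."""
    return sorted(words, key=lambda w: w.encode("cp037"))


def ascii_sort(words):
    """Sort by ASCII byte values, the order the tests hard-code."""
    return sorted(words, key=lambda w: w.encode("ascii"))


names = ["Zebra", "apple", "42"]
print(ascii_sort(names))
print(ebcdic_sort(names))
```

ASCII places digits before uppercase before lowercase, while this EBCDIC code page places lowercase before uppercase before digits, so a hard-coded expected order only holds if the sort key is normalized.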
  20. @khwilliamson Fix debug statement

    khwilliamson authored
    This was printing out the wrong variable
  21. @khwilliamson

    xhtml01.t: Generalize for non-ASCII platforms

    khwilliamson authored
    Instead of hard-coding the ordinal of 'T', use ord("T")
  22. @khwilliamson

    Bump VERSION to 3.29_2

    khwilliamson authored
Commits on Feb 2, 2015
  1. Note change to find().

  2. Sync @INC handling between find() and survey().

    They were slightly different. So have find() use the same method for managing
    search directories as that used by survey().
Commits on Jan 19, 2015
  1. Credit @rwstauner.

  2. @rwstauner
Commits on Jan 15, 2015
  1. Work around floating-point precision issue.

    When comparing version numbers, that is. Addresses a failure on 5.6, which
    is 32-bit on my box. Thanks to Karl Williamson for the fix.
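
The fix itself is not shown in this log. As a general illustration of the class of problem, the Python sketch below contrasts a binary floating-point comparison with an exact decimal comparison of version strings. The helper `version_at_least` is hypothetical, and decimal comparison is only one possible workaround, not necessarily the one used here.

```python
from decimal import Decimal

# Binary floating point cannot represent most decimal fractions exactly,
# so arithmetic on them can make an equality test fail unexpectedly.
float_equal = (0.1 + 0.2) == 0.3
print(float_equal)


def version_at_least(have: str, need: str) -> bool:
    """Compare dotted-decimal version strings exactly, without floats."""
    return Decimal(have) >= Decimal(need)


print(version_at_least("3.29", "3.29"))
print(version_at_least("3.28", "3.29"))
```

Developer-release strings with an underscore, such as 3.29_3, are outside the scope of this sketch.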