replace reference to Cygwin by MSYS2, add iconv(sub = "c99")
git-svn-id: https://svn.r-project.org/R/trunk@81786 00db46b3-68df-0310-9c12-caf00c1e9a41
ripley committed Feb 21, 2022
1 parent 8bb530f commit f19b4ae
Showing 4 changed files with 44 additions and 9 deletions.
3 changes: 3 additions & 0 deletions doc/NEWS.Rd
@@ -212,6 +212,9 @@
\item The return value from \code{ks.test()} now has class
\code{c("ks.test", "htest")} -- packages using \code{try()} need
to take care to use \code{inherits()} and not \code{==} on the class.
+\item \code{iconv} now allows \code{sub = "c99"} to use C99-style
+escapes on UTF-8 inputs which cannot be converted to \code{to}.
}
}
13 changes: 8 additions & 5 deletions src/library/base/man/iconv.Rd
@@ -1,6 +1,6 @@
% File src/library/base/man/iconv.Rd
% Part of the R package, https://www.R-project.org
-% Copyright 1995-2021 R Core Team
+% Copyright 1995-2022 R Core Team
% Distributed under GPL 2 or later

\name{iconv}
@@ -28,7 +28,9 @@ iconvlist()
any non-convertible bytes in the input. (This would normally be a
single character, but can be more.) If \code{"byte"}, the indication is
\code{"<xx>"} with the hex code of the byte. If \code{"Unicode"}
-and converting from UTF-8, the Unicode point in the form \code{"<U+xxxx>"}.}
+and converting from UTF-8, the Unicode point in the form
+\code{"<U+xxxx>"}, or if \code{c99}, a C99-style escape
+\code{"\\uxxxx"}.}
\item{mark}{logical, for expert use. Should encodings be marked?}
\item{toRaw}{logical. Should a list of raw vectors be returned rather
than a character vector?}
@@ -67,14 +69,14 @@ iconvlist()
validity checking and will often mis-convert inputs which are invalid
in encoding \code{from}.

-If \code{sub = "Unicode"} is used for a non-UTF-8 input it is the same
-as \code{sub = "byte"}.
+If \code{sub = "Unicode"} or \code{sub = "c99"} is used for a
+non-UTF-8 input it is the same as \code{sub = "byte"}.
}

\section{Implementation Details}{
There are three main implementations of \code{iconv} in use. Linux's
most common C runtime, \samp{glibc}, contains one. Several platforms
-supply GNU \samp{libiconv}, including macOS, FreeBSD and Cygwin, in
+supply GNU \samp{libiconv}, including macOS and FreeBSD, in
some cases with additional encodings. On Windows we use a version of
Yukihiro Nakadaira's \samp{win_iconv}, which is based on Windows'
codepages. (We have added many encoding names for compatibility with
@@ -187,6 +189,7 @@ iconv(x, "latin1", "ASCII", "?") # "fa?ile"
iconv(x, "latin1", "ASCII", "") # "faile"
iconv(x, "latin1", "ASCII", "byte") # "fa<e7>ile"
iconv(xx, "UTF-8", "ASCII", "Unicode") # "fa<U+00E7>ile"
+iconv(xx, "UTF-8", "ASCII", "c99") # "fa\u00E7ile"

## Extracts from old R help files (they are nowadays in UTF-8)
x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
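
The documented `sub` behaviour in this file can be sketched in C as follows. This is an illustration only: `sub_style`, `pick_sub_style` and `format_marker` are hypothetical names that do not exist in R, the `<xx>` and `<U+xxxx>` format strings are inferred from the example outputs above, and the `c99` case uses lowercase hex digits because the C change in this commit formats with `%04x`. Only code points below 0x10000 are handled here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration; none of these exist in R's sources. */
enum sub_style { SUB_LITERAL, SUB_BYTE, SUB_UNICODE, SUB_C99 };

/* Choose the marker style for a given `sub` argument.
   "Unicode" and "c99" apply only to UTF-8 input; for any other
   input encoding they behave like "byte", as the Rd text states. */
enum sub_style pick_sub_style(const char *sub, int from_utf8)
{
    if (strcmp(sub, "byte") == 0) return SUB_BYTE;
    if (strcmp(sub, "Unicode") == 0) return from_utf8 ? SUB_UNICODE : SUB_BYTE;
    if (strcmp(sub, "c99") == 0) return from_utf8 ? SUB_C99 : SUB_BYTE;
    return SUB_LITERAL;  /* any other string is used as the replacement itself */
}

/* Format the marker for one unconvertible unit into `out`.
   `value` is a byte for SUB_BYTE, otherwise a code point below 0x10000.
   Returns the marker length, or 0 if the style or value is not handled
   by this sketch or the buffer is too small. */
size_t format_marker(enum sub_style style, unsigned int value,
                     char *out, size_t outsz)
{
    int n;
    switch (style) {
    case SUB_BYTE:
        if (value > 0xFF) return 0;
        n = snprintf(out, outsz, "<%02x>", value);
        break;
    case SUB_UNICODE:
        if (value > 0xFFFF) return 0;
        n = snprintf(out, outsz, "<U+%04X>", value);
        break;
    case SUB_C99:
        if (value > 0xFFFF) return 0;
        n = snprintf(out, outsz, "\\u%04x", value);
        break;
    default:
        return 0;
    }
    return (n > 0 && (size_t) n < outsz) ? (size_t) n : 0;
}
```

With these definitions, `pick_sub_style("c99", 0)` yields `SUB_BYTE`, matching the note that `sub = "c99"` on non-UTF-8 input is the same as `sub = "byte"`.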
7 changes: 4 additions & 3 deletions src/library/utils/man/download.file.Rd
@@ -1,6 +1,6 @@
% File src/library/utils/man/download.file.Rd
% Part of the R package, https://www.R-project.org
-% Copyright 1995-2021 R Core Team
+% Copyright 1995-2022 R Core Team
% Distributed under GPL 2 or later

\name{download.file}
@@ -183,10 +183,11 @@ download.file(url, destfile, method, quiet = FALSE, mode = "w",

\command{wget} (\url{https://www.gnu.org/software/wget/}) is commonly
installed on Unix-alikes (but not macOS). Windows binaries are
-available from Cygwin, gnuwin32 and elsewhere.
+available from MSYS2 and elsewhere.

\command{curl} (\url{https://curl.se/}) is installed on macOS and
-commonly on Unix-alikes. Windows binaries are available at that URL.
+increasingly commonly on Unix-alikes. Windows binaries are available
+at that URL.
}
\section{Setting Proxies}{
For the Windows-only method \code{"wininet"}, the \sQuote{Internet
30 changes: 29 additions & 1 deletion src/main/sysutils.c
@@ -727,7 +727,35 @@ SEXP attribute_hidden do_iconv(SEXP call, SEXP op, SEXP args, SEXP env)
}
}
goto next_char;
-} else if(strcmp(sub, "byte") == 0) {
+} else if(fromUTF8 && streql(sub, "c99")) {
+if(outb < 11) {
+R_AllocStringBuffer(2*cbuff.bufsize, &cbuff);
+goto top_of_loop;
+}
+wchar_t wc;
+ssize_t clen = utf8toucs(&wc, inbuf);
+if(clen > 0 && inb >= clen) {
+R_wchar_t ucs;
+if (IS_HIGH_SURROGATE(wc))
+ucs = utf8toucs32(wc, inbuf);
+else
+ucs = (R_wchar_t) wc;
+inbuf += clen; inb -= clen;
+if(ucs < 65536) {
+// gcc 7 objects to this with unsigned int
+snprintf(outbuf, 7, "\\u%04x", (unsigned short) ucs);
+outbuf += 6; outb -= 6;
+} else {
+/* R_wchar_t is unsigned int on Windows,
+otherwise wchar_t (usually int).
+In any case Unicode points <= 0x10FFFF
+*/
+snprintf(outbuf, 11, "\\u%08x", (unsigned int) ucs);
+outbuf += 10; outb -= 10;
+}
+}
+goto next_char;
+} else if(strcmp(sub, "byte") == 0) {
if(outb < 5) {
R_AllocStringBuffer(2*cbuff.bufsize, &cbuff);
goto top_of_loop;
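
The escape formatting in the new `c99` branch can be isolated into a small standalone sketch. `c99_escape` is a hypothetical helper, not a function in R. One deliberate difference from the hunk above: the commit as shown writes `\u` followed by eight hex digits for code points of 0x10000 and above, whereas C99 universal character names use `\U` for the eight-digit form, so this sketch emits `\U`. That choice is an assumption about the intended output, not a description of what this commit does. The space checks mirror the hunk's `outb < 11` guard, but report failure to the caller instead of growing a buffer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of R: write a C99-style escape for the
   Unicode code point `ucs` into `out`, which has room for `outsz` bytes.
   Returns the number of characters written (excluding the terminating
   NUL), or 0 if the code point is out of range or the buffer is too small. */
size_t c99_escape(unsigned int ucs, char *out, size_t outsz)
{
    if (ucs > 0x10FFFF)
        return 0;                       /* not a Unicode code point */
    if (ucs < 0x10000) {
        if (outsz < 7)                  /* 6 characters plus NUL */
            return 0;
        /* cast as in the commit, which notes gcc 7 objects to unsigned int */
        snprintf(out, 7, "\\u%04x", (unsigned short) ucs);
        return 6;
    }
    if (outsz < 11)                     /* 10 characters plus NUL */
        return 0;
    /* assumption: \U (uppercase) for the 8-digit form, per C99 syntax */
    snprintf(out, 11, "\\U%08x", ucs);
    return 10;
}
```

A caller that wants the commit's behaviour of growing the output buffer would treat a return value of 0 for a valid code point as the signal to reallocate and retry.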