Note: This file hasn't been updated in quite a while. We'll be looking into
*real* bug-tracking systems which should make it obsolete. -pd
THE R TASK LIST
``Somebody, somebody has to, you see ...''
The Cat in the Hat Comes Back.
----------------------------------------------------------------------
TASK: Multiple Graphics Device Drivers
STATUS: Open
FROM: Everyone
R needs to have multiple active device drivers and a means for
copying pictures from one device to another, etc. etc.
[ This is a medium-sized task. It would be most useful to ]
[ do this in conjunction with moving to an event driven model. ]
[ Greg Warnes has written some code which maintains a device ]
[ "display list". How much memory this might devour in the ]
[ multiple device case is an open question. There is also ]
[ the question of what to do about the graphics parameters. ]
[ Should each device maintain a complete "par" state, or ]
[ should some parameters (like col, lty, font ...) be global? ]
[ Could a user have any memory of the last values in effect ]
[ for a driver which had been idle for a while? ]
[ This is just about to hit the top of the list. ]
----------------------------------------------------------------------
TASK: complex gamma and log gamma function not implemented
STATUS: Open
FROM: R@stat.auckland.ac.nz
[ This is quite low priority. Complain if you need it. ]
[ The Fullerton library has complex gamma function code. ]
----------------------------------------------------------------------
TASK: solution of complex linear systems
STATUS: Open
FROM: R@stat.auckland.ac.nz
[ Really just a matter of grabbing the correct linpack code. ]
[ How general do we want to be here ... ]
----------------------------------------------------------------------
TASK: "nlm" documentation inaccuracies
STATUS: Open
FROM: jlindsey@luc.ac.be
The help for nlm is still called minimize although the
contents have been updated. As well, when an illegal
value is fed to nlm, the error message contains msg
instead of print.level.
[ The documentation looks ok. The function needs to be ]
[ rewritten so that it uses derivative information. ]
----------------------------------------------------------------------
TASK: "data.entry" problems
STATUS: Open
FROM: p.dalgaard@kubism.ku.dk
the as.character problem in de() - probably better to fix even
though it does make lists out of frames.
there's no way to change a data value to NA in data.entry, etc.
... earlier message ...
(Peter Dalgaard) data.entry et al do not seem to have been
adjusted for the new data frame structure. This is actually
a problem where a list is passed where a vector of character
strings is expected. To fix it change
snames <- substitute(list(...))[-1]
to
snames <- as.character(substitute(list(...))[-1])
However, there needs to be a look at the de... code. When
a data frame is edited it is returned as a list. This can
be cured with judicious use of "data.frame".
[ The indicated change has been made, but other changes ]
[ are needed. ]
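
The indicated change can be tried in isolation. The following is only a minimal sketch: `arg_names()` is a hypothetical helper, not part of de() or data.entry(), and it simply shows what the as.character() wrapper produces.

```r
# Hypothetical helper: capture the unevaluated arguments passed via `...`
# and return them as a character vector, as in the changed line above.
arg_names <- function(...) {
  as.character(substitute(list(...))[-1])
}

x <- 1:3
y <- c("a", "b")
arg_names(x, y)   # the names of the arguments, not their values
```

Without as.character() the result is a language object rather than a character vector, which is the mismatch described above.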
----------------------------------------------------------------------
TASK: "x11" printcmd
STATUS: Open
FROM: maechler@stat.math.ethz.ch
There is in theory a "printcmd" argument to x11, which
is ignored. Make it do something.
----------------------------------------------------------------------
TASK: "source" requires a terminating newline on EOF
STATUS: Open
FROM: Kurt.Hornik@ci.tuwien.ac.at
source() fails in many cases where a file has no final
newline. (R&R, sorry for being ridiculously nasty about
things that don't work for files without a final newline.
I have Emacs' next-line-add-newlines set to nil ...)
This seems to be a problem with parse() in src/main/source.c
in combo with the code in gram.y ...
I know this is NOT something to quickly fix over the weekend.
Please simply put it into your PROJECTS file.
[ This is actually a syntax error according to the R grammar ]
[ but maybe we can do something. ]
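
A quick way to check the behaviour on a given installation is sketched below. It assumes a current version of R, where this case is expected to work; the file and variable names are illustrative only.

```r
# Write a one-line script with NO final newline, then source it.
f <- tempfile(fileext = ".R")
cat("answer <- 6 * 7", file = f)   # cat() adds no trailing newline

env <- new.env()
source(f, local = env)             # evaluate the file inside `env`
env$answer
unlink(f)                          # remove the temporary file
```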
----------------------------------------------------------------------
TASK: help file ALIAS() and LINK() constructions
STATUS: Closed
FROM: R@stat.auckland.ac.nz
How do we know which file to LINK to? There needs to be a step
which fills in the file name on the basis of all ALIAS
declarations.
[ A preprocessing step is needed. First we build a table ]
[ of aliases and corresponding file names. Then we pass ]
[ through the files building the correct LINK references. ]
[ The new Rdconv and build-html... solve `everything' ]
----------------------------------------------------------------------
TASK: "paste" problem
STATUS: Closed
FROM: maechler@stat.math.ethz.ch
in S,
paste(....., collapse = string)
always returns ONE string (a character vector of length 1),
according to documentation and several examples.
in R, this is not true:
R> paste(rep(" ",0), collapse="...") #anything for collapse
character(0)
S> paste(rep(" ",0), collapse="...") #anything for collapse
[1] ""
Again, I think R is more logical than S here, but it was decided
that in minor cases, compatibility comes first...
[ We now return "" in the zero length case. ]
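
The zero-length case can be checked directly; this small sketch only restates the example above in runnable form.

```r
empty <- rep(" ", 0)              # a zero-length character vector

paste(empty, collapse = "...")    # a single empty string, as in S
paste(empty)                      # without collapse the result stays zero-length
```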
----------------------------------------------------------------------
TASK: missing functionality - modelling
STATUS: Open
FROM: maechler@stat.math.ethz.ch
aov, print.aov, summary.aov,... (!)
which I really missed for teaching a few months ago.
[ We'll get to this - it actually should be fun. ]
----------------------------------------------------------------------
TASK: warnings option
STATUS: Open
FROM: maechler@stat.math.ethz.ch
which reminds me that we/I also would like something similar to S's
options(warn = k)
k= 0 : [default] print warnings
k= -1 : do nothing (maybe append warnings to some temp-file)
k= 1 : produce an error ('warning' becomes 'stop').
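
For comparison, current versions of R do provide options(warn = k), but with a numbering that differs from the proposal above: negative values ignore warnings, 0 defers them, 1 prints them as they occur, and 2 turns them into errors. A minimal sketch, in which `risky()` is a made-up function:

```r
# Hypothetical function that warns and then finishes normally.
risky <- function() {
  warning("something looks off")
  "finished"
}

old <- options(warn = 2)            # 2: warnings are turned into errors
result <- tryCatch(risky(), error = function(e) "stopped")
options(old)                        # restore the previous setting

quiet <- suppressWarnings(risky())  # one-off alternative to a negative warn
```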
----------------------------------------------------------------------
TASK: R has no stderr
STATUS: Open
FROM: Friedrich.Leisch@ci.tuwien.ac.at
When I invoke R like
R 2>errlog
I would expect error messages to go to the file errlog
instead of the screen.
[ We don't have standard error. This is problematic on ]
[ platforms other than Unix. ]
----------------------------------------------------------------------
TASK: "print.default" fix
STATUS: Open
FROM: la-jassine@aix.pacwan.net
When you fix print.default, please also add prefix=
----------------------------------------------------------------------
TASK: "print.default" fix
STATUS: Open
FROM: jlindsey@luc.ac.be
print.default in S has an option, right=T, but R does not
----------------------------------------------------------------------
TASK: "postscript" fix
STATUS: Open
FROM: la-jassine@aix.pacwan.net
postscript() also needs the options onefile, print.it, and
append (even if they are not supported yet it would be nice if
the arguments could be accepted and ignored).
[ I added these as arguments, but they have no effect. ]
----------------------------------------------------------------------
TASK: task scheduling
STATUS: Open
FROM: gwhite@cabot.bio.dfo.ca
More generally, the range of things that can be done in R would
be greater if there was a simple scheduling mechanism. Is
there a way to have a specific function invoked just before the
command prompt returns after a function? Such a function could
be used to run save(...) or check for various external cues
(update of a file's timestamp) to control an analysis.
I doubt it would make sense to have full context switching in
R, but perhaps save() could be done in a way that would allow
it to be used even in a long calculation under some timer
control. I expect the user would need to provide a list of the
data objects that need to be saved.
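
Current versions of R offer addTaskCallback(), which registers a function to be run after each completed top-level command and so covers part of this request. The sketch below is illustrative only: the counter, the handler and the callback name are assumptions, and a real handler might call save() or check a file's timestamp instead.

```r
state <- new.env()
state$n <- 0

# Called by R after each completed top-level command.
# Returning TRUE keeps the callback registered.
after_task <- function(expr, value, ok, visible) {
  state$n <- state$n + 1
  TRUE
}

id <- addTaskCallback(after_task, name = "count-tasks")
# ... interactive work would happen here ...
removed <- removeTaskCallback(id)   # TRUE if the callback was removed
```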
----------------------------------------------------------------------
TASK: Inf numerics
STATUS: Closed
FROM: plummer@iarc.fr
Could we have an Inf object in R? I would find it useful.
[ On systems using IEEE arithmetic, the builtin Inf and NaN ]
[ values are recognised and used as of 0.62. ]
----------------------------------------------------------------------
TASK: Auto-save
STATUS: Closed
FROM: <p.dalgaard@kubism.ku.dk> <hornik@ci.tuwien.ac.at>
> BTW: How about putting auto-save-workspace on the task list?
> Or just a manual save.work() currently, you can lose quite a
> bit of work to an unexpected segfault. (And q()+restart is
> cumbersome, esp. if you need to reattach subsetted dataframes,
> etc.)
Perhaps call it save.image() instead and use
save(list = ls(), file = ".RData")
as was suggested some time ago?
(Whatever the result is, it needs to go in the FAQ, which goes
to great lengths about how data can get lost under R when a
crash occurs, but does not say how to save it ...)
[ Added save.image() as above. And yes, it's been in the FAQ ]
[ for quite some time now ... ]
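
The suggestion above amounts to a save() call on a list of object names. A small sketch follows, using a temporary file and two made-up objects so that it does not touch a real .RData:

```r
a <- 1:10
b <- "some text"
img <- file.path(tempdir(), "sketch.RData")   # illustrative file name

save(list = c("a", "b"), file = img)   # save.image() saves the whole workspace

restored <- new.env()
load(img, envir = restored)            # load into a fresh environment
ls(restored)
```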
----------------------------------------------------------------------
TASK: "chisquare.test" problem
STATUS: Closed
FROM: <venkat@biosta.mskcc.org>
Can you change the explicit "cat" statement in the
"chisquare.test" function which insists on writing to the
screen even when the output is redirected to a variable? (Using
"htest" class as in "t.test" function.)
[ Replaced by chisq.test(), formerly in the ctest package, ]
[ which properly returns an object of class "htest". ]
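
Because the result is an object, nothing is written to the screen when it is assigned. A short sketch with made-up counts:

```r
tab <- matrix(c(30, 20, 15, 35), nrow = 2)   # made-up 2 x 2 table of counts
res <- chisq.test(tab)                        # nothing is printed here

class(res)       # "htest"
res$p.value      # components can be used in further computation
```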
----------------------------------------------------------------------
TASK: Graphics inconsistencies
STATUS: Closed
FROM: Bill.Venables@adelaide.edu.au
While transferring some old S-code I came across some minor
inconsistencies between R and S that are probably more nuisance
value than they would take to fix. I report them here for
reference, (but not in any campaigning mood, of course...)
1. No frame() command in R and so no graceful way to clear a
plotting screen. (Or is there?)
[ Added ]
2. There is a dev.off() function, but no other dev.xxx functions.
(The dev.xxx group are S-PLUS and not vanilla S, by the way.)
There is no graphics.off() function.
[ Added in 0.62. ]
3. If dfr is a data frame with components "x", "y" and some
others then points(dfr) uses dfr as an xy-list in S but not in
R. If there is some non-numeric component it actually fails
in R. This may be S being a bit inconsistent, but the
behaviour is different.
[ Fixed? ]
4. The plotting marks are a bit gappy in R and even the ones
that are there do not correspond to their S counterparts.
Here is a little function to make a wall chart showing the
gaps:
[ We now have all the S symbols and a new set of R ones. ]
show.marks <- function()
{
    if(!exists(".Device") || is.null(.Device)) x11()
    plot(1, type="n", axes=F, xlab="", ylab="")
    oldpar <- par()
    par(usr = c(-0.01, 5.01, -0.01, 5.01), pty = "s")
    for(i in 0:18) {
        x <- 1/2 + (i %% 5)
        y <- 4.5 - (1/2 + (i %/% 5))
        points(x + 1/5, y - 1/5, pch = i, cex = 3)
        text(x - 1/5, y + 1/5, i, adj = 0.5, cex = 1.5)
    }
    abline(h = 1:5 - 0.5, lty = 1)
    segments(0:5, rep(0.5, 5), 0:5, rep(4.5, 5))
    par(oldpar)
    invisible()
}
5. In S you may extend a list by assigning to a new component.
For example if lis has components "x" and "y", only, you can
extend it by assigning to lis$z, lis["z"] or lis[, "z"] (the
last if it is also a data frame). In R only the first of
these works; the others give a "subscript out of bounds"
error. (This may have been discussed while I was not paying
attention, in which case I apologize.)
[ Fixed in 0.50. ]
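
The three forms of assignment from item 5 can be checked as follows. This is a sketch with made-up objects, assuming a current version of R.

```r
lis <- list(x = 1, y = 2)
lis$z <- 3              # the form that already worked
lis["w"] <- 4           # single-bracket assignment to a new name

dfr <- data.frame(x = 1:2, y = 3:4)
dfr[, "z"] <- c(5, 6)   # new column via matrix-style indexing

names(lis)
names(dfr)
```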
----------------------------------------------------------------------
TASK: Function pointer access
STATUS: Open
FROM: <schwarte@feat.mathematik.uni-essen.de>
I want to report two problems with the Fortran code of R.
1) Configure does not adapt GETSYMBOLS.in if the Fortran Compiler
does not add underscores to the symbol names.
2) There is a name conflict if the Fortran Compiler does not add
underscores because there exists a Fortran function FMIN and a
C function fmin(). Thus the name of the Fortran FMIN should be
changed.
[ This is fixed I think. ]
Currently I am rewriting my robust location-scale code in C. I
intend to make this new code available as a library once a
standard for such libraries has been agreed upon. As I would
like to allow prospective users to experiment with private
psi/chi functions I need access to the hash table of available
function pointers. Is it possible that you insert a function
into dotcode.c that contains the code fragment from lines 482
to 495 and returns a function pointer?
----------------------------------------------------------------------
TASK: Partial string matching
STATUS: Open
FROM: <R@stat.auckland.ac.nz>
Is there an existing partial string match function which could
be used in place of pstrmatch in subset.c???
If not can pstrmatch take on the functions of all partial match
functions?
----------------------------------------------------------------------
Post 0.49 Additions
----------------------------------------------------------------------
TASK: Name Attributes on Calls
STATUS: Closed (almost)
FROM: <p.dalgaard@kubism.ku.dk>
A call with tagged arguments is something like a list, the tags
can be used to access elements, but the names attribute is absent,
until the call is coerced to a list. (Attempting to set the names()
causes evaluation. Changing "list" to "blipblop" causes an 'Error:
couldn't find function "blipblop"' at that point.)
> j<-substitute(list(a=1, b=2))
> j
list(a = 1, b = 2)
> j$b
[1] 2
> names(j)
NULL
> names(j)<-NULL
> j
[[1]]
[1] 1
[[2]]
[1] 2
[At least under SunOS this is fixed. RG]
[However, 'names(j) <- NULL' has no effect in R, but does in S. MM]
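
For reference, the session above can be replayed as a script. The sketch assumes a current version of R, where names() on a call returns the tags and uses an empty string for the function position.

```r
j <- quote(list(a = 1, b = 2))   # an unevaluated call
names(j)                          # "" "a" "b"
j$b
names(as.list(j))

k <- j
names(k) <- NULL                  # drops the tags; k is still a call
k
```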
----------------------------------------------------------------------
TASK: String NAs Via the Back Door.
STATUS: Open
FROM: <p.dalgaard@kubism.ku.dk>
Ok, the right solution seems to be names(as.list(j)), but then we run
into some other fun with NA's... Shouldn't the real NA print without
quotes?
> ch[1]<-paste("N","A",sep="")
> is.na(ch)
[1] FALSE FALSE FALSE
> ch
[1] "NA" "a" "b"
> ch[1]=="NA"
[1] TRUE
> ch[1]<-"NA"
> is.na(ch)
[1] TRUE FALSE FALSE
[ We need a real NA. At present there is confusion between ]
[ the string "NA" and the NA value for strings. One solution ]
[ would be to use R_NilValue to indicate the missing string ]
[ value, and let NA be just an ordinary string in all cases. ]
[ This would be incompatible with S, but still an improvement. ]
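
The distinction asked for here can be illustrated as follows. The sketch assumes a current version of R, in which the string "NA" and the missing string value are kept apart.

```r
ch <- c(paste("N", "A", sep = ""), "a", "b")
is.na(ch)        # all FALSE: "NA" built by paste() is an ordinary string
ch[1] == "NA"

ch[1] <- NA      # the real missing value
is.na(ch)
print(ch)        # the missing string prints as NA, without quotes
```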
----------------------------------------------------------------------
TASK: Directory Structure
STATUS: Closed
FROM: <Kurt.Hornik@ci.tuwien.ac.at> + Friedrich + Paul Gilbert
> Regarding the location of data for libraries it might be easier if
> everything for one library is included in one subdirectory. At least
> it would certainly be easier to clean-up, which I like to do every few
> years. Thus the code file, data, and any compiled code would be in
> one subdirectory under $RHOME/library.
Like
library/<section>/
library/<section>/data
library/<section>/exec (scripts and or binaries which
only make sense for the add-on)
library/<section>/funs
library/<section>/help
library/<section>/html
library/<section>/objs (*.so)
???
> I realize this means a small change to the way libraries are now
> found, but in the end I think it would be much cleaner.
I think the changes would not be too hard, and we need to do something
about the directory structure anyway.
Actually, I think if R&R ok'ed something like that, Fritz and I would
take a look.
(In a way, I NEED to do something like that anyway, because I promised
it for making an official Debian package ...)
Would it mean that we also employ the S library/section concept?
----------------------------------------------------------------------
TASK: Startup Processing
STATUS: Open
FROM: <p.dalgaard@kubism.ku.dk>
The x11() window can be a nuisance to have popping up at startup (esp.
on small screens) when you're not working with graphics. However,
currently you can't get rid of it without modifying the systemwide
Rprofile.
Current logic is:
    Run $RHOME/library/Rprofile
    if ./.Rprofile exists
        run it
    else if $HOME/.Rprofile exists
        run that
    endif
I think it should be
    Run $RHOME/library/Rsetup
    if ./.Rprofile exists
        run it
    else if $HOME/.Rprofile exists
        run that
    else if $RHOME/library/Rprofile exists
        run that
    endif
i.e. essential system initialisation goes in Rsetup, the rest in
Rprofile, which can be overridden by the user. Currently, the line
if(interactive()) x11()
is the candidate to move from one to the other. BTW, it really should read
if(interactive() && getenv("DISPLAY")!="") x11()
[BTW2: getenv() implemented using system()? is that really necessary?]
>> <Kurt.Hornik@ci.tuwien.ac.at>
I more or less agree, BUT:
I'd like (in the future) to have the system-wide Rprofile searched in a
site-specific location as well (similar to Emacs, following the idea of
keeping the distribution and the site-specific things apart).
So it would be
system-wide Rsetup (which should basically be platform-specific
stuff, cause otherwise it could go into base as well?)
if .Rprofile exists run it else
if ~/.Rprofile exists run it else
if Rprofile exists on the default library search path, run it
and that search path could e.g. specify all `library' trees with a
compile-time default of
~/lib/R:/usr/local/lib/R/site:/usr/local/lib/R/${version}
and settable at run time via e.g. the environment variable R_PATH.
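
The "first existing file wins" logic proposed above can be sketched in R itself. Everything here is illustrative: `first_profile()` is a hypothetical helper, the candidate paths are assumptions, and Sys.getenv() is the current name for reading environment variables.

```r
# Hypothetical helper: return the first existing file among the
# candidates, or NULL if none exists. The order encodes the priority.
first_profile <- function(candidates) {
  for (f in candidates) {
    if (file.exists(f)) return(f)
  }
  NULL
}

candidates <- c(
  file.path(getwd(), ".Rprofile"),               # ./.Rprofile
  file.path(Sys.getenv("HOME"), ".Rprofile"),    # $HOME/.Rprofile
  file.path(R.home(), "library", "Rprofile")     # assumed system-wide fallback
)
profile <- first_profile(candidates)

# The guard suggested above for opening a graphics window:
want_x11 <- interactive() && nzchar(Sys.getenv("DISPLAY"))
```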
----------------------------------------------------------------------
TASK: Old Unfixed Problems
STATUS: Closed
FROM: <Kurt.Hornik@ci.tuwien.ac.at>
I noticed the following problems (all already reported, but not in
TASKS).
* File permissions in data should be 644.
* In src/unix/system.c, one `Rdata' should be `RData' (d -> D).
* The documentation for the noncentral chisquare distribution is
  not quite correct. (rnchisq does not exist, the existing
  functions have x, df and the noncentrality parameter as args,
  and the density should be
      pnchisq(x, df, lambda)
          = exp(-lambda / 2)
            * \sum_{r=0}^\infty \frac{lambda^r}{2^r r!} pchisq(x, df + 2r)
  (semiTeX notation only, sorry).)
[ All fixed now. ]
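
The series quoted above for pnchisq can be checked numerically. The sketch assumes a current version of R, where the noncentral distribution function is reached through the `ncp` argument of pchisq(); the test values and the truncation point of the sum are arbitrary.

```r
x <- 3.5; df <- 4; lambda <- 2.5   # arbitrary test values
r <- 0:100                          # truncation of the infinite sum

# Poisson(lambda / 2) weights times central chi-square probabilities
series <- exp(-lambda / 2) *
  sum(lambda^r / (2^r * factorial(r)) * pchisq(x, df + 2 * r))

builtin <- pchisq(x, df, ncp = lambda)
all.equal(series, builtin)
```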
----------------------------------------------------------------------
TASK: New Problems
STATUS: Closed
FROM: <Kurt.Hornik@ci.tuwien.ac.at>
New minor remarks:
* The documentation for `image' still has the old order z, x, y.
* Perhaps one should add `par(ask = T)' in the image demo?
* Perhaps one should save the original value of par() at the
beginning of the graphics demo, and restore that at its end
(s.t. typically asking is turned off again).
----------------------------------------------------------------------
TASK: Multiplatform Support
STATUS: Open
FROM: <warnes@biostat.washington.edu>
I've modified the "$RHOME/bin/R" and "$RHOME/cmd/filename" so that you
can use the same directories for multiple machines. That is, machines
running various flavors of UNIX can access the same directories.
The modified structure adds the directories
$RHOME/bin/$OSTYPE/
$RHOME/lib/$OSTYPE/
to hold the machine specific binaries.
For instance, here the $RHOME directory contains two subdirectories,
$RHOME/bin/solaris/
$RHOME/bin/sunos4/
which each hold the appropriate R.binary file.
These two modified functions assume that the environment
variable $OSTYPE is appropriately set, as is done automatically
by the shell tcsh. If it is not set, the directory names
collapse to the original values,
$RHOME/bin/ and $RHOME/lib/
To use them, create the appropriate directories and place the
correct binaries therein. ( Note that the makefiles will not do
this automatically!) Then replace $RHOME/bin/R and
$RHOME/cmd/filename with the modified ones.
----------------------------------------------------------------------
TASK: Platform Independence
STATUS: Open
FROM: Friedrich.Leisch@ci.tuwien.ac.at
IMHO we should definitely have platform-dirs for everything that's
possibly platform-dependent ... resulting in something like
<library>/<section>/<type>
e.g. for R code and
<library>/<section>/<type>/<platform>
e.g. for exec and dynload-objects.
for exec there's a problem though, as some exec's are
shell/perl/whatever-scripts and *should* work on any platform
...
----------------------------------------------------------------------
TASK: Poly
STATUS: Open
FROM: <Kurt.Hornik@ci.tuwien.ac.at>
PS1. There was also a `poly' function in your snapshot WORK tree
... do you already have a final version of that?
----------------------------------------------------------------------
TASK: Naming with Numeric Values and "unlist"
STATUS: Closed
FROM: <hornik@ci.tuwien.ac.at>
R> l <- list("11" = 1:5)
R> l
$11
[1] 1 2 3 4 5
R> unlist(l)
111 112 113 114 115
1 2 3 4 5
[ S does the same, hence this is deemed a feature. ]
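
The generated names come from pasting the list name and the element index together. A short sketch, which also shows how to drop the names when they are not wanted:

```r
l <- list("11" = 1:5)

names(unlist(l))               # "111" "112" "113" "114" "115"
unlist(l, use.names = FALSE)   # the values without the generated names
```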
----------------------------------------------------------------------
TASK: all.names needed
STATUS: Closed
FROM: <bates@stat.wisc.edu>
I could not find the all.names function in R so I created the
enclosed. Comments, criticisms, or changes to a one-liner by
creating nested anonymous functions are welcome. I'll try to
work out a corresponding all.vars function.
[ all.names() and all.vars() added in 0.61. ]
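
A brief illustration of the two functions on a made-up formula:

```r
e <- quote(y ~ x + log(z))

all.names(e)   # functions and variables: "~" "y" "+" "x" "log" "z"
all.vars(e)    # variables only:          "y" "x" "z"
```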
----------------------------------------------------------------------
TASK: "sys.function" problem
STATUS: Open
FROM: <bates@stat.wisc.edu>
I attempted to create a recursive anonymous function to be called
within another function. You may want to stop reading for a bit and
consider how that would be done. That is, how do you recursively call
a function that has never been assigned a name?
OK, you're back. You probably came up with a better solution than I
did but I used (sys.function())(arg) to do the recursion. The piece
of code looks like
flist <- (function(x) {
    if (mode(x) == "call") {
        if (x[[1]] == as.name("/"))
            return(c(sys.function()(x[[2]]), sys.function()(x[[3]])))
        if (x[[1]] == as.name("(")) # for R
            return(sys.function()(x[[2]]))
    }
    if (mode(x) == "(") return(sys.function()(x[[2]])) # for S
    list(x)
})(getGroupsFormula(data, form, ...)[[2]])
## I know it's horribly obscure. Blame Bill Venables for teaching me this.
Regrettably, it doesn't work in R. Using the debugger one finds that
sys.function() returns the function being called the first time
through but the second time through it returns NULL. Is this a bug or
a feature?
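
A reduced version of the idiom is sketched below, using a factorial instead of the formula walker. It assumes a current version of R, where both Recall() and sys.function() can be used for recursion in an unnamed function; it says nothing about how the version in the report behaved.

```r
# Factorial written as an anonymous function that calls itself.
via_recall <- (function(n) if (n <= 1) 1 else n * Recall(n - 1))(5)

# The same idea with sys.function(), as in the report above.
via_sysfun <- (function(n) {
  if (n <= 1) 1 else n * sys.function()(n - 1)
})(5)
```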
----------------------------------------------------------------------
TASK: "update" comments and fixes
STATUS: Open
FROM: <thomas@biostat.washington.edu>
1. To make update() work with a new formula for glms, change the
first line of the glm() function from
call <- sys.call()
to
call<-match.call()
(this means that the formula component of the returned call is
labelled so that update can find it)
2. update.lm doesn't do anything with its weights= argument
Add
if (!missing(weights))
call$weights<-substitute(weights)
Similarly, to get update to work properly on glms you need a lot
more of these if statements (see update.glm at the end of the message).
3. update.lm evaluates its arguments in the wrong frame.
It creates a modified version of the original call and evaluates
it in sys.frame(sys.parent()). If update.lm is called directly
this is correct, but if it is called via update() the correct
frame is sys.frame(sys.parent(2)). Worse still, if it is called
by NextMethod() from another update.foo() the correct frame is
still higher up the list.
My solution (a bit ugly) is to move up the list of enclosing calls
checking at each stage to see if the call is NextMethod, update or an
update method. It can be seen at the end of update.glm at the bottom of
this message, and something of this sort needs to be added to other update
methods.
update.glm<-function (glm.obj, formula, data, weights, subset,
                      na.action, offset, family, x)
{
    call <- glm.obj$call
    if (!missing(formula))
        call$formula <- update.formula(call$formula, formula)
    if (!missing(data))
        call$data <- substitute(data)
    if (!missing(subset))
        call$subset <- substitute(subset)
    if (!missing(na.action))
        call$na.action <- substitute(na.action)
    if (!missing(weights))
        call$weights <- substitute(weights)
    if (!missing(offset))
        call$offset <- substitute(offset)
    if (!missing(family))
        call$family <- substitute(family)
    if (!missing(x))
        call$x <- substitute(x)
    notparent <- c("NextMethod", "update", methods(update))
    for (i in 1:(1+sys.parent())) {
        parent <- sys.call(-i)[[1]]
        if (is.null(parent))
            break
        if (is.na(match(as.character(parent), notparent)))
            break
    }
    eval(call, sys.frame(-i))
}
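
The difference behind point 1 (sys.call() versus match.call()) can be seen in a small sketch. The two fitting functions are hypothetical and only record how they were called.

```r
# Hypothetical fitting functions that just return their own call.
fit_sys   <- function(formula, data) sys.call()
fit_match <- function(formula, data) match.call()

d <- data.frame(y = 1:3, x = c(2, 4, 6))

names(fit_sys(y ~ x, d))   # NULL: positional arguments carry no tags
cl <- fit_match(y ~ x, d)
names(cl)                  # "" "formula" "data"
cl$formula                 # the labelled component that update() looks for
```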
----------------------------------------------------------------------
TASK: Wisdom
STATUS: Open
FROM: <bates@stat.wisc.edu>
Some of the "eternal truths" about the S language are:
- every object has a mode obtainable by mode(object) [ok]
- every object has a length obtainable by length(object) [ok]
- every object can be coerced to a list of the same length
[not yet, even for expression()s
(and functions)]
One can imagine that code that messes around with functions and
other expressions in R will break fairly quickly when these
conditions do not hold. I don't know how much work would be
involved in patching over these differences between R and S
but I suspect it would not be a trivial undertaking.
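
The first three "truths" can be probed for expressions and calls as follows. This is a sketch against a current version of R and makes no claim about functions or about the version discussed above.

```r
e  <- expression(a + b, sin(x))
cl <- quote(f(x, y))

mode(e); mode(cl)
length(e);  length(as.list(e))    # same length
length(cl); length(as.list(cl))   # same length: function name plus two arguments
```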
----------------------------------------------------------------------
TASK: frametools
STATUS: Open
FROM: <p.dalgaard@kubism.ku.dk>
The following three functions are designed to make manipulation of
dataframes easier. I won't write detailed docs just now, but if you
follow the example below, you should get the general picture. Comments
are welcome, esp. re. naming conventions.
Note that these functions are definitely not portable to S because
they rely on R's scoping rules. Not that difficult to fix, though: The
nm vector and the "parsing" functions need to get assigned to
(evaluation) frame 1 (the "expression frame" of S), and preferably
removed at exit.
data(airquality)
aq<-airquality[1:10,]
select.frame(aq,Ozone:Temp)
subset.frame(aq,Ozone>20)
modify.frame(aq,ratio=Ozone/Temp)
Notice that in modify.frame(), any *new* variable must appear as a
tag, not as the result of an assignment, i.e.:
modify.frame(aq,Ozone<-log(Ozone)) works as expected
modify.frame(aq,lOzone<-log(Ozone)) does not.
This is mainly because it was tricky to figure out what part of a left
hand side constitutes a new variable to be created (note that indexing
could be involved). So assignments to non-existing variables just
create them as local variables within the function. Making a virtue
out of necessity, that might actually be considered a feature...
----------------------------------------
"select.frame" <-
function (dfr, ...)
{
    subst.call <- function(e) {
        if (length(e) > 1)
            for (i in 2:length(e)) e[[i]] <- subst.expr(e[[i]])
        e
    }
    subst.expr <- function(e) {
        if (is.call(e))
            subst.call(e)
        else match.expr(e)
    }
    match.expr <- function(e) {
        n <- match(as.character(e), nm)
        if (is.na(n))
            e
        else n
    }
    nm <- names(dfr)
    e <- substitute(c(...))
    dfr[, eval(subst.expr(e))]
}
"modify.frame" <-
function (dfr, ...)
{
    nm <- names(dfr)
    e <- substitute(list(...))
    if (length(e) < 2)
        return(dfr)
    subst.call <- function(e) {
        if (length(e) > 1)
            for (i in 2:length(e)) e[[i]] <- subst.expr(e[[i]])
        substitute(e)
    }
    subst.expr <- function(e) {
        if (is.call(e))
            subst.call(e)
        else match.expr(e)
    }
    match.expr <- function(e) {
        if (is.na(n <- match(as.character(e), nm)))
            if (is.atomic(e))
                e
            else substitute(e)
        else substitute(dfr[, n])
    }
    tags <- names(as.list(e))
    for (i in 2:length(e)) {
        ee <- subst.expr(e[[i]])
        r <- eval(ee)
        if (!is.na(tags[i])) {
            if (is.na(n <- match(as.character(tags[i]),
                                 nm))) {
                n <- length(nm) + 1
                dfr[[n]] <- numeric(nrow(dfr))
                names(dfr)[n] <- tags[i]
                nm <- names(dfr)
            }
            dfr[[tags[i]]][] <- r
        }
    }
    dfr
}
"subset.frame" <-
function (dfr, expr)
{
    nm <- names(dfr)
    e <- substitute(expr)
    subst.call <- function(e) {
        if (length(e) > 1)
            for (i in 2:length(e)) e[[i]] <- subst.expr(e[[i]])
        e
    }
    subst.expr <- function(e) {
        if (is.call(e))
            subst.call(e)
        else match.expr(e)
    }
    match.expr <- function(e) {
        if (is.na(n <- match(as.character(e), nm)))
            e
        else dfr[, n]
    }
    r <- eval(subst.expr(e))
    r <- r & !is.na(r)
    dfr[r, ]
}
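
For readers who want to try the three operations without the code above, current base R provides subset() and transform(), which cover similar ground. The sketch below uses those functions rather than the ones listed here, so details such as the treatment of new variables may differ.

```r
aq <- airquality[1:10, ]

subset(aq, select = Ozone:Temp)        # column range, like select.frame()
subset(aq, Ozone > 20)                 # row filter; rows with NA are dropped
transform(aq, ratio = Ozone / Temp)    # new variable given as a tag
```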
----------------------------------------------------------------------
TASK: General Problems
STATUS: Open
FROM: <jlindsey@luc.ac.be>
1. A gentle reminder that the default has not been changed for saving
.RData in batch mode (as was promised).
2. The degrees of freedom for the null deviance in glm are wrong when
some observations are weighted out. This can give silly answers, for
example when applying anova. The number of weighted out observations
should be subtracted, as in other df calculations.
3. The null deviance itself is wrong in glm when an offset is used. It
can be smaller than the deviance after variables are added to the model!
4. R gave a segmentation fault when I tried to fit a model with 49
factor levels in glm (using R -v4). All these glm problems were with
poisson.
5. R still does not read my environment variables to set memory
size.
Suggestions:
1. d, p, q, and r functions for the inverse Gaussian and Laplace
distributions.
2. Add a fifth function for continuous distributions, the hazard
function, h. For example, ht <- function(...) dt(...)/(1-pt(...))
is the Student t hazard function.
For writing likelihood functions, these would be much faster in C than
in R, and some, such as the Weibull, can be simplified.
3. Add the five functions for three parameter distributions such as
generalized F, extreme value, etc., Box-Cox,... (I have the densities,
cumulative, and hazard as R functions.)
4. Philippe Lambert and I have d and p functions working in R for the
four-parameter stable family by inverting the characteristic function
with a Fourier transform (requires C code). S-plus only has the r
function for stables.
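The hazard suggestion in item 2 can be sketched in R itself. This is
only an illustration, not the proposed C implementation: make.hazard is
a hypothetical helper, and it assumes p functions that accept
lower.tail = FALSE, as current versions of R do.

```r
# Hypothetical helper: build a hazard function h(x) = d(x) / (1 - p(x))
# from a density function and the matching distribution function.
make.hazard <- function(dfun, pfun) {
    function(x, ...) {
        # Upper tail probability 1 - p(x), computed directly
        dfun(x, ...) / pfun(x, ..., lower.tail = FALSE)
    }
}

ht   <- make.hazard(dt, pt)       # Student t hazard, as in the suggestion
hexp <- make.hazard(dexp, pexp)   # exponential hazard

ht(1, df = 5)
hexp(c(0.5, 2), rate = 3)         # constant hazard, equal to the rate
```

The exponential case is a convenient check, because its hazard is
constant and equal to the rate parameter.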
----------------------------------------------------------------------
TASK: Generic Print
STATUS: Open
FROM: Paul Gilbert <la-jassine@aix.pacwan.net>
I have always thought that typing the name of an object generated
a call to the print method for the object. However, in 0.49
I redefined the generic print method as
    print <- function(x, ...)
    {
        if (is.tframe(x)) UseMethod("print.tframe")
        else UseMethod("print")
    }
Now I have an object z which returns TRUE to is.tframe(z) and
> class(z)
[1] "ts" "tframe"
Then
> print(z)
[1] 1981.50 2006.25 4.00
But
> z
Error: comparison is possible only for vector types
> traceback()
[1] "c(\"print.ts(structure(c(1981.5, 2006.25, 4), class = c(\\\"ts\\\",
\\\"tframe\\\"\", "
[2] "c(\"print(structure(c(1981.5, 2006.25, 4), class = c(\\\"ts\\\",
\\\"tframe\\\"\", "
This is generating a call to the class method print.ts
rather than to print.tframe.ts, as is done when I use
print(z). If my understanding is correct that typing the name
of an object should generate a call to the print method for
the object, then this is a bug. Otherwise, could someone please
explain to me what it does? Thanks.
----------------------------------------------------------------------
TASK: getenv()
STATUS: Closed
FROM: Paul Gilbert <la-jassine@aix.pacwan.net>
Here are two small problems I've pointed out before, but which still
seem to be in 0.49.
1/ getenv() should return everything, not complain about a missing item.
[ Fixed now. In fact, at least under Unix getenv() now returns ]
[ the whole environment, as in S. ]
----------------------------------------------------------------------
TASK: summary.default
STATUS: Closed
FROM: Paul Gilbert <la-jassine@aix.pacwan.net>
2/ In summary.default
...
sumry[i, 2] <- if (is.object(ii))
class(ii)
should be changed to
...
sumry[i, 2] <- if (is.object(ii))
paste(class(ii), collapse=" ")
so that it works with lists of lists. (This fix was supposed to be
added to Splus 4.)
[The solution is now different:
cls <- class(ii)
sumry[i, 2] <- if (length(cls) > 0) cls[1] else "-none-"
]
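The difference between the two fixes can be sketched as follows. The
helper names and the class vector c("series", "tframe") are
hypothetical, and oldClass() is used on the assumption that the example
is run in a current version of R, where class() never returns a
zero-length result.

```r
# One cell of the summary table can hold only a single string, so a
# class vector of length two or more has to be reduced somehow.

# Proposed fix: paste all class names together.
describe.paste <- function(ii) paste(oldClass(ii), collapse = " ")

# Adopted fix: keep the first class name, or "-none-" if there is none.
describe.first <- function(ii) {
    cls <- oldClass(ii)   # NULL when no class attribute is set
    if (length(cls) > 0) cls[1] else "-none-"
}

z <- structure(list(), class = c("series", "tframe"))
describe.paste(z)            # "series tframe"
describe.first(z)            # "series"
describe.first(list(1, 2))   # "-none-"
```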
----------------------------------------------------------------------
TASK: Time Series Problems
STATUS: Open
FROM: <la-jassine@aix.pacwan.net>
Here are four problems with ts:
1/ ts matrix subscripting should support drop=F:
> z<- matrix(1:10,5,2)
> z <-ts(z)
> z[,1,drop=F]
Error in [.ts(z, , 1, drop = F) : unused argument to function
[ok]
2/ == and other comparisons with non-ts matrices should work:
> z <- matrix( 1:10,5,2)
> ts(z)
Time-Series:
Start = c(1, 1)
End = c(5, 1)
Frequency = 1
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> z == ts(z)
Error: invalid time series parameters specified
3/ The generic functions start and end need default methods that
return a result for matrices, as they did previously and as in S.
The following seems to work.
start.default <- function (x) start(ts(x))
end.default <- function (x) end(ts(x))
[ Added ]
4/ In the function start.ts (and in end.ts) ts[1] in the last line
is not defined. Perhaps I am missing something?
start.ts
function (x)
{
    ts.eps <- .Options$ts.eps
    if (is.null(ts.eps))
        ts.eps <- 1e-06
    tsp <- attr(as.ts(x), "tsp")
    is <- tsp[1] * tsp[3]
    if (abs(is - round(is)) < ts.eps) {
        is <- floor(tsp[1])
        fs <- floor(tsp[3] * (tsp[1] - is) + 0.001)
        c(is, fs + 1)
    }
    else ts[1]
}
[ Fixed ]
----------------------------------------------------------------------
TASK: Recycling problems
STATUS: Open
FROM: Paul Gilbert <la-jassine@aix.pacwan.net>
In R 0.49, combining logical matrices with & and | sometimes seems
to generate spurious warnings that the longer object length is
not a multiple of the shorter object length.
I have not been able to isolate the exact circumstances.
----------------------------------------------------------------------
TASK: Generic "write" function
STATUS: Open
FROM: <Kurt.Hornik@ci.tuwien.ac.at>
Following my posting of a write.table() function, Martin
suggested that one could have a generic write() function
and special methods for e.g. time series, data frames, etc.
Well, a month has passed since ...
What does everyone think? Is it a good idea, or would
write.table() be enough? If we think that it is not enough,
which arguments should the write methods typically allow?
What about
    write.xxx (x,         # object
               file =     # filename, default stdout
               append =   # obvious
               sep =      # obvious
               eol =      # end of line char
               ...)
???
On the other hand, it seems clear that something like
write.table() is nice, and what it should do. But what
about classes other than data.frame?
Note that S has a write(.) function which would be our
write.default(.)
your write.table would be our
write.data.frame
The only addition would be a 'write.matrix' which would be 'like'
write.data.frame, the only problem being that 'matrix' is not a class
(yet).
[Note that in S4, everything has a class;
I'm voting for matrices to have a class in R ..]
write.default could 'despatch' to write.matrix if x is a matrix.
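The proposal can be sketched as an S3 generic with one method per
class. This is a rough sketch, not an agreed design: the name write2 is
hypothetical (chosen so as not to mask the existing write), and the
matrix method relies on current versions of R dispatching on the
implicit class of a matrix.

```r
# Hypothetical generic, with the arguments listed in the proposal.
write2 <- function(x, file = "", ...) UseMethod("write2")

# Fallback for plain vectors: all elements on one line.
write2.default <- function(x, file = "", append = FALSE,
                           sep = " ", eol = "\n", ...) {
    cat(x, file = file, append = append, sep = sep)
    cat(eol, file = file, append = TRUE)
}

# Data frames: delegate to write.table().
write2.data.frame <- function(x, file = "", append = FALSE,
                              sep = " ", eol = "\n", ...) {
    write.table(x, file = file, append = append, sep = sep, eol = eol,
                quote = FALSE, row.names = FALSE)
}

# Matrices: written 'like' a data frame.
write2.matrix <- function(x, file = "", ...)
    write2.data.frame(as.data.frame(x), file = file, ...)

tf <- tempfile()
write2(matrix(1:4, 2), file = tf, sep = ",")
readLines(tf)
```

With file = "" the output goes to the console, which matches the
"default stdout" item in the argument list above.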
----------------------------------------------------------------------
TASK: Comparison with NA and Zero-Length Vectors
STATUS: Open
FROM: <thomas@biostat.washington.edu> + <maechler@stat.math.ethz.ch>
Thomas: Any comparison with NULL generates an error
Error: comparison is possible only for vector types
whereas in S(-PLUS) it gives NA, which seems more sensible.
Along similar lines, comparison with a length 0 vector
returns logical(0) in R but NA in S.
Martin: Isn't logical(0) more logical than NA ?
I agree that it would be best (convenience)
if 'NULL==1' returned the same as 'numeric(0)==1'.
At the moment, I don't see why compatibility with S should be
important here:
if( NULL == anything)
or, e.g., if( numeric(0) == numeric(0) )
give an error anyway, i.e., you have to test for length 0 _anyway_
in the cases where one comparison argument may have zero length.
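Martin's point that zero-length arguments have to be tested for
explicitly can be sketched as follows. safe.equal is a hypothetical
helper, not an existing R function. It checks the lengths first, so it
does not depend on whether NULL == 1 gives an error, NA or logical(0).

```r
# Hypothetical helper: TRUE only if both arguments are non-empty and
# all elementwise comparisons are TRUE.
safe.equal <- function(x, y) {
    # && evaluates its right-hand side only if the left-hand side is
    # TRUE, so the comparison is never reached for zero-length arguments.
    length(x) > 0 && length(y) > 0 && isTRUE(all(x == y))
}

if (safe.equal(NULL, 1)) cat("equal\n") else cat("not equal\n")
```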
Thomas: I didn't (previously) make any comment on this --
I only said that NA was more logical than an error message.
However, the advantage of returning NA is that NA | TRUE
is TRUE, NA & FALSE is FALSE, which doesn't happen with
logical(0). Also, from a compatibility point of view one
of them is tested with is.na(), the other with length(),
so it can matter which one you use. Of course no-one should
deliberately write code where it matters, but these things
happen.
It seems in fact that logical(0) | TRUE causes R to freeze
(R0.49, sparc solaris).
Robert: Well, we thought
logical(0) & T should return logical(0)
logical(0) | T should return logical(0)
already we have NA | T returns T and NA & T returns NA