\input texinfo
@c %**start of header
@setfilename R-intro.info
@settitle An Introduction to R
@setchapternewpage on
@c %**end of header
@c Authors: If you edit/add @example(s) , please keep
@c ./R-intro.R up-to-date !
@c ~~~~~~~~~~~
@syncodeindex fn vr
@dircategory Programming
@direntry
* R Introduction: (R-intro). An introduction to R.
@end direntry
@finalout
@include R-defs.texi
@include version.texi
@ifnottex
@macro RIcopyright{}
Copyright @copyright{} 1990 W.@: N.@: Venables@*
Copyright @copyright{} 1992 W.@: N.@: Venables & D.@: M.@: Smith@*
Copyright @copyright{} 1997 R.@: Gentleman & R.@: Ihaka@*
Copyright @copyright{} 1997, 1998 M.@: Maechler@*
@Rcopyright{1999}
@end macro
@end ifnottex
@iftex
@macro RIcopyright{}
@noindent
Copyright @copyright{} 1990 W.@: N.@: Venables
Copyright @copyright{} 1992 W.@: N.@: Venables & D.@: M.@: Smith
Copyright @copyright{} 1997 R.@: Gentleman & R.@: Ihaka
Copyright @copyright{} 1997, 1998 M.@: Maechler
@Rcopyright{1999}
@end macro
@end iftex
@c <FIXME>
@c Apparently AUCTeX 11.06 has a problem with '@appendixsection' entries
@c when updating nodes---the equivalent '@appendixsec' seems to work.
@c Hence changed (temporarily?) ...
@c </FIXME>
@c <NOTE>
@c Conversion to PDF fails if sectioning titles contain (user-defined)
@c macros such as @R{}. Hence in section titles we changed @R{} to R.
@c Revert when this is fixed.
@c </NOTE>
@ifinfo
This is an introduction to R.
@RIcopyright{}
@permission{}
@end ifinfo
@titlepage
@title An Introduction to R
@subtitle Notes on @R{}: A Programming Environment for Data Analysis and Graphics
@subtitle Version @value{VERSION}
@author W. N. Venables, D. M. Smith
@author and the R Development Core Team
@page
@vskip 0pt plus 1filll
@flushleft
@RIcopyright{}
@end flushleft
@permission{}
@value{ISBN-intro}
@end titlepage
@ifnothtml
@contents
@end ifnothtml
@ifnottex
@node Top, Preface, (dir), (dir)
@top An Introduction to R
This is an introduction to R (``GNU S''), a language and environment for
statistical computing and graphics. R is similar to the award-winning S
system, which was developed at Bell Laboratories by John Chambers et al.
It provides a wide variety of statistical and graphical techniques
(linear and nonlinear modelling, statistical tests, time series
analysis, classification, clustering, ...).
This manual provides information on data types, programming elements,
statistical modeling and graphics.
The current version of this document is @value{VERSION}.
@value{ISBN-intro}
@end ifnottex
@menu
* Preface::
* Introduction and preliminaries::
* Simple manipulations numbers and vectors::
* Objects::
* Factors::
* Arrays and matrices::
* Lists and data frames::
* Reading data from files::
* Probability distributions::
* Loops and conditional execution::
* Writing your own functions::
* Statistical models in R::
* Graphics::
* A sample session::
* Invoking R::
* The command line editor::
* Function and variable index::
* Concept index::
* References::
@end menu
@node Preface, Introduction and preliminaries, Top, Top
@unnumbered Preface
This introduction to @R{} is derived from an original set of notes
describing the @Sl{} and @SPLUS{} environments written by Bill Venables
and David M. Smith (Insightful Corporation). We have made a number of
small changes to reflect differences between the @R{} and @Sl{}
programs, and expanded some of the material.
We would like to extend warm thanks to Bill Venables for granting
permission to distribute this modified version of the notes in this way,
and for being a supporter of @R{} from way back.
Comments and corrections are always welcome. Please address email
correspondence to @email{R-core@@r-project.org}.
@subheading Suggestions to the reader
Most @R{} novices will start with the introductory session in Appendix
A. This should give some familiarity with the style of @R{} sessions
and more importantly some instant feedback on what actually happens.
Many users will come to @R{} mainly for its graphical facilities. In
this case, @ref{Graphics} on the graphics facilities can be read at
almost any time and need not wait until all the preceding sections have
been digested.
@menu
* Introduction and preliminaries::
@end menu
@node Introduction and preliminaries, Simple manipulations numbers and vectors, Preface, Top
@chapter Introduction and preliminaries
@menu
* The R environment::
* Related software and documentation::
* R and statistics::
* R and the window system::
* Using R interactively::
* Getting help::
* R commands; case sensitivity etc::
* Recall and correction of previous commands::
* Executing commands from or diverting output to a file::
* Data permanency and removing objects::
@end menu
@node The R environment, Related software and documentation, Introduction and preliminaries, Introduction and preliminaries
@section The R environment
@R{} is an integrated suite of software facilities for data
manipulation, calculation and graphical display. Among other things it
has
@itemize @bullet
@item
an effective data handling and storage facility,
@item
a suite of operators for calculations on arrays, in particular matrices,
@item
a large, coherent, integrated collection of intermediate tools for data
analysis,
@item
graphical facilities for data analysis and display either directly at
the computer or on hardcopy, and
@item
a well developed, simple and effective programming language which
includes conditionals, loops, user defined recursive functions and input
and output facilities. (Indeed most of the system supplied functions
are themselves written in the @Sl{} language.)
@end itemize
The term ``environment'' is intended to characterize it as a fully
planned and coherent system, rather than an incremental accretion of
very specific and inflexible tools, as is frequently the case with other
data analysis software.
@R{} is very much a vehicle for newly developing methods of interactive
data analysis. As such it is very dynamic, and new releases have not
always been fully backwards compatible with previous releases. Some
users welcome the changes because of the bonus of new technology and new
methods that come with new releases; others seem to be more worried by
the fact that old code no longer works. Although @R{} is intended as a
programming language, one should regard most programs written in @R{} as
essentially ephemeral.
@node Related software and documentation, R and statistics, The R environment, Introduction and preliminaries
@section Related software and documentation
@R{} can be regarded as an implementation of the @Sl{} language which
was developed at Bell Laboratories by Rick Becker, John Chambers and
Allan Wilks, and also forms the basis of the @SPLUS{} systems.
The evolution of the @Sl{} language is characterized by four books by
John Chambers and coauthors. For @R{}, the basic reference is @emph{The
New @Sl{} Language: A Programming Environment for Data Analysis and
Graphics} by Richard A.@: Becker, John M.@: Chambers and Allan R.@:
Wilks. The new features of the 1991 release of @Sl{} (@Sl{} version 3)
are covered in @emph{Statistical Models in @Sl{}} edited by John M.@:
Chambers and Trevor J.@: Hastie. @xref{References}, for precise
references.
In addition, documentation for @Sl{}/@SPLUS{} can typically be used with
@R{}, keeping the differences between the @Sl{} implementations in mind.
@xref{What documentation exists for R?, , , R-FAQ, The R statistical
system FAQ}.
@node R and statistics, R and the window system, Related software and documentation, Introduction and preliminaries
@section R and statistics
@cindex Packages
Our introduction to the @R{} environment did not mention
@emph{statistics}, yet many people use @R{} as a statistics system. We
prefer to think of it as an environment within which many classical and
modern statistical techniques have been implemented. Some of these are
built into the base @R{} environment, but many are supplied as
@emph{packages}. (Currently the distinction is largely a matter of
historical accident.) There are about 8 packages supplied with @R{}
(called ``standard'' packages) and many more are available through the
@acronym{CRAN} family of Internet sites (via
@uref{http://cran.r-project.org}).
Most classical statistics and much of the latest methodology is
available for use with @R{}, but users will need to be prepared to do a
little work to find it.
There is an important difference in philosophy between @Sl{} (and hence
@R{}) and the other main statistical systems. In @Sl{} a statistical
analysis is normally done as a series of steps, with intermediate
results being stored in objects. Thus whereas SAS and SPSS will give
copious output from a regression or discriminant analysis, @R{} will
give minimal output and store the results in a fit object for subsequent
interrogation by further @R{} functions.
@node R and the window system, Using R interactively, R and statistics, Introduction and preliminaries
@section R and the window system
The most convenient way to use @R{} is at a graphics workstation running
a windowing system. This guide is aimed at users who have this
facility. In particular we will occasionally refer to the use of @R{}
on an X window system although the vast bulk of what is said applies
generally to any implementation of the @R{} environment.
Most users will find it necessary to interact directly with the
operating system on their computer from time to time. In this guide, we
mainly discuss interaction with the operating system on UNIX machines.
If you are running @R{} under Windows you will need to make some small
adjustments.
Setting up a workstation to take full advantage of the customizable
features of @R{} is a straightforward if somewhat tedious procedure, and
will not be considered further here. Users in difficulty should seek
local expert help.
@node Using R interactively, Getting help, R and the window system, Introduction and preliminaries
@section Using R interactively
When you use the @R{} program it issues a prompt when it expects input
commands. The default prompt is @samp{@code{>}}, which on UNIX might be
the same as the shell prompt, and so it may appear that nothing is
happening. However, as we shall see, it is easy to change to a
different @R{} prompt if you wish. We will assume that the UNIX shell
prompt is @samp{@code{$}}.
In using @R{} under UNIX the suggested procedure for the first occasion
is as follows:
@enumerate
@item
Create a separate sub-directory, say @file{work}, to hold data files on
which you will use @R{} for this problem. This will be the working
directory whenever you use @R{} for this particular problem.
@example
$ mkdir work
$ cd work
@end example
@item
Start the @R{} program with the command
@example
$ R
@end example
@item
At this point @R{} commands may be issued (see later).
@item
To quit the @R{} program the command is
@example
> q()
@end example
At this point you will be asked whether you want to save the data from
your @R{} session. You can respond @kbd{yes}, @kbd{no} or @kbd{cancel}
(a single letter abbreviation will do) to save the data before quitting,
quit without saving, or return to the @R{} session. Data which is saved
will be available in future @R{} sessions.
@end enumerate
Further @R{} sessions are simple.
@enumerate
@item
Make @file{work} the working directory and start the program as before:
@example
$ cd work
$ R
@end example
@item
Use the @R{} program, terminating with the @code{q()} command at the end
of the session.
@end enumerate
To use @R{} under Windows the procedure to
follow is basically the same. Create a folder as the working directory,
and set that in the @file{Start In} field in your @R{} shortcut.
Then launch @R{} by double clicking on the icon.
@section An introductory session
Readers wishing to get a feel for @R{} at a computer before proceeding
are strongly advised to work through the introductory session
given in @ref{A sample session}.
@node Getting help, R commands; case sensitivity etc, Using R interactively, Introduction and preliminaries
@section Getting help with functions and features
@findex help
@R{} has an inbuilt help facility similar to the @code{man} facility of
UNIX. To get more information on any specific named function, for
example @code{solve}, the command is
@example
> help(solve)
@end example
@findex help
An alternative is
@example
> ?solve
@end example
@findex ?
For a feature specified by special characters, the argument must be
enclosed in double or single quotes, making it a ``character string''
(this is also necessary for a few words with syntactic meaning, including
@code{if}, @code{for} and @code{function}):
@example
> help("[[")
@end example
Either form of quote mark may be used to escape the other, as in the
string @code{"It's important"}. Our convention is to use
double quote marks for preference.
On most @R{} installations help is available in @HTML{} format by running
@findex help.start
@example
> help.start()
@end example
@noindent
which will launch a Web browser (@code{netscape} on UNIX) that allows
the help pages to be browsed with hyperlinks. On UNIX, subsequent help
requests are sent to the @HTML{}-based help system. The `Search Engine
and Keywords' link in the page loaded by @code{help.start()} is
particularly useful as it contains a high-level concept list which
searches through available functions. It can be a great way to get your
bearings quickly and to understand the breadth of what @R{} has to
offer.
@findex help.search
The @code{help.search} command allows searching for help in various
ways: try @code{?help.search} for details and examples.
The examples on a help topic can normally be run by
@findex example
@example
> example(@var{topic})
@end example
Windows versions of @R{} have other optional help systems: use
@example
> ?help
@end example
@noindent
for further details.
@node R commands; case sensitivity etc, Recall and correction of previous commands, Getting help, Introduction and preliminaries
@section R commands, case sensitivity, etc.
Technically @R{} is an @emph{expression language} with a very simple
syntax. It is @emph{case sensitive} as are most UNIX based packages, so
@code{A} and @code{a} are different symbols and would refer to different
variables. The set of symbols which can be used in @R{} names depends
on the operating system and country within which @R{} is being run
(technically on the @emph{locale} in use). Normally all alphanumeric
symbols are allowed (and in some countries this includes accented
letters) plus @samp{@code{.}}@footnote{C programmers should note that
@samp{@code{_}} is not available, but @samp{@code{.}} is and is often
used to separate words in @R{} names.}, with the restriction that a name
cannot start with a digit.
Elementary commands consist of either @emph{expressions} or
@emph{assignments}. If an expression is given as a command, it is
evaluated, printed, and the value is lost. An assignment also evaluates
an expression and passes the value to a variable but the result is not
automatically printed.
Commands are separated either by a semi-colon (@samp{@code{;}}), or by a
newline. Elementary commands can be grouped together into one compound
expression by braces (@samp{@code{@{}} and @samp{@code{@}}}).
@emph{Comments} can be put almost@footnote{@strong{not} inside strings,
nor within the argument list of a function definition} anywhere.
A comment starts with a hashmark (@samp{@code{#}}) and extends to the
end of the line.
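As a minimal sketch of the points above, the following lines could be
typed at the prompt. The names @code{w}, @code{u}, @code{v}, @code{A}
and @code{a} are ours and are chosen purely for illustration.

@example
# An expression: it is evaluated and printed, and the value is then lost.
3 + 4
# An assignment: the value is stored in w, but nothing is printed.
w <- 3 + 4
# Two commands on one line, separated by a semi-colon.
u <- 1; v <- u + 1
# Names are case sensitive, so A and a are different variables.
A <- "upper"; a <- "lower"
@end example

@noindent
Typing @code{w} or @code{v} on its own afterwards prints the stored
value.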
If a command is not complete at the end of a line, @R{} will
give a different prompt, by default
@smallexample
+
@end smallexample
@noindent
on second and subsequent lines and continue to read input until the
command is syntactically complete. This prompt may be changed by the
user. We will generally omit the continuation prompt
and indicate continuation by simple indenting.
@node Recall and correction of previous commands, Executing commands from or diverting output to a file, R commands; case sensitivity etc, Introduction and preliminaries
@section Recall and correction of previous commands
Under many versions of UNIX and on Windows, @R{} provides a mechanism
for recalling and re-executing previous commands. The vertical arrow
keys on the keyboard can be used to scroll forward and backward through
a @emph{command history}. Once a command is located in this way, the
cursor can be moved within the command using the horizontal arrow keys,
and characters can be removed with the @key{DEL} key or added with the
other keys. More details are provided later: @pxref{The command line
editor}.
The recall and editing capabilities under UNIX are highly customizable.
You can find out how to do this by reading the manual entry for the
@strong{readline} library.
Alternatively, the Emacs text editor provides more general support
mechanisms (via @acronym{ESS}, @emph{Emacs Speaks Statistics}) for
working interactively with @R{}. @xref{R and Emacs, , , R-FAQ, The R
statistical system FAQ}.
@node Executing commands from or diverting output to a file, Data permanency and removing objects, Recall and correction of previous commands, Introduction and preliminaries
@section Executing commands from or diverting output to a file
@cindex Diverting input and output
If commands are stored on an external file, say @file{commands.R} in the
working directory @file{work}, they may be executed at any time in an
@R{} session with the command
@example
> source("commands.R")
@end example
@findex source
For Windows @strong{Source} is also available on the
@strong{File} menu. The function @code{sink},
@example
> sink("record.lis")
@end example
@findex sink
@noindent
will divert all subsequent output from the console to an external file,
@file{record.lis}. The command
@example
> sink()
@end example
@noindent
restores it to the console once again.
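The following sketch exercises both functions together. It writes to
temporary files obtained from @code{tempfile()} so that nothing in the
working directory is touched; the names @code{cmd.file},
@code{out.file} and @code{z} are illustrative only.

@example
# Write a one-line command file and execute it.
cmd.file <- tempfile()
writeLines("z <- 6 * 7", cmd.file)
source(cmd.file)
# Divert output to a file, print something, then restore the console.
out.file <- tempfile()
sink(out.file)
print(z)
sink()
# The diverted output can now be read back from the file.
readLines(out.file)
@end example

@noindent
Forgetting the final @code{sink()} is a common slip: output keeps going
to the file and the console appears to be silent.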
@node Data permanency and removing objects, , Executing commands from or diverting output to a file, Introduction and preliminaries
@section Data permanency and removing objects
The entities that @R{} creates and manipulates are known as
@emph{objects}. These may be variables, arrays of numbers, character
strings, functions, or more general structures built from such
components.
During an @R{} session, objects are created and stored by name (we
discuss this process in the next section). The @R{} command
@example
> objects()
@end example
@noindent
(alternatively, @code{ls()}) can be used to display the names of the
objects which are currently stored within @R{}. The collection of
objects currently stored is called the @emph{workspace}.
@cindex Workspace
To remove objects the function @code{rm} is available:
@example
> rm(x, y, z, ink, junk, temp, foo, bar)
@end example
@findex rm
@cindex Removing objects
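A short sketch of creating, listing and removing objects follows. It
reuses two of the names from the @code{rm} call above; the values
assigned to them are arbitrary, and @code{%in%} is used only as a
convenient way to check whether a name appears in the listing.

@example
# Create two objects and confirm that they are in the workspace.
ink <- 1:3
junk <- "scratch"
c("ink", "junk") %in% objects()
# Remove them and check again.
rm(ink, junk)
c("ink", "junk") %in% objects()
@end example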
All objects created during an @R{} session can be stored permanently in
a file for use in future @R{} sessions. At the end of each @R{} session
you are given the opportunity to save all the currently available
objects. If you indicate that you want to do this, the objects are
written to a file called @file{.RData}@footnote{The leading ``dot'' in
this file name makes it @emph{invisible} in normal file listings in
UNIX.} in the current directory.
When @R{} is started at a later time it reloads the workspace from this
file. At the same time the associated command history is reloaded.
It is recommended that you should use separate working directories for
analyses conducted with @R{}. It is quite common for objects with names
@code{x} and @code{y} to be created during an analysis. Names like this
are often meaningful in the context of a single analysis, but it can be
quite hard to decide what they might be when several analyses have
been conducted in the same directory.
@node Simple manipulations numbers and vectors, Objects, Introduction and preliminaries, Top
@chapter Simple manipulations; numbers and vectors
@cindex Vectors
@menu
* Vectors and assignment::
* Vector arithmetic::
* Generating regular sequences::
* Logical vectors::
* Missing values::
* Character vectors::
* Index vectors::
* Other types of objects::
@end menu
@node Vectors and assignment, Vector arithmetic, Simple manipulations numbers and vectors, Simple manipulations numbers and vectors
@section Vectors and assignment
@R{} operates on named @emph{data structures}. The simplest such
structure is the numeric @emph{vector}, which is a single entity
consisting of an ordered collection of numbers. To set up a vector
named @code{x}, say, consisting of five numbers, namely 10.4, 5.6, 3.1,
6.4 and 21.7, use the @R{} command
@example
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
@end example
@findex c
@findex vector
This is an @emph{assignment} statement using the @emph{function}
@code{c()} which in this context can take an arbitrary number of vector
@emph{arguments} and whose value is a vector got by concatenating its
arguments end to end.@footnote{With other than vector types of argument,
such as @code{list} mode arguments, the action of @code{c()} is rather
different. See @ref{Concatenating lists}.}
A number occurring by itself in an expression is taken as a vector of
length one.
Notice that the assignment operator (@samp{@code{<-}}) is @strong{not}
the usual @samp{@code{=}} operator, which is reserved for another
purpose. It consists of the two characters @samp{@code{<}} (``less
than'') and @samp{@code{-}} (``minus'') occurring strictly side-by-side
and it `points' to the object receiving the value of the expression.
@c In this text, the assignment operator is printed as @samp{<-}, rather
@c than ``@code{<-}''.
@footnote{The underscore character, @samp{@code{_}} is an allowable
synonym for the left pointing assignment operator @samp{@code{<-}},
however we discourage this option, as it can easily lead to much less
readable code.}
@cindex Assignment
Assignment can also be made using the function @code{assign()}. An
equivalent way of making the same assignment as above is with:
@example
> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
@end example
@noindent
The usual operator, @code{<-}, can be thought of as a syntactic
short-cut to this.
Assignments can also be made in the other direction, using the obvious
change in the assignment operator. So the same assignment could be made
using
@example
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
@end example
If an expression is used as a complete command, the value is printed
@emph{and lost}@footnote{Actually, it is still available as
@code{.Last.value} before any other statements are executed}. So now if we
were to use the command
@example
> 1/x
@end example
@noindent
the reciprocals of the five values would be printed at the terminal (and
the value of @code{x}, of course, unchanged).
The further assignment
@example
> y <- c(x, 0, x)
@end example
@noindent
would create a vector @code{y} with 11 entries consisting of two copies
of @code{x} with a zero in the middle place.
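The three forms of assignment described in this section can be compared
directly. In this sketch the names @code{x2} and @code{x3} are ours,
introduced only so that the results can be checked against @code{x}.

@example
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
assign("x2", c(10.4, 5.6, 3.1, 6.4, 21.7))
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x3
# All three hold the same vector.
identical(x, x2)
identical(x, x3)
# Two copies of x with a zero between them give 5 + 1 + 5 entries.
y <- c(x, 0, x)
length(y)
@end example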
@node Vector arithmetic, Generating regular sequences, Vectors and assignment, Simple manipulations numbers and vectors
@section Vector arithmetic
Vectors can be used in arithmetic expressions, in which case the
operations are performed element by element. Vectors occurring in the
same expression need not all be of the same length. If they are not,
the value of the expression is a vector with the same length as the
longest vector which occurs in the expression. Shorter vectors in the
expression are @emph{recycled} as often as need be (perhaps
fractionally) until they match the length of the longest vector. In
particular a constant is simply repeated. So with the above assignments
the command
@cindex Recycling rule
@example
> v <- 2*x + y + 1
@end example
@noindent
generates a new vector @code{v} of length 11 constructed by adding
together, element by element, @code{2*x} repeated 2.2 times, @code{y}
repeated just once, and @code{1} repeated 11 times.
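Recycling is easiest to see with small whole numbers, as in the sketch
below. Note that current versions of @R{} issue a warning when the
length of the longer vector is not a multiple of the length of the
shorter one, as happens with @code{2*x} and @code{y}; the call to
@code{suppressWarnings()} is our addition and serves only to keep the
example quiet.

@example
# The shorter vector is recycled: c(10, 20) is used twice here.
c(1, 2, 3, 4) + c(10, 20)     # 11 22 13 24
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- c(x, 0, x)
# 2*x has length 5 and y has length 11, so 2*x is recycled 2.2 times.
v <- suppressWarnings(2*x + y + 1)
length(v)                     # 11
@end example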
@cindex Arithmetic functions and operators
The elementary arithmetic operators are the usual @code{+}, @code{-},
@code{*}, @code{/} and @code{^} for raising to a power.
@findex +
@findex -
@findex *
@findex /
@findex ^
In addition all of the common arithmetic functions are available.
@code{log}, @code{exp}, @code{sin}, @code{cos}, @code{tan}, @code{sqrt},
and so on, all have their usual meaning.
@findex log
@findex exp
@findex sin
@findex cos
@findex tan
@findex sqrt
@code{max} and @code{min} select the largest and smallest elements of a
vector respectively.
@findex max
@findex min
@code{range} is a function whose value is a vector of length two, namely
@code{c(min(x), max(x))}.
@findex range
@code{length(x)} is the number of elements in @code{x},
@findex length
@code{sum(x)} gives the total of the elements in @code{x},
@findex sum
and @code{prod(x)} their product.
@findex prod
Two statistical functions are @code{mean(x)} which calculates the sample
mean, which is the same as @code{sum(x)/length(x)},
@findex mean
and @code{var(x)} which gives
@example
sum((x-mean(x))^2)/(length(x)-1)
@end example
@findex var
@noindent
or sample variance. If the argument to @code{var()} is an
@math{n}-by-@math{p} matrix the value is a @math{p}-by-@math{p} sample
covariance matrix got by regarding the rows as independent
@math{p}-variate sample vectors.
@code{sort(x)} returns a vector of the same size as @code{x} with the
elements arranged in increasing order; however there are other more
flexible sorting facilities available (see @code{order()} or
@code{sort.list()} which produce a permutation to do the sorting).
@findex sort
@findex order
Note that @code{max} and @code{min} select the largest and smallest
values in their arguments, even if they are given several vectors. The
@emph{parallel} maximum and minimum functions @code{pmax} and
@code{pmin} return a vector (of length equal to their longest argument)
that contains in each element the largest (smallest) element in that
position in any of the input vectors.
@findex pmax
@findex pmin
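The identities quoted in this section can be checked on the vector
@code{x} used earlier. This is only a sketch; @code{all.equal()} is
used rather than @code{==} because the comparisons involve floating
point arithmetic.

@example
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
range(x)                       # 3.1 21.7
all.equal(mean(x), sum(x)/length(x))
all.equal(var(x), sum((x-mean(x))^2)/(length(x)-1))
sort(x)                        # 3.1 5.6 6.4 10.4 21.7
# max looks across all its arguments; pmax works position by position.
max(c(1, 5), c(3, 2))          # 5
pmax(c(1, 5), c(3, 2))         # 3 5
pmin(c(1, 5), c(3, 2))         # 1 2
@end example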
For most purposes the user will not be concerned if the ``numbers'' in a
numeric vector are integers, reals or even complex. Internally
calculations are done as double precision real numbers, or double
precision complex numbers if the input data are complex.
To work with complex numbers, supply an explicit complex part. Thus
@example
sqrt(-17)
@end example
@noindent
will give @code{NaN} and a warning, but
@example
sqrt(-17+0i)
@end example
@noindent
will do the computations as complex numbers.
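The difference can be checked as follows. In this sketch the names
@code{r} and @code{z} are ours, and @code{suppressWarnings()} is used
only to silence the warning from the real-valued call.

@example
# With a real argument the result is NaN.
r <- suppressWarnings(sqrt(-17))
is.nan(r)
# With a complex argument the square root is purely imaginary.
z <- sqrt(-17+0i)
all.equal(Im(z), sqrt(17))
# Squaring it recovers the original value, up to rounding error.
all.equal(z^2, -17+0i)
@end example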
@menu
* Generating regular sequences::
@end menu
@node Generating regular sequences, Logical vectors, Vector arithmetic, Simple manipulations numbers and vectors
@section Generating regular sequences
@cindex Regular sequences
@R{} has a number of facilities for generating commonly used sequences
of numbers. For example @code{1:30} is the vector @code{c(1, 2,
@dots{}, 29, 30)}.
@c <NOTE>
@c Info cannot handle ':' as an index entry.
@ifnotinfo
@findex :
@end ifnotinfo
@c </NOTE>
The colon operator has high priority within an expression, so, for
example @code{2*1:15} is the vector @code{c(2, 4, @dots{}, 28, 30)}.
Put @code{n <- 10} and compare the sequences @code{1:n-1} and
@code{1:(n-1)}.
The construction @code{30:1} may be used to generate a sequence
backwards.
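The comparison suggested above works out as follows; the results noted
in the comments follow from the colon being applied before the
multiplication or subtraction.

@example
n <- 10
1:n-1        # (1:n) - 1, that is 0 1 2 ... 9
1:(n-1)      # 1 2 ... 9
2*1:15       # 2 4 ... 28 30
30:1         # 30 29 ... 2 1
@end example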
@findex seq
The function @code{seq()} is a more general facility for generating
sequences. It has five arguments, only some of which may be specified
in any one call. The first two arguments, if given, specify the
beginning and end of the sequence, and if these are the only two
arguments given the result is the same as the colon operator. That is
@code{seq(2,10)} is the same vector as @code{2:10}.
Parameters to @code{seq()}, and to many other @R{} functions, can also
be given in named form, in which case the order in which they appear is
irrelevant. The first two parameters may be named
@code{from=@var{value}} and @code{to=@var{value}}; thus
@code{seq(1,30)}, @code{seq(from=1, to=30)} and @code{seq(to=30,
from=1)} are all the same as @code{1:30}. The next two parameters to
@code{seq()} may be named @code{by=@var{value}} and
@code{length=@var{value}}, which specify a step size and a length for
the sequence respectively. If neither of these is given, the default
@code{by=1} is assumed.
For example
@example
> seq(-5, 5, by=.2) -> s3
@end example
@noindent
generates in @code{s3} the vector @code{c(-5.0, -4.8, -4.6, @dots{},
4.6, 4.8, 5.0)}. Similarly
@example
> s4 <- seq(length=51, from=-5, by=.2)
@end example
@noindent
generates the same vector in @code{s4}.
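A smaller sketch of the same idea, using the illustrative names
@code{a} and @code{b} (not part of the text above), shows that a step
size combined with either an end point or a length determines the same
sequence:

@example
a <- seq(0, 1, by=0.25)              # from, to and a step size
b <- seq(from=0, by=0.25, length=5)  # from, a step size and a length
a                                    # 0.00 0.25 0.50 0.75 1.00
all.equal(a, b)                      # TRUE
@end example

@noindent
Here @code{length} is matched to the full argument name
@code{length.out} by partial matching.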
The fifth parameter may be named @code{along=@var{vector}}, which is
normally used as the only parameter, and creates the sequence @code{1, 2,
@dots{}, length(@var{vector})}, or the empty sequence if the vector is
empty (as it can be).
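The empty case is where @code{along} differs from the colon operator.
In the following sketch the vectors @code{x} and @code{e} are made up
for illustration:

@example
x <- c(10.4, 5.6, 3.1)
seq(along=x)       # 1 2 3
e <- numeric(0)    # an empty numeric vector
seq(along=e)       # integer(0), the empty sequence
1:length(e)        # 1 0, which is usually not what is wanted
@end example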
A related function is @code{rep()}
@findex rep
which can be used for replicating an object in various complicated ways.
The simplest form is
@example
> s5 <- rep(x, times=5)
@end example
@noindent
which will put five copies of @code{x} end-to-end in @code{s5}.
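With a concrete @code{x} (the values below are assumed only for this
sketch), the effect can be seen directly. The @code{each} argument of
@code{rep()} is not described above; it is shown here only for
contrast with @code{times}.

@example
x <- c(1, 2, 3)
s5 <- rep(x, times=5)   # 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
s6 <- rep(x, each=5)    # 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
length(s5)              # 15
@end example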
@node Logical vectors, Missing values, Generating regular sequences, Simple manipulations numbers and vectors
@section Logical vectors
As well as numerical vectors, @R{} allows manipulation of logical
quantities. The elements of a logical vector can have the values
@code{TRUE}, @code{FALSE}, and @code{NA} (for ``not available'', see
below). The first two are often abbreviated as @code{T} and @code{F},
respectively. Note however that @code{T} and @code{F} are just
variables which are set to @code{TRUE} and @code{FALSE} by default, but
are not reserved words and hence can be overwritten by the user. Hence,
you should always use @code{TRUE} and @code{FALSE}.
@findex FALSE
@findex TRUE
@findex F
@findex T
Logical vectors are generated by @emph{conditions}. For example
@example
> temp <- x > 13
@end example
@noindent
sets @code{temp} as a vector of the same length as @code{x} with values
@code{FALSE} corresponding to elements of @code{x} where the condition
is @emph{not} met and @code{TRUE} where it is.
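For instance, assuming @code{x} holds the five illustrative values
below, the condition yields one logical value per element:

@example
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
temp <- x > 13
temp             # FALSE FALSE FALSE FALSE  TRUE
length(temp)     # 5, the same as length(x)
@end example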
The logical operators are @code{<}, @code{<=}, @code{>}, @code{>=},
@code{==} for exact equality and @code{!=} for inequality.
@findex <
@findex <=
@findex >
@findex >=
@findex ==
@findex !=
In addition if @code{c1} and @code{c2} are logical expressions, then
@w{@code{c1 & c2}} is their intersection (@emph{``and''}), @w{@code{c1 | c2}}
is their union (@emph{``or''}), and @code{!c1} is the negation of
@code{c1}.
@findex !
@findex |
@findex &
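A short sketch of these operators acting elementwise, with @code{x},
@code{c1} and @code{c2} given illustrative values:

@example
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
c1 <- x > 5         #  TRUE  TRUE FALSE  TRUE  TRUE
c2 <- x < 10        # FALSE  TRUE  TRUE  TRUE FALSE
c1 & c2             # FALSE  TRUE FALSE  TRUE FALSE
c1 | c2             #  TRUE  TRUE  TRUE  TRUE  TRUE
!c1                 # FALSE FALSE  TRUE FALSE FALSE
@end example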
Logical vectors may be used in ordinary arithmetic, in which case they
are @emph{coerced} into numeric vectors, @code{FALSE} becoming @code{0}
and @code{TRUE} becoming @code{1}. However, there are situations where
logical vectors and their coerced numeric counterparts are not
equivalent; for an example, see the next section.
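One common use of this coercion is counting how many elements satisfy
a condition. A minimal sketch, with an illustrative @code{x}:

@example
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
sum(x > 5)      # 4, the number of elements greater than 5
mean(x > 5)     # 0.8, the proportion of such elements
(x > 5) + 0     # 1 1 0 1 1
@end example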
@node Missing values, Character vectors, Logical vectors, Simple manipulations numbers and vectors
@section Missing values
@cindex Missing values
In some cases the components of a vector may not be completely
known. When an element or value is ``not available'' or a ``missing
value'' in the statistical sense, a place within a vector may be
reserved for it by assigning it the special value @code{NA}.
@findex NA
In general any operation on an @code{NA} becomes an @code{NA}. The
motivation for this rule is simply that if the specification of an
operation is incomplete, the result cannot be known and hence is not
available.
@findex is.na
The function @code{is.na(x)} gives a logical vector of the same size as
@code{x} with value @code{TRUE} if and only if the corresponding element
in @code{x} is @code{NA}.
@example
> z <- c(1:3,NA); ind <- is.na(z)
@end example
Notice that the logical expression @code{x == NA} is quite different
from @code{is.na(x)} since @code{NA} is not really a value but a marker
for a quantity that is not available. Thus @code{x == NA} is a vector
of the same length as @code{x} @emph{all} of whose values are @code{NA}
as the logical expression itself is incomplete and hence undecidable.
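The difference can be checked directly, reusing the vector @code{z}
from the example above:

@example
z <- c(1:3, NA)
is.na(z)         # FALSE FALSE FALSE  TRUE
z == NA          # NA NA NA NA
z + 1            # 2 3 4 NA
sum(is.na(z))    # 1, the number of missing values
@end example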
Note that numerical computation produces a second kind of ``missing''
value, the so-called @emph{Not a Number},
@code{NaN}.
@findex NaN
Examples are
@example
> 0/0
@end example
@noindent
or
@example
> Inf - Inf
@end example
@noindent
which both give @code{NaN} since the result cannot be defined sensibly.
In summary, @code{is.na(xx)} is @code{TRUE} @emph{both} for @code{NA}
and @code{NaN} values. To differentiate these, @code{is.nan(xx)} is only
@code{TRUE} for @code{NaN}s.
@findex is.nan
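A short sketch of the distinction, using an illustrative vector
@code{xx} that contains one value of each kind:

@example
xx <- c(1, NA, 0/0)
is.na(xx)     # FALSE  TRUE  TRUE
is.nan(xx)    # FALSE FALSE  TRUE
@end example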
@node Character vectors, Index vectors, Missing values, Simple manipulations numbers and vectors
@section Character vectors
@cindex Character vectors
Character quantities and character vectors are used frequently in @R{},
for example as plot labels. Where needed they are denoted by a sequence
of characters delimited by the double quote character, e.g.,
@code{"x-values"}, @code{"New iteration results"}.
Character strings are entered using either double (@code{"}) or single
(@code{'}) quotes, but are printed using double quotes (or sometimes
without quotes). They use C-style escape sequences, using @code{\} as
the escape character, so a single backslash is entered and printed as
@code{\\}, and inside double quotes @code{"} is entered as @code{\"}. Other
useful escape sequences are @code{\n}, newline, @code{\t}, tab and
@code{\b}, backspace.
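The following sketch (the strings and the names @code{s} and @code{q}
are invented for illustration) shows that an escape sequence counts as
a single character, and that @code{print()} and @code{cat()} display
it differently:

@example
s <- "a\tb\\c"     # a, tab, b, one backslash, c
nchar(s)           # 5
print(s)           # shows "a\tb\\c"
cat(s, "\n")       # writes the tab and the single backslash literally
q <- 'say "hi"'    # single quotes avoid escaping the double quotes
q == "say \"hi\""  # TRUE
@end example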
Character vectors may be concatenated into a vector by the @code{c()}
function; examples of their use will emerge frequently.
@findex c
@findex paste
The @code{paste()} function takes an arbitrary number of arguments and
concatenates them one by one into character strings. Any numbers given
among the arguments are coerced into character strings in the evident
way, that is, in the same way they would be if they were printed. The
arguments are by default separated in the result by a single blank
character, but this can be changed by the named parameter,
@code{sep=@var{string}}, which changes it to @code{@var{string}},
possibly empty.
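A minimal sketch of the default separator and of @code{sep}, with
made-up arguments:

@example
paste("Run", 1, "of", 3)         # "Run 1 of 3"
paste("Run", 1, sep="-")         # "Run-1"
paste("a", "b", "c", sep="")     # "abc"
@end example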
For example
@example
> labs <- paste(c("X","Y"), 1:10, sep="")
@end example
@noindent
makes @code{labs} into the character vector
@example
c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")
@end example
Note particularly that recycling of short vectors takes place here too;