# The R Base, Stats, and Graphics Packages

In [11]:
#https://vimeo.com/737664937
install.packages("vembedr")
library("vembedr")
embed_vimeo("737664937")




### Table of Contents <a class="anchor" id="DS102L1-WS.3_toc"></a>

* [Table of Contents](#DS102L1-WS.3_toc)
    * [Page 1 - The Base Package](#DS102L1-WS.3_page_1)
    * [Page 2 - Reserved Words](#DS102L1-WS.3_page_2)
    * [Page 3 - Built-in Constants](#DS102L1-WS.3_page_3)
    * [Page 4 - Trigonometric and Hyperbolic Functions](#DS102L1-WS.3_page_4)
    * [Page 5 - Exponential and Log Functions](#DS102L1-WS.3_page_5)
    * [Page 6 - Beta and Gamma Related Functions](#DS102L1-WS.3_page_6)
    * [Page 7 - Miscellaneous Mathematical Functions](#DS102L1-WS.3_page_7)
    * [Page 8 - Complex Numbers](#DS102L1-WS.3_page_8)
    * [Page 9 - Matrices, Arrays, and Data Frames](#DS102L1-WS.3_page_9)
    * [Page 10 - A Few Other Functions and Some Comments](#DS102L1-WS.3_page_10)
    * [Page 11 - The stats Package ](#DS102L1-WS.3_page_11)
    * [Page 12 - Some Functions That Do Tests ](#DS102L1-WS.3_page_12)
    * [Page 13 - Modeling Functions in stats ](#DS102L1-WS.3_page_13)
    * [Page 14 - Clustering Algorithms and Other Multivariate Techniques ](#DS102L1-WS.3_page_14)
    * [Page 15 - The graphics Package ](#DS102L1-WS.3_page_15)
   

Let's look at the base, stats, and graphics packages (three of the packages loaded by default in R). The base package contains things such as the trigonometric functions and other mathematical functions, many of the as. and is. functions, the arithmetic operators, the flow control statements, some apply functions, and many other basic functions in R.

The stats package contains many basic statistical functions, such as functions to find the median, the standard deviation, and the variance. It also includes the functions associated with common probability distributions as well as many more statistical functions. The graphics package contains the basic plotting functions (_except plot()_) and ancillary functions used by _plot() and other plotting functions_.

The other packages loaded by default are datasets, which contains data sets; utils, which contains utility functions; grDevices, which contains information used in plotting, such as fonts and colors; and methods, which contains functions and information for working with S4 (formal) methods and classes.

For a list of the functions in a package with clickable links to the function help pages, you can use the Packages tab in RStudio and select the package name or enter help(package=package.name) or library(help=package.name) at the R prompt, where package.name is the name of the package.
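As a rough sketch of how to inspect packages from the prompt rather than from the help pages, the following uses search() and ls(); it assumes a default R session in which the stats package is attached.

```r
# List the packages (and other environments) attached to the session.
attached <- search()
print(attached)

# ls() on an attached package returns the names of its exported objects.
stats_names <- ls("package:stats")
length(stats_names)   # the count varies with the R version
head(stats_names)

# Find out which package a function comes from.
environmentName(environment(median))   # "stats"
environmentName(environment(mean))     # "base"
```

The help(package = "stats") form described above opens the same listing as a help index instead of returning a character vector.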

The source of the information in this notebook is the [R help pages](https://www.r-project.org/help.html).

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 1 - The Base Package <a class="anchor" id="DS102L1-WS.3_page_1"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

The base package contains many functions basic to R. The list of links to the help pages for base is 30 pages long. This section covers the reserved words, the built-in constants, the trigonometric and hyperbolic functions, the exponential and log functions, the functions related to the beta and gamma functions, some other mathematical functions, and functions for complex numbers, matrix functions, and a few other functions. It also discusses some other functions in the base package.

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 2 - Reserved Words <a class="anchor" id="DS102L1-WS.3_page_2"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

The reserved words in R are if, else, repeat, while, for, function, next, break, in, TRUE, FALSE, Inf, NULL, NA, NaN, NA_integer_, NA_real_, NA_complex_, NA_character_, ..., ..1, ..2, and so forth.


For more information, enter __?Reserved__ at the R prompt or use the Help tab in RStudio.
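One way to see what "reserved" means in practice is to check whether R can even parse an assignment to a name. The sketch below uses a small helper, can_parse(), which is a hypothetical name introduced only for this illustration; it wraps parse() in tryCatch() so that the syntax error does not stop the session.

```r
# Hypothetical helper: TRUE if the code string parses, FALSE on a syntax error.
can_parse <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

can_parse("for <- 1")   # FALSE: 'for' is a reserved word
can_parse("pi <- 3")    # TRUE: 'pi' is a built-in constant, not a reserved word
```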

In [9]:
?Reserved

0,1
Reserved {base},R Documentation


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 3 - Built-in Constants <a class="anchor" id="DS102L1-WS.3_page_3"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

The built-in constants in R are LETTERS, which are the 26 letters in the English alphabet and which are capitalized; letters, which are the 26 letters in the English alphabet and which are lowercase; month.abb, which are three-letter abbreviations of the names of the months in English; month.name, which are the names of the months in English; and pi, the mathematical constant π. 

You can find more information about the constants by using the Help tab in RStudio or by entering __?Constants__ at the R prompt.
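A short illustration of the constants; they are ordinary vectors, so they can be indexed like any other vector.

```r
LETTERS[1:5]      # "A" "B" "C" "D" "E"
letters[24:26]    # "x" "y" "z"
month.abb[1:3]    # "Jan" "Feb" "Mar"
month.name[12]    # "December"
pi                # printed to 7 significant digits by default

# The two alphabets differ only in case.
identical(toupper(letters), LETTERS)   # TRUE
```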

In [10]:
?Constants

0,1
Constants {base},R Documentation


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 4 - Trigonometric and Hyperbolic Functions <a class="anchor" id="DS102L1-WS.3_page_4"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

[SOHCAHTOA](https://mathworld.wolfram.com/SOHCAHTOA.html#:~:text=%22SOHCAHTOA%22%20is%20a%20helpful%20mnemonic,Other%20mnemonics%20include)


<p style="text-align: center">
  <img  src="SOHCAHTOA.png" width="800" alt="SOHCAHTOA">
</p>

- $\cos^2 \theta + \sin^2 \theta = 1 $

- $\sin \theta = \frac{\text{opposite}}{\text{hypotenuse}}$

- $\cos \theta = \frac{\text{adjacent}}{\text{hypotenuse}}$

- $\tan \theta = \frac{\text{opposite}}{\text{adjacent}}$

The trigonometric and hyperbolic functions available in R are the cosine (cos()), the cosine for which the argument is multiplied by pi before the cosine is taken (cospi()), the sine (sin()), the sine for which the argument is multiplied by pi before the sine is taken (sinpi()), the tangent (tan()), the tangent for which the argument has been multiplied by pi before the tangent is taken (tanpi()), the inverse cosine (acos()), the inverse sine (asin()), two versions of the inverse tangent (atan() and atan2()), the hyperbolic cosine (cosh()), the hyperbolic sine (sinh()), the hyperbolic tangent (tanh()), the inverse hyperbolic cosine (acosh()), the inverse hyperbolic sine (asinh()), and the inverse hyperbolic tangent (atanh()).

Angles are passed to the functions in radians (an angle in radians equals the angle in degrees multiplied by pi divided by 180), except for cospi(), sinpi(), and tanpi(). For cospi(), sinpi(), and tanpi(), angles are entered as multiples of pi, that is, as fractions of a circle times two (e.g., one is equivalent to 180 degrees, since 180 degrees is one-half of a circle). For the inverse functions, the angles are returned in radians. (Note that the result in degrees equals 180 divided by pi times the result that is returned in radians.) The argument(s) to the functions must be of the logical, integer, double, or complex type, except that the arguments of cospi(), sinpi(), and tanpi() cannot be of the complex type. Values of the logical type are coerced to the integer type (see as.integer() in Chapter 4).
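The conversions can be sketched as follows; the variable names deg and rad are chosen only for this example.

```r
deg <- c(0, 30, 90, 180)
rad <- deg * pi / 180        # degrees to radians

sin(rad)                     # sine of 0, 30, 90, and 180 degrees
sin(pi)                      # tiny but not exactly zero, because pi is stored in floating point
sinpi(1)                     # exactly 0: the argument is multiplied by pi internally
cospi(0.5)                   # exactly 0

acos(0.5) * 180 / pi         # inverse cosine converted to degrees, approximately 60
```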

For the inverse cosine and sine, the values must be between minus one and one, inclusive. For other values, the result is NaN. For the inverse tangent, atan() takes one argument (which can be any object or expression of the logical, integer, double, or complex type), and the result falls between minus pi divided by two (-π/2, which is equivalently -90°) and pi divided by two (π/2, which is equivalently 90°).

The atan2() function takes two arguments. The function returns the inverse tangent of the ratio of the two arguments, with the first argument being the numerator and the second the denominator. Both arguments of the function take any object or expression of the logical, integer, double, or complex type (logical values are coerced to the integer type). The arguments can be of different lengths and will cycle.

The tangent of x, for any number x, is the sine of x divided by the cosine of x. Since the function atan2() finds the angles associated with numbers in both a numerator and a denominator, the angles can fall in any quadrant, rather than just between minus pi divided by two and pi divided by two (between -90° and 90°). It follows that the function returns angular results between minus pi (-π, or equivalently -180°) and pi (π, or equivalently 180°).

The quadrant of the angle depends on the signs on the arguments for the numerator and the denominator. (The quadrants start at the positive x axis and take up 90 degrees of arc. The quadrant number increases in the counterclockwise direction.) For the sign combinations, +/+ puts the angle in the first quadrant (0 to π/2), +/– puts the angle in the second quadrant (π/2 to π), –/– puts the angle in the third quadrant (-π to -π/2), and –/+ puts the angle in the fourth quadrant (-π/2 to 0). Also, zero in the denominator returns pi divided by two (π/2, or equivalently 90°) or minus pi divided by two (–π/2, or equivalently -90°), depending on the sign of the numerator.
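To make the quadrant rules concrete, this sketch compares atan2() with atan() on one point per quadrant; the vectors y (numerators) and x (denominators) are made up for the example.

```r
y <- c(1,  1, -1, -1)   # numerators
x <- c(1, -1, -1,  1)   # denominators

atan2(y, x) * 180 / pi  # 45, 135, -135, -45: one angle per quadrant
atan(y / x) * 180 / pi  # 45, -45, 45, -45: the quadrant information is lost

# A zero denominator gives plus or minus 90 degrees, following the sign of the numerator.
atan2(c(1, -1), 0) * 180 / pi
```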

The hyperbolic functions can also take any object or expression of the logical, integer, double, or complex type (logical values are coerced to the integer type). For the inverse of the hyperbolic functions, any values in the argument for acosh() must be between one and plus infinity, inclusive; the argument of asinh() can be any object or expression of the logical, integer, double, or complex type; and the values in the argument of atanh() must be between minus one and one, inclusive. For acosh() and atanh(), illegal values return NaN and a warning.
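A brief sketch of the hyperbolic functions and their inverses; the value x = 1.5 is arbitrary, and suppressWarnings() is used only to keep the illegal-value example quiet.

```r
x <- 1.5
cosh(0)                                   # 1
sinh(0)                                   # 0
all.equal(cosh(x), (exp(x) + exp(-x)) / 2) # TRUE: the defining identity

acosh(cosh(x))                            # recovers 1.5 (up to rounding)
atanh(1)                                  # Inf at the boundary of the legal range

# Values outside the legal range return NaN (normally with a warning).
suppressWarnings(acosh(0.5))              # NaN
```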

Arguments to the trigonometric and hyperbolic functions can be vectors, matrices, data frames, or arrays. For arguments with more than one element, the operation is carried out elementwise. For atan2(), which takes two arguments, the arguments cycle. The functions return an object of the same dimension(s) as the argument(s) to the function.

You can find more information about the trigonometric and hyperbolic functions by using the Help tab in RStudio or by entering __?Trig__ and __?cosh__, respectively, at the R prompt.



In [3]:
?Trig

0,1
Trig {base},R Documentation

0,1
"x, y",numeric or complex vectors.


In [4]:
?cosh

0,1
Hyperbolic {base},R Documentation

0,1
x,a numeric or complex vector


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 5 - Exponential and Log Functions <a class="anchor" id="DS102L1-WS.3_page_5"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

There are two exponential functions in the base package: exp() and expm1(). There are five logarithmic functions: log(), logb(), log10(), log2(), and log1p().

The exp() function returns the value of Euler’s number (i.e., e) raised to the power of the argument of the function (i.e., $e^x$). The expm1() function returns Euler’s number raised to the value of the argument, from which quantity one is subtracted (i.e., $e^x - 1$), and it computes this accurately even when the argument is very close to zero. The expm1() function is useful for data where the smallest value is zero (e.g., count data). The exponential functions can find the result for any value of the logical, integer, double, or complex type (logical values are coerced to the integer type).
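A small sketch of the two exponential functions. The comparison at x = 1e-10 illustrates the point made on the R help page that expm1() stays accurate for arguments very close to zero; the variable names are chosen for the example.

```r
exp(1)             # Euler's number
exp(c(0, 1, 2))    # applied elementwise
exp(TRUE)          # logical TRUE is treated as 1

x <- 1e-10
naive    <- exp(x) - 1   # loses digits to floating-point cancellation
accurate <- expm1(x)     # keeps full precision
c(naive, accurate)
```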

The log() and logb() functions return the logarithms of the values in the first argument of the function. The base for the logarithm is set by the second argument. For both functions, the default base is Euler’s number (i.e., the functions return natural logarithms by default). The log() function is an S4 generic function as well as an S3 function. The logb() function is just an S3 function.

For the log10() and log2() functions, the base of the logarithm is ten and two, respectively (i.e., log10(x) is the value for which $10^{\log_{10}(x)}$ is x and log2(x) is the value for which $2^{\log_{2}(x)}$ is x). The preceding four functions find logarithms for nonnegative values of the integer or double type and for any values of the complex or logical type (logical values are coerced to the integer type). The legal types for the second argument are the same as the legal types for the first argument. The first and second arguments need not be the same length and will cycle.

The log1p() function returns the natural logarithm of values to which one has been added (i.e., log(x + 1)). Like the expm1() function, log1p() is useful for data where the smallest value is zero. The function takes a single argument of the logical, integer, or double type (logical values are coerced to the integer type). The integer and double values must be greater than or equal to minus one (i.e., ≥ −1). Unlike log(), log1p() has no argument for the base.

The value returned for the log of zero, one, or Inf varies with the value of the base. For the two functions for which the base can be assigned, the log of zero returns -Inf for all bases except zero and Inf. For zero or Inf, taking the log of zero returns NaN.

Taking the log of one returns zero for all bases except when the base is set to one. For one, taking the log of one returns NaN.

The log of Inf returns -Inf when the base is set to legal values less than one (i.e., 0 ≤ base < 1) and Inf for values of base greater than or equal to one, except when the value of base is Inf (i.e., when 1 ≤ base < Inf). When the base is set equal to Inf, the function returns NaN. All other legal values for the two arguments give results other than NaN.
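The following sketch runs the logarithm functions on a few ordinary values and on some of the edge cases described above. suppressWarnings() is an addition for this example, in case the NaN result triggers a warning.

```r
log(exp(3))              # natural log by default
log(100, base = 10)      # 2
log10(100)               # 2
log2(8)                  # 3
logb(8, base = 2)        # 3
log(8, base = c(2, 8))   # 3 and 1: the base argument cycles
log1p(0)                 # 0, i.e. log(0 + 1)

# Edge cases
log(0)                               # -Inf
log(Inf, base = 0.5)                 # -Inf, because the base is less than one
suppressWarnings(log(1, base = 1))   # NaN
```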

For more information about the functions, use the Help tab in RStudio or enter __?exp__ at the R prompt.

In [5]:
?exp

0,1
log {base},R Documentation

0,1
x,a numeric or complex vector.
base,a positive or complex number: the base with respect to which logarithms are computed. Defaults to e=exp(1).


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 6 - Beta and Gamma Related Functions <a class="anchor" id="DS102L1-WS.3_page_6"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## Beta and Gamma Related Functions

The functions related to the beta and gamma functions are beta(), lbeta(), gamma(), lgamma(), psigamma(), digamma(), trigamma(), choose(), lchoose(), factorial(), and lfactorial(). In R, these functions are called the Special functions. The arguments to these functions must be of the logical, integer, or double type (logical values are coerced to the integer type). Each function returns a result with the same dimensions as the argument. The elements of the arguments cycle.

The beta() and lbeta() functions take the a and b arguments and return the value of the beta function or the natural logarithm of the value of the beta function, respectively. Both a and b must be nonnegative. Negative numbers return NaN, with a warning.

The gamma(), lgamma(), psigamma(), digamma(), and trigamma() functions take the x argument and, for psigamma(), the deriv argument. The x argument can be any number except zero or the negative integers (for which NaNs are returned with a warning). The gamma() and lgamma() functions return the value of the gamma function and the natural logarithm of the absolute value of the gamma function, respectively. The digamma() function returns the value of the first derivative of the natural logarithm of the gamma function, while trigamma() returns the second derivative. The psigamma() function returns the derivative of the digamma function to the order given by deriv, which is the derivative of order deriv plus one of the natural logarithm of the gamma function. The deriv argument must be set to an integer greater than or equal to zero. Otherwise, NaNs are returned, with a warning. By default, deriv equals zero, so psigamma(x) equals digamma(x) and psigamma(x, 1) equals trigamma(x).

The choose() and lchoose() functions return binomial coefficients and the natural logarithms of the absolute values of binomial coefficients, respectively. The choose() function is the familiar “n choose k” if n is a positive integer and k is a nonnegative integer less than or equal to n.

Both functions take the n and k arguments. The n argument can be any object or expression of the logical, integer, or double type (logical values are coerced to the integer type). The k argument can be any object or expression of the logical, integer, or double type (logical values are coerced to the integer type and double-precision values are rounded to integers). The arguments need not be the same length and cycle out to the longer argument. If k contains numbers that are negative when rounded, the function returns 0 for the numbers.

The factorial() and lfactorial() functions return the factorial value and the natural logarithm of the absolute value of the factorial value, respectively. The factorial of a number is defined as gamma(x+1) for any value of x that is a real number. For x a positive integer, the factorial equals x factorial (i.e., (x)(x-1)(x-2)...(2)(1), denoted by x!).

The functions take one argument, x. The value of x can be any object or expression of the logical, integer, or double type (logical values are coerced to the integer type). For x equal to zero, factorial(x) equals one. Negative integers return NaNs with a warning.

You can find more information about the functions by using the Help tab in RStudio or by entering __?Special__ at the R prompt.

$B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$

$B(a, b) = \int_0^1 t^{a-1} (1-t)^{b-1} dt$
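As a numerical check of the two expressions for the beta function, the sketch below evaluates both and compares them with beta(). The values a = 2.5 and b = 4 are arbitrary, and integrate() comes from the stats package.

```r
a <- 2.5
b <- 4

beta_direct     <- beta(a, b)
beta_from_gamma <- gamma(a) * gamma(b) / gamma(a + b)
beta_integral   <- integrate(function(t) t^(a - 1) * (1 - t)^(b - 1),
                             lower = 0, upper = 1)$value
c(beta_direct, beta_from_gamma, beta_integral)   # three matching values

lbeta(a, b)              # natural log of beta(a, b)
gamma(5)                 # 24, the same as factorial(4)
factorial(4)             # 24
choose(5, 2)             # 10
digamma(1)               # minus the Euler-Mascheroni constant, about -0.5772
psigamma(1, deriv = 1)   # same value as trigamma(1)
```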

In [6]:
?Special

0,1
Special {base},R Documentation

0,1
"a, b",non-negative numeric vectors.
"x, n",numeric vectors.
"k, deriv",integer vectors.


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 7 - Miscellaneous Mathematical Functions <a class="anchor" id="DS102L1-WS.3_page_7"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## Miscellaneous Mathematical Functions

Some other mathematical functions include the following:

- sum() for the sum of the elements of an object
- prod() for the product of the elements of an object
- cumprod() for the cumulative product over an atomic object
- cumsum() for the cumulative sum over an atomic object
- mean() for the mean of the elements of an object
- range() for the range of the elements of an object
- rank() for the ranks of the elements of an object
- order() for indices giving the order of the elements of an object; with more than one object, the order of the first object, using the second object for ties, and so forth; used to reorder vectors, matrices, data frames, and arrays; x[order(x)] equals sort(x)
- sort() for sorting the elements of objects
- max() for the maximum of the elements in an object, can be character
- min() for the minimum of the elements in an object, can be character
- cummax() for the cumulative maximum over an atomic object
- cummin() for the cumulative minimum over an atomic object
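- pmax() for multiple vectors or matrices (will cycle); returns the parallel (elementwise) maxima, that is, the largest value at each position across the objects
- pmin() for multiple vectors or matrices (will cycle); returns the parallel (elementwise) minima, that is, the smallest value at each position across the objects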
- abs() for the absolute values of the elements of an object
- sign() for the signs of the elements of an object—returns 1 for positive numbers, –1 for negative numbers, and 0 for zeroes
- sqrt() for the square roots of the elements of an object
- ceiling() for rounding the elements of an object up to an integer
- floor() for rounding the elements of an object down to an integer
- trunc() for truncating the elements of an object toward zero (dropping the digits after the decimal point)
- zapsmall() for setting very small numbers to zero

Atomic vectors, matrices, arrays, and data frames of the legal types can be used for these functions. The results of these functions are various kinds of objects, depending on the function. For some of the functions, the result returns a property of the data. For the other functions, the function is applied elementwise, and the result has the same dimensions as the argument.
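A few of the functions above applied to one small vector, to show the difference between functions that summarize the data and functions that work elementwise. The vector x is made up for the example.

```r
x <- c(3, 1, 4, 1, 5)

# Functions that return a property of the data
sum(x)       # 14
prod(x)      # 60
range(x)     # 1 5

# Functions that return one value per element
cumsum(x)    # 3 4 8 9 14
cummax(x)    # 3 3 4 4 5
rank(x)      # 3.0 1.5 4.0 1.5 5.0 (ties share the average rank)
order(x)     # 2 4 1 3 5
x[order(x)]  # 1 1 3 4 5, the same as sort(x)
pmax(x, 2)   # 3 2 4 2 5 (the 2 cycles)

# Rounding behaviour on negative and positive values
v <- c(-1.7, 1.7)
floor(v)     # -2  1
ceiling(v)   # -1  2
trunc(v)     # -1  1
sign(c(-2, 0, 2))   # -1 0 1
```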

You can find more information about any of these functions by using the Help tab in RStudio or by entering __?function.name__ at the R prompt, where function.name is the name of the function. For instance:

In [7]:
?sum

0,1
sum {base},R Documentation

0,1
...,numeric or complex or logical vectors.
na.rm,logical. Should missing values (including NaN) be removed?


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 8 - Complex Numbers <a class="anchor" id="DS102L1-WS.3_page_8"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## Complex Numbers

The following functions are for complex numbers:

- Re(): The real part of a complex number
- Im(): The imaginary part of a complex number
- Arg(): The angle from the x axis in radians of the line between the origin and the complex number
- Mod(): The modulus of a complex number; equals the length of the line between the origin and the complex number
- Conj(): The complex conjugate of a complex number

The functions take objects or expressions of the logical, integer, double, or complex type for arguments. Values of the logical type are coerced to the integer type. The result has the same dimensions as the argument.

You can find more information about the complex functions by using the Help tab in RStudio or by entering ?complex at the R prompt.
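A minimal sketch of the complex-number functions on the value 3 + 4i, chosen so that the modulus works out to a whole number. Note that the base R function for the imaginary part is spelled Im().

```r
z <- complex(real = 3, imaginary = 4)   # 3+4i

Re(z)     # 3
Im(z)     # 4
Mod(z)    # 5, since sqrt(3^2 + 4^2) = 5
Arg(z)    # angle in radians, equal to atan2(4, 3)
Conj(z)   # 3-4i

# A complex number can also be built from its modulus and argument.
w <- complex(modulus = 2, argument = pi / 2)   # approximately 0+2i
sqrt(as.complex(-1))                           # 0+1i
```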

In [8]:
?complex

0,1
complex {base},R Documentation

0,1
length.out,"numeric. Desired length of the output vector, inputs being recycled as needed."
real,numeric vector.
imaginary,numeric vector.
modulus,numeric vector.
argument,numeric vector.
x,"an object, probably of mode complex."
z,"an object of mode complex, or one of a class for which a methods has been defined."
...,further arguments passed to or from other methods.


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 9 - Matrices, Arrays, and Data Frames <a class="anchor" id="DS102L1-WS.3_page_9"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## Matrices, Arrays, and Data Frames

There are functions for matrices, arrays, and data frames in the base package.

Some of the functions include the following:

- aperm(), which permutes an array.
- kronecker(), which returns the matrix or array that is the Kronecker product of two objects and where product is a specified function. The two objects can be vectors, matrices, and/or arrays. The dimensions of the result are the products of the dimensions of the two objects.
- append(), which appends elements to a vector (including lists and data frames) at a specified location on the vector. Lists and data frames return a list.
- col(), which returns a matrix of the same dimensions as the argument and which contains the column indices in the columns or a matrix of factors with each column one factor.
- row(), which returns a matrix of the same dimensions as the argument and which contains the row indices in the rows or a matrix of factors with each row one factor.
- slice.index(), which generalizes row() and col() to arrays; more than one dimension can be selected.
- colMeans(), which returns the means of the columns of a data frame or matrix or the means for given dimensions for an array—going from the first dimension to the specified dimension.
- rowMeans(), which returns the means of the rows of a data frame or matrix or the means over dimensions of an array—going from the specified dimension plus one to the last dimension.
- colSums(), which returns the sums of the columns of a data frame or matrix or the sums for an array—going from the first dimension to the specified dimension.
- rowSums(), which returns the sums of the rows of a data frame or matrix or the sums over dimensions of an array, going from the specified dimension plus one to the last dimension.
- rowsum(), which sums over rows of a matrix or data frame in groups set by the group variable.
- determinant(), which returns the modulus of the determinant (or the logarithm of the modulus) and the sign of the determinant.
- eigen(), which returns the eigenvalues and eigenvectors of a matrix.
- kappa(), which computes or estimates the condition number of a matrix.
- norm(), which returns the norm of a matrix calculated by the one, infinity, Frobenius, maximum modulus, or spectral (or 2) method.
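A short sketch of several of these functions on small matrices. The matrices m, m4, and s and the grouping vector are invented for the example.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix with columns (1,2), (3,4), (5,6)
colMeans(m)                  # 1.5 3.5 5.5
rowSums(m)                   # 9 12
row(m)                       # row index of every cell
col(m)                       # column index of every cell

# rowsum() adds up rows within groups.
m4 <- matrix(1:8, nrow = 4)
rs <- rowsum(m4, group = c("a", "b", "a", "b"))
rs                           # row "a" is 4 12, row "b" is 6 14

s <- matrix(c(2, 1, 1, 3), nrow = 2)
d <- determinant(s, logarithm = FALSE)
as.numeric(d$modulus) * d$sign   # 5, the same as det(s)
eigen(s)$values                  # eigenvalues, largest first
norm(s, type = "F")              # Frobenius norm, sqrt(15)
dim(kronecker(diag(2), s))       # 4 4
```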

Some functions used in model fitting are the following:
- backsolve(), which solves a matrix equation where the matrix on the left of the equation is upper triangular
- forwardsolve(), which solves a matrix equation where the matrix on the left of the equation is lower triangular
- chol(), the Cholesky decomposition of a square positive definite matrix
- chol2inv(), the inverse of a positive definite matrix using the Cholesky decomposition of the matrix
- qr(), the QR decomposition of a matrix
- svd(), a singular value decomposition of a matrix
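To show how the triangular solvers and the Cholesky decomposition fit together, this sketch solves a small linear system A x = b in two triangular steps and compares the answer with solve(). The matrix A and vector b are made up, with A chosen to be symmetric positive definite.

```r
A <- matrix(c(4, 2, 2, 3), nrow = 2)   # symmetric positive definite
b <- c(2, 1)

R <- chol(A)                    # upper triangular factor, t(R) %*% R equals A
y <- forwardsolve(t(R), b)      # step 1: solve t(R) %*% y = b (lower triangular)
x <- backsolve(R, y)            # step 2: solve R %*% x = y (upper triangular)
x                               # approximately 0.5 0.0

all.equal(x, solve(A, b))       # TRUE: matches the general solver
chol2inv(R)                     # inverse of A computed from its Cholesky factor
qr.solve(A, b)                  # the same system solved through the QR decomposition
svd(A)$d                        # singular values of A
```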



<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 10 - A Few Other Functions and Some Comments <a class="anchor" id="DS102L1-WS.3_page_10"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## A Few Other Functions and Some Comments

A few other functions that are often useful are:

__R.home(), R.Version(), dir(), getwd(), setwd(), system(), all.equal(), identical(), unique(), duplicated() (and anyDuplicated()), rle() (and inverse.rle()), jitter(), pretty(), cut(), rev(), hexmode(), margin.table(), prop.table(), try(), warning(), warnings(), suppressWarnings(), suppressMessages(), stop(), and gc()__.

For the functions, we will just describe what they do. You can find more information about the functions by using the Help tab in RStudio or by entering ?function.name at the R prompt, where function.name is the name of the function.

The following are the function descriptions:

- R.home() gives the full path to the directory containing the R program.
- R.Version() gives the R version and other information about the version.
- dir() returns the contents of a directory on the hard drive.
- getwd() returns the computer directory (folder) in which R is operating.
- setwd() sets the default computer directory (folder) in which R reads and saves files.
- system() runs a system command from inside R (the command is entered in quotes).
- all.equal() tests if two objects are nearly equal.
- identical() tests if two objects are identically equal.
- unique() returns the object with any duplicated elements removed. The function works on vectors, including vectors of the list type; for matrices and data frames, duplicated rows are removed.
- duplicated() and anyDuplicated() look for duplicates. For vectors, including lists, duplicated() returns a vector of the same length containing FALSE for elements that are not duplicated and for the first instance of elements that are duplicated. The function returns TRUE for the rest of the duplicates. For matrices and data frames, rows are compared. The function anyDuplicated() returns the index of the first duplicated element (or duplicated row for matrices and data frames), or zero if there are no duplicates.
- rle() (and inverse.rle()) returns the value and number of times the value is repeated (consecutively) in a vector, or reverses the process.
- jitter() adds a little jitter (noise) to the elements of numeric objects. The arguments to jitter() control how much jitter is added.
- pretty() takes any object that can be coerced to numeric and returns a vector of evenly spaced values close to a given length and similar to the values in the original object.
- cut() cuts a numeric vector into factors and returns a factor vector with the factor names in the place of the original elements. The object to be cut can be any object that can be coerced to vector but must be numeric. The break points and factor names can be assigned, but cut() creates break points and factor names from the break points by default.
- rev() reverses the order of the elements of an object and returns a vector. The object can be atomic or any type where reversing the order makes sense, like the list, expression, and call types.
- hexmode() returns the hexadecimal value of a number.
- margin.table() takes a logical, numeric, or complex object and returns margin sums for a margin in a table.
- prop.table() takes a logical, numeric, or complex object and returns the object divided by the sum of the elements in the object. Logical objects are coerced to numeric, and the real and imaginary parts of complex objects are treated separately.
- try() attempts to execute an expression or function and returns an error message or the result of the execution. Errors do not stop the program.
- warning() generates warning messages from within an expression or function.
- warnings() returns the warning messages if a program has run with warnings.
- suppressWarnings() suppresses warnings generated by an expression.
- suppressMessages() suppresses messages generated by an expression.
- stop() tells R to stop the execution of a function. If stop() has a character string for an argument, the character string prints when stop() executes. The function is very useful for the process of debugging a function as well as for checking if conditions are met for objects entered into a function.
- gc() runs garbage collection, which frees memory that is no longer in use in the session.
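A handful of the functions described above, run on small made-up inputs.

```r
x <- c("a", "a", "b", "c", "c", "c")
unique(x)          # "a" "b" "c"
duplicated(x)      # FALSE TRUE FALSE FALSE TRUE TRUE
anyDuplicated(x)   # 2, the index of the first duplicate

r <- rle(x)        # run lengths 2 1 3 for the values "a" "b" "c"
inverse.rle(r)     # rebuilds the original vector
rev(1:3)           # 3 2 1

# cut() turns numbers into a factor of intervals.
cut(c(1, 5, 9), breaks = c(0, 3, 6, 10))   # (0,3] (3,6] (6,10]

# Near equality versus exact equality for floating-point numbers
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
identical(0.1 + 0.2, 0.3)           # FALSE

# try() lets the script continue after an error.
res <- try(stop("bad input"), silent = TRUE)
inherits(res, "try-error")          # TRUE

suppressWarnings(as.numeric("abc")) # NA, with the coercion warning hidden
```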

There are many other functions in the base package, many of which have to do with the running of R. The as. and is. functions are prevalent. On the help page for the base package, there are 53 links for as. functions and 43 links for is. functions. The Bessel functions and bitwise logical functions are also part of the base package.

If you are interested in what is in the listings, select the link to the base package under the Packages tab in RStudio or enter help(package=base) at the R prompt.

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 11 - The stats Package <a class="anchor" id="DS102L1-WS.3_page_11"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## The stats Package

The stats package contains items such as basic descriptive statistics, probability distributions, tests, functions to fit models, clustering functions, some plotting functions, and other functions used for outputting results. The list of links to the help pages for the stats package is 18 pages long (enter help(package=stats) at the R prompt to see the list). In this section, we cover the basic descriptive statistics, the tests, clustering and other functions for multivariate data, and modeling functions, but in little detail. 

### Basic Descriptive Statistics
Some of the basic statistical functions in the stats package include the following (note that the mean() function is in the base package):

- weighted.mean(), which finds the weighted mean of an object
- sd(), which finds the standard deviation of an object
- var(), which finds the variance of a vector or the covariance matrix of a matrix or data frame
- cov(), which finds the covariance matrix of a matrix or data frame—more flexible than var()
- cov.wt(), which finds the weighted covariance or correlation matrix of a matrix or data frame
- cor(), which finds the correlation between vectors or within matrices and data frames
- cov2cor(), which converts a covariance matrix (or other symmetric positive definite matrix) to a correlation matrix
- median(), which finds the median of the elements of an object
- mad(), which finds the median absolute deviation of the elements of an object
- IQR(), which finds the interquartile range of the elements of an object
- quantile(), which finds specific quantiles of the elements in an object
- fivenum(), which finds Tukey’s five-number summary for the elements in an object (the summary() function also returns the five-number summary, along with the mean of the elements)
- boxplot.stats(), which returns a four-element list whose first element holds the statistics used in box plots (the function itself lives in the grDevices package)
- ave(), which applies a function (the mean by default) to subsets of a vector defined by the levels of one or more factors
- xtabs(), which creates a contingency table based on a formula
- cancor(), which finds the canonical correlation between two matrices
- dist(), which computes the matrix of distances between the rows of a matrix, using the chosen distance measure (and, for the Minkowski distance, the chosen power)
- mahalanobis(), which finds the squared Mahalanobis distance of each row of a matrix from a given center, with respect to a given covariance matrix
- r2dtable(), which creates a random two-way table based on marginal values—using Patefield’s algorithm
- simulate(), which simulates observations from a model that has been fitted

You can find more information about the functions by using the Help tab in RStudio or by entering ?function.name at the R prompt where function.name is the name of the function.
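
As a rough illustration of a few of the functions listed above, the following sketch runs them on a small made-up vector. The values of x, the weights, and the matrix m are assumptions chosen only to keep the arithmetic easy to follow.

```r
# Small illustrative vector (made-up values)
x <- c(2, 4, 4, 5, 7, 9, 11)

sd_x    <- sd(x)                      # sample standard deviation
var_x   <- var(x)                     # sample variance, equal to sd_x^2
med_x   <- median(x)                  # middle value of the sorted data
iqr_x   <- IQR(x)                     # distance between the quartiles
five_x  <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
quart_x <- quantile(x, probs = c(0.25, 0.75))

# Weighted mean: the last observation counts twice
wmean_x <- weighted.mean(x, w = c(1, 1, 1, 1, 1, 1, 2))

# Covariance and correlation for a two-column matrix
m     <- cbind(a = x, b = x^2)
cov_m <- cov(m)
cor_m <- cov2cor(cov_m)               # same result as cor(m)

# Squared Mahalanobis distance of each row from the column means
d2 <- mahalanobis(m, center = colMeans(m), cov = cov_m)
```

Printing any of these objects at the prompt (for example, five_x) shows the result.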

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 12 - Some Functions That Do Tests <a class="anchor" id="DS102L1-WS.3_page_12"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## Some Functions That Do Tests

There are functions in the stats package that do hypothesis tests. Some of the functions include the following:

- ansari.test() for the Ansari-Bradley test for testing for a difference between the scale parameters of two samples.
- bartlett.test() for Bartlett’s test of homogeneity of variances.
- binom.test() for exact tests using the binomial distribution.
- Box.test() for the Box-Pierce and Ljung-Box tests—used in time series to test for independence.
- chisq.test() for testing count data using Pearson’s test.
- cor.test() for correlations in paired samples.
- fisher.test() for contingency tables using Fisher’s exact test.
- fligner.test() for the Fligner-Killeen test for homogeneity of variances.
- friedman.test() for the Friedman rank sum test.
- kruskal.test() for the Kruskal-Wallis rank sum test.
- ks.test() for the Kolmogorov-Smirnov tests on one or two samples.
- mantelhaen.test() for the Cochran-Mantel-Haenszel chi-squared test for count data.
- mauchly.test() for the test of sphericity developed by Mauchly.
- mcnemar.test() for the chi-squared test for count data developed by McNemar.
- mood.test() for the two sample tests of scale developed by Mood.
- oneway.test() for testing for equal means if the layout is one way.
- pairwise.prop.test() for comparing proportions pairwise.
- pairwise.t.test() for pairwise t tests between group levels, with correction for multiple testing.
- pairwise.wilcox.test() for pairwise Wilcoxon rank sum tests between group levels, with correction for multiple testing.
- poisson.test() for an exact test using the Poisson distribution.
- power.anova.test() to find powers for a balanced one-way analysis of variance.
- power.prop.test() to find the powers for comparing two proportions.
- power.t.test() for the powers in one and two sample t tests.
- PP.test() for the Phillips-Perron test to test for unit roots in time series data.
- prop.test() for testing proportions.
- prop.trend.test() for testing trend in proportions.
- quade.test() for the Quade test.
- shapiro.test() for the Shapiro-Wilk test for normality.
- t.test() for one- and two-sample t tests.
- TukeyHSD() for confidence intervals on the differences between the means of the levels of a factor in an analysis of variance model, adjusted for the fact that more than one comparison is being made.
- var.test() for an F test to compare two variances.
- wilcox.test() for Wilcoxon rank sum and signed rank tests.

For more information about any of the tests, use the Help tab in RStudio or enter ?function.name at the R prompt, where function.name is the name of the function.
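
The test functions share a common pattern: each returns an object of class htest whose components (such as p.value and statistic) can be extracted by name. The sketch below shows this with simulated data; the seed, sample sizes, and the counts in tab are arbitrary choices made for illustration.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
a <- rnorm(30, mean = 0, sd = 1)     # simulated sample 1
b <- rnorm(30, mean = 1, sd = 1)     # simulated sample 2

tt <- t.test(a, b)                   # Welch two-sample t test by default
sw <- shapiro.test(a)                # Shapiro-Wilk test of normality
vt <- var.test(a, b)                 # F test comparing the two variances

# Exact binomial test: 7 successes in 10 trials against p = 0.5
bt <- binom.test(7, 10, p = 0.5)

# Pearson's chi-squared test on a made-up 2 x 2 table of counts
tab <- matrix(c(12, 8, 5, 15), nrow = 2)
ct  <- chisq.test(tab)

# Components are extracted by name
tt$p.value
ct$statistic
```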


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 13 - Modeling Functions in stats <a class="anchor" id="DS102L1-WS.3_page_13"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

## Some Modeling Functions in stats

There are functions in the stats package that do modeling, including the following:

Time Series:

- acf() to estimate autocorrelation and autocovariance in time series
- pacf() to estimate partial autocovariances and autocorrelations for a time series
- ccf() to estimate cross correlation and cross covariance for two time series
- acf2AR() to exactly fit an autoregressive model to an autocorrelation function
- ar() to fit a time series autoregressive model
- arima() to fit an autoregressive integrated moving average to time series data
- arima.sim() to do simulations from an ARIMA model
- cpgram() to plot a cumulative periodogram for time series data
- spectrum() to estimate the spectral density of a time series (using spec.pgram() or spec.ar())
- fft() for fast discrete Fourier transforms for time series data
- mvfft() for fast discrete Fourier transforms for matrices
- filter() for linear filtering of time series
- KalmanForecast(), KalmanLike(), KalmanRun(), KalmanSmooth(), and makeARIMA() for Kalman filtering
- decompose() to decompose seasonal patterns using moving average
- stl() to use the loess method to seasonally decompose a time series
- StructTS() to fit a structural time series model
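
A minimal sketch of how some of the time series functions fit together: simulate a series, estimate its autocorrelation, and fit a model. The AR coefficient of 0.6, the series length, and the seed are assumptions chosen for illustration.

```r
set.seed(42)                                    # arbitrary seed
y <- arima.sim(model = list(ar = 0.6), n = 200) # simulate an AR(1) series

a   <- acf(y, plot = FALSE)            # autocorrelations, without drawing the plot
fit <- arima(y, order = c(1, 0, 0))    # fit an AR(1) model to the simulated series

coef(fit)                              # estimated ar1 coefficient and intercept
```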

Models:
- aov() to fit an analysis of variance model
- approx() and approxfun() to do linear interpolation
- density() for kernel density estimation
- ecdf() for the empirical cumulative distribution function
- glm() to fit a generalized linear model
- isoreg() isotonic or monotone regression
- line() to fit a line robustly—based on Tukey’s Exploratory Data Analysis
- lm() to fit a linear model
- loess() to fit a local polynomial model
- loglin() to fit a loglinear model
- lsfit() to fit a linear model by least squares (the explanatory variables are supplied as a vector or a matrix)
- manova() to fit multivariate analysis of variance models
- medpolish() for a median polish of a matrix
- nls() to fit a nonlinear least squares model
- ppr() to fit a projection pursuit regression model
- smooth.spline() to fit a smooth spline model

Smoothers:

- ksmooth() to smooth using a kernel smoother
- smooth() which creates a smoother version of a noisy set of data using Tukey’s running median smoothers—usually used for time series
- supsmu() for Friedman’s super smoother

Tools:

- add1() and drop1() to find those single terms that can be added to or dropped from a model, fit the models, and tabulate the results of the fitting
- AIC() and BIC() to find Akaike’s information criterion or the Schwarz Bayesian criterion for a fitted model
- complete.cases() to find complete cases for a sequence of vectors, matrices, or data frames
- step() to use the AIC to choose a model using a stepwise algorithm
- update() for updating a model
- contrasts() to set or get contrasts for a factor object
- poly() and polym() to create orthogonal polynomials of the desired degree
- nlm() to minimize a nonlinear function using a Newton-type algorithm
- optim(), optimHess(), optimise(), and optimize() to optimize a function
- profile() to profile models (generic function)

There are many functions in the stats package that support the modeling functions. You can find more information at the help pages for the individual functions: either use the Help tab in RStudio or enter ?function.name at the R prompt where function.name is the name of the function.
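
As a brief sketch of how the modeling functions and tools combine, the example below fits two linear models and a logistic regression to the built-in mtcars data set and compares the linear models with AIC(). The choice of data set and variables is an assumption made for illustration.

```r
# Linear model: miles per gallon as a function of weight
fit1 <- lm(mpg ~ wt, data = mtcars)

# update() refits the model with horsepower added as a second predictor
fit2 <- update(fit1, . ~ . + hp)

# Compare the two fits; the smaller AIC is preferred
aic_table <- AIC(fit1, fit2)

# Generalized linear model: logistic regression of transmission type on weight
logit <- glm(am ~ wt, data = mtcars, family = binomial)

coef(fit2)
aic_table
```

Calling summary() on any of the fitted objects gives the usual table of coefficients and standard errors.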

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 14 - Clustering Algorithms and Other Multivariate Techniques<a class="anchor" id="DS102L1-WS.3_page_14"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

Some of the functions used in multivariate analysis for clustering and working with multivariate data are the following:

- cut.dendrogram() to cut a dendrogram at a given height into an upper part and a list of lower branches.
- dendrapply() to apply a function to all nodes of a dendrogram.
- as.dendrogram() to convert an appropriate object to class dendrogram.
- labels.dendrogram() gives the labels of the leaves of a dendrogram.
- merge.dendrogram() merges two dendrograms.
- order.dendrogram() gives the ordering of the leaves of a dendrogram.
- reorder.dendrogram() for reordering a dendrogram maintaining the initial constraints.
- rev.dendrogram() reverses the order of the nodes in a dendrogram.
- str.dendrogram() displays the internal structure of a dendrogram.
- cutree() for cutting a tree into groups.
- hclust() for hierarchical clustering.
- identify.hclust() to identify clusters.
- kmeans() for k-means clustering.
- prcomp() does principal component analysis.
- princomp() also does principal component analysis.
- cmdscale() for classical multidimensional scaling.
- cophenetic() for cophenetic distances in hierarchical clustering.
- factanal() for factor analysis.
- loadings() to extract or print the loadings from a factor analysis or principal component analysis.
- promax() used for rotation of axes in factor analysis.
- varimax() used for rotation of axes in factor analysis.

The stats package also contains several probability distributions; eight as. functions; six is. functions; a number of plotting functions (like heatmap()) and 19 plot. functions (which are specific for many of the classes associated with modeling functions); functions used in kernel estimation; ancillary functions for models (like the seven model. functions); seven na. functions (to handle missing data); 13 predict. functions (for model output); 28 print. functions (for printing output); and nine summary. functions (for summarizing output).

For more information about any of the functions, use the Help tab in RStudio or enter ?function.name at the R prompt where function.name is the name of the function.
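
The following sketch shows one way several of these functions might be used together on the built-in USArrests data set. The data set, the choice of four clusters, and the seed are assumptions made for illustration.

```r
x <- scale(USArrests)                 # standardize the four numeric columns

# Hierarchical clustering on Euclidean distances (complete linkage by default)
hc     <- hclust(dist(x))
groups <- cutree(hc, k = 4)           # cut the tree into four groups

# Convert to a dendrogram and query its leaves
dend        <- as.dendrogram(hc)
leaf_labels <- labels(dend)           # leaf labels, in plotting order
leaf_order  <- order.dendrogram(dend) # row indices of the leaves, in plotting order

# k-means with four centers; nstart tries several random starts
set.seed(123)                         # arbitrary seed
km <- kmeans(x, centers = 4, nstart = 10)

# Principal component analysis on the standardized variables
pc <- prcomp(USArrests, scale. = TRUE)

table(groups, km$cluster)             # cross-tabulate the two clusterings
```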

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 15 - The graphics Package <a class="anchor" id="DS102L1-WS.3_page_15"></a>

[Back to Top](#DS102L1-WS.3_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

The graphics package no longer contains the plot() generic function, which moved to the base package in R 4.0.0. The graphics package does contain several methods for plot(), along with the ancillary functions for plot(). There are also several plotting functions for specific types of plots (like histograms and bar charts). The list of links to the help pages for graphics is three pages long (from entering help(package=graphics) at the R prompt). In this section, we cover the specific types of plots and a few other functions related to plotting.

The following are the functions in the graphics package that do specific types of plots:

- assocplot() for a Cohen-Friendly association plot; used for contingency tables; will work with any matrix that is logical or numeric.
- barplot() for a bar plot; takes vector or matrix objects, which are of mode logical or numeric, for the heights of the bars.
- boxplot() for box plots; logical or numeric vectors, matrices, arrays, data frames, and some lists can be used as input to the function.
- bxp() for box plots of summaries.
- cdplot() for a conditional density plot.
- coplot() for scatter plots using a conditioning variable.
- contour() and filled.contour() for a contour plot and a contour plot where the regions between the contours are filled with different colors.
- curve() for plotting a function (or an expression in x) of one variable over an interval.
- dotchart() for a Cleveland dot plot; numeric vectors and matrices can be used for the plot.
- fourfoldplot() for a fourfold plot of 2 x 2 x k contingency tables.
- hist() for histograms; gives histograms for numeric vectors, matrices, and arrays.
- matplot() for plotting a vector or the columns of a matrix on a single plot.
- mosaicplot() for mosaic plots; takes numeric or logical arguments that are vectors, matrices, data frames, or arrays; is meant for contingency tables.
- pairs() for scatter plots of paired variables; takes numeric vectors, matrices, and data frames as input; creates a matrix of plots.
- persp() for a perspective plot; does three-dimensional plotting.
- pie() for pie charts; uses numeric vectors, matrices, and arrays as input.
- smoothScatter() for a smoothed version of a scatter plot, drawn as a colored density; the density estimate comes from the KernSmooth package, which is copyrighted by M. P. Wand.
- spineplot() for spine plots; uses a logical, numeric, or complex matrix as input to the plot; logical and complex matrices are coerced to numeric; was developed for two-way contingency tables.
- stars() for star or segment plots; uses a numeric matrix or data frame for the input to the plot.
- stem() for a stem and leaf plot; uses a numeric vector, matrix, or array as the input to the plot.
- stripchart() for a one-dimensional scatter plot.
- sunflowerplot() for a sunflower plot, which is a scatter plot in which points with duplicates have sunflower leaves for the duplicated points; uses a logical, numeric, or complex vector, matrix, or data frame for the input to the plot.

There are also some functions in the graphics package that control the screen for plotting functions. The split.screen() function and its ancillary functions, close.screen(), erase.screen(), and screen(), are used to split the plotting screen into regions and to plot to the regions. The frame() and plot.new() functions open a new frame for plotting.
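
A minimal sketch of splitting the screen into two regions and plotting to each one. The function that does the splitting is spelled split.screen(). The mtcars data set and the pdf(NULL) device are assumptions made for illustration; in an interactive session, drop the pdf(NULL) and dev.off() lines to draw on screen.

```r
pdf(NULL)                             # null device: nothing is written to disk

scr <- split.screen(c(1, 2))          # one row, two columns; returns the screen numbers

screen(scr[1])                        # make the left region active
hist(mtcars$mpg, main = "mpg")

screen(scr[2])                        # make the right region active
boxplot(mtcars$mpg, main = "mpg")

close.screen(all.screens = TRUE)      # leave split-screen mode
dev.off()                             # close the device
```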

The par() function in the graphics package is like options(), except that it sets and queries graphical parameters, which are used by many plotting functions. When a session starts, the parameters in par() are at their default values. To see the list of parameters, call par() with no arguments. The parameters can be changed at any time (in the same way that options are changed in options()). If no graphics device is open, calling par() opens one.
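
Because par() returns the previous values of any parameters it changes, a common pattern is to save that return value and restore it afterward. The sketch below assumes a null pdf device so that it runs without drawing anything; the parameter values are arbitrary.

```r
pdf(NULL)                                  # null device: nothing is written to disk

# Change two parameters; the old values are returned (invisibly) as a list
old_par   <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
new_mfrow <- par("mfrow")                  # query a single parameter

# ... plotting calls would go here ...

par(old_par)                               # restore the saved values
restored <- par("mfrow")

dev.off()                                  # close the device
```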

The function plot() (in the base package) is the basic plotting function in R and has a number of ancillary functions in the graphics package (like lines(), points(), and box()). Twelve methods for plot() are found in the graphics package. We do not cover plot() in this book.

You can find more information about the functions in the graphics package by using the Help tab in RStudio or by entering ?function.name at the R prompt where function.name is the name of the function.
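
Several of the plotting functions also return the values they computed, which can be useful for checking what was drawn. The sketch below uses the built-in mtcars data set and a null pdf device, both of which are assumptions made for illustration; drop the pdf(NULL) and dev.off() lines to see the plots on screen.

```r
pdf(NULL)                                      # null device: nothing is written to disk

h  <- hist(mtcars$mpg, main = "Miles per gallon")  # returns breaks, counts, and so on
bp <- barplot(table(mtcars$cyl))               # returns the bar midpoints
b  <- boxplot(mpg ~ cyl, data = mtcars)        # returns the box plot statistics
cv <- curve(sin, from = 0, to = 2 * pi)        # returns the x and y values plotted

dev.off()                                      # close the device

h$counts                                       # number of observations in each bin
b$n                                            # number of observations in each group
```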