CHAPTER 2. AN INTRODUCTION TO R

http://ipsur.r-forge.r-project.org/book/download/R.exe

Use the downloaded shortcut to run R.

Steps 3 and 4 are not required but save you the trouble of navigating to the R-x.y.z/bin directory to double-click Rgui.exe every time you want to run the program. It is useless to create your own shortcut to Rgui.exe. Windows does not allow shortcuts to have relative paths; they always have a drive letter associated with them. So if you make your own shortcut and plug your USB drive into some other machine that happens to assign your drive a different letter, then your shortcut will no longer point to the right place.

2.1.2 Installing and Loading Add-on Packages

There are base packages (which come with R automatically) and contributed packages (which must be downloaded for installation). For example, on the version of R used for this document, the default base packages loaded at startup are

> getOption("defaultPackages")
[1] "datasets"  "utils"     "grDevices" "graphics"  "stats"     "methods"

The base packages are maintained by a select group of volunteers called “R Core”. In addition to the base packages, there are literally thousands of additional contributed packages written by individuals all over the world. These are stored worldwide on mirrors of the Comprehensive R Archive Network, or CRAN for short. Given an active Internet connection, anybody is free to download and install these packages and even inspect the source code.

To install a package named foo, open up R and type install.packages("foo"). To install foo and additionally install all of the other packages on which foo depends, instead type install.packages("foo", dependencies = TRUE).

The general command install.packages() will (on most operating systems) open a window containing a huge list of available packages; simply choose one or more to install.

No matter how many packages are installed onto the system, each one must first be loaded for use with the library function. For instance, the foreign package [18] contains all sorts of functions needed to import data sets into R from other software such as SPSS, SAS, etc. But none of those functions will be available until the command library(foreign) is issued.

Type library() at the command prompt (described below) to see a list of all available packages in your library.

For complete, precise information regarding installation of R and add-on packages, see the R Installation and Administration manual, http://cran.r-project.org/manuals.html.

2.2 Communicating with R

One line at a time

This is the most basic method and is the first one that beginners will use.

RGui (Microsoft® Windows)
Terminal
Emacs/ESS, XEmacs
JGR
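
As a concrete case of typing one line at a time, the package commands from Section 2.1.2 can be entered at any of the prompts listed above. What follows is only a sketch under stated assumptions: it assumes the foreign package is either already present or would be installed by hand, and the names defaults, pkg, and has_pkg are illustrative rather than taken from the text.

```r
# Packages attached at startup; the exact list may differ on your system.
defaults <- getOption("defaultPackages")
print(defaults)

# Illustrative package name (an assumption; substitute the one you need).
pkg <- "foreign"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise.
has_pkg <- requireNamespace(pkg, quietly = TRUE)

if (!has_pkg) {
  # Installation needs an Internet connection, so it is only suggested here.
  message("Not installed; try: install.packages(\"", pkg,
          "\", dependencies = TRUE)")
} else {
  # Attach the package so that its functions can be called by name.
  library(pkg, character.only = TRUE)
  # read.spss is one of the import functions provided by foreign.
  print(exists("read.spss"))
}
```

Checking with requireNamespace before calling install.packages avoids downloading a package that is already on the system; whether that check is worth the extra lines is a matter of taste.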

Multiple lines at a time

For longer programs (called scripts) there is too much code to write all at once at the command prompt. Furthermore, for longer scripts it is convenient to be able to modify only a certain piece of the script and run it again in R. Programs called script editors are specially designed to aid the communication and code writing process. They have all sorts of helpful features including R syntax highlighting, automatic code completion, delimiter matching, and dynamic help on the R functions as they are being written. Even more, they often have all of the text editing features of programs like Microsoft® Word. Lastly, most script editors are fully customizable in the sense that the user can choose what colors to display, when to display them, and how to display them.

R Editor (Windows): In Microsoft® Windows, RGui has its own built-in script editor, called R Editor. From the console window, select File ⊲ New Script. A script window opens, and the lines of code can be written in the window. When satisfied with the code, the user highlights all of the commands and presses Ctrl+R. The commands are automatically run at once in R and the output is shown. To save the script for later, click File ⊲ Save as... in R Editor. The script can be reopened later with File ⊲ Open Script... in RGui. Note that R Editor does not have the fancy syntax highlighting that the others do.

RWinEdt: This option is coordinated with WinEdt for LaTeX and has additional features such as code highlighting, remote sourcing, and a ton of other things. However, one first needs to download and install a shareware version of another program, WinEdt, which is only free for a while; pop-up windows will eventually appear that ask for a registration code. RWinEdt is nevertheless a very fine choice if you already own WinEdt or are planning to purchase it in the near future.

Tinn-R/Sciviews-K: This one is completely free and has all of the above mentioned options and more. It is simple enough that the user can begin working with it almost immediately after installation. But Tinn-R proper is only available for Microsoft® Windows operating systems. If you are on MacOS or Linux, a comparable alternative is Sci-Views - Komodo Edit.

Emacs/ESS: Emacs is an all purpose text editor. It can do absolutely anything with respect to modifying, searching, editing, and manipulating text. And if Emacs can’t do it, then you can write a program that extends Emacs to do it. One such extension is called ESS, which stands for Emacs Speaks Statistics. With ESS a person can speak to R, do all of the tricks that the other script editors offer, and much, much more. Please see the following for installation details, documentation, reference cards, and a whole lot more:

http://ess.r-project.org

Fair warning: if you want to try Emacs and if you grew up with Microsoft® Windows or Macintosh, then you are going to need to relearn everything you thought you knew about computers your whole life. (Or, since Emacs is completely customizable, you can reconfigure Emacs to behave the way you want.) I have personally experienced this transformation and I will never go back.

JGR (read “Jaguar”): This one has the bells and whistles of RGui plus it is based on Java, so it works on multiple operating systems. It has its own script editor like R Editor but with additional features such as syntax highlighting and code-completion. If you do not use Microsoft® Windows (or even if you do) you definitely want to check out this one.

Kate, Bluefish, etc. There are literally dozens of other text editors available, many of them free, and each has its own (dis)advantages. I have mentioned only the ones with which I have had substantial personal experience and have enjoyed at some point. Play around, and let me know what you find.

Graphical User Interfaces (GUIs)

By the word “GUI” I mean an interface in which the user communicates with R by pointing and clicking in a menu of some sort. Again, there are many, many options and I only mention ones that I have used and enjoyed. Some of the other more popular script editors can be downloaded from the R-Project website at

http://www.sciviews.org/_rgu

On the left side of the screen (under Projects) there are several choices available.

R Commander provides a point-and-click interface to many basic statistical tasks. It is called the “Commander” because every time one makes a selection from the menus, the code corresponding to the task is listed in the output window. One can take this code, copy-and-paste it to a text file, then re-run it at a later time without the R Commander’s assistance. It is well suited for the introductory level. Rcmdr also allows for user-contributed “Plugins”, which are separate packages on CRAN that add extra functionality to the Rcmdr package. The plugins are typically named with the prefix RcmdrPlugin to make them easy to identify in the CRAN package list. One such plugin is the RcmdrPlugin.IPSUR package which accompanies this text.

Poor Man’s GUI is an alternative to the Rcmdr which is based on GTk instead of Tcl/Tk. It has been a while since I used it but I remember liking it very much when I did. One thing that stood out was that the user could drag-and-drop data sets for plots. See here for more information: http://wiener.math.csi.cuny.edu/pmg/.

Rattle is a data mining toolkit which was designed to manage/analyze very large data sets, but it provides enough other general functionality to merit mention here. See [91] for more information.

Deducer is relatively new and shows promise from what I have seen, but I have not actually used it in the classroom yet.

2.3 Basic R Operations and Concepts

The R developers have written an introductory document entitled “An Introduction to R”. There is a sample session included which shows what basic interaction with R looks like. I recommend that all new users of R read that document, but bear in mind that there are concepts mentioned which will be unfamiliar to the beginner.

Below are some of the most basic operations that can be done with R. Almost every book about R begins with a section like the one below; look around to see all sorts of things that can be done at this most basic level.

2.3.1 Arithmetic

> 2 + 3   # add
[1] 5
> 4 * 5 / 6   # multiply and divide
[1] 3.333333
> 7^8   # 7 to the 8th power
[1] 5764801

Notice the comment character #. Anything typed after a # symbol on the same line is ignored by R. We know that 20/6 is a repeating decimal, but the above example shows only 7 digits. We can change the number of digits displayed with options:

> options(digits = 16)
> 10/3   # see more digits
[1] 3.333333333333333
> sqrt(2)   # square root
[1] 1.414213562373095
> exp(1)   # Euler's number, e
[1] 2.718281828459045
> pi
[1] 3.141592653589793
> options(digits = 7)   # back to default

Note that it is possible to set digits up to 22, but setting them over 16 is not recommended (the extra significant digits are not necessarily reliable). Above notice the sqrt function for square roots and the exp function for powers of e, Euler’s number.

2.3.2 Assignment, Object names, and Data types

It is often convenient to assign numbers and values to variables (objects) to be used later. The proper way to assign values to a variable is with the <- operator (with a space on either side). The = symbol works too, but it is recommended by the R masters to reserve = for specifying arguments to functions (discussed later). In this book we will follow their advice and use <- for assignment. Once a variable is assigned, its value can be printed by simply entering the variable name by itself.

> x <- 7*41/pi   # don't see the calculated value
> x   # take a look
[1] 91.35494

When choosing a variable name you can use letters, numbers, dots “.”, or underscore “_” characters. You cannot use mathematical operators, and a leading dot may not be followed by a number. Examples of valid names are: x, x1, y.value, and y_hat. (More precisely, the set of allowable characters in object names depends on one’s particular system and locale; see An Introduction to R for more discussion on this.)

Objects can be of many types, modes, and classes. At this level, it is not necessary to investigate all of the intricacies of the respective types, but there are some with which you need to become familiar:

integer: the values 0, ±1, ±2, ...; these are represented exactly by R.

double: real numbers (rational and irrational); these numbers are not represented exactly (save integers or fractions with a denominator that is a power of 2, see [85]).

character: elements that are wrapped with pairs of " or '.

logical: includes TRUE, FALSE, and NA (which are reserved words); the NA stands for “not available”, i.e., a missing value.

You can determine an object’s type with the typeof function. In addition to the above, there is the complex data type:

> sqrt(-1)   # isn't defined
[1] NaN
> sqrt(-1+0i)   # is defined
[1] 0+1i
> sqrt(as.complex(-1))   # same thing
[1] 0+1i
> (0 + 1i)^2   # should be -1
[1] -1+0i
> typeof((0 + 1i)^2)
[1] "complex"

Note that you can just type (1i)^2 to get the same answer. The NaN stands for “not a number”; it is represented internally as double.

2.3.3 Vectors

All of this time we have been manipulating vectors of length 1. Now let us move to vectors with multiple entries.

Entering data vectors

1. c: If you would like to enter the data 74,31,95,61,76,34,23,54,96 into R, you may create a data vector with the c function (which is short for concatenate).

> x <- c(74, 31, 95, 61, 76, 34, 23, 54, 96)
> x
[1] 74 31 95 61 76 34 23 54 96

The elements of a vector are usually coerced by R to the most general type of any of the elements, so if you do c(1, "2") then the result will be c("1", "2").
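
The type rules and the coercion behavior described above can be checked directly at the prompt. The following is a small sketch rather than an exhaustive tour; the L suffix for integer literals, the all.equal comparison, and the variable name mixed are additions for illustration and do not appear in the text above.

```r
# typeof() reports the storage type of each kind of value discussed above.
typeof(5L)        # "integer": the L suffix requests an integer literal
typeof(5)         # "double": plain numeric literals are doubles
typeof("five")    # "character"
typeof(TRUE)      # "logical"
typeof(NA)        # "logical": a bare NA is a logical missing value
typeof(NaN)       # "double"
typeof(1i)        # "complex"

# Doubles are exact for fractions whose denominator is a power of 2 ...
0.5 + 0.25 == 0.75                   # TRUE
# ... but generally not for other fractions.
0.1 + 0.2 == 0.3                     # FALSE
# Comparing with a tolerance is usually the safer test.
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE

# c() coerces mixed elements to the most general type present.
mixed <- c(1, "2")
mixed                   # "1" "2"
typeof(mixed)           # "character"
typeof(c(1L, 2.5))      # "double": integer and double give double
typeof(c(TRUE, 2L))     # "integer": logical and integer give integer
```

The coercion order seen here runs from logical to integer to double to character, which is why a single quoted element is enough to turn a whole numeric vector into character strings.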