2. scan: This method is useful when the data are stored somewhere else. For instance, you may type x <- scan() at the command prompt and R will display 1: to indicate that it is waiting for the first data value. Type a value and press Enter, at which point R will display 2:, and so forth. Note that entering an empty line stops the scan. This method is especially handy when you have a column of values, say, stored in a text file or spreadsheet. You may copy and paste them all at the 1: prompt, and R will store all of the values instantly in the vector x.

3. repeated data; regular patterns: the seq function will generate all sorts of sequences of numbers. It has the arguments from, to, by, and length.out, which can be set in concert with one another. We will do a couple of examples to show you how it works.

    > seq(from = 1, to = 5)
    [1] 1 2 3 4 5
    > seq(from = 2, by = -0.1, length.out = 4)
    [1] 2.0 1.9 1.8 1.7

Note that we can get the first line much more quickly with the colon operator :

    > 1:5
    [1] 1 2 3 4 5

The vector LETTERS has the 26 letters of the English alphabet in uppercase and letters has all of them in lowercase.

Indexing data vectors

Sometimes we do not want the whole vector, but just a piece of it. We can access the intermediate parts with the [] operator. Observe (with x defined above)

    > x[1]
    [1] 74
    > x[2:4]
    [1] 31 95 61
    > x[c(1, 3, 4, 8)]
    [1] 74 95 61 54
    > x[-c(1, 3, 4, 8)]
    [1] 31 76 34 23 96

Notice that we used the minus sign to specify those elements that we do not want.

    > LETTERS[1:5]
    [1] "A" "B" "C" "D" "E"
    > letters[-(6:24)]
    [1] "a" "b" "c" "d" "e" "y" "z"
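The sequence and indexing commands above can be collected into one short script. This is only a sketch: the assignment to x is reconstructed from the console output shown in this section (the original definition is not part of this excerpt), and the names s1, s2, and drop are placeholders introduced for illustration.

```r
# x is reconstructed from the printed output; treat it as an assumption.
x <- c(74, 31, 95, 61, 76, 34, 23, 54, 96)

# Two ways to build the same sequence: named arguments or the colon operator.
s1 <- seq(from = 1, to = 5)
s2 <- seq(from = 2, by = -0.1, length.out = 4)   # 2.0 1.9 1.8 1.7
all(s1 == 1:5)                                   # TRUE

# Positive indices select elements; negative indices exclude them.
drop <- c(1, 3, 4, 8)      # hypothetical name for the index vector
x[drop]                    # 74 95 61 54
x[-drop]                   # 31 76 34 23 96

# The two pieces together account for every element of x.
length(x[drop]) + length(x[-drop]) == length(x)  # TRUE

# The same rules apply to character vectors such as letters.
letters[-(6:24)]           # "a" "b" "c" "d" "e" "y" "z"
```

Storing the index vector in a variable, as done here with drop, makes it easy to see that x[drop] and x[-drop] are complementary selections.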
2.3.4 Functions and Expressions

A function takes arguments as input and returns an object as output. There are functions to do all sorts of things. We show some examples below.

    > x <- 1:5
    > sum(x)
    [1] 15
    > length(x)
    [1] 5
    > min(x)
    [1] 1
    > mean(x)   # sample mean
    [1] 3
    > sd(x)     # sample standard deviation
    [1] 1.581139

It will not be long before the user starts to wonder how a particular function is doing its job, and since R is open-source, anybody is free to look under the hood of a function to see how things are calculated. For detailed instructions see the article “Accessing the Sources” by Uwe Ligges [60]. In short:

1. Type the name of the function without any parentheses or arguments. If you are lucky then the code for the entire function will be printed, right there looking at you. For instance, suppose that we would like to see how the intersect function works:

    > intersect
    function (x, y)
    {
        y <- as.vector(y)
        unique(y[match(as.vector(x), y, 0L)])
    }
    <environment: namespace:base>

2. If instead it shows UseMethod("something") then you will need to choose the class of the object to be passed in and next look at the method that will be dispatched to the object. For instance, typing rev says

    > rev
    function (x)
    UseMethod("rev")
    <environment: namespace:base>
The output is telling us that there are multiple methods associated with the rev function. To see what these are, type

    > methods(rev)
    [1] rev.default     rev.dendrogram*

       Non-visible functions are asterisked

Now we learn that there are two different rev(x) functions, only one of which is chosen at each call, depending on what x is. There is one for dendrogram objects and a default method for everything else. Simply type the name to see what each method does. For example, the default method can be viewed with

    > rev.default
    function (x)
    if (length(x)) x[length(x):1L] else x
    <environment: namespace:base>

3. Some functions are hidden by a namespace (see An Introduction to R [85]), and are not visible on the first try. For example, if we try to look at the code for wilcox.test (see Chapter 15) we get the following:

    > wilcox.test
    function (x, ...)
    UseMethod("wilcox.test")
    <environment: namespace:stats>
    > methods(wilcox.test)
    [1] wilcox.test.default* wilcox.test.formula*

       Non-visible functions are asterisked

If we were to try wilcox.test.default we would get a “not found” error, because it is hidden behind the namespace for the package stats (shown in the last line when we tried wilcox.test). In cases like these we prefix the function name with the package name and three colons; the command stats:::wilcox.test.default will show the source code, omitted here for brevity.

4. If it shows .Internal(something) or .Primitive("something"), then it will be necessary to download the source code of R (which is not a binary version with an .exe extension) and search inside the code there. See Ligges [60] for more discussion on this. An example is exp:

    > exp
    function (x)  .Primitive("exp")

Be warned that most of the .Internal functions are written in other computer languages which the beginner may not understand, at least initially.
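The four cases above can be tried in a single short session. What follows is a minimal sketch, not the procedure from Ligges [60]: getS3method() and is.primitive() do not appear in the text and are used here as convenient checks, and the names f and g are placeholders.

```r
# Case 1: an ordinary function carries its own source; body() returns it.
intersect_body <- body(intersect)
is.call(intersect_body)                 # TRUE

# Case 2: a generic only dispatches; its methods hold the real code.
rev_methods <- utils::methods(rev)
is.function(rev.default)                # the default method is visible in base

# Case 3: a method hidden in a namespace can be reached with ':::'.
f <- stats:::wilcox.test.default
# getS3method() (utils) is an alternative lookup that avoids ':::'.
g <- utils::getS3method("wilcox.test", "default")
identical(f, g)                         # both refer to the same function

# Case 4: primitives have no R-level source to print.
is.primitive(exp)                       # TRUE
is.primitive(rev.default)               # FALSE
```

Printing f at the console displays the source of the hidden method, just as typing stats:::wilcox.test.default would.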
2.4 Getting Help

When you are using R, it will not take long before you find yourself needing help. Fortunately, R has extensive help resources and you should immediately become familiar with them. Begin by clicking Help on Rgui. The following options are available.

• Console: gives useful shortcuts, for instance, Ctrl+L, to clear the R console screen.

• FAQ on R: frequently asked questions concerning general R operation.

• FAQ on R for Windows: frequently asked questions about R, tailored to the Microsoft Windows operating system.

• Manuals: technical manuals about all features of the R system including installation, the complete language definition, and add-on packages.

• R functions (text)...: use this if you know the exact name of the function you want to know more about, for example, mean or plot. Typing mean in the window is equivalent to typing help("mean") at the command line, or more simply, ?mean. Note that this method only works if the function of interest is contained in a package that is already loaded into the search path with library.

• HTML Help: use this to browse the manuals with point-and-click links. It also has a Search Engine & Keywords for searching the help page titles, with point-and-click links for the search results. This is possibly the best help method for beginners. It can be started from the command line with the command help.start().

• Search help...: use this if you do not know the exact name of the function of interest, or if the function is in a package that has not been loaded yet. For example, you may enter plo and a text window will return listing all the help files with an alias, concept, or title matching ‘plo’ using regular expression matching; it is equivalent to typing help.search("plo") at the command line. The advantage is that you do not need to know the exact name of the function; the disadvantage is that you cannot point-and-click the results. Therefore, one may wish to use the HTML Help search engine instead. An equivalent way is ??plo at the command line.

• search.r-project.org...: this will search for words in help lists and email archives of the R Project. It can be very useful for finding other questions that other users have asked.

• Apropos...: use this for more sophisticated partial name matching of functions. See ?apropos for details.

On the help pages for a function there are sometimes “Examples” listed at the bottom of the page, which will work if copy-pasted at the command line (unless marked otherwise). The example function will run the code automatically, skipping the intermediate step. For instance, we may try example(mean) to see a few examples of how the mean function works.

2.4.1 R Help Mailing Lists

There are several mailing lists associated with R, and there is a huge community of people that read and answer questions related to R. See here http://www.r-project.org/mail.html
for an idea of what is available. Pay particular attention to the bottom of the page, which lists several special interest groups (SIGs) related to R.

Bear in mind that R is free software, which means that it was written by volunteers, and the people who frequent the mailing lists are also volunteers who are not paid by customer support fees. Consequently, if you want to use the mailing lists for free advice then you must adhere to some basic etiquette, or else you may not get a reply, or even worse, you may receive a reply which is a bit less cordial than you are used to. Below are a few considerations:

1. Read the FAQ (http://cran.r-project.org/faqs.html). Note that there are different FAQs for different operating systems. You should read these now, even without a question at the moment, to learn a lot about the idiosyncrasies of R.

2. Search the archives. Even if your question is not a FAQ, there is a very high likelihood that your question has been asked before on the mailing list. If you want to know about topic foo, then you can do RSiteSearch("foo") to search the mailing list archives (and the online help) for it.

3. Do a Google search and an RSeek.org search.

If your question is not a FAQ, has not been asked on R-help before, and does not yield to a Google (or alternative) search, then, and only then, should you even consider writing to R-help. Below are a few additional considerations.

1. Read the posting guide (http://www.r-project.org/posting-guide.html) before posting. This will save you a lot of trouble and pain.

2. Get rid of the command prompts (>) from output. Readers of your message will take the text from your mail and copy-paste into an R session. If you make the readers’ job easier then it will increase the likelihood of a response.

3. Questions are often related to a specific data set, and the best way to communicate the data is with a dump command. For instance, if your question involves data stored in a vector x, you can type dump("x", "") at the command prompt and copy-paste the output into the body of your email message. Then the reader may easily copy-paste the message from your email into R and x will be available to the reader.

4. Sometimes the answer to the question is related to the operating system used, the attached packages, or the exact version of R being used. The sessionInfo() command collects all of this information to be copy-pasted into an email (and the Posting Guide requests this information). See Appendix A for an example.

2.5 External Resources

There is a mountain of information on the Internet about R. Below are a few of the important ones.

The R Project for Statistical Computing: (http://www.r-project.org/) Go here first.

The Comprehensive R Archive Network: (http://cran.r-project.org/) This is where R is stored along with thousands of contributed packages. There are also loads of contributed information (books, tutorials, etc.). There are mirrors all over the world with duplicate information.