Probability and Mathematical Statistics course teaching resources (e-book): Introduction to Probability and Statistics with R (G. Jay Kerns, First Edition)

CHAPTER 2. AN INTRODUCTION TO R

R-Forge: (http://r-forge.r-project.org/) This is another location where R packages are stored. Here you can find development code which has not yet been released to CRAN.

R Wiki: (http://wiki.r-project.org/rwiki/doku.php) There are many tips and tricks listed here. If you find a trick of your own, log in and share it with the world.

Other: the R Graph Gallery (http://addictedtor.free.fr/graphiques/) and R Graphical Manual (http://bm2.genes.nig.ac.jp/RGM2/index.php) have literally thousands of graphs to peruse. RSeek (http://www.rseek.org) is a search engine based on Google specifically tailored for R queries.

2.6 Other Tips

• It is unnecessary to retype commands repeatedly, since R remembers what you have recently entered on the command line. On the Microsoft® Windows RGui, to cycle through the previous commands just push the ↑ (up arrow) key. On Emacs/ESS the command is M-p (which means hold down the Alt button and press “p”). More generally, the command history() will show a whole list of recently entered commands.

• To find out which variables are in the current workspace, use the commands objects() or ls(). These list all available objects in the workspace. If you wish to remove one or more variables, use remove(var1, var2, var3), or more simply use rm(var1, var2, var3), and to remove all objects use rm(list = ls()).

• Another use of scan is when you have a long list of numbers (separated by spaces or on different lines) already typed somewhere else, say in a text file. To enter all the data in one fell swoop, first highlight and copy the list of numbers to the Clipboard with Edit > Copy (or by right-clicking and selecting Copy). Next type the x <- scan() command in the R console, and paste the numbers at the 1: prompt with Edit > Paste. All of the numbers will automatically be entered into the vector x.

• The command Ctrl+l clears the screen in the Microsoft® Windows RGui. The comparable command for Emacs/ESS is

• Once you use R for a while there may be some commands that you wish to run automatically whenever R starts. These commands may be saved in a file called Rprofile.site which is usually in the etc folder, which lives in the R home directory (which on Microsoft® Windows usually is C:\Program Files\R). Alternatively, you can make a file .Rprofile to be stored in the user’s home directory, or anywhere R is invoked. This allows for multiple configurations for different projects or users. See “Customizing the Environment” of An Introduction to R for more details.

• When exiting R the user is given the option to “save the workspace”. I recommend that beginners DO NOT save the workspace when quitting. If Yes is selected, then all of the objects and data currently in R’s memory are saved in a file located in the working directory called .RData. This file is then automatically loaded the next time R starts (in which case R will say [previously saved workspace restored]). This is a valuable feature for experienced users of R, but I find that it causes more trouble than it saves with beginners.
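The workspace commands in these tips can be tried without risk to your own objects. The following is a minimal sketch, not part of the original text: it works inside a scratch environment (the name scratch and the sample values are assumptions made for illustration), and it feeds scan() a string through its text argument as a stand-in for pasting numbers at the 1: prompt.

```r
# Work in a scratch environment so the reader's own workspace is untouched.
# (The name 'scratch' and the values below are illustrative assumptions.)
scratch <- new.env()

assign("var1", 1:3, envir = scratch)
assign("var2", "hello", envir = scratch)
assign("var3", 3.14, envir = scratch)

# ls() lists the objects in an environment (the global workspace by default)
before <- ls(envir = scratch)

# rm() removes selected objects by name
rm(var1, var2, envir = scratch)
after_some <- ls(envir = scratch)

# rm(list = ls()) removes everything that is left
rm(list = ls(envir = scratch), envir = scratch)
after_all <- ls(envir = scratch)

# scan() can read numbers from a string, which mimics pasting them
# at the interactive 1: prompt
x <- scan(text = "67 54.7 7 48.5", quiet = TRUE)
```

Dropping the envir arguments gives exactly the commands described in the tips, acting on the global workspace instead; in that form rm(list = ls()) cannot be undone, so it is worth trying on a scratch environment first.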

CHAPTER 3. DATA DESCRIPTION

3.1.1 Quantitative data

Quantitative data are any data that measure or are associated with a measurement of the quantity of something. They invariably assume numerical values. Quantitative data can be further subdivided into two categories.

• Discrete data take values in a finite or countably infinite set of numbers, that is, all possible values could (at least in principle) be written down in an ordered list. Examples include: counts, number of arrivals, or number of successes. They are often represented by integers, say, 0, 1, 2, etc.

• Continuous data take values in an interval of numbers. These are also known as scale data, interval data, or measurement data. Examples include: height, weight, length, time, etc. Continuous data are often characterized by fractions or decimals: 3.82, 7.0001, 4 5/8, etc.

Note that the distinction between discrete and continuous data is not always clear-cut. Sometimes it is convenient to treat data as if they were continuous, even though strictly speaking they are not continuous. See the examples.

Example 3.1. Annual Precipitation in US Cities. The vector precip contains the average amount of rainfall (in inches) for each of 70 cities in the United States and Puerto Rico. Let us take a look at the data:

> str(precip)
 Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
 - attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
> precip[1:4]
     Mobile      Juneau     Phoenix Little Rock
       67.0        54.7         7.0        48.5

The output shows that precip is a numeric vector which has been named, that is, each value has a name associated with it (which can be set with the names function). These are quantitative continuous data.

Example 3.2. Lengths of Major North American Rivers. The U.S. Geological Survey recorded the lengths (in miles) of several rivers in North America. They are stored in the vector rivers in the datasets package (which ships with base R). See ?rivers. Let us take a look at the data with the str function.

> str(rivers)
 num [1:141] 735 320 325 392 524 ...

The output says that rivers is a numeric vector of length 141, and the first few values are 735, 320, 325, etc. These data are definitely quantitative and it appears that the measurements have been rounded to the nearest mile. Thus, strictly speaking, these are discrete data. But we will find it convenient later to take data like these to be continuous for some of our statistical procedures.
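The claim that the river lengths look rounded can be checked directly. The sketch below is an illustration rather than part of the original text; the helper is_whole is a hypothetical name introduced here, and it simply tests whether every value in a vector equals its rounded value.

```r
# precip and rivers ship with R in the datasets package.

# Hypothetical helper: TRUE when every value is a whole number
is_whole <- function(v) all(v == round(v))

# Are all river lengths recorded as whole miles?
rivers_whole <- is_whole(rivers)

# Do the precipitation values include fractional parts?
precip_whole <- is_whole(precip)

# Basic facts reported by str() in the examples
n_rivers <- length(rivers)
n_cities <- length(precip)
first_cities <- names(precip)[1:4]
```

A vector that passes is_whole is stored as numeric all the same, which is why str reports num for rivers: the storage type alone does not say whether data are best treated as discrete or continuous.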

3.1. TYPES OF DATA

Example 3.3. Yearly Numbers of Important Discoveries. The vector discoveries contains numbers of “great” inventions/discoveries in each year from 1860 to 1959, as reported by the 1975 World Almanac. Let us take a look at the data:

> str(discoveries)
 Time-Series [1:100] from 1860 to 1959: 5 3 0 2 0 3 2 3 6 1 ...
> discoveries[1:4]
[1] 5 3 0 2

The output is telling us that discoveries is a time series (see Section 3.1.5 for more) of length 100. The entries are integers, and since they represent counts this is a good example of discrete quantitative data. We will take a closer look in the following sections.

Displaying Quantitative Data

One of the first things to do when confronted by quantitative data (or any data, for that matter) is to make some sort of visual display to gain some insight into the data’s structure. There are almost as many display types from which to choose as there are data sets to plot. We describe some of the more popular alternatives.

Strip charts (also known as Dot plots) These can be used for discrete or continuous data, and usually look best when the data set is not too large. Along the horizontal axis is a numerical scale above which the data values are plotted. We can do it in R with a call to the stripchart function. There are three available methods.

• overplot plots ties covering each other. This method is good to display only the distinct values assumed by the data set.

• jitter adds some noise to the data in the y direction in which case the data values are not covered up by ties.

• stack plots repeated values stacked on top of one another. This method is best used for discrete data with a lot of ties; if there are no repeats then this method is identical to overplot.

See Figure 3.1.1, which is produced by the following code.

> stripchart(precip, xlab = "rainfall")
> stripchart(rivers, method = "jitter", xlab = "length")
> stripchart(discoveries, method = "stack", xlab = "number")

The leftmost graph is a strip chart of the precip data. The graph shows tightly clustered values in the middle with some others falling balanced on either side, with perhaps slightly more falling to the left. Later we will call this a symmetric distribution, see Section 3.2.3. The middle graph is of the rivers data, a vector of length 141. There are several repeated values in the rivers data, and if we were to use the overplot method we would lose some of them in the display. This plot shows what we will later call a right-skewed shape with perhaps some extreme values on the far right of the display. The third graph is a strip chart of the discoveries data, which are literally a textbook example of a right-skewed distribution.

The DOTplot function in the UsingR package [86] is another alternative.
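To compare the three strip charts in one figure, one possible arrangement is sketched below. It is an assumption about layout, not the code behind the original figure: the panels are stacked vertically with par(mfrow = ...), and the output is written to a temporary PDF file (the name out_file and the page size are illustrative choices) so that the example also runs in a non-interactive session.

```r
# Write the figure to a temporary PDF; the file name is an illustrative choice.
out_file <- file.path(tempdir(), "stripcharts.pdf")
pdf(out_file, width = 6, height = 8)

# Three panels stacked vertically; par() returns the old settings
op <- par(mfrow = c(3, 1))

stripchart(precip, xlab = "rainfall")                        # default method: overplot
stripchart(rivers, method = "jitter", xlab = "length")       # noise separates ties
stripchart(discoveries, method = "stack", xlab = "number")   # ties pile up

par(op)     # restore the previous graphical settings
dev.off()   # close the PDF device

# Number of repeated values in rivers, which overplot would hide
n_ties_rivers <- sum(duplicated(rivers))
```

To draw on screen instead, omit the pdf() and dev.off() lines. Counting the duplicated values beforehand is a quick way to decide whether jitter or stack is worth using for a given data set.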
