Chapter 1 Presenting and Summarizing Data

1.4.5 Measures of Relative Position

Measures of relative position include percentiles and z-values. The percentile value (p) of an observation $y_i$ in a data set has 100p% of observations smaller than $y_i$ and 100(1-p)% of observations greater than $y_i$. A lower quartile is the 25th percentile, an upper quartile is the 75th percentile, and the median is the 50th percentile.

The z-value is the deviation of an observation from the mean in standard deviation units:

$$z_i = \frac{y_i - \bar{y}}{s}$$

Example: Calculate the arithmetic mean, variance, standard deviation, coefficient of variation, median and mode of the following weights of calves (kg):

260 260 230 280 290 280 260 270 260 300
280 290 260 250 270 320 320 250 320 220

Arithmetic mean:

$$\bar{y} = \frac{\sum_i y_i}{n}$$

$$\sum_i y_i = 260 + 260 + \ldots + 220 = 5470 \text{ kg}$$

$$\bar{y} = \frac{5470}{20} = 273.5 \text{ kg}$$

Sample variance:

$$s^2 = \frac{\sum_i (y_i - \bar{y})^2}{n - 1} = \frac{\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}}{n - 1}$$

$$\sum_i y_i^2 = 260^2 + 260^2 + \ldots + 220^2 = 1510700 \text{ kg}^2$$

$$s^2 = \frac{1510700 - \frac{(5470)^2}{20}}{19} = 771.3158 \text{ kg}^2$$

Sample standard deviation:

$$s = \sqrt{s^2} = \sqrt{771.3158} = 27.77 \text{ kg}$$

Coefficient of variation:

$$CV = \frac{s}{\bar{y}} \, 100\% = \frac{27.77}{273.5} \, 100\% = 10.15\%$$
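The hand calculations above can be checked with a short sketch. It is written in Python purely for illustration, and the names `weights`, `sum_sq` and `z_value` are assumptions introduced here, not notation from the text. It uses the same computational formula for the sample variance (sum of squares minus the squared sum over n, divided by n - 1) and also evaluates the z-value of a single observation.

```python
import math

# Calf weights (kg) from the example.
weights = [260, 260, 230, 280, 290, 280, 260, 270, 260, 300,
           280, 290, 260, 250, 270, 320, 320, 250, 320, 220]

n = len(weights)                       # number of observations
total = sum(weights)                   # sum of observations
sum_sq = sum(y * y for y in weights)   # sum of squared observations

mean = total / n
# Computational formula for the sample variance (divisor n - 1).
variance = (sum_sq - total ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)
cv = 100 * std_dev / mean              # coefficient of variation, in percent


def z_value(y):
    """Deviation of observation y from the mean, in standard deviation units."""
    return (y - mean) / std_dev


print(f"mean = {mean:.1f} kg")
print(f"variance = {variance:.4f} kg^2")
print(f"standard deviation = {std_dev:.2f} kg")
print(f"CV = {cv:.2f} %")
print(f"z-value of the 320 kg calf = {z_value(320):.2f}")
```

A positive z-value marks an observation above the mean and a negative z-value one below it; an observation equal to the mean has a z-value of zero.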
Biostatistics for Animal Science

To find the median, the observations are sorted from the smallest to the largest:

220 230 250 250 260 260 260 260 260 270 270 280 280 280 290 290 300 320 320 320

Since n = 20 is an even number, the median is the average of the n/2 = 10th and (n+2)/2 = 11th observations when the data are sorted. The values of those observations are 270 and 270, respectively, and their average is 270; thus, the median is 270 kg. The mode is 260 kg because this is the observation with the highest frequency.

1.5 SAS Example

Descriptive statistics for the example set of weights of calves are calculated using SAS software. For a more detailed explanation of how to use SAS, we recommend the exhaustive SAS literature, part of which is included in the list of literature at the end of this book. This SAS program consists of two parts: 1) the DATA step, which is used for entry and transformation of data, and 2) the PROC step, which defines the procedure(s) for data analysis. SAS has three basic windows: a Program window (PGM) in which the program is written, an Output window (OUT) in which the user can see the results, and a LOG window in which the user can view details regarding program execution or error messages.

Returning to the example of weights of 20 calves:

SAS program:

    DATA calves;
    INPUT weight @@;
    DATALINES;
    260 260 230 280 290 280 260 270 260 300
    280 290 260 250 270 320 320 250 320 220
    ;
    PROC MEANS DATA = calves N MEAN MIN MAX VAR STD CV;
    VAR weight;
    RUN;

Explanation: The SAS statements will be written with capital letters to highlight them, although this is not generally mandatory, i.e. the program does not distinguish between small letters and capitals. Names that the user assigns to variables, data files, etc., will be written with small letters. In this program the DATA statement defines the name of the file that contains the data. Here, calves is the name of the file.
The INPUT statement defines the name(s) of the variable, and the DATALINES statement indicates that data are on the following lines. Here, the name of the variable is weight. SAS needs data in columns, for example,

    INPUT weight;
    DATALINES;
    260
    260
    ...
    220
    ;

reads values of the variable weight. Data can be written in rows if the symbols @@ are used with the INPUT statement. SAS reads observations one by one and stores them into a column named weight.

The program uses the procedure (PROC) MEANS. The option DATA = calves defines the data file that will be used in the calculation of statistics, followed by the list of statistics to be calculated: N = the number of observations, MEAN = arithmetic mean, MIN = minimum, MAX = maximum, VAR = variance, STD = standard deviation, CV = coefficient of variation. The VAR statement defines the variable (weight) to be analyzed.

SAS output:

    Analysis Variable: WEIGHT

     N    Mean   Minimum   Maximum    Variance    Std Dev        CV
    ----------------------------------------------------------------
    20   273.5       220       320   771.31579   27.77257   10.1545
    ----------------------------------------------------------------

The SAS output lists the variable that was analyzed (Analysis Variable: WEIGHT). The descriptive statistics are then listed.

Exercises

1.1. The numbers of eggs laid per month in a sample of 40 hens are shown below:

30 23 26 27 29 25 27 24 28 26
26 26 30 26 25 29 26 23 26 30
25 28 24 26 27 25 25 28 27 28
26 30 26 25 28 28 24 27 27 29

Calculate descriptive statistics and present a frequency distribution.

1.2. Calculate the sample variance given the following sums:

$\sum_i y_i = 600$ (sum of observations); $\sum_i y_i^2 = 12656$ (sum of squared observations); n = 30 (number of observations)

1.3. Draw the histogram of the values of a variable y and its frequencies f:

    y   12   14   16   18   20   22   24   26   28
    f    1    3    4    9   11    9    6    1    2

Calculate descriptive statistics for this sample.
1.4. The following are data of milk fat yield (kg) per month from 17 Holstein cows:

27 17 31 20 29 22 40 28 26 28 34 32 32 32 30 23 25

Calculate descriptive statistics. Show that if 3 kg are added to each observation, the mean will increase by three and the sample variance will stay the same. Show that if each observation is divided by two, the mean will be two times smaller and the sample variance will be four times smaller. How will the standard deviation be changed?
Chapter 2 Probability

The word probability is used to indicate the likelihood that some event will happen. For example, 'there is high probability that it will rain tonight'. We can conclude this according to some signs, observations or measurements. If we can count or make a conclusion about the number of favorable events, we can express the probability of occurrence of an event by using a proportion or percentage of all events. Probability is important in drawing inferences about a population. Statistics deals with drawing inferences by using observations and measurements, and applying the rules of mathematical probability.

A probability can be a-priori or a-posteriori. An a-priori probability comes from a logical deduction on the basis of previous experiences. Our experience tells us that if it is cloudy, we can expect with high probability that it will rain. If an animal has particular symptoms, there is high probability that it has or will have a particular disease. An a-posteriori probability is established by using a planned experiment. For example, assume that changing a ration will increase milk yield of dairy cows. Only after an experiment has been conducted in which numerical differences were measured can it be concluded, with some probability or uncertainty, that a positive response can be expected for other cows as well.

Generally, each process of collecting data is an experiment. For example, throwing a die and observing the number is an experiment.

Mathematically, probability is:

$$P = \frac{m}{n}$$

where m is the number of favorable trials and n is the total number of trials.

An observation of an experiment that cannot be partitioned into simpler events is called an elementary event or simple event. For example, we throw a die once and observe the result. This is a simple event. The set of all possible simple events is called the sample space. All the possible simple events in an experiment consisting of throwing a die are 1, 2, 3, 4, 5 and 6.
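As a minimal sketch of the ratio P = m/n, the die experiment can be written out in Python (used here for illustration only). The sketch assumes a fair die, so that every simple event in the sample space is equally likely; that assumption, and the names `sample_space` and `probability`, are introduced here and are not part of the text.

```python
from fractions import Fraction

# Sample space for one throw of a die: all possible simple events.
sample_space = [1, 2, 3, 4, 5, 6]


def probability(favorable, space):
    """Return P = m / n, assuming all outcomes in `space` are equally likely."""
    m = sum(1 for outcome in space if outcome in favorable)  # favorable outcomes
    n = len(space)                                           # all outcomes
    return Fraction(m, n)


# Probability of one simple event, for instance throwing a 4.
p_four = probability({4}, sample_space)
print(p_four)
```

Using `Fraction` keeps the result as an exact proportion rather than a rounded decimal.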
The probability of a simple event is the probability that this specific event occurs. If we denote a simple event by $E_i$, such as throwing a 4, then $P(E_i)$ is the probability of that event.

2.1 Rules about Probabilities of Simple Events

Let $E_1, E_2, \ldots, E_k$ be the set of all simple events in some sample space of simple events. Then we have:

1. The probability of any simple event occurring must be between 0 and 1 inclusive:

$$0 \le P(E_i) \le 1, \quad i = 1, \ldots, k$$