CHAPTER 1
Getting Started Using SAS Software

1.1 The SAS Language
1.2 SAS Data Sets
1.3 The Two Parts of a SAS Program
1.4 The DATA Step’s Built-in Loop
1.5 Choosing a Mode for Submitting SAS Programs
1.6 Windows and Commands in the SAS Windowing Environment
1.7 Submitting a Program in the SAS Windowing Environment
1.8 Reading the SAS Log
1.9 Viewing Your Results in the Output Window
1.10 Creating HTML Output
1.11 SAS Data Libraries
1.12 Viewing Data Sets with SAS Explorer
1.13 Using SAS System Options
The Little SAS Book

1.1 The SAS Language

Many software applications are either menu driven or command driven (enter a command, see the result). SAS is neither. With SAS, you use statements to write a series of instructions called a SAS program. The program communicates what you want to do and is written using the SAS language. There are some menu-driven front ends to SAS, for example SAS Enterprise Guide software, which make SAS appear like a point-and-click program. However, these front ends still use the SAS language to write programs for you. You will have much more flexibility using SAS if you learn to write your own programs using the SAS language. Maybe learning a new language is the last thing you want to do, but be assured that although there are parallels between SAS and languages you know (be they English or FORTRAN), SAS is much easier to learn.

SAS programs

A SAS program is a sequence of statements executed in order. A statement gives information or instructions to SAS and must be appropriately placed in the program. An everyday analogy to a SAS program is a trip to the bank. You enter your bank, stand in line, and when you finally reach the teller’s window, you say what you want to do. The statements you give can be written down in the form of a program:

  I would like to make a withdrawal.
  My account number is 0937.
  I would like $200.
  Give me five 20s and two 50s.

Note that you first say what you want to do, then give all the information the teller needs to carry out your request. The order of the subsequent statements may not be important, but you must start with the general statement of what you want to do. You would not, for example, go up to a bank teller and say, “Give me five 20s and two 50s.” This is not only bad form, but would probably make the teller’s heart skip a beat or two. You must also make sure that all the subsequent statements belong with the first. You would not say, “I want the largest box you have” when making a withdrawal from your checking account. That statement belongs with “I would like to open a safe deposit box.” A SAS program is an ordered set of SAS statements like the ordered set of instructions you use when you go to the bank.

SAS statements

As with any language, there are a few rules to follow when writing SAS programs. Fortunately for us, the rules for writing SAS programs are much fewer and simpler than those for English. The most important rule is:

  Every SAS statement ends with a semicolon.

This sounds simple enough. But while children generally outgrow the habit of forgetting the period at the end of a sentence, SAS programmers never seem to outgrow forgetting the semicolon at the end of a SAS statement. Even the most experienced SAS programmer will at least occasionally forget the semicolon. You will be two steps ahead if you remember this simple rule.
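To make the semicolon rule concrete, here is a minimal sketch of a complete program. The data set name distance and the variables Miles and Kilometers are made up for illustration and do not come from the text above.

```sas
* A small DATA step followed by a PROC step;
* Every statement, including each comment statement, ends with a semicolon;
DATA distance;
   Miles = 26.22;
   Kilometers = 1.61 * Miles;
RUN;

PROC PRINT DATA = distance;
RUN;
```

If the semicolon after DATA distance were left out, SAS would treat the next line as a continuation of the DATA statement and would most likely report a syntax error, which is the kind of mistake the rule is meant to prevent.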
Layout of SAS programs

There really aren’t any rules about how to format your SAS program. While it is helpful to have a neat looking program with each statement on a line by itself and indentions to show the various parts of the program, it isn’t necessary.

  SAS statements can be in upper- or lowercase.
  Statements can continue on the next line (as long as you don’t split words in two).
  Statements can be on the same line as other statements.
  Statements can start in any column.

So you see, SAS is so flexible that it is possible to write programs so disorganized that no one can read them, not even you. (Of course, we don’t recommend this.)

Comments

To make your programs more understandable, you can insert comments into your programs. It doesn’t matter what you put in your comments, because SAS doesn’t look at them. You could put your favorite cookie recipe in there if you want. However, comments are usually used to annotate the program, making it easier for someone to read your program and understand what you have done and why. There are two styles of comments you can use: one starts with an asterisk (*) and ends with a semicolon (;). The other style starts with a slash asterisk (/*) and ends with an asterisk slash (*/). The following SAS program shows the use of both styles of comment:

  * Read animals’ weights from file;
  DATA animals;
     INFILE 'c:\MyRawData\Zoo.dat';
     INPUT Lions Tigers;
  PROC PRINT DATA = animals;    /* Print the results */
  RUN;

Since some operating environments interpret a slash asterisk (/*) in the first column as the end of a job, be careful when using this style of comment not to place it in the first column. For this reason, we chose the asterisk-semicolon style of comment for this book.

Errors

People who are just learning a programming language often get frustrated because their programs do not work correctly the first time they write them. To make matters worse, SAS errors often come up in bright red letters, and for the poor person whose results turn out more red than black, this can be a very humbling experience. You should expect errors. Most programs simply don’t work the first time, if for no other reason than you are human. You forget a semicolon, misspell a word, have your fingers in the wrong place on the keyboard. It happens. Often one small mistake can generate a whole list of errors. Don’t panic if you see red.
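The layout rules listed earlier in this section (any case, statements continued across lines, several statements on one line, any starting column) can be sketched as follows. The data set names animals and animals2 and the values are hypothetical, chosen only for illustration; the two steps are intended to produce the same variables with the same values.

```sas
* Tidy layout with one statement per line and indention;
DATA animals;
   Lions = 3;
   Tigers = 5;
RUN;

* Same statements in mixed case and free layout;
data ANIMALS2; lions = 3;
          tigers
     = 5;
run;

PROC PRINT DATA = animals;
RUN;

PROC PRINT DATA = animals2;
RUN;
```

Both steps are valid SAS, but the first is far easier to read, which is why a consistent layout is worth the small effort even though SAS does not require it.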
1.2 SAS Data Sets

Before you run an analysis, before you write a report, before you do anything with your data, SAS must be able to read your data. Before SAS can analyze your data, the data must be in a special form called a SAS data set.[1] Getting your data into a SAS data set is usually quite simple as SAS is very flexible and can read almost any data. Once your data have been read into a SAS data set, SAS keeps track of what is where and in what form. All you have to do is specify the name and location of the data set you want, and SAS figures out what is in it.

Variables and observations

Data, of course, are the primary constituent of any data set. In traditional SAS terminology the data consist of variables and observations. Adopting the terminology of relational databases, SAS data sets are also called tables, observations are also called rows, and variables are also called columns. Below you see a rectangular table containing a small data set. Each line represents one observation, while Id, Name, Height, and Weight are variables. The data point Charlie is one of the values of the variable Name and is also part of the second observation.

                      Variables (Also Called Columns)
                      Id    Name      Height   Weight
Observations      1   53    Susie       42       41
(Also Called      2   54    Charlie     46       55
Rows)             3   55    Calvin      40       35
                  4   56    Lucy        46       52
                  5   57    Dennis      44        .
                  6   58                43       50

Data types

Raw data come in many different forms, but SAS simplifies this. In SAS there are just two data types: numeric and character. Numeric fields are, well, numbers. They can be added and subtracted, can have any number of decimal places, and can be positive or negative. In addition to numerals, numeric fields can contain plus signs (+), minus signs (-), decimal points (.), or E for scientific notation. Character data are everything else. They may contain numerals, letters, or special characters (such as $ or !) and can be up to 32,767 characters long.

If a variable contains letters or special characters, it must be character data. However, if it contains only numbers, then it may be numeric or character. You should base your decision on how you will use the variable.[2] Sometimes data that consist solely of numerals make more sense as character data than as numeric. ZIP codes, for example, are made up of numerals, but it just doesn’t make sense to add, subtract, multiply, or divide ZIP codes. Such numbers make more sense as character data. In the previous data set, Name is obviously a character variable, and Height and Weight are numeric. Id, however, could be either numeric or character. It’s your choice.

[1] There are exceptions. If your data are in a format written by another software product, you may be able to read your data directly without creating a SAS data set. For database management systems and spreadsheets, you may be able to use SAS/ACCESS software. See chapter 2 for more information. For SPSS you can use the SPSS data engine. See appendix D.

[2] If disk space is a problem, you may also choose to base your decision on storage size. You can use the LENGTH statement, discussed in section 10.15, to control the storage size of variables.
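As a rough illustration of the two data types, the sketch below reads the first five observations of the small data set described in this section using in-stream data. The data set name heightweight is an assumption, and so is the choice to read Id as character; the text leaves that choice open. In an INPUT statement, a dollar sign after a variable name marks that variable as character.

```sas
* Read Id and Name as character, Height and Weight as numeric;
DATA heightweight;
   INPUT Id $ Name $ Height Weight;
   DATALINES;
53 Susie   42 41
54 Charlie 46 55
55 Calvin  40 35
56 Lucy    46 52
57 Dennis  44 .
;
RUN;

PROC PRINT DATA = heightweight;
RUN;
```

Because Id is character here, arithmetic on it would not make sense, which is the same reasoning the text applies to ZIP codes. Removing the first dollar sign would read Id as numeric instead.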
Missing data

Sometimes despite your best efforts, your data may be incomplete. The value of a particular variable may be missing for some observations. In those cases, missing character data are represented by blanks, and missing numeric data are represented by a single period (.). In the preceding data set, the value of Weight for observation 5 is missing, and its place is marked by a period. The value of Name for observation 6 is missing and is just left blank.

Size of SAS data sets

Prior to SAS 9.1, SAS data sets could contain up to 32,767 variables. Beginning with SAS 9.1, the maximum number of variables in a SAS data set is limited by the resources available on your computer, but SAS data sets with more than 32,767 variables cannot be used with earlier versions of SAS. The number of observations, no matter which version of SAS you are using, is limited only by your computer’s capacity to handle and store them.

Rules for SAS names

You make up names for the variables in your data and for the data sets themselves. It is helpful to make up names that identify what the data represent, especially for variables. While the variable names A, B, and C might seem like perfectly fine, easy-to-type names when you write your program, the names Sex, Height, and Weight will probably be more helpful when you go back to look at the program six months later. Follow these simple rules when making up names for variables and data set members:

  Names must be 32 characters or fewer in length.[3]
  Names must start with a letter or an underscore ( _ ).
  Names can contain only letters, numerals, or underscores ( _ ). No %$!*&#@, please.[4]
  Names can contain upper- and lowercase letters.

This last point is an important one. SAS is insensitive to case so you can use uppercase, lowercase, or mixed case, whichever looks best to you. SAS doesn’t care. The data set name heightweight is the same as HEIGHTWEIGHT or HeightWeight. Likewise, the variable name BirthDate is the same as BIRTHDATE and birThDaTe. However, there is one difference for variable names. SAS remembers the case of the first occurrence of each variable name and uses that case when printing results. That is why, in this book, we use mixed case for variable names but lowercase for other SAS names.

Documentation stored in SAS data sets

In addition to your actual data, SAS data sets contain information about the data set such as its name, the date that you created it, and the version of SAS you used to create it. SAS also stores information about each variable, including its name, type (numeric or character), length (or storage size), and position within the data set. This information is sometimes called the descriptor portion of the data set, and it makes SAS data sets self-documenting.

[3] Beginning with SAS 9, format names can also be 32 characters long, and informat names can be 31 characters (including the $ for character values). Prior to SAS 9, format names could be 8 characters while informat names could be 7 characters (also including the $). Librefs and filerefs must be 8 characters or fewer in length, and member names for versioned data sets must be 28 characters or fewer.

[4] It is possible to use special characters, including spaces, in variable names if you use the system option VALIDVARNAMES=ANY and a name literal of the form 'variable-name'N. See the SAS Help and Documentation for details.
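The ideas of missing values, case-insensitive names, and the descriptor portion can be tried out with a short sketch like the one below. The data set name heightweight is an assumption, and the two data lines correspond to observations 5 and 6 of the small data set discussed earlier. With this simple style of input, a single period stands in for a missing value of either type, so the missing Name is also entered as a period and is stored as a blank. PROC CONTENTS is the procedure that reports the descriptor portion.

```sas
* Two observations, one with a missing numeric value and one with a missing character value;
DATA heightweight;
   INPUT Id Name $ Height Weight;
   DATALINES;
57 Dennis 44 .
58 .      43 50
;
RUN;

* The data set name is not case sensitive, so this refers to the same data set;
PROC PRINT DATA = HeightWeight;
RUN;

* Report the descriptor portion, such as variable names, types, and lengths;
PROC CONTENTS DATA = HEIGHTWEIGHT;
RUN;
```

In the printed output, the missing Weight should appear as a period and the missing Name as a blank, matching the conventions described under Missing data.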