
Handout: SAS

Version: SAS/Handout/0608/1.0 Date: 30-06-08

Cognizant 500 Glen Pointe Center West Teaneck, NJ 07666 Ph: 201-801-0233 www.cognizant.com

Handout - SAS

TABLE OF CONTENTS
Introduction
    About this Module
    Target Audience
    Module Objectives
    Pre-requisite
Session 02: Introduction to SAS / Getting Started
    Learning Objectives
    Introduction to SAS Programming Language
    BASE SAS Software
    Why SAS?
    Multi Vendor Architecture (MVA)
    Applications
    Overview of SAS Products
    Getting Started
    Steps of a SAS Program
    DATA Step vs. PROC Step
    Flow Diagram of a SAS Program
    Data types in SAS
    Summary
    Test your Understanding
Session 03: Getting Started
    Learning Objectives
    Missing Value Representation in SAS
    SAS Programming Rules
    Rules for Creating Variable Names
    My First SAS Program
    SAS Windowing Environment
    Try It Out
    Summary
    Test your Understanding
Session 04: Basic Concepts
    Learning Objectives

Page 2 Copyright 2007, Cognizant Technology Solutions, All Rights Reserved C3: Protected

    _N_ & _ERROR_
    Program Data Vector (PDV)
    DATA Step's Built-in Observation Loop
    SAS Program Flow of Execution
    Reading from External File
    Try It Out
    Summary
    Test your Understanding
Session 05: Basic Concepts/Working with the DATA Step
    Learning Objectives
    Variable Declaration
    Reading same record more than once
    Scope of DATA and PROC Steps
    Operators in SAS
    Commenting in SAS
    SAS Data Libraries
    Reading a SAS Dataset
    Try It Out
    Summary
    Test your Understanding
Session 07: Working with the DATA step
    Learning Objectives
    Dataset Options and Options Statement
    SAS Informats & Formats
    Working with SAS Date and Time
    Styles of input
    Writing to an external file
    Try It Out
    Summary
    Test your Understanding
Session 09: SAS Procedures
    Learning Objectives
    SAS Procedures
    PROC PRINT
    PROC CONTENTS
    PROC SORT

    PROC FORMAT
    PROC DATASETS
    Try It Out
    Summary
    Test your Understanding
Session 11: SAS Programming Concepts
    Learning Objectives
    Retaining Variable Values
    Automatic Variables
    Titles and Footnotes
    Conditional Processing
    Iterative Processing
    Conditional Iterative Processing:
    Other Data Step statements
    Try It Out
    Summary
    Test your Understanding
Session 13: SAS Programming Concepts/Built-in Functions in SAS
    Learning Objectives
    SAS ODS
    Arrays in SAS
    Arithmetic Functions
    String Functions
    Try It Out
    Summary
    Test your Understanding
Session 16: Built-in Functions in SAS / Merging and Combining SAS Data Sets
    Learning Objectives
    Date Time Functions
    Combining Vertically
    Concatenating
    Interleaving
    Combining Horizontally
    One-to-one reading
    One-to-one merging
    Match merging

    Updating
    Performing JOINS in DATA Step
    Try It Out
    Summary
    Test your Understanding
Session 18: Statistical Procedures
    Learning Objectives
    PROC FREQ
    Multi-Threaded Processing
    PROC MEANS
    PROC SUMMARY
    PROC REPORT
    Try It Out
    Summary
    Test your Understanding
Session 20: PROC SQL
    Learning Objectives
    PROC SQL Basics
    The SELECT Statement and its Clauses
    Creating Output Tables
    Summarizing & Grouping Data
    Querying Multiple Tables
    Limiting no of rows to be read and displayed
    Using Operators in PROC SQL
    Calculated Values
    Enhancing Query Output
    CONCLUSION
    Try It Out
    Summary
    Test your Understanding
Session 22: Introduction to MACROS
    Learning Objectives
    SAS Macro
    Advantages of the SAS Macro Facility
    Macro variables
    Automatic and User defined macro variables

    Macro Processor and the flow of execution
    Creating macro variables in run time
    Try It Out
    Summary
    Test your Understanding
Session 23: Introduction to MACROS
    Learning Objectives
    Macro Programs
    Using Macro Parameters
    Scope of Macro variables
    System Options
    Condition execution in Macro
    Iterative processing in Macro
    Built-in Macro Functions
    Try It Out
    Summary
    Test your Understanding
Session 25: Help on SAS
    Learning Objectives
    Debugging SAS Programs
    Creating Efficient SAS Codes
    Summary
    Test your Understanding
References
    Websites
    Books
STUDENT NOTES:


Introduction
About this Module
This handout:
- Introduces the SAS programming language
- Explains the basic concepts in BASE SAS
- Touches on the advanced concepts in BASE SAS

Target Audience
Entry Level Trainees

Module Objectives
After completing this module, you will be able to:
- Explain the SAS language
- Describe the basic concepts in SAS
- Work with the DATA step
- Explain procedures in SAS
- Explain SAS programming concepts
- Describe built-in functions in SAS
- Work with SAS Data Sets
- Work with statistical procedures
- Work with PROC SQL
- Describe MACROS

Pre-requisite
The trainee needs basic knowledge of at least one programming language.


Session 02: Introduction to SAS / Getting Started


Learning Objectives
After completing this session, you will be able to:
- Describe the SAS programming language
- Explain the Multi Vendor Architecture
- List the applications of SAS
- List the different SAS products
- Explain what a SAS dataset is
- Explain the steps of a SAS program
- Describe the data types in SAS

Introduction to SAS Programming Language


The SAS system began as a software system for data analysis and statistical work. Since then, SAS has evolved and established a presence in diverse fields. Today, the SAS System's analysis tools range from simple statistics to specialized analysis for econometrics and forecasting, statistical design, computer performance evaluation, operations research and clinical data management. SAS is applied most heavily in the fields of data warehousing and data mining.

SAS used to stand for "Statistical Analysis System"; the acronym is no longer expanded, and the product is simply called SAS. SAS grew out of a project at North Carolina State University, with the first releases appearing in the early 1970s; since 1976 it has been developed by SAS Institute Inc., North Carolina. It is one of the most widely used statistical software packages and a very powerful tool for data warehousing and data mining. It is also widely used in banking, finance, drug development, clinical research, the pharmaceutical industry, and so on.

SAS provides a complete application development environment that caters to four basic data-centric tasks:
- ACCESS data
- MANAGE data
- ANALYZE data
- PRESENT data

These tasks are described below:
- Access data from almost any source and in any format
  o You can read and write data ranging from text files or CSV (comma-separated values) files to powerful databases such as Oracle and DB2.
- Manage the contents of the data
  o SAS manages the contents of the data and stores them in a special form called a SAS dataset.
- Perform different kinds of analysis on the data
  o You can use the SAS programming language or the built-in programs (procedures) to perform different kinds of analysis on the data.
- Present the analyzed reports in a variety of formats
  o Finally, you can present the analyzed reports in a variety of formats, including text and graphical formats.

Many software applications are either totally menu driven or totally command driven (enter a command, see the result). Base SAS software is neither. With Base SAS software, you use statements to write a series of instructions called a SAS program, which communicates with the SAS system. This module introduces Base SAS software programming concepts.
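
As a minimal sketch of the four tasks, the short program below reads a few raw values into a SAS dataset, derives a new variable, summarizes the data and prints a report. The dataset name work.scores, the variable names and the data values are made up for illustration; they do not come from this handout.

```sas
/* ACCESS and MANAGE: a DATA step reads raw values and stores them  */
/* in a SAS dataset (work.scores is a hypothetical name).           */
data work.scores;
    input name $ test1 test2;   /* $ marks name as a character variable */
    total = test1 + test2;      /* derive a new variable                */
    datalines;
Asha 78 85
Ravi 64 90
Mona 91 72
;
run;

/* ANALYZE: a built-in procedure computes summary statistics */
title 'Test scores';
proc means data=work.scores mean min max;
    var test1 test2 total;
run;

/* PRESENT: a built-in procedure prints a listing report */
proc print data=work.scores noobs;
run;
```

The DATA step covers the access and manage tasks, while the two PROC steps cover analysis and presentation.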

BASE SAS Software


The SAS System is an integrated system of software products. Its core is BASE SAS software, which consists of:
- SAS language - a programming language that you use to manage your data
- SAS procedures - software tools for data analysis and reporting
- Macro facility - a tool for extending and customizing SAS software programs and for reducing repetitive code
- Output Delivery System (ODS) - a system that delivers output in a variety of easy-to-access formats, such as MS Word, MS Excel, PDF, HTML and SAS data sets
- SAS windowing environment - an interactive, graphical user interface that enables you to easily code, run and test your SAS programs
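
The sketch below shows how these components can appear together in one short program. It assumes that the sample dataset sashelp.class, which ships with SAS, is available and that the session may write a file to its current directory. The macro variable dsname, the dataset work.teens and the file name class_report.html are hypothetical names chosen for illustration.

```sas
/* Macro facility: define a value once and reuse it */
%let dsname = sashelp.class;

/* ODS: route procedure output to an HTML file (file name is an assumption) */
ods html file="class_report.html";

/* SAS language: a DATA step that subsets the data */
data work.teens;
    set &dsname;
    where age >= 13;
run;

/* SAS procedure: a ready-made tool for reporting */
proc print data=work.teens;
run;

/* Close the HTML destination so the file is completely written */
ods html close;
```

In some environments, such as a server-based session, the file= value may need to be a full path to a writable location.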

Why SAS?
- The SAS System enables you to access data in almost any format, no matter where or how it is physically stored.
- It can access data stored in different databases, as well as data on different computers, through engines.
- You can use its data management facility to update, combine, rearrange, edit or subset data before analysis.
- Its power, flexibility and ease of use enable you to gain strategic control of all your data processing needs.
- The SAS System has a collection of ready-to-use programs called procedures for analyzing and presenting data in a variety of formats according to the user's requirements.
- It also has many statistical procedures for performing statistical analysis.
- It provides an extensive inventory of application development tools.

Multi Vendor Architecture (MVA)


MVA makes SAS platform independent: it enables applications to run in more than one computing environment. SAS applications work the same, look the same and produce the same results irrespective of your hardware or operating system. This is possible because the SAS System has a layered structure called Multi Vendor Architecture (MVA). It consists of a host-specific component, written separately for each environment, and a portable component, which gives SAS a uniform look and feel. You can develop SAS applications in one environment and run them in other environments without any changes.

Applications
Applications of SAS are diverse. Some of the fields where SAS is applied are given below:
- Data warehousing and data mining
- Clinical research and trials, in the development and testing of drugs
- Banking and pharmaceuticals
- Statistical and mathematical analysis
- Business forecasting and decision support
- Operations research and project management
- Report writing and graphics
- Applications development

The SAS System's analysis tools range from simple statistics to specialized analysis for econometrics and forecasting, statistical design, and operations research.

Overview of SAS Products


SAS licenses many different products. Most of them are integrated, so you don't have to convert datasets or start up another program to use the other products. The following is a partial list of SAS products with brief descriptions. You must have Base SAS software installed on your system to run most of these products.

- Base SAS: Includes DATA step programming for data access and data manipulation, and reporting using simple statistical and utility procedures. It must be installed on your system to run most of the other SAS products.
- SAS/ACCESS: Allows you to access data used by other software packages. You can read and, in some cases, write data in their native formats without having to leave SAS. Most of the popular database software is supported, and each has its own SAS/ACCESS product.

- SAS/AF: Allows you to write your own interactive SAS applications. Applications written with SAS/AF software allow users quick and easy access to information without knowing the SAS language.
- SAS/ASSIST: A menu-driven front end to SAS software. You make choices from menus, and SAS writes the program for you. Programs can be stored for later use.
- SAS/CONNECT: Connects computers running SAS software. Data can be shared between the computers, and programs developed on one computer or operating environment can be transferred to another for processing.
- SAS Enterprise Guide: Provides a graphical user interface to the power of SAS. This is a Windows-only product, but it can be used to access SAS servers on other systems.
- SAS Enterprise Miner: A data mining tool that is a complete product in itself. It provides an easy-to-use front end to the SEMMA (Sample, Explore, Modify, Model, Assess) process for business users.
- SAS/GRAPH: Produces high-resolution plots, charts, and maps.
- SAS/MDDB Server: Allows you to save data in multidimensional database (MDDB) formats for use with online analytical processing (OLAP), otherwise known as slicing and dicing your data.
- SAS/STAT: Provides statistical analysis through a number of procedures, covering areas such as analysis of variance, regression, multivariate analysis, and categorical data analysis.
- SAS/Warehouse Administrator: Simplifies the creation and maintenance of data warehouses.
- SAS Enterprise Business Intelligence Server: Includes both a suite of business intelligence (BI) tools and a platform to provide uniform access to data. The goal of this product is to compete with popular reporting tools such as Business Objects and Cognos. SAS Business Intelligence gives you the information when you need it, in the format you need. The SAS difference: other vendors provide business intelligence solely in the form of historical reports that give you hindsight but limited insight, whereas SAS Business Intelligence allows you to understand the past, monitor the present and predict outcomes as you move your business ahead.

SAS/ETL (Extraction, Transformation and Loading): Extract, cleanse, transform, load and manage data from a single environment. SAS provides integrated ETL capabilities that enable organizations to extract, transform and load data from across the enterprise to create consistent, accurate information.

SAS is a modular product: it needs certain modules, such as BASE SAS, in order to run. Once the BASE SAS module is installed, you can add further modules to extend the functionality of SAS. For example, the SAS/STAT module adds the capability for statistical analysis, SAS/GRAPH adds the capability for high-resolution graphics, and so forth.

Getting Started
SAS Datasets: SAS's own way of storing data
Before you can analyse your data and produce a report with SAS software, the data must be in a special form that the SAS system can understand. This form is called a SAS data set. It consists of two portions:
Descriptor Information
Data Values

Descriptor Information:
The descriptor information describes the contents of the SAS dataset to the SAS system. It contains information such as:
Dataset name
Date created/modified
Version number of the SAS system
Number of variables and observations
Information about each variable: variable name, data type, length, position within the dataset, and so on

Data Values:
The Data values or the Data portion contains the actual data that have been collected. The data is organized into a rectangular structure containing rows called observations and columns called variables. An observation is a collection of data values that usually relate to a single object. A variable is the set of data values that describe a given characteristic.
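The two portions can be inspected separately. The sketch below is only an illustration: the dataset WORK.EMP and its values are assumptions made for this example. PROC CONTENTS displays the descriptor portion and PROC PRINT displays the data portion.

```sas
/* Hypothetical dataset and values, used only for illustration */
DATA WORK.EMP;
  INPUT EMPID NAME $ SALARY;
  DATALINES;
111 RAMESH 1000
222 KUMAR 2000
;
RUN;

/* Descriptor portion: dataset name, creation date, variable attributes */
PROC CONTENTS DATA=WORK.EMP;
RUN;

/* Data portion: the observations (rows) and variables (columns) */
PROC PRINT DATA=WORK.EMP;
RUN;
```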


Steps of a SAS Program


SAS programs are constructed from two basic building blocks:
DATA step
PROC step

DATA Step:
The DATA step reads data from any source. With a DATA step you can:
Read data from sources ranging from text or CSV files to databases such as Oracle, DB2, etc.
Combine existing SAS datasets
Transform and analyze the data
Write programming statements to modify the data
Write out the processed data to a SAS dataset or an external file

PROC Step:
PROC stands for Procedure. The PROC step recognizes only SAS datasets and not other files.
It takes a SAS dataset, analyzes the data and generates results / reports.
It can also produce the results in graphical form, such as graphs / charts.
The results can be written to an output SAS dataset as well.
There can be any number of DATA or PROC steps in a SAS program. A typical program starts with a DATA step to create a SAS data set and then passes the dataset to a PROC step for processing. Here is a simple program that converts miles to kilometers in a DATA step and prints the results with a PROC step:
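A minimal sketch of such a program is given here. The dataset name DISTANCE, the variable names MILES and KILOMETERS, the sample value, and the approximate factor 1.61 are illustrative assumptions, not taken from this handout.

```sas
/* DATA step: create a dataset with one observation */
DATA DISTANCE;
  MILES = 26.22;                 /* sample value, chosen for illustration */
  KILOMETERS = 1.61 * MILES;     /* approximate conversion factor */
RUN;

/* PROC step: print the dataset created by the DATA step */
PROC PRINT DATA=DISTANCE;
RUN;
```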


DATA Step vs. PROC Step


The following table differentiates a DATA step from a PROC step:

DATA STEP                        PROC STEP
Starts with the keyword DATA     Starts with the keyword PROC
Ends with RUN                    Ends with RUN
Reads and modifies data          Performs a specific analysis or function
I/P: Data from any source        I/P: Only SAS datasets
O/P: SAS dataset / file          O/P: Reports / SAS dataset

Flow Diagram of a SAS Program

Raw data is given as input to the SAS DATA step.
The DATA step reads the data using SAS statements and creates a SAS dataset as output.
The created SAS dataset is given as input to the SAS PROC step.
The PROC step generates the reports.


Data types in SAS


There are only two data types available in SAS:
NUMERIC: By default a variable is considered numeric.
CHARACTER: In an INPUT statement, the name of a character variable is followed by a $ symbol.
The default length of both character and numeric variables is 8 bytes.

Summary
SAS provides a complete application development environment to cater to the four basic tasks: ACCESS, MANAGE, ANALYZE & PRESENT data.
The SAS system is an integrated system of software products, and the core of the SAS System is BASE SAS software.
MVA makes SAS platform independent. It facilitates applications that run on more than one computing environment.
SAS is used in almost all fields.
SAS licenses many different products. Most of the products are integrated, so you don't have to convert datasets (data) or start up another program to use the other products.
Base SAS is the core software.
A SAS dataset is SAS's own way of storing data.
A SAS dataset consists of two portions: Descriptor Information & Data Values.
SAS programs are constructed from two basic building blocks: the DATA step & the PROC step.
There are only two data types available in SAS: Numeric & Character.

Test your Understanding


1. List down some of the fields where SAS is used.
2. What are the two portions of a SAS Dataset?
3. What are the steps of a SAS program?
4. Does SAS have a data type for storing Date values?
5. List down some of the SAS products.
6. What is MVA and what is its purpose?


Session 03: Getting Started


Learning Objectives
After completing this session, you will be able to:
Explain missing value representation in SAS
Explain SAS Programming Rules
Code your first SAS program
Describe SAS Windowing Environment

Missing Value Representation in SAS


Missing values are nothing but NULL values. A missing value is assigned to a variable when it is not populated, or when the user tries to assign a character value to a numeric variable.
A character missing value is represented by spaces.
A numeric missing value is represented by a period (.).
In the following dataset, the value of SALARY is missing in the 3rd observation (period: numeric missing value) and the value of NAME is missing in the 4th observation (spaces: character missing value).

Dataset Name: WORK.EMP
Date Created: 03/25/2008
Date Modified: 03/25/2008
No. of Observations: 4
No. of Variables: 4
Sorted: NO

EMPID   NAME      GENDER   SALARY
111     RAMESH    M        1000
222     KUMAR     M        2000
333     SHANTHI   F        .         <- numeric missing value
444               F        4000      <- character missing value (NAME)
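A rough sketch of how such a dataset could be created and checked for missing values is shown here. It follows the WORK.EMP listing above; the second dataset CHECK and the flag variables NO_SALARY and NO_NAME are hypothetical names introduced only for this example.

```sas
DATA WORK.EMP;
  INPUT EMPID NAME $ GENDER $ SALARY;
  /* With list input, a single period is read as a missing value
     for numeric and character variables alike */
  DATALINES;
111 RAMESH M 1000
222 KUMAR M 2000
333 SHANTHI F .
444 . F 4000
;
RUN;

/* MISSING() returns 1 when its argument is missing, 0 otherwise */
DATA CHECK;
  SET WORK.EMP;
  NO_SALARY = MISSING(SALARY);
  NO_NAME   = MISSING(NAME);
RUN;

PROC PRINT DATA=CHECK;
RUN;
```

In the printed output a missing SALARY appears as a period and a missing NAME appears as blanks.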


SAS Programming Rules


The rules for writing SAS programs are listed below.
Every SAS statement must end with a semicolon (;)
SAS statements are not case-sensitive (they can be in upper or lower case)
SAS is a free-format language:
o A SAS statement may begin in any column
o Several SAS statements may appear on the same line
o A SAS statement can flow over multiple lines, i.e., you can begin a statement on one line and continue it on another line

SAS Keywords can be used as variable names. Example

DATA Data; Run = 1; Run;


SAS is intelligent enough to differentiate a variable from a keyword.
Declaration of variables is not required, but it is always a good practice to declare the variables.

Rules for Creating Variable Names


The rules for naming SAS data sets and variables are the same. A name:
Must be 1 to 32 characters in length
Must start with a letter (A-Z) or an underscore (_)
Can continue with any combination of numbers, letters, and underscores; no other special characters are allowed
The default length of character and numeric variables is 8 bytes. A character variable can hold up to 32,767 characters of data.

My First SAS Program

DATA EMP;
INPUT EMPID NAME $ SAL ;
OUTPUT EMP;
DATALINES;
111 RAMESH 1000
222 KUMAR 2000
333 RANI 3000
;
RUN;
PROC PRINT DATA = EMP;
RUN;

Explanation:
INPUT statement: The INPUT statement reads data lines (observations) and assigns values to the SAS variables that correspond to the data fields. Since NAME is a character variable, it is followed by $.
DATALINES statement: Use the DATALINES statement with an INPUT statement to read data entered in the program rather than from an external file. The DATALINES statement marks the end of the DATA step statements and the beginning of the input data values. DATALINES assumes that the data follows immediately, that is, the data is 'instream' or within the program. You can also use the CARDS statement instead of DATALINES; the functionality of both statements is the same.
Guidelines:
1. DATALINES must be the last statement in the DATA step (that is, place the DATALINES statement directly before the first data line). When the compiler comes across the statement DATALINES; it reads the subsequent lines as data rather than source code.
2. Terminate the data with a semicolon on a new line.
OUTPUT statement: Writes the values of the variables EMPID, NAME and SAL to the dataset EMP.
PROC PRINT: The PRINT procedure prints the contents (data portion) of the SAS dataset.

SAS Windowing Environment


SAS is designed to be easy to use. It provides windows for accomplishing all the basic SAS tasks we need to do. When you first start SAS, five main SAS windows are opened:
Program Editor or Editor
Log window
Output window
Explorer
Results

Program Editor
We can use the Program Editor window to enter, edit, and submit SAS programs.
SAS color-codes the different parts of the program.
The extension of a SAS program file is .sas


Log window
The Log window displays:
Messages about the SAS session
How the SAS program was executed
Notes, errors, or warnings raised during the execution of a SAS program
Time taken by the SAS system to process the program
The extension of the log file is .log

Output window
The Output window displays the output of the SAS programs that we submit. The extension of the output file is .lst


If we create HTML output, it will be opened in Results Viewer window, which is the internal browser for SAS.

Explorer window
The Explorer window gives easy access to SAS files and libraries. Use this window to:
View and manage SAS files
Create new SAS libraries and SAS files
Open any SAS file
Perform most file management tasks such as moving, copying, and deleting files
Create file shortcuts


Results window:
Table of contents for your Output window. The result tree lists each part of your results in an outline form. It helps us to navigate and manage output from SAS programs that we submit. We can view, save, and print individual items of output.


Try It Out Problem Statement


Write a program to read the raw data stored as in-stream data and create a SAS System data set called ALL that contains the following variables, in the order listed: ID, HR (heart rate), SBP (systolic blood pressure), and DBP (diastolic blood pressure).
Input Data:
A1 68 130 80
B3 101 148 86
C2 . . 72
D1 72 140 88

Code
DATA ALL;
INPUT ID $ HR SBP DBP;
OUTPUT ALL;
DATALINES;
A1 68 130 80
B3 101 148 86
C2 . . 72
D1 72 140 88
;
RUN;
Refer File Name: 3.1.sas to obtain soft copy of the program code

How It Works
The INPUT statement reads the values of ID, HR, SBP & DBP respectively and stores them in the PDV. The OUTPUT statement writes the values of the user-defined variables to the dataset.

Summary
A character missing value is represented by spaces.
A numeric missing value is represented by a period.
SAS Programming Rules
Rules for creating variable names
SAS is designed to be easy to use. It provides windows for accomplishing all the basic SAS tasks we need to do.


Test your Understanding


1. How do you read in the variables that you need?
2. How are numeric and character missing values represented internally?


Session 04: Basic Concepts


Learning Objectives
After completing this session, you will be able to:
Explain the automatic variables
Describe the PDV concept
Explain DATA Step's built-in Observation loop
Describe SAS Program flow of execution
Read from external file

_N_ & _ERROR_


During the Data Step execution, SAS creates two automatic variables _N_ & _ERROR_

_N_:
The _N_ variable counts the number of times the DATA step begins to iterate. Initially it is set to 1, and for each iteration it is incremented by 1. It behaves like a record counter: while reading the first record _N_ is 1, while reading the nth record _N_ is n, and so on.

DATA step iteration no.   _N_ value
1                         1
2                         2
10                        10
n                         n

_ERROR_:
The _ERROR_ variable signals the occurrence of an error caused by the data during execution. By default it is set to 0; if an error occurs, it is set to 1. For example, a data error occurs when you try to assign a character value to a numeric variable, and _ERROR_ is then set to 1.
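The behaviour of both automatic variables can be observed in the log. The following is a small sketch under assumed data: the second data line deliberately holds a character value (ABC) where the numeric variable SAL is expected, and RECNO is a hypothetical variable used to keep a copy of _N_.

```sas
DATA EMP;
  INPUT EMPID SAL;
  RECNO = _N_;           /* copy _N_ so that it is kept in the output dataset */
  PUT _N_= _ERROR_=;     /* write both automatic variables to the log */
  DATALINES;
111 1000
222 ABC
333 3000
;
RUN;
```

For the second record SAS writes an invalid-data note to the log, sets SAL to missing and sets _ERROR_ to 1; for the other records _ERROR_ stays at 0.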

Program Data Vector (PDV)


PDV is a temporary memory area where the values of variables are stored during execution time. It contains all the variables created in the Data step statements and the two automatic variables _N_ & _ERROR_.

Initially the values of all variables are set to missing, except for _N_ & _ERROR_: _N_ is set to 1 and _ERROR_ is set to 0.
All variables are marked as either KEEP or DROP. The automatic variables _N_ & _ERROR_ are always dropped, so they are not written to the output dataset.
When the program encounters the OUTPUT statement, or when the end of the DATA step is reached:
All values in the PDV, except those marked to be dropped, are written as a single observation to the output data set.
The system returns to the DATA statement to begin the next iteration.
All the variables are reset to missing except _N_ & _ERROR_.
_N_ is incremented by 1.
Input Buffer: If the program is reading from an external source, SAS creates a temporary buffer space called the input buffer. The input buffer holds the current record being processed. Its default length is 256 characters and can be changed using the LRECL option.
Understanding the PDV: Consider the program below.

DATA EMP;
INPUT EMPID NAME $ SAL ;
NEWSAL = SAL + 100;
OUTPUT EMP;
DATALINES;
111 RAMESH 1000
222 KUMAR 2000
333 RANI 3000
;
RUN;
When the above program is submitted, SAS allocates memory for Input buffer and PDV.

Step 1: During the program execution, SAS reads the first record and stores it in the Input buffer. The Input pointer is positioned at the beginning of the Input buffer. The following figure shows the position of the input pointer in the input buffer before SAS reads the data.

Step 2: The INPUT statement then reads data values from the record in the input buffer and writes them to the PDV where they become variable values. After reading the first value the input pointer moves to the beginning of next value in the input buffer, from there the INPUT statement reads the value for second variable and so on. The below figure illustrates the process.

Step 3: After the INPUT statement reads a value for each variable, the next statement is executed. SAS computes a value for the variable NEWSAL from SAL and writes it to the PDV. All the programming statements read and write the values of variables from the PDV.

Step 4: When SAS encounters the OUTPUT statement, or when it executes the last statement in the current DATA step, all the values in the PDV except those marked as DROP are written as a single observation to the dataset EMP.
Step 5: Before reading the next record, all the variables are set to missing except the automatic variables _N_ and _ERROR_.
_N_ is incremented by 1, so it becomes 2.
_ERROR_ is 0, since no error has occurred.
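One way to watch the PDV change from step to step is to write it to the log with PUT _ALL_, which lists every variable in the PDV including _N_ and _ERROR_. The sketch below reuses the EMP program; the two PUT statements and their text labels are additions made only for illustration.

```sas
DATA EMP;
  PUT 'Top of iteration: ' _ALL_;    /* PDV before the record is read */
  INPUT EMPID NAME $ SAL;
  NEWSAL = SAL + 100;
  OUTPUT EMP;
  PUT 'After OUTPUT: ' _ALL_;        /* PDV after the observation is written */
  DATALINES;
111 RAMESH 1000
222 KUMAR 2000
333 RANI 3000
;
RUN;
```

Comparing the two log lines for each iteration shows the variables starting as missing, being filled by the INPUT and assignment statements, and _N_ increasing by 1 per iteration.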

DATA Step's Built-in Observation Loop


The code inside the DATA step is repeated to read multiple records. The iteration continues until it reaches the end of file.
At the end of every iteration of the observation loop:
Values of all variables in the PDV are written to the dataset.
Control returns to the top of the DATA step.
The next iteration proceeds.
At the beginning of every iteration of the observation loop:
Values of all variables in the PDV are set to missing, except the automatic variables.
_N_ is incremented by one.
_ERROR_ is reset to 0.

SAS Program Flow of Execution


The SAS System processes the DATA step in two phases:
Compilation phase
Execution phase


Compilation:

During the compilation phase, SAS:
Checks the syntax of the SAS statements
Establishes an area of memory called the input buffer, if reading from an external source/file
Allocates the memory for the Program Data Vector (PDV)
Assigns the required attributes to variables, such as data type, length, position, etc.
Builds the descriptor portion of the new dataset
Converts SAS code into uppercase

Execution:

The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
SAS sets the variables to missing in the program data vector (PDV).
SAS reads a data record from a raw data file into the input buffer and then stores the values in the PDV.
SAS executes any subsequent programming statements for the current record.
When it encounters an OUTPUT statement, or at the end of the DATA step, SAS writes an observation to the SAS data set.
The system automatically returns to the top of the DATA step.
The same steps continue until there is no record left to be read.
Control flow in DATA step:

During compilation, SAS builds the descriptor portion of the dataset EMP. At the beginning of execution:
SAS reads the first record from the raw file.
The record passes through every statement in the DATA step.
When SAS encounters an OUTPUT statement, or when the end of the DATA step is reached, the values of the variables are written to the dataset EMP as observation one.
When it reaches the RUN statement, control goes back to the beginning of the DATA step to read the subsequent records.
SAS reinitializes the PDV.
It then checks whether a record is available to read. Since one is available, SAS reads the second record, which follows the same steps as described above.
Similarly, SAS reads the third record and writes it to the dataset.
SAS then checks for the next record in the input file. Since none is available, SAS terminates the current DATA step and control passes to the steps that follow it.


Reading from External File

INFILE Statement


Purpose: Identifies an external (raw data) file. An INFILE statement specifies the source of the data read by the INPUT statement. If this statement is omitted, SAS treats the data as in-stream data and reads it from the lines that follow the DATALINES statement.
General form:

INFILE 'raw-data-file' <options>;


Where,

raw-data-file - points to the raw data file being read
options - affect how SAS reads the raw data file
Example:

DATA TEMP;
INFILE 'C:\SAS\SASFILES\FILE.TXT';
INPUT NAME $ SAL;
RUN;


Instead of hard-coding full path of the raw-data-file, we can create a file reference to the Input file using FILENAME statement

FILENAME statement
General form:

FILENAME fileref 'filename';


Where,

fileref is a name that you associate with an external file.


It creates a logical link (short-cut) to the filename. Example:

FILENAME INP 'C:\SAS\SASFILES\FILE.TXT';
DATA TEMP;
INFILE INP;
INPUT NAME $ SAL;
RUN;
The following options of the INFILE statement affect how the data is read from the external file:


LRECL (Logical RECord Length):


This option specifies the maximum length of a record in the data file.
Its value sets the length of the input buffer.
Its default length is 256 bytes (SAS 9.4 and later default to 32767).
Suppose the maximum length of your input records is 1000 characters and you do not use the LRECL option. With a default of 256, SAS reads only the first 256 characters of each record and omits the remaining data. So you have to change the value of LRECL to 1000, like this: LRECL = 1000
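Placed on the INFILE statement, the option might look like the sketch below. The fileref and path are reused from the FILENAME example earlier in this session and are placeholders for a real file.

```sas
/* Placeholder path; point it at a real raw data file */
FILENAME INP 'C:\SAS\SASFILES\FILE.TXT';

DATA TEMP;
  INFILE INP LRECL=1000;   /* input buffer holds up to 1000 bytes per record */
  INPUT NAME $ SAL;
RUN;
```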

DLM option (Delimiter)


This option is useful when the values of variables in the input file are separated by a delimiter other than a blank. Example: the data values below are separated by commas, so the comma (,) is the delimiter.

111,RAMESH,1000
222,KUMAR,2000

The DLM option should look like:

DLM = ','

The DLM option in the INFILE statement should look like:

INFILE INP DLM = ',';

DSD (Delimiter Sensitive Data)


The DSD option:
Sets the default delimiter to comma.
Treats consecutive delimiters as missing values. Example: if the DSD option is used while reading the record below, the value of NAME is considered missing.

111,,1000

Enables SAS to read values with embedded delimiters if the value is surrounded by double quotes. Example: suppose the value of NAME in the first observation is

RAMESH, KUMAR

that is, a value with an embedded delimiter (,). When the following data line:

111,RAMESH, KUMAR,1000

is read with the INPUT statement without the DSD option, the values of the variables would be:

EMPID = 111
NAME = RAMESH
SAL = . (missing value)

This is because SAS treats the comma in the name RAMESH, KUMAR as a delimiter and takes RAMESH as the value for NAME and KUMAR as the value for SAL. Since SAL is a numeric variable, SAS assigns a missing value to it. If the DSD option is used and the data value containing the embedded delimiter is enclosed within double quotes, SAS takes RAMESH, KUMAR as the value for NAME and removes the enclosing quotes before storing the value. Example:

111,"RAMESH, KUMAR",1000

EMPID = 111
NAME = RAMESH, KUMAR
SAL = 1000

END=
The END= option creates and names a temporary variable that acts as an end-of-file indicator. General Form:

INFILE fileref END=variable;


The temporary variable is set to 1 only when the INFILE statement reads the last observation from the input file. For all other observations it is set to 0
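The DSD and END= options described above can be sketched as follows. This is an illustration under assumptions: the dataset names, the variables LAST and TOTAL, and the in-stream records are hypothetical, and the file path is the placeholder used earlier in this session. A LENGTH statement is included so that NAME is long enough to hold the quoted value.

```sas
/* DSD: a quoted value with an embedded comma, and consecutive delimiters */
DATA EMP_DSD;
  LENGTH NAME $ 20;            /* default of 8 bytes would truncate the name */
  INFILE DATALINES DSD;
  INPUT EMPID NAME $ SAL;
  DATALINES;
111,"RAMESH, KUMAR",1000
222,,2000
;
RUN;

/* END=: LAST is 1 only while the last record of the file is processed */
FILENAME INP 'C:\SAS\SASFILES\FILE.TXT';

DATA TEMP;
  INFILE INP END=LAST;
  INPUT NAME $ SAL;
  TOTAL + SAL;                 /* sum statement: running total of SAL */
  IF LAST THEN PUT 'Total salary: ' TOTAL;
RUN;
```

In the first step the second record yields a missing (blank) NAME. In the second step LAST is a temporary variable, so it is not written to the dataset TEMP.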

RECFM
Specifies the record format of the external file. Usually, the SAS System reads a line of data until a carriage return is encountered. However, sometimes more than one fixed-length record (records with same LRECL) occurs in a single line without carriage return characters. In this case, the option RECFM=F (fixed) needs to be specified to read the data. The default value is RECFM=V (variable) and it considers one record per line.

FLOWOVER / MISSOVER / TRUNCOVER


Consider the Data below,


Here, each line should contain 4 data values: last name, first name, employee ID and job code. The grayed-out area denotes the actual line lengths. Program:

DATA Test;
INFILE "d:\infile\emplist.dat" <OPTIONS>;
INPUT lastn $1-21 Firstn $ 22-31 Empid $32-36 Jobcode $37-45;
RUN;
The code was submitted using different options on the INFILE statement.

FLOWOVER
Causes the INPUT statement to jump to the next record if it doesn't find values for all variables in the current record/line. This is the default option. Program:

DATA Test;
INFILE "d:\infile\emplist.dat" FLOWOVER;
INPUT lastn $1-21 Firstn $ 22-31 Empid $32-36 Jobcode $37-45;
RUN;
Contents of the Dataset Test (using FLOWOVER):

The INPUT statement expects the data for Jobcode in positions 37-45, but the data value in the 2nd record extends only to column 41. So the data is considered incomplete, and the INPUT statement goes to the next record and takes SMITH as the value for Jobcode.

MISSOVER
If SAS reaches the end of the line without finding values for all fields, variables without values are set to missing. Program:

DATA Test;
INFILE "d:\infile\emplist.dat" MISSOVER;
INPUT lastn $1-21 Firstn $ 22-31 Empid $32-36 Jobcode $37-45;
RUN;
Contents of the Dataset Test (using MISSOVER):

The value of Job code in the 2nd record is only 5 chars, but the program is expecting 9 chars. Since the value of Jobcode in the 2nd record is incomplete, SAS assigns a missing value.

TRUNCOVER
This option acts like MISSOVER, but it also takes partial values to fill the first unfilled variable. Program:

DATA Test;
INFILE "d:\infile\emplist.dat" TRUNCOVER;
INPUT lastn $1-21 Firstn $ 22-31 Empid $32-36 Jobcode $37-45;
RUN;

Contents of the Dataset Test (using TRUNCOVER):

With the TRUNCOVER option in place, SAS reads all the columns and all the observations correctly. The value of Jobcode in the 2nd record is only 5 characters, but the program is expecting 9 characters, so the data is incomplete. In this case:
MISSOVER assigns a missing value to Jobcode
TRUNCOVER assigns the partial value (only 5 characters) to the unfilled variable Jobcode

Try It Out Problem Statement


Write a program to read the raw data stored in an external file C:\SASFILES\VITAL.CSV, and create a SAS System data set called ALL. VITAL contains the following variables, in the order listed: ID, HR (heart rate), SBP (systolic blood pressure), and DBP (diastolic blood pressure). Create a file reference to the external file.
Contents of external file VITAL:
A1,68,130,80
B3,101, 148,86
C2,.,.,72
D1, 72, 140 , 88


Code
FILENAME INP 'C:\SASFILES\VITAL.CSV';
DATA ALL;
INFILE INP DLM=',';
INPUT ID $ HR SBP DBP;
OUTPUT ALL;
RUN;
Refer File Name: 4.1.sas to obtain soft copy of the program code

How It Works
This is similar to problem 3.1.
It uses a FILENAME statement that creates a file reference to the external file VITAL.CSV.
Since the input file is a CSV (comma-separated values) file, we are using the DLM option.

Summary
_N_ & _ERROR_ are automatic variables created by SAS during program execution.
The PDV is a temporary memory area where the values of variables are stored during execution.
The code inside the DATA step is repeated to read multiple records. The iteration continues until it reaches the end of file.
The SAS System processes the DATA step in two phases: the compilation phase and the execution phase.
An INFILE statement is used to specify the source of the data read by the INPUT statement.
The options of the INFILE statement affect how the data is read from the external file.

Test your Understanding


1. What SAS statements would you code to read an external raw data file to a DATA step?
2. Are you familiar with special input delimiters? How are they used?
3. If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
4. What is the Program Data Vector (PDV)? What are its functions?
5. At compile time when a SAS data set is read, what items are created?
6. What is _n_?
7. What does the RUN statement do?
8. Why is SAS considered self-documenting?
9. What is the purpose of _error_?


Session 05: Basic Concepts/Working with the DATA Step


Learning Objectives
After completing this session, you will be able to:
Declare a variable in SAS
Read same record more than once
Describe the scope of DATA and PROC Steps
Explain Operators in SAS
Explain Commenting in SAS
Explain SAS Data Libraries
Read SAS Datasets

Variable Declaration
Variables can be declared using any of the following statements:
LENGTH
ATTRIB

LENGTH:
If not specified, SAS assigns a default length of 8 bytes to Character and Numeric Variables. Using LENGTH statement, we can explicitly assign the length and data type of variables. General Form:

LENGTH variable-name <$> length-specification ...;


Example:

Length Name $20 Age 8;

ATTRIB:
Using the ATTRIB statement, we can associate the following attributes with variables in a single statement:
Length & data type
Label
Informat
Format
General Form:

ATTRIB variable-name attributes;

Attributes:

LENGTH=<$>length
specifies the length of variable. $ indicates, it is a character variable.

LABEL='label'
Associates a label with a variable.

INFORMAT=informat
associates an informat with a variable

FORMAT=format
associates a format with a variable
Example:

ATTRIB Name length=$20 label='Name of Employee';


Note: Labels, Informat, Format are discussed later
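As a rough sketch, the LENGTH and ATTRIB statements might be used together as shown here. The label text and the COMMA8. format are illustrative choices, and PROC CONTENTS is used only to display the resulting attributes.

```sas
DATA EMP;
  LENGTH NAME $ 20;                                   /* character, 20 bytes */
  ATTRIB SAL LENGTH=8 LABEL='Monthly Salary' FORMAT=COMMA8.;
  INPUT EMPID NAME $ SAL;
  DATALINES;
111 RAMESH 1000
222 KUMAR 2000
;
RUN;

/* Lists each variable with its type, length, format and label */
PROC CONTENTS DATA=EMP;
RUN;
```

Because the declarations come before the INPUT statement, they fix the attributes of NAME and SAL before any value is read.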

Reading same record more than once


By default each INPUT statement reads a separate record from the Input file.

Single Trailing @
The single trailing @ holds a raw data record in the input buffer until SAS executes an INPUT statement with no trailing @ or reaches the bottom of the DATA step.

The Double Trailing @


The double trailing @ holds the raw data record across iterations of the DATA step until the end of the line is reached.
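The two line-hold specifiers can be illustrated with a short sketch. All dataset names, variable names and data values here are hypothetical.

```sas
/* Single trailing @: read TYPE, hold the record, then decide how to read the rest */
DATA SALES;
  INPUT TYPE $ @;
  IF TYPE = 'A' THEN INPUT AMOUNT;
  ELSE INPUT QTY PRICE;
  DATALINES;
A 100
B 2 50
;
RUN;

/* Double trailing @: several observations stored on one data line */
DATA SCORES;
  INPUT ID SCORE @@;
  DATALINES;
1 90 2 85 3 70
4 60 5 95
;
RUN;
```

In the second step each iteration reads one ID/SCORE pair, so the two data lines produce five observations.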


Single Trailing @ Versus Double Trailing @


The table below lists the difference between Single Trailing @ Versus Double Trailing @

Scope of DATA and PROC Steps


The scope of a DATA step begins with the keyword DATA and ends with one of the following:
Keyword RUN
Beginning of another DATA step (DATA keyword)
Beginning of a PROC step (PROC keyword)
End of program
Keyword ENDSAS (ENDSAS terminates the current SAS program or session)
CARDS or CARDS4 statement
DATALINES or DATALINES4 statement (DATALINES4 / CARDS4 is used to read input values when the data contain semicolons)
Implicit OUTPUT statement: By default, every DATA step contains an implicit OUTPUT statement at the end. This tells the SAS System to write the current observation to the data set at the end of every iteration.

The presence of an explicit OUTPUT statement turns off the implicit one.
Fig 1: Observations are written to both the datasets DATA1 and DATA2.
Fig 2: Since an explicit OUTPUT statement is present, observations are not written to the dataset DATA2.
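The two cases can be sketched in code as follows. The variable X and the data values are assumptions made for illustration; the dataset names DATA1 and DATA2 follow the description above.

```sas
/* No explicit OUTPUT: the implicit OUTPUT writes every observation
   to both DATA1 and DATA2 */
DATA DATA1 DATA2;
  INPUT X;
  DATALINES;
1
2
;
RUN;

/* Explicit OUTPUT DATA1: the implicit OUTPUT is turned off, so DATA2
   is created but receives no observations */
DATA DATA1 DATA2;
  INPUT X;
  OUTPUT DATA1;
  DATALINES;
1
2
;
RUN;
```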

Operators in SAS
Operators in SAS are classified into:
Arithmetic Operators
Comparison Operators
Logical Operators

Arithmetic Operators


Comparison Operators

** Example of using the IN operator: the condition VNUM IN (1,20,55,79,100,500) is TRUE if the value of the variable VNUM is found in the given list.

Logical Operators

Other Operators:
|| -- Concatenation
>< -- Minimum
<> -- Maximum
Concatenation (||) operator: concatenates two character strings.
Ex: name = 'Jacob' || 'son';
MIN (><) and MAX (<>) operators: find the minimum or maximum of two values.
Ex:
x = a >< b; /* x is the minimum of a & b */
x = a <> b; /* x is the maximum of a & b */
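A brief sketch that exercises one or two operators from each group is given here. The dataset name OPS and all variable names and values are hypothetical; comparison and logical expressions evaluate to 1 (true) or 0 (false).

```sas
DATA OPS;
  A = 7;
  B = 3;
  TOTAL    = A + B;                /* arithmetic: 10 */
  POWER    = A ** B;               /* exponentiation: 343 */
  SMALLER  = A >< B;               /* minimum: 3 */
  LARGER   = A <> B;               /* maximum: 7 */
  FULLNAME = 'Jacob' || 'son';     /* concatenation: Jacobson */
  IN_LIST  = (A IN (1, 3, 7));     /* comparison with IN: 1 */
  BOTH     = (A > 5 AND B < 5);    /* logical AND: 1 */
  PUT _ALL_;                       /* write all values to the log */
RUN;
```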


Commenting in SAS
There are two styles of commenting available in SAS:
  Multi-line commenting
  Single line commenting

Multi-line commenting:
For multi-line commenting, enclose the comments between /* and */
Example:

/* INPUT NAME $ SAL ;
   SAL = SAL + 1000; */

Single line commenting:
For single line commenting, enclose the comment between an asterisk and a semicolon.
Example:

*SAL = SAL + 1000;

SAS Data Libraries


SAS data library is a collection of one or more SAS data files. It is simply a directory or folder, where we can store SAS Data sets and other SAS files. You can think of a SAS data library as a drawer in a filing cabinet and each data set as a file folder in the drawer.

General form of a SAS dataset name: a SAS dataset name has two levels.

<Library Name>.Dataset Name


Where: Library Name represents where the dataset is stored. Library Name is optional; if you omit the library name, then the dataset is stored in the default library WORK.

WORK is a system-defined temporary library. All the datasets stored in the WORK library are deleted at the end of the session. If you want to create a permanent dataset, it needs to be created in a user-defined SAS library. Example: The dataset Admit is stored in the user-defined library Clinic.

LIBNAME Statement
This statement is used to create a user-defined library.
General Form:

LIBNAME libref 'SAS-data-library';

Where,
  SAS-data-library - is the path of a directory on a secondary storage device in which SAS data files are stored.
  libref - is a library reference to that directory. It creates a logical link (shortcut) to the SAS-data-library.
Example:

LIBNAME ABC 'C:\SAS\SASFILES';


Just as you assign a fileref by using a FILENAME statement, you assign a libref by using a LIBNAME statement. Filerefs perform the same function as librefs: they temporarily point to a storage location for data. However, librefs reference SAS data libraries, whereas filerefs reference external files.
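The contrast between a libref and a fileref can be sketched as follows. Both paths, the fileref RAWIN, and the layout of the raw file are placeholders assumed for illustration; substitute locations that exist on your system.

```sas
/* Both paths are placeholders; point them at locations that exist on your system */
libname clinic 'C:\SAS\SASFILES';         /* libref: refers to a directory of SAS files */
filename rawin 'C:\SAS\RAW\admit.txt';    /* fileref: refers to one external file */

data clinic.admit;                        /* two-level name: permanent dataset */
   infile rawin;                          /* read the external file through the fileref */
   input id name $ age;                   /* assumed layout of the raw file */
run;

data admit_copy;                          /* one-level name: stored in the WORK library */
   set clinic.admit;
run;
```

CLINIC.ADMIT remains in the directory after the session ends, whereas ADMIT_COPY is deleted along with the rest of WORK.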

Reading a SAS Dataset

SET Statement:


SET statement is used to read an existing dataset. General Form:

SET SAS-data-set <options>;


The SET statement points to the SAS dataset(s) to be read. Options in the SET statement affect how the data is read.

KEEP / DROP:
By default, SAS writes all variables to the output dataset. Using the dataset options KEEP= and DROP=, you can make SAS read or write only specific variables.

KEEP: The KEEP= option names the variables you want to read from a dataset.
Example:

SET EMP (KEEP = ID NAME);

This statement reads only the variables ID & NAME from the dataset EMP.

DROP: The DROP= option names the variables you want to omit from the dataset.
Example:

SET EMP (DROP = SAL);

This statement reads all the variables except SAL from the dataset EMP.

You can also use the KEEP=/DROP= options on the output dataset.
Example:

DATA ALL (KEEP = ID NAME);
   SET EMP;
RUN;

The table below shows the difference between using the KEEP/DROP options on the input dataset versus the output dataset.
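The difference between the two placements can also be sketched in code. The dataset names, the variable GRADE, and the data values below are assumptions made for illustration.

```sas
/* Hypothetical input data */
data emp;
   input id name $ sal;
   datalines;
1 ANIL 15000
2 BINA 32000
;
run;

/* KEEP= on the input dataset: SAL is never read, so it cannot be used in this step */
data in_keep;
   set emp (keep = id name);
run;

/* KEEP= on the output dataset: SAL is read and can be used, but is not written */
data out_keep (keep = id name grade);
   set emp;
   length grade $ 4;
   if sal >= 20000 then grade = 'HIGH';
   else grade = 'LOW';
run;
```

Placing the option on the input dataset reduces the amount of data brought into the step; placing it on the output dataset is needed when a dropped variable still takes part in the processing.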


FIRSTOBS and OBS


Use FIRSTOBS=n to cause processing to begin at the nth observation. Use OBS=n to cause processing to stop at the nth observation.
Default values:

FIRSTOBS = 1
OBS = MAX

MAX points to the last observation in a dataset.
Example:

DATA ALL;
   SET EMP (FIRSTOBS = 100 OBS = 300);
RUN;

There will be 201 observations read from the dataset EMP (observations 100 through 300).

Alternative approach: You can also achieve the same result as FIRSTOBS & OBS by using _N_ with an IF condition.
Example:

DATA ALL;
   SET EMP;
   IF _N_ >= 100 AND _N_ <= 300 THEN OUTPUT ALL;
RUN;

Which method is efficient and why?

END=: We can also use the END=<variable> option with the SET statement. The variable is set to 1 when the last observation is being read and is 0 otherwise.

WHERE Statement: To filter the observations from the input dataset.

Example:

SET EMP; WHERE SAL > 10000;


The above code reads only those records whose SAL value is > 10000.
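The WHERE statement and the END= option of the SET statement can be combined, as in the sketch below. The data values and the variable names LAST and TOTAL are assumptions made for illustration.

```sas
/* Hypothetical input data */
data emp;
   input name $ sal;
   datalines;
ANIL 15000
CHAD 8000
BINA 32000
;
run;

data _null_;
   set emp end = last;      /* LAST is 1 while the final observation is processed */
   where sal > 10000;       /* only observations with SAL > 10000 are read */
   total + sal;             /* sum statement: TOTAL is retained across iterations */
   if last then put 'Total salary of selected employees: ' total;
run;
```

The total is written to the SAS log once, after the last selected observation has been added.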

Try It Out Problem Statement 1


Write a program to read the raw file and create a dataset named EMP. Note that the data values are in two different layouts and Rectyp @1 specifies the layout type.
  If Rectyp = A - Name ID Sal
  If Rectyp = B - ID Sal Name
Verify whether all the data values were read correctly by printing the contents of the dataset EMP.

A ABISHEK 12345 10000
B 67890 20000 DAVID
A KANNAN 12367 30000
B 67456 40000 KUMAR

Code

DATA EMP;
   INFILE INP;                /* INP is a fileref that must already point to the raw file */
   INPUT RECTYP $ @;          /* trailing @ holds the record for the next INPUT statement */
   IF RECTYP = 'A' THEN DO;
      INPUT NAME $ ID SAL;
   END;
   ELSE DO;
      INPUT ID SAL NAME $;
   END;
RUN;

PROC PRINT DATA = EMP;
RUN;
Refer File Name: 5.1.sas to obtain soft copy of the program code

Problem Statement 2


Read the dataset EMP (created in the problem 5.1) into a new dataset EMPNEW and store it in a user-defined library LIB. Also apply the conditions below.
  1. Read only the observations whose Salary value is >= 20000
  2. Drop the variable Rectyp
Verify the contents of the new dataset.

Code

LIBNAME LIB 'C:\SASFILES\';

DATA LIB.EMPNEW;
   SET EMP (DROP = RECTYP);
   WHERE SAL >= 20000;
RUN;

PROC PRINT DATA = LIB.EMPNEW;
RUN;

Refer File Name: 5.2.sas to obtain soft copy of the program code

How It Works
  LIBNAME statement assigns the libref LIB to a directory; datasets stored there are permanent.
  DROP option drops the variable RECTYP while reading.
  WHERE statement selects only the observations whose SAL >= 20000.
  PROC PRINT prints LIB.EMPNEW so that the new dataset can be verified.

Summary
  Variables can be declared using LENGTH or ATTRIB statements.
  Single trailing @ holds the current record for the next INPUT statement.
  Double trailing @ holds the record until all the values are read from the current record.
  Every DATA step has an implicit OUTPUT statement at the end.
  The implicit OUTPUT statement is turned off if an explicit OUTPUT statement is present.
  Operators in SAS are classified into Arithmetic, Comparison, Logical and Other operators.
  There are two styles of commenting available in SAS.
  SAS data library is a collection of one or more SAS data files.
  SET statement is used to read an existing dataset.
  Options in the SET statement affect how the data is read.

Test your Understanding


1. What is the purpose of the trailing @ and the @@? How would you use them?
2. If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
3. How do you control the number of observations and/or variables read or written?
4. How would you create multiple observations from a single observation?

Session 07: Working with the DATA step


Learning Objectives
After completing this session, you will be able to:
  Use the Dataset Options and the OPTIONS statement
  Work with SAS Informats and Formats
  Work with SAS Date and Time
  Explain the Styles of input
  Write to an external file

Dataset Options and Options Statement

Options Statement:


It is used to change SAS system options. The change(s) will remain in effect for the rest of the job/session or until changed again. The OPTIONS statement can appear at any place in a program, except within datalines or cards statements. General form:

OPTIONS <options>;
Where, options specifies one or more system options to be changed. Using the options statement you can also control the appearance of the output

NUMBER | NONUMBER and DATE | NODATE


The following OPTIONS statement suppresses the printing of both page numbers and the date and time in the output window. Example:

OPTIONS NONUMBER NODATE ;

PAGESIZE=
The PAGESIZE= option specifies how many lines each page of output should contain.

LINESIZE=
The LINESIZE= option specifies how many characters each output line should contain.

FIRSTOBS & OBS


We can also use the FIRSTOBS= & OBS= options with the OPTIONS statement.
Example:

OPTIONS OBS = 100;

After this statement, SAS processes at most the first 100 observations of each input dataset, so the datasets created from them contain no more than 100 observations. Use OPTIONS OBS = MAX; to restore the default.

PAGENO=
By default, page numbers start at 1 and are numbered sequentially throughout the SAS session. If you want to reset the page number or start with any other number, use this option. In the following example the output pages are numbered sequentially beginning with number 3.
Example:

OPTIONS PAGENO=3;

SAS Informats & Formats

SAS Informat


An Informat is used to read data in non-standard form. An Informat is a pattern / instruction that SAS uses to read data values into a variable. SAS interprets the raw value according to the informat and converts it into a standard character or numeric value.
General form:

<$>informat<w>.<d>

Here $ indicates a character informat, w is the width of the field and d is the number of decimal places.

For example, the numeric value $1,234.56 contains special characters ($ and ,), so SAS will not recognize it as a standard number. To read such non-standard values we need to use an informat (DOLLAR9.2 in this case) to tell SAS that the input data is in a particular format. SAS then understands the pattern of the data and converts it into a standard numeric value before assigning it to a variable.

$1,234.56 -> DOLLAR9.2 -> 1234.56

Few Informats:
  w.d - Reads standard numeric data
  $w. - Reads standard character data
  $CHARw. - Reads character data with blanks
  DOLLARw.d - Reads a numeric value and removes embedded commas, blanks, dollar signs, percent signs, and right parentheses
  COMMAw.d - Similar to DOLLARw.d
Example:

Raw Data Value   Informat   SAS Data Value
--------------   --------   --------------
1234567          8.         1234567
1234567          8.2        12345.67
1234.567         8.3        1234.567
JAMES            $8.        JAMES
JAMES            $CHAR8.    JAMES
$12,567          COMMA7.0   12567

$CHARw. preserves the leading blanks
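A minimal sketch of reading values with informats, assuming in-stream data laid out in fixed columns (the dataset name PAY and the values are made up for illustration). The first name is deliberately written with two leading blanks so that the behaviour of $CHAR8. can be observed.

```sas
/* NAME occupies columns 1-8, SAL occupies columns 10-18 */
data pay;
   input @1  name $char8.     /* $CHARw. keeps the leading blanks */
         @10 sal  dollar9.2;  /* strips the $ and the comma */
   datalines;
  JAMES  $1,234.56
MARTINA  $2,500.00
;
run;

proc print data=pay;
   format sal dollar10.2;     /* a format controls how the stored number is displayed */
run;
```

Replacing $CHAR8. with $8. in the INPUT statement would remove the leading blanks from the first name.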

Format
A Format is used to write data in non-standard form. A format is a pattern / instruction that SAS uses to write data values in the output. The general form of a Format is the same as that of an Informat. Many Formats share their names with Informats, but the functionality is exactly the reverse. For example, to display the value 1234.56 as $1,234.56 in a report, you can use the DOLLAR9.2 format.

1234.56 -> DOLLAR9.2 -> $1,234.56

Few Formats:
  w.d - Writes standard numeric data
  $w. - Writes standard character data
  $CHARw. - Writes standard character data
  DOLLARw.d - Converts a standard numeric value to DOLLARw.d form and prints it in the output/report
  COMMAw.d - Similar to the DOLLARw.d format but does not prefix the $ sign

Key Concept: Formats alter the external representation of the values of variables stored in SAS data sets. The internal value remains the same, but how we see it, outside of the data set, is controlled by the Format we choose to associate with the variable.

Format and Informat statement:
The Format / Informat statement is used to associate a format / informat with a variable.
General Form:

Format variable format;
Informat variable informat;

Example:

Format DOB date9. ;
Informat SAL COMMA9.2 ;

Working with SAS Date and Time
SAS stores the Date and Time values in Numeric form.

Date
The SAS system stores date values as integers representing the number of days between January 1, 1960, and a specified date. Dates before January 1, 1960 are stored as negative numbers. SAS system can represent the dates between 1582 A.D and 20,000 A.D

Time
The SAS System stores a time value as the number of seconds since midnight of the current day. SAS time values are independent of the date.

Datetime
Combines the Date and Time values as a single value. The SAS System stores a Datetime value as the number of seconds between midnight of January 1st, 1960 and a specified Datetime.
SAS reads and displays Date, Time and Datetime values through Informats & Formats. There are many Informats and Formats available in SAS for reading and writing Date, Time, and Datetime values. Some commonly used Formats:

SAS Date Value   Format       Displayed Value
--------------   ----------   -----------------------
0                MMDDYY8.     01/01/60
30               DDMMYY10.    31/01/1960
30               YYMMDD10.    1960-01-31
1                DATE7.       02JAN60
1                DATE9.       02JAN1960
-1               WORDDATE.    December 31, 1959
366              WEEKDATE.    Sunday, January 1, 1961
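The relationship between the stored number and the displayed value can be checked with a short DATA _NULL_ step. This is a sketch only; the variable names are made up, and the date literal '31JAN1960'd is used simply as a convenient way to create a SAS date value.

```sas
data _null_;
   d = '31JAN1960'd;                      /* date literal, stored as a number of days */
   put d=;                                /* no format: the stored integer is written */
   put d= ddmmyy10.;                      /* the same value shown with different formats */
   put d= date9.;
   put d= worddate.;
   days = '01JAN1961'd - '01JAN1960'd;    /* dates are numbers, so they can be subtracted */
   put days=;
run;
```

All the PUT statements write to the SAS log; the value of D never changes, only its appearance does.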

YEARCUTOFF Option: This system option specifies the first year of a 100-year span used by Informats & functions. Based on this, SAS determines the century of a date when the year is specified as a two-digit year. The default value is 1920 and can be overridden using the OPTIONS statement.

How it works: When a two-digit year value is read, SAS interprets it based on a 100-year span that starts with the YEARCUTOFF= value. For example, with YEARCUTOFF=1920 the span is 1920 to 2019, so the two-digit year 19 is read as 2019 and the two-digit year 20 is read as 1920.
Example:
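A minimal sketch of the YEARCUTOFF= behaviour, assuming the value 1920 is set explicitly so that the result does not depend on the default of the installed SAS version. The variable names are made up for illustration.

```sas
options yearcutoff = 1920;                /* 100-year span: 1920 to 2019 */

data _null_;
   d1 = input('01/01/19', mmddyy8.);      /* two-digit year 19 falls in 2019 */
   d2 = input('01/01/20', mmddyy8.);      /* two-digit year 20 falls in 1920 */
   put d1= date9. d2= date9.;             /* four-digit years show the chosen century */
run;
```

Writing years with four digits in the raw data avoids the dependence on YEARCUTOFF= altogether.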

DATE & TIME functions:

Function   Typical Use                Result
--------   ------------------------   ------------------------------------------------
TODAY      dt = today();              today's date as a SAS date value
TIME       tt = time();               current time as a SAS time value
DATETIME   datetime = datetime();     current date and time as a SAS Datetime value
DAY        day = day(date);           day of month (1-31)
HOUR       hh = hour(time);           hour (0-23) of a SAS time or Datetime value
WEEKDAY    wkday = weekday(date);     day of week (1-7, 1 = Sunday) of the date value
MONTH      month = month(date);       month (1-12) of the date value
MDY        date = mdy(mon,day,yr);    combines mon, day, yr into a SAS date value
DATEPART   dt = datepart(datetime);   returns the SAS date value from the SAS Datetime value
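Several of these functions can be exercised together in one step. The sketch below uses a fixed date so that it does not depend on when it is run; it also uses DHMS, a SAS function that is not listed in the table above, to build a Datetime value from a date and a time of day.

```sas
data _null_;
   date  = mdy(6, 30, 2008);          /* build a SAS date value from month, day, year */
   day   = day(date);                 /* day of the month */
   month = month(date);               /* month number */
   wkday = weekday(date);             /* 1 = Sunday ... 7 = Saturday */
   dtm   = dhms(date, 14, 30, 0);     /* datetime value for 14:30:00 on that date */
   hh    = hour(dtm);                 /* HOUR needs a time or datetime argument */
   dt    = datepart(dtm);             /* back to a SAS date value */
   put date= date9. day= month= wkday= hh= dt= date9.;
run;
```

DT and DATE hold the same number, since DATEPART simply removes the time-of-day portion of the Datetime value.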

Styles of input
There are different styles of input available. They are:
  List Input
  Column Input
  Formatted Input

List input
List input uses a scanning method for locating data values.
Example:

DATA EMP;
   length name $ 13;
   input Empid name $ Sal;
   cards;
111 LawrenceJames 2000
222 Martina 3000
;

For List input style:
  Data values must be separated by at least one blank or other defined delimiter.
  Character values cannot contain embedded blanks when the file is delimited by blanks.
  Fields must be read in order.
  Data must be in standard numeric or character format and you should not use Informats for reading.
  Missing values must be represented by a period (.).
  If the length of character data is more than 8 characters, then SAS reads only the first 8 characters. This behaviour can be overridden by using the LENGTH statement.
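The rules above can be seen at work in a short sketch that uses a comma as the defined delimiter. The DLM= option of the INFILE statement names the delimiter; the data values, including the missing salary, are made up for illustration.

```sas
data emp;
   infile datalines dlm = ',';     /* values are separated by commas instead of blanks */
   length name $ 13;               /* without this, NAME would hold only 8 characters */
   input empid name $ sal;         /* fields are read in the order they appear */
   datalines;
111,LawrenceJames,2000
222,Martina,.
;
run;

proc print data=emp;
run;
```

The period in the second record stands for a missing salary.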

Column Input
Column input enables you to read standard data values that are aligned in columns in the data records. To use column input, data values must be in the same columns for all the records and in standard numeric or character form.
Example:

data scores;
   input Empid 1-10 Name $ 11-25 Sal 27-35;
   cards;
111       LawrenceJames   2000
222       Martina         3000
333       George          4000
;

Features:
  Character values can contain embedded blanks and can be from 1 to 32,767 characters long.
  No period is required for missing data.
  Input values can be read in any order, regardless of their position in the record.
  Values do not need to be separated by blanks or other delimiters.
  Both leading and trailing blanks within the field are ignored.
  Data must be within the same columns on all input lines.
  Use the TRUNCOVER option on the INFILE statement to ensure that SAS handles data values of varying lengths appropriately.

Formatted Input
Formatted input combines the flexibility of using Informats with many of the features of column input. Formatted input is typically used with pointer controls that enable you to control the position of the input pointer in the input buffer when you read data. This is the most widely used style of input.

General Form:

INPUT @n variable-name informat. ...;

Where,
  @n: moves the pointer to the starting position of the field.
  variable-name: names the SAS variable being created.
  informat: specifies how many positions to read and how to convert the raw data into a SAS value.

Example 1 (the +n pointer control moves the pointer n columns forward):

data scores;
   input name $15. +6 score1 comma5. +8 score2 comma5. ;
   cards;
James                1,000        1,220
Martina              1,100        1,210
;
Run;

Example 2:

data scores;
   input @1  name   $15.
         @21 score1 comma5.
         @33 score2 comma5. ;
   cards;
James               1,000       1,220
Martina             1,100       1,210
;
Run;

Features:
  Can read data in nonstandard form.
  Character values can contain embedded blanks and can be from 1 to 32,767 characters long.
  No period is required for missing data.
  With the use of pointer controls to position the pointer, input values can be read in any order regardless of their positions in the record.

Writing to an external file


INFILE & INPUT statements are used to read from an external file. Similarly FILE & PUT statements are used to write the observations to an external file. FILE statement specifies the external output file that will be created.

General Form:

FILE 'output-file' <options>;

Where,
  output-file - points to the raw data file to be written
  options - affect how SAS writes the raw data file
The following options used with the INFILE statement can also be used with the FILE statement:
  DLM
  DSD
  LRECL
  RECFM
Example:

DATA TEMP;
   SET EMP;
   FILE OUT ;
   PUT @1 NAME $CHAR10. @15 EMPID 5. @25 SAL DOLLAR10.2 ;
RUN;

The above program creates an output file and a dataset. But the goal of this SAS program is to create only a raw data file and not a SAS data set. So it is inefficient to list a data set name in the DATA statement.

Using the _NULL_ Keyword: Using the keyword _NULL_ as the data set name causes SAS to execute the DATA step without writing observations to a data set. _NULL_ is a dummy dataset and it will not contain any observations. The same program can be re-written to create only an output file.
Example:

DATA _NULL_;
   SET EMP;
   FILE OUT ;
   PUT @1 NAME $CHAR10. @15 EMPID 5. @25 SAL DOLLAR10.2 ;
RUN;

Try It Out

Problem Statement
Given the raw data file description below and sample data, write a SAS System DATA step to read this data file, and create a SAS data set called FIRESTATION. Use formatted input; do not use ending columns. Also print the values of DATE in DATE9. format and AMOUNT in COMMA13.2 format.

Description of the File FIRE.TXT:

Variable   Starting
Name       Column     Length   Format     Description
---------------------------------------------------------------
CALL_NO    1          3        Numeric    Call number
DATE       5          8        MM/DD/YY   Date of service
TRUCKS     14         2        Numeric    Number of trucks
ALARM      17         1        Numeric    Number of alarms
AMOUNT     19         13       Numeric    Amount spent

Actual Data in File FIRE.TXT:

001 10/21/94 03 2 $12,300.00
002 10/23/94 01 1 $456,678.00
003 11/01/94 11 3 123,456.89

Code

DATA FIRESTATION;
   INFILE 'FIRE.TXT' TRUNCOVER;   /* TRUNCOVER: AMOUNT is shorter than 13 columns on these records */
   INPUT @1  CALL_NO 3.
         @5  DATE    MMDDYY8.
         @14 TRUCKS  2.
         @17 ALARM   1.
         @19 AMOUNT  DOLLAR13.2 ;
RUN;

PROC PRINT DATA = FIRESTATION;
   FORMAT DATE DATE9. AMOUNT COMMA13.2 ;
RUN;

Refer File Name: 7.1.sas to obtain soft copy of the program code

How It Works
  The Informats MMDDYY8. and DOLLAR13.2 convert the date and amount values into standard form and assign them to the variables.
  The FORMAT statement applies the DATE9. and COMMA13.2 formats to the fields DATE and AMOUNT respectively, so the output appears in the specified format.
  Without the FORMAT statement SAS prints the values of DATE and AMOUNT in standard numeric form.

Problem Statement
Convert the contents of the Dataset FIRESTATION (created in problem 7.1) to a csv (comma-separated-value) file. Apply the following Formats to the variables.
  DATE - DATE9. format
  AMOUNT - COMMA13.2 format

Code

DATA _NULL_;
   SET FIRESTATION;
   FORMAT CALL_NO 3. DATE DATE9. TRUCKS 2. ALARM 1. AMOUNT COMMA13.2 ;
   FILE 'OUT.CSV' DSD DLM = ',';
   PUT CALL_NO DATE TRUCKS ALARM AMOUNT ;
RUN;

Refer File Name: 7.2.sas to obtain soft copy of the program code

How It Works
  The SET statement reads the observations of FIRESTATION.
  The FORMAT statement associates the Formats with the variables.
  DLM = ',' option specifies that the values in the output file are separated by commas.
  DSD option encloses in quotation marks any value that contains the delimiter; this matters here because COMMA13.2 writes embedded commas.
  _NULL_ dataset suppresses the creation of a dataset.

Summary
  The OPTIONS statement changes SAS system options.
  Informat is used to read data in non-standard form.
  Format is used to write data in non-standard form.
  SAS stores the Date & Time values as Numeric.
  Based on the YEARCUTOFF value, the century values of dates are determined by SAS system if the year is specified as a two digit year.
  The different styles of input are LIST, COLUMN & FORMATTED input.
  FILE statement specifies the external output file that will be created.
  _NULL_ is a dummy dataset and it will not contain any observations.

Test your Understanding


1. What is the difference between an informat and a format?
2. Name three informats or formats.
3. What statement do you code to tell SAS that it is to write to an external file?
4. What statement do you code to write the record to the file?
5. If you do not want any SAS dataset output from a DATA step, how would you code the DATA statement to prevent SAS from producing a dataset?
6. What is the one statement to set the criteria of data that can be coded in any step?
7. Approximately what date is represented by the SAS date value of 730?
8. Create a program for the following requirement: read the raw data below into a SAS dataset.

prodid   prodname      quantity
------   -----------   --------
4014     furniture     108
5785     carpet        322
3743     elect goods   488
3298     television    467

prodid field starts from position 1 and is of numeric data type. prodname field starts from position 6 and is of character data type. quantity field starts from position 18 and is numeric.

Session 09: SAS Procedures


Learning Objectives
After completing this session, you will be able to:
  Work with the following procedures:
    o PRINT
    o CONTENTS
    o SORT
    o FORMAT
    o DATASETS

SAS Procedures
Procedures are a library of built-in programs or utilities for processing datasets and displaying results.
PROC step: It begins with the keyword PROC and consists of a group of SAS statements that call and execute a procedure, with a SAS dataset as input. Procedures can use only datasets, and not other files. The procedures analyze the data and generate output as reports, charts, graphs, datasets, etc. Most of the SAS procedures work with the Data portion of the dataset.

PROC PRINT
PRINT procedure prints observations in a SAS dataset using all or some of the variables. The PRINT procedure can be controlled by the following statements and options. General Form:

PROC PRINT <options>;
   <Statements>;
RUN;

Where,
Statements:
  VAR variable-list;
  BY variable-list;
  SUM variable-list;
  LABEL
Options:
  DATA=SAS-data-set - Specifies the SAS data set to use
  DOUBLE - Writes a blank line between observations
  LABEL - Uses variable labels as column headings (variable name is the default heading)
  SPLIT='split character' - PROC PRINT breaks a column heading when it reaches the split character and continues the heading on the next line

Statements:
  VAR: Selects the variables that appear in the report and determines their order. If not used, SAS prints the values of all the variables.
  BY: Produces a separate section of the report for each BY group. The dataset needs to be sorted by the BY variable before using the BY statement.
  LABEL: Assigns labels to the variables. The LABEL option needs to be used with PROC PRINT to print the labels.
  SUM: Prints totals for the numeric variables specified.

Sample Program:

PROC PRINT DATA = EMP LABEL SPLIT = '*';
   VAR EMPID NAME SALARY;
   BY DEPTID;
   SUM SALARY;
   LABEL EMPID  = 'Employee * ID'
         DEPTID = 'Department * ID'
         NAME   = 'Name of * Employee'
         SALARY = 'Salary of * Employee';
RUN;

Sample Output: (figure not reproduced; its callouts mark the Label column headings, the BY DEPTID groups, and the SUM totals)

PROC CONTENTS
PROC PRINT prints the Data portion of the dataset, whereas PROC CONTENTS prints the Descriptor portion. PROC CONTENTS describes the structure of the data set. It displays information at the Data set level and Variable level.

Data set level: All the information below comes under the Dataset level
  Name
  SAS Version no
  Creation/Modified date
  Number of observations
  Number of variables
  File size (bytes)

Variable level: The following information is under Variable level
  Name
  Type
  Length
  Formats
  Position
  Label

General Form:

PROC CONTENTS DATA = <dataset> <options>;

Options:
  NOPRINT: Suppresses printed output.
  POSITION: Lists variables in order of position rather than alphabetically (the default).
Example:

proc contents data=test1; run;


Sample Program:

PROC CONTENTS DATA = EMP; RUN;


Sample Output:

PROC SORT
This procedure sorts observations in a SAS dataset by one or more variables. It either replaces the existing dataset or writes the sorted observations into a new one. By default it sorts in ASCENDING order.
General form:

PROC SORT DATA=SAS-data-set <OUT=SAS-data-set> <options>;
   BY <DESCENDING> BY-variable(s);
RUN;

Where,
  DATA= option specifies the input data set.
  BY-variable(s) in the required BY statement specifies one or more variables whose values are used to sort the data.
  DESCENDING option in the BY statement sorts observations in descending order of the variable that follows it.
  OUT= option specifies the output data set that contains the data in sorted order.
We may also use the following options:
  NODUPKEY - Eliminates observations with duplicate BY values.
  DUPOUT= - Writes only the duplicate observations to a new dataset. This option is available only from SAS v9.1.
Example:

Proc sort data=transfer nodupkey out=lib.trans;
   By empno;
Run;
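A minimal sketch that combines NODUPKEY with DUPOUT=, assuming SAS 9.1 or later. The data values and the output dataset names TRANS_UNIQUE and TRANS_DUPS are made up for illustration.

```sas
/* Hypothetical input: employee 102 appears twice */
data transfer;
   input empno dept $;
   datalines;
102 HR
101 IT
102 FIN
;
run;

proc sort data = transfer
          out = trans_unique      /* sorted data, one observation per EMPNO */
          nodupkey
          dupout = trans_dups;    /* the observations that NODUPKEY removed */
   by empno;
run;

proc print data = trans_unique;
run;

proc print data = trans_dups;
run;
```

Because OUT= is specified, the input dataset TRANSFER is left unchanged; without OUT=, the sorted and de-duplicated data would replace it.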

PROC FORMAT
This procedure is used to create user-defined Formats and Informats for character and numeric variables. General form:

PROC FORMAT;
   VALUE format-name range1='label1'
                     range2='label2'
                     ... ;
RUN;

Where,
  format-name names the format that you are creating. The format name:
    o can be up to 32 characters long
    o must begin with a dollar sign ($) if the format applies to character data
    o cannot be the name of an existing SAS format
    o cannot end with a number
    o does not end with a period in the VALUE statement, but takes a period when it is used
  range specifies one or more variable values
  label is a text string enclosed in quotation marks.
Example:

proc format;
   value $grade 'A'='Good'
                'B'-'D'='Fair'
                'F'='Poor'
                'I','U'='See Instructor'
                Other='Miscoded';
run;

The keyword Other is similar to an ELSE statement: it covers every value not listed in the other ranges. To create a user-defined INFORMAT use the INVALUE statement instead of VALUE. But usually we will not be using user-defined Informats for reading data values.
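How a user-defined format and a user-defined informat are applied can be sketched as follows. The informat name LEVEL, the dataset MARKS, and the data values are assumptions made for illustration; the $GRADE format is a shortened version of the one defined above.

```sas
proc format;
   value $grade 'A' = 'Good'
                'B'-'D' = 'Fair'
                other = 'Miscoded';
   invalue level 'LOW' = 1          /* user-defined informat: text is read as a number */
                 'HIGH' = 2
                 other = .;
run;

data marks;
   input name $ grade $ lvl : level.;   /* the colon applies the informat in list input */
   datalines;
ANIL A LOW
BINA C HIGH
;
run;

proc print data = marks;
   format grade $grade.;                /* note the period when the format is used */
run;
```

The stored value of GRADE stays 'A' or 'C'; only the printed value is replaced by the label.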

PROC DATASETS
The DATASETS procedure is used to manage SAS files in a SAS data library. With PROC DATASETS, you can:
  List the SAS files that are contained in a SAS library
  Copy SAS files from one SAS library to another
  Rename SAS files
  Delete SAS files
  Modify attributes of SAS data sets and variables within the data sets
  Create and delete indexes on SAS data sets
The DATASETS procedure executes the pending statements at a RUN statement and ends with a QUIT statement (or at the start of the next DATA or PROC step).
Examples:
1. Prints the descriptor portion of all the datasets in WORK library.

proc datasets lib=WORK;
   contents data=_all_;
quit;

2. Copies all SAS files from the WORK library to the PERM library

proc datasets;
   copy in=WORK out=PERM;
quit;

3. Deletes the EMP data set from the PERM library, changes the name of the DEPTA data set to DEPTB

proc datasets library=PERM;
   delete EMP;
   change DEPTA = DEPTB;
run;
quit;

MODIFY Statement: This statement in the DATASETS procedure is used to change specific dataset or variable attributes. It allows you to specify formats, informats, and labels, rename variables, and create and delete indexes. The MODIFY statement works on only one dataset at a time. The following example modifies the dataset income in the COMPANY library by:
  Renaming the variable old to new
  Adding a label to variable new
  Setting a format for variable income
Example:

PROC DATASETS LIBRARY=COMPANY;
   MODIFY income;
   RENAME old=new;
   LABEL new='originally called old';
   FORMAT income comma11.2;
RUN;
QUIT;
INDEX: The MODIFY statement in the DATASETS procedure is also used to create an index on an existing SAS dataset. An index is used to quickly find records in a large dataset. Without an index, SAS accesses and checks all the values in a dataset sequentially while searching. For example, suppose you have to search a table based on the column Name and it does not have an index. In this case SAS begins with the first row and reads through all rows in the table.

An index is a SAS file that stores the values of a specified column in order, together with information about the location of those values in the table, which enables you to access a row directly, by value. For example, suppose you have created an index on column Name. Using the index, SAS will access the required row(s) directly, without having to read all the other rows.

Creating an index is useful:
  When you use a WHERE statement to filter observations
  When merging with another dataset
  In performing an equijoin in PROC SQL, and so on

This example uses the DATASETS procedure to create a Simple Index.
Example:

proc datasets library=INDRAILWAY;
   modify TRNTKT;
   index create PNRNO / UNIQUE NOMISS;
run;
quit;

In the example, a Simple index is created for the PNRNO variable of the TRNTKT SAS data set in the INDRAILWAY SAS data library.
  The INDEX CREATE statement specifies that an index is to be created. In the program PNRNO is the index variable.
  The UNIQUE option specifies that key variable values must be unique within the SAS data set.
  The NOMISS option specifies that no index entries are to be built for observations with missing key variable values.
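A self-contained sketch of creating and then using a simple index, run against a small dataset in the WORK library rather than INDRAILWAY. The data values and the dataset name ONE_TICKET are made up; MSGLEVEL=I is a system option that asks SAS to report in the log whether an index was used. On a dataset this small SAS may decide that a sequential read is cheaper, so the log note is not guaranteed.

```sas
options msglevel = i;            /* request informational notes about index usage */

/* Hypothetical ticket data */
data trntkt;
   input pnrno name $;
   datalines;
1001 ANIL
1002 BINA
1003 CHAD
;
run;

proc datasets library = work nolist;
   modify trntkt;
   index create pnrno / unique nomiss;   /* simple index on the key variable */
quit;

data one_ticket;
   set trntkt;
   where pnrno = 1002;           /* a WHERE on the key variable lets SAS consider the index */
run;
```

The benefit of an index grows with the size of the dataset and shrinks as the WHERE condition selects a larger share of the observations.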

Try It Out Problem Statement


You have a SAS data set called HTWT which contains variables ID, HEIGHT, and WEIGHT. HEIGHT and WEIGHT are to be grouped as follows:

HEIGHT groupings:
  0 to 36 = 1
  37 to 48 = 2
  49 to 60 = 3
  > 60 = 4
WEIGHT groupings:
  0 to 100 = 1
  101 to 200 = 2
  > 200 = 3

While printing:
  Group the observations by the values of HEIGHT
  Add up all values of WEIGHT
  Apply the label Employee ID to the variable ID
Also verify the descriptor portion of the dataset.

Code

PROC FORMAT;
   VALUE HTFMT 0-36    = '1'
               37-48   = '2'
               49-60   = '3'
               61-HIGH = '4';
   VALUE WTFMT 0-100    = '1'
               101-200  = '2'
               201-HIGH = '3';
RUN;

PROC SORT DATA = HTWT;
   BY HEIGHT ;
RUN;

PROC PRINT DATA = HTWT LABEL;
   BY HEIGHT ;
   SUM WEIGHT;
   LABEL ID = 'Employee ID';
   FORMAT HEIGHT HTFMT. WEIGHT WTFMT. ;
RUN;

PROC CONTENTS DATA = HTWT;
RUN;
Refer File Name: 9.1.sas to obtain soft copy of the program code

How It Works
The BY statement groups the observations by HEIGHT. Since the data needs to be grouped by HEIGHT, the dataset is first sorted by HEIGHT. The FORMAT statement applies the user-defined formats to HEIGHT and WEIGHT. The LABEL statement applies the label to the variable ID. Since we are using the LABEL statement, we should use the LABEL option in PROC PRINT to turn on the feature. The SUM statement adds the values of WEIGHT from all the observations.

Summary
- PROC PRINT prints the observations in a SAS dataset using all or some of the variables.
- PROC CONTENTS prints the descriptor portion of a dataset.
- PROC SORT sorts the observations in a SAS dataset by one or more variables and either modifies the existing dataset or writes into a new one.
- PROC FORMAT allows you to define your own formats or informats for character or numeric variables.
- The DATASETS procedure is used to manage SAS files in SAS data libraries.
- The MODIFY statement in the DATASETS procedure, together with INDEX CREATE, is also used to create an index on an existing SAS dataset.

Test your Understanding


1. Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.
2. How would you delete observations with duplicate keys?


Session 11: SAS Programming Concepts


Learning Objectives
After completing this session, you will be able to:
- Retain Variable Values
- Explain Automatic Variables
- Describe Titles and Footnotes
- Differentiate Conditional Processing and Iterative Processing

Retaining Variable Values


By default, SAS resets the values of variables created by INPUT or assignment statements to missing at the beginning of each iteration of the DATA step, so the value from the previous iteration is not held. Using the RETAIN statement, you can override this default behavior.

RETAIN statement:
The RETAIN statement retains the value of the variable in the PDV across iterations of the DATA step. If an initial value is not specified, the retained variable is initialized to missing before the first execution of the DATA step. General Form:

RETAIN variable-name <initial-value> ;


Example: The statement below initializes the variable TOTSAL to 0 and causes it to retain its current value across iterations.

RETAIN TOTSAL 0;
Example:

DATA ALL;
    RETAIN TOTSAL 0;
    SET EMP END = EOF;
    TOTSAL = TOTSAL + SAL;
    IF EOF = 1 THEN OUTPUT ALL;
RUN;
The dataset ALL has one observation and it contains the Total Salary of all the employees in the dataset EMP.

SUM statement:


When creating an accumulating variable, an alternative to the RETAIN statement is the sum statement. The sum statement is a shortcut: instead of writing two statements (a RETAIN statement and an assignment statement), you achieve the same task with a single sum statement.

General form of the sum statement:

variable + expression;
Example:

TOTSAL + SAL;

In the above example, SAS
- Creates a variable named TOTSAL, if it is a new variable, and initializes it to zero
- Automatically retains the value of TOTSAL
- Adds the value of SAL to TOTSAL, ignoring missing values
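The three points above can be seen in a short sketch that repeats the total-salary example with a sum statement. The EMP rows are invented for the illustration, and one salary is deliberately missing to show that it is skipped.

```sas
/* Hypothetical EMP data; the second salary is missing on purpose */
data emp;
    input name $ sal;
    datalines;
Asha 1000
Ravi .
Meena 2500
;
run;

data all;
    set emp end=eof;
    totsal + sal;          /* sum statement: retained, starts at 0, skips missing */
    if eof then output;    /* write only the final total */
    keep totsal;
run;
/* ALL has one observation with TOTSAL = 3500 */
```

With the assignment form TOTSAL = TOTSAL + SAL, the missing salary would make the total missing from that row onward, which is one practical reason to prefer the sum statement here.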

Automatic Variables
Finding the First and Last Observations in a Group: When you use the BY statement along with the SET statement, the DATA step creates two temporary variables for each BY variable, in the form

FIRST.variable
LAST.variable

Their values are either 1 or 0. FIRST.variable and LAST.variable identify the first and last observation in each BY group.

Before you use the BY statement, the input dataset should be sorted by the BY variable. The BY statement in the DATA step enables you to process your data in groups. The DATA step and the values of the automatic variables are given below.

Example:

DATA temp;
    SET all;
    BY dept;
RUN;

Dept     Salary   FIRST.Dept   LAST.Dept
APTOPS    20000       1            0
APTOPS   100000       0            0
APTOPS    50000       0            1
FINACE    25000       1            0
FINACE    20000       0            0
FINACE    23000       0            0
FINACE    27000       0            1
SALES     10000       1            0
SALES     12000       0            1

Example: To find the DEPT-wise total salary of all the employees.

The problem can be divided into three steps.
1. Set the accumulating variable to 0 at the start of each BY group.
2. Increment the accumulating variable with a sum statement (automatically retains).
3. Output only the last observation of each BY group.
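The three steps can be sketched as follows, using the Dept and Salary values from the table above. The output data set name DEPT_TOTAL and the variable TOTSAL are chosen here for illustration only.

```sas
data all;
    input dept $ salary;
    datalines;
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINACE 25000
FINACE 20000
FINACE 23000
FINACE 27000
SALES 10000
SALES 12000
;
run;

/* BY-group processing needs the input sorted by the BY variable */
proc sort data=all;
    by dept;
run;

data dept_total;
    set all;
    by dept;
    if first.dept then totsal = 0;   /* step 1: reset at the start of a group */
    totsal + salary;                 /* step 2: sum statement accumulates     */
    if last.dept then output;        /* step 3: one row per department        */
    keep dept totsal;
run;
/* DEPT_TOTAL: APTOPS 170000, FINACE 95000, SALES 22000 */
```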


Titles and Footnotes


To make your report more meaningful and self-explanatory, you can specify TITLE and FOOTNOTE statements. They are similar to the header and footer in MS Word: the text given in the TITLE and FOOTNOTE statements appears at the top and bottom of every output page, respectively. You can specify up to 10 TITLE and 10 FOOTNOTE lines.

General form, TITLE and FOOTNOTE statements:

TITLE<n> 'text';
FOOTNOTE<n> 'text';

where,
- n is a number from 1 to 10 that specifies the title or footnote line
- 'text' is the actual title or footnote to be displayed.

Example:

PROC PRINT DATA = EMP;
    TITLE2 'Start of PROC PRINT Report';
    TITLE4 'Contents of the Dataset EMP';
    FOOTNOTE3 'End of PROC PRINT Report';
RUN;
Canceling Titles and Footnotes: TITLE and FOOTNOTE statements are global statements. That is, after you define a title or footnote, it remains in effect until you modify it, cancel it, or end the SAS session. A null TITLE<n> or FOOTNOTE<n> statement clears the nth line and all higher-numbered title or footnote lines.

TITLE<n> ; FOOTNOTE<n>;
To cancel all the titles or footnotes, specify a null TITLE1 or FOOTNOTE1 statement like,

TITLE1; FOOTNOTE1;

Conditional Processing


There are different forms of the IF statement. They are given below.

Type 1: Simple IF Statement:

IF <condition> THEN <Statement>;


Type 2: IF-THEN-ELSE Statement:

IF <condition> THEN <True Block Statement>;
ELSE <False Block Statement>;


Type 3: IF-THEN-ELSE-IF Ladder:

IF <condition1> THEN <Condn1 True Block Statement>;
ELSE IF <condition2> THEN <Condn2 True Block Statement>;
ELSE <False Block Statement>;
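A minimal sketch of the ladder form, assuming a made-up data set of total marks; the variable names TOT and GRADE and the cut-off values are illustrative only. Conditions are tested from the top, and only the first true branch runs.

```sas
data graded;
    input id tot;
    length grade $ 6;
    if tot >= 200 then grade = 'High';
    else if tot >= 150 then grade = 'Medium';
    else grade = 'Low';
    datalines;
1 190
2 130
3 240
;
run;
/* GRADE is Medium, Low and High for IDs 1, 2 and 3 */
```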
If there is more than one statement in a particular block, group them in a DO-END block.

IF <condition> THEN DO;
    <True Block Statements>;
END;
ELSE DO;
    <False Block Statements>;
END;

SELECT-CASE:
You can also use SELECT groups in DATA steps to perform conditional processing. This is similar to the switch-case statement in the C language.

General form, SELECT group:

SELECT <(select-expression)>;
    WHEN-1 (when-expression) statement;
    ...
    WHEN-n (when-expression) statement;
    <OTHERWISE statement;>
END;

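Where, the optional SELECT expression is evaluated once and compared with each WHEN expression in turn. The statement of the first matching WHEN is executed; the OTHERWISE statement is executed when no WHEN matches.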

Example: The following code assigns a value to the variable Title based on the value of designation.

select (designation);
    when ("PAT") Title = "Programmer Analyst Trainee";
    when ("PA")  Title = "Programmer Analyst";
    when ("A")   Title = "Associate";
    when ("SA")  Title = "Senior Associate";
    otherwise    Title = "Manager";
end;
Subsetting IF statement: The subsetting IF statement causes the DATA step to continue processing only those raw data records or observations that meet the condition of the expression specified in the IF statement. General form:

IF condition;

- If the condition is true, SAS continues to execute the DATA step.
- If the condition is false, SAS stops processing the current observation and returns to the top of the DATA step. In particular, the current observation being formed in the PDV is not output.

Example:

data PASS;
    input ID M1 M2 M3;
    TOT = M1 + M2 + M3;
    if TOT > 150;   /* output obs only if TOT > 150 */
    cards;
1 50 60 80
2 40 60 30
3 70 80 90
;
run;
Only two observations (IDs 1 and 3) will be written to the dataset.

Iterative Processing


Iterative statements are used to:
- Perform repetitive calculations
- Eliminate redundant code
- Execute SAS code conditionally

DO Loop Processing
Statements within a DO loop execute for a specific number of iterations or until a specific condition stops the loop.

Iterative DO:
TYPE 1:

DO index-variable=start TO stop <BY increment>;

where,
- start specifies the initial value of the index variable.
- stop specifies the ending value of the index variable.
- increment optionally specifies a positive or negative number to control the incrementing of index-variable. If no increment is specified, the index variable is incremented by 1.

This iterative DO statement executes the statements between the DO and END statements repetitively, based on the value of the index variable. Example 1:

do i=1 to 12 by 4; <statements>; end;


Example 2:

do m=3.5 to 2.5 by -0.05;


Example 3:

do k = Begindate to Today() by 7;
TYPE 2:

DO index-variable=item-1 <,...item-n>;

item-1 through item-n can be either all numeric or all character constants, or they can be variables. The DO loop is executed once for each value in the list.

Example for Type 2:
1:

do Month = 'JAN', 'FEB', 'MAR';


2:

do Fib = 1,2,3,4;
3:

do i=var1, var2, var3;
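To see both forms of the iterative DO inside a complete DATA step, here is a small sketch. The data set name LOOPS and the variables KIND and VALUE are invented for the illustration.

```sas
data loops;
    length kind $ 5;
    /* Type 1: VALUE takes the values 1, 5 and 9 */
    do value = 1 to 12 by 4;
        kind = 'range';
        output;
    end;
    /* Type 2: VALUE takes each listed item in turn */
    do value = 2, 3, 5, 8;
        kind = 'list';
        output;
    end;
run;
/* LOOPS has 7 observations: 3 from the range loop and 4 from the list loop */
```

Because OUTPUT sits inside the loops, each pass writes one observation; without it, only the values left after the last pass would be written.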

Conditional Iterative Processing:


We can use the DO WHILE and DO UNTIL statements to stop the loop when a condition is met, rather than when the index variable exceeds a specific value.

DO WHILE
The DO WHILE statement executes statements in a DO loop while a condition is true. General form:

DO WHILE (expression); <additional SAS statements> END;


Expression is evaluated at the top of the loop. The statements in the loop never execute if the expression is initially false.
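The top-of-loop test can be illustrated with two tiny DATA steps; the data set names COUNTDOWN and NEVER and the variable N are made up for this sketch.

```sas
data countdown;
    n = 3;
    do while (n > 0);   /* tested before every pass */
        output;
        n = n - 1;
    end;
run;
/* COUNTDOWN has three observations: n = 3, 2, 1 */

data never;
    n = 0;
    do while (n > 0);   /* false at the very first check */
        output;
    end;
run;
/* NEVER has no observations, because the loop body never ran */
```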

DO UNTIL
The DO UNTIL statement executes statements in a DO loop until a condition is true. General form:

DO UNTIL (expression); <additional SAS statements> END;


Expression is evaluated at the bottom of the loop. The statements in the loop are executed at least once. Sample Program:

data invest;
    do until(Capital > 20000);
        Year+1;
        Capital+5000;
        Capital+(Capital*.075);
        output;
    end;
run;

proc print data=invest noobs;
run;


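Sample Output: (the listing is not reproduced here) the loop runs four times, and Capital first exceeds 20000 in Year 4, at about 24041.96.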

Iterative DO + Conditional DO: The DO WHILE and the DO UNTIL statements can be combined with the iterative DO statement. General form:

DO index-variable=start TO stop <BY increment> WHILE | UNTIL (expression); <additional SAS statements> END;
This is one method of avoiding an infinite loop in DO WHILE or DO UNTIL statements. Sample Program:

data invest;
    do year = 1 to 10 until(Capital > 20000);
        Capital+5000;
        Capital+(Capital*.075);
        output;
    end;
run;

proc print data=invest noobs;
run;

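Sample Output: (the listing is not reproduced here) the loop again stops after four iterations, with year = 4 and Capital at about 24041.96, well before the index variable reaches 10.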

Other Data Step statements

KEEP and DROP Statements


The KEEP and DROP statements are similar to the KEEP= and DROP= data set options. General form, DROP and KEEP statements:

DROP variable(s);
KEEP variable(s);

where variable(s) identifies the variables to drop or keep. The DROP statement excludes the specified variables from the output data set. The KEEP statement includes only the specified variables. The DROP and KEEP statements can be used anywhere in the DATA step.
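A small sketch of both statements, assuming a made-up EMP data set; the names EMP_PAY, EMP_IDS and BONUS are illustrative. It also shows that a dropped variable can still be used in calculations, because DROP only affects what is written to the output data set.

```sas
/* Hypothetical EMP data */
data emp;
    input id name $ sal;
    datalines;
101 Asha 1000
102 Ravi 2500
;
run;

data emp_pay;
    set emp;
    drop sal;              /* position in the step does not matter */
    bonus = sal * 0.10;    /* SAL is still available in the PDV    */
run;
/* EMP_PAY contains ID, NAME and BONUS */

data emp_ids;
    set emp;
    keep id name;          /* everything else is left out */
run;
/* EMP_IDS contains ID and NAME */
```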

DELETE statement
The DELETE statement deletes observations from the data set being created. General Form:

IF condition THEN DELETE;

If the condition is true, SAS stops processing the current observation and returns to the top of the DATA step. In particular, the current observation being formed in the PDV is not output.

Example:

DATA EMP;
    INPUT ID NAME $ SAL;
    IF SAL >= 1000 THEN DELETE;
    SAL = SAL + 500;
RUN;
The SAL = SAL + 500 statement is executed only when the SAL value is less than 1000.

PUT STATEMENT
If the PUT statement is used without a FILE statement, it writes the values of variables to the LOG. General Form:

PUT <variable list> <format specifier>;

Use a FILE PRINT; statement above the PUT statement to print the values in the OUTPUT window.

Special SAS Names (Shortcuts):
- _NUMERIC_ refers to all the numeric variables in a dataset
- _CHARACTER_ refers to all the character variables in a dataset
- _ALL_ refers to all the character and numeric variables in a dataset
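A minimal sketch of the PUT statement in a DATA _NULL_ step (a step that creates no data set). The variable values are invented for the example.

```sas
data _null_;
    id = 101;
    name = 'Asha';
    sal = 2500;
    put id= name= sal=;            /* log: id=101 name=Asha sal=2500 */
    put _all_;                     /* every variable, including _N_ and _ERROR_ */
    file print;                    /* later PUT statements go to the output window */
    put 'Salary: ' sal dollar8.;   /* value written with the DOLLAR8. format */
run;
```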

Try It Out Problem Statement


You have a SAS data set DIET which contains variables ID, DATE, and WEIGHT. There are four records per ID. The task is to create a new SAS data set DIET2 from DIET which contains only one record per subject, with each record containing the subject ID and the mean weight for the subject. As an additional "learning experience," rewrite the code using a sum statement (not a SUM function). The sample data of data set DIET is shown below:

Data Set DIET
ID   DATE       WEIGHT
1    10/01/92   155
1    10/08/92   158
1    10/15/92   158
1    10/22/92   158
2    09/02/92   200
2    09/09/92   198
2    09/16/92   196
2    09/23/92   202

Code
PROC SORT DATA = DIET;
    BY ID;
RUN;

DATA DIET2;
    SET DIET;
    BY ID;
    RETAIN MEAN_WT;
    IF FIRST.ID THEN MEAN_WT = WEIGHT;
    ELSE MEAN_WT = MEAN_WT + WEIGHT;
    IF LAST.ID THEN DO;
        MEAN_WT = MEAN_WT / 4;
        OUTPUT;
    END;
    KEEP ID MEAN_WT;   /* one record per subject: ID and mean weight only */
RUN;

/**** The solution using a sum statement looks like this ****/
DATA DIET2;
    SET DIET;
    BY ID;
    IF FIRST.ID THEN MEAN_WT = WEIGHT;
    ELSE MEAN_WT + WEIGHT;
    IF LAST.ID THEN DO;
        MEAN_WT = MEAN_WT / 4;
        OUTPUT;
    END;
    KEEP ID MEAN_WT;
RUN;
Refer File Name: 11.1.sas to obtain soft copy of the program code

How It Works
The BY statement reads the observations in groups and creates the automatic variables FIRST.ID and LAST.ID. Use the automatic variables and the RETAIN statement to calculate the mean WEIGHT of each subject. An alternative way to solve this problem is to use a sum statement instead of the RETAIN statement.


Summary
- The RETAIN statement retains the value of the variable in the PDV across iterations of the DATA step.
- The sum statement is a shortcut for the RETAIN statement.
- FIRST.BY-variable and LAST.BY-variable identify the first and last observation in each BY group.
- The text given in the TITLE and FOOTNOTE statements appears at the top and bottom of every page.
- There are different types of conditional statements available in SAS.
- The DO loop is used to perform repetitive calculations.
- The KEEP and DROP statements are similar to the KEEP= and DROP= options.
- The DELETE statement deletes observations from the data set being created.
- The PUT statement writes the values of variables to the LOG.

Test your Understanding


1. For what purpose would you use the RETAIN statement?
2. What is the purpose of the ODS statement?
3. Explain the FIRST. and LAST. variables.
4. Write a SAS program with minimum data steps for obtaining the following output. Input and output datasets are as follows:

Input Dataset
Marks
15
28
78
35
90
67
87

Output dataset
Marks   Sum
15      15
28      43
78      121
35      156
90      246
67      313
87      400


Session 13: SAS Programming Concepts/Built-in Functions in SAS


Learning Objectives
After completing this session, you will be able to:
- Explain the SAS ODS concepts
- Describe SAS Arrays
- Work with Arithmetic and String Functions

SAS ODS
SAS Output Delivery System (ODS): ODS is designed to overcome the limitations of the traditional SAS output. ODS allows output from the DATA step and SAS procedures to be presented in a more useful and colorful way. Using ODS we can create output in a variety of formats, such as html, xls, pdf, rtf, etc. To start output being delivered to ODS, the general syntax is:

ODS <output-format> <options>;


To end output being delivered to ODS:

ODS <output-format> CLOSE;


Where,
- output-format is your output destination
- options: you can specify the location of the output file.

The output file will be created in the specified location and opened within SAS in a separate window called the Results Viewer.

HTML:

ODS HTML FILE = 'C:\SASFILES\TEST.HTML';
< SAS Procedures>
ODS HTML CLOSE;

All output from any procedure that exists between the "ods html .... ;" and "ods html close;" statements will be sent to that ODS destination.

XLS: Excel File

ODS HTML FILE = 'C:\SASFILES\TEST.XLS';
< SAS Procedures>
ODS HTML CLOSE;

RTF: RTF stands for Rich Text Format and is supported by MS Word.

ODS RTF FILE = 'C:\SASFILES\TEST.RTF';
< SAS Procedures>
ODS RTF CLOSE;

PDF:

ODS PDF FILE = 'C:\SASFILES\TEST.PDF';
< SAS Procedures>
ODS PDF CLOSE;

Arrays in SAS
Arrays in SAS are different from arrays in other programming languages. A SAS array is a temporary grouping of variables under a single name. It exists only for the duration of the current DATA step. An array is not a variable. Each variable in an array is called an element and is identified by a subscript that represents the position of the element in the array. When you use an array reference, the corresponding variable is substituted for the reference.

Why use SAS arrays?
- To repeat an action or set of actions on each of a group of variables
- To create many variables with the same attributes
- To write shorter programs
- To compare variables
- To perform table lookup

General Form:

ARRAY array-name {subscript} <$><length> <array-elements> <(initial-value-list)>;

The ARRAY statement defines the elements in an array. These elements will be processed as a group. You can refer to elements of the array by the array name and subscript. The ARRAY statement:
- Must contain all numeric or all character elements
- Must be used to define an array before the array name is referenced
- Creates variables if they do not already exist in the PDV.

Example:

ARRAY Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;


Here, Qtr1 Qtr2 Qtr3 Qtr4 are the existing variables. Contents of the PDV are given below along with the ARRAY references.
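How an array reference maps onto the underlying variables can be sketched with the Contrib array from the example. The data set name DONATE, the input values and the 25% increase are assumptions made for the illustration.

```sas
data donate;
    input Qtr1 Qtr2 Qtr3 Qtr4;
    array Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;
    do i = 1 to 4;
        /* Contrib{1} is Qtr1, Contrib{2} is Qtr2, and so on */
        Contrib{i} = Contrib{i} * 1.25;
    end;
    drop i;
    datalines;
10 20 30 40
;
run;
/* Qtr1-Qtr4 become 12.5, 25, 37.5 and 50 */
```

The array itself is not stored in DONATE; only the variables Qtr1 to Qtr4 are.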


The array name CONTRIB groups the variables Qtr1, Qtr2, Qtr3 and Qtr4. The individual variables can be accessed by using the array name and a subscript. Example: Suppose you have a dataset EMP with 50 numeric variables and you have to recode the value of every numeric variable to 99 if its value is missing. Without an array, you would need to repeat the following statement 50 times in the DATA step, once per variable.

IF Variable = . THEN Variable = 99 ;


With the use of Arrays, we can simplify our SAS program like the following one. Example:

Data All;
    Set EMP;
    array nvar(*) _numeric_;
    do i=1 to dim(nvar);
        if nvar(i)= . then nvar(i)= 99;
    end;
Run;
- nvar(*): the asterisk lets SAS determine the number of elements from the variable list
- Dim( ) is an array function which returns the number of elements in an array
- _numeric_ is a keyword that refers to all the numeric variables

Built-in Functions in SAS: SAS has a number of built-in functions. Broadly they can be classified as:
- Arithmetic Functions
- String Functions
- Date Time Functions

Each of these functions is described below.

Arithmetic Functions

INT


Returns the integer value of the argument. Syntax:

INT(argument)

Example        Result
X=INT(2.1)     X=2
X=INT(-2.4)    X=-2
X=INT(3)       X=3
X=INT(-1.6)    X=-1

MAX
Returns the largest of non-missing values. Syntax:

MAX(argument,argument,...)

Example                 Result
X1 = MAX(2,6,.)         X1=6.00000
X2 = MAX(2,-3,1,-1)     X2=2.00000
X3 = MAX(3,.,-3)        X3=3.00000
X4 = MAX(OF X1-X3)      X4=6.00000

OF keyword includes all the variables between X1 and X3 i.e., X1,X2 & X3

MIN
Returns the smallest of non-missing values. Syntax:

MIN(argument,argument,...)

Example                 Result
X1 = MIN(2,.,6)         X1 = 2.00000
X2 = MIN(2,-3,1,-1)     X2 = -3.00000
X3 = MIN(0,4)           X3 = 0.00000
X4 = MIN(OF X1-X3)      X4 = -3.00000

SUM
Returns the sum of the non-missing arguments. Syntax:

SUM(argument,argument...)

Example                   Result
X1 = SUM(4,9,3,8)         X1 = 24.00000
X2 = SUM(14,9,13,8,.)     X2 = 44.00000
X3 = SUM(OF X1-X2)        X3 = 68.00000

MEAN
Returns the average of non-missing values. Syntax:

MEAN(argument,argument,...)

Example                 Result
X1 = MEAN(2,.,.,6)      X1 = 4.00000
X2 = MEAN(1,2,3,2)      X2 = 2.00000
X3 = MEAN(OF X1-X2)     X3 = 3.00000

MOD
Returns the remainder when argument1 is divided by argument2. Syntax:

MOD(argument1,argument2)

Example            Result
X=MOD(6,3)         X=0.00000
X=MOD(10,3)        X=1.00000
X=MOD(11,3.5)      X=0.50000
X=MOD(10,-3)       X=1.00000

ROUND


The ROUND function returns a value rounded to the nearest round-off unit. Syntax:

ROUND(argument,<round-off unit>)
Where round-off unit is numeric and non-negative. If round-off unit is not provided, the argument is rounded to the nearest integer.

Example                   Result
X=ROUND(223.456)          X=223.00000
X=ROUND(223.456,1)        X=223.00000
X=ROUND(223.456,.01)      X=223.46000
X=ROUND(223.456,100)      X=200.00000

CEIL
The CEIL function returns the smallest integer greater than or equal to the argument. Syntax:

NewVar = CEIL(argument);
Example:

X=CEIL(4.4);

X=5

FLOOR
The FLOOR function returns the greatest integer less than or equal to the argument. Syntax:

NewVar=FLOOR(argument);
Example:

Y=FLOOR(3.6);

Y=3;
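The arithmetic functions described above can be tried together in one DATA _NULL_ step that writes the results to the log. The variable names a to g are arbitrary.

```sas
data _null_;
    a = int(-2.4);             /* -2: the fractional part is dropped   */
    b = ceil(4.4);             /* 5                                    */
    c = floor(3.6);            /* 3                                    */
    d = round(223.456, .01);   /* 223.46                               */
    e = mod(10, 3);            /* 1                                    */
    f = sum(14, 9, ., 8);      /* 31: the missing value is ignored     */
    g = mean(2, ., 6);         /* 4: average of the non-missing values */
    put a= b= c= d= e= f= g=;
run;
```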

String Functions


You can convert data types either implicitly, by allowing the SAS System to do it for you, or explicitly with these functions:
- INPUT: character-to-numeric conversion
- PUT: numeric-to-character conversion

SAS automatically converts a character value to a numeric value when the character value is used in a numeric context, such as:
- Assignment to a numeric variable
- An arithmetic operation
- Logical comparison with a numeric value
- A function that takes numeric arguments.

Explicit Conversion: INPUT


The INPUT function is used primarily for converting character values to numeric values. Syntax:

NumVar = INPUT(source,informat);
Example                           Result
CVar1='32000';
NVar1=input(CVar1,5.);            NVar1 = 32000
CVar2='32,000';
NVar2=input(CVar2,comma6.);       NVar2 = 32000
CVar3='03may2008';
NVar3=input(CVar3,date9.);        NVar3 = 17655

PUT:
Converts numeric values to character and writes values with a specific format. Syntax:

CharVar = PUT(source,format);
Example                           Result
NVar1=614;
CVar1=put(NVar1,3.);              CVar1 = '614'
NVar2=55000;
CVar2=put(NVar2,dollar7.);        CVar2 = '$55,000'
NVar3=366;
CVar3=put(NVar3,date9.);          CVar3 = '01JAN1961'

The values of the CVar variables are stored in character form. The enclosing quotes are shown only to indicate that the values are stored in character form.
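A short round-trip sketch of the two conversion functions: a character value is read into a number with INPUT and written back to text with PUT. The variable names CVAR, NVAR and BACK are illustrative.

```sas
data _null_;
    cvar = '32,000';
    nvar = input(cvar, comma6.);   /* character to numeric: 32000   */
    back = put(nvar, dollar7.);    /* numeric to character: $32,000 */
    put nvar= back=;               /* log: nvar=32000 back=$32,000  */
run;
```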

LENGTH
Returns the length of an argument, not counting trailing blanks. Syntax:

LENGTH(argument)

Example                      Result
len = LENGTH('ABCDEF');      len = 6

RIGHT
The RIGHT function returns its argument right aligned. Trailing blanks are moved to the start of the value. Syntax:

RIGHT(argument)
Example:

a = 'due date   ';
b = RIGHT(a);

Variable b will hold the string 'due date' shifted right three spaces, with leading blanks instead of trailing blanks.

LEFT
Left aligns a SAS character expression. Syntax:

LEFT(argument)
Example:

a = '   due date';
b = LEFT(a);

The above statements produce the character string 'due date' shifted left three spaces, with trailing blanks instead of leading blanks.

TRIM
The TRIM function removes trailing blanks from its argument. If the argument is blank, TRIM returns one blank.

Syntax:

TRIM(argument)
Example                               Result
part1 = 'apple ';                     part1 = 'apple '
part2 = 'sauce';                      part2 = 'sauce'
noblank = TRIM(part1) || part2;       noblank = 'applesauce'
hasblank = part1 || part2;            hasblank = 'apple sauce'

Leading blanks will not be removed. To remove both leading and trailing blanks, use the LEFT and TRIM functions together:

Variable = TRIM(LEFT(ARGUMENT));

STRIP


Strips leading and trailing blanks from a character variable or character string. This function is an enhancement in SAS v9. Example:

a = '  due date  ';
b = STRIP(a);

Variable b will contain the character string 'due date' without leading and trailing spaces.

LOWCASE
Converts all letters in its argument to lowercase. It has no effect on digits and special characters. Syntax:

NewVal=LOWCASE(argument);
Example                Result
a = 'STRONG';          a = 'STRONG'
b = LOWCASE(a);        b = 'strong'

UPCASE
Converts all letters in its argument to uppercase. It has no effect on digits and special characters. Syntax:

NewVal=UPCASE(argument);
Example                Result
a = 'cognizant';       a = 'cognizant'
b = UPCASE(a);         b = 'COGNIZANT'

PROPCASE


Converts text to proper case:
- The first character of each word is in upper case
- All other characters are in lower case

Syntax:

NewVar = PROPCASE(char_var);

Example                Result
a = 'cognizant';       a = 'cognizant'
b = PROPCASE(a);       b = 'Cognizant'

COMPRESS
Removes specific characters from character expressions. Syntax:

COMPRESS(source<,characters-to-remove>)
where,
- source: specifies a SAS character expression.
- characters-to-remove: specifies the character or characters you want to remove from the source expression. If the second argument is omitted, blanks are removed by default.

Example                    Result
a = 'AB C D ';             a = 'AB C D '
b = COMPRESS(a);           b = 'ABCD'
p = 'AB CDE';              p = 'AB CDE'
q = COMPRESS(p,'D');       q = 'AB CE'

REPEAT
Returns a character value consisting of the first argument repeated n + 1 times. Syntax

REPEAT(argument,n)
Example                Result
a = 'abc';             a = 'abc'
b = REPEAT(a,3);       b = 'abcabcabcabc'

SUBSTR


The SUBSTR function is used to extract or insert characters. Syntax:

NewVar = SUBSTR(string,start<,length>);
Example                        Result
date = '06MAY89';              date = '06MAY89'
month = SUBSTR(date,3,3);      month = 'MAY'

Example: Extract two characters from Location starting at position 11.
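One way this extraction might look is sketched below. The value assigned to Location and the variable name State are assumptions made for the illustration; the second half shows the "insert" use of SUBSTR on the left-hand side of an assignment.

```sas
data _null_;
    length Location $ 20;
    Location = 'Columbus, OH 43227';    /* hypothetical value               */
    State = substr(Location, 11, 2);    /* characters 11 and 12             */
    put State=;                         /* log: State=OH                    */
    substr(Location, 11, 2) = 'NY';     /* replaces those two characters    */
    put Location=;                      /* log: Location=Columbus, NY 43227 */
run;
```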

INDEX
The INDEX function searches a source string value for the location of a specified Sub-string value and returns its location. Syntax:

Position = INDEX(source-string, sub-string);


The INDEX function returns
- the starting position of the first occurrence of sub-string within source-string, if it is found
- 0, if it is not found.

Example                    Result
a = 'ABC.DEF (X=Y)';       a = 'ABC.DEF (X=Y)'
b = 'X=Y';                 b = 'X=Y'
x = INDEX(a,b);            x = 10

INDEXC
This function is similar to the INDEX function, but each excerpt is treated as a set of separate characters. INDEXC returns the position of the first character in the source that is present in any of the excerpts. If none of the characters in the excerpts is found in the source, the value 0 is returned.

The INDEX function searches for a character string in a source string, but the INDEXC function searches for individual characters. Syntax:

INDEXC(source,excerpt-1<,...excerpt-n>)

Example                                     Result
a = 'ABC.DEF (X=Y)';                        a = 'ABC.DEF (X=Y)'
x = INDEXC(a,'0123456789',';( )=.');        x = 4

TRANSLATE
It replaces specific characters in a character expression. Syntax:

TRANSLATE(source, to, from)

Where,
- source: specifies the SAS expression containing the original character value
- to: specifies the characters you want TRANSLATE to use as substitutes
- from: specifies the characters you want TRANSLATE to replace.

Values of to and from correspond on a character-by-character basis. TRANSLATE changes character one of from to character one of to, and so on. If to has fewer characters than from, TRANSLATE changes the extra from characters to blanks. If to has more characters than from, TRANSLATE ignores the extra to characters.

Example                                Result
d = TRANSLATE('xyzw','ab','vw');       d = 'xyzb'

TRANWRD
The TRANWRD function replaces or removes all occurrences of a given word (or a pattern of characters) within a character string. Syntax:

NewVal=TRANWRD(source,target,replacement);
Example:

Dessert = 'Pumpkin pie';
Dessert = tranwrd(Dessert,'Pumpkin','Apple');

Result: Dessert = Apple pie

VERIFY
Returns the position of the first character in the source string that is not in the check-string Syntax:

VERIFY(source,check-string);
Example:

part1 = 'apple';
check = 'abcdef';
x = VERIFY(part1,check);

Result:

x = 2

In this case, the second character 'p' of the string 'apple' is not present in the check-string 'abcdef', so the position of 'p' is returned to the variable x.

SCAN
The SCAN function returns the nth word of a character value. It is used to extract words from a character value if they are separated by delimiters Syntax:

NewVar = SCAN (source-string, n <,delimiters>);


Where,
- source-string: specifies the character variable or expression to scan
- n: specifies which word to read
- <delimiters>: delimiters are special characters that must be enclosed in single quotation marks (' '). If no delimiters are specified, SAS treats the following characters as delimiters:

blank . < ( + | & ! $ * ) ; ^ - / , % > \

Example:

Phrase = 'software and services';
Second = scan(Phrase,2,' ');

The CAT Functions: The Version 9 family of CAT functions reduces complexity when concatenating strings: CAT, CATT, CATS and CATX.

CAT Function   What it Does
CAT            Concatenates two or more character strings, leaving leading and trailing blanks unchanged. Identical to the concatenation operator [ || ].
CATS           Same as CAT, but also strips both leading and trailing blanks prior to concatenation.
CATT           Same as CAT, but also trims trailing blanks prior to concatenation.
CATX           Concatenates two or more character strings, stripping both leading and trailing blanks, and inserting one or more user-specified separation characters.

Syntax: For CAT, CATS, CATT functions

CAT(string-1, string-2 <,string-n>)


Where,

string-1, string-2 <,string-n> are the character strings to be concatenated.


For CATX function

CATX(separator, string-1, string-2 <,string-n>)


Where,

separator is one or more characters, placed in single or double quotation marks, to be used as separators between the concatenated strings. Example:

A = 'Micky';
B = ' Mouse ';

CAT Function   Usage                       Result
CAT            CAT_FN  = CAT(A,B)          CAT_FN  = "Micky Mouse "
CATS           CATS_FN = CATS(A,B)         CATS_FN = "MickyMouse"
CATT           CATT_FN = CATT(A,B)         CATT_FN = "Micky Mouse"
CATX           CATX_FN = CATX(":",A,B)     CATX_FN = "Micky:Mouse"
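The four functions can be compared side by side in a DATA _NULL_ step. This sketch assumes that B carries one leading and one trailing blank, which is what the results in the table imply; the LENGTH statement is added only to make the stored lengths explicit.

```sas
data _null_;
    length a $ 5 b $ 7;
    a = 'Micky';
    b = ' Mouse ';                /* assumed: one leading, one trailing blank */
    cat_fn  = cat(a, b);          /* blanks kept as they are                  */
    cats_fn = cats(a, b);         /* leading and trailing blanks stripped     */
    catt_fn = catt(a, b);         /* only trailing blanks trimmed             */
    catx_fn = catx(':', a, b);    /* stripped, then joined with ':'           */
    put cats_fn= catt_fn= catx_fn=;
    /* log: cats_fn=MickyMouse catt_fn=Micky Mouse catx_fn=Micky:Mouse */
run;
```

The trailing blank kept by CAT cannot be seen in the log, because character variables are padded with blanks anyway; it matters when the result is concatenated further.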

Try It Out Problem Statement 1


You have a SAS data set SCORES, which contains an ID variable and a variable called STRING which holds five 1-digit scores. Write a SAS program to read this data set and create a new data set which contains an ID and five numeric variables X1 to X5, where the X's are each of the digits in STRING. Following are some sample data:

Data Set SCORES
ID   STRING
1    12345
2    13243
3    53421

Code
/* Solution without arrays: */
DATA NEW;
    SET SCORES;
    X1 = INPUT(SUBSTR(STRING,1,1),1.);
    X2 = INPUT(SUBSTR(STRING,2,1),1.);
    X3 = INPUT(SUBSTR(STRING,3,1),1.);
    X4 = INPUT(SUBSTR(STRING,4,1),1.);
    X5 = INPUT(SUBSTR(STRING,5,1),1.);
    KEEP ID X1-X5;
RUN;

/* Solution using arrays: */
DATA NEW;
    SET SCORES;
    ARRAY X[5] X1-X5;
    DO POINTER = 1 TO 5;
        X[POINTER] = INPUT(SUBSTR(STRING,POINTER,1),1.);
    END;
    KEEP ID X1-X5;
RUN;


Refer File Name: 13.1.sas to obtain soft copy of the program code

How It Works
Without arrays, the same statement has to be repeated once per variable. X1-X5 refers to all the variables from X1 through X5. Since the X variables do not already exist, SAS creates them. The INPUT function converts each one-character substring to a numeric value.

Problem Statement 2
You have clinical data in a SAS data set called CLINICAL which contains information on patient visits. Included in the data set are patient ID, DATE, BILLING (billing number), and DX (diagnosis code). You also have a list of DX codes and their descriptions. Using the following CLINICAL data and the list of DX codes and descriptions, create a new data set, NEW, which contains all the variables in CLINICAL plus a new variable (DESCRIP) which contains the DX description. Use PROC FORMAT and a PUT function as in Example 2 to solve this problem.

Code
PROC FORMAT;
   VALUE DXCODE 1 = 'Cold'
                2 = 'Flu'
                3 = 'Asthma'
                4 = 'Chest Pain'
                5 = 'Maternity'
                6 = 'Diabetes';
RUN;
DATA CLINICAL;
   INFILE 'CLINICAL';
   INPUT ID DATE : MMDDYY8. BILLING DX;
RUN;
DATA NEW;
   SET CLINICAL;
   DESCRIP = PUT(DX,DXCODE.);
RUN;
Refer File Name: 13.2.sas to obtain soft copy of the program code

How It Works
Create a format (DXCODE) for the values of DX. Assign the description of DX to a new variable using the PUT function and that format.

Problem Statement 3
You have a raw data file called TEMPER which contains temperature measurements taken at one hour intervals. Each raw data line contains several pairs of the variables HOUR (hour of the day) and TEMP (temperature). All temperatures are in degrees Fahrenheit unless they are written in the form nC (the number n followed by a C, no spaces), in which case they are expressed in degrees Celsius. In addition, a value of N was coded when a temperature was not obtained. Write a SAS program to read this data file, express all temperatures in degrees Fahrenheit, and convert each N to a numeric missing value.
Hint: The conversion from Celsius to Fahrenheit is: F = 9*C/5 + 32
Some sample records from file TEMPER are as follows:
1 68 2 67 3 N 4 20C 5 72 6 23C 7 75 8 N

Code
DATA TEMP;
   INFILE 'TEMPER';
   INPUT HOUR DUMMY $ @@;
   IF DUMMY = 'N' THEN TEMP_F = .;
   ELSE IF INDEX(DUMMY,'C') NE 0 THEN
      TEMP_F = 9*INPUT(SUBSTR(DUMMY,1,LENGTH(DUMMY)-1),5.)/5 + 32;
   ELSE TEMP_F = INPUT(DUMMY,5.);
   DROP DUMMY;
RUN;
Refer File Name: 13.3.sas to obtain soft copy of the program code

How It Works
Since more than one observation is in a single line we are using @@. Use INDEX function to find whether C appears in the value of DUMMY. If so extract the numeric part alone and convert it to Fahrenheit by using the given formula. Else convert the value of DUMMY to numeric.

Problem Statement 4
You have an instream raw data file of patient hospital stays with the following file layout:

Starting Column   Length   Format      Description
1                 3        character   Subject ID
4                 6        mmddyy      Admission date
10                6        mmddyy      Discharge date
16                8        mmddyyyy    Date of birth

Here are some sample data:
00101059201079210211946
00211129211159209011955
00305129206099212251899
00401019301079304051952

a) Write a program to create a SAS data set called DATES1, and list the resulting data set with PROC PRINT. Create variables ID, ADMIT, DISCH, and DOB from the given data, and also create the following new variables:
   i. AGE: Age in years on the date of admission (as of the last birthday)
   ii. DAY: Numeric day of the week of admission date (1=Sun, 2=Mon, etc.)
   iii. MONTH: Numeric month of year of admission date (1=Jan, 2=Feb, etc.)
   iv. NoWeek: Number of weeks patient stayed in the hospital
b) Set up the DATA step so that the variables print with the following formats:
   i. ADMIT mm/dd/yy
   ii. DISCH mm/dd/yy
   iii. DOB ddMMMyyyy

Code
DATA DATES1;
   INPUT @1 ID $3. @4 ADMIT MMDDYY6. @10 DISCH MMDDYY6. @16 DOB MMDDYY8.;
   AGE = INT((ADMIT-DOB)/365.25);
   DAY = WEEKDAY(ADMIT);
   MONTH = MONTH(ADMIT);
   NOWEEK = INTCK('WEEK', ADMIT, DISCH);
   FORMAT ADMIT DISCH MMDDYY8. DOB DATE9.;
   DATALINES;
00101059201079210211946
00211129211159209011955
00305129206099212251899
00401019301079304051952
;
RUN;
PROC PRINT DATA=DATES1;
RUN;
Refer File Name: 13.4.sas to obtain soft copy of the program code

How It Works
Since SAS stores a date as a number of days, subtracting DOB from ADMIT gives the person's age in days, and dividing by 365.25 converts it to years (the .25 allows for leap years). INT keeps the whole years only. The INTCK function returns the number of interval boundaries (WEEK in this case) crossed between the ADMIT date and the DISCH date.

Summary
- ODS allows output from the DATA step and SAS procedures to be presented in a more useful and colorful way.
- A SAS array is a temporary grouping of variables under a single name.
- SAS has a number of built-in functions. Broadly they can be classified as Arithmetic Functions, String Functions and Date Time Functions.

Test your Understanding


1. Name and describe three SAS functions that you have used, if any.
2. In ARRAY processing, what does the DIM function do?
3. What is the difference between: x=a+b+c+d; and x=SUM (of a, b, c ,d);?
4. What do the SAS log messages "numeric values have been converted to character" mean?
5. What are the implications?
6. Which date function advances a date, time or date/time value by a given interval?
7. What is the significance of the 'OF' in X=SUM (OF a1-a4, a6, a9);
8. What do the following do? INPUT, PUT, CATX, SCAN, SUBSTR, TRIM, MOD
9. Create a program for the following requirement. Following is the data in a file:
   vinodM24
   yahooF22
   altavistaF18
   googleF20
   Read the data into a single variable and use functions to retrieve them into three variables: Name, Gender, Age.


Session 16: Built-in Functions in SAS / Merging and Combining SAS Data Sets
Learning Objectives
After completing this session, you will be able to:
- Work with date time functions
- Describe concatenation
- Perform One-to-One reading
- Perform One-to-One merging
- Perform Match-Merging
- Perform JOINS in DATA step

Date Time Functions


A SAS date, time or date time variable is a special case numeric variable where the values are stored as number of days or seconds. So it is difficult to extract information manually from a date or time variable. SAS provides a bundle of Date & Time functions for extracting the required information from a SAS date or time variable.

DATE or TODAY
Returns the current date as a SAS date value representing the number of days between January 1, 1960 and the current date Syntax:

DATE( ) TODAY()

Example tday1 = DATE( ); tday2 = TODAY();

Result tday1 & tday2 will hold a value which is equal to the number of days between January 1 , 1960 and the date on which the statement is executed.

TIME
Returns the current time of the day as a SAS time value. Syntax:

TIME( )


Example TT = TIME( );

Result If the statement is executed at exactly 2:32 p.m., SAS assigns the variable TT the SAS time value corresponding to 14:32:00.

DATETIME
Returns the current date and time of a day as a SAS datetime value representing the number of seconds between January 1 , 1960 midnight and the current datetime. Syntax:

DATETIME( )
Example dttime = DATETIME( );
Result Variable dttime will hold a SAS value representing the number of seconds between January 1, 1960 midnight and the current datetime.
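Because these functions return plain numbers, the values are easier to read once a format is applied. The following is a minimal sketch that writes each value to the log both formatted and raw; the variable names are illustrative.

```sas
DATA _NULL_;
   TDAY   = TODAY();      /* days since 01JAN1960                  */
   NOW_T  = TIME();       /* seconds since midnight                */
   NOW_DT = DATETIME();   /* seconds since 01JAN1960 at midnight   */
   /* formatted for reading */
   PUT TDAY= DATE9. NOW_T= TIME8. NOW_DT= DATETIME20.;
   /* raw numeric values as SAS stores them */
   PUT TDAY= NOW_T= NOW_DT=;
RUN;
```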

Extracting the parts of a SAS Date, Time or Datetime Variable:

Function   Usage                     Description
DAY        DAY(date)                 Returns the day of the month from a SAS date value.
MONTH      MONTH(date)               Returns the MONTH value from a SAS date value.
YEAR       YEAR(date)                Returns the YEAR value from a SAS date value.
QTR        QTR(date)                 Returns the QTR of the year from a SAS date value. JAN-MAR = 1Q; APR-JUN = 2Q; JUL-SEP = 3Q; OCT-DEC = 4Q
HOUR       HOUR(time | datetime)     Returns the HOUR value from a SAS time or datetime value. Hour value ranges from 0 to 23.
MINUTE     MINUTE(time | datetime)   Returns the MINUTE value from a SAS time or datetime value.
SECOND     SECOND(time | datetime)   Returns the SECOND value from a SAS time or datetime value.

DAY, MONTH, YEAR and QTR expect a SAS date value. For a datetime value, extract the date part first with DATEPART.

WEEKDAY
Returns the day of the week as a number from a SAS date value. Syntax: Wkdy = WEEKDAY(date)

1 - SUN   2 - MON   3 - TUE   4 - WED   5 - THU   6 - FRI   7 - SAT
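The extraction functions, including WEEKDAY, can be tried on fixed values. The following is a minimal sketch; the date 27AUG1990 and the time 14:32:05 are arbitrary choices, and DHMS is used only to build a datetime value for the time functions.

```sas
DATA _NULL_;
   D  = '27AUG1990'D;            /* SAS date value      */
   DT = DHMS(D, 14, 32, 5);      /* SAS datetime value  */
   DD = DAY(D);
   MM = MONTH(D);
   YY = YEAR(D);
   Q  = QTR(D);
   WD = WEEKDAY(D);              /* 27AUG1990 was a Monday */
   HH = HOUR(DT);
   MI = MINUTE(DT);
   SS = SECOND(DT);
   PUT DD= MM= YY= Q= WD= HH= MI= SS=;
   /* log: DD=27 MM=8 YY=1990 Q=3 WD=2 HH=14 MI=32 SS=5 */
RUN;
```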

MDY
Returns a SAS date value from month, day and year values. Syntax:

MDY(month,day,year)
MDY builds a single SAS date value from separate month, day and year values.
Where,
month: Specifies a numeric expression representing an integer from 1 through 12.
day: Specifies a numeric expression representing an integer from 1 through 31.
year: Specifies a numeric expression representing a specific year.
Example
m = 8 ; d = 27 ; y = 90 ; date1 = MDY(m,d,y);
Result
date1 will hold a value of 11196, which is the number of days between January 1, 1960 and August 27, 1990.

DATEPART / TIMEPART
A SAS datetime variable contains information on both the date and the time, i.e., the number of seconds since midnight, January 1, 1960. To extract the date or time part of a SAS datetime variable, use the DATEPART function or the TIMEPART function. Syntax:

DATEPART(datetime) TIMEPART(datetime)
Example: Thursday, Oct. 21, 2004 at 1300 hrs is represented as the SAS datetime value 1413982800. DATEPART returns 16365 (the SAS date value for 21OCT2004) and TIMEPART returns 46800 (the SAS time value for 13:00:00).
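The split into date and time parts can be checked in a short DATA step. This is a minimal sketch; DHMS is used here only to construct the datetime value for the example, and the variable names are illustrative.

```sas
DATA _NULL_;
   DT = DHMS('21OCT2004'D, 13, 0, 0);   /* datetime for 21OCT2004 at 13:00:00 */
   D  = DATEPART(DT);                   /* date part: number of days          */
   T  = TIMEPART(DT);                   /* time part: seconds since midnight  */
   PUT DT= DATETIME20. D= DATE9. T= TIME8.;
   PUT DT= D= T=;                       /* the same values, unformatted       */
RUN;
```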


Calculating Time Intervals: There are two ways to calculate the time interval between two dates:
1. Arithmetic operation on SAS date, time or datetime variables, or between a variable and a constant
   YEARS = (date2-date1)/365.25;
   MONTHS = (date2-date1)/30.4;
2. Use of the INTCK function

INTCK
Determines the number of interval boundaries which have been crossed between two SAS date, time or date time variables Syntax:

INTCK(interval , from , to)


where,
interval - character constant (enclosed in quotes) or character variable naming the time period of interest

Date Intervals   Datetime Intervals   Time Intervals
DAY              DTDAY                HOUR
WEEK             DTWEEK               MINUTE
MONTH            DTMONTH              SECOND
QTR              DTQTR
YEAR             DTYEAR

from - SAS date, time or datetime value identifying the START of the time interval.
to - SAS date, time or datetime value identifying the END of the time interval.
The INTCK function counts only the number of interval boundaries crossed between the two values, not the elapsed time.

Example: qtr = INTCK('QTR','10OCT88'D,'01MAR89'D);
Result: qtr = 1
Description: Number of QTR boundaries between the two dates, i.e., number of JAN 1, APR 1, JUL 1, OCT 1

Example: date = INTCK('YEAR','31DEC89'D,'1JAN90'D);
Result: date = 1
Description: Number of year boundaries, i.e., number of JAN 1

Example: year = INTCK('YEAR','1JAN89'D,'31DEC89'D);
Result: year = 0
Description: Number of year boundaries, i.e., number of JAN 1

Example: td = '1dec2008'd; month = INTCK('MONTH','10jan2008'd, td);
Result: month = 11
Description: Number of month boundaries, i.e., first day of a month
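Counting boundaries is not the same as measuring elapsed time, which is easy to overlook. The following is a minimal sketch that puts the INTCK result next to a simple subtraction for the two YEAR examples; the variable names are illustrative.

```sas
DATA _NULL_;
   /* One day apart, but a 1 January lies between the two dates */
   Y1 = INTCK('YEAR', '31DEC1989'D, '01JAN1990'D);
   D1 = '01JAN1990'D - '31DEC1989'D;
   /* Almost a full year apart, but no 1 January is crossed */
   Y2 = INTCK('YEAR', '01JAN1989'D, '31DEC1989'D);
   D2 = '31DEC1989'D - '01JAN1989'D;
   PUT Y1= D1= Y2= D2=;
   /* log: Y1=1 D1=1 Y2=0 D2=364 */
RUN;
```

When elapsed time is what matters, the arithmetic approach (dividing the day difference by 365.25 or 30.4) is the closer fit.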

INTNX


Creates a SAS date, time or datetime value that is a given number of time intervals from a starting value. Syntax:

INTNX(interval,from,no);
Where,
interval - time interval
from - start date
no - integer representing the number of time intervals
By default, the result is aligned to the first date of the resulting interval. For example, if interval is MONTH, INTNX returns day one of the respective month as a SAS date.
Example
BDATE = '05mar2008'd; DT = INTNX('month',BDATE,3);
Result
DT is a SAS date value representing the first day of the month that is three months past the BDATE value, i.e., 01JUN2008.
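INTNX also accepts an optional fourth argument that controls where in the resulting interval the date falls. That argument is not covered in this handout, so treat the following as a hedged sketch of the commonly used values 'END' and 'SAME'; the variable names are illustrative.

```sas
DATA _NULL_;
   BDATE  = '05MAR2008'D;
   DT_BEG = INTNX('MONTH', BDATE, 3);           /* default alignment: 01JUN2008 */
   DT_END = INTNX('MONTH', BDATE, 3, 'END');    /* last day of month: 30JUN2008 */
   DT_SAM = INTNX('MONTH', BDATE, 3, 'SAME');   /* same day of month: 05JUN2008 */
   PUT DT_BEG= DATE9. DT_END= DATE9. DT_SAM= DATE9.;
RUN;
```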

Merging and Combining SAS Data Sets
We can create a data set from two or more existing data sets by
- Combining Data Vertically (appends the observations from one data set to another data set)
- Combining Data Horizontally (joining observations side-by-side)
Methods to combine SAS data sets
- Combining Vertically: concatenating, interleaving
- Combining Horizontally: one-to-one reading, one-to-one merging, match merging
- Updating

Combining Vertically
Appends the observations from one or more data set row-wise to create a resultant dataset.


Concatenating
Concatenating data sets appends the observations from one data set to another data set. The DATA step reads DATA1 sequentially until all observations have been processed, and then reads DATA2. Data set COMBINED contains the results of the concatenation. Note that the data sets are processed in the order in which they are listed in the SET statement.

Interleaving
Interleaving combines observations from two or more data sets, based on one or more common variables. The resultant dataset COMBINED will be in sorted order. Since we are using a BY statement with SET statement, the Input datasets DATA1 & DATA2 should be sorted by the variable YEAR.
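Concatenating and interleaving both use the SET statement; the only difference is the BY statement. The following is a minimal sketch of the two. DATA1, DATA2, COMBINED and YEAR follow the names used in the text, while the variable SALES, the data values and the data set name INTERLEAVED are invented for illustration.

```sas
/* Two small input data sets, each already sorted by YEAR */
DATA DATA1;
   INPUT YEAR SALES;
   DATALINES;
2001 100
2003 300
;
RUN;

DATA DATA2;
   INPUT YEAR SALES;
   DATALINES;
2002 200
2004 400
;
RUN;

/* Concatenating: all of DATA1, then all of DATA2 (2001, 2003, 2002, 2004) */
DATA COMBINED;
   SET DATA1 DATA2;
RUN;

/* Interleaving: BY keeps the result in YEAR order (2001, 2002, 2003, 2004) */
DATA INTERLEAVED;
   SET DATA1 DATA2;
   BY YEAR;
RUN;

PROC PRINT DATA=COMBINED;    RUN;
PROC PRINT DATA=INTERLEAVED; RUN;
```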


Combining Horizontally
Combining data horizontally refers to the process of merging or joining multiple data sets into one data set

One-to-one reading
In a one-to-one match, key values in both the base table and the lookup table are unique. Therefore, for each observation in the base table, no more than one observation in the lookup table has a matching key value. One-to-one reading combines observations from two or more SAS data sets by creating observations that contain all of the variables from each contributing data set. Observations are combined based on their relative position in each data set, that is, the first observation in one data set with the first in the other, and so on.

The DATA step stops after it has read the last observation from the smallest data set.

One-to-one merging
One-to-one merging is similar to one-to-one reading, with two exceptions:
- you use the MERGE statement instead of multiple SET statements
- the DATA step reads all observations from all data sets
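The difference between the two positional methods shows up when the input data sets have different numbers of observations. The following is a minimal sketch; the data set names ONE, TWO, READ11 and MERGE11 and all values are invented for illustration.

```sas
DATA ONE;                 /* three observations */
   INPUT ID NAME $;
   DATALINES;
1 Ann
2 Bob
3 Cid
;
RUN;

DATA TWO;                 /* two observations */
   INPUT SCORE;
   DATALINES;
90
80
;
RUN;

/* One-to-one reading: stops when the shorter data set is exhausted,
   so READ11 has 2 observations */
DATA READ11;
   SET ONE;
   SET TWO;
RUN;

/* One-to-one merging: reads every observation, so MERGE11 has
   3 observations and SCORE is missing in the third */
DATA MERGE11;
   MERGE ONE TWO;
RUN;

PROC PRINT DATA=READ11;  RUN;
PROC PRINT DATA=MERGE11; RUN;
```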

Match merging
In a one-to-many match, key values in the base table are unique, but key values in the lookup table are not unique. Match-merging combines observations from two or more SAS data sets into a single observation in a new data set based on the values of one or more common variables, which are named in a BY statement. Input datasets DATA1 & DATA2 should be sorted by YEAR before merging.
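A match-merge needs both inputs sorted by the BY variable. The following is a minimal sketch using the names DATA1, DATA2, COMBINED and YEAR from the text; the variables SALES and COST and all values are invented for illustration.

```sas
DATA DATA1;
   INPUT YEAR SALES;
   DATALINES;
2003 300
2001 100
2002 200
;
RUN;

DATA DATA2;
   INPUT YEAR COST;
   DATALINES;
2002 150
2001 80
2004 90
;
RUN;

/* Both inputs must be in BY-variable order before the merge */
PROC SORT DATA=DATA1; BY YEAR; RUN;
PROC SORT DATA=DATA2; BY YEAR; RUN;

/* Observations with the same YEAR are joined side by side;
   2003 has no COST and 2004 has no SALES, so those values are missing */
DATA COMBINED;
   MERGE DATA1 DATA2;
   BY YEAR;
RUN;

PROC PRINT DATA=COMBINED; RUN;
```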


Updating
Updating uses information from observations in a transaction data set to delete, add, or alter information in observations in a master data set. You can update a master data set by using the UPDATE statement or the MODIFY statement. If you use the UPDATE statement, your input data sets must be sorted by the values of the variables listed in the BY statement. If you use the MODIFY statement, your input data does not need to be sorted. By default, UPDATE and MODIFY do not replace non-missing values in a master data set with missing values from a transaction data set


Performing JOINS in DATA Step


Identifying Data Set Contributors: When you read multiple SAS data sets in one DATA step, you can use the IN= data set option to detect which data set contributes to the current observation. General form of the IN= data set option:

SAS-data-set (IN=variable)
Where, variable is any valid SAS variable name. It is a temporary numeric variable with a value of:
1 if the data set contributes to the observation
0 if the data set does not contribute to the observation
The variable will not be written to the dataset.
Example:

DATA three;
   MERGE one(in=a) two(in=b);
   BY id;
   in_x = a;
   in_y = b;
RUN;


For the above example, the contents of dataset ONE, TWO & THREE along with the values of the automatic variables are given below.

Performing JOINS in DATA Step: Using the IN= variables we can perform different join operations:
- Equi-Join
- Left Outer Join
- Right Outer Join
- Full Outer Join
Example:

data three; merge one(in=x) two(in=y); by id; <sas join statement> ; run;

Join Operation      SAS Statement
Equi-Join           IF X AND Y;
Left Outer Join     IF X;
Right Outer Join    IF Y;
Full Outer Join     (no subsetting IF statement is needed; the MERGE with BY keeps every observation)
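All four joins can be produced in one DATA step by routing observations to different output data sets. The following is a minimal sketch; the data values, the variables A and B, and the output names INNER_J, LEFT_J, RIGHT_J and FULL_J are invented for illustration.

```sas
DATA ONE;                 /* IDs 1, 2, 3 (already sorted by ID) */
   INPUT ID A $;
   DATALINES;
1 a1
2 a2
3 a3
;
RUN;

DATA TWO;                 /* IDs 2, 3, 4 (already sorted by ID) */
   INPUT ID B $;
   DATALINES;
2 b2
3 b3
4 b4
;
RUN;

DATA INNER_J LEFT_J RIGHT_J FULL_J;
   MERGE ONE(IN=X) TWO(IN=Y);
   BY ID;
   IF X AND Y THEN OUTPUT INNER_J;   /* equi-join:        IDs 2, 3       */
   IF X       THEN OUTPUT LEFT_J;    /* left outer join:  IDs 1, 2, 3    */
   IF Y       THEN OUTPUT RIGHT_J;   /* right outer join: IDs 2, 3, 4    */
   OUTPUT FULL_J;                    /* full outer join:  IDs 1, 2, 3, 4 */
RUN;
```

X and Y are temporary, so none of the four output data sets contains them.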

Try It Out

Problem Statement 1


You have two SAS data sets. Data set DEMOG contains ID, DOB, and GENDER; data set SCORES contains SSN (which is equivalent to ID in data set DEMOG), IQ, and GPA (grade point average). Write a program to perform INNER Join on the two datasets and write the observations into a new dataset BOTH. Verify the contents of the merged dataset by printing its contents.

Note:
1. Data are not in ID or SSN order.
2. There are some IDs that are in one file only.

Code
PROC SORT DATA=DEMOG;
   BY ID;
RUN;
PROC SORT DATA=SCORES;
   BY SSN;
RUN;
DATA BOTH;
   MERGE DEMOG (IN=IN_DEMOG)
         SCORES (IN=IN_SCR RENAME=(SSN=ID));
   BY ID;
   IF IN_DEMOG AND IN_SCR;
RUN;
PROC PRINT DATA = BOTH;
RUN;
Refer File Name: 16.1.sas to obtain soft copy of the program code

How It Works
Sort the two datasets separately. Since the variable name should be the same in both the datasets for performing the merge, Rename the variable SSN in dataset SCORES to ID. For INNER join use the IN= dataset option to find the contributing dataset.


Problem Statement 2
You have a MASTER file which contains PART (part number), NUMBER (number in stock), PRICE, and SIZE. The file is sorted by PART. You want to update this file as follows:
- For PART 222, you now have 15 in stock.
- For PART 123, you have a new price of $1,500.
- For PART 333, you have a new price of $2,000 and 20 in stock.

Data set MASTER
PART   NUMBER   PRICE   SIZE
111    34       8000    A
123    87       1200    B
124    45       800     A
222    19       1300    C
234    20       2000    A
333    30       1800    B

Code
DATA NEWDATA;
   INPUT PART NUMBER PRICE;
   DATALINES;
222 15 .
123 . 1500
333 20 2000
;
RUN;
PROC SORT DATA=NEWDATA;
   BY PART;
RUN;
DATA MASTER;
   UPDATE MASTER NEWDATA;
   BY PART;
RUN;
Refer File Name: 16.2.sas to obtain soft copy of the program code

How It Works
Verify the contents of the dataset MASTER after updating. You will find that the missing values in the dataset NEWDATA do not overwrite the existing values in the MASTER dataset.


Summary
- SAS provides a bundle of Date & Time functions for extracting the required information from a SAS date or time variable.
- We can create a data set from two or more existing data sets by Combining Data Vertically or Combining Data Horizontally.
- We can use the IN= data set option to detect which data set contributed to an observation.

Test your Understanding


1. What do the following do?
   a) INTCK
   b) DATETIME
   c) MDY
   d) WEEKDAY
2. When would you choose to MERGE two datasets together and when would you SET two datasets?
3. How would you code a merge that will keep only the observations that have matches from both sets?
4. How would you code a merge that will write the matches of both to one data set, and the non-matches from the left-most data set?
5. How do the IN= variables improve the capability of a MERGE?
6. Create a program for the following requirement. Consider the following data:
   Name   Month   Year   Day
   A      10      1928   19
   B      9       1981   10
   C      12      1975   25
   D      15      1990   18
   Create a dataset and add a variable named DOJ that contains the date combined from month, year and day.
7. Create a program for the following requirement. Read the following raw data into a SAS dataset.
   Birth date
   23041979
   21071985
   10061976
   13081952
   Print the contents of this dataset in the following format.
   Birth date
   23APR1979
   21JUL1985
   10JUN1976
   13AUG1952


Session 18: Statistical Procedures


Learning Objectives
After completing this session, you will be able to:
Work with the following statistical procedures:
o PROC FREQ
o PROC MEANS
o PROC SUMMARY
o PROC REPORT

PROC FREQ
The FREQ procedure is a descriptive procedure as well as a statistical procedure that produces one-way and n-way frequency tables. It concisely describes your data by reporting the distribution of variable values. PROC FREQ displays frequency counts of the data values in a SAS data set. It can produce statistics to analyze relationships among variables.
By default, PROC FREQ
- analyzes every variable in the SAS dataset and creates a report on each
- displays each distinct data value
- calculates the number of observations in which each data value appears and the corresponding percentage
- indicates for each variable how many observations have missing values
- produces percent, cumulative frequency and cumulative percent.
Syntax:

PROC FREQ <DATA = dataset>; TABLES <variable list> / options; RUN;


Where,
TABLES <variable list>: Specifies the variables to analyze. Similar to the VAR statement in the PRINT procedure. If not used, the FREQ procedure creates frequency tables for every variable in your data set.
Options:
o MISSING - includes missing values in the frequency report.
o LIST - prints two-way to n-way tables in a list format rather than as cross-tabulation tables.
o NOCOL - suppresses printing of column percentages of a crosstab.
o NOROW - suppresses printing of row percentages of a crosstab.
o NOPERCENT - suppresses printing of cell percentages of a crosstab.
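The options are combined after the slash in the TABLES statement. The following is a minimal sketch on a small made-up EMP data set; the variables EMPID, DEPTID and GENDER follow names used elsewhere in this handout, but the values are invented for illustration.

```sas
DATA EMP;
   INPUT EMPID DEPTID GENDER $;
   DATALINES;
101 10 M
102 10 F
103 20 M
104 20 M
105 30 F
106 . F
;
RUN;

/* Crosstab showing counts only: row, column and cell percentages suppressed */
PROC FREQ DATA=EMP;
   TABLES DEPTID * GENDER / NOROW NOCOL NOPERCENT;
RUN;

/* List layout, with the missing DEPTID treated as a category of its own */
PROC FREQ DATA=EMP;
   TABLES DEPTID * GENDER / LIST MISSING;
RUN;
```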

Sample Program and Output: Example:

PROC FREQ DATA = EMP; TABLES DEPTID; TITLE3 'One way Freq of DEPTID'; RUN;

Creating Two-Way Tables
To produce a cross-tabulation report on two or more variables, use an asterisk (*) between the variables.

PROC FREQ DATA=SAS-data-set; TABLES variable1 * variable2; RUN;


In the cross-tabular report, the values of the first variable in the TABLES statement form the rows of the frequency table and the values of the second variable form the columns.
Sample Program & Output (List Frequency): The LIST option produces a List Frequency.
Example:

PROC FREQ DATA = EMP; TABLES DEPTID * GENDER / LIST; TITLE3 'Two-way Freq of DEPTID Vs GENDER'; RUN;


Box Frequency: Without the LIST option, it produces BOX Frequency:

Multi-Threaded Processing
Multi-threaded processing is a type of parallel processing introduced in SAS System 9. Parallel processing means that multiple units of work are available to be scheduled for concurrent execution by the operating system. This technology takes advantage of hardware that has multiple CPUs, called symmetric multiprocessing (SMP) machines.

Processes suitable for threading are:
- sorting
- grouping
- summarizing
The multi-threading capability of SAS improves processing time of the following procedures:
- SORT
- SQL
- MEANS
- SUMMARY
- REPORT
Threaded processing can be controlled via the SAS system option THREADS | NOTHREADS.
General Form: OPTIONS THREADS | NOTHREADS;
THREADS enables multi-threaded processing. This is the default option.
NOTHREADS disables multi-threaded processing.
The THREADS | NOTHREADS option can also be specified in the PROC statement, which enables or disables multi-threaded processing of the input dataset. When the option is specified in the PROC statement, it overrides the SAS system option THREADS | NOTHREADS.
Example: To enable Multi-threading

PROC SORT DATA = EMP THREADS ; PROC SQL THREADS;


To disable Multi-threading

PROC MEANS DATA = DEPT NOTHREADS;

PROC MEANS


Computing Statistics Using PROC MEANS: The MEANS procedure displays simple descriptive statistics such as sum, mean, standard deviation, variance, minimum, maximum, etc. for the numeric variables in a SAS data set. General form:

PROC MEANS <DATA=SAS-data-set>;
   CLASS <variable list>;
   VAR <variable list>;
   OUTPUT OUT=SAS-data-set <statistic-keyword=variable-name(s)>;
RUN;

Example:

PROC MEANS DATA = SAS-data-set; RUN;


If PROC MEANS is used without any other statements, by default it
- analyzes every numeric variable in the SAS data set
- prints the statistics N, MEAN, STD, MIN and MAX
- excludes missing values before calculating statistics.
CLASS <variable list>;
- Lists the grouping variables that form the subgroups.
- The CLASS statement groups the observations of the SAS data set for analysis.
VAR <variable list>;
- Lists the analysis variables.
- Statistics are calculated for the numeric variables listed here.
OUTPUT OUT=SAS-data-set <statistic-keyword=variable-name(s)>;
Creates a SAS dataset in which the computed summary statistics are stored.
Where,
SAS-data-set specifies the name of the output data set
statistic-keyword= specifies the summary statistic to be written out
variable-name(s) specifies the names of the variables that will be created to contain the values of the summary statistic. These variables correspond to the analysis variables that are listed in the VAR statement.
Example:

VAR SAL; OUTPUT OUT = NEW MEAN = MEANSAL;


Computes the mean value of SAL, stores it in a new variable MEANSAL and writes it to the dataset NEW.
Some of the statistics that can be computed using PROC MEANS are:

Keyword        Description
MIN            Minimum value
MAX            Maximum value
MEAN           Average
SUM            Sum
N              Number of observations with non-missing values
NMISS          Number of observations with missing values
STDDEV / STD   Standard deviation
VAR            Variance
RANGE          Range
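Several statistic keywords can be written to one output data set in a single OUTPUT statement. The following is a minimal sketch on a small made-up EMP data set; the output variable names N_SAL, MISS_SAL, MEANSAL and SUMSAL and all data values are invented for illustration.

```sas
DATA EMP;
   INPUT EMPID DEPTID SALARY;
   DATALINES;
101 10 4000
102 10 5000
103 20 3000
104 20 .
105 20 6000
;
RUN;

/* NOPRINT suppresses the printed report; only the data set NEW is created */
PROC MEANS DATA=EMP NOPRINT;
   CLASS DEPTID;
   VAR SALARY;
   OUTPUT OUT=NEW N=N_SAL NMISS=MISS_SAL MEAN=MEANSAL SUM=SUMSAL;
RUN;

PROC PRINT DATA=NEW;
RUN;
```

Besides the requested statistics, NEW carries the automatic variables _TYPE_ and _FREQ_. With one CLASS variable, the row with _TYPE_ = 0 summarizes all observations and the rows with _TYPE_ = 1 summarize each DEPTID.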

Sample Program & Output 1:

PROC MEANS DATA = EMP; CLASS DEPTID; VAR SALARY; RUN;

Sample Program 2:

PROC MEANS DATA = EMP SUM; CLASS DEPTID; VAR SALARY; RUN;


PROC SUMMARY
You can create a summarized output data set by using the SUMMARY procedure. PROC SUMMARY is similar to PROC MEANS in syntax and you can do all the analysis that can be done by PROC MEANS. The difference between the two procedures is that PROC MEANS produces a report by default, but PROC SUMMARY does not. By default, PROC SUMMARY creates only an output dataset.
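The difference in default behaviour can be seen by running PROC SUMMARY with and without the PRINT option. The following is a minimal sketch on a small made-up EMP data set; the data set name DEPT_STATS, the variable MEANSAL and all data values are invented for illustration.

```sas
DATA EMP;
   INPUT EMPID DEPTID SALARY;
   DATALINES;
101 10 4000
102 10 5000
103 20 3000
104 20 6000
;
RUN;

/* No report is printed; the statistics go to DEPT_STATS only */
PROC SUMMARY DATA=EMP;
   CLASS DEPTID;
   VAR SALARY;
   OUTPUT OUT=DEPT_STATS MEAN=MEANSAL;
RUN;

PROC PRINT DATA=DEPT_STATS;
RUN;

/* With the PRINT option, PROC SUMMARY prints a report like PROC MEANS */
PROC SUMMARY DATA=EMP PRINT;
   CLASS DEPTID;
   VAR SALARY;
RUN;
```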

PROC REPORT
PROC REPORT is another powerful display procedure that combines display and statistical analysis capabilities in one procedure. It produces a variety of reports using a single report-writing tool. It combines the features of
- PROC PRINT
- PROC MEANS
- PROC SUMMARY
- PROC SORT
- PROC TABULATE
Why PROC REPORT?
- PROC REPORT requires less code and time
- it is easy to learn and use
- it is easier to apply ODS style elements in PROC REPORT

Features of REPORT Procedure
- create listing reports
- create summary reports
- enhance reports
- request separate subtotals and grand totals
General Form:

PROC REPORT DATA=SAS-data-set <options>; COLUMN column-specifications; DEFINE variable/ <usage> <attribute-list>; RUN;
Options:
- WINDOWS | WD - invokes the procedure in an interactive REPORT window. This is the default option.
- NOWINDOWS | NOWD - displays the report in the OUTPUT window.
COLUMN column-specifications;
- Selects and orders the variables that appear in your list report.
- This is similar to the VAR statement in PROC PRINT.
- If omitted, all the variables are used by default.
DEFINE variable / <usage> <attribute-list>;
The DEFINE statement is used to
- define how each variable is used in the report
- assign formats and labels to variables
- change the order of the values in the report
Usage:
- DISPLAY: Displays values in the column without ordering or grouping (just like PROC PRINT).
- ORDER: Sorts the report in ascending order; a DESCENDING option is also available.
- GROUP: Groups observations into summarization lines.
- ANALYSIS: Returns the requested statistic.
Attribute-list:
- FORMAT = <format name> - assigns a format to a variable
- report-column-header - defines the column header (label) for the column
GROUP variables produce summary reports; DISPLAY and ORDER variables produce listing reports.

Example:

DEFINE idptno / DISPLAY "Patient";


Prints the values of idptno (like PROC PRINT). "Patient" is the LABEL for idptno.
DEFINE tx / ORDER "Treatment Group";
Prints the values of tx in ascending order.
DEFINE sal / ANALYSIS MEAN "Mean Severity" format=DOLLAR10.2;
Finds the average salary and prints it in DOLLAR10.2 format.
Sample Program & Output 1: (Similar to PRINT)

PROC REPORT DATA = EMP;
   COLUMN EMPID NAME GENDER SALARY;
   DEFINE EMPID / ORDER 'Employee ID';
   DEFINE NAME / DISPLAY 'Name of Employee';
   DEFINE GENDER / DISPLAY;
   DEFINE SALARY / DISPLAY 'Salary of Employee';
RUN;

Sample Program & Output 2: (Similar to MEANS)

PROC REPORT DATA = EMP;
   COLUMN DEPTID GENDER SALARY;
   DEFINE DEPTID / GROUP 'Dept ID';
   DEFINE GENDER / GROUP 'Gender';
   DEFINE SALARY / SUM 'Salary of Employee';
RUN;


Try It Out

Problem Statement 1

You have a SAS Dataset GRADES that has three fields CANDIDATE, EXAMINERA & EXAMINERB. Print the unique values of EXAMINERA along with its count. Do not include the missing values. Print the unique values of the combination of EXAMINERA & EXAMINERB along with its count as LIST frequency and BOX frequency. Include the missing values in the report in the 2-way frequency.

Contents of the Dataset GRADES:
CANDIDATE   EXAMINERA   EXAMINERB
1           1           2
2           0           0
3           0           0
4           2           2
5           0           0
6           4           3
7           0           0
8           0           0
9           0           0
10          2           3
11          1           2
12          2           .
13          0           1
14          4           3
15          4           3
16          1           2
17          0           .
18          1           2
19          2           3
20          0           0

Code
PROC FREQ DATA = GRADES;
   TABLES EXAMINERA;
   TITLE3 'ST FREQ OF EXAMINERA';
RUN;
PROC FREQ DATA = GRADES;
   TABLES EXAMINERA * EXAMINERB / LIST MISSING;
   TITLE3 '2-WAY LIST FREQ OF EXAMINERA * EXAMINERB';
RUN;
PROC FREQ DATA = GRADES;
   TABLES EXAMINERA * EXAMINERB / MISSING;
   TITLE3 '2-WAY BOX FREQ OF EXAMINERA * EXAMINERB';
RUN;
Refer File Name: 18.1.sas to obtain soft copy of the program code

How It Works
By default, PROC FREQ produces n-way frequency as BOX frequency. By including the option LIST, it generates a List frequency. MISSING option includes the missing values also in the report.

Problem Statement 2
Use the dataset BOTH created in Problem 1 and compute the mean IQ and GPA for each value of GENDER. Do this for all the data and then for employees born before January 1, 1972.

Code
PROC MEANS N MEAN DATA=BOTH;
   CLASS GENDER;
   VAR IQ GPA;
RUN;
PROC MEANS N MEAN DATA=BOTH;
   WHERE DOB LT '01JAN72'D and DOB IS NOT MISSING;
   CLASS GENDER;
   VAR IQ GPA;
RUN;
Refer File Name: 18.2.sas to obtain soft copy of the program code

How It Works
Here the grouping variable is GENDER and the analysis variables are IQ and GPA. The N option prints the number of observations in each group, and the MEAN option prints the average value for each group. '01JAN72'D is a date constant. The WHERE statement keeps only the observations whose DOB value is earlier than 01JAN72 and is not missing.

Problem Statement 3
Generate the report described in problem 18.2 using the REPORT procedure.

Code

PROC REPORT DATA=BOTH;
   COLUMN GENDER IQ GPA N;
   DEFINE GENDER / GROUP;
   DEFINE IQ / ANALYSIS MEAN;
   DEFINE GPA / ANALYSIS MEAN;
RUN;

PROC REPORT DATA=BOTH;
   COLUMN GENDER IQ GPA N;
   DEFINE GENDER / GROUP;
   DEFINE IQ / ANALYSIS MEAN;
   DEFINE GPA / ANALYSIS MEAN;
   WHERE DOB LT '01JAN72'D and DOB IS NOT MISSING;
RUN;
Refer File Name: 18.3.sas to obtain soft copy of the program code

How It Works
Here the grouping variable is GENDER and the analysis variables are IQ and GPA. The MEAN option prints the average value for each group, and N prints the number of observations. '01JAN72'D is a date constant. The WHERE statement keeps only the observations whose DOB value is earlier than 01JAN72 and is not missing.

Problem Statement 4
Export the output of problem 18.3 to an RTF file.

Code

ODS RTF FILE = 'C:\SASFILES\REPORT.RTF';

PROC REPORT DATA=BOTH;
   COLUMN GENDER IQ GPA N;
   DEFINE GENDER / GROUP;
   DEFINE IQ / ANALYSIS MEAN;
   DEFINE GPA / ANALYSIS MEAN;
RUN;

PROC REPORT DATA=BOTH;
   COLUMN GENDER IQ GPA N;
   DEFINE GENDER / GROUP;
   DEFINE IQ / ANALYSIS MEAN;
   DEFINE GPA / ANALYSIS MEAN;
   WHERE DOB LT '01JAN72'D and DOB IS NOT MISSING;
RUN;

ODS RTF CLOSE;
Refer File Name: 18.4.sas to obtain soft copy of the program code

Summary
The FREQ procedure is a descriptive as well as a statistical procedure that produces one-way and n-way frequency tables.
To produce a cross-tabulation report on two or more variables, place an asterisk (*) between the variables in the TABLES statement.
Multi-threaded processing is a type of parallel processing introduced in SAS System 9.
Threaded processing can be controlled with the SAS system option THREADS | NOTHREADS.
The MEANS procedure displays simple descriptive statistics.
PROC SUMMARY is similar to PROC MEANS.
PROC REPORT is another very powerful display procedure that combines display and statistical analysis capabilities in one procedure. It produces a variety of reports using a single report-writing tool.

Test your Understanding


PROC FREQ:
Code the TABLES statement for a single-level frequency.
Code the TABLES statement for a multi-level frequency.
Name the option that produces a frequency list rather than a table.
Name the option that includes missing data in the report.
PROC MEANS:
Code a PROC MEANS step that shows both summed and averaged output of the data.
What are the differences between PROC SUMMARY and PROC MEANS?


Session 20: PROC SQL


Learning Objectives
After completing this session, you will be able to:
Work with the PROC SQL procedure
Explain the SELECT statement and its clauses
Create output tables
Summarize data
Group data
Query multiple tables
Limit the number of rows to be read and displayed
Use operators in PROC SQL
Calculate values
Enhance query output

PROC SQL Basics


PROC SQL is a powerful SAS procedure that combines the functionality of DATA and PROC steps in a single tool. PROC SQL can sort, summarize, subset, join (merge), and concatenate datasets, create new variables, and print the results or create a new table or view, all in one step. The SQL procedure provides an easy, flexible way to query and combine your data.
PROC SQL is SAS' implementation of Structured Query Language (SQL), which is similar to ANSI SQL. Most of the statements and options in PROC SQL have the same syntax as their ANSI SQL counterparts. PROC SQL can often be used as an alternative to other SAS procedures or the DATA step.
You can use PROC SQL to:
Retrieve data and manipulate SAS tables
Add or modify data values in a table
Add, modify, or drop columns in a table
Create tables and views
Join multiple tables
Generate reports
Note: In this section SAS datasets are referred to as tables.
The Difference: PROC SQL differs from most other SAS procedures in several ways. Unlike other PROC statements, many statements in PROC SQL are composed of clauses.

For example, the following PROC SQL step contains two statements: the PROC SQL and the SELECT statement. The SELECT statement contains several clauses: SELECT, FROM, and WHERE.

proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
      from sasuser.payrollmaster
      where salary<32000;
quit;
The PROC SQL step does not require a RUN statement; each query executes automatically as soon as it is complete. The step ends with a QUIT statement. Within a query, column names and table names are separated by commas, not by spaces as in most other SAS statements.

The SELECT Statement and its Clauses


The SELECT statement, which follows the PROC SQL statement, retrieves and displays data. It is composed of clauses, each of which begins with a keyword and is followed by one or more components. General Form:

PROC SQL options;
   SELECT column(s)
      FROM table-name | view-name
      WHERE expression
      GROUP BY column(s)
      HAVING expression
      ORDER BY column(s);
QUIT;

A SIMPLE PROC SQL:
Example:

PROC SQL;
   SELECT * FROM USSALES;
QUIT;


It prints the contents of the dataset USSALES, and the output is similar to that of PROC PRINT. An asterisk in the SELECT clause selects all columns from the data set. If you want to print only specific columns, list them in the SELECT clause, separated by commas.

Example:

PROC SQL NUMBER;
   SELECT STATE, SALES
      FROM USSALES;
QUIT;


To subset data based on a condition, use a WHERE clause in the SELECT statement. To sort rows by the values of specific columns, use the ORDER BY clause.
CREATING NEW VARIABLES: Variables can be created dynamically in PROC SQL using the keyword AS. Most DATA step functions can be used in an expression to create a new variable.
Example:

PROC SQL;
   SELECT SUBSTR(STORE,1,3) AS STORENO,
          SALES,
          (SALES * .05) AS TAX,
          (SALES * .05) * .01
      FROM USSALES;
QUIT;
There can be any number of SQL statements in a PROC SQL procedure.
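The WHERE and ORDER BY clauses mentioned above are not shown in an example in this section, so here is a minimal sketch of both in one query. The table name USSALES and the columns STORE, STATE and SALES follow the earlier examples; the sample rows and the value 'CA' are assumptions made up for illustration.

```sas
/* Sample rows are invented; only the table and column names come from the text */
data work.ussales;
   input store $ state $ sales;
   datalines;
101A CA 1200
102B NY 800
103C CA 450
;
run;

proc sql;
   select store, state, sales
      from work.ussales
      where state = 'CA'        /* keep only the rows that meet the condition */
      order by sales desc;      /* largest sales first */
quit;
```

The WHERE clause is applied before the rows are ordered, so only the two CA rows reach the ORDER BY step.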

Creating Output Tables


To create a new table from the results of a query, use a CREATE TABLE statement that includes the keyword AS and the clauses that are used in a PROC SQL query: General Form:

PROC SQL;
   CREATE TABLE table-name AS
      SELECT statement... ;
QUIT;

The following query creates a table named NEW that is a copy of EMP. It does not print anything in the Output window.
Example:

PROC SQL;
   CREATE TABLE NEW AS
      SELECT * FROM EMP;
QUIT;

Summarizing & Grouping Data


To group data for summarizing, use the GROUP BY clause. The GROUP BY clause is used in queries that include one or more summary functions. Summary functions produce a statistical summary for each group that is defined in the GROUP BY clause. Suppose you want to determine the total number of miles traveled by frequent-flyer program members in each of three membership classes (Gold, Silver, and Bronze).
Example:

proc sql;
   select membertype, sum(milestraveled) as TotalMiles
      from sasuser.frequentflyers
      group by membertype;
quit;
Here, the SUM function adds the values of the MilesTraveled column to create the TotalMiles column. The GROUP BY clause groups the data by the values of MemberType. The results show total miles by membership class (MemberType). You can use most of the SAS functions in SQL statements.
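The general form of the SELECT statement lists a HAVING clause, which filters groups after they have been summarized (WHERE filters individual rows before grouping). The sketch below illustrates this under stated assumptions: the table WORK.FREQUENTFLYERS and its rows are invented stand-ins for sasuser.frequentflyers, and the 20000 threshold is arbitrary.

```sas
/* Invented sample data; column names follow the example above */
data work.frequentflyers;
   input membertype $ milestraveled;
   datalines;
GOLD 60000
GOLD 55000
SILVER 30000
BRONZE 10000
BRONZE 5000
;
run;

proc sql;
   select membertype,
          sum(milestraveled) as TotalMiles
      from work.frequentflyers
      group by membertype
      having sum(milestraveled) > 20000   /* applied to each group total */
      order by TotalMiles desc;
quit;
```

With these rows the BRONZE group totals 15000 and is dropped, leaving GOLD (115000) and SILVER (30000).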

Querying Multiple Tables


A join is used to combine information from multiple files. One advantage of using PROC SQL to join files is that it does not require the datasets to be sorted.
Example:

PROC SQL;
   SELECT *
      FROM JANSALES, FEBSALES;
QUIT;


Here a Cartesian join combines all rows from one file with all rows from the other file.
INNER JOIN: An inner join combines datasets only if an observation is in both the datasets. This type of join is similar to a DATA step merge using the IN= data set option and IF logic.

Example:

PROC SQL;
   SELECT U.STORENO, U.STATE, F.SALES AS FEBSALES
      FROM USSALES U, FEBSALES F
      WHERE U.STORENO = F.STORENO;
QUIT;

Limiting the number of rows to be read and displayed

OUTOBS= option
To indicate the maximum number of rows to be displayed, you can use the OUTOBS= option in the PROC SQL statement. OUTOBS= is similar to the OBS= data set option. General Form:

PROC SQL OUTOBS=n;


The OUTOBS= option restricts the rows that are displayed, but not the rows that are read.
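As a small illustration of OUTOBS=, the sketch below uses SASHELP.CLASS, a sample table shipped with SAS; that choice is an assumption made here so the step can run without the course data.

```sas
/* All rows of SASHELP.CLASS are read and sorted; only the first 3 result rows are displayed */
proc sql outobs=3;
   select name, age
      from sashelp.class
      order by age desc;
quit;
```

Because the limit applies to output only, the ORDER BY still works on the full table, so the three rows shown are the three oldest students. SAS normally writes a warning to the log saying that the statement stopped early because of OUTOBS=.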

INOBS= option
The INOBS= option restricts the number of rows that PROC SQL takes as input from any single source. General Form:

PROC SQL INOBS=n;


Example Program:

proc sql inobs=5;
   select *
      from work.all;
quit;


Because the input is limited to 5 rows, SAS writes a message similar to the following in the log.
Log File:

WARNING: Only 5 records were read from WORK.ALL due to INOBS= option.

Using Operators in PROC SQL
Comparison, logical, and concatenation operators are used in the PROC SQL WHERE clause just as they are used in other SAS procedures.


For example, the following WHERE clause contains the logical operator AND, which joins multiple conditions and two comparison operators: an equal sign (=) and a greater than symbol (>). Example:

proc sql;
   select ffid, name, state, pointsused
      from sasuser.frequentflyers
      where membertype = 'GOLD' AND pointsused > 0
      order by pointsused;
quit;
You can also use conditional operators such as BETWEEN-AND, CONTAINS, IN, IS MISSING (or IS NULL), and LIKE. These operators can also be used in WHERE expressions in other SAS procedures.
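The original table of conditional operators is not reproduced here, so the following is only a sketch of three commonly used ones (BETWEEN-AND, IN and LIKE). It runs against SASHELP.CLASS, a sample table shipped with SAS, which is an assumption chosen for illustration.

```sas
proc sql;
   select name, sex, age
      from sashelp.class
      where age between 12 and 14     /* inclusive range */
        and sex in ('F')              /* membership in a list of values */
        and name like 'J%';           /* % matches any number of characters */
quit;
```

Each condition narrows the result further because the conditions are joined with AND.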

Calculated Values
The following PROC SQL query creates the new column Total by adding the values of three existing columns: Boarded, Transferred, and Nonrevenue.
Example:

select flightnumber, date, destination,
       boarded + transferred + nonrevenue as Total
   from sasuser.marchflights
   where total < 100;


If you use the newly created column Total in the WHERE clause, SAS writes an error message.
Log file:

from sasuser.marchflights where total < 100;
ERROR: The following columns were not found in the contributing tables: total

This error message is generated because, in SQL queries, the WHERE clause is processed before the SELECT clause.
Using the Keyword CALCULATED: When you use a column alias in the WHERE clause to refer to a calculated value, you must use the keyword CALCULATED along with the alias. The CALCULATED keyword informs PROC SQL that the value is calculated within the query.
Example:

select flightnumber, date, destination,
       boarded + transferred + nonrevenue as Total
   from sasuser.marchflights
   where calculated total < 100;

This query executes successfully.

Enhancing Query Output


By default, the output of PROC SQL is not formatted. But you can improve the appearance of your query output by using:
Column labels and formats
Titles and footnotes
Columns that contain a character constant

To control the formatting of columns in output, you can specify column modifiers, such as LABEL= and FORMAT=, after any column name specified in the SELECT clause.

Note: The LABEL= and FORMAT= modifiers are not part of the ANSI standard. They are SAS enhancements.
Example:

proc sql outobs=5;
   title 'Current Bonus Information';
   title2 'Employees with Salaries > $75,000';
   select empid label='Employee ID',
          jobcode label='Job Code',
          salary,
          salary * .10 as Bonus format=dollar12.2
      from sasuser.payrollmaster
      where salary>75000
      order by salary desc;
quit;

The first two columns have new labels, the Bonus values are consistently formatted, and two title lines are displayed at the top of the output.
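The third technique in the list above, a column that contains a character constant, can be sketched as follows. The table WORK.PAYROLL is a hypothetical stand-in built from three rows of the EMP data used in the Try It Out exercise; the constant text 'bonus is:' and the 6% rate are assumptions for illustration.

```sas
/* Hypothetical stand-in table */
data work.payroll;
   input empid $ jobcode $ salary;
   datalines;
1970 FA1 31661
1658 SCP 25120
1789 SCP 25656
;
run;

proc sql;
   title 'Bonus Information';
   select empid label='Employee ID',
          jobcode label='Job Code',
          'bonus is:',                             /* character constant repeated on every row */
          salary * .06 as Bonus format=dollar12.2
      from work.payroll;
quit;

title;   /* clear the title so it does not carry over to later output */
```

The quoted text appears as its own column with the same value on every row, which can make a calculated column easier to read.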


CONCLUSION
PROC SQL is a powerful tool. It can make your life much easier. For beginner SQL users, remember the following points:
Be careful about many-to-many table joins in SQL. When joining tables that have multiple records per matching ID, the output table contains the Cartesian product of the matching rows. For example, 3 rows joined to 5 rows with the same ID value produce 15 rows, whereas a DATA step MERGE creates only 5 rows.
PROC SQL is code-saving, but not always time-saving.
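The 3-row by 5-row case described above can be reproduced with a small sketch. The table names WORK.A and WORK.B, the columns X and Y, and all values are hypothetical.

```sas
data work.a;               /* 3 rows with id = 1 */
   input id x;
   datalines;
1 11
1 12
1 13
;
run;

data work.b;               /* 5 rows with id = 1 */
   input id y;
   datalines;
1 21
1 22
1 23
1 24
1 25
;
run;

proc sql;                  /* every A row pairs with every B row: 3 x 5 = 15 rows */
   create table work.sql_join as
      select a.id, a.x, b.y
         from work.a as a, work.b as b
         where a.id = b.id;
quit;

data work.step_merge;      /* MERGE pairs rows by position within the BY group: 5 rows */
   merge work.a work.b;
   by id;
run;
```

The log notes for the two output tables should show 15 rows for WORK.SQL_JOIN and 5 observations for WORK.STEP_MERGE. The DATA step also writes a note that more than one data set has repeats of BY values.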

Try It Out

Problem Statement 1
Consider that you have a dataset EMP. Print the contents of the dataset sorted by the value of JobCode. Keep only the observations with Salary less than 32000. Create a new column BONUS whose value is 6% of Salary.

EmpID  JobCode  Salary
1970   FA1      $31,661
1422   FA1      $31,436
1658   SCP      $25,120
1113   FA1      $31,314
1094   FA1      $31,175
1789   SCP      $25,656
1422   FA1      $31,436
1564   SCP      $26,366
1354   SCP      $25,669
1094   FA1      $31,175
1101   SCP      $26,212

Code

proc sql;
   select empid, jobcode, salary, salary*.06 as bonus
      from emp
      where salary<32000
      order by jobcode;
quit;
Refer File Name: 20.1.sas to obtain soft copy of the program code

How It Works
The ORDER BY clause sorts the output rows.
The WHERE clause filters the observations.
salary*.06 as bonus creates the new column.

Problem Statement 2
With the dataset EMP, Determine the total salary for each jobcode. Apply a format to the new column. Limit the output observations to 10

Code

proc sql outobs = 10;
   select jobcode, sum(salary) as Totsal format = dollar13.2
      from emp
      group by jobcode;
quit;
Refer File Name: 20.2.sas to obtain soft copy of the program code

How It Works
The GROUP BY clause summarizes the observations by jobcode.
The SUM function finds the sum of Salary for each group.
OUTOBS= is similar to the OBS= data set option; it prints only the specified number of rows.

Summary
PROC SQL is a powerful SAS procedure that combines the functionality of DATA and PROC steps into a single step.
PROC SQL is SAS' implementation of Structured Query Language (SQL), which is similar to ANSI SQL.
The SELECT statement is composed of clauses.
To group data for summarizing, you can use the GROUP BY clause.
A join is used to combine information from multiple files.
To indicate the maximum number of rows to be displayed, you can use the OUTOBS= option in the PROC SQL statement.
Comparison, logical, and concatenation operators are used in the PROC SQL WHERE clause as they are used in other SAS procedures.
When you use a column alias in the WHERE clause to refer to a calculated value, you must use the keyword CALCULATED along with the alias.
You can improve the appearance of your query output by using column labels and formats, titles and footnotes, and columns that contain a character constant.

Test your Understanding
1. What is the use of PROC SQL?
2. What is the use of the keyword CALCULATED?
3. How will you limit the number of observations read and written in PROC SQL?


Session 22: Introduction to MACROS


Learning Objectives
After completing this session, you will be able to:
Explain SAS macros
List the advantages of the SAS macro facility
Work with macro variables
Describe automatic and user-defined macro variables
Explain macro triggers
Explain the macro processor and the flow of execution
Create macro variables at run time

SAS Macro
The macro facility is one of the most powerful features of Base SAS. SAS macros enable you to substitute text in your SAS programs. When you reference a macro, SAS replaces the reference with the text value that has been assigned to that macro. This makes your programs more reusable and dynamic. In simple terms, the SAS macro facility is a tool for text substitution.
Macros allow users to:
Write more flexible code
Pass information between DATA or PROC steps
Generate SAS statements based on the data
There are two main components of the SAS macro facility:
Macro variables
Macro programs
Macro variables are like parameters passed on to a SAS program. Macro programs use macro variables and macro programming statements to build SAS programs.

Advantages of the SAS Macro Facility


Macros can help in several ways:
With macros you can make one small change in your program and have SAS echo that change throughout your program.
Macros allow you to write a piece of code and use it over and over again in the same program or in different programs.
You can make your programs data driven, letting SAS decide what to do based on actual data values.
Using the SAS macro facility, you can:
Make SAS programs reusable, shorter, and easier to follow
Accomplish repetitive tasks quickly and efficiently
Customize the results without changing the code, by passing parameters to the macro program
Conditionally execute SAS code
Debug more easily
Automatically insert the date and other session information into your code
Write more flexible code, and pass data between DATA and PROC steps at execution time

Macro variables
Macro variables belong to the SAS macro language and are different from Data step variables. You can define and use macro variables anywhere in a SAS program, except in DATALINES or CARDS. The %LET statement enables you to define a macro variable and to assign a value to it. General form:

%LET variable = value;


Where,
variable is any name that follows the SAS naming convention.
value can be any string from 0 to 65,534 characters.
If either variable or value contains a reference to another macro variable (such as &macvar), the reference is evaluated before the assignment is made.
If variable already exists, value replaces the current value.

Rules for creating macro variables:
All values are stored as character strings.
Mathematical expressions are not evaluated.
The case of the value is preserved.
Quotation marks that enclose literals are stored as part of the value.
Leading and trailing blanks are removed from the value before the assignment is made.

You can reference a macro variable by preceding its name with an ampersand (&).
Note: The macro processor resolves references in double quotes but not in single quotes.

%LET Statement                Variable Name   Variable Value     Length
%let name= Ed Norton ;        name            Ed Norton          9
%let name2=' Ed Norton ';     name2           ' Ed Norton '      13
%let title="Joan's Report";   title           "Joan's Report"    15
%let start=;                  start                              0
%let sum=4+3;                 sum             4+3                3
%let total=0+&sum;            total           0+4+3              5
%let x=varlist;               x               varlist            7
%let &x=name age height;      varlist         name age height    15
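One way to check values like those in the table is to write them to the log with %PUT. The sketch below repeats four of the %LET statements from the table; the use of %EVAL at the end is an addition made here to contrast stored text with an evaluated expression.

```sas
%let sum   = 4+3;
%let total = 0+&sum;
%let x     = varlist;
%let &x    = name age height;     /* &x resolves first, so this creates VARLIST */

%put sum=&sum;                    /* writes sum=4+3: the expression is stored as text */
%put total=&total;                /* writes total=0+4+3 */
%put varlist=&varlist;            /* writes varlist=name age height */
%put evaluated=%eval(&sum);       /* writes evaluated=7: %EVAL performs the integer arithmetic */
```

Because %LET stores text only, arithmetic happens only when a function such as %EVAL is asked to evaluate that text.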

DATA Step Variable Vs Macro Variable: The following table illustrates the difference between a DATA step variable and a macro variable.

DATA Step Variable                                      Macro Variable
Belongs to the SAS language                             Belongs to the SAS macro language
Its value depends on the observation being processed    Contains one value that remains constant until explicitly changed
Is part of the SAS data set                             Is independent of the SAS data set

Where we can use Macro Variable: In your SAS programs, you might find that you need to reference the same text string multiple times. Example:

DATA sales;
   Set ALL_DEPT;
   where Dept = 'sales';
run;

proc print data = sales;
   title 'List of employees in sales department';
run;
Later, you might need to change these references in order to use a different text string. If your programs are lengthy, updating them manually can take a lot of time and increases the chance of errors. If you use a macro variable in your program, you only need to make the change in one place and SAS will echo its value in all the places where it is referenced.
Example:

%let dept = sales;

DATA &dept;
   Set ALL_DEPT;
   where Dept = "&dept";
run;

proc print data = &dept;
   title "List of employees in &dept department";
run;

Automatic and User defined macro variables
There are two types of macro variables:
Automatic macro variables
User-defined macro variables
Both types of macro variables are independent of the SAS dataset.
Automatic macro variables: SAS creates and defines several automatic macro variables. They contain information about your computing environment and the date and time of the session. They are created when SAS is invoked and have global scope. Their values are usually assigned by SAS. Some of the automatic macro variables are given below.

Name       Value
SYSDATE    the date of the SAS invocation (DATE7.)
SYSDATE9   the date of the SAS invocation (DATE9.)
SYSDAY     the day of the week of the SAS invocation
SYSTIME    the time of the SAS invocation
SYSVER     the release of SAS that is being used
SYSLAST    the name of the most recently created SAS data set

User-defined macro variables: The macro variables created by the user are user-defined macro variables. Example: you can create a user-defined macro variable with %LET statement and CALL SYMPUT routine.
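A short sketch combining both kinds of macro variable in titles and footnotes follows. The macro variable name report_owner and its value are made up for illustration, and SASHELP.CLASS (a sample table shipped with SAS) stands in for real data.

```sas
%let report_owner = Training Team;    /* user-defined; hypothetical name and value */

proc print data=sashelp.class(obs=5);
   /* double quotes are needed so that the references resolve */
   title    "Class listing produced on &sysday, &sysdate9";
   footnote "Prepared by &report_owner using SAS &sysver";
run;

title;       /* clear the title and footnote afterwards */
footnote;
```

SYSDAY, SYSDATE9 and SYSVER are supplied by SAS at invocation, while report_owner exists only because the %LET statement created it.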

Macro Processor and the flow of execution


SAS Program flow of execution (without Macros):

When you submit a program, it goes to an area of memory called the input stack.
The word scanner reads the program from the input stack and divides the program text into fundamental units called tokens.
   o Tokens are passed on demand to the compiler.
   o The compiler requests tokens until it receives a semicolon.
   o SAS stops sending statements to the compiler when it reaches a step boundary, e.g. a RUN statement.
The compiler checks the syntax of the tokens received from the word scanner. After it completes checking the syntax, the code is sent for execution.
The executor executes the code and writes the results to the Log and the Output files.

Terms used in Macro Processing:

Term              Description
input stack       Holds a SAS program after it is submitted.
word scanner      Scans the text it takes from the input stack and breaks the text into tokens. Determines the destination of the token: DATA step compiler, macro processor, etc.
token             Fundamental unit in the SAS language. Tokens are the actual keywords in the SAS statements as well as the literal strings, numbers, and symbols. Ex: DATA, 1234, +, -, =, variable
compiler          Checks the syntax of tokens received from the word scanner. After it completes checking the syntax, the code is sent for execution.
macro processor   Processes macro language references and statements.
macro trigger     The symbols & and %, when followed by a letter or underscore, that signal the word scanner to transfer the current statement to the macro processor.

Macro Facility: The macro facility includes a macro processor that is responsible for handling all macro language elements. When a macro trigger is detected, the word scanner passes it to the macro processor for evaluation. The compiler does not recognize macro statements.
Macro Trigger: The word scanner recognizes the following token sequences as macro triggers:
% followed immediately by a name token (such as %let)
& followed immediately by a name token (such as &dept)

SAS Program flow of execution with Macro statements:
When the word scanner encounters a macro trigger, it sends the statement to the macro processor.
The macro processor processes macro language references and macro statements and returns the SAS code (without macro statements).
The resolved SAS code is returned to the input stack and execution continues.
Combining Macro variable reference with text (Concatenation): When you place a macro variable reference directly in front of text, SAS interprets the reference and the text together as a single macro variable name.
Example:

%let month = APR;
PROC PRINT DATA = WORK.&MONTHDATA;
RUN;


Here SAS interprets &MONTHDATA as a macro variable reference and writes a warning to the log stating that the apparent symbolic reference MONTHDATA was not resolved. To avoid this, place a period (.) at the end of the macro variable reference.

PROC PRINT DATA = WORK.&MONTH.DATA;


Now &MONTH. resolves to APR and the dataset name becomes WORK.APRDATA.
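A related case arises when the libref itself is held in a macro variable: the first period ends the macro variable reference and a second period is needed to separate the libref from the dataset name. The sketch below assumes a macro variable named lib (hypothetical) and copies SASHELP.CLASS, a sample table shipped with SAS, so that the step can run.

```sas
%let lib   = WORK;
%let month = APR;

/* &lib..&month.DATA resolves to WORK.APRDATA:
   the first period ends &lib, the second is the libref separator,
   and the period after &month ends that reference */
data &lib..&month.DATA;
   set sashelp.class;
run;

proc print data=&lib..&month.DATA(obs=3);
run;
```

Writing &lib.&month.DATA with a single period would resolve to WORKAPRDATA, which is a one-level dataset name rather than a libref and a dataset.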

Creating macro variables in run time


Consider the following program, in which A and B are given the values shown.
Example:

data _null_;
   A = 2000;
   B = 1000;
   if A > B then do;
      %let txt=A is greater than B;
   end;
   else do;
      %let txt= A is lesser than B;
   end;
run;

Can you guess the value of the macro variable txt? Is it "A is greater than B"? No, the value is "A is lesser than B". This is because the macro facility performs its task before the SAS program executes, whereas SAS assigns the values of A and B only at execution time. The condition is therefore not evaluated when the %LET statements are processed, and both %LET statements are sent to the macro processor. The macro processor first executes %let txt=A is greater than B; and then executes %let txt= A is lesser than B; so the latest value, A is lesser than B, is assigned to the macro variable txt. So we cannot use %LET to assign DATA step variable values to macro variables.
The SYMPUT Routine: The DATA step provides functions and CALL routines that enable you to transfer information between an executing DATA step and the macro processor. The SYMPUT routine creates a macro variable at execution time and assigns a value to it.
General form:

CALL SYMPUT (macro-variable, text);


If an argument is not enclosed in quotes, it is treated as a DATA step variable and that variable's value is used in its place.

CALL SYMPUT ('macro-variable', DATA-step-variable);


This form of the SYMPUT routine creates the macro variable named macro-variable and assigns to it the current value of DATA-step-variable. When you use a DATA step variable as the second argument, a maximum of 32767 characters can be assigned to the receiving macro variable. Any leading or trailing blanks that are part of the DATA step variable's value are stored in the macro variable.
Caution: CALL SYMPUT assigns the value while the DATA step is executing, but an ampersand reference (&macro-variable) in the same DATA step is resolved earlier, when the step is compiled. Therefore, you cannot use an ampersand reference to a macro variable within the same DATA step where it is created; reference it in a later step.
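The A and B example from earlier in this section can be rewritten with CALL SYMPUT so that the macro variable depends on the values seen at execution time. This is a minimal sketch; the second step, the macro variable name n_students and the use of SASHELP.CLASS (a sample table shipped with SAS) are assumptions added for illustration.

```sas
data _null_;
   A = 2000;
   B = 1000;
   /* the IF condition is evaluated at execution time, so only one branch runs */
   if A > B then call symput('txt', 'A is greater than B');
   else call symput('txt', 'A is lesser than B');
run;

%put txt=&txt;     /* writes txt=A is greater than B */

/* Passing a DATA step value: convert the number to text and strip the blanks first */
data _null_;
   set sashelp.class end=last;
   if last then call symput('n_students', strip(put(_n_, 8.)));
run;

%put n_students=&n_students;
```

Both %PUT statements sit after the RUN statement, that is, after the DATA step that creates the macro variable has finished executing.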


The SYMGET Function: To obtain a macro variable's value during DATA step execution, use the SYMGET function. The SYMGET function returns the value of an existing macro variable.
General form:

SYMGET (macro-variable)
Where macro-variable is the name of an existing macro variable. If the argument is not enclosed in quotes, it is treated as a DATA step variable and that variable's value is used as the macro variable name.
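Both ways of calling SYMGET, with a quoted name and with a DATA step variable that holds the name, can be sketched as follows. The macro variable bikeclass matches the exercise below; the DATA step variables class, class2 and mvname are hypothetical.

```sas
%let bikeclass = Mountain;

data _null_;
   length class class2 $ 20 mvname $ 32;

   class  = symget('bikeclass');   /* quoted: the literal macro variable name */

   mvname = 'bikeclass';
   class2 = symget(mvname);        /* unquoted: the name is taken from the value of MVNAME */

   put class= class2=;             /* writes class=Mountain class2=Mountain */
run;
```

The LENGTH statement is included because SYMGET would otherwise give the receiving variables a default length of 200.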

Try It Out

Problem Statement 1


A company that manufactures bicycles maintains a dataset Models listing all their models. For each model they record its name, class (Road, Track, or Mountain), list price, and frame material. Here is a subset of the data:

Create a macro variable bikeclass and assign a value to it and print only those observations with the value of the macro variable. Also use a TITLE statement to display the value of the macro variable.

Code

%LET bikeclass = Mountain;

* Use a macro variable to subset;
PROC PRINT DATA = models;
   WHERE Class = "&bikeclass";
   TITLE "Current Models of &bikeclass Bicycles";
RUN;

Refer File Name: 22.1.sas to obtain soft copy of the program code

How It Works
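Macro variable references are resolved inside double quotation marks but not inside single quotation marks, so enclose text that contains a macro variable reference in double quotes.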

Problem Statement 2
A company maintains a dataset with information about every order they receive. For each order, the data include the customer ID number, date the order was placed, model name, and quantity ordered. Here is the data:

CustomerID  OrderDate  Model         Quantity
287         15OCT03    Delta Breeze  15
287         15OCT03    Santa Ana     15
274         16OCT03    Jet Stream    1
174         17OCT03    Santa Ana     20
174         17OCT03    Nor'easter    5
174         17OCT03    Scirocco      1
347         18OCT03    Mistral       1
287         21OCT03    Delta Breeze  30
287         21OCT03    Santa Ana     25

Every Monday the president of the company wants a detail-level report showing all the current orders. On Friday the president wants a report summarized by customer. Write a SAS program for the above requirement.

Code

%MACRO reports;
   %IF &SYSDAY = Monday %THEN %DO;
      PROC PRINT DATA = orders;
         FORMAT OrderDate DATE7.;
         TITLE "&SYSDAY Report: Current Orders";
   %END;
   %ELSE %IF &SYSDAY = Friday %THEN %DO;
      PROC MEANS DATA = orders;
         CLASS CustomerID;
         VAR Quantity;
         TITLE "&SYSDAY Report: Summary of Orders";
   %END;
   RUN;
%MEND reports;

%reports
Refer File Name: 22.2.sas to obtain soft copy of the program code

How It Works
SYSDAY holds the day of the week on which the SAS session started. If the day is Monday, the PROC PRINT code is generated. If it is Friday, the PROC MEANS code is generated.

Summary
The macro facility is one of the most powerful features of Base SAS.
There are two main components of the SAS macro facility:
   o Macro variables
   o Macro programs
The %LET statement enables you to define a macro variable and to assign a value to it.
There are two types of macro variables:
   o Automatic macro variables
   o User-defined macro variables
% and & are macro triggers.
The SYMPUT routine creates a macro variable at execution time and assigns a value to it.
The SYMGET function returns the value of an existing macro variable.

Test your Understanding


1. What are macro triggers?
2. What is the scope of the macro variables A and B?
   %let A = 10;
   %macro abc;
      %let B = 20;
      %put _user_;
   %mend abc;
3. What is the value of the macro variables Total and Sum?
   %let Total = 3+6;
   %let Sum = 3+6;
4. What are the debugging options for macros?
5. Which of the following TITLE statements correctly references the macro variable month?
   a) title "Total Sales for &month ";
   b) title Total Sales for &month;
   c) title "Total Sales for &month";
   d) title Total Sales for "&month";
6. How would you include common or reusable code to be processed along with your statements?
7. How do you identify a macro variable?
8. For what purposes do you use SAS macros?
9. Describe CALL SYMPUT.
10. What are SYMGET and SYMPUT?
11. Describe how you would create a macro variable at compile time and at run time.


Session 23: Introduction to MACROS


Learning Objectives
After completing this session, you will be able to work with:
Macro programs
Parameters to macro programs
Scope of macro variables
Macro system options
Conditional execution in macros
Iterative processing in macros
Built-in macro functions

Macro Programs
A macro is a group of SAS statements that is identified by a name. It is a larger piece of a program that can contain complex logic including complete DATA and PROC steps, macro statements and macro variables. General form of a macro:

    %MACRO macro-name;
        macro-text
    %MEND macro-name;


A macro definition starts with the %MACRO statement followed by a macro name, and ends with %MEND.
The macro name can also appear after %MEND for clarity, but it is optional.
Macro-text represents the SAS statements that you include in your macro.
To invoke a macro, place a % in front of its name, as:

%macro-name
Example:

    %MACRO printit;
        PROC PRINT DATA = EMP (OBS = 10);
        TITLE "CONTENTS OF DATASET EMP";
        RUN;
    %MEND printit;

    %printit
    PROC SORT DATA = EMP;
        BY EMPID;
    RUN;
    %printit

The program calls the macro twice: first without sorting the data, and then after executing a PROC SORT by EMPID. The SAS statements inside the macro are substituted in place of each %printit call.

Macro variables vs. macros:

Macro variables
    Start with an ampersand (&) when referenced
    Are defined using the %LET statement
    Are like standard data variables, except that they do not belong to a data set and hold only a single value, which is always character

Macros
    Start with a percent sign (%) when invoked
    Are defined using the %MACRO and %MEND statements
    Are larger pieces of a program that can contain complex logic, including complete DATA and PROC steps, macro statements and macro variables

Using Macro Parameters


The next step is to introduce parameters to macro programs, which makes them more flexible and allows data-driven programs. Parameters are values that are passed to the macro at the time of invocation. They are defined in a set of parentheses following the macro name. Parameters are macro variables, so references to them inside the macro definition need a preceding ampersand. There are two styles for coding the parameters:
    Positional
    Keyword

Positional Parameter
The following example shows the positional style: Example:

    %MACRO printit(dsname, noobs);
        PROC PRINT DATA = &dsname (OBS = &noobs);
        TITLE "CONTENTS OF DATASET &dsname";
        RUN;
    %MEND printit;

To invoke the macro, use the following syntax with the parameter values given in the right positions:

    %printit(emp, 100)

Keyword Parameter
The following example shows the keyword style: Example:

    %MACRO printit(dsname = &syslast, noobs = 100);
        PROC PRINT DATA = &dsname (OBS = &noobs);
        TITLE "CONTENTS OF DATASET &dsname";
        RUN;
    %MEND;

Where, &syslast and 100 are the default arguments for dsname and noobs respectively. &syslast refers to the most recently created dataset. The above macro can be invoked in different ways.

%printit(dsname = dept, noobs = 50)


Takes the parameter values as dept for dsname and 50 for noobs.

%printit(noobs = 50)
Since the value of dsname is not provided, it takes the default parameter &syslast.

%printit()
Takes the default arguments for both the parameters.

In the positional style, the parameters must be given in the same order as in the macro definition. In the keyword style, the parameters can be given in any order.
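The two styles can also be combined in one definition, in which case the positional parameters are listed before the keyword parameters. The sketch below is illustrative only: the macro name printmix and the use of the SASHELP.CLASS sample data set are assumptions and do not come from the handout.

```sas
/* Hypothetical macro: dsname is positional, noobs is a keyword
   parameter with a default value of 10. */
%MACRO printmix(dsname, noobs = 10);
    PROC PRINT DATA = &dsname (OBS = &noobs);
        TITLE "CONTENTS OF DATASET &dsname";
    RUN;
    TITLE;   /* clear the title so it does not carry over */
%MEND printmix;

/* Positional value only: noobs falls back to its default. */
%printmix(sashelp.class)

/* Positional value first, then the keyword override. */
%printmix(sashelp.class, noobs = 5)
```

Keeping the data set name positional and the rarely changed option as a keyword keeps the common call short while still allowing an override.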

Scope of Macro variables


Macro variables come in two varieties:
    LOCAL
    GLOBAL

LOCAL Macro Variable


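A macro variable's scope is LOCAL if it is defined inside a macro. Example:

    %MACRO TEST;
        %LET A = HAI;
        <Macro statements>;
    %MEND;

When a %LET statement is found within a %MACRO definition and no macro variable of that name already exists, the variable is created LOCAL to that macro and is not available outside it. If a global macro variable with the same name already exists, %LET updates that global variable instead; use the %LOCAL statement to force a local variable.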

GLOBAL Macro Variable


A macro variable's scope is GLOBAL if it is defined in open code, which is everything outside a macro. You can reference a global macro variable anywhere in your program. Example:

%LET A = HAI;

When %LET statements are placed in open code (outside of any %MACRO definition), the variables they define have GLOBAL scope.
To create a macro variable with GLOBAL scope inside a macro, use the %GLOBAL statement. Example:

%GLOBAL A;
Place this statement above the %LET statement.
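The scope rules can be checked with a short experiment. This is a minimal sketch; the macro name scopetest and the variable names city, tmp and status are hypothetical.

```sas
%LET city = Dallas;            /* open code: CITY is global */

%MACRO scopetest;
    %LOCAL tmp;                /* TMP exists only while SCOPETEST runs */
    %LET tmp = inside;
    %GLOBAL status;            /* STATUS is created in the global symbol table */
    %LET status = done;
    %LET city = Austin;        /* CITY already exists globally, so the global value is updated */
    %PUT Inside the macro: tmp=&tmp status=&status city=&city;
%MEND scopetest;

%scopetest

%PUT After the macro: status=&status city=&city;
%PUT _GLOBAL_;                 /* lists user-created global macro variables */
```

After the call, STATUS and the updated CITY are still available, while TMP no longer exists and does not appear in the _GLOBAL_ listing.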

System Options

The SYMBOLGEN Option


When a macro variable is referenced, the macro processor resolves the reference and passes the value directly back to the input stack. Therefore, we cannot see the value of macro variables returned by the macro processor. To debug the programs, it might be useful to view the value of the macro variables. SYMBOLGEN system option is used to print the value of the macro variables. General form:

OPTIONS NOSYMBOLGEN | SYMBOLGEN;


Where,
    NOSYMBOLGEN specifies that log messages about macro variable references will not be displayed. This is the default.
    SYMBOLGEN specifies that log messages about macro variable references will be displayed in the SAS log.
Example Program:

    set sasuser.all;
    where fee>&amount;
    A = &city;

SAS Log

    110 where fee>&amount;
    SYMBOLGEN: Macro variable AMOUNT resolves to 975
    111 A = &city;
    SYMBOLGEN: Macro variable CITY resolves to Dallas
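A complete program that can be submitted to see SYMBOLGEN messages might look like the sketch below. It assumes the SASHELP.CLASS sample data set is available; the macro variable names minage and gender and the output data set name are hypothetical.

```sas
%LET minage = 13;     /* hypothetical age threshold */
%LET gender = F;

OPTIONS SYMBOLGEN;    /* write one log message per resolved reference */

DATA work.subset;
    SET sashelp.class;
    /* Double quotes are needed so that &gender resolves inside the literal. */
    WHERE age > &minage AND sex = "&gender";
RUN;

OPTIONS NOSYMBOLGEN;  /* switch the messages off again */
```

The log should then contain one SYMBOLGEN line for MINAGE and one for GENDER, in the same form as the log extract above.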

MPRINT


When the MPRINT option is specified, the text that is sent to the SAS compiler as a result of macro program execution is printed in the SAS log.
General form:

    OPTIONS MPRINT | NOMPRINT;

Where,
    NOMPRINT turns off the option. This is the default.
    MPRINT turns on the option.
Example: Suppose you want to call the macro printit and use the MPRINT system option.
Macro definition:

    %MACRO printit();
        PROC PRINT DATA = DEPT (OBS = 75);
        TITLE "CONTENTS OF DATASET DEPT";
        RUN;
    %MEND printit;
    OPTIONS MPRINT;
    %printit()

Log file:

    101 %printit
    MPRINT(PRINTIT): proc print data= DEPT (obs=75);
    MPRINT(PRINTIT): title " CONTENTS OF DATASET DEPT";
    MPRINT(PRINTIT): run;

MLOGIC

The MLOGIC option prints messages that indicate the macro actions that were taken during macro execution.
General form:

    OPTIONS MLOGIC | NOMLOGIC;

Where,
    MLOGIC specifies that messages about macro actions are printed to the log during macro execution.
    NOMLOGIC specifies that these messages are not printed to the SAS log. This is the default.

Example:

    options mlogic;
    %printit()


Log file:

    107 %printit
    MLOGIC(PRINTIT): Beginning execution.
    NOTE: There were 1 observations read from the dataset WORK.EMP.
    NOTE: PROCEDURE PRINT used:
          real time 0.02 seconds
          cpu time 0.02 seconds
    MLOGIC(PRINTIT): Ending execution.

The SYMBOLGEN, MPRINT and MLOGIC options are typically turned on during development and debugging, and turned off when the application is in production.
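One way to follow this practice is to switch the options on just before the macro call under test and off again afterwards. The sketch below assumes the SASHELP.CLASS sample data set; the macro name showfirst and its parameters are hypothetical.

```sas
/* Hypothetical macro used only to generate some log output. */
%MACRO showfirst(dsname, n = 5);
    PROC PRINT DATA = &dsname (OBS = &n);
    RUN;
%MEND showfirst;

/* Development: show generated code, macro flow and resolved values. */
OPTIONS MPRINT MLOGIC SYMBOLGEN;
%showfirst(sashelp.class, n = 3)

/* Production: return to the quiet defaults. */
OPTIONS NOMPRINT NOMLOGIC NOSYMBOLGEN;
```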

%PUT statement
The %PUT statement is another way of verifying the values of macro variables. It writes text and the values of macro variables to the SAS log.
General form:

    %PUT text;

Where, text is any text string or macro variable reference. %PUT can be used virtually anywhere in a program, and it writes the values of user-defined or system-defined macro variables to the SAS log.
To list the values of macro variables with the %PUT statement, use one of the following arguments:

    Argument       Result in SAS Log
    _ALL_          Lists the values of all macro variables
    _AUTOMATIC_    Lists the values of all automatic macro variables
    _USER_         Lists the values of all user-defined macro variables

The debugging tools are summarized below:

    Option       Description
    SYMBOLGEN    Writes a message for the resolution of each macro variable
    MPRINT       Displays the SAS statements returned by the macro processor
    MLOGIC       Traces the beginning/ending of macro execution and any parameter values assigned
    %PUT         Prints the values of the macro variables and text specified
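The %PUT forms described above can be tried directly in open code. In this sketch the macro variables city and amount are hypothetical, while SYSDAY and SYSDATE9 are automatic macro variables supplied by SAS.

```sas
%LET city = Dallas;
%LET amount = 975;

/* Free text mixed with macro variable references. */
%PUT The city is &city and the amount is &amount;

/* List every user-defined macro variable. */
%PUT _USER_;

/* List every automatic macro variable. */
%PUT _AUTOMATIC_;

/* Automatic macro variables can be referenced like any other. */
%PUT Today is &sysday, &sysdate9;
```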

Conditional execution in Macro


You can use macros to control conditional execution of statements. Here are the general forms of statements used for conditional logic in macros:

    %IF condition %THEN action;
    %ELSE %IF condition %THEN action;
    %ELSE action;

    %IF condition %THEN %DO;
        action;
    %END;

These statements are similar to the standard SAS IF statement. Each keyword starts with a % sign to differentiate it from the standard IF statement. These statements can only be used inside a macro. The conditions and actions can include other macro statements or even complete DATA and PROC steps. If there are multiple statements in an action block, use a %DO-%END block.

%IF vs. IF statement: The differences between the macro %IF statement and the standard IF statement are listed below.

Macro %IF-%THEN-%ELSE statement
    Is used only in a macro program
    Executes during macro execution
    Uses only macro variables in logical expressions and cannot refer to DATA step variables
    Determines the text/SAS statements to be copied to the input stack

Standard IF-THEN-ELSE statement
    Is used only in a DATA step program
    Executes during DATA step execution
    Uses DATA step variables and macro variables in logical expressions
    Determines the DATA step statement(s) to be executed

In the example below, if the parameter is passed as PRINT, the PROC PRINT code is substituted in place of the macro invocation (%reportit); otherwise, the PROC CONTENTS code is substituted.
Example:

    %MACRO reportit(request);
        %IF &request = PRINT %THEN %DO;
            PROC PRINT DATA = EMP;
            RUN;
        %END;
        %ELSE %DO;
            PROC CONTENTS DATA = EMP;
            RUN;
        %END;
    %MEND;
    %reportit(PRINT)

Iterative processing in Macro
Macros also support iterative processing through %DO statements. These statements are similar to the standard SAS DO statements and are used to repeat a set of SAS statements a specific number of times. The %DO statement has various forms:
    %DO-%WHILE
    %DO-%UNTIL
    Iterative %DO
Example:

    %MACRO arrayme;
        %DO i = 1 %to 5;
            file&i
        %END;
    %MEND arrayme;
    DATA one;
        SET %arrayme;
    RUN;

The macro call resolves to the following at execution time:

    DATA one;
        SET file1 file2 file3 file4 file5;
    RUN;


This macro generates a list of 5 dataset names. The values 1 to 5 are substituted in the expression file&i to produce file1 file2 file3 file4 file5. The DATA step above therefore concatenates the five datasets into dataset one.
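The other two forms, %DO-%WHILE and %DO-%UNTIL, can be sketched as follows. The macro name countdemo and its parameter limit are hypothetical; the loops only write to the log so that the difference in when the condition is tested is easy to see.

```sas
%MACRO countdemo(limit);
    %LOCAL i;

    /* %DO %WHILE tests the condition before each pass. */
    %LET i = 1;
    %DO %WHILE (&i <= &limit);
        %PUT WHILE loop: i = &i;
        %LET i = %EVAL(&i + 1);
    %END;

    /* %DO %UNTIL tests the condition after each pass,
       so the body always runs at least once. */
    %LET i = 1;
    %DO %UNTIL (&i > &limit);
        %PUT UNTIL loop: i = &i;
        %LET i = %EVAL(&i + 1);
    %END;
%MEND countdemo;

%countdemo(3)
```

%EVAL is used for the increment because macro variable values are text, and integer arithmetic has to be requested explicitly.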

Built-in Macro Functions


Macro character functions have the same basic syntax as the corresponding DATA step functions and they yield similar results. Although they are similar, macro character functions are different from DATA step functions: macro functions work with macro variables, whereas DATA step functions work with DATA step variables. Let us look at some of the basic macro functions.

The %UPCASE Function


The %UPCASE function enables you to change the value of a macro variable from lowercase to uppercase. General form:

%UPCASE ( argument );
Where, argument is a character string. Example:

    %LET NAME = raju;
    %LET NAME = %UPCASE(&NAME);


Now the value of the macro variable NAME is RAJU

The %SUBSTR Function


The %SUBSTR function enables you to extract part of a character string from the value of a macro variable. General form:

%SUBSTR ( argument, position <,n> )


Where,
    argument is a character string or a text expression.
    position specifies the position of the first character in the substring.
    n specifies the number of characters in the substring.
Example:

    %let date = 05JAN2002;

    %substr(&date,3,7) returns the value JAN2002.
    %substr(&date,3,3) returns the value JAN.

The %LENGTH Function

Returns the length of the string.
Example:

    %LENGTH(&date) returns 9.

The %SYSFUNC Function


You can use the %SYSFUNC function to execute other DATA step functions as part of the macro facility.

General form:

%SYSFUNC ( function ( argument(s) ) <,format> )


Where,
    function is the name of the SAS function to execute.
    argument(s) is one or more arguments that are used by function.
    format is an optional format to apply to the result of function.
Example: Suppose the following code was submitted on Friday, June 7, 2002:

title "%sysfunc(today(),weekdate.) - SALES REPORT";


%SYSFUNC executes the DATA step function TODAY() and formats the result using the format WEEKDATE. The title on the next report would be: Friday, June 7, 2002 - SALES REPORT.
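The functions in this section can be tried together by writing their results to the log with %PUT. This is a small sketch that reuses the values from the examples above; the output of the last line depends on the day the code is run.

```sas
%LET name = raju;
%LET date = 05JAN2002;

%PUT Upper-case name: %UPCASE(&name);
%PUT Month and year: %SUBSTR(&date, 3, 7);
%PUT Month only: %SUBSTR(&date, 3, 3);
%PUT Length of date: %LENGTH(&date);

/* TODAY() is a DATA step function, so it is called through %SYSFUNC. */
%PUT Today: %SYSFUNC(TODAY(), WEEKDATE.);
```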

Try It Out

Problem Statement

Write a macro program that runs a PRINT procedure and has three parameters:
    DATA naming the data set
    OBS specifying how many records to print
    TL specifying the title line for the print
Assign default values to all the parameters. Also use the macro system options to get the information returned by the macro processor.

Code

    OPTIONS SYMBOLGEN MLOGIC MPRINT;
    %macro testprnt (data = &syslast, obs = 90, tl = 3);
        proc print data = &data (obs=&obs);
        title&tl "Contents of Dataset &data with &obs observations";
        run;
        title&tl;
    %mend testprnt;
    %testprnt(data = all, obs = 100, tl = 5);   /* macro reference */
Refer File Name: 23.1.sas to obtain soft copy of the program code


How It Works
&syslast returns the name of the most recently created dataset.
The final title&tl; statement has no text, so it clears that title line.

Summary
A macro is a group of SAS statements that is identified by a name.
Parameters are values that are passed to the macro at the time of invocation.
Macro variables come in two varieties: Local and Global.
SYMBOLGEN, MPRINT, MLOGIC and %PUT are used for debugging macro code.
You can use macros to control conditional execution of statements.
Macros also support iterative processing.
Macro character functions have the same basic syntax as the corresponding DATA step functions and they yield similar results.

Test your Understanding


1. How would you invoke a macro?
2. How do you define the end of a macro?
3. What is the difference between %PUT and SYMBOLGEN?
4. What is the difference between %LOCAL and %GLOBAL?
5. How are parameters passed to a macro?
6. How would you code a macro statement to produce information on the SAS log?
7. What does %PUT do?


Session 25: Help on SAS


Learning Objectives
After completing this session, you will be able to:
    Debug SAS programs
    Create efficient SAS code

Debugging SAS Programs


Error Handling: Errors are classified into:
    Syntax Error
    Data Error
    Logic Error

Syntax Error:
Syntax errors occur when program statements do not conform to the rules of the SAS language. Examples of syntax errors include:
    Misspelling a SAS keyword
    Uninitialized variable
    Variable not found
    Using unmatched quotation marks
    Forgetting a semicolon
    Specifying an invalid statement option
    Specifying an invalid data set option
Example: In the program below, the DATA keyword is misspelled, and SAS prints a warning message to the log.
Program:

    date temp;
        x=1;
    run;


SAS Log:

Syntax Error (misspelled keyword)

    date temp;
    WARNING 14-169: Assuming the symbol DATA was misspelled as date.
    x=1;
    run;
    NOTE: The data set WORK.TEMP has 1 observations and 1 variables.
Because SAS could interpret the misspelled word, the program runs successfully and produces the output. SAS interprets the misspelled keywords only in some cases.

Data errors:
Data errors occur when some data values are not appropriate for the SAS statements that you have specified in the program. For example, if you define a variable as numeric and assign a character value to it, SAS generates a data error. Missing values are generated when a data error occurs.
Data errors occur in the following scenarios:
    Numeric to character conversion
    Invalid data
    Character field truncated
SAS detects data errors during program execution. When a data error is encountered, SAS does the following and continues to execute the program:
    Writes an invalid data note to the SAS log
    Prints the input line and the column numbers that contain the invalid value in the SAS log, with a rule line above the input line
    Sets the automatic variable _ERROR_ to 1 for the current observation
Example Program:

    DATA EMP;
        INPUT EMPID NAME $ SALARY;
        DATALINES;
    1000 RAJU 1000
    1001 KUMAR $2,561.00
    1002 ABISHEK 4586
    ;
    RUN;

Log file:

Logic Errors
A logic error produces a wrong result without any error message.

Determining Logic Errors: Use the DEBUG option in the DATA statement to help identify logic problems. The DEBUG option is an interactive interface to the DATA step during DATA step execution. This option is useful to determine:
    Which piece of code is executing
    Which piece of code is not executing
    The current value of a particular variable
    When the value of a variable changes
General form of the DEBUG option:

DATA data-set-name / DEBUG;


Common commands used with the DEBUG option.
    Command       Abbreviation     Action
    STEP          ENTER key        Steps through a program one statement at a time.
    EXAMINE       E variable(s)    Displays the value of the variable.
    WATCH         W variable(s)    Suspends execution when the value of the variable changes.
    LIST WATCH    L W              Lists the variables that are watched.
    QUIT          Q                Halts execution of the DATA step.

The DATA step is the most problematic part of SAS debugging.
First rule of debugging:
   o Always check the SAS log
   o Always start at the beginning
For DATA step debugging you can use:
   o PUT statements
   o Automatic variables (_ALL_, _INFILE_)
   o IN= data set option

Don't limit DATA step debugging strictly to DATA step tools. Also use procedures to debug DATA steps, such as:
   o FREQ
   o MEANS
   o PRINT
   o REPORT
   o CONTENTS
   o DATASETS
If your program is well documented and neatly aligned, debugging is much easier.
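The DATA step tools listed above can be combined in one step. The sketch below is an illustration only: the data sets emp and pay and the flag names in_emp and in_pay are hypothetical, and the values are made up for the example.

```sas
/* Two small hypothetical data sets, already in EMPID order. */
DATA emp;
    INPUT empid name $;
    DATALINES;
1000 RAJU
1001 KUMAR
1002 ABISHEK
;
RUN;

DATA pay;
    INPUT empid salary;
    DATALINES;
1000 1000
1002 4586
;
RUN;

DATA emp_pay;
    /* IN= creates temporary flags that show which data set contributed. */
    MERGE emp (IN = in_emp) pay (IN = in_pay);
    BY empid;
    /* Write a message for employees without a salary record. */
    IF in_emp AND NOT in_pay THEN PUT "No salary found: " empid= name=;
    /* Dump every variable, including _N_ and _ERROR_, for the first observation. */
    IF _N_ = 1 THEN PUT _ALL_;
RUN;
```

With this data, the log message is written for EMPID 1001, which has no row in pay.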

Creating Efficient SAS Codes


What is Efficiency?
Minimizing the use of the following resources generally characterizes programming efficiency:
    CPU time (the time your computer takes to perform calculations)
    I/O time (the time your computer takes to read data into memory and write data from memory to your hard drive)
    Memory
    Data storage
    Programming time

Avoid Unnecessary Data Steps:
Example:

Inefficient

    data new;
        set old;
        where x > 10;
    run;
    proc means data=new;
        var x y z;
    run;

Efficient

    proc means data=old;
        where x > 10;
        var x y z;
    run;

Here, a new dataset is created for the sole purpose of performing a procedure on a subset of data. Instead, use a WHERE statement in the procedure to do this. WHERE statements can be used with any procedure that reads a SAS dataset.


Sub-setting data from one dataset into multiple datasets can be achieved in one data step instead of many.
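A single DATA step can name several output datasets and route each observation with OUTPUT. The sketch below assumes the SASHELP.CLASS sample data set; the output dataset names are hypothetical.

```sas
/* One pass over the input creates both subsets. */
DATA work.males work.females;
    SET sashelp.class;
    IF sex = 'M' THEN OUTPUT work.males;
    ELSE IF sex = 'F' THEN OUTPUT work.females;
RUN;
```

The input is read once, whereas two separate DATA steps would read it twice.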

The DATASETS procedure can perform many housekeeping operations, including copying, deleting, and renaming datasets, renaming variables, adding labels or changing formats. It does these operations much more efficiently than DATA step programming because it modifies only the descriptor portion of the dataset, whereas the DATA step reads all the data from the dataset.

Store Data in SAS Datasets: Instead of storing data in a raw data file and reading it again and again, store the data in a permanent SAS dataset for later use. SAS reads a dataset faster than an external file.

Keeping only the required variables: When inputting a flat file, input only the variables needed. When inputting a SAS dataset, use a KEEP statement to keep only the variables needed. (Note: DROP will work, but KEEP provides good documentation.) DROP intermediate variables used for calculations. Example:

    DATA X;
        DO I = 1 to 3;
            DO J = 1 to 5;
                TEMPVAR = I;
                NEWVAL = TEMPVAR * J;
            END;
        END;
        DROP I J TEMPVAR;
    RUN;


When outputting a dataset, KEEP only the variables needed. Example:

DATA X(KEEP=A1 A2 A3 NEWVAR1 NEWVAR2);


Specify minimum length for variables: While creating a dataset, define the smallest possible length for variables. This can be done by using the LENGTH, INFORMAT and ATTRIB statements. This removes unwanted trailing blanks from the stored values and thus reduces disk space usage. Place the LENGTH statement before the SET statement; a length declared after SET does not change character variables that already exist in the input dataset. Numeric lengths are given in whole bytes. Example:

    Data work.dsn1;
        length var1 $4 var2 $5 var3 6;
        set work.dsn2;
    run;

Use WHERE statement for conditional processing: Use the WHERE statement instead of the sub-setting IF statement to filter data, if the dataset is large. The WHERE statement filters the data before it gets loaded into the PDV, whereas the IF statement filters the data only after it is loaded into the PDV.

Inefficient Method

    Data work.dsn1;
        set work.dsn2;
        if Product = 'Sofa';
    run;

Efficient Method

    Data work.dsn1;
        set work.dsn2;
        where Product = 'Sofa';
    run;

Use IF-THEN/ELSE instead of multiple IF statements: Use the IF-THEN/ELSE statement instead of a series of IF-THEN statements. The IF-THEN/ELSE statement skips the remaining conditions once a condition is met, whereas separate IF-THEN statements check all the conditions for every observation.


Inefficient Method

    Data work.dsn1;
        set work.dsn2;
        if Product = 'Sofa' then Discount = 0.08;
        if Product = 'Bed' then Discount = 0.10;
        if Product = 'Chair' then Discount = 0.12;
    run;

Efficient Method

    Data work.dsn1;
        set work.dsn2;
        if Product = 'Sofa' then Discount = 0.08;
        else if Product = 'Bed' then Discount = 0.10;
        else if Product = 'Chair' then Discount = 0.12;
    run;

IF/THEN/ELSE
When using a series of IF ... THEN ... ELSE ... statements, list the conditions in descending order of probability. This will save CPU time. Example:

    IF YEAR LT THISYR THEN OUTPUT OUTOLD;
    ELSE IF YEAR EQ THISYR THEN OUTPUT OUTCUR;
    ELSE OUTPUT OUTBAD;

SORT
Sort only the variables needed. It is faster. Example:

PROC SORT DATA=X (KEEP=A B C);


When sorting a permanent dataset or a large file, sort it into another dataset. Sorting into a permanent dataset takes more I/O. Sorting a large file requires more space. Example:

PROC SORT DATA=PERMLIB.X (KEEP= A B) OUT=XSORT;
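A complete step that applies both sorting tips might look like the sketch below. It assumes the SASHELP.CLASS sample data set; the output name work.class_sorted and the choice of BY variables are hypothetical.

```sas
/* Read only three variables and write the sorted copy to WORK. */
PROC SORT DATA = sashelp.class (KEEP = name age height)
          OUT  = work.class_sorted;
    BY age name;
RUN;
```

The OUT= option matters here: without it PROC SORT replaces the input dataset, so a KEEP= on the input would also discard the other variables from the original.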


Use the right operator to select records: Use the IN operator rather than the OR operator to select a list of values.

Inefficient Method

    Data work.dsn1;
        set work.dsn2;
        if Product = 'Sofa' or Product = 'Bed' or Product = 'Chair' then Type = 'Furniture';
    run;

Efficient Method

    Data work.dsn1;
        set work.dsn2;
        if Product in ('Sofa', 'Bed', 'Chair') then Type = 'Furniture';
    run;

Place the selection criteria in the right position: Apply the selection criteria first, so that unwanted observations are discarded before the rest of the statements are processed.

Inefficient Method

    Data work.dsn1;
        Set work.dsn2;
        Discount = (Price * 0.04);
        Profit = (Price * 0.10);
        if Product = 'Computer';
    run;

Efficient Method

    Data work.dsn1;
        Set work.dsn2;
        if Product = 'Computer';
        Discount = (Price * 0.04);
        Profit = (Price * 0.10);
    run;

Use a subset of data for testing codes: To test a piece of SAS code on a large dataset, use a part of the dataset, selected with the OBS= or OUTOBS= options, rather than the whole dataset. Note that OUTOBS= limits the number of rows written by PROC SQL, not the number read; to limit the rows read, use INOBS=.

    Data work.dsn1;
        set work.dsn2 (obs=1000);
        A = mean(salary);
    run;

    Proc sql outobs=1000;
        create table work.dsn1 as
        select mean(salary) as A
        from work.dsn2;
    Quit;

Compressing large Datasets: Use the COMPRESS= option while creating large datasets to store them in compressed format. To apply compression throughout a program, place an OPTIONS COMPRESS=YES; statement at the beginning of the SAS code.

Index the variables used for conditional processing: Create an index on key columns and on columns used for conditional processing, i.e., columns used in WHERE expressions (a sub-setting IF statement cannot use an index). Searching is faster if the column is indexed.

Delete Unneeded Datasets: At the end of the program or at strategic points, it is a good practice to use PROC DATASETS to delete unneeded datasets from the work or permanent library. This makes room for new datasets. It not only improves performance but, more importantly, shows the intention to the reader.

    PROC DATASETS LIBRARY = WORK;
        DELETE TEMP EMP;
    QUIT;
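The other housekeeping operations mentioned earlier in this session, such as renaming variables, adding labels, changing formats and renaming datasets, follow the same pattern. The sketch below assumes the SASHELP.CLASS sample data set; the working copy work.class and the new names are hypothetical.

```sas
/* Hypothetical working copy so the sample library is left untouched. */
DATA work.class;
    SET sashelp.class;
RUN;

PROC DATASETS LIBRARY = work NOLIST;
    /* MODIFY changes descriptor information without rewriting the data. */
    MODIFY class;
        RENAME height = height_in;
        LABEL  weight = "Weight in pounds";
        FORMAT weight 6.1;
    RUN;
    /* CHANGE renames the dataset itself. */
    CHANGE class = class_clean;
    RUN;
QUIT;
```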

Consolidate program steps using PROC SQL: Consolidate programming steps using the SQL procedure in order to save processing time and resources.

Inefficient

    Data work.dsn1;
        Set work.dsn2;
    Run;
    Proc sort data=work.dsn1;
        By products;
    Run;

Efficient

    Proc sql;
        Create table work.dsn1 as
        Select * from work.dsn2
        Order by products;
    Quit;

Summary
Errors are classified into:
   o Syntax Error
   o Data Error
   o Logic Error
First rule of debugging:
   o Always check the SAS log
   o Always start at the beginning
Minimizing the use of the following resources generally characterizes programming efficiency:
   o CPU time
   o I/O time
   o Memory
   o Data storage
   o Programming time

Test your Understanding


1. How do you debug and test your SAS programs?
2. What can you learn from the SAS log when debugging?
3. What system options would you use to help debug a macro?


