
Introducing

SAS software

Acknowledgements to
David Williams
Caroline Brophy


Statistics
in
Science
Need to know

SAS environment

SAS files (datasets, catalogs etc) & libraries

SAS programs

How to:

Get data in

Manipulate data

Get results out


SAS software environment


SAS Windows (SAS 9)


Some (!) SAS windows
Editor
Where code is written or imported, and submitted
Log
What happened, including what went wrong
Output
Results of program procedures that produce output
Explorer
Shows libraries (SAS & Windows), their files, and where you can see
data, graphs
Results
Shows how the output is made up of tables, graphs, datasets etc
Notepad
A useful place to keep bits of code


SAS software programs


SAS Programs
/* DATA step: creates a SAS data set */
data one;
input x y;
datalines;
-3.2 0.0024
-3.1 0.0033
. . .
;
run;

/* PROC steps: process data in a data set */
proc print data = one (obs = 5);
run;

proc means data = one;
run;


Step Boundaries
SAS steps begin with either
a DATA statement, or
a PROC statement.

SAS detects the end of a step when it encounters
a RUN statement (for most steps)
a QUIT statement (for some procedures)
the beginning of another step (DATA statement or PROC statement).

Recommendation: use RUN; at the end of each step
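
A minimal sketch of the QUIT boundary, assuming the sample data set sashelp.class that ships with SAS (it is not part of these slides). PROC SQL runs each statement as soon as it is submitted and stays active until QUIT, so here it is QUIT rather than RUN that ends the step.

```sas
/* PROC MEANS: a typical step that ends at RUN */
proc means data = sashelp.class;
  var height;
run;

/* PROC SQL: stays active until QUIT */
proc sql;
  select name, height
    from sashelp.class
    where height > 65;
quit;
```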

Step Boundaries
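data seedwt;
input oz $ rad wt;
datalines;
Low 118.4 0.7
High 109.1 1.3
Low 215.2 2.9
;
run;   /* ends the DATA step */

proc print data = seedwt;   /* no RUN: this step ends where the next PROC begins */

proc means data = seedwt;
class oz;
var rad wt;
run;   /* ends the MEANS step */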


Submitting a SAS Program

When you execute a SAS program, the output generated by SAS is divided into two major parts:

SAS log: contains information about the processing of the SAS program, including any warning and error messages.

SAS output: contains reports generated by SAS procedures and DATA steps.


Recommended steps!
1) Submit all (or selected) code by
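pressing F8 (the default Submit key; by default F4 recalls the last submitted code), or
clicking the running-figure (Submit) icon in the toolbar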
2) Read log
3) Look in output window
if you expect code to produce output
4) Problems
Bad syntax
Missing ; at end of line
Missing quote at end of title (nasty!)


Improved output - HTML

Tools > Options > Preferences > Results

Do this & resubmit code


Check HTML output in Results Window
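
The same HTML output can also be requested in code rather than through the menus. This is a sketch only: the file name means_report.html and the sample data set sashelp.class are assumptions, not part of the original slides.

```sas
/* Route procedure output to an HTML file (hypothetical file name) */
ods html file = "means_report.html";

proc means data = sashelp.class;
  var height weight;
run;

/* Close the HTML destination so the file is written out completely */
ods html close;
```

Keeping these statements in the program means the output format travels with the code instead of depending on one machine's preferences.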


SAS data sets


SAS data sets

SAS procedures (PROC) process data from SAS data sets

Need to know (briefly!)

What a SAS data set looks like

How to get our data into a SAS data set


SAS data sets

live in libraries

have a descriptor part (with useful info)

have a data part, which is a rectangular table of character and/or numeric data values (rows are called observations)

have names with the syntax
<libname.>datasetname
libname defaults to work if omitted


work library

SAS data sets with a single-part name like oz, wp or mybestdata99

1) are stored in the work library

2) can be referenced e.g. as mybestdata99 or work.mybestdata99

3) are deleted at the end of the SAS session!
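
A short sketch of one-level and two-level names, and of one way to keep a copy beyond the session. The library name mylib and the folder C:\mydata are hypothetical; the folder must already exist for the LIBNAME statement to succeed.

```sas
/* Hypothetical folder: change to one that exists on your machine */
libname mylib "C:\mydata";

data mybestdata99;                     /* one-level name: stored as work.mybestdata99 */
  x = 1;
run;

proc print data = work.mybestdata99;   /* same data set, two-level name */
run;

data mylib.mybestdata99;               /* permanent copy: survives the end of the session */
  set work.mybestdata99;
run;
```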


Don't lose your data!

Keep the SAS program that read the data from its
original source

. . . More later!


Viewing descriptor & data

/* view descriptor part */
proc contents data = wp;
run;

/* view data part */
proc print data = work.wp;
run;

Alternatively:
Use SAS Explorer: Open (for data), Properties (for descriptor)
Properties is not as clear as CONTENTS


SAS variables
There are two types of variables:

character: can contain any value: letters, numbers, special characters, and blanks.
Character values are stored with a length of 1 to 32,767 bytes (default is 8).
One byte equals one character.

numeric: stored as floating point numbers in 8 bytes of storage by default.
Eight bytes of floating point storage provide space for 16 or 17 significant digits.
You are not restricted to 8 digits.
Don't change the 8 byte length!
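
The default character length of 8 matters when reading longer text values. The sketch below uses made-up variable names (short_name, long_name) and a made-up data line to compare the default with a length declared in a LENGTH statement; with simple list input the undeclared variable should keep only the first 8 characters.

```sas
data length_demo;
  length long_name $ 20;              /* declared length: up to 20 characters */
  input short_name $ long_name $;     /* short_name gets the default length of 8 */
  datalines;
Intermediate Intermediate
;
run;

/* Compare the two columns, then check Len in the CONTENTS listing */
proc print data = length_demo;
run;

proc contents data = length_demo;
run;
```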


SAS variables

OUTPUT

The CONTENTS Procedure

Alphabetic List of Variables and Attributes

#    Variable    Type    Len
1    oz          Char      8
2    rad         Num       8
3    wt          Num       8


SAS names
for data sets & variables
can be up to 32 characters long.

can be uppercase, lowercase, or mixed-case, but are not case sensitive!

must start with a letter or underscore. Subsequent characters can be letters, underscores, or numeric digits
- no %$!*&#@ or spaces.


Missing Data Values
A value must exist for every variable for each observation.
Missing values are valid values.
LastName    FirstName    JobTitle    Salary
TORRES      JAN          Pilot        50000
LANGKAMM    SARAH        Mechanic     80000
SMITH       MICHAEL      Mechanic         .
WAGSCHAL    NADJA        Pilot        77500
TOERMOEN    JOCHEN                    65000

A character missing value is displayed as a blank.
A numeric missing value is displayed as a period.
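
A sketch of how such missing values might be entered with simple list input, using three of the rows above. The data set name staff is an assumption. A lone period in the data lines stands for a missing value for both numeric and character variables.

```sas
data staff;
  input LastName $ FirstName $ JobTitle $ Salary;
  datalines;
TORRES JAN Pilot 50000
SMITH MICHAEL Mechanic .
TOERMOEN JOCHEN . 65000
;
run;

/* Missing JobTitle prints as a blank, missing Salary as a period */
proc print data = staff;
run;

/* N counts non-missing salaries, NMISS counts missing ones */
proc means data = staff n nmiss;
  var Salary;
run;
```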

SAS syntax
Not case sensitive
Each statement usually begins with a keyword
and ends with ;
Common Errors:
Forgetting the ;
Misspelt or wrong keyword
Missing final quote in title

title "Woodpecker Habitat; /* quote mark missing */

title "Woodpecker Habitat";


Comments

1. Type /* to begin a comment.
2. Type your comment text.
3. Type */ to end the comment.
To comment out selected text, remember: Ctrl+/
Alternative:
* comment ;
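
Both comment styles in a small program, as a sketch; sashelp.class is a sample data set shipped with SAS and is used here only for illustration.

```sas
/* Block comment: can span
   several lines */

* Statement comment: everything up to the semicolon is ignored;

proc print data = sashelp.class (obs = 3);   /* a block comment can also follow code on the same line */
run;
```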


SAS

Creating a SAS data set


Getting data in!

Consider 2 methods

1) Data in program (briefly!)

2) Data in Excel workbook


Getting data in!
Data in program file:

data oz;
input oz $ rad wt;
datalines;
Low 118.4 0.7
High 109.1 1.3
Low 215.2 2.9
. . .
;
run;

Note:
1. oz is a text variable so requires $
2. No missing values
3. Values of oz
don't contain spaces
are at most 8 characters long


Getting data in!
from Excel

Use the IMPORT wizard, saving the program it generates to reduce future clicking!
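
The program saved by the wizard is typically a PROC IMPORT step along the lines of the sketch below. The workbook path, sheet name and output data set name are all hypothetical, and DBMS = XLSX assumes that SAS/ACCESS Interface to PC Files is licensed on the machine.

```sas
proc import datafile = "C:\mydata\seedwt.xlsx"   /* hypothetical workbook */
    out = work.seedwt                            /* SAS data set to create */
    dbms = xlsx
    replace;                                     /* overwrite work.seedwt if it exists */
  sheet = "Sheet1";                              /* hypothetical sheet name */
  getnames = yes;                                /* first row holds the variable names */
run;

/* Check what was read */
proc contents data = work.seedwt;
run;
```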


Creating new variables
Adding a new variable to an existing SAS data
set (say work.old)
1. Use set
2. Give definition of new variable

data new;
/* read data from work.old */
set old;
ysquared = y**2;
logy = log(y);            /* natural log; missing if y <= 0 */
logy_base10 = log10(y);
t1 = (treat = "B");       /* 1 if treat is B, otherwise 0 */
run;


Data set: work.new

Obs   treat       y    ysquared       logy   logy_base10   t1
  1   A        10.0      100.00    2.30259             1    0
  2   A       100.0    10000.00    4.60517             2    0
  3   B       -10.0      100.00          .             .    1
  4   B         0.0        0.00          .             .    1
  5   B         0.1        0.01   -2.30259            -1    1
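
The missing values in rows 3 and 4 arise because the log of a non-positive number is undefined; SAS sets the result to missing and writes a note to the log. The sketch below rebuilds the five observations from the table and guards the calculation explicitly. The guard is a suggested variation, not the original program.

```sas
data old;
  input treat $ y;
  datalines;
A 10
A 100
B -10
B 0
B 0.1
;
run;

data new;
  set old;
  if y > 0 then logy = log(y);   /* logy is left missing when y <= 0 */
  t1 = (treat = "B");            /* comparison gives 1 (true) or 0 (false) */
run;

proc print data = new;
run;
```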


Data Screening


Data Screening
checking input data for gross errors
Use PRINT procedure to scan for obvious anomalies
Use MEANS procedure & examine summary table
MAXIMUM, MINIMUM - reasonable?
MEAN - near middle of range?
MISSING VALUES - input or calculation error, e.g. log(0)?
CV (= 100*std.dev/mean) - < 10% for plant growth, between 12 & 30% for animal production variables, > 50% implies skewness for any positive variable
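
A screening run covering the checks above might look like the following sketch. It uses the sample data set sashelp.class as a stand-in for your own data; the statistic keywords on the PROC MEANS statement select which columns appear in the summary table.

```sas
/* Scan the first few observations for obvious anomalies */
proc print data = sashelp.class (obs = 10);
run;

/* Summary table: counts, missing values, range, mean, std. dev. and CV (%) */
proc means data = sashelp.class n nmiss min max mean std cv maxdec = 2;
  var height weight;
run;
```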


SAS syntax

MEANS syntax

What else should go here?


Dealing with data errors
Check original records
Change mistakes in recording where the correct value is beyond question
Regenerate observations where possible, e.g. reweigh sample, redo chemical analysis
With a large body of data in an unbalanced design, err on the side of omitting questionable data

Do not proceed until data has been properly cleaned; if necessary, perform a number of screening runs

