Sort and Accumulating Totals

Last Updated: 29 June 2004

Center of Excellence
Objectives
• Understand how the SAS System initializes the value of a variable in the program data vector (PDV).
• Prevent reinitialization of a variable in the PDV.
• Create an accumulating variable.
Creating an Accumulating Variable
The SAS data set prog2.daysales contains daily sales data for a retail store. There is one observation for each day in April showing the date (SaleDate) and the total receipts for that day (SaleAmt).

SaleDate     SaleAmt
01APR2001     498.49
02APR2001     946.50
03APR2001     994.97
04APR2001     564.59
05APR2001     783.01
06APR2001     228.82
07APR2001     930.57
08APR2001     211.47
09APR2001     156.23
10APR2001     117.69
11APR2001     374.73
12APR2001     252.73
Creating an Accumulating Variable
The store manager also wants to see a running total of sales for the month as of each day.

Partial Output

SaleDate     SaleAmt    Mth2Dte
01APR2001     498.49     498.49
02APR2001     946.50    1444.99
03APR2001     994.97    2439.96
04APR2001     564.59    3004.55
05APR2001     783.01    3787.56
Creating Mth2Dte
By default, variables created with an assignment statement are initialized to missing at the top of the DATA step.

Mth2Dte=Mth2Dte+SaleAmt;

With this statement alone, Mth2Dte is missing again at the start of every iteration, so the sum never accumulates. An accumulating variable must retain its value from one observation to the next.
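A minimal sketch of the problem, assuming a small stand-in data set (work.daysales and work.noretain are hypothetical names, with three rows copied from the daily sales table). Without RETAIN, Mth2Dte is missing at the top of every iteration, so every sum is missing.

```sas
/* Hypothetical stand-in for prog2.daysales */
data work.daysales;
   input SaleDate :date9. SaleAmt;
   datalines;
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
;
run;

data work.noretain;
   set work.daysales;
   /* Mth2Dte is reset to missing at the top of each iteration, */
   /* and missing + number is missing, so nothing accumulates.  */
   Mth2Dte = Mth2Dte + SaleAmt;
run;

proc print data=work.noretain noobs;
   format SaleDate date9.;
run;
```

Mth2Dte should print as missing (.) on every row, and the log should note that missing values were generated by an operation on missing values.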
The RETAIN Statement
General form of the RETAIN statement:

RETAIN variable-name <initial-value> …;

The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step. Previous values of retained variables are available for processing across iterations of the DATA step.
The RETAIN Statement
The RETAIN statement
• retains the value of the variable in the PDV across iterations of the DATA step
• initializes the retained variable to missing before the first execution of the DATA step if an initial value is not specified
• is a compile-time-only statement.
Retain Mth2Dte and Set an Initial Value
If you do not supply an initial value, all the values of Mth2Dte will be missing, because adding a number to a missing value with the + operator yields a missing value.

retain Mth2Dte 0;
Creating an Accumulating Variable

data mnthtot;
   set prog2.daysales;
   retain Mth2Dte 0;
   Mth2Dte=Mth2Dte+SaleAmt;
run;
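To try this step without access to the prog2 library, the following sketch builds a small stand-in from the first five rows of the daily sales table. The names work.daysales and work.mnthtot are assumptions for illustration only.

```sas
/* Hypothetical stand-in for prog2.daysales (first five rows from the slides) */
data work.daysales;
   input SaleDate :date9. SaleAmt;
   datalines;
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
04APR2001 564.59
05APR2001 783.01
;
run;

data work.mnthtot;
   set work.daysales;
   retain Mth2Dte 0;               /* keep the value across iterations, start at 0 */
   Mth2Dte = Mth2Dte + SaleAmt;    /* running total */
run;

proc print data=work.mnthtot noobs;
   format SaleDate date9.;
run;
```

The printed Mth2Dte column should match the running totals in the partial output shown for the first five days of April.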
Compile
At compile time, SAS creates the PDV with the variables SaleDate, SaleAmt, and Mth2Dte. The input data set is shown with unformatted SAS date values.

Input data set prog2.daysales:

SaleDate   SaleAmt
15066       498.49
15067       946.50
15068       994.97
15069       564.59
15070       783.01

PDV:

SALEDATE   SALEAMT   MTH2DTE
Execute
Before the first iteration, SaleDate and SaleAmt are missing and the retained variable Mth2Dte (marked R) is initialized to 0.

SALEDATE   SALEAMT   MTH2DTE (R)
.          .         0

The SET statement reads the first observation into the PDV.

SALEDATE   SALEAMT   MTH2DTE (R)
15066      498.49    0

The assignment statement computes 0+498.49.

SALEDATE   SALEAMT   MTH2DTE (R)
15066      498.49    498.49

Implicit output: SAS writes the observation to mnthtot.
Implicit return: SAS returns to the top of the DATA step.
At the top of the second iteration, Mth2Dte is not reinitialized; the PDV still holds the values from the first iteration.

SALEDATE   SALEAMT   MTH2DTE (R)
15066      498.49    498.49

The SET statement reads the second observation into the PDV.

SALEDATE   SALEAMT   MTH2DTE (R)
15067      946.50    498.49

The assignment statement computes 498.49+946.50.

SALEDATE   SALEAMT   MTH2DTE (R)
15067      946.50    1444.99

Implicit output: SAS writes the observation to mnthtot.
Implicit return: SAS returns to the top of the DATA step.
At the top of the third iteration, the retained value of Mth2Dte is still available.

SALEDATE   SALEAMT   MTH2DTE (R)
15067      946.50    1444.99

The SET statement reads the third observation into the PDV.

SALEDATE   SALEAMT   MTH2DTE (R)
15068      994.97    1444.99

The assignment statement computes 1444.99+994.97.

SALEDATE   SALEAMT   MTH2DTE (R)
15068      994.97    2439.96

Implicit output: SAS writes the observation to mnthtot.
Implicit return: SAS returns to the top of the DATA step.

Continue processing until the end of the SAS data set.


Creating an Accumulating Variable

proc print data=mnthtot noobs;
   format SaleDate date9.;
run;

Partial PROC PRINT Output

SaleDate     SaleAmt    Mth2Dte
01APR2001     498.49     498.49
02APR2001     946.50    1444.99
03APR2001     994.97    2439.96
04APR2001     564.59    3004.55
05APR2001     783.01    3787.56
Accumulating Totals: Missing Values
What happens if there are missing values for SaleAmt?

data mnthtot;
   set prog2.daysales;
   retain Mth2Dte 0;
   Mth2Dte=Mth2Dte+SaleAmt;
run;
Undesirable Output

SaleDate     SaleAmt    Mth2Dte
01APR2001     498.49     498.49
02APR2001          .          .     <- missing value
03APR2001     994.97          .
04APR2001     564.59          .
05APR2001     783.01          .

After the missing value of SaleAmt, all subsequent values of Mth2Dte are missing.
The Sum Statement
When creating an accumulating variable, an alternative to the RETAIN statement is the sum statement.

General form of the sum statement:

variable + expression;
The Sum Statement
The sum statement
• creates the variable on the left side of the plus sign if it does not already exist
• initializes the variable to zero before the first iteration of the DATA step
• automatically retains the variable
• adds the value of the expression to the variable at execution
• ignores missing values.
Accumulating Totals: Missing Values

data mnthtot2;
   set prog2.daysales2;
   Mth2Dte+SaleAmt;
run;
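As a rough sketch of what the sum statement does behind the scenes, the step below computes the same running total two ways: with RETAIN plus the SUM function (which also ignores missing arguments) and with a sum statement. The data set names and the variables ViaRetain and ViaSum are hypothetical; the rows stand in for prog2.daysales2.

```sas
/* Hypothetical stand-in for prog2.daysales2, with one missing SaleAmt */
data work.daysales2;
   input SaleDate :date9. SaleAmt;
   datalines;
01APR2001 498.49
02APR2001 .
03APR2001 994.97
04APR2001 564.59
05APR2001 783.01
;
run;

data work.compare;
   set work.daysales2;
   retain ViaRetain 0;                     /* explicit retain and initial value */
   ViaRetain = sum(ViaRetain, SaleAmt);    /* SUM function skips missing values */
   ViaSum + SaleAmt;                       /* sum statement: retain, init to 0, skip missing */
run;

proc print data=work.compare noobs;
   format SaleDate date9.;
run;
```

Both columns should carry the same running total, with the value on 02APR2001 unchanged from the previous day.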
Accumulating Totals: Missing Values

proc print data=mnthtot2 noobs;
   format SaleDate date9.;
run;

Partial PROC PRINT Output

SaleDate     SaleAmt    Mth2Dte
01APR2001     498.49     498.49
02APR2001          .     498.49
03APR2001     994.97    1493.46
04APR2001     564.59    2058.05
05APR2001     783.01    2841.06
c03s1d1.sas
Objectives
• Define First. and Last. processing.
• Calculate an accumulating total for groups of data.
• Use a subsetting IF statement to output selected observations.
Accumulating Totals for Groups
The SAS data set prog2.empsals contains each employee’s identification number (EmpID), salary (Salary), and division (Div). There is one observation for each employee.

EmpID    Salary   Div
E00004    42000   HUMRES
E00009    34000   FINACE
E00011    27000   FLTOPS
E00036    20000   FINACE
E00037    19000   FINACE
E00048    19000   FLTOPS
E00077    27000   APTOPS
E00097    20000   APTOPS
E00107    31000   FINACE
E00123    20000   APTOPS
E00155    27000   APTOPS
E00171    44000   SALES
Desired Output
Human resources wants a new data set
that shows total salary paid for each
division.

Div DivSal

APTOPS 410000
FINACE 163000
FLTOPS 318000
HUMRES 181000
SALES 373000
Grouping the Data
You must group the data in the SAS data set before you can perform processing.
Review of the SORT Procedure
You can rearrange the observations into groups using the SORT procedure.

General form of a PROC SORT step:

PROC SORT DATA=input-SAS-data-set
          <OUT=output-SAS-data-set>;
   BY <DESCENDING> BY-variable ...;
RUN;
The SORT Procedure
The SORT procedure
• rearranges the observations in a SAS data set
• can sort on multiple variables
• can create a SAS data set that is a sorted copy of the input SAS data set (OUT= option)
• replaces the input data set by default (when OUT= is omitted).
Sorting by Div

proc sort data=prog2.empsals out=salsort;
   by Div;
run;
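The general form also allows several BY variables and the DESCENDING keyword. The sketch below, using a hypothetical stand-in for prog2.empsals and a hypothetical output name work.salsort2, sorts by division and then by salary from highest to lowest within each division.

```sas
/* Hypothetical stand-in for prog2.empsals (rows taken from the slides) */
data work.empsals;
   input EmpID $ Salary Div $;
   datalines;
E00004 42000 HUMRES
E00009 34000 FINACE
E00011 27000 FLTOPS
E00036 20000 FINACE
E00048 19000 FLTOPS
;
run;

/* DESCENDING applies only to the variable that follows it (Salary) */
proc sort data=work.empsals out=work.salsort2;
   by Div descending Salary;
run;

proc print data=work.salsort2 noobs;
run;
```

Because OUT= is specified, work.empsals itself is left in its original order.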
Processing Data in Groups

Div       Salary   DivSal
APTOPS     20000
APTOPS    100000
APTOPS     50000   170000
FINACE     25000
FINACE     20000
FINACE     23000
FINACE     27000    95000
SALES      10000
SALES      12000    22000
BY-Group Processing

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   BY BY-variable …;
   <additional SAS statements>
RUN;
BY-Group Processing

data divsals(keep=Div DivSal);
   set salsort;
   by Div;
   /* additional SAS statements */
run;
BY-Group Processing
A BY statement in a DATA step creates two temporary variables for each variable listed in the BY statement.

General form of the names of these temporary variables in a DATA step:

First.BY-variable
Last.BY-variable
First. and Last. Values
• The First. variable has a value of 1 for the first observation in a BY group; otherwise, it equals 0.
• The Last. variable has a value of 1 for the last observation in a BY group; otherwise, it equals 0.

Use these temporary variables to conditionally process sorted, grouped, or indexed data.
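Because First. and Last. variables are temporary, they are not written to the output data set. One way to inspect them, sketched here with hypothetical names (work.salsort, work.flags, FirstDiv, LastDiv) and a small data set already in Div order, is to copy them into ordinary variables.

```sas
/* Hypothetical stand-in for a data set sorted by Div */
data work.salsort;
   input Div $ Salary;
   datalines;
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINACE 25000
FINACE 20000
FINACE 23000
FINACE 27000
SALES 10000
SALES 12000
;
run;

data work.flags;
   set work.salsort;
   by Div;
   FirstDiv = First.Div;   /* 1 on the first row of each Div group */
   LastDiv  = Last.Div;    /* 1 on the last row of each Div group  */
run;

proc print data=work.flags noobs;
run;
```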
First. / Last. Example
SAS looks ahead to the next observation to determine the value of Last.Div.

Div       Salary   First.Div   Last.Div
APTOPS     20000       1           0
APTOPS    100000       0           0
APTOPS     50000       0           1
FINACE     25000       1           0
FINACE     20000       0           0
FINACE     23000       0           0
FINACE     27000       0           1
SALES      10000       1           0
SALES      12000       0           1
What Must Happen When?
There is a three-step process for accumulating totals:
1. Set the accumulating variable to 0 at the start of each BY group.
2. Increment the accumulating variable with a sum statement (automatically retains).
3. Output only the last observation of each BY group.
Accumulating Totals for Groups
1. Set the accumulating variable to 0 at the start of each BY group.

data divsals(keep=Div DivSal);
   set salsort;
   by Div;
   if First.Div then DivSal=0;
   /* additional SAS statements */
run;
Accumulating Totals for Groups
2. Increment the accumulating variable with a sum statement (automatically retains).

data divsals(keep=Div DivSal);
   set salsort;
   by Div;
   if First.Div then DivSal=0;
   DivSal+Salary;
   /* additional SAS statements */
run;
First. / Last. Example

Div       Salary   DivSal
APTOPS     20000    20000
APTOPS    100000   120000
APTOPS     50000   170000
FINACE     25000    25000
FINACE     20000    45000
FINACE     23000    68000
FINACE     27000    95000
SALES      10000    10000
SALES      12000    22000
Subsetting IF Statement
The subsetting IF defines a condition that the observation must meet to be further processed by the DATA step.

General form of the subsetting IF statement:

IF expression;

• If the expression is true, the DATA step continues processing the current observation.
• If the expression is false, SAS returns to the top of the DATA step.
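A small sketch of this behavior, assuming hypothetical data set names and a hypothetical Bonus calculation: statements after the subsetting IF run only for observations that pass the condition, and observations that fail are not written out.

```sas
/* Hypothetical sample data */
data work.sample;
   input Div $ Salary;
   datalines;
APTOPS 20000
FINACE 25000
SALES 12000
;
run;

data work.highpaid;
   set work.sample;
   if Salary > 20000;        /* false: return to the top, no output for this row */
   Bonus = Salary * 0.10;    /* hypothetical; reached only when the condition is true */
run;

proc print data=work.highpaid noobs;
run;
```

Only the FINACE row satisfies the condition here. The statement `if Salary > 20000;` has the same effect as `if not (Salary > 20000) then delete;`.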
Accumulating Totals for Groups
3. Output only the last observation of each BY group.

data divsals(keep=Div DivSal);
   set salsort;
   by Div;
   if First.Div then DivSal=0;
   DivSal+Salary;
   if Last.Div;
run;
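The three steps can be tried end to end on the nine-row grouped example from the slides. This is only a sketch: work.salsort and work.divsals are hypothetical stand-ins, and the rows are entered already in Div order so no PROC SORT is needed.

```sas
/* Hypothetical stand-in for salsort, already grouped by Div */
data work.salsort;
   input Div $ Salary;
   datalines;
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINACE 25000
FINACE 20000
FINACE 23000
FINACE 27000
SALES 10000
SALES 12000
;
run;

data work.divsals(keep=Div DivSal);
   set work.salsort;
   by Div;
   if First.Div then DivSal=0;   /* step 1: reset at the start of each group */
   DivSal+Salary;                /* step 2: accumulate with a sum statement  */
   if Last.Div;                  /* step 3: keep only the last row per group */
run;

proc print data=work.divsals noobs;
run;
```

For these rows the result should be one observation per division: APTOPS 170000, FINACE 95000, and SALES 22000.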
Subsetting IF Statement (Review)
1. Initialize PDV.
2. Execute program statements.
3. IF condition; is the condition true?
   NO: return to the top of the DATA step (step 1) without writing the observation.
   YES: continue with step 4.
4. Execute additional program statements.
5. Output observation to SAS data set.
Accumulating Totals for Groups

Partial Log
NOTE: There were 39 observations read
from the data set WORK.SALSORT.
NOTE: The data set WORK.DIVSALS has 5
observations and 2 variables.
NOTE: DATA statement used:
real time 0.74 seconds
cpu time 0.33 seconds
Accumulating Totals for Groups

proc print data=divsals noobs;
run;

PROC PRINT Output

Div       DivSal
APTOPS    410000
FINACE    163000
FLTOPS    318000
HUMRES    181000
SALES     373000

c03s2d1.sas
Input Data
The SAS data set prog2.regsals contains each employee’s ID number (EmpID), salary (Salary), region (Region), and division (Div). There is one observation for each employee.

EmpID    Salary   Region   Div
E00004    42000   E        HUMRES
E00009    34000   W        FINACE
E00011    27000   W        FLTOPS
E00036    20000   W        FINACE
E00037    19000   E        FINACE
E00077    27000   C        APTOPS
E00097    20000   E        APTOPS
E00107    31000   E        FINACE
E00123    20000   NC       APTOPS
E00155    27000   W        APTOPS
E00171    44000   W        SALES
E00188    37000   W        HUMRES
E00196    43000   C        APTOPS
E00210    31000   E        APTOPS
E00222   250000   NC       SALES
E00236    41000   W        APTOPS
Desired Output
Human resources wants a new data set
that shows the total salary paid and the
total number of employees for each
division in each region.
Partial Output

Region   Div       DivSal   NumEmps
C        APTOPS     70000      2
E        APTOPS     83000      3
E        FINACE    109000      4
E        FLTOPS    122000      3
E        HUMRES    178000      5
NC       APTOPS     37000      2
NC       FLTOPS     28000      1
Sorting by Region and Div
The data must be sorted by Region and Div. Region is the primary sort variable. Div is the secondary sort variable.

proc sort data=prog2.regsals out=regsort;
   by Region Div;
run;
Sorting by Region and Div

proc print data=regsort noobs;
run;

Partial PROC PRINT Output

Region   Div      Salary
C        APTOPS    27000
C        APTOPS    43000
E        APTOPS    20000
E        APTOPS    31000
E        APTOPS    32000
E        FINACE    19000
E        FINACE    31000
Multiple BY Variables

data regdivsals;
   set regsort;
   by Region Div;
   /* additional SAS statements */
run;
Multiple BY Variables: Example
SAS looks ahead to the next observation to determine the values of Last.Region and Last.Div.

Region   Div      First.Region   Last.Region   First.Div   Last.Div
C        APTOPS        1              0            1           0
C        APTOPS        0              0            0           0
C        APTOPS        0              1            0           1
E        APTOPS        1              0            1           1
E        FINACE        0              0            1           0
E        FINACE        0              1            0           1
NC       FINACE        1              0            1           1
NC       SALES         0              0            1           0
NC       SALES         0              0            0           0
NC       SALES         0              0            0           0
NC       SALES         0              1            0           1
Multiple BY Variables
When you use more than one variable in the BY statement, a change in the primary variable forces Last.BY-variable=1 for the secondary variable.

Region   Div      First.Region   Last.Region   First.Div   Last.Div
C        APTOPS        1              0            1           0
C        APTOPS        0              1            0           1
E        APTOPS        1              0            1           0
E        APTOPS        0              0            0           0
E        APTOPS        0              0            0           1
E        FINACE        0              0            1           0
Multiple BY Variables

/* Summarize salaries by division within each region */
data regdivsals(keep=Region Div DivSal NumEmps);
   set regsort;
   by Region Div;
   if First.Div then do;
      DivSal=0;
      NumEmps=0;
   end;
   DivSal+Salary;
   NumEmps+1;
   if Last.Div;
run;
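The same pattern can be keyed on the primary BY variable instead. The sketch below is an illustration only, not part of the course program: work.regsort is a hypothetical stand-in holding the first seven sorted rows shown earlier, and RegSal and RegEmps are hypothetical names for region-level totals.

```sas
/* Hypothetical stand-in: first seven rows of the data sorted by Region and Div */
data work.regsort;
   input Region $ Div $ Salary;
   datalines;
C APTOPS 27000
C APTOPS 43000
E APTOPS 20000
E APTOPS 31000
E APTOPS 32000
E FINACE 19000
E FINACE 31000
;
run;

data work.regtotals(keep=Region RegSal RegEmps);
   set work.regsort;
   by Region Div;
   if First.Region then do;   /* reset when the primary BY variable changes */
      RegSal=0;
      RegEmps=0;
   end;
   RegSal+Salary;
   RegEmps+1;
   if Last.Region;            /* one row per region */
run;

proc print data=work.regtotals noobs;
run;
```

For these seven rows only, the result should be C with 70000 for 2 employees and E with 133000 for 5 employees; the full prog2.regsals data would give different totals.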
Multiple BY Variables
Partial Log
NOTE: There were 39 observations read
from the data set WORK.REGSORT.
NOTE: The data set WORK.REGDIVSALS has
14 observations and 4 variables.
NOTE: DATA statement used:
real time 0.07 seconds
cpu time 0.07 seconds
Multiple BY Variables

proc print data=regdivsals noobs;
run;

Partial PROC PRINT Output

Region   Div       DivSal   NumEmps
C        APTOPS     70000      2
E        APTOPS     83000      3
E        FINACE    109000      4
E        FLTOPS    122000      3

c03s2d2.sas
Questions