
SASTechies

info@sastechies.com
http://www.sastechies.com
 Character data with specified lengths

Standard numeric data values can contain only
◦ numbers
◦ decimal points
◦ numbers in scientific, or E, notation (23E4)
◦ minus signs.

Nonstandard numeric data include
◦ values that contain special characters, such as percent signs (%), dollar signs ($), and commas (,)
◦ date and time values
◦ data in fraction, integer binary, real binary, and hexadecimal forms.

SAS Techies 2009 11/02/21 2


External File Data
Raw data can be organized in several different ways.

>----+----10---+----20
BARNES NORTH 360.98
FARLSON WEST 243.94
LAWRENCE NORTH 195.04
NELSON EAST 169.30
STEWART SOUTH 238.45
TAYLOR WEST 318.87

This external file contains data that is free-format, meaning data that is not arranged in columns. Notice that the values for a particular field do not begin and end in the same columns. Column input cannot be used to read data organized in this way.

>----+----10---+----20
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
2816 26 HIGH M
2833 32 MOD  F
2823 29 HIGH M

This external file contains data that is arranged in columns or fixed fields. You can specify a beginning and ending column for each field. Let's look at how column input can be used to read this data.



 Column Input
To use column input, your data must be standard character or
numeric values in fixed fields.
input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
>----+----10---+----20

 2810 61 MOD  F
 2804 38 HIGH F
 2807 42 LOW  M
 2816 26 HIGH M
 2833 32 MOD  F
 2823 29 HIGH M

One of the features of column input is the capability to read fields in any order.
Character variable values can be up to 32K and can contain embedded blanks.
No placeholder is required for missing data. A blank field is read as missing and does not cause other fields to be read incorrectly.
Fields or parts of fields can be reread.
Fields do not have to be separated by blanks or other delimiters.
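The features listed above can be tried out with a small DATA step. This is only a sketch: the data set name work.fitness and the extra variable IDPrefix are made up for illustration, and inline DATALINES records stand in for the external file.

```sas
data work.fitness;
   /* Fields are read out of order, and columns 1-2 are reread as IDPrefix. */
   input Sex $ 14 ActLevel $ 9-12 ID $ 1-4 Age 6-7 IDPrefix $ 1-2;
   datalines;
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
;
run;
```

Because every field is located by its column range, the order in which the variables appear in the INPUT statement does not have to match the order of the fields in the record.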



You can use formatted input, which combines the features of column input with the ability to read nonstandard, as well as standard, data.

Whenever you encounter raw data that is organized into fixed fields, you can use
◦ column input to read standard data only
◦ formatted input to read both standard and nonstandard data.



General form:

INPUT pointer-control variable informat.;

>----+----10---+----20---+--
ENVELOPE   $13.25   500   4
DISKETTES  $29.50   10   3
BANDS      $2.50   600   2
RIBBON     $94.20   12   1
PAPER      $15.95   250   10

The @n is an absolute pointer control that moves the input pointer to a specific column number. You can use the @n to move a pointer forward or backward when reading a record.

input Name $14. @16 Amount comma6.2 damout var

The +n is a relative pointer control that moves the input pointer forward to a column number relative to the current position.

input Name $14. +2 Amount comma6.2 damout var
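A minimal sketch of both pointer controls in one INPUT statement follows. The column layout is an assumption made for this example (Item in columns 1-9, Price in columns 11-16, Quantity in columns 18-20), and the data set and variable names are hypothetical; it does not reproduce the exact layout of the file shown above.

```sas
data work.supplies;
   /* $9. reads columns 1-9; +1 skips column 10 so Price starts in column 11. */
   /* @18 then moves the pointer to column 18 to read Quantity.               */
   input Item $9. +1 Price comma6. @18 Quantity 3.;
   datalines;
ENVELOPE  $13.25 500
DISKETTES $29.50  10
BANDS      $2.50 600
;
run;
```

The +n form depends on where the previous field ended, while @n always names a column, so @n is usually easier to keep correct when fields are added or reordered.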



>----+----10---+----20---+--
ENVELOPE   $13.25  500   4
DISKETTES  $29.50   10   3
BANDS      $2.50  600   2
RIBBON     $94.20   12   1
PAPER      $15.95  250  10

The $w. informat enables you to read character data. The w represents the field width of the data value or the total number of columns that contain the raw data field.

input Name $ 1-14 +2 Amount
input Name $14. +2 Amount

Note the difference: the first statement reads Name with column input (columns 1-14), while the second reads it with formatted input (the $14. informat).



The informat for reading standard numeric data is the w.d informat.

Raw Data Value   Informat   Variable Value
34.0008          7.4        34.0008



The COMMAw.d informat is used to read numeric values and remove embedded
◦ blanks
◦ commas
◦ dashes
◦ dollar signs
◦ percent signs
◦ right parentheses
◦ left parentheses, which are converted to minus signs.

Raw Data Value   Informat   Variable Value
$34,000          Comma7.    34000
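The w.d and COMMAw.d informats can be compared side by side in a short sketch. The variable names and the single inline record are assumptions for illustration. The second field deliberately has no decimal point, to show that the d value then supplies an implied decimal position.

```sas
data work.informats;
   /* Columns 1-7: value with an explicit decimal point (d is ignored)      */
   /* Columns 9-15: digits only, so 7.4 divides the value by 10**4          */
   /* Columns 17-23: dollar sign and comma are removed by COMMA7.           */
   input @1 Plain 7.4 @9 Implied 7.4 @17 Salary comma7.;
   datalines;
34.0008  340008 $34,000
;
run;
```

Under these assumptions Plain and Implied should both hold 34.0008, and Salary should hold 34000.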



 External files with a fixed-length record format have an
end-of-record marker after a predetermined number of
columns.

 A typical record length is 80 columns.

>----+----10---+----20---+---------------
 BIRD FEEDER   LG088   3 20
 GLASS MUGS    SB082   6 12
 GLASS TRAY    BQ049 12 6
 PADDED HANGRS MN256 15 20
 JEWELRY BOX   AJ498 23  0
 RED APRON     AQ072 9 12
 CRYSTAL VASE   AQ672 27  0
 PICNIC BASKET LS930 21   0



Files with a variable-length record format have an imaginary end-of-record marker after the last field in each record.

input Department $ 1-11 @13 TotalReceipts comma8.;

>----+----10---+---V20-------------
BED/BATH    1,354.93*
HOUSEWARES  2,464.05*
GARDEN        923.34*
GRILL         598.34*
SHOES       1,345.82*
SPORTS*
TOYS        6,536.53*

◦ Beware of errors when a record ends before the field that the INPUT statement is trying to read.
◦ The PAD option pads each record with blanks so that short records can still be read: infile receipts pad;
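To see the PAD option at work, the sketch below first writes a small variable-length file and then reads it back. The temporary fileref, the three records, and the output data set name are assumptions made for this example.

```sas
filename receipts temp;   /* temporary file used only for this sketch */

data _null_;
   file receipts;
   /* PUT writes each record without trailing blanks, so SPORTS is short. */
   put 'BED/BATH    1,354.93';
   put 'SPORTS';
   put 'TOYS        6,536.53';
run;

data work.receipts;
   /* PAD extends the short SPORTS record with blanks. */
   infile receipts pad;
   input Department $ 1-11 @13 TotalReceipts comma8.;
run;
```

With PAD in effect, the SPORTS record should produce a missing TotalReceipts value instead of causing the INPUT statement to move on to the next record.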



Raw data can be free-format; that is, it is not arranged in fixed fields. The fields may be separated by blanks or some other delimiter.

>----+----10---+----20---+----
ABRAMS*L.*MARKETING*$8,209
BARCLAY*M.*MARKETING*$8,435
COURTNEY*W.*MARKETING*$9,006
FARLEY*J.*PUBLICATIONS*$8,305
HEINS*W.*PUBLICATIONS*$9,539

>V---+----10---+----20
MALE 27 1 8 0 0
FEMALE 29 3 14 5 10
FEMALE 34 2 10 3 3

infile credit dlm=' ';
input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
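The asterisk-delimited records shown above could be read with the DLM= option in much the same way. This is a sketch only: the variable names and lengths are assumptions, inline DATALINES records stand in for the external file, and the colon modifier is used so that the longer department names and the dollar amounts are read with an informat.

```sas
data work.staff;
   /* An asterisk, not a blank, separates the fields. */
   infile datalines dlm='*';
   input Lname $ Initial $ Dept : $12. Salary : comma.;
   datalines;
ABRAMS*L.*MARKETING*$8,209
FARLEY*J.*PUBLICATIONS*$8,305
;
run;
```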



 Limitations
◦ Missing data values must be specified with a period (.)
for both character and numeric data.
◦ Although the width of a field can be greater than eight
characters, both character and numeric variables have
a default length of 8. Character values longer than
eight characters will be truncated.
◦ Data must be in standard numeric or character format.
◦ Character values cannot contain embedded blanks.



>V---+----10---+----20
MALE 27 1 8 * *
FEMALE 29 3 14 5 10
FEMALE 34 2 10 3 3

The MISSOVER option is used to handle missing values at the end of a record.

data perm.survey;
   infile credit missover;
   input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
run;

If the missing value is in the middle of the record, then edit the raw data file (for example, to insert a period as a placeholder).

>V---+----10---+----20
MALE 27 1 8 92 39
FEMALE * 3 14 5 10
FEMALE 34 2 10 3 3
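A self-contained sketch of MISSOVER follows, with inline DATALINES records in place of the external file; the data set name work.survey is an assumption. The first record has no values for the last two variables.

```sas
data work.survey;
   /* MISSOVER sets Deptcard and FreqDept to missing for the short record */
   /* instead of reading them from the next line.                         */
   infile datalines missover;
   input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
   datalines;
MALE 27 1 8
FEMALE 29 3 14 5 10
FEMALE 34 2 10 3 3
;
run;
```

Without MISSOVER, list input would continue onto the second line to find the two remaining values, and the observations would no longer line up with the records.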



You can make list input more versatile by using modified list input. There are two modifiers that can be used with list input.

The ampersand (&) modifier is used to read character values that contain embedded blanks.

The colon (:) modifier is used to read nonstandard data values and character values longer than eight characters, but without embedded blanks.

data perm.cityrank;
   infile topten;
   input Rank City & $12.
         Pop86 : comma.;

>----+----10---+----20---+--
 1 NEW YORK  7,262,700
 2 LOS ANGELES  3,259,340
 3 CHICAGO  3,009,530
 4 HOUSTON  1,728,910
 5 PHILADELPHIA  1,642,900
 6 DETROIT  1,086,220
 7 SAN DIEGO  1,015,190
 8 DALLAS  1,003,520
 9 SAN ANTONIO  914,350
10 PHOENIX  894,070



 When you read a date using a SAS informat, SAS software
converts it to a numeric date value. A SAS date value is the
number of days from January 1, 1960, to the given date.

Date Expression   SAS Date Informat   SAS Date Value
02Jan00           DATEw.              14611
01-02-2000        MMDDYYw.            14611
02/01/00          DDMMYYw.            14611
2000/01/02        YYMMDDw.            14611
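The four conversions in the table can be checked with the INPUT function, as in this sketch. The widths chosen for each informat are assumptions that fit the example strings, and the two-digit years are assumed to fall in 2000 under the YEARCUTOFF= setting in effect.

```sas
data _null_;
   d1 = input('02Jan00',    date7.);
   d2 = input('01-02-2000', mmddyy10.);
   d3 = input('02/01/00',   ddmmyy8.);
   d4 = input('2000/01/02', yymmdd10.);
   /* Each value is the number of days from January 1, 1960, to January 2, 2000. */
   put d1= d2= d3= d4=;
run;
```

Under those assumptions each variable should be written to the log as 14611.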



SAS software stores time values similarly to the way it stores date values. A SAS time value is stored as the number of seconds since midnight.

A SAS datetime is a special value that combines both date and time information. A SAS datetime value is stored as the number of seconds between midnight on January 1, 1960, and a given date and time.
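A small sketch using time and datetime constants makes the two definitions concrete; the variable names are arbitrary.

```sas
data _null_;
   t  = '00:01:00't;              /* one minute after midnight               */
   dt = '01JAN1960:00:01:00'dt;   /* one minute after midnight, Jan 1, 1960  */
   put t= dt=;                    /* both are 60 seconds                     */
run;
```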



When a two-digit year value is read, SAS software defaults to a year within a 100-year span determined by the YEARCUTOFF= system option.

◦ Date7. informat
◦ Mmddyyn8.

Date Expression   Interpreted As
12/07/41          12/07/1941
18Dec15           18Dec2015
04/15/30          04/15/1930
15Apr95           15Apr1995

The value of the YEARCUTOFF= system option affects only two-digit year values. A date value that contains a four-digit year value will be interpreted correctly even if it does not fall within the 100-year span set by the YEARCUTOFF= system option.
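As an illustration, the sketch below sets YEARCUTOFF=1920, which gives a span of 1920 through 2019. That value is an assumption: the slides do not state which setting was used, although it is consistent with the interpretations in the table.

```sas
options yearcutoff=1920;   /* assumed value; two-digit years map to 1920-2019 */

data _null_;
   d1 = input('12/07/41', mmddyy8.);
   d2 = input('18Dec15',  date7.);
   /* Write the values with four-digit years to see how they were interpreted. */
   put d1= mmddyy10. d2= date9.;
run;
```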



Because dates are stored as numeric values, any meaningful arithmetic calculation can be performed on them.
Ex: Days=dateout-datein+1;
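The calculation can be run on its own with two date constants. The dates are made up for illustration; adding 1 counts both the first and the last day.

```sas
data _null_;
   datein  = '01APR1990'd;
   dateout = '09APR1990'd;
   Days = dateout - datein + 1;   /* inclusive count of days */
   put Days=;
run;
```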



>----+----10---+----
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
COURTNEY MARK
PUBLICATIONS  TW01
$24,006.16

Write multiple INPUT statements:

input Lname $ 1-8 Fname $ 10-15;
input Department $ 1-12 JobCode $ 15-19;
input Salary comma10.;

Use the forward slash (/) line pointer control to read multiple records in sequential order:

input Lname $ 1-8 Fname $ 10-15 /
      Department $ 1-12 JobCode $ 15-19 /
      Salary comma10.;

Or write one INPUT statement that contains a line pointer control (#n) to specify the record(s) from which values are to be read:

input #1 Lname $ 1-8 Fname $ 10-15
      #2 Department $ 1-12 JobCode $ 15-19
      #3 Salary comma10.;
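The #n form can be tried in a complete DATA step such as the sketch below. The data set name is hypothetical, inline DATALINES records replace the external file, and the name fields are assumed to be aligned so that the first name starts in column 10.

```sas
data work.staff;
   /* Each observation is built from three consecutive records. */
   input #1 Lname $ 1-8 Fname $ 10-15
         #2 Department $ 1-12 JobCode $ 15-19
         #3 Salary comma10.;
   datalines;
ABRAMS   THOMAS
MARKETING     SR01
$25,209.03
BARCLAY  ROBERT
EDUCATION     IN01
$24,435.71
;
run;
```

All three approaches depend on every observation having the same number of records; a missing line would shift every later observation.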



Repeating blocks of data that represent separate observations:

>----+----10---+----20---+----30--
01APR90 68 02APR90 67 03APR90 78
04APR90 74 05APR90 72 06APR90 73
07APR90 71 08APR90 75 09APR90 76

An ID field followed by an equal number of repeating fields that represent separate observations:

>----+----10---+----20---+----30--
001 WALKING AEROBICS CYCLING
002 SWIMMING CYCLING SKIING
003 TENNIS SWIMMING AEROBICS

An ID field followed by a varying number of repeating fields that represent separate observations:

>----+----10---+----20---+----30--
001 WALKING
002 SWIMMING CYCLING SKIING
003 TENNIS SWIMMING



The SAS System provides two line-hold specifiers.

The trailing at sign (@) enables the next INPUT statement to read from the current record in the same iteration of the DATA step.

Ex: input name $20. @;

The double trailing at sign (@@) enables the next INPUT statement to read from the current record across further iterations of the DATA step.

Ex: input name $20. @@;



Normally, each time a DATA step executes, the INPUT statement reads a new record. But when you use the @@, the INPUT statement holds the current record and reads the next value.

A record held by the double trailing at sign (@@) is not released until
◦ the input pointer moves past the end of the record. Then the input pointer moves down to the next record.
◦ an INPUT statement without a line-hold specifier executes.

input ID $4. @@;
.
.
input Department 5.;



data perm.april90;
infile tempdata;
input Date : date. HighTemp @@;
format date date7.;
run;
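The same step can be run without an external file by supplying the temperature records inline, as in this sketch. The work.april90 name is an assumption; the two records are taken from the repeating-block example earlier in the deck.

```sas
data work.april90;
   /* @@ holds the record across iterations, so each date/temperature */
   /* pair becomes its own observation.                               */
   input Date : date. HighTemp @@;
   format Date date7.;
   datalines;
01APR90 68 02APR90 67 03APR90 78
04APR90 74 05APR90 72 06APR90 73
;
run;
```

These two records should yield six observations, one per pair.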



Like the @@, the single trailing @
◦ enables the next INPUT statement to read from the same record
◦ releases the current record when a subsequent INPUT statement executes without a line-hold specifier.

Unlike the @@, the single @ also releases a record when control returns to the top of the DATA step for the next iteration.



data perm.sales97;
infile data97;
input ID $4. @;
do Quarter=1 to 4;
input Sales : comma. @;
output;
end;
run;
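The raw data for this step is not shown in the slides, so the sketch below assumes a layout of a four-character ID followed by four quarterly sales figures. The records and the work.sales97 name are hypothetical.

```sas
data work.sales97;
   /* Read the ID once and hold the record. */
   input ID $4. @;
   do Quarter=1 to 4;
      /* Each pass reads the next sales value from the held record. */
      input Sales : comma. @;
      output;
   end;
   datalines;
0734 1,323.34 2,472.85 3,276.65 5,345.52
0943 1,908.34 2,560.38 3,472.09 5,290.86
;
run;
```

Each input record should produce four observations. The held record is released when the DATA step returns to the top for the next iteration.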



Raw Data File

H indicates a header record that contains a street address, and P indicates a detail record that contains information about a person living at that address.

>----+----10---+----
H 321 S. MAIN ST
P MARY E    21 F
P WILLIAM M 23 M
P SUSAN K    3 F
H 324 S. MAIN ST
P THOMAS H  79 M
P WALTER S  46 M
P ALICE A   42 F
P MARYANN A 20 F
P JOHN S    16 M
H 325A S. MAIN ST

SAS Data Set

Obs  Address          Name       Age  Gender
  1  321 S. MAIN ST   MARY E      21  F
  2  321 S. MAIN ST   WILLIAM M   23  M
  3  321 S. MAIN ST   SUSAN K      3  F
  4  324 S. MAIN ST   THOMAS H    79  M
  5  324 S. MAIN ST   WALTER S    46  M
  6  324 S. MAIN ST   ALICE A     42  F
  7  324 S. MAIN ST   MARYANN A   20  F
  8  324 S. MAIN ST   JOHN S      16  M
  9  325A S. MAIN ST  JAMES L     34  M
 10  325A S. MAIN ST  LIZA A      31  F
 11  325B S. MAIN ST  MARGO K     27  F



You want to keep the header record as a part of each observation until the next header record is encountered.

RETAIN variable1 variable2;

If no variable is named, RETAIN applies to all variables.

When a RETAIN statement specifies variables, new variables are created. Therefore, you must name any variables used in a RETAIN statement exactly as you want them stored in the data set. You might need to drop the extra variables.

>----+----10---+----
H 321 S. MAIN ST
P MARY E    21 F
P WILLIAM M 23 M
P SUSAN K    3 F

data perm.people;
   infile census;
   retain Address;



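data perm.people (drop=type);
   infile census;
   retain Address;
   /* Read the record type and hold the record for the next INPUT. */
   input type $1. @;
   if type='H' then input @3 Address $15.;
   if type='P' then do;
      /* Gender is in column 16, after the three-column Age field. */
      input @3 Name $10. @13 Age 3. @16 Gender $1.;
      /* Write an observation for detail records only. */
      output;
   end;
run;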



Raw Data File

>----+----10---+---20
H 321 S. MAIN ST
P MARY E    21 F
P WILLIAM M 23 M
P SUSAN K    3 F
H 324 S. MAIN ST
P THOMAS H  79 M
P WALTER S  46 M
P ALICE A   42 F
P MARYANN A 20 F
P JOHN S    16 M
H 325A S. MAIN ST
P JAMES L   34 M
P LIZA A    31 F
H 325B S. MAIN ST
P MARGO K   27 F
P WILLIAM R 27 M
P ROBERT W   1 M

SAS Data Set

Address           Total
321 S. MAIN ST    3
324 S. MAIN ST    5
325A S. MAIN ST   2
325B S. MAIN ST   3
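The slides show only the input and the desired output for this summary. One way to produce one observation per address is sketched below. It is not taken from the slides: the temporary fileref, the shortened set of records, the work.residents name, and the use of the END= and TRUNCOVER options are all assumptions.

```sas
filename census temp;   /* temporary file used only for this sketch */

data _null_;
   file census;
   put 'H 321 S. MAIN ST';
   put 'P MARY E    21 F';
   put 'P WILLIAM M 23 M';
   put 'P SUSAN K    3 F';
   put 'H 324 S. MAIN ST';
   put 'P THOMAS H  79 M';
   put 'P WALTER S  46 M';
run;

data work.residents (drop=type);
   /* END= flags the last record; TRUNCOVER tolerates short header records. */
   infile census end=last truncover;
   retain Address;
   input type $1. @;
   if type='H' then do;
      /* A new header closes the previous address, except on the first record. */
      if _n_ > 1 then output;
      Total=0;
      input @3 Address $15.;
   end;
   else if type='P' then Total+1;   /* sum statement: Total is retained */
   /* The final address has no following header, so write it at end of file. */
   if last then output;
run;
```

The observation for an address is written only when the next header arrives, which is why the last address needs the separate end-of-file check.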



>----+----10---V----20
1802 JOHNSON2123
1803 BARKER2142
1804 EDMUNDSON2325
1805 RIVERS2543
1806 MASON2646
1807 JACKSON2049
1808 LEVY2856
1809 THOMAS2222

It's important to specify a w value that is large enough to accommodate the longest value.

data perm.phones;
   infile phondat length=reclen;
   input ID 4. @;
   /* Name length = record length minus ID (4), blank (1), and extension (4). */
   namelen=reclen-9;
   /* +1 skips the blank after ID so that Name starts in column 6. */
   input +1 Name $varying10. namelen PhoneExt;
run;



Each date and blood-pressure block is 14 columns wide, and a new block begins every 15 columns.

>----+----10---+----V0---+----30---V----40---+----V0
1234 13MAR89 120/80
1443 12FEB89 120/70 03FEB90 125/80 07OCT90 125/99
1681 11JAN90 120/80 05JUN90 110/70
2034 19NOV88 130/70 12MAY89 150/90 23MAR90 130/80

data perm.health;
   infile bpdata length=reclen;
   input ID 4. @;
   do index=6 to reclen by 15;
      input Date : date. BP $ @;
      output;
   end;
run;

