Anda di halaman 1dari 42

An Introduction to SAS

Character Functions (including


some new SAS® 9 functions)
Ron Cody, Ed.D.
Some Functions We Will Discuss
• LENGTH • TRIM
• SUBSTR • UPCASE
• COMPBL • LOWCASE
• COMPRESS • INDEX
• VERIFY • INDEXC
• INPUT • INDEXW
• PUT • SPEDIS
• TRANWRD • LENGTH
• SCAN
Some SAS® 9 Functions
• ANYDIGIT STRIP
• ANYALPHA COUNT
• NOTDIGIT COUNTC
• NOTALPHA PROPCASE
FIND
• CATX and CATS FINDC
• COMPARE
• LENGTHC
• LENGTHN
Character Storage Lengths
data chars1;
length string $ 7;
string = 'abc';
length = length(string);
storage_length = lengthc(string);
display = ":" || string || ":";
put storage_length= /
length= /
display=;
run;
SAS Log
11 data chars1;
12 length string $ 7;
13 string = 'abc';
14 storage_length = lengthc(string);
15 length = length(string);
16 display = ":" || string || ":";
17 put storage_length= /
18 length= /
19 display=;
20 run;

storage_length=7
length=3
display=:abc :
Moving the LENGTH Statement
data chars2;
string = 'abc';
length string $ 7;
storage_length = lengthc(string);
length = length(string);
display = ":" || string || ":";
put storage_length= /
length= /
display=;
run;
SAS Log
1 data chars2;
2 string = 'abc';
3 length string $ 7;
WARNING: Length of character variable string has already been set.
Use the LENGTH statement as the very first statement in the DATA
STEP to declare the length of a character variable.
4 storage_length = lengthc(string);
5 length = length(string);
6 display = ":" || string || ":";
7 put storage_length= /
8 length= /
9 display=;
10 run;

storage_length=3
length=3
display=:abc:
The INPUT Function
data special;
***INPUT is a special function often used
for character to numeric conversion;
length c_date $ 10 numeral $ 3;
input c_date numeral;
sas_date = input(c_date,mmddyy10.);
number = input(numeral,3.);
datalines;
11/12/1950 123 Listing of Data Set SPECIAL
9-15-2004 99
c_date numeral sas_date number

11/12/1950 123 -3337 123


9-15-2004 99 16329 99
The PUT Function
data special;
***PUT is a special function often used
for numeric to character conversion;
input sas_date number ss;
c_date = put(sas_date,date9.);
money = put(number,dollar8.);
ss_char = put(ss,ssn.);
datalines;
0 1234 123456789
;
Listing of Data Set SPECIAL

sas_date number ss c_date money ss_char

0 1234 123456789 01JAN1960 $1,234 123-45-6789


Converting Multiple Blanks to a
Single Blank
data multiple;
input #1 @1 Name $20.
#2 @1 Address $30. Multiple
#3 @1 City $15.
@20 State $2. Name Address
@25 Zip $5.; Ron Cody 89 Lazy Brook Rd.
name = compbl(name); Bill Brown 28 Cathy Street
address = compbl(address);
city = compbl(city);
datalines; City State Zip
Ron Cody Flemington NJ 08822
89 Lazy Brook Rd. North City NY 11518
Flemington NJ 08822
Bill Brown
28 Cathy Street
North City NY 11518
;
How to Remove Characters
from a String
data phone;
input phone $15.;
phone1 = compress(phone);
phone2 = compress(phone,'(-) ');
datalines;
(908)235-4490
(201) 555-77 99
; Phone

phone phone1 phone2

(908)235-4490 (908)235-4490 9082354490


(201) 555-77 99 (201)555-7799 2015557799
Another COMPRESS Example
data social;
input ss_char $11.;
ss = input(compress(ss_char,'-'),9.);
easy_ss = input(ss_char,comma11.);
datalines;
123-45-6789
;

ss = 123456789 (numeric)
ss_easy = 123456789 (numeric)
Compress Function (SAS® 9 changes)
COMPRESS (char_value <, comp_string> <,modifiers>)

char_value is a SAS character value


comp_string is a character value containing the
characters to remove from char_value.
modifiers add additional characters to the list of
characters to remove or modify the way the function
works (see next slide).
Compress Function Modifiers (SAS®
9)
Selected list of COMPRESS modifiers (upper- or
lowercase)
• a adds upper- and lowercase letters
• d adds numerals (digits)
• i ignores case
• k keeps listed characters instead of removing
them
• s adds space (blank, tabs, lf, cr) to the list
• p adds punctuation
Examples
For these examples, char = "A C123XYZ"
,
phone = "(908) 777-1234"
Using the Compress Modifiers
data phone;
input phone $15.;
number = compress(phone,,'kd');
datalines;
(908)235-4490
(201) 555-77 99
; Listing of Data Set PHONE

phone number

(908)235-4490 9082354490
(201) 555-77 99 2015557799
The VERIFY Function
data verify;
input @1 id $3.
@5 answer $5.;
position = verify(answer,'abcde');
datalines;
Verify
001 acbed
002 abxde id answer position
003 12cce 001 acbed 0
004 abc e 002 abxde 3
; 003 12cce 1
004 abc e 4
Watch Out for Trailing Blanks
data trailing;
length string $ 10;
string = 'abc';
position = verify(string,'abcde');
run;

String = 'abc '


Position = 4 (the position of the first trailing
blank)
Watch Out for Trailing Blanks
data trailing;
length string $ 10;
string = 'abc';
position =
verify(trim(string),'abcde');
run;

Position = 0
Using VERIFY for Data Cleaning
data clean;
input id $;
***Valid ID's contain letters X,Y, or Z
and digits;
if verify(trim(id),'XYZ0123456789') eq 0
then valid = 'Yes';
else valid = 'No';
datalines; Listing of Data Set CLEAN
12X67YZ
67WXYZ id valid
;
12X67YZ Yes
67WXYZ No
Substring Example
data pieces_parts;
input Id $9.;
length State $ 2;
state = substr(Id,3,2);
Num = input(substr(Id,5),4.);
datalines; Listing of Data Set PIECES_PARTS
XYNY123
XYNJ1234 Id State Num
; XYNY123 NY 123
XYNJ1234 NJ 1234
The SUBSTR Function on the Left-
Hand Side of the Equal Sign
data pressure;
input sbp dbp @@;
length sbp_chk dbp_chk $ 4;
sbp_chk = put(sbp,3.);
dbp_chk = put(dbp,3.);
if sbp gt 160 then
substr(sbp_chk,4,1) = '*';
if dbp gt 90 then
substr(dbp_chk,4,1) = '*';
datalines;
120 80 180 92 200 110
;
The SUBSTR Function on the Left-
Hand Side of the Equal Sign

Listing of Data Set PRESSURE

sbp dbp sbp_chk dbp_chk

120 80 120 80
180 92 180* 92*
200 110 200* 110*
Parsing a String
data take_apart;
input @1 Cost $10.;
Integer = input(scan(Cost,1,' /'),8.);
Num = input(scan(Cost,2,' /'),8.);
Den = input(scan(Cost,3,' /'),8.);
if missing(Num) then Amount = Integer;
else Amount = Integer + Num/Den;
datalines;
1 3/4 Listing of Data Set TAKE_APART

12 1/2 Cost Integer Num Den Amount


123
1 3/4 1 3 4 1.75
; 12 1/2 12 1 2 12.50
123 123 . . 123.00
Using the SCAN Function to
Extract a Last Name
data first_last;
length last_name $ 15;
input @1 name $20.
@22 phone $13.;
***extract the last name from name;
last_name = scan(name,-1,' ');
*** minus value scans from the right;
datalines;
Jeff W. Snoker (908)782-4382
Raymond Albert (732)235-4444
Alfred Edward Newman (800)123-4321
Steven J. Foster (201)567-9876
Using the SCAN Function to
Extract a Last Name
Names and Phone Numbers in Alphabetical
Order (by Last Name)

Name Phone Number


Raymond Albert (732)235-4444
Steven J. Foster (201)567-9876
Alfred Edward Newman (800)123-4321
Jose Romerez (516)593-2377
Jeff W. Snoker (908)782-4382
Locating the Position of One String
Within Another String
data locate;
input string $10.;
first = index(string,'xyz');
first_c = indexc(string,'x','y','z');
/*Equivalent indexc(string,'xyz') */
datalines;
string first first_c
abczyx1xyz
1234567890 abczyx1xyz 8 4
abcx1y2z39 1234567890 0 0
XYZabcxyz abcx1y2z39 0 4
; XYZabcxyz 7 7
Locating the Position of One String
Within Another String
data locate;
input string $10.;
first = find(string,'xyz','i');
first_c = findc(string,'xyz','i');
/* i means ignore case */
datalines;
string first first_c
abczyx1xyz
1234567890 abczyx1xyz 8 4
abcx1y2z39 1234567890 0 0
XYZabcxyz abcx1y2z39 0 4
; XYZabcxyz 1 1
Locating One Word in a String
Function INDEXW
data _null_;
string = 'anything goes any where';
index = index(string,'any');
indexw = indexw(string,'any');
put index= indexw=;
run;
index = 1
indexw = 15

Note: You can specify delimiters for indexw in a third argument


Changing Case
Data case;
input name $15.;
upper = upcase(name);
lower = lowcase(name);
proper = propcase(name);
Datalines;
gEOrge SMITH
The end
;
Listing of Data Set CASE

name upper lower proper

gEOrge SMITH GEORGE SMITH george smith George Smith


The end THE END the end The End
Substituting One Word for Another
in a String
data convert;
input @1 address $20. ;
*** Convert Street, Avenue and
Boulevard to their abbreviations;
Address = tranwrd(address,'Street','St.');
Address = tranwrd(address,'Avenue','Ave.');
Address = tranwrd(address,'Road','Rd.');
datalines;
89 Lazy Brook Road Listing of Data Set CONVERT
123 River Rd.
Obs Address
12 Main Street
; 1 89 Lazy Brook Rd.
2 123 River Rd.
3 12 Main St.
Spelling distance
data compare;
length string1 string2 $ 15;
input string1 string2;
points = spedis(string1,string2);
datalines;
same same Listing of Data Set COMPARE
same sam string1 string2 points
first xirst
last lasx same same 0
same sam 8
receipt reciept first xirst 40
; last lasx 25
receipt reciept 7
The "ANY" Functions
data find_alpha_digit;
input string $20.;
first_alpha = anyalpha(string);
first_digit = anydigit(string);
datalines;
Listing of Data Set FIND_ALPHA_DIGIT
no digits here
first_ first_
the 3 and 4 string alpha digit
123 456 789 no digits here 1 0
; the 3 and 4 1 5
123 456 789 0 1
The "NOT" Functions
Beware of Trailing Blanks
length string $ 10;
string = '123';
position = notdigit(string);
pos_trim = notdigit(trim(string));

position = 4 (position of first blank)


pos_trim = 0
The "NOT" Functions
data data_cleaning;
input string $20.;
not_alpha = notalpha(trim(string));
not_digit = notdigit(trim(string));
datalines;
abcdefg Listing of Data Set DATA_CLEANING
1234567
not_ not_
abc123 string alpha digit
1234abcd
; abcdefg 0 1
1234567 1 0
abc123 4 1
1234abcd 1 5
Concatenation Functions
data join_up;
length cats $ 6 catx $ 17;
string1 = 'ABC ';
string2 = ' XYZ ';
string3 = '12345';
cats = cats(string1,string2);
catx = catx('***',string1,string2,string3);
run;
cats = 'ABCXYZ'
catx = 'ABC***XYZ***12345

Without the length statement, cats and catx would have a length of
200
Some LENGTH Functions
data how_long;
one = 'ABC ';
miss = ' '; /* char missing value */
3 length_one = length(one);
3 lengthn_one = lengthn(one);
6 lengthc_one = lengthc(one);
1
0 length_two = length(miss);
1 lengthn_two = lengthn(miss);
lengthc_two = lengthc(miss);
run;
The COMPARE Function
COMPARE(string1, string2 <,'modifiers'>)
I ignore case
L remove leading blanks
: truncate the longer string to the length of the shorter
string. The default is to pad the shorter string with blanks
before a comparison.
(Note: similar to the =: comparison operator)

If string1 and string2 are the same, COMPARE returns a value of 0.


If the arguments differ, the sign of the result is negative if string1 precedes
string2 in a sort sequence, and positive if string1 follows string2 in a sort
sequence
The magnitude of the result is equal to the position of the leftmost
character at which the strings differ.
The COMPARE Function
data compare;
input code $ @@;
value = 'V30.450';
c1 = compare(code,value);
c2 = compare(code,value,':i');
c3 = compare(trim(code),value,':i');
datalines;
V30 V30.450 v30.4
; Listing of Data Set COMPARE
code value c1 c2 c3
V30 V30.450 -4 -4 0
V30.450 V30.450 0 0 0
v30.4 V30.450 1 -6 0
The STRIP Function
data _null_;
length concat $ 8;
file print;
one = ' ABC ';
two = ' XYZ ';
one_two = ':' || one || two || ':';
strip = ':' || strip(one) || strip(two) || ':';
concat = cats(':',one,two,':');
put one_two= /
strip= /
concat=;

one_two=: ABC XYZ :


strip=:ABCXYZ:
concat=:ABCXYZ:
COUNT and COUNTC Functions
data Dracula; /* Get it Count Dracula */
input string $20.;
count_abc = count(string,'abc');
countc_abc = countc(string,'abc');
count_abc_i = count(string,'abc','i');
datalines;
xxabcxABCxxbbbb
cbacba
Listing of Data Set DRACULA
;
count_ countc_ count_
string abc abc abc_i

xxabcxABCxxbbbb 1 7 2
cbacba 0 6 0
Contact Information
Author: Ron Cody
You may download copies of the Powerpoint
presentation from:

www2.umdnj.edu/codyweb/biocomputing

Anda mungkin juga menyukai