
An Introduction to SAS Character Functions (including some new SAS 9 functions)

Some Functions We Will Discuss


LENGTH SUBSTR COMPBL COMPRESS VERIFY INPUT PUT TRANWRD SCAN TRIM UPCASE LOWCASE INDEX INDEXC INDEXW SPEDIS

Some SAS 9 Functions


CATX and CATS COMPARE LENGTHC LENGTHN STRIP COUNT COUNTC PROPCASE FIND FINDC

Functions That Compute the Length of Strings

Function: LENGTH
Purpose: To determine the length of a character value, not counting trailing blanks.
Syntax: LENGTH(character-value)
Examples: For these examples, CHAR = "ABC   " (three trailing blanks)

FUNCTION          RETURNS
LENGTH("ABC")     3
LENGTH(CHAR)      3
LENGTH(" ")       1

Function: LENGTHC
Purpose: To determine the length of a character value, including trailing blanks.
Syntax: LENGTHC(character-value)
Examples: For these examples, CHAR = "ABC   " (three trailing blanks)

FUNCTION           RETURNS
LENGTHC("ABC")     3
LENGTHC(CHAR)      6
LENGTHC(" ")       1

Function: LENGTHM
Purpose: To determine the length of a character variable in memory.
Syntax: LENGTHM(character-value)
Examples: For these examples, CHAR = "ABC   " (three trailing blanks)

FUNCTION           RETURNS
LENGTHM("ABC")     3
LENGTHM(CHAR)      6
LENGTHM(" ")       1

Function: LENGTHN
Purpose: To determine the length of a character value, not counting trailing blanks. Unlike LENGTH, LENGTHN returns 0 for a blank string.
Syntax: LENGTHN(character-value)
Examples: For these examples, CHAR = "ABC   " (three trailing blanks)

FUNCTION           RETURNS
LENGTHN("ABC")     3
LENGTHN(CHAR)      3
LENGTHN(" ")       0

The COMPARE Function

data compare;
   input code $ @@;
   value = 'V30.450';
   c1 = compare(code,value);
   c2 = compare(code,value,':i');
   c3 = compare(trim(code),value,':i');
datalines;
V30 V30.450 v30.4
;

Listing of Data Set COMPARE

code       value      c1    c2    c3
V30        V30.450    -4    -4     0
V30.450    V30.450     0     0     0
v30.4      V30.450     1    -6     0

Character Storage Lengths

data chars1;
   length string $ 7;
   string = 'abc';
   storage_length = lengthc(string);
   length = length(string);
   display = ":" || string || ":";
   put storage_length= /
       length= /
       display=;
run;

SAS Log

storage_length=7
length=3
display=:abc    :

Moving the LENGTH Statement

data chars2;
   string = 'abc';
   length string $ 7;
   storage_length = lengthc(string);
   length = length(string);
   display = ":" || string || ":";
   put storage_length= /
       length= /
       display=;
run;

SAS Log

WARNING: Length of character variable string has already been set.
         Use the LENGTH statement as the very first statement in the DATA STEP
         to declare the length of a character variable.

storage_length=3
length=3
display=:abc:

Function: SUBSTR
Purpose: To extract part of a string. When the SUBSTR function is used on the left side of the equal sign, it can place specified characters into an existing string.
Syntax: SUBSTR(character-value, start <,length>)
Examples: For these examples, let STRING = "ABC123XYZ"

Function              Returns
SUBSTR(STRING,4,2)    "12"
SUBSTR(STRING,4)      "123XYZ"

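One detail worth checking alongside these SUBSTR examples: when the result of SUBSTR is assigned to a variable that has no previously declared length, SAS gives that variable the length of the first argument. The following is a minimal sketch, not part of the original slides; the variable names PART, REST, and LEN_PART are chosen here purely for illustration.

```sas
data _null_;
   string = 'ABC123XYZ';
   part = substr(string,4,2);   /* "12" */
   rest = substr(string,4);     /* "123XYZ" */
   /* PART has no LENGTH statement, so it takes the length of STRING */
   len_part = lengthc(part);
   put part= rest= len_part=;
run;
```

LEN_PART should be 9 here rather than 2, which is why the substring examples in this presentation declare a LENGTH for the result variable before calling SUBSTR.
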
The INPUT Function

data special;
   ***INPUT is a special function often used for character to numeric conversion;
   length c_date $ 10 numeral $ 3;
   input c_date numeral;
   sas_date = input(c_date,mmddyy10.);
   number = input(numeral,3.);
datalines;
11/12/1950 123
9-15-2004 99
;

Listing of Data Set SPECIAL

c_date        numeral    sas_date    number
11/12/1950    123           -3337       123
9-15-2004     99            16329        99

The PUT Function

data special;
   ***PUT is a special function often used for numeric to character conversion;
   input sas_date number ss;
   c_date = put(sas_date,date9.);
   money = put(number,dollar8.);
   ss_char = put(ss,ssn.);
datalines;
0 1234 123456789
;

Listing of Data Set SPECIAL

sas_date    number    ss           c_date       money     ss_char
0           1234      123456789    01JAN1960    $1,234    123-45-6789

DATA SUBSTRING;
   INPUT ID $ 1-9;
   LENGTH STATE $ 2;
   STATE = SUBSTR(ID,1,2);
   NUM = INPUT(SUBSTR(ID,7,3),3.);
DATALINES;
NYXXXX123
NJ1234567
;
PROC PRINT DATA=SUBSTRING NOOBS;
   TITLE 'Listing of Data Set SUBSTRING';
RUN;

Output Dataset: Listing of Data Set SUBSTRING

ID           STATE    NUM
NYXXXX123    NY       123
NJ1234567    NJ       567

Converting Multiple Blanks to a Single Blank

data multiple;
   input #1 @1  Name    $20.
         #2 @1  Address $30.
         #3 @1  City    $15.
            @20 State   $2.
            @25 Zip     $5.;
   name = compbl(name);
   address = compbl(address);
   city = compbl(city);
datalines;
Ron  Cody
89 Lazy  Brook Rd.
Flemington         NJ   08822
Bill    Brown
28  Cathy   Street
North    City      NY   11518
;

Multiple

Name          Address               City          State    Zip
Ron Cody      89 Lazy Brook Rd.     Flemington    NJ       08822
Bill Brown    28 Cathy Street       North City    NY       11518

How to Remove Characters from a String

data phone;
   input phone $15.;
   phone1 = compress(phone);
   phone2 = compress(phone,'(-) ');
datalines;
(908)235-4490
(201) 555-77 99
;

Phone

phone              phone1           phone2
(908)235-4490      (908)235-4490    9082354490
(201) 555-77 99    (201)555-7799    2015557799

Another COMPRESS Example

data social;
   input ss_char $11.;
   ss = input(compress(ss_char,'-'),9.);
   easy_ss = input(ss_char,comma11.);
datalines;
123-45-6789
;

ss = 123456789 (numeric)
easy_ss = 123456789 (numeric)
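
Both COMPRESS examples above list the characters to remove. In SAS 9, COMPRESS also accepts an optional third argument of modifiers. As a hedged sketch (not taken from the original slides), the 'k' modifier keeps the listed characters instead of removing them and 'd' adds the digits to the list, so a phone number can be reduced to its digits without naming every punctuation character. The data set name PHONE_DIGITS and the variable DIGITS are illustrative.

```sas
data phone_digits;
   input phone $15.;
   /* second argument left empty; 'kd' = keep only the digits */
   digits = compress(phone,,'kd');
datalines;
(908)235-4490
(201) 555-77 99
;
```

For these two data lines, DIGITS should match PHONE2 from the earlier phone example (9082354490 and 2015557799).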

The VERIFY Function

data verify;
   input @1 id $3.
         @5 answer $5.;
   position = verify(answer,'abcde');
datalines;
001 acbed
002 abxde
003 12cce
004 abc e
;

Verify

id     answer    position
001    acbed     0
002    abxde     3
003    12cce     1
004    abc e     4

Watch Out for Trailing Blanks

data trailing;
   length string $ 10;
   string = 'abc';
   position = verify(string,'abcde');
run;

String = 'abc       '
Position = 4 (the position of the first trailing blank)

Function: TRIM
Purpose: To remove trailing blanks from a character value.
Syntax: TRIM(character-value)
Examples: For these examples, STRING1 = "ABC   " and STRING2 = "   XYZ"

Function                     Returns
TRIM(STRING1)                "ABC"
TRIM(STRING2)                "   XYZ"
TRIM("A B C   ")             "A B C"
TRIM("A ") || TRIM("B ")     "AB"
TRIM(" ")                    " " (length = 1)

Function: STRIP
Purpose: To strip leading and trailing blanks from character variables or strings. STRIP(CHAR) is equivalent to TRIMN(LEFT(CHAR)), but more convenient.
Syntax: STRIP(character-value)
Examples: For these examples, let STRING = "   abc   "

Function                                Returns
STRIP(STRING)                           "abc"
STRIP("   LEADING AND TRAILING   ")     "LEADING AND TRAILING"

STRIP Function

data _null_;
   string = ' Testing ';
   try1 = strip(string);
   try2 = trim(left(string));
   put string= quote12. /
       try1= quote12. /
       try2= quote12.;
run;

Partial log:

string=" Testing"
try1="Testing"
try2="Testing"

Watch Out for Trailing Blanks

data trailing;
   length string $ 10;
   string = 'abc';
   position = verify(trim(string),'abcde');
run;

Position = 0

Using VERIFY for Data Cleaning

data clean;
   input id $;
   ***Valid ID's contain letters X, Y, or Z and digits;
   if verify(trim(id),'XYZ0123456789') eq 0 then valid = 'Yes';
   else valid = 'No';
datalines;
12X67YZ
67WXYZ
;

Listing of Data Set CLEAN

id         valid
12X67YZ    Yes
67WXYZ     No

Substring Example

data pieces_parts;
   input Id $9.;
   length State $ 2;
   state = substr(id,1,2);
   Num = input(substr(id,7,3),3.);
datalines;
NYXXXX123
NJ1234567
;

Listing of Data Set PIECES_PARTS

Id           State    Num
NYXXXX123    NY       123
NJ1234567    NJ       567

Changing Case

data case;
   input name $15.;
   upper = upcase(name);
   lower = lowcase(name);
   proper = propcase(name);
datalines;
gEOrge SMITH
The end
;

Listing of Data Set CASE

name            upper           lower           proper
gEOrge SMITH    GEORGE SMITH    george smith    George Smith
The end         THE END         the end         The End

The SUBSTR Function on the Left-Hand Side of the Equal Sign

data pressure;
   input sbp dbp @@;
   length sbp_chk dbp_chk $ 4;
   sbp_chk = put(sbp,3.);
   dbp_chk = put(dbp,3.);
   if sbp gt 160 then substr(sbp_chk,4,1) = '*';
   if dbp gt 90 then substr(dbp_chk,4,1) = '*';
datalines;
120 80 180 92 200 110
;

Listing of Data Set PRESSURE

sbp    dbp    sbp_chk    dbp_chk
120     80    120         80
180     92    180*        92*
200    110    200*       110*

Parsing a String

data take_apart;
   input @1 Cost $10.;
   Integer = input(scan(Cost,1,' /'),8.);
   Num = input(scan(Cost,2,' /'),8.);
   Den = input(scan(Cost,3,' /'),8.);
   if missing(Num) then Amount = Integer;
   else Amount = Integer + Num/Den;
datalines;
1 3/4
12 1/2
123
;

Listing of Data Set TAKE_APART

Cost      Integer    Num    Den    Amount
1 3/4           1      3      4      1.75
12 1/2         12      1      2     12.50
123           123      .      .    123.00

Using the SCAN Function to Extract a Last Name

data first_last;
   length last_name $ 15;
   input @1  name  $20.
         @22 phone $13.;
   ***extract the last name from name;
   last_name = scan(name,-1,' ');   *** minus value scans from the right;
datalines;
Jeff W. Snoker       (908)782-4382
Raymond Albert       (732)235-4444
Alfred Edward Newman (800)123-4321
Steven J. Foster     (201)567-9876
Jose Romerez         (516)593-2377
;

Names and Phone Numbers in Alphabetical Order (by Last Name)

Name                    Phone Number
Raymond Albert          (732)235-4444
Steven J. Foster        (201)567-9876
Alfred Edward Newman    (800)123-4321
Jose Romerez            (516)593-2377
Jeff W. Snoker          (908)782-4382

Locating the Position of One String Within Another String

data locate;
   input string $10.;
   first = index(string,'xyz');
   first_c = indexc(string,'x','y','z');   /* Equivalent: indexc(string,'xyz') */
datalines;
abczyx1xyz
1234567890
abcx1y2z39
XYZabcxyz
;

string        first    first_c
abczyx1xyz    8        4
1234567890    0        0
abcx1y2z39    0        4
XYZabcxyz     7        7

Locating the Position of One String Within Another String

data locate;
   input string $10.;
   first = find(string,'xyz','i');
   first_c = findc(string,'xyz','i');   /* i means ignore case */
datalines;
abczyx1xyz
1234567890
abcx1y2z39
XYZabcxyz
;

string        first    first_c
abczyx1xyz    8        4
1234567890    0        0
abcx1y2z39    0        4
XYZabcxyz     1        1

Locating One Word in a String: Function INDEXW

data _null_;
   string = 'anything goes any where';
   index = index(string,'any');
   indexw = indexw(string,'any');
   put index= indexw=;
run;

index=1 indexw=15

Note: You can specify delimiters for INDEXW in a third argument.
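
As a minimal sketch of that third argument (an illustration added here, not from the original slides), the following treats a comma as the only word delimiter. The variable names COLORS, POS_GREEN, and POS_GREE are assumptions made for the example.

```sas
data _null_;
   colors = 'red,green,blue';
   /* 'green' is a complete comma-delimited word */
   pos_green = indexw(colors,'green',',');
   /* 'gree' is only part of a word, so it should not be found */
   pos_gree = indexw(colors,'gree',',');
   put pos_green= pos_gree=;
run;
```

With a comma as the delimiter, POS_GREEN should be 5 and POS_GREE should be 0, since INDEXW only matches whole words.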

Substituting One Word for Another in a String

data convert;
   input @1 address $20.;
   *** Convert Street, Avenue and Road to their abbreviations;
   Address = tranwrd(address,'Street','St.');
   Address = tranwrd(address,'Avenue','Ave.');
   Address = tranwrd(address,'Road','Rd.');
datalines;
89 Lazy Brook Road
123 River Rd.
12 Main Street
;

Listing of Data Set CONVERT

Obs    Address
1      89 Lazy Brook Rd.
2      123 River Rd.
3      12 Main St.

Spelling Distance

data compare;
   length string1 string2 $ 15;
   input string1 string2;
   points = spedis(string1,string2);
datalines;
same same
same sam
first xirst
last lasx
receipt reciept
;

Listing of Data Set COMPARE

string1    string2    points
same       same        0
same       sam         8
first      xirst      40
last       lasx       25
receipt    reciept     7

The "ANY" Functions

data find_alpha_digit;
   input string $20.;
   first_alpha = anyalpha(string);
   first_digit = anydigit(string);
datalines;
no digits here
the 3 and 4
123 456 789
;

Listing of Data Set FIND_ALPHA_DIGIT

string            first_alpha    first_digit
no digits here    1              0
the 3 and 4       1              5
123 456 789       0              1

The "NOT" Functions: Beware of Trailing Blanks

length string $ 10;
string = '123';
position = notdigit(string);
pos_trim = notdigit(trim(string));

position = 4 (position of first blank)
pos_trim = 0

The "NOT" Functions

data data_cleaning;
   input string $20.;
   not_alpha = notalpha(trim(string));
   not_digit = notdigit(trim(string));
datalines;
abcdefg
1234567
abc123
1234abcd
;

Listing of Data Set DATA_CLEANING

string      not_alpha    not_digit
abcdefg     0            1
1234567     1            0
abc123      4            1
1234abcd    1            5

Concatenation Functions

data join_up;
   length cats $ 6 catx $ 17;
   string1 = 'ABC   ';
   string2 = '   XYZ   ';
   string3 = '12345';
   cats = cats(string1,string2);
   catx = catx('***',string1,string2,string3);
run;

cats = 'ABCXYZ'
catx = 'ABC***XYZ***12345'

Without the LENGTH statement, cats and catx would each have a length of 200.
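
The default length of 200 can be checked directly with LENGTHC. The following is a small sketch added for illustration, not part of the original slides; the variable names JOINED, STORAGE, and USED are hypothetical.

```sas
data _null_;
   /* no LENGTH statement for JOINED, so CATS sets its length */
   joined = cats('ABC   ','   XYZ   ');
   storage = lengthc(joined);   /* storage length of the result variable */
   used = length(joined);       /* characters used, ignoring trailing blanks */
   put storage= used=;
run;
```

STORAGE should be 200 and USED should be 6, which shows why declaring the length first is worthwhile when the result is stored in a data set.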

Some LENGTH Functions

data how_long;
   one = 'ABC   ';
   miss = ' ';                     /* char missing value */
   length_one = length(one);       /* 3 */
   lengthn_one = lengthn(one);     /* 3 */
   lengthc_one = lengthc(one);     /* 6 */
   length_two = length(miss);      /* 1 */
   lengthn_two = lengthn(miss);    /* 0 */
   lengthc_two = lengthc(miss);    /* 1 */
run;

The COMPARE Function

COMPARE(string1, string2 <,'modifiers'>)

Modifiers:
I   ignore case
L   remove leading blanks
:   truncate the longer string to the length of the shorter string. The default is to pad the shorter string with blanks before a comparison. (Note: similar to the =: comparison operator)

If string1 and string2 are the same, COMPARE returns a value of 0. If the arguments differ, the sign of the result is negative if string1 precedes string2 in a sort sequence, and positive if string1 follows string2 in a sort sequence. The magnitude of the result is equal to the position of the leftmost character at which the strings differ.
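
To make the parallel between the colon modifier and the =: operator concrete, here is a minimal sketch added for illustration (not from the original slides). The variable names C and SAME_PREFIX are assumptions.

```sas
data _null_;
   code = 'V30.450';
   /* ':' truncates the longer string to the length of the shorter one */
   c = compare(code,'V30',':');
   /* the =: operator makes the same kind of truncated comparison */
   if code =: 'V30' then same_prefix = 1;
   else same_prefix = 0;
   put c= same_prefix=;
run;
```

Both tests compare only the first three characters, so C should be 0 and SAME_PREFIX should be 1.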

The STRIP Function

data _null_;
   length concat $ 8;
   file print;
   one = '   ABC   ';
   two = '   XYZ   ';
   one_two = ':' || one || two || ':';
   strip = ':' || strip(one) || strip(two) || ':';
   concat = cats(':',one,two,':');
   put one_two= /
       strip= /
       concat=;
run;

one_two=:   ABC      XYZ   :
strip=:ABCXYZ:
concat=:ABCXYZ:

COUNT and COUNTC Functions

data Dracula;   /* Get it? Count Dracula */
   input string $20.;
   count_abc = count(string,'abc');
   countc_abc = countc(string,'abc');
   count_abc_i = count(string,'abc','i');
datalines;
xxabcxABCxxbbbb
cbacba
;

Listing of Data Set DRACULA

string             count_abc    countc_abc    count_abc_i
xxabcxABCxxbbbb    1            7             2
cbacba             0            6             0

Contact Information
Author: Ron Cody
You may download copies of the PowerPoint presentation from: www2.umdnj.edu/codyweb/biocomputing
