
NESUG 15 Coders' Corner

Get With the Validation Program –


Successful Steps to Edit Check Failures
Michael Rea, Synteract, Inc, Ambler, PA

ABSTRACT

Database validation is in its own class in the SAS programming arena. Some edit checks are very simple to implement; others can be very complicated. The logic is very different from that of other SAS applications such as programming tables or listings. Proficiency comes mainly with hands-on experience, but having more tools at your disposal can be very helpful.

This paper covers how to approach some of the more complicated issues that arise in edit check programming. The goal is to give beginning programmers, or programmers new to database validation, additional tools to program edit checks efficiently. These techniques are demonstrated through a coding example.

TEST PATIENTS

If it is possible for Data Management to provide you with test patients, it will help smooth out the entire validation process. Test patients contain purposely faulty data and are specifically designed to fail one or several edit checks. This can help you determine whether your edit checks are working and failing as intended. It can also assist the Data Management team in correctly identifying a problem in the database and give them something concrete to look at before taking on the actual data. It may take some time to design and enter these patients into a database, but the pros usually outweigh the cons in this situation, because it reduces the “trial and error” aspect of the process.

ECNUM AND ECKEY

The main purpose of edit check programs is to serve as a database clean-up tool for the Data Management team. The output generated from these programs tells them one of two things. In one case, failures may be the result of data entry error, which is fixed in-house. Otherwise, the investigator did not fill out the CRF according to the protocol, and thus a data clarification query has to be sent to the site to resolve the problem.

In order for the Data Management team to clean the database of errors, they must first know the source of the problem. This is where ECNUM and ECKEY come into play. When an edit check produces a failure, it is beneficial to provide output that will pinpoint which edit check failed and why. These two variables can be included in the output for every failure that occurs in a validation program.

ECNUM is a variable that stores a code that corresponds to an entry in the validation plan (edit check specification document). For example, if the edit check number (ECNUM) reads “AE011”, this indicates that data exists that violates entry number 11 in the Adverse Event (AE) section of the validation plan.

ECKEY is a character variable, created during the programming process, that stores the detail for why the failure exists. Typically, the ECKEY value will contain information that makes that failure unique, along with the additional variable values that caused the problem.

FIRST. AND LAST.

Sorting appropriately for each data set is critical to finding errors in a database. Once this is done, the BY statement can be utilized in the data step. Every variable included in the BY statement gets assigned two temporary variables, FIRST.<byvar> and LAST.<byvar>. The values assigned are true (1) or false (0) on each observation of the data set being processed. They are not shown in the output, but are available for manipulation within the data step. These are most commonly used in IF or IF-THEN statements.

If an observation is the first one for a given value of the BY variable, then FIRST.byvar is set to 1 (true). If the observation is the last one for that value of the BY variable, then LAST.byvar is set to 1 (true). If the observation is neither the first nor the last for that value, both temporary variables (FIRST.byvar and LAST.byvar) are set to 0. FIRST.byvar and LAST.byvar will each be true (1) only one time for each value of the BY variable. It is also important to note that if there is only one observation for a value of the BY variable, both FIRST.byvar and LAST.byvar will be 1 (true) on that observation.
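
The FIRST. and LAST. behavior described above can be checked with a small data step. The following is a minimal sketch rather than code from a validation project: the data set names AE_DEMO and FLAGS, the variables FIRST_PAT and LAST_PAT, and the data values are all made up for illustration. Because the temporary variables are not written to the output data set, they are copied into ordinary variables so they can be printed.

```sas
/* Hypothetical test data: patient 101 has three pages, patient 102 has one */
data ae_demo;
  input patnum pageno;
  datalines;
101 1
101 2
101 3
102 1
;
run;

/* Sort by the BY variables before using a BY statement */
proc sort data=ae_demo;
  by patnum pageno;
run;

data flags;
  set ae_demo;
  by patnum;
  /* FIRST.PATNUM and LAST.PATNUM are temporary, so copy them
     into regular variables to see them in the output */
  first_pat = first.patnum;
  last_pat  = last.patnum;
run;

proc print data=flags noobs;
run;
```

In the printed data set, the three records for patient 101 should show FIRST_PAT/LAST_PAT values of 1/0, 0/0, and 0/1, and the single record for patient 102 should show 1/1.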


RETAIN STATEMENT

By default, SAS sets the values of variables created in the data step to missing at the start of every iteration. The RETAIN statement gives the programmer the ability to compare values of specified variables across observations in a dataset. It can assign an initial value to variables and hold the value of a variable from one iteration of the data step to the next. You can change the value at any point, and the new value will be retained until you change it again.

EXAMPLE

Now that we have an idea of what the RETAIN statement is and what the temporary variables FIRST.byvar and LAST.byvar are, let us look at them in an example to show their potential in validation programs. This example checks whether the “LAST PAGE” box was checked in a series of pages, and whether it was truly the last page in the series.

    DATA ECAE (KEEP= PATNUM ECNUM ECKEY);
      LENGTH ECNUM $5 ECKEY $200;
      SET AE;
      BY PATNUM PAGENO;
      /* Carry these flags across observations within a patient */
      RETAIN LASTPG PGNO FLAG011 .;

      /* LAST PAGE box was checked somewhere for this patient */
      IF (UPCASE(LAST) EQ 'Y') THEN LASTPG=1;

      /* Box was checked on a page that is not the patient's final page */
      IF (UPCASE(LAST) EQ 'Y') AND
         (LAST.PAGENO) AND NOT (LAST.PATNUM)
      THEN DO;
        FLAG011=1; PGNO=PAGENO;
      END;

      /* Evaluate the flags on the patient's final record */
      IF (LAST.PATNUM) THEN DO;
        IF (LASTPG NE 1) THEN DO;
          ECNUM = 'AE011';
          ECKEY = "NO LAST PAGE WAS"||
                  " CHECKED, FINAL PAGENO="||
                  TRIM(LEFT(PUT(PAGENO,8.)));
          OUTPUT;
        END;
        IF (FLAG011 EQ 1) THEN DO;
          ECNUM = 'AE011';
          ECKEY = "LAST PG CHECKED,"||
                  " NOT LAST PAGE, PAGE CHECKED="||
                  TRIM(LEFT(PUT(PGNO,8.)));
          OUTPUT;
        END;
        /* Reset the retained flags before the next patient */
        LASTPG=.; PGNO=.; FLAG011=.;
      END;
    RUN;

PROC FREQ

After you have written all of your validation programs, it is best to %include them all in a batch program. It is best to give this program the same name across all your projects, for example RUN_EC. This program will run each of your programs and can be written to combine all of the edit check failure data sets into a permanent dataset. This permanent data set should also be named consistently across projects, for example ALL.SD2.

It is not uncommon to need to modify your validation programs after they have been written. This might happen because Data Management does not want a check to fail exactly as it was written in the validation plan. It is also foreseeable that you might not have written a check correctly. To help find these discrepancies, a PROC FREQ step can come in handy. After running your batch program, simply run the following code:

    PROC FREQ DATA = EDITCHK.ALL;
      TABLES ECNUM / LIST MISSING;
    RUN;

…where EDITCHK is the libname pointing to the folder where ALL.SD2 (your permanent compilation dataset) resides.

The output from this code provides a list of codes for validation entries and the number of occurrences of each within the ALL.SD2 dataset. These occurrences can then be compared with the validation plan. When looking at this listing, you can find outliers: checks that are not failing at all, or that are failing much too often, either of which alerts you to a potential programming issue. This is a good place to start searching for possible mistakes or discrepancies within the validation program.

SUMMARY

There are many more issues that you will encounter when writing validation programs other than the ones I have presented here. Hopefully the techniques I have described will help in tackling more varied validation programming tasks.

REFERENCES

SAS Language Version 6 First Edition.

SAS is a registered trademark of the SAS Institute Inc., Cary, NC, USA.

ACKNOWLEDGEMENTS

I would like to thank Tim Kelly and Daphne Ewing for their input regarding the paper, and Synteract for giving me the opportunity to present at NESUG.

AUTHOR INFORMATION

The author of this paper can be contacted as follows:

Michael Rea, Synteract, Inc.
Voice: (215) 283-9370
FAX: (215) 283-9366
EMS: MRea@synteract.com
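
The RUN_EC batch program and the ALL compilation data set described in the PROC FREQ section could be organized along the following lines. This is only a sketch under stated assumptions: the folder paths, the program file names (ec_ae.sas, ec_cm.sas), and the ECCM data set are hypothetical, and the code will run only if those files exist and each one creates a failure data set containing PATNUM, ECNUM, and ECKEY.

```sas
/* RUN_EC.SAS: sketch of a batch driver.
   All paths and program names below are assumptions. */
libname editchk 'C:\study\editchk';       /* assumed folder for ALL */

/* Run each validation program in turn */
%include 'C:\study\programs\ec_ae.sas';   /* assumed to create WORK.ECAE */
%include 'C:\study\programs\ec_cm.sas';   /* assumed to create WORK.ECCM */

/* Stack the failure data sets into one permanent data set */
data editchk.all;
  length ecnum $5 eckey $200;             /* keep lengths consistent */
  set ecae eccm;
run;
```

Declaring the lengths of ECNUM and ECKEY before the SET statement is one way to keep ECKEY text from being truncated if an individual program defines a shorter length.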