
Saving Trees: Managing the Edit Check Process Electronically

Tim Kelly, Synteract, Inc., Ambler, PA

Abstract

The whole process of cleaning clinical trials data can be very time consuming and frustrating. A computer can do this work better and more accurately, and can serve as the “eyes” for cleaning data. Without a validated procedure for storing and retrieving the edit check failures, a lot of energy is spent reviewing old and known issues multiple times. Due to the iterative nature of data cleaning, managing this process electronically can save time and trees. Organizing this process programmatically will enable data management to better use their time on the current issues, link the failures to queries, and generate reports on outstanding issues. This paper focuses on a technique for managing the edit check failure information and the many benefits it provides.

Edit Check Document

The edit check process begins with the creation of the Edit Check Document. This document drives the edit check programming process. The document is created so that the checks are noted for each data set in the order that data set is encountered on the Case Report Form (CRF). Typically, Global* edit checks are identified first, followed by the edits for each ensuing data set. Each check is assigned an edit check number as well as an edit check type. The CRF issues describe the actual problems that might occur in the database (on the CRF). Sample query text is included in the Edit Check Document as well.

* Global edit checks are edits that are common across the study in all or most data sets. These edits are included in a separate section named Global, instead of being repeated in every data set.

Edit Check Numbers

Below is an example of how the edit checks can be created:

EC Category     This is a two-character representation of where on the
                CRF the issue exists (e.g. Global checks: GL;
                Inclusion: IN; Exclusion: EX; Demography: DE, etc.).
                These categories must be unique by panel/data set.
EC Number       This is a three-digit number (usually no decimal
                places) following the EC Category code. These are
                numbered in order based on the issues in the Edit
                Check Document. These numbers must be unique within
                the panel/data set.
EC Description  This is a description of what the actual edit check
                is doing.
Figure 1

Edit Check Types

Along with the edit check number is a definition of the Edit Check Type:

SAS       Programmed
SAS/MAN   Programmed, but must be reviewed to look at other things
          on the CRF to ensure it really is an issue
LC        Logical Change
DC        Data Convention, not programmed; should be the way the
          data are entered
CRA/LST   Client-requested reports for the CRAs to review
DMS/LST   Reports of checks that are programmatically infeasible
Figure 2

All edit check program names should be created using a predefined naming convention, e.g. beginning with “EC_” followed by the two-character EC Category as noted in the Edit Check Document. This will make it easy to locate the program from the document. Exception: Global checks and special across-data-set checks may be named differently (%include a common program containing global checking macros, etc.).

Edit Check Failure Data Set Names

When programming the edit check failures, each program named “EC_xx.SAS” should create a SAS data set with the name “ECxx.SD2”. The xx in the filename must match the two-character EC Category noted in the Edit Check Document, and the failures generated from that program should be the ones starting with those same two characters.

Edit Check Program Output

The output from the edit check programs, as mentioned above, will be contained in data sets named “ECxx.SD2”. The variables inside the failure data sets are essential for a good edit check process. To be consistent, all the failure data sets should have the same structure, so that the information can be easily extracted regardless of which failures need to be looked at. Figure 3 shows the structure of a failure data set.
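The naming rules from the preceding sections (program “EC_xx.SAS”, failure data set “ECxx.SD2”, and edit check numbers that start with the same two characters) can be sketched in a few lines. This is only an illustration in Python, not the SAS code the paper refers to; the helper names are hypothetical, and it assumes that an EC Category is exactly two alphanumeric characters and that an EC Number is a whole number from 1 to 999 (the occasional decimal numbers mentioned in Figure 1 are not handled).

```python
# Hypothetical helpers illustrating the naming convention; not from the paper.

def _check_category(category: str) -> None:
    # Assumption: an EC Category is exactly two alphanumeric characters.
    if len(category) != 2 or not category.isalnum():
        raise ValueError(f"EC Category must be two characters: {category!r}")


def program_name(category: str) -> str:
    """Name of the edit check program for a category, e.g. EC_IN.SAS."""
    _check_category(category)
    return f"EC_{category}.SAS"


def failure_dataset_name(category: str) -> str:
    """Name of the failure data set that program creates, e.g. ECIN.SD2."""
    _check_category(category)
    return f"EC{category}.SD2"


def edit_check_number(category: str, number: int) -> str:
    """EC Category followed by a three-digit number, e.g. IN001."""
    _check_category(category)
    if not 1 <= number <= 999:
        raise ValueError("EC Number must be between 1 and 999")
    return f"{category}{number:03d}"


def belongs_in(ecnum: str, category: str) -> bool:
    """True if a failure with this edit check number belongs in ECxx.SD2."""
    _check_category(category)
    return ecnum.startswith(category)
```

For example, `edit_check_number("CS", 6)` gives the number used later in the paper's own example, and `belongs_in("CS006", "CS")` confirms that such a failure would be written to the data set named by `failure_dataset_name("CS")`.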
#  Var    Type  Len  Pos  Label
1  ECNUM  Char   10    0  EDIT CHECK NUMBER
2  ECKEY  Char  200   10  EDIT CHECK KEY FIELDS
3  INV    Char    9  210  STUDY NUMBER
4  PAT    Char    4  219  PATIENT NUMBER
5  VISIT  Num     8  223  VISIT NUMBER
6  PGNO   Char    4  231  PAGE NUMBER
Figure 3

The contents in Figure 3 show that only six variables are kept. These variables describe the pertinent information for the record that failed. First there are the Investigator, Patient, Visit, and Page Number, which identify exactly who is failing and where. The other two variables kept in this failure data set are the ECNUM field, which simply states the edit check number (e.g. IN001), and the ECKEY field. The ECKEY field is determined by Data Management and Programmers in order to have all the necessary information for reviewing the specific edit check failure. The ECKEY value includes the key variables, and their values from the clinical data, that made the edit check fail in the first place. All these fields together are what make each record unique. For example, if a patient is failing an edit check which states “Every Patient must have a valid final status on the Clinical Summary page” (CS006), then the key variable (ECKEY) for this failure would be the final status variable. The failure data set would have a record that looks like this:

ECNUM  ECKEY       INV  PAT   VISIT  PGNO
CS006  FINLSTAT=7  100  0001  7      63
Figure 4

The table in Figure 4 shows that the Investigator/Patient combination 100/0001 is failing an edit check on page 63 for visit 7, and that the key variable information for edit check failure CS006 is the Final Status field, which is displayed along with its value. Some other edit checks may have multiple key variables that are pertinent to a specific failure; that is why the ECKEY field was created with a length of 200.

Edit Check Planning

Normally a study will have several data sets, and within each of these data sets there will be a cluster of edit checks to make sure the data are clean. This can lead to several hundred edit checks across all the data sets. When these edit check programs are run, even an average-size study can have an overabundance of failures, even if the data are only somewhat dirty.

What is needed is a way to manage the edit check failures so as to minimize the time spent looking at old and new issues. Some failures will need to be queried, some will be self-evident corrections, and some may be addressed and determined to be irresolvable based on information found on the CRF. There are many reasons why an edit will fail, and even more as to what needs to be done to fix it. Having this process automated will cut down on time and paper in the long run, besides saving the sanity of your data management team.

The next step in managing the failures is to set up an application. This utility has many uses. The first is the capability to give each failure a Status Code. The status code provides a label for each specific failure. If a failure is queried, then a Queried status can be applied along with the Query Number. If it is a self-evident correction (or data entry error), or it is an issue that has been looked at but cannot be changed, then those statuses can be recorded as well. Another utility enables data management to generate reports of all the outstanding issues (failures), of all the failures that have been queried or requeried, of all the irresolvable issues, and many more. It can also be used to produce a listing of all the failures by visit, investigator, patient, or edit check number. These functions become useful once the failures have been loaded into the system. That process is described in detail in the next section.

Edit Check Master Process

Now that the edit check programs have been verified against the Edit Check Document and the failure data sets have been created, some data changes are inevitable. Once the data are changed, some of the failures may no longer appear, or the change may cause other edit checks to fail. Since the edit check programs will need to be rerun several times during the course of a study, the failures that have already been addressed do not need to be displayed again. This is where the Master data set is set up and the Status Code comes into play. The EC_MASTR data set holds all the information from each edit check failure data set along with a few other variables: STATUS, QNUM, FAILDT. These are the Status Code, Query Number, and Failure Date variables, respectively. The Status Code variable can have many values:

Status Code Values:

A = Addressed
I = Irresolvable
Q = Queried
R = Query Resolved
O = Outstanding Issue

The SC (Status Code) of ‘A’ for Addressed means that Data Management has looked at the issue and realized that it does fail the edit check, but the data are actually valid and will not change. An SC of ‘I’ for Irresolvable,
means that a Query was sent out and came back determining that the request is not possible or the information needed is not accessible, so the data will not change. An SC of ‘Q’ means that the issue has been queried and has not come back yet; for cross-referencing purposes, the QNUM variable will be populated with the actual query number that was sent for this issue. An SC of ‘R’ means that the issue has been queried and came back, but the record is still failing; this would also have the QNUM field populated. The last SC of ‘O’ is for the outstanding issues that have not been looked at yet. The STATUS variable is set to ‘O’ at the time a failure is loaded into the EC_MASTR data set. The other variable inside EC_MASTR is FAILDT, which is the date the record first failed.

The other six variables inside EC_MASTR are the ones coming from the individual failure data sets (ECNUM, ECKEY, INV, PAT, VISIT, PGNO). These six fields define the uniqueness of a failure and determine whether or not the failure already exists in EC_MASTR. The STATUS variable is used to determine whether the failure should be reloaded or not. New failures, as determined by the aforementioned six variables, will be loaded with a status of ‘O’. When rerunning the edit checks and then comparing the current failures with the EC_MASTR data set, records that are no longer failing and still have a status of ‘O’ in EC_MASTR will be deleted (as that implies the data have been updated for one reason or another and the record is no longer failing). Already existing failures will not be reloaded and will retain their present information in EC_MASTR. If there is a record that is failing but resides in EC_MASTR with a STATUS of ‘R’, it will not be reloaded, but rather be output in a report during the load-new-failures process, to show that the issue has a resolved query attached to it but is still an issue.

Updating Status Code Values

The application is now ready, and the edit check programs have been run to create the failure data sets. The EC_MASTR process is invoked for the first time, loading all the failures into the EC_MASTR data set with all the status codes set to ‘O’. The next step is to start modifying the status codes. The application is set up so there is a SAS/AF screen for Data Managers/Reviewers to modify the status codes of the failures. Besides being changed manually, the status codes can be altered one other way. After the Master process is run for the first time, each ensuing run will update the status codes of ‘Q’ prior to comparing the new failures with what already exists in the EC_MASTR data set. All the records with a status code of ‘Q’ will be merged with the Query file to get the most up-to-date Query information. If the Query is resolved in the Query file, then the status code will be changed to an ‘R’ in the EC_MASTR data set. This means data management still only has to modify the Query information in one place. This needs to be done to ensure that the current information is being used, because one part of the EC Master process is to generate a report of all the failures that have resolved queries attached to them but are still failing.

Conclusion

The EC Master process will continue until all the records in the EC_MASTR data set are no longer outstanding, there are no more unresolved queries, and there are no new failures. Once this happens, the edit check process is complete, and the EC_MASTR data set has all the information for every failure that occurred in case a future inquiry is needed. Having this information stored electronically along with a good application gives the users the opportunity to rerun the edit check programs and ensure there are not any new or even old issues outstanding. The old way of managing the edit check failures manually would still yield some failures even when the data are clean. This wastes time on looking over known issues that will never be changed.

This whole process is long and tedious, but having a status on each failure eliminates looking at the same issues multiple times, thus reducing the time it takes to clean the data in the long run and saving some trees along the way.

Acknowledgements

I would like to thank the following Synteract folk for their input and for making this whole process work: Daphne Ewing, Russell Holmes, Brian Shilling, Clark Roberts, and Barb McLaughlin.

Author Information

Tim Kelly
voice: (215)-283-9370  FAX: by request
e-mail: tkelly@synteract.com
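As a closing illustration, the load step described under “Edit Check Master Process” and “Updating Status Code Values” can be sketched as follows. This is a minimal sketch in Python under stated assumptions, not the SAS implementation the paper describes: the function and class names are hypothetical, EC_MASTR is modeled as a dictionary keyed by the six identifying variables, and the Query file is reduced to a set of query numbers that have been resolved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Set, Tuple

# The six variables that make a failure unique:
# (ECNUM, ECKEY, INV, PAT, VISIT, PGNO)
Key = Tuple[str, str, str, str, int, str]


@dataclass
class MasterRecord:
    status: str           # 'A', 'I', 'Q', 'R' or 'O'
    qnum: Optional[str]   # query number, populated for 'Q' and 'R'
    faildt: date          # date the record first failed


def run_master_load(master: Dict[Key, MasterRecord],
                    current_failures: Iterable[Key],
                    resolved_queries: Set[str],
                    run_date: date) -> List[Key]:
    """Update the master in place; return failures with a resolved
    query attached that are still failing."""
    current = set(current_failures)

    # 1. Refresh 'Q' records from the query information first.
    #    (Assumption: the Query file is reduced to resolved query numbers.)
    for rec in master.values():
        if rec.status == "Q" and rec.qnum in resolved_queries:
            rec.status = "R"

    # 2. Outstanding records that no longer fail are deleted.
    stale = [k for k, r in master.items()
             if r.status == "O" and k not in current]
    for key in stale:
        del master[key]

    # 3. New failures are loaded as 'O'; existing ones are left untouched.
    for key in current:
        if key not in master:
            master[key] = MasterRecord("O", None, run_date)

    # 4. Report records that are still failing despite a resolved query.
    return sorted(k for k in current if master[k].status == "R")
```

Records with a status of ‘A’, ‘I’, ‘Q’, or ‘R’ are never deleted by this sketch, even when they stop failing, which matches the paper's statement that only records still marked ‘O’ are removed. Status changes made by reviewers (the SAS/AF screen in the paper) would simply edit `status` and `qnum` between runs.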
