
| Biostatistics



SAS and R Team in Clinical Research

Without data and a formal process of searching for proof to either support or disprove them, stated hypotheses are nothing but mere opinions. Evidence-based medicine is no exception – and a range of statistical methods is available to evaluate new drugs

Adrian Olszewski at KCR

Statistical analysis constitutes an essential part of all serious scientific research. By conducting controlled experiments, medical, genetics and pharma companies can assess various objectives, including the safety, efficacy, tolerability, pharmacokinetics and pharmacodynamics of the developed therapy.

Over recent years, huge developments have been made in the field of biostatistics. Not so long ago, multi-way ANOVA was considered a fairly sophisticated statistical method, as were ridge regression and factor analysis. Nowadays, however, statistical methods are becoming more and more advanced, as well as computationally intensive. Complex trials involving the use of advanced statistical methods – such as nonlinear mixed-effects models, generalised estimating equations (GEE), simulation-based inference and multiple imputation of data – require appropriate tools.

SAS and GNU R

SAS® and GNU R are two of the leading packages in statistical analysis. Employing both of them in cooperation may result in tangible benefits for contract research organisations (CROs).

SAS
Provided by the SAS Institute and well established in clinical research, SAS is a reliable and powerful package and is therefore recognised as the industry standard. The set of implemented statistical methods covers most of the classic and modern algorithms. SAS is perfectly suited to processing data sources consisting of millions of records, and is a good solution –

18 l November 2016
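[Figure 1: Diagram of SAS and R working together. Labels: SAS base; SAS module 1; SAS module 2; SAS IML module or different method of communication; bidirectional data exchange; required algorithm or functionality; missing or expensive functionality. The required algorithm is illustrated by the kernel density estimator $\frac{1}{nh^{d}}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$.]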

especially for medium- and large-sized teams of analysts in advanced, corporate settings.

The software must be bought, and certain functionalities are grouped and packed in modules, which can be purchased separately. Buying the licence also grants access to a professional helpdesk.

GNU R
Developed by the R Core Team and supported by the R Consortium, GNU R (R) is one of the most popular and most recognisable computational environments. It is a direct successor of the S programming language, created in 1976 at Bell Laboratories. In 1998, S became the first statistical system to receive the top award from the Association for Computing Machinery.

Today, R can be found in almost every area of science and industry, such as medicine, pharmacy, genetics, epidemiology, banking, social media, data mining and machine learning. In particular, its capabilities in clinical research and genomics are remarkable. Many of the top companies – including pharmaceutical ones – not only use R, but also contribute by supplying specialised packages (1,2). Likewise, R is popular at universities and in research departments, where new algorithms are invented – for example, at CERN, NASA and the National Institute of Standards and Technology.

R is well known for its flexibility, full-featured language, strong graphical and reporting capabilities, and ability to access data in multiple formats, as well as for the huge number of statistical methods it offers, stored in nearly 9,000 additional packages (3). It is free software released under the GNU General Public License; however, companies like Microsoft, Oracle and RStudio offer commercial, tweaked versions as well. The support provided by the broad and vibrant community of both users and institutions is comprehensive.

R in Controlled Trials

It is sometimes claimed that R is not validated and, as a result, cannot be used in controlled trials and environments. This requires a deeper explanation but, in brief, it is a common misunderstanding. In fact, every CRO develops processes for the broadly understood verification of both the software and the programmes written in it. Thus, validation is a constant part of creating programmes, and should be carried out regardless of any assurances made by the software vendor.

Since the relevant guidelines released by the FDA, ICH and the R Foundation for Statistical Computing – as well as the entire source code of R and all its packages – are publicly available, validation is readily achievable (4-7). Every R package has an author or maintainer, who must follow rules in order to publish the package on the Comprehensive R Archive Network.

There is also an informal though significant argument: R is created by professional statisticians and used by over two million individuals (8,9). As the code is open, anyone can verify it. Archived messages in mailing groups, forums and GitHub show both that new procedures are constantly checked and improved and that the older ones are clean and stable. It is also important to note that a local, version-frozen repository of packages can be set up in order to ensure reproducibility of results, regardless of possible changes in the code (10).

Last but not least, R has been identified by the FDA as suitable both for interpreting data from clinical trials and for making submissions (11).

Why Combine SAS and R?

Every piece of software has its strengths and weaknesses, and so does SAS. There are tasks that can be completed more easily or more cheaply by employing external programmes –

www.samedanltd.com l 19

in other words, searching for process and cost optimisations is nothing unusual.

Most of the top analytical software packages currently available on the market offer a way to communicate with R, and SAS is no exception (12). Since 2009, the SAS Interactive Matrix Language (IML) module has enabled bidirectional communication between both packages (13) – but this is not the only way to establish such a connection (14-16).

The following are some example scenarios in which cooperation between SAS and R seems worthwhile, enhancing the present toolkit and significantly reducing costs.

Scenario 1: Accessing Data
SAS provides a wide set of modules for data access, commonly named 'SAS/ACCESS'. The modules perform very well, although they must be purchased. They are worth considering when dealing with huge data volumes, but in clinical research, it is common to process relatively small datasets. It therefore seems reasonable to have a closer look at the data access capabilities offered by R.

In R, data from SAS (native and transport formats), as well as from many other statistical packages, can be easily obtained. Most relational databases are supported directly or through the JDBC/ODBC interface; Microsoft (MS) Office spreadsheets may be read and modified directly, and web services are consumed via the JSON data format. The XML data format is fully supported, enabling information exchange with electronic data capture systems. With only a few lines of code, data can be fetched from a given source, then processed locally or sent to SAS. Data can be exported from SAS in the same way.

Scenario 2: Exporting Results
SAS has an advanced Output Delivery System (ODS), which is able to produce professional-looking rich text format (RTF) documents. RTF is acceptable and offers a wealth of formatting options, but there may be a requirement to prepare a native MS Word document instead. This option is currently missing in ODS. SAS does provide an add-in for MS Office – but this is part of a separate module, the SAS Platform for Business Analytics.

This is where R can be used, thanks to its advanced reporting capabilities. ReporteRs is a package deserving special attention, as it allows the user to produce native MS Word and MS PowerPoint documents. The package supports the creation of tables of any complexity – like clinical tables, for instance. Most aspects of generated objects can be controlled: size, font, colours, thickness and type of lines, borders and shadows, to name a few.

Plots can be transformed into vector graphics and become editable and scalable. Produced this way, graphics may be resized to any degree. Existing documents can be turned into templates with additional placeholders for tables and graphics. This facilitates the creation of documents conforming to corporate standards.

Scenario 3: Missing Methods
Although SAS is full of advanced statistical methods, it is not unusual to find one missing – such as exact confidence intervals for the difference of two proportions. The method can be programmed in SAS, but R has the ExactCIdiff package, which already implements it (17). The code calling the required functions will be just another statement in an existing SAS programme.

The gsDesign package, which helps to derive group sequential designs and describe their advanced properties, is another example of a specialised R module called from SAS (2). Another example is the use of R as a connector between SAS and the ADMB package applied for automatic


differentiation, used for advanced,
nonlinear statistical modelling (18).
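
To make the remark in Scenario 3 that such an interval "can be programmed" more concrete, here is a minimal Python sketch of a confidence interval for the difference of two proportions. It is an assumption-laden illustration: it uses the approximate Newcombe hybrid score method built on Wilson intervals, not the exact method implemented in ExactCIdiff, and the function names and example counts are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wilson_interval(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a single binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def newcombe_diff_interval(x1: int, n1: int, x2: int, n2: int,
                           z: float = Z_95) -> tuple[float, float]:
    """Newcombe hybrid score interval for p1 - p2 (approximate, not exact)."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, z)
    l2, u2 = wilson_interval(x2, n2, z)
    diff = p1 - p2
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 56 of 70 responders versus 48 of 80 responders.
    low, high = newcombe_diff_interval(56, 70, 48, 80)
    print(f"difference = {56 / 70 - 48 / 80:.3f}, 95% CI = ({low:.4f}, {high:.4f})")
```

Even this simple approximate method takes a few dozen lines to write and test, which suggests why calling an existing, peer-reviewed package may be the cheaper route for the exact version.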

Scenario 4: Validation
Although there is no requirement for critical parts of statistical programmes to be validated in a different statistical package, our experience shows that following this route may improve the quality of the validation. The change in the way of thinking forced by using a different programming language may alter the perspective on, and the reading of, the validation instructions, which helps to detect and resolve issues.
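
The cross-package validation described in Scenario 4 usually ends with a comparison of two independently produced sets of results. The following Python sketch shows one way such a comparison might look; the function name, the tolerances and the example statistics are all hypothetical, and a numerical tolerance is assumed because two packages rarely agree to the last floating point digit.

```python
import math


def compare_results(primary: dict, independent: dict,
                    rel_tol: float = 1e-8, abs_tol: float = 1e-12) -> list:
    """Return descriptions of discrepancies between two sets of named statistics."""
    issues = []
    for name in sorted(set(primary) | set(independent)):
        if name not in primary or name not in independent:
            issues.append(f"{name}: present in only one programme")
            continue
        a, b = primary[name], independent[name]
        # Tolerant comparison: tiny floating point differences are not findings.
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            issues.append(f"{name}: {a!r} vs {b!r}")
    return issues


if __name__ == "__main__":
    # Hypothetical outputs of a production programme and its independent re-implementation.
    production = {"mean_change": 1.2345678901, "p_value": 0.0312, "n": 120}
    validation = {"mean_change": 1.2345678902, "p_value": 0.0312, "n": 119}
    for line in compare_results(production, validation):
        print(line)
```

The tolerances are a design choice that would need to be agreed and documented in the validation plan rather than hard-coded as they are here.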
Scenario 5: Advanced Graphing
There is an advanced graphing subsystem in SAS called SAS/GRAPH, which can produce high-quality plots. Opinions vary, but some programmers describe the process as slightly complex. In such a situation, it is worth noting that R is capable of producing advanced and professional-looking graphs by using the famous ggplot2 package, an implementation of the 'Grammar of Graphics'. There are also other graphing subsystems available.

Scenario 6: Exposing Results
R may help to expose the results of analyses done in SAS over a network. There are three packages that are able to simplify the process: the first is knitr, a general-purpose package for dynamic report generation following the reproducible research paradigm. Reports are created by mixing formatted content with chunks of R, SAS and SQL code. When processed, the results either replace the commands or are shown alongside them. Meanwhile, the OpenCPU and Shiny packages help to set up a full-featured web server that is able to host dynamic web applications and reports.

Scenario 7: R-Based Tools
As lightweight and fully portable software that requires no installation and works on various operating systems and architectures (including ARM-based minicomputers), R is a good candidate for a framework used to create advanced statistical solutions, such as:

• Automated processes searching a database for potential fraud
• Local, handy windows-based analytical tools
• Web-enabled reporting systems and dashboards

Key Considerations

SAS and R are two different worlds, so connecting them may result in issues significantly affecting the results of the analysis. Some fundamental discrepancies are:

• Origin of dates
• Representations of floating point numbers
• Type of sum of squares used
• Default contrasts
• Calculation of quantiles
• Generation of random numbers
• Implementation of advanced models

All of these must be taken into account when integrating both systems or validating the results of analyses.

In summary, the integration of SAS with R packages, when done properly, may bring noticeable benefits in terms of enhanced functionality and reduced costs. The scenarios shown above do not exhaust the list, which is limited mostly by one's experience and inventiveness.

References
1. Visit: http://blog.revolutionanalytics.com/2014/05/companies-using-r-in-2014.html
2. Visit: www.cioreview.com/news/gsdesign-explorer-to-optimize-merck-s-clinical-trial-process-nid-1305-cid-36.html
3. Visit: www.r-clinical-research.com
4. Visit: www.fda.gov/ohrms/dockets/98fr/04d-0440-gdl0002.pdf
5. Visit: www.fda.gov/ohrms/dockets/98fr/5667fnl.pdf
6. Visit: www.fda.gov/regulatoryinformation/guidances/ucm085281.htm
7. Visit: www.r-project.org/doc/R-FDA.pdf
8. Visit: www.r-project.org/foundation/board.html
9. Visit: www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/bringing-r-to-the-enterprise-1956618.pdf
10. Visit: https://mran.microsoft.com/documents/rro/reproducibility
11. Visit: http://blog.revolutionanalytics.com/2012/06/fda-r-ok.html
12. Visit: https://support.sas.com/rnd/app/studio/Rinterface2.html
13. Visit: http://support.sas.com/documentation/cdl/en/imlug/63541/HTML/default/viewer.htm#r_toc.htm
14. Visit: www.jstatsoft.org/article/view/v046c02/v46c02.pdf
15. Visit: www.lexjansen.com/nesug/nesug12/bb/bb10.pdf
16. Visit: www.phuse.eu/download.aspx?type=cms&docid=2847
17. Visit: https://journal.r-project.org/archive/2013-2/wang-shan.pdf
18. Visit: www.admb-project.org

Adrian Olszewski is a Biostatistician in the Biometrics and Clinical Trial Data Execution Systems Department at KCR. He is responsible for providing comprehensive support for trials from early design considerations, through the data analysis – including interim evaluations – to the final report. Adrian holds an MSc degree in Computer Science.
Email: info@kcrcro.com

