ENAR 2013
Spring Meeting
March 10– 13
E NT
ES POSTER PR
1c. SEMIPARAMETRIC PROPORTIONAL RATE
REGRESSION FOR THE COMPOSITE ENDPOINT
OF RECURRENT AND TERMINAL EVENTS
Lu Mao*, University of North Carolina, Chapel Hill
Danyu Lin, University of North Carolina, Chapel Hill
ENAR 2013 Analysis of recurrent event data has received tremendous
S|
the overall covariate effects, we consider the composite
March 10 – 13 CT endpoint of recurrent and terminal events and propose
A
a proportional rate model which specifies that (possibly
ABS
on the marginal rate function of the composite event pro-
cess. We derive appropriate estimators for the regression
parameters and the baseline mean function by modifying
the familiar inverse probability weighting technique. We
show that the estimators are consistent and asymptoti-
cally normal with variances that can be consistently
estimated. Simulation studies demonstrate that the
proposed methods perform well in realistic situations.
An application to the Community Programs for Clinical
1. POSTERS: CLINICAL TRIALS AND 1b. INTERACTIVE Q-LEARNING FOR DYNAMIC Research on AIDS (CPCRA) study is provided.
TREATMENT REGIMES
STUDY DESIGN Kristin A. Linn*, North Carolina State University email: lmao@unc.edu
Eric B. Laber, North Carolina State University
1a. OPTIMAL BAYESIAN ADAPTIVE TRIAL
Leonard A. Stefanski, North Carolina State University
OF PERSONALIZED MEDICINE IN CANCER
Yifan Zhang*, Harvard University 1d. DETECTION OF OUTLIERS AND INFLUENTIAL
Forming evidence-based rules for optimal treatment POINTS IN MULTIVARIATE LONGITUDINAL
Lorenzo Trippa, Harvard University, Dana-Farber
allocation over time is a priority in personalized medicine MODELS
Cancer Institute
research. Such rules must be estimated from data col- Yun Ling*, University of Pittsburgh School of Medicine
Giovanni Parmigiani, Harvard University, Dana-Farber
lected in observational or randomized studies. Popular Stewart J. Anderson, University of Pittsburgh
Cancer Institute
methods for estimating optimal sequential decision rules School of Medicine
from data, such as Q-learning, are approximate dynamic Richard A. Bilonick, University of Pittsburgh
Clinical biomarkers play an important role in personalized
programming algorithms that require modeling non- School of Medicine
medicine in cancer clinical trials. An adaptive trial design
smooth transformations of the data. Postulating a simple, Gadi Wollstein, University of Pittsburgh
enables researchers to use treatment results observed
well-fitting model for the transformed data can be diffi- School of Medicine
from early patients to aid in treatment decisions of later
cult, and under many simple generative models the most
patients. We describe a biomarker-incorporated Bayesian
commonly employed working models—namely linear In clinical trials, multiple characteristics of individuals
adaptive trial design. This trial design is the optimal
models—are known to be misspecified. We propose an are repeatedly measured. Multivariate longitudinal data
strategy that maximizes the total patient responses. We
alternative strategy for estimating optimal sequential allow one to analyze the joint evolution of multiple char-
study the effects of the biomarker and marker group
decision rules wherein all modeling takes place before acteristics over time. Detection of outlier and influential
proportions on the total utility and present comparisons
applying non-smooth transformations of the data. points for multivariate longitudinal data is important to
between the optimal trial design with other adaptive
This simple change of ordering between modeling and understand potentially critical multivariate observations
trial designs.
transforming the data leads to high quality estimated which can unduly influence the results of analyses. In this
sequential decision rules. Additionally, the proposed presentation, we propose a new approach that extends
email: yifanzhangyifan@gmail.com
estimators involve only conditional mean and variance Cook’s distance to multivariate mixed effect models,
modeling of smooth functionals of the data. Consequent- conditional on different characteristics and subjects. Our
ly, standard statistical procedures for exploratory analysis, approach allows different types of outliers and influential
model building, and validation can be used. Furthermore, points: it could be one or more measurements on an
under minimal assumptions, the proposed estimators individual at a single time point, or all measurements
enjoy simple normal limit theory. on that individual over time. Our approach also takes
email: kalinn@ncsu.edu
Abstracts 2
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
Holm (1979). The Hommel (1988) procedure is even 2. POSTERS: BAYESIAN METHODS /
S
AB
Abstracts 4
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
2i. EFFICIENT SAMPLING METHODS FOR Due to the existence of strong correlations between
Current methods for inference of somatic DNA aberrations
MULTIVARIATE NORMAL AND STUDENT-t expression levels of different genes, the procedures which
in tumor genomes using SNP DNA microarrays require
DISTRIBUTIONS SUBJECT TO LINEAR are commonly used to detect the genes differentially
sufficiently high tumor purities (10-15%) to detect an
CONSTRAINTS expressed between two or more phenotypes are unable
increase in the variation of particular features of the
Yifang Li*, North Carolina State University to overcome the two main problems: high instability of
microarray, such as the B allele frequency or total allelic
Sujit K. Ghosh, North Carolina State University the number of false discoveries and low power. It may be
intensity (logR ratio). By incorporating information from
impossible to completely understand these correlations
the germline genome, we are able to detect aberra-
Sampling from a truncated multivariate normal distribu- due to the complexity of their biological nature. We have
tions at vastly lower tumor proportions (e.g. 3-4%). Our
tion subject to multiple linear inequality constraints is a proposed a new multiple testing method to balance type
likelihood-based approach integrates a hidden Markov
recurring problem in many areas in statistics and econo- $I$ and type $II$ errors in an optimal, in a sense, way.
model (HMM) for population haplotype variation for the
metrics, such as the order restricted regressions, censored However, the correlation structure of microarray data is
germline genome (Scheet & Stephens, AJHG 78:629,
models, and shape-restricted nonparametric regressions. still the main obstacle standing in the way of this and
2006) with an HMM for DNA aberrations in the tumor.
However, this is not an easy problem due to the existence other gene selection procedures. To remove this obstacle,
Thus, our approach directly accounts for the perturba-
of the normalizing constant involving the probability of we further improve the statistical methodology by
tions in the data that would be expected from actual
the multivariate normal distribution. In this paper, we es- exploiting the property of low dependency between the
chromosomal-level aberrations. Our method reports
tablish an efficient mixed rejection sampling method for terms of the so-called $\delta$-sequence proposed by
mean allele specific copy number, as well as marginal
the truncated univariate normal distribution. Our method Klebanov and Yakovlev ($2007b$). In this paper, we will
probabilities of aberration types in tumor DNA. We test
has uniformly larger acceptance rates than the popular review and further study the application of the $\delta$-
our method on real and simulated data based on breast
existing methods. Since the full conditional distribution sequence in the selection of the significantly changed
cancer cell line and Illumina 370K array, and identify
of a truncated multivariate normal distribution is also genes. We will examine the use of the $\delta$-sequence
aberration regions of 11Mb length in 3% tumor purities.
truncated normal, we employ our univariate sampling in conjunction with the Bonferroni adjustment and
We expect our method to provide more accurate inference
method and implement the Gibbs sampler for sampling balancing type $I$ and type $II$ errors, and will discuss
of copy number changes in a variety of settings (tumor
from the truncated multivariate normal distribution the results of analysis of some real microarray data. The
profiles, somatic variation). And more generally, our
with convex polytope restriction regions. Experiments comparison with the univariate gene selection method
approach establishes an integrated statistical framework
show that our proposed Gibbs sampler is accurate and will also be discussed.
for studying inherited and tumor genomes.
has good mixing property with fast convergence. Since a
Student-t distribution can be obtained by taking the ratio email: linlin.chen@gmail.com
email: rxia@mdanderson.org
of a multivariate normal distribution and an independent
chi-squared distribution, we can also easily generate this
sampling method to the truncated multivariate Student-t 3d. APPLICATION OF BILINEAR MODELS TO
3b. A PROFILE-TEST FOR MICRORNA MICROARRAY
distribution, which is also a common encountered THREE GENOME-WIDE EXPRESSION ANALYSIS
DATA ANALYSIS
problem. PROBLEMS
Bin Wang*, University of South Alabama
Pamela J. Lescault, University of Vermont
email: yli40@ncsu.edu Julie A. Dragon, University of Vermont
MicroRNA is a set of small RNA molecules mediating gene
Jeffrey P. Bond*, University of Vermont
expression at post-transcriptional/translational levels.
Most of well-established high throughput discovery
The development of biomedical processes that include
platforms, such as microarray, real time quantitative PCR,
the production and interpretation of genome-wide
and sequencing, have been adapted to study microRNA
expression profiles often involves comparison of alterna-
in various human diseases. Analyzing microRNA data
tive technologies. We study two examples in which
is challenging due to the fact that the total number
expression profiles are obtained from each member of a
of microRNAs in humans small and the majority of
set of cellular samples using two different technologies.
microRNA maintains relatively low abundance in the cells.
In the first case the two technologies are alternatives for
The signals of these low-expressed microRNAs are greatly
obtaining purified cell populations from cell mixtures. In
affected by non-specific signals including the background
the second case the two technologies are alternatives for
obtaining gene expression profiles from RNA samples.
Abstracts 6
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
3k. IN SILICO POOLING DESIGNS FOR ChIP-Seq stratification by sequencing depth. In addition, compared
S
AB
CONTROL EXPERIMENTS with our other proposed models, our beta binomial model
Guannan Sun*, University of Wisconsin, Madison with dynamic overdispersion rate was superior. The current
work allows one to characterize the simultaneous or Sunduz Keles, University of Wisconsin, Madison study will aid in analysis of RNA-Seq data for detecting and
independent effects of covariates on reproducibility and exploring biological problems.
to compare reproducibility while controlling for potential As next generation sequencing technologies are becom-
confounding variables. We illustrate this method using ing more economical, large-scale ChIP-seq studies are email: GCAI@mdanderson.org
data from ChIP-seq experiments. enabling the investigation of the roles of transcription
factor binding and epigenome on phenotypic variation.
email: qunhua.li@psu.edu Studying such variation requires individual level ChIP-seq 3m. B INARY TRAIT ANALYSIS IN SEQUENCING
experiments. Standard designs for ChIP-seq experiments STUDIES WITH TRAIT-DEPENDENT SAMPLING
employ a paired control per ChIP-seq sample. Genomic Zhengzheng Tang*, University of North Carolina,
3j. NONPARAMETRIC METHODS FOR IDENTIFYING coverage for control experiments is often sacrificed to Chapel Hill
DIFFERENTIAL BINDING REGIONS WITH ChIP-Seq increase the resources for ChIP samples. However, the Danyu Lin, University of North Carolina, Chapel Hill
DATA quality of ChIP-enriched regions identifiable from a ChIP- Donglin Zeng, University of North Carolina, Chapel Hill
Qian Wu*, University of Pennsylvania School of Medicine seq experiment depends on the quality and the coverage
Kyoung-Jae Won, University of Pennsylvania School of the control experiments. Insufficient coverage leads In sequencing study, it is a common practice to sequence
of Medicine to loss of power in detecting enrichment. We investigate only the subjects with the extreme values of a quantita-
Hongzhe Li, University of Pennsylvania School the effect of in silico pooling of control samples across tive trait. This is a cost-effective strategy to increase
of Medicine biological replicates, treatment conditions, and cell lines power in the association analysis. In the National Heart,
across multiple datasets with varying levels of genomic Lung, and Blood Institute (NHLBI) Exome Sequencing
ChIP-Seq provides a powerful method for detecting coverage. Our empirical studies that compare in silico Project (ESP), subjects with extremely high or low values
binding sites of DNA-associated proteins, e.g. transcrip- pooling designs with the gold standard paired-designs of body mass index (BMI), low-density lipoprotein (LDL)
tion factors (TFs) and histone modification marks (HMs). indicate that Pearson correlation of the samples can be or blood pressures (BP) were selected for whole-exome
Previous research has focused on developing peak-calling used to decide whether or not to perform pooling. Using sequencing. For a binary trait of interest, the standard
procedures to detect the binding sites for TFs. However, vast amounts of ENCODE data, we show that pair wise logistic regression-even adjust for the trait of sampling-
these procedures have difficulty when applied to ChIP correlations between control samples originating from can give misleading results. We present valid and efficient
data of HMs. In addition, it is also important to identify different biological replicates, treatments, and cell lines methods for association analysis under trait-dependent
genes with differential binding regions between two can be grouped into two classes representing whether sampling. Our methods properly combine the association
experimental conditions, such as different cellular states or not in silico pooling leads to power gain in detecting results from all studies and more powerful than the stan-
or different time points. Parametric methods based enrichment between the ChIP and the control samples. dard methods. The validity and efficiency of the proposed
on Poisson/Negative Binomial distribution have been Our findings have important implications for multiplexing methods are demonstrated through extensive simulation
proposed to address this problem and most require multiple samples. studies and ESP real data analysis.
biological replications. However, many ChIP-Seq data
usually have a few or even no replicates. We propose a email: sun@stat.wisc.edu email: ztang@bios.unc.edu
novel nonparametric method to identify the differential
binding regions that can be applied to the ChIP-Seq data
of TF or HM, even without replicates. Our method is based 3l. METHOD FOR CANCELLING NONUNIFORMITY 3n. QUANTIFYING COPY NUMBER VARIATIONS
on nonparametric hypothesis testing and kernel smooth- BIAS OF RNA-seq FOR DIFFERENTIAL EXPRESSION USING A HIDDEN MARKOV MODEL WITH
ing. We demonstrate the method using a ChIP-Seq data ANALYSIS INHOMOGENEOUS EMISSION DISTRIBUTIONS
on comparative epigenomic profiling of adipogenesis Guoshuai Cai*, University of Texas MD Anderson Kenneth McCallum*, Northwestern University
of human adipose stromal cells and our method detects Cancer Center Ji-Ping Wang, Northwestern University
nearly 20% of genes with differential binding of HM mark Shoudan Liang, University of Texas MD Anderson
H3K27ac in gene promoter regions. The test statistics Cancer Center Copy number variations (CNVs) are a significant source
also correlate with the gene expression changes well, of genetic variation and have been found frequently
indicating that the identified differential binding regions Many biases and effects are inherent in RNA-Seq technol- associated with diseases such as cancers and autism.
are indeed biologically meaningful. ogy. A number of methods have been proposed to handle High-throughput sequencing data is increasingly being
these biases and effects in order to accurately analyze used to detect and quantify CNVs; however, the distribu-
email: wuqian7@gmail.com differential RNA expression at the gene level. However, to tional properties of the data are not fully understood. A
precisely estimate mean and variance by cancelling biases hidden Markov model is proposed using inhomogeneous
such as those due to random hexamer priming and non- emission distributions based on negative binomial regres-
uniformity, modeling at the base pair level is required. We sion to account for sequencing biases. The model is tested
previously showed that the overdispersion rate decreases on whole genome sequencing data and simulated data
as sequencing depth increases on the gene level. We tested sets. The model based on negative binomial regression
the hypothesis that the overdispersion rate also decreases is shown to provide a good fit to the data and provides
as sequencing depth increases on the base pair level. In this competitive performance compared to methods based on
study, we found that the overdispersion rate decreased as normalization of read counts.
sequencing depth increased on the base pair level. Also,
we found that the influence of local primer sequence on email: kennethmccallum2013@u.northwestern.edu
the overdispersion rate was no longer significant after
Abstracts 8
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
function is specified in the same manner as for most a simulation-based chi-square test as proposed by Lin
S
AB
phylogenetic models using continuous Markov processes (2005) to adjust for multiple testing. Comparison of these
with variables representing microbiota structure and four statistics in terms of type I error, power, family-wise
data from NGS to detect differentially methylated loci. composition rather than nucleotide sequences. The error rate and computational efficiency under various
Simulations under several distributions for the measured computation is performed through a pruning algorithm scenarios are examined via extensive simulations.
methylation levels show that the proposed method is ro- nested inside a gradient descent based method for pa-
bust and flexible. It has good power and is computation- rameter optimization. With the availability of a likelihood email: fmendolia@mcw.edu
ally efficient. Finally, we apply the test to our NGS data on function, likelihood based inference such as likelihood ra-
chronic lymphocytic leukemia. The results indicate that it tio test can be used to formally test hypothesis of interest.
is a promising and practical test. We illustrated the application of this method with a data 4k. S PARSE MULTIVARIATE FACTOR REGRESSION
set pertaining microbial communities in the rat digestive MODELS AND ITS APPLICATION TO HIGH-
email: hxu@georgiahealth.edu tract under different diet regiments. Different hypotheses THROUGHPUT ARRAY DATA ANALYSIS
regarding microbial community structure were studied Yan Zhou*, University of Michigan
with the proposed approach. Peter X.K. Song, University of Michigan
4g. MIXED MODELING AND SAMPLE SIZE Ji Zhu, University of Michigan
CALCULATIONS FOR IDENTIFYING HOUSEKEEPING email: hxjhelon@gmail.com
GENES IN RT-PCR DATA The sparse multivariate regression model is a useful
Hongying Dai*, Children’s Mercy Hospital tool to explore complex associations between multiple
Richard Charnigo, University of Kentucky 4i. A HIGH DIMENSIONAL VARIABLE SELECTION response variables and multiple predictors. When those
Carrie Vyhlidal, Children’s Mercy Hospital APPROACH USING TREE-BASED MODEL multiple responses are strongly correlated, ignoring such
Bridgette Jones, Children’s Mercy Hospital AVERAGING WITH APPLICATION TO SNP DATA dependency will impair statistical power and accuracy in
Madhusudan Bhandary, Columbus State University Sharmistha Guha*, University of Minnesota the association analysis. In this paper, we propose a new
Saonli Basu, University of Minnesota methodology -- sparse multivariate regression factor
Normalization of gene expression data using internal model (sMFRM), which accounts for correlations of the
control genes that have biologically stable expression As opposed to the existing methodology in high response variables via lower numbers of unobserved
levels is an important process for analyzing RT-PCR data. dimensional regression problems, we propose a novel random quantities. This proposed method not only allows
We propose a three-way linear mixed-effects model tree-based variable selection approach. Our proposed us to address the issue that the number of association
(LMM) to select optimal housekeeping genes. The LMM approach combines different low-rank models, together parameters is much larger than the sample size, but also
can accommodate multiple continuous and/or categorical with model averaging techniques, to yield a model that to account for some unobserved factors that poten-
moderator variables with sample random effects, gene exhibits far less computational time and greater flexibility tially obscure real response-predictor associations. The
fixed effects, systematic effects, and gene by systematic in terms of estimation. Simulation examples show high proposed sMFRM is efficiently implemented by utilizing
effect interactions. Global hypothesis testing is proposed power for the proposed method. We compare our method merits of both the EM algorithm and the group-wise
to ensure that selected housekeeping genes are free of to some of the current existing methods and show coordinate descend algorithm. The proposed methodol-
systematic effects or gene by systematic effect interac- empirically better performance. The proposed approach ogy is evaluated through extensive simulation studies.
tions. Sample size calculation based on the estimation has been validated using SNP data. It is shown that our proposed sMFRM outperforms the
accuracy of the stability measure is offered to help existing methods in terms of high sensitivity and accuracy
practitioners design experiments to identify housekeep- email: sharmistha84@gmail.com in mapping the underlying response-predictor associa-
ing genes. We compare our methods with geNorm and tions. Throughout this paper, we motivate and apply
NormFinder using case studies. our method with the objective of constructing genetic
4j. COMPARISON OF STATISTICS IN association networks. Finally, we analyze a breast cancer
email: hdai@cmh.edu ASSOCIATION TESTS OF GENETIC MARKERS data adjusting for unobserved non-genetic factors.
FOR SURVIVAL OUTCOMES
Franco Mendolia*, Medical College of Wisconsin email: zhouyan@umich.edu
4h. L IKELIHOOD BASED INFERENCE ON John P. Klein, Medical College of Wisconsin
PHYLOGENETIC TREES WITH APPLICATIONS TO Effie W. Petersdorf, Fred Hutchinson Cancer
METAGENOMICS Research Center 4l. BAYESIAN GROUP MCMC
Xiaojuan Hao*, University of Nebraska Mari Malkki, Fred Hutchinson Cancer Research Center Alan B. Lenarcic*, University of North Carolina, Chapel Hill
Dong Wang, University of Nebraska Tao Wang, Medical College of Wisconsin William Valdar, University of North Carolina, Chapel Hill
Interaction dynamics between the micobiota and the In genetic association studies, there is a need for com- In mouse experiments, SNP derived genetics sequences
host have taken on ever increasing importance and are putationally efficient statistical methods to handle the are often encoded and imputed into a sequence of prob-
now frequently studied. But the statistical methodology large number of tests of genetic markers. In this study, abilities representing haplotype group membership (i.e.
for hypothesis testing regarding microbiota structure we explore several tests based on the Cox proportional Black6-derived, Mouse-Castenious-derived, etc.) rather
and composition is still quite limited. Usually, the relative hazards models for survival outcomes. We examine the than 0-1 SNPs, which are identified through a hidden
enrichment of one or more taxa is demonstrated with classical partial likelihood-based Wald and score tests markov model, such as with the Happy algorithm (Mott
contingency tables formed in an ad hoc manner. To and we propose a score statistic which is motivated et al. 2000). Multi-SNP regression must then follow a
provide a formal statistical framework, we have proposed by Cox-Snell residuals to assess the effects of genetic mixed-effects framework, with regression coefficients
a likelihood based approach in which the likelihood markers. Computational efficiency and incorporation of learned at a single SNP representing a set of factors. We
these three statistics into a permutation procedure to implement a sparse Bayesian MCMC framework, built
adjust for multiple testing is addressed. We also consider around dynamic memory structures initially designed
Abstracts 10
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
5f. FRAILTY PROBIT MODEL FOR CLUSTERED provide valid inferences. In a stratified case-cohort design
S
AB
INTERVAL-CENSORED FAILURE TIME DATA with clustered times to tooth extraction in a dental study,
Haifeng Wu*, University of South Carolina, Columbia similar results to those from alternative methods were
5d. STRATIFIED AND UNSTRATIFIED LOG-RANK TEST Lianming Wang, University of South Carolina, Columbia found but were obtained much faster.
IN CORRELATED SURVIVAL DATA
Yu Han*, University of Rochester Clustered interval-censored data commonly arise in many email: steven.chiou@uconn.edu
David Oakes, University of Rochester studies of biomedical research where the failure time of
Changyong Feng, University of Rochester interest is subject to interval-censoring and subjects are
correlated for being in the same cluster. In this paper, we 5h. A CUMULATIVE INCIDENCE JOINT MODEL
The log-rank test is the most widely used nonparametric propose a new frailty semiparametric Probit regression OF TIME TO DIALYSIS INDEPENDENCE AND
method for testing treatment differences in survival model to study the covariate effects on the failure time INFLAMMATORY MARKER PROFILES IN
analysis due to its efficiency under the proportional and the intra-cluster dependence. The proposed normal ACUTE KIDNEY INJURY
hazards model. Most previous work on the log-rank test frailty Probit model enjoys several nice properties that Francis Pike*, University of Pittsburgh
has assumed that the samples from different treatment existing survival models do not have: (1) the marginal Jonathan Yabes, University of Pittsburgh
groups are independent. This assumption is not always distribution of the failure time is a semiparametric Probit John Kellum, University of Pittsburgh
true. In multi-center clinical trials, survival times of model, (2) the regression parameters can be interpreted
patients in the same medical center may be correlated either as the conditional covariate effects given frailty Recovery from Acute Renal Failure is a clinically relevant
due to factors specific to each center. For such data we can or the marginal covariate effects up to a multiplicative issue in critical care medicine. The central goal of the
construct both stratified and unstratified log-rank tests. constant, and (3) the intra-cluster association can be Biological Markers of Recovery for the Kidney study
These two tests turn out to have very different powers for summarized by two nonparametric measures in simple (BIOMARK) was to understand the relationship between
correlated samples. An appropriate linear combination of and closed form. A fully Bayesian estimation method is inflammation and oxidative stress in recovery from Acute
these two tests may give a more powerful test than either developed based on the use of monotone spline for the Renal Failure (ARF) and how intensity of Renal Replace-
individual test. Under a frailty model, we obtain a closed unknown nondecreasing function and our Gibbs sampler ment Therapy (RRT) affects this relationship. To effectively
form of asymptotic local alternative distributions and the is straightforward to implement. The proposed method model this relationship the chosen analytical procedure
correlation coefficient between these two tests. Based on performs well in estimating the regression parameters has to, (i) account for censoring in patient inflammatory
these results we construct an optimal linear combination and is robust to misspecified frailty distributions in our profiles due to the sensitivity of the assays, and (ii) be
of the two test statistics to maximize the local power. simulation studies. Two real-life data sets are analyzed as able to include this information into the survival model
We also consider sample size calculation in the paper. illustrations. whilst accounting for competing terminal events such
Simulation studies are used to illustrate the robustness of as death. To this end we formulated and implemented a
the combined test. email: wuh@email.sc.edu fully parametric cumulative incidence (CIF) joint model
within SAS using NLMIXED. Specifically we combined a
email: Yu_Han@urmc.rochester.edu linear mixed effects Tobit model with a parametric CIF
5g. SEMIPARAMETRIC ACCELERATE FAILURE TIME model proposed by Jeong and Fine to account for the
MODELING FOR CLUSTERED FAILURE TIMES longitudinal censoring and competing risks respectively.
5e. ANALYSIS OF MULTIPLE MYELOMA LIFE FROM STRATIFIED SAMPLING We verified the performance of this model via simulation
EXPECTANCY USING COPULA Sy Han Chiou*, University of Connecticut and applied this method to the BIOMARK study to ascer-
Eun-Joo Lee*, Millikin University Sangwook Kang, University of Connecticut tain if intensity of treatment and inflammatory profiles
Jun Yan, University of Connecticut significantly affect time to dialysis independence.
Multiple myeloma is a blood cancer that develops in the
bone marrow. It is assumed that in most cases multiple Clustered failure time data arising from stratified email: pikef@upmc.edu
myeloma develops in association with several medical sampling are often encountered in studies where it is
factors acting together, although the leading cause of desired to reduce both cost and sampling error. In such
the disease has not yet been identified. In this paper, we settings, semiparametric accelerated failure time (AFT) 5i. AGE-SPECIFIC RISK PREDICTION WITH
investigate the relationship between the factors to mea- models have not been used as frequently as Cox relative LONGITUDINAL AND SURVIVAL DATA
sure multiple myeloma patients’ survival time. For this, risk models in practice due to lack of efficient and reli- Wei Dai*, Harvard School of Public Health
we employ a copula that provides a convenient way to able computing routines for statistical inferences, with Tianxi Cai, Harvard School of Public Health
construct statistical models for multivariate dependence. challenge rooted in nonsmooth rank-based estimating Michelle Zhou, Simon Fraser University
Through an approach via copulas, we find the most influ- functions. The recently proposed induced smoothing
ential medical factors that affect the survival time. Some approach, which provides fast and accurate inferences Often in cohort studies where the primary endpoint is
goodness-of-fit tests are also performed to check the for AFT models, is generalized to incorporate weights time to an event, patients are also monitored longitu-
adequacy of the copula chosen for the best combination that can be applied to accommodate stratified sampling dinally with respect to one or more biological variables
of the survival time and the medical factors. Using the design. The estimators resulting from the induced throughout the follow-up period. A primary goal of such
Monte Carlo simulation technique with the copula, we smoothing weighted estimating equations are consistent studies is to predict the risk of future events. Joint models
re-sample survival times from which the anticipated life and asymptotically normal with the same distribution for both the longitudinal process and survival data have
span of a patient with the disease is calculated. as the estimators from the nonsmooth estimating equa- been developed in recent years to analyze such data.
tions. The variance is estimated by two computationally In the joint modeling framework, risk of future events
email: elee@millikin.edu efficient sandwich estimators. The proposed method is is estimated using Monte Carlo simulations. In most
validated in extensive simulation studies and appears to existing risk prediction models, age is modeled as one of
the standard risk factors with simple effects. However,
for many complex phenotypes such as the cardiovascular
Abstracts 12
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
6b. L ONGITUDINAL ANALYSIS OF THE EFFECT hood estimation. The homogeneity of dynamic transition
S
AB
email: such@umich.edu
Abstracts 14
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
the auditory network, etc. In addition to improving T1-weighted, T2-weighted, fluid-attenuated inversion
S
AB
network estimation, H-gICA allows for the investigation of recovery and proton density volumes from 131 MRI studies
functional homotopy via ICA based networks. with manual lesion segmentations are used to train and
participant self-reports is not identifiable and may be validate our model. Within this set, OASIS detected lesions
informative. Most methods with a misclassified exposure email: juyang@jhsph.edu with an area under the receiver-operator characteristic
require either validation data, replicate data, or an as- curve of 98% (95% CI; [96%, 99%]) at the voxel level. Use
sumption of nondifferential misclassification. We propose of intensity-normalized MRI volumes enables OASIS to be
a pattern-mixture model where none of these is required. 7b. THE ‘GENERAL LINEAR MODEL’ IN fMRI ANALYSIS robust to variations in scanners and acquisition sequences.
Instead, the model is indexed by two user-specified Wenzhu Mowrey*, Albert Einstein College of Medicine We applied OASIS to 169 MRI studies acquired at a sepa-
tuning parameters that represent an assumed level of rate imaging center. A neuroradiologist compared these
agreement between the observed proxy and missing Functional Magnetic Resonance Imaging (fMRI) has seen segmentations to segmentations produced by another
participant responses and can be varied to perform a wide use in neuropsychology since its invention in the software, LesionTOADS. For lesions, OASIS out-performed
sensitivity analysis. We estimate associations standard- early 1990s. Statisticians have made critical contributions LesionTOADS in 77% (95% CI: [71%, 83%]) of cases. For
ized for high-dimensional covariates using multiple along its side. The interdisciplinary nature of an imaging a randomly selected subset of 50 of these studies, one
imputation followed by inverse probability weighting. study makes the learning curve steep for everybody additional radiologist and one neurologist also scored the
Simulation studies show that the proposed method because of the physics, the neuroanatomy and the various images. Within this set, the neuroradiologist ranked OASIS
performs well. imaging processing involved. This presentation is to intro- higher than LesionTOADS in 76% (95% CI: [64%, 88%]) of
duce fMRI data analysis stream surrounding its core - the cases, the neurologist 66% (95% CI: [52%, 78%]) and the
email: mshardel@epi.umaryland.edu general linear model (LM). We will focus on the commonly radiologist 52% (95% CI: [38%, 66%]).
used task data, where subjects perform tasks, for example
finger tapping, in a scanner and the goal is to find the email: emsweene@jhsph.edu
task-related brain activities and to assess its differences
7. POSTERS: IMAGING / HIGH DIMEN- across groups. We start with preprocessing steps including
SIONAL DATA slice timing correction, motion correction, normalization 7d. NONLINEAR MIXED EFFECTS MODELING WITH
and smoothing. Then we perform the subject level (first DIFFUSION TENSOR IMAGING DATA
7a. HOMOTOPIC GROUP ICA level) general linear model analysis, where effects of inter- Namhee Kim*, Albert Einstein College of Medicine
Juemin Yang*, Johns Hopkins University est are extracted from each individual's time series data. of Yeshiva University
Ani Eloyan, Johns Hopkins University We will clarify its differences from the LM in a traditional Craig A. Branch, Albert Einstein College of Medicine
Anita Barber, Kennedy Krieger Institute statistics setting and the role of the hemodynamics of Yeshiva University
Mary Beth Nebel, Kennedy Krieger Institute response function in the model. The last step is the group Michael L. Lipton, Albert Einstein College of Medicine
Stewart Mostofsky, Kennedy Krieger Institute level (second level) analysis, where a simple review on of Yeshiva University
James Pekar, Kennedy Krieger Institute multiple comparison correction will be given.
Brian Caffo, Johns Hopkins University In many statistical analyses with imaging data, a sum-
email: wenzhu.mowrey@einstein.yu.edu mary value approach for each region, e.g. an average
Independent Component Analysis (ICA) is a compu- of observed brain physiology across voxels, has been
tational technique for revealing hidden factors that frequently adopted. However, these approaches ignore
underlie sets of measurements or signals. It is widely 7c. OASIS IS AUTOMATED STATISTICAL INFERENCE FOR spatial variability within each region, and have potential
used in a variety of academic fields such as functional SEGMENTATION WITH APPLICATIONS TO MULTIPLE biases in the estimated parameters. We thus need ap-
neuroimaging. We have devised a new group ICA ap- SCLEROSIS LESION SEGMENTATION IN MRI proaches incorporating individual voxels to reduce biases
proach, Homotopic group ICA (H-gICA), for blind source Elizabeth M. Sweeney*, Johns Hopkins University and enhance efficiency in estimation of parameters.
separation of fMRI data. Our new approach enables us to Russell T. Shinohara, University of Pennsylvania Fractional Anisotropy (FA) from diffusion tensor imaging
double the sample size via brain functional homotopy, Navid Shiee, Henry M. Jackson Foundation (DTI) describes the degree of anisotropy of a diffusion
the high degree of synchrony in spontaneous activity Farrah J. Mateen, Johns Hopkins University process of water molecules, and abnormally low FA has
between geometrically corresponding interhemispheric Avni A. Chudgar, Brigham and Women’s Hospital been consistently found with traumatic brain injury (TBI).
regions. H-gICA can increase the power for finding under- and Harvard Medical School Since soccer heading may form a repetitive mild TBI, we in
lying networks with the existence of noise and it is proved Jennifer L. Cuzzocreo, Johns Hopkins University this study investigate the trajectory of FA over soccer head-
theoretically to be the same as commonly used group Peter A. Calabresi, Johns Hopkins University ing exposures by employing a growth curve. Proposed
ICA when the true sources are perfectly homotopic and Dzung L. Pham, Henry M. Jackson Foundation nonlinear mixed effects (NLME) model includes random
noise free. Moreover, compared to commonly used ICA Daniel S. Reich, National Institute of Neurological Disease effects to account spatial variability of each parameter
algorithms, the structure of the H-gICA input data leads and Stroke, National Institutes of Health of the growth curve across voxels, and adopt simultane-
to significant improvement of computational efficiency. Ciprian M. Crainiceanu, Johns Hopkins University ous autoregressive model to account correlation among
In a simulation study, our approach proved to be effective neighboring voxels. The fitted NLME model was compared
in both homotopic and non-homotopic settings. We also to a nonlinear model which utilizes an average of FA values
show the effectiveness of our approach by its application We propose OASIS is Automated Statistical Inference for per subject. The proposed NLME model is more efficient
on the ADHD-200 dataset. Out of the 15 components Segmentation (OASIS), an automated statistical method and provides additional information on spatial variability
postulated by H-gICA, several brain networks were found for segmenting multiple sclerosis (MS) lesions in magnetic of the estimated parameters of the growth curve.
including: the visual network, the default mode network, resonance images (MRI). We use logistic regression models
incorporating multiple MRI modalities to estimate voxel- email: namhee.kim@einstein.yu.edu
level probabilities of lesion presence. Intensity-normalized
Abstracts 16
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
8b. B AYESIAN PREDICTIVE DIVERGENCE BASED 8d. VARIABLE SELECTION IN MEASUREMENT ERROR
S
AB
MODEL SELECTION CRITERIA FOR CENSORED MODELS VIA LEAST SQUARES APPROXIMATION
AND MISSING DATA Guangning Xu*, North Carolina State University
use in a given situation. To tackle this problem, the Large- Liwei Wang*, North Carolina State University Leonard A. Stefanski, North Carolina State University
margin Unified Machine (LUM) was recently proposed as Sujit K. Ghosh, North Carolina State University
a unified family to embrace both groups. The LUM family A fundamental problem in biomedical research is iden-
enables one to study the behavior change from soft to Bayesian model selection for data analysis becomes a tifying key risk factors and determining their impact on
hard binary classifiers. For multicategory cases, however, challenging task when observations are subject to data health outcomes via statistical modeling. Due to device
the concept of soft and hard classification becomes less irregularities like censoring and missing values. Often a limitations and within-subject variation, some risk factors
clear. In that case, class probability estimation becomes linear mixed effects framework is used to approximate are measured with error, e.g., blood pressure. Ignoring
more involved as it requires estimation of a probability semiparametric models, but some of the popular Bayes- measurement error adversely impacts variable selection
vector. In this paper, we propose a new Multicategory ian criteria like DIC do not work well in choosing among and model fitting and thus complicates the statistical
LUM (MLUM) framework to investigate the behavior of mixed models and there have been ambiguities in defin- modeling. When measurement error is present, popular
soft versus hard classification under multicategory set- ing the deviances for mixed effect models. To illustrate variable selection methods, such as LASSO, ALASSO and
tings. Our theoretical and numerical results help to shed the proposed model selection criteria, first we develop SCAD are not appropriate. We propose a new method
some light on the nature of multicategory classification a flexible class of mixed effects models based on a for variable selection in measurement error models by
and its transition behavior from soft to hard classifiers. sequence of Bernstein polynomials with varying degrees integrating well-established measurement error model-
The numerical results suggest that the proposed MLUM is and propose a predictive divergence based model selec- ing methods with the least squares approximation (LSA)
highly competitive among several existing methods. tion criterion for the fully observed data. We then extend variable selection method of Wang and Leng (2007). The
the model selection criteria to accommodate the data resulting estimators are consistent and asymptotically
email: chongz@live.unc.edu irregularities and develop an importance sampling based normal in the usual case that the measurement error cor-
MCMC method to compute the criteria. Various simulated rected estimator is root-n consistent. The method inherits
data scenarios are used to compare the performance of the oracle property when an adaptive penalty is used and
the proposed model selection methodology with some of the tuning parameter is well selected. The key advantage
8. POSTERS: MODEL, PREDICTION, the popular Bayesian model selection methodologies. The of our new method is that it provides a unified solution to
VARIABLE SELECTION AND newly proposed models and associated model selection the variable selection in measurement error models and
DIAGNOSTIC TESTING criteria are also illustrated using real data analysis. greatly eases computing by using existing algorithms.
8a. PENALIZED COX MODEL FOR IDENTIFICATION email: lwang16@ncsu.edu email: gxu@ncsu.edu
OF VARIABLES’ HETEROGENEITY STRUCTURE
IN POOLED STUDIES
Xin Cheng*, New York University School of Medicine 8c. A TUTORIAL ON LEAST ANGEL REGRESSION 8e. SMOOTHED STABILITY SELECTION FOR ANALYSIS
Wenbin Lu, North Carolina State University Wei Xiao*, North Carolina State University OF SEQUENCING DATA
Mengling Liu, New York University School of Medicine Yichao Wu, North Carolina State University Eugene Urrutia*, University of North Carolina, Chapel Hill
Hua Zhou, North Carolina State University Yun Li, University of North Carolina, Chapel Hill
Pooled analysis that pools datasets from multiple studies Michael C. Wu, University of North Carolina, Chapel Hill
and treats them as one large single set of data can The least angel regression (LAR) was proposed by Efron,
achieve large sample size to allow increased power to Hastie, Johnstone and Tibshirani (2004) for continuous High dimensional data are increasingly common. Dif-
investigate variables’ effects, if homogeneous across stud- model selection in linear regression. It is motivated by a ficulties in model interpretation and limited power to
ies. However, inter-study heterogeneity often exists in geometric argument and tracks a path along which the detect effects have led to the use of variable selection
pooled studies, due to differences such as study popula- predictors enter successively and the active predictors methods, including penalized regression methods such as
tion and sample collection method. To evaluate variables’ always maintain the same correlation (angle) with the the LASSO. Recent advances in the variable selection lit-
homogeneous and heterogeneous structure and estimate residual vector. Although gaining popularity quickly, erature suggest that resampling strategies which include
their effects, we propose a penalized partial likelihood extensions of LAR seem rare compared to the penalty stability selection, complementary stability selection,
approach with an adaptively weighted L1 penalty on methods. In this expository article, we show that the and bolasso, can offer improvements over the LASSO and
variable’s average effects and a combination of adaptively powerful geometric idea of LAR can be extended in a non-resampling based approaches. We show that com-
weighted L1 and L2 penalty on heterogeneous effects. We fruitful way. We propose a ConvexLAR algorithm that mon resampling based methods can be recast as LASSO
show that our method can identify the structure of vari- works for any convex loss function and naturally extends with reweighted observations where weights are discrete
ables as heterogeneous effects, nonzero homogeneous to group selection and data adaptive variable selection. and identically distributed from a specified distribution.
effects and null effects, and give consistent estimation Variable selection in recurrent event and panel count data Sequencing data presents an additional challenge in that
of variables’ effects simultaneously. Furthermore, we analysis, Ada-Boost, and Gaussian graphical model, is many predictors are rare (minor alleles observed in only a
extend our method to high dimension situation, where reconsidered from the ConvexLAR angle. few individuals) and have a high probability of exclusion
the number of parameters diverges with sample size. The in the resampling schemes. Thus, we have developed the
proposed selection and estimation procedure can be eas- email: wxiao@ncsu.edu smooth stability selection procedure where we replace
ily implemented using the iterative shooting algorithm.
We conduct extensive numerical studies to evaluate
the practical performance of the proposed method and
demonstrate it using two real studies.
email: zhichi1@gmail.com
Histologic tumor grade is a strong predictor of risk of recur- Mi Zhou*, North Carolina State University The World Health Organization defines screening as
rence in breast cancer. However, tumor grade readings by Huixia (Judy) Wang, North Carolina State University the presumptive identification of unrecognized disease
pathologists are susceptible to intra- and inter- observer or defects by means of tests, examinations or other
variability due to its subjective nature. For this limitation, Sequential detection of change point has many impor- procedures that can be applied rapidly. With the progress
tumor grade is not included in the breast cancer staging tant applications in finance, econometrics, engineering of medicine in recent decades, there is now consider-
system. Latent class models are considered for analysis of etc, where it is desirable to raise an alarm as soon as able focus on early detection of disease via population
such discrete diagnostic tests with the underlying truth as the system or model structure has an abrupt change. screening, giving the patient more treatment options
a latent variable. However, the model parameters are only In the current literature, most methods for sequential if the disease is found by screening and consequently a
locally identifiable that any permutation on the categories monitoring focus on the mean function. We develop better prognosis outlook. Both the benefits and harms
of the truth also leads to the same likelihood function. In a new sequential change point detection method for of screening are well-documented. There is considerable
many circumstances, the underlying truth is known associ- linear quantile regression models. The proposed statistic debate as to whether early screening for diseases, such
ated with risk of certain event in a trend. Here we propose is based on the cusum of quantile score functions. The as cancer and cardiovascular disease, is useful, has a
a joint model with a Cox proportional hazard model for method can be used to detect change points at a single positive trade-off and has a broad public health impact.
the time-to-event data where the underlying truth is a quantile level or across quantiles, and can accommodate Once a screening protocol is introduced in practice, there
latent predictor. The joint model not only fully identifies both homoscedastic and heteroscedastic errors. We is much controversy as to whether it is possible to forgo
all model parameters but also provide valid assessment of establish the asymptotic properties of the developed test screening at all. Given that a screening test is introduced,
the association between the diagnostic test and the risk procedure, and assess its finite sample performance by it is imperative to assess the accuracy of the screening
of event. The EM algorithm was used for estimation. We comparing it to existing change point detection methods protocol. We introduce a new approach, Screening ROC,
showed that the M-steps are equivalent to fitting survey- for linear regression models. that can be used to assess the accuracy of a screening
weighted Cox models. The proposed method is illustrated protocol. We demonstrate the approach with simulated
in the analysis of data from a breast cancer clinical trial email: mzhou2@ncsu.edu and real dataset.
and simulation studies.
email: paramita.sahachaudhuri@duke.edu
email: sew53@pitt.edu 8i. ASSESSMENT OF THE CLINICAL UTILITY
OF A PREDICTIVE MODEL FOR COLORECTAL
ADENOMA RECURRENCE 8k. A SSESSING CALIBRATION OF RISK PREDICTION
8g. LOGIC REGRESSION MODELING WITH REPEATED Mallorie Fiero*, University of Arizona MODELS FOR POLYTOMOUS OUTCOMES
MEASUREMENT DATA AND ITS APPLICATIONS ON Dean Billheimer, University of Arizona Kirsten Van Hoorde*, Katholieke Universiteit,
SYNDROMIC DIAGNOSIS OF VAGINAL INFECTIONS Joshua Mallet, University of Arizona Leuven, Belgium
IN INDIA Bonnie LaFleur, University of Arizona Sabine Van Huffel, Katholieke Universiteit,
Tan Li*, Florida International University Leuven, Belgium
Wensong Wu, Florida International University Current methods for evaluating predictive models Dirk Timmerman, Katholieke Universiteit,
include evaluating sensitivity, specificity and area Leuven, Belgium
Most of regression methodologies are unable to find the under the ROC curve (AUC). Other metrics, such as the Ben Van Calster, Katholieke Universiteit, Leuven, Belgium
effect of complex interaction but only simple interac- Brier score, Somers’ Dxy, and R^2 are also advocated
tions (two-way or three-way). However, the complex by statisticians. The potential limitation of all of these Risk prediction models assist clinicians in making
interaction between more than three predictors may be predictive assessments is the translation from statistical treatment decisions. Therefore the estimated risks
the cause to the differences in response, especially when to clinical relevance. Clinical relevance includes individual should correspond to observed risks (calibration). For
all the predictors are binary. Logic regression, developed determination of risks of treatments, as well as adoption binary outcomes, tools to assess calibration exist, e.g.
by Ruczinski and LeBlanc (2003), is a generalized regres- of novel markers (biomarkers) to determine prognostic calibration-in-the-large, calibration slope, and calibration
sion methodology, which has been used to construct outcome. In this paper, we evaluate two potential clinical plots. We extend these tools to models for nominal
the complex interactions between binary predictors as metrics of prediction performance, the predictiveness outcomes developed using baseline-category logistic
Boolean logic statements. However, this methodology is curve, proposed by Pepe, et. al. (2008), and decision curve regression. The logistic recalibration model is a baseline-
not applicable to the repeated measurement data, which analysis, proposed by Vickers, et. al. (2006). We apply
Abstracts 18
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
Abstracts 20
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
which corrects the bias of naive variance estimation from II/III muscle-invasive bladder cancer cohort. We evaluate
S
AB
the second stage regression model that is routinely used the costs for radical cystectomy vs. combined radiation/
in practice. This finding is critical in practice, as it will chemotherapy, and find that the significance of the treat-
creatinine to account for kidney damage caused by the greatly help to reduce both false positive and negative ment effect is sensitive to unmeasured Bernoulli, Poisson,
heavy metals. A survey logistic regression was used for the rates of those studies where naive variance estimation is and Gamma confounders.
analysis with the binary response being obesity. A statisti- used. We established the asymptotic results for the treat-
cal issue in this analysis is the use of weights coming from ment effect estimator. email: elizabeth.handorf@fccc.edu
different subsamples in the survey data. Barium was
found to be the only significant heavy metal in the final email: bzou@email.unc.edu
model, and myristoleic acid, eicosadienoic acid, stearic 9j. THE APPROPRIATENESS OF COMORBIDITY SCORES
acid, gamma-linolenic acid, alpha-linolenic acid, and TO ACCOUNT FOR CLINICAL PROGNOSIS AND
docosahexaenoic acid were the significant fatty acids. The 9h. ESTIMATING INCREMENTAL COST-EFFECTIVENESS CONFOUNDING IN OBSERVATIONAL STUDIES
significance of myristoleic acid, eicosadienoic acid, and RATIOS AND THEIR CONFIDENCE INTERVALS WITH Brian L. Egleston*, Fox Chase Cancer Center
stearic acid were consistent with existing literature. DIFFERENT TERMINATING EVENTS FOR SURVIVAL Steven R. Austin, Johns Hopkins University
TIME AND COSTS Yu-Ning Wong, Fox Chase Cancer Center
email: gemechis.djira@sdstate.edu Shuai Chen*, Texas A&M University Robert G. Uzzo, Fox Chase Cancer Center
Hongwei Zhao, Texas A&M University J. R. Beck, Fox Chase Cancer Center
9g. A SEMI-NONPARAMETRIC PROPENSITY Cost-effectiveness analysis is an important component Comorbidity adjustment is an important goal of health
SCORE MODEL FOR TREATMENT ASSIGNMENT of the economic evaluation of new treatment options. In services research and clinical prognosis. When adjusting
HETEROGENEITY WITH APPLICATION TO many clinical and observational studies of costs, censored for comorbidities in statistical models, researchers can
ELECTRONIC MEDICAL RECORD DATA data pose challenges to the cost-effectiveness analysis. include comorbidities individually or through the use of
Baiming Zou*, University of North Carolina, Chapel Hill We consider a special situation where the terminating summary measures such as the Charlson Comorbidity
Fei Zou, University of North Carolina, Chapel Hill events for survival time and costs are different. Traditional Index or Elixhauser score. While many health services
Jianwen Cai, University of North Carolina, Chapel Hill methods for statistical inference offer no means for deal- researchers have compared the utility of comorbidity
Haibo Zhou, University of North Carolina, Chapel Hill ing with censored data in these circumstances. To address scores using data examples, there has been a lack of
this gap, we propose a new method for deriving the mathematical rigor in most of the evaluations. In the
Analyzing electronic medical record (EMR) data to confidence interval for this incremental cost-effectiveness statistics literature, Hansen (Biometrika 2008) provided
compare the effectiveness of different treatments is ratio, based on the counting process and the general a theoretical justification for the use of prognostic scores.
a central component in the moment of comparative theory for missing data process. The simulation studies We examined the conditions under which individual ver-
effectiveness research (CER). A key statistical challenge and real data example show that our method performs sus summary measures are most appropriate. We expand
in this research is how to properly analyze the EMR data, very well for some practical settings, revealing a great on Hansen's work, and show that comorbidity scores
which is different from the clinical trial data where the potential for application to actual settings in which termi- created analogously to the Charlson Comorbidity Index
treatment assignment is random, to unbiasedly and nating events for survival time and costs differ. are indeed appropriate balancing scores for prognostic
efficiently estimate the true treatment effect in the modeling and comorbidity adjustment.
real world large-scale medical data. Existing methods, email: shuai@stat.tamu.edu
such as propensity score approach, generally assume all email: Brian.Egleston@fccc.edu
confounding variables are observed. As clinical dataset for
CER research are not designed to capture all confounding 9i. A GENERAL FRAMEWORK FOR
variables, heterogeneity will exist in the real world EMR SENSITIVITY ANALYSIS OF COST DATA 9k. A SSESSMENT OF HEALTH CARE QUALITY WITH
or clinical dataset. Most importantly, it is known that real WITH UNMEASURED CONFOUNDING MULTILEVEL MODELS
world patient’s treatment assignment is influenced by Elizabeth A. Handorf*, Fox Chase Cancer Center Christopher Friese*, University of Michigan
the physician (care provider), the system (e.g. insurance Justin E. Bekelman, University of Pennsylvania Rong Xia, University of Michigan
type), and the patient themselves (e.g. religion). Hence, Daniel F. Heitjan, University of Pennsylvania Mousumi Banerjee, University of Michigan
heterogeneity in the treatment assignment in real world Nandita Mitra, University of Pennsylvania
EMR or clinical data needs to be taken into account in Assessment of health care quality provided in US hospi-
estimating the true treatment effect. We propose a Estimates of treatment effects on cost from observational tals is not only an important medical research target but
semi-nonparametric propensity score (SNP-PS) model to studies are subject to bias if there are unmeasured also a challenging statistics problem. To account for the
deal with the heterogeneity of treatment assignment. confounders. It is therefore advisable in practice to assess hierarchical structure in the data, we applied multilevel
Our model makes no specific distribution assumption on the potential magnitude of such biases; in some cases, logistic models to study the effects of hospital characters
the random effects except that the distribution function closed-form expressions are available. We derive a gen- on health care quality, specially the risk-adjusted mortal-
is smooth, and thus is more robust to model misspecifica- eral adjustment formula using the moment-generating ity and failure-to-rescue. We have found that patients
tions. A truncated Hermite polynomial along with the function for log-linear models and explore special cases treated in the Magnet hospitals have significant lower
normal density is used to approximate the unknown under plausible assumptions about the distribution of rate of mortality and failure-to-rescue. We have also
density of the heterogeneity. To avoid the potential large the unmeasured confounder. We assess the performance compared the Magnet hospitals to non-Magnet hospitals
Monte Carlo errors of sampling based algorithms, we de- of the adjustment by simulation, in particular examining to discover the differences in hospital size, nursing,
veloped an adaptive EM algorithm for SNP-PS parameter robustness to a key assumption of conditional indepen- cost, location etc. These conclusions were based on the
estimates. More importantly, a robust and consistent vari- dence between the unmeasured and measured covariates national wide data from year 1998 to 2008.
ance estimate for the parameter estimator is proposed given the treatment indicator. We show how our method
is applicable to cost data with informative censoring, and email: rongxia@umich.edu
apply our method to SEER-Medicare cost data for a stage
Abstracts 22
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
11. SPATIAL STATISTICS FOR We provide a thorough assessment of the strengths and
S
AB
Abstracts 24
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
14. TOOLS FOR IMPLEMENTING REPRO- which was overcome by later tools like Sweave. The
S
AB
Abstracts 26
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
e-mail: cui@stt.msu.edu
Abstracts 28
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
a continuous outcome within strata defined by a subset 20. HEALTH SERVICES AND HEALTH
S
AB
Abstracts 30
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
email: qliu@live.unc.edu
Abstracts 32
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
In addition, direct parameters estimation for high dimen- 23. BIOSTATISTICAL METHODS IN
S
AB
email: qpan@gwu.edu
Abstracts 34
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
Abstracts 36
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
email: mw@stat.duke.edu
Abstracts 38
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
Abstracts 40
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
are obtained. Special treatment needs to be applied very regular circadian patterns in activity, while others
S
AB
when target curves are closed. We will illustrate the show marked variation in these patterns. We develop a
methodology with applications in 1D proteomics data, latent variable Poisson model that characterizes irregular-
2D mouse vertebra outlines and 3D protein secondary ity in the circadian pattern through latent variables
32. BAYESIAN METHODS structure data. for circadian, stochastic, and individual variation. A
parameterization is proposed for modeling covariate
BAYESIAN GENERALIZED LOW RANK REGRESSION
email: chengwen1985@gmail.com dependence on the degree of regularity in the circadian
MODELS FOR NEUROIMAGING PHENOTYPES AND
patterns over time. Markov chain Monte Carlo sampling
GENETIC MARKERS
is used to carry out Bayesian posterior computation.
Zakaria Khondker*, University of North Carolina,
A BAYESIAN MODEL FOR IDENTIFIABLE SUBJECTS Several variations of the proposed model are considered
Chapel Hill and PAREXEL International
Edward J. Stanek III*, University of Massachusetts, and compared via the deviance information criterion.
Hongtu Zhu, University of North Carolina, Chapel Hill
Amherst The proposed methodology is motivated by and applied
Joseph Ibrahim, University of North Carolina, Chapel Hill
Julio M. Singer, University of Sao Paulo, Brazil to the NEXT generation health study conducted by the
Eunice Kennedy Shriver National Institute of Child Health
We propose a Bayesian generalized low rank regression
Practical problems often involve estimating individual and Human Development of the Nation Institutes of
model (GLRR) for the analysis of both high-dimensional
latent values based on data from a sample. We discuss Health in the summer of 2010.
responses and covariates. This development is motivated
an application where latent LDL cholesterol levels of
by performing genome-wide searches for associations
women from an HMO are of interest. We use a Bayesian email: kims2@mail.nih.gov
between genetic variants and brain imaging phenotypes.
model with an exchangeable prior distribution that
GLRR integrates a low rank matrix to approximate the
includes subject labels, and trace how the prior distribu-
high-dimensional regression coefficient matrix of GLRR
tion is updated via the data to produce the posterior OVERLAP IN TWO-COMPONENT MIXTURE MODELS:
and a dynamic factor model to model the high-dimen-
distribution. The prior distribution is specified for a finite INFLUENCE ON INDIVIDUAL CLASSIFICATION
sional covariance matrix of brain imaging phenotypes.
population of women in the HMO assumed to arise from a José Cortiñas Abrahantes*, European Food
Local hypothesis testing is developed to identify signifi-
larger superpopulation. The novel aspect is accounting for Safety Authority
cant covariates on high-dimensional responses. Posterior
the labels in the prior. We illustrate this via an example, Geert Molenberghs, I-BioStat, Hasselt Universiteit
computation proceeds via an efficient Markov chain
and show how the exchangeable prior distribution can & Katholieke Universiteit Leuven, Belgium
Monte Carlo algorithm. A simulation study is performed
be constructed for a finite population that arose from a
to evaluate the finite sample performance of GLRR and
superpopulation. Using data that consists of the (label, The problem of modeling the distributions of the continu-
its comparison with several competing approaches. We
response) pair for a set of women, we illustrate how ous scale measured values from a diagnostic test (DT) for
apply GLRR to investigate the impact of 10,479 SNPs on
conditioning on the set of labels, the sequence of labels a specific disease is often encounter in epidemiological
chromosome 9 on the volumes of 93 regions of interest
in the sample space, and the actual response impacts studies, in which disease status of the animal might
(ROI) obtained from Alzheimer’s Disease Neuroimaging
the change from the prior to the posterior distributions. depend on characteristics such as herd, age and/or other
Initiative (ADNI).
In particular, we show that conditioning on the actual disease-specific risk factors. Techniques to decompose
response, alters the distribution of latent values for the observed DT values into their underlying components
email: khondker@email.unc.edu
women in the data, but not for the remaining women in are of interest, for which mixture models offer a viable
the population. pathway to deal with classification of individual samples,
and at the same time account for other factors influenc-
BAYESIAN ANALYSIS OF CONTINUOUS
email: stanek@schoolph.umass.edu ing the individual classification. Mixture models have
CURVE FUNCTIONS
been frequently used as a clustering technique, but the
Wen Cheng*, University of South Carolina, Columbia
classification performance of individual observations has
Ian Dryden, University of Nottingham, UK
A LATENT VARIABLE POISSON MODEL FOR ASSESSING been rarely discussed. A case study in salmonella was
Xianzheng Huang, University of South Carolina, Columbia
REGULARITY OF CIRCADIAN PATTERNS OVER TIME the motivating force to study classification performance
Sungduk Kim*, Eunice Kennedy Shriver National Institute based on mixture model. Simulations using different
We consider Bayesian analysis of continuous curve
of Child Health and Human Development, National measures to quantify overlap of the components and with
functions in 1D, 2D and 3D space. A fundamental aspect
Institutes of Health this the degree of separation between the components
of the analysis is that it is invariant under a simultaneous
Paul S. Albert, Eunice Kennedy Shriver National Institute were carried out. The results provide insight into the
warping of all the curves, as well as translation, rotation
of Child Health and Human Development, National potential problems that could occur when using mixture
and scale of each individual. We introduce Bayesian
Institutes of Health models to assign individual observations to specific
models based on the curve representation named Square
component in the population. The measures of overlap
Root Velocity Function (SRVF) introduced by Srivastava et
Actigraphs are often used to assess circadian patterns prove useful to identify the potential ranges for each
al. (2011, IEEE PAMI). A Gaussian process model for SRVF
in activity across time. Although there is a statistical of the classification performance measures, once the
of curves is proposed, and suitable prior models such as
literature on modeling the circadian mean structure, little mixture model is fitted.
Dirichlet process are employed for modeling the warping
work has been done in understanding variations in these
function as a Cumulative Distribution Function (CDF).
patterns over time. The NEXT generation health study email: jose.cortinasabrahantes@efsa.europa.eu
Simulation from posterior distribution is via Markov chain
collects longitudinal actigraphs over a seven day period,
Monte Carlo methods, and credibility regions for mean
where activity counts are observed at 30 second epochs.
curves, warping functions as well as nuisance parameters
Exploratory analysis suggest that some individuals have
email: chenzhe@mail.nih.gov
Abstracts 42
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
the microbial percentages. Interest lies in determining or indirectly through further spread of HIV from those
S
AB
which gut microflora influence the phenotypes while ac- partners. Assessment of the extent to which individual
counting for the direct relationship between diet and the (incident or prevalent) viruses are clustered within a
BAYESIAN SEMIPARAMETRIC VARIABLE SELECTION other variables. A new methodology for variable selection community will be biased if only a subset of subjects are
WITH APPLICATION TO DENTAL DATA in this context is presented that links the concept of observed, especially if that subset is not representative
Bo Cai*, University of South Carolina q-values from multiple hypothesis testing to the recently of the entire HIV infected population. To address this
Dipankar Bandyopadhyay, University of Minnesota developed weighted Lasso. concern, we develop a multiple imputation framework
in which missing sequences are imputed based on a
Normality assumption is typically adopted for random ef- email: tpgarcia@stat.tamu.edu biological model for the diversification of viral genomes.
fects in repeated and longitudinal data analysis. However, Data from a household survey conducted in a village in
such an assumption is not always realistic as random Botswana are used to illustrate these methods.
effects could follow any distribution. The violation of VARIABLE SELECTION IN SEMIPARAMETRIC
normality assumption may lead to potential biases of the TRANSFORMATION MODELS FOR RIGHT email: shelleyliu@fas.harvard.edu
estimates, especially when variable selection is taken into CENSORED DATA
account. On the other hand, flexibility of nonparametric Xiaoxi Liu*, University of North Carolina, Chapel Hill
assumptions (e.g. Dirichlet process) may potentially Donglin Zeng, University of North Carolina, Chapel Hill BAYESIAN INFERENCE FOR CORRELATED BINARY
cause centering problems which lead to difficulty of DATA VIA LATENT MODELING
interpretation of effects and variable selection. Motivated There is limited work on variable selection for general Deukwoo Kwon*, University of Miami
by this problem, we propose a Bayesian method for fixed transformation models with censored data. Existing Jeesun Jung, National Institute on Alcohol Abuse
and random effects selection in nonparametric random methods use either estimating functions or ranks so they and Alcoholism, National Institutes of Health
effects models. We model the regression coefficients via are computationally intensive and inefficient. In this Jun-Mo Nam, National Cancer Institute, National
centered latent variables which are distributed as probit work, we propose a computationally simple method for Institutes of Health
stick-breaking (PSB) scale mixtures (Pati and Dunson, variable selection in general transformation models. The Yi Qian, Amgen Inc.
2011). By using the mixture priors for centered latent proposed algorithm reduces to maximizing a weighted
variables along with covariance decomposition, we can partial likelihood function within an adaptive LASSO Correlated binary data usually arise in many clinical trials.
avoid the aforementioned problems and allow fixed and framework. It includes both proportional odds model and The matched-pair design is superior in power compared
random effects to be effectively selected in the model. proportional hazard model as special cases and easily to a two-sample design, and is commonly used in small-
We demonstrate advantages of the proposed approach incorporate time-dependent covariates. We establish the sample trials. McNemar test is well-known in this setting.
over the other methods in the simulated example. The asymptotic properties of the proposed method, including Several Bayesian approaches were also developed, but
proposed method is further illustrated through an ap- selection consistent and semiparametric efficiency of the the implementation requires complicated computation
plication to dental data. post-selection estimator. Our simulations demonstrate a techniques (Ghosh et al. 2000; Kateri et al. 2001; Shi et al.
good small-sample performance of the proposed method 2008, 2009). We propose a simplistic Bayesian approach
email: bcai@sc.edu and indicate that the variable selection result is robust using continuous latent model for either difference or
even if transformation function is misspecified. A real ratio of marginal probabilities. External information can
data analysis shows that the proposed method outper- also be easily incorporated into the model. We conduct
STRUCTURED VARIABLE SELECTION WITH Q-VALUES forms an external risk score method in future prediction. simulation study and present a real data example.
Tanya P. Garcia*, Texas A&M University
Samuel Mueller, University of Sydney email: xiaoxi1@unc.edu email: DKwon@med.miami.edu
Raymond J. Carroll, Texas A&M University
Tamara N. Dunn, University of California, Davis
Anthony P. Thomas, University of California, Davis 34. CLUSTERED DATA METHODS THE VALIDATION OF A BETA-BINOMIAL MODEL
Sean H. Adams, U.S. Department of Agriculture, FOR INTRA-CORRELATED BINARY DATA
Agricultural Research Service Western Human VIRAL GENETIC LINKAGE ANALYSES IN THE PRESENCE Jongphil Kim*, Moffitt Cancer Center, University of
Nutrition Research Center OF MISSING DATA South Florida
Suresh D. Pillai, Texas A&M University Shelley H. Liu*, Harvard School of Public Health Ji-Hyun Lee, Moffitt Cancer Center, University of
Rosemary L. Walzem, Texas A&M University Victor DeGruttola, Harvard School of Public Health South Florida
When some of the regressors can act on both the re- Viral genetic linkage based on data from HIV prevention The beta-binomial model accounting for the overdisper-
sponse and other explanatory variables, the already chal- trials at the community level can provide insight into sion of binary data requires an assumption that the
lenging problem of selecting variables when the number HIV transmission dynamics and the impact of preven- success probability of binary data is distributed as a beta
of covariates exceeds the sample size becomes more tion interventions. Analysis of clustering which utilize distribution. If that assumption does not hold, the infer-
difficult. A motivating example is a metabolic study in phylogenetic methods have the potential to inform ence based upon the model may be incorrect. This paper
mice that has diet groups and gut microbial percentages whether recently-infected individuals are infected by investigates beta-binomial model validation using the
that may affect changes in multiple phenotypes related viruses circulating within or outside a community. In ad- intra-correlated binary data which are generated without
to body weight regulation. The data have more variables dition, they have the potential to identify characteristics any assumption on the distribution for success probabil-
than observations and diet is known to act directly on of chronically infected individuals that make their viruses ity. In addition, a nonparametric estimator for the success
the phenotypes as well as on some or potentially all of likely to cluster with others circulating within a com- probability and the intraclass correlation is compared to a
munity. Such clustering can be related to the potential of parametric estimator for those data.
such individuals to contribute to the spread of the virus,
either directly through transmission to their partners email: Jongphil.Kim@moffitt.org
email: tingtingzhan@gmail.com
Abstracts 44
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
36. STATISTICAL METHODS FOR NEXT INTEGRATIVE ANALYSIS OF *-SEQ DATASETS FOR A
S
AB
Abstracts 46
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
clinical assay variation (technical). The further step using patient (without labels); statistical label fusion is used to
S
AB
blinded samples can be retrospective or prospective and resolve conflicts and assign labels to the target. We will
is in some way similar to a phase-II clinical trial. A typical discuss our recent progress (and outstanding challenges)
followed-up in additional functional and translational statistical design issue to address here is determination in determining how to optimally fuse information in the
studies. Such methods include: step-wise integrative of the sample size needed to achieve high confidence form of spatial labels.
analysis, gene set and pathway analysis, interaction that the classifier indeed possesses the desired accuracy
analysis, molecular subtype analysis and functional for clinical use. I will discuss efficient study designs for email: bennett.landman@vanderbilt.edu
association analysis. marker validation, borrowing ideas from adaptive group
sequential clinical trials.
email: bfridley@kumc.edu IMAGING PATTERN ANALYSIS USING MACHINE
email: cheng.cheng@stjude.org LEARNING METHODS
Christos Davatzikos*, University of Pennsylvania
STATISTICAL CHALLENGES IN TRANSLATIONAL
BIOINFORMATICS DRUG-INTERACTION RESEARCH 39. TRANSLATIONAL METHODS FOR During the past decade there has been increased interest
in the medical imaging community for advanced pattern
Lang Li*, Indiana University, Indianapolis STRUCTURAL IMAGING analysis and machine learning methods, which capture
Novel drug interactions can be predicted through large- complex imaging phenotypes. This interest has been
STATISTICAL TECHNIQUES FOR THE NORMALIZATION
scale text mining and knowledge discovery from the amplified by the fact that many diseases and disorders,
AND SEGMENTATION OF STRUCTURAL MRI
published literature. Using natural language processing particularly in neurology and neuropsychiatry, involve
Russell T. Shinohara*, University of Pennsylvania
(NLP), the key challenge is to extract drug interaction spatio-temporally complex patterns that are not easily
Perelman School of Medicine
relationship through the machine learning. We propose detected or quantified. One of the most important chal-
Elizabeth M. Sweeney, Johns Hopkins Bloomberg School
a hybrid mixture model and tree-based approach to lenges in this field has been appropriate dimensionality
of Public Health
extract drug interaction relationship. This two-pronged reduction and feature extraction/selection, i.e. finding
Jeff Goldsmith, Columbia University Mailman School
approach takes advantage of both the numerical features which combination of imaging features forms most
of Public Health
of reported drug interaction results and the linguistic discriminatory imaging patterns. We present work along
Ciprian M. Crainiceanu, Johns Hopkins Bloomberg School
styles of presenting drug interactions. In this talk, I will these lines, by describing a joint generative-discrimina-
of Public Health
discuss the concept and method of literature-based tive approach based on constrained non-negative matrix
knowledge discovery in drug interaction research and factorization, aiming at extracting imaging patterns of
While computed tomography and other imaging
data mining-based drug interaction research using large maximal discriminatory power. Applications in structural
techniques are measured in absolute units with physical
electronic medical record databases. I will discuss the pros imaging of Alzheimer’s Disease and in functional MRI
meaning, magnetic resonance images are expressed in
and cons and multiple design and analyses strategies for are presented. We also give a broader overview of other
arbitrary units that are difficult to interpret and differ
large-scale drug interaction screening studies that use applications in this field.
between study visits and subjects. Much work in the
large-scale electronic medical record databases. I will image processing literature has centered on histogram
illustrate these concepts in the context of a translational email: Christos.Davatzikos@uphs.upenn.edu
matching and other histogram mapping techniques,
bioinformatics drug interaction study on myopathy, a but little focus has been on normalizing images to have
muscle weakness adverse drug event, elucidating on both biologically interpretable units. We explore this key goal
the clinical significance and the molecular pharmacology A SPATIALLY VARYING COEFFICIENTS MODEL FOR
for statistical analysis and the impact of normalization
significance. THE ANALYSIS OF MULTIPLE SCLEROSIS MRI DATA
on cross-sectional and longitudinal segmentation of
Timothy D. Johnson*, University of Michigan
pathology.
email: lali@iupui.edu Thomas E. Nichols, University of Warwick
Tian Ge, University of Warwick
email: taki.shinohara@gmail.com
STUDY DESIGN AND ANALYSIS OF BIOMARKER AND Multiple Sclerosis (MS) is an autoimmune disease af-
GENOMIC CLASSIFIER VALIDATION fecting the central nervous system by disrupting nerve
STATISTICAL METHODS FOR LABEL FUSION:
Cheng Cheng*, St. Jude Children’s Research Hospital transmission. This disruption is caused by damage to
ROBUST MULTI-ATLAS SEGMENTATION
the myelin sheath surrounding nerves that acts as an
Bennett A. Landman*, Vanderbilt University
Validation study of the discovered biomarkers and clas- insulator. Patients with MS have a multitude of symptoms
sifiers is an indispensable step in translating genomic that depend on where lesions occur in the brain and/or
Mapping individual variation of head and neck anatomy
findings into clinical practice. The Institute of Medicine spinal cord. Patient symptoms are rated by the Kurtzke
is essential for radiotherapy and surgical interventions.
has issued guidelines (Evolution of Translational Omics: Functional Systems (FS) scores and the paced auditory
Precise localization of affected structures enables effective
http://www.iom.edu/Reports/2012/Evolution-of- serial addition test (PASAT) score. The eight functional
treatment while minimizing impact to vulnerable systems.
Translational-Omics.aspx) for biomarker discovery systems are: 1) pyramidal; 2) cerebellar; 3) brainstem; 4)
Modern image processing techniques enable one to estab-
and validation before clinical application in deciding sensory; 5) bowel and bladder; 6) visual; 7) cerebral; and
lish point-wise correspondence between scans of different
how to treat patients, setting high bars for biomarker 8) other. Of interest is whether lesion locations can be
patients using non-rigid registration, and, in theory, allow
validation. The Test Validation Phase requires careful and predicted using these FS and PASAT scores. We propose an
for extremely rapid labeling of medical images via label
rigorous consideration of the validation study design. autoprobit regression model with spatially varying coef-
transfer (i.e., copying of labels from atlas patients to target
The classification /prediction accuracy assessed in the ficients. The data of interest are binary lesion maps. The
patients). To compensate for algorithmic and anatomical
analytical validation depends on, among many factors, model incorporates both spatially varying covariates as
mismatch, the state of the art for atlas-based segmenta-
the biomarkers’ classification capabilities (biological) and well as patient specific, non-spatially varying covariates.
tion is to use a multi-atlas approach in which multiple
In contrast to most spatial applications, in which only
canonical patients (with labels) are registered to a target
Abstracts 48
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
patients with Alzheimer’s disease, patients with mild ESTIMATING WEIGHTED KAPPA UNDER
S
AB
Abstracts 50
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
by this result, existing solutions are reviewed and literature. The proposed methods are semi-parametric in
S
AB
further improved. The improved solutions aim to achieve the sense that no models are assumed for the cause-specif-
minimum compromises, and thus are cost-efficient in ic hazards or the subdistribution function. Observed event
ESTIMATING COVARIATE EFFECTS BY TREATING practices. A Bayesian framework is adopted and discussed counts are weighted using Inverse Probability of Treatment
COMPETING RISKS in the statistical inference. Weighting (IPTW) and Inverse Probability of Censoring
Bo Fu*, University of Pittsburgh Weighting (IPCW). We apply the proposed method to
Chung-Chou H. Chang, University of Pittsburgh email: ye.liang@okstate.edu national kidney transplantation data from the Scientific
Registry of Transplant Recipients (SRTR).
In analyses of time-to-event data from clinical trials
or observational studies, it is important to account for ADJUSTING FOR OBSERVATIONAL SECONDARY email: lfan@umich.edu
informative dropouts that are due to competing risks. If TREATMENTS IN ESTIMATING THE EFFECTS OF
researchers fail to account for the association between RANDOMIZED TREATMENTS
the event of interest and informative dropouts, they may Min Zhang*, University of Michigan IMPROVING MEDIATION ANALYSIS BASED ON
encounter unknown amplitude bias when they identify Yanping Wang, Eli Lilly and Company PROPENSITY SCORES
the effects of potential risk factors related to time to Yeying Zhu*, The Pennsylvania State University
the main cause of failure. In this article, we propose an In randomized clinical trials, for example, on cancer pa- Debashis Ghosh, The Pennsylvania State University
approach that jointly models time to the main event and tients, it is not uncommon that patients may voluntarily Donna L. Coffman, The Pennsylvania State University
time to the competing events. The approach uses a set of initiate a secondary treatment post randomization, which
random terms to capture the dependence between the needs to be properly adjusted for in estimating the true In mediation analysis, researchers are interested in exam-
main and competing events. It offers two fundamental effects of randomized treatments. As an alternative to ining whether a randomized treatment or intervention
likelihood functions that have different structures for the approach based on a marginal structural Cox model may affect the outcome through an intermediate factor.
the random terms but may be combined in practice. To (MSCM) in Zhang and Wang (2012), we propose methods Traditional mediation analysis (Baron and Kenny, 1986)
estimate the unknown covariate effects by optimizing that view the time to start a secondary treatment applies a structure equation modeling (SEM) approach
the joint likelihood functions, we used three methods: as a dependent censoring process, which is handled and decomposes the intent-to-treat (ITT) effect into
the Gaussian quadrature method, the Bayesian-Markov separately from the usual censoring such as loss to direct and indirect effects. More recent approaches inter-
chain Monte Carlo method, and the hierarchical likelihood follow-up. Two estimators are proposed, both based on pret the mediation effects based on potential outcome
method. We compared the performance of these methods the idea of inverse weighting by the probability of having framework. In practice, there often exist confounders,
via simulations and then applied them to identify risk fac- not started a secondary treatment yet, and the second pre-treatment covariates that jointly influence the
tors for Alzheimer’s disease and other forms of dementia. estimator focuses on improving efficiency of inference mediator and the outcome. Under the sequential ignor-
by a robust covariate-adjustment that does not require ability assumption, propensity-score-based methods
email: bof5@pitt.edu any additional assumptions. The proposed methods are are often used to adjust for confounding and reduce
evaluated and compared with the MSCM-based method the dimensionality of confounders simultaneously. In
in terms of bias and variance tradeoff using simulations this article, we show that combining machine learning
IDENTIFIABILITY OF MASKING PROBABILITIES and application to a cancer clinical trial. algorithms (such as a generalized boosting model) and
IN THE COMPETING RISKS MODEL logistic regression to estimate propensity scores can be
Ye Liang*, Oklahoma State University email: mzhangst@umich.edu more accurate and efficient in estimating the controlled
Dongchu Sun, University of Missouri direct effects, compared to logistic regression only. The
proposed methods are general in the sense that we
Masked failure data arise in both reliability engineering COMPARING CUMULATIVE INCIDENCE FUNCTIONS can combine multiple candidate models to estimate
and epidemiology. The phenomenon of masking occurs BETWEEN NON-RANDOMIZED GROUPS THROUGH propensity scores and use the cross-validation criterion
when a subject is exposed to multiple risks. A failure of DIRECT STANDARDIZATION to select the optimal subset of the candidate models for
the subject can be caused by one of the risks, but the Ludi Fan*, University of Michigan combining.
cause is unknown or known up to a subset of all risks. In Douglas E. Schaubel, University of Michigan
reliability engineering, a device may fail because of one email: yxz165@psu.edu
of its defective components. However, the precise failure Competing risks data arise naturally in many biomedical
cause is often unknown due to lack of proper diagnos- studies since the subject is often at risk for one of many
tic equipment or prohibitive costs. In epidemiology, types of events that would preclude him/her from expe- ON THE NONIDENTIFIABILITY PROPERTY
sometimes the cause of death for a patient is not known riencing all other events. It is often of interest to compare OF ARCHIMEDEAN COPULA MODELS
exactly due to missing or partial information on the state outcomes between subgroups of subjects. In the presence Antai Wang*, Columbia University
death certificate. A competing risks model with masking of observational data, group is typically not randomized, so
probabilities is widely used for the masked failure data. that adjustment must be made for differences in covariate In this talk, we present a peculiar property shared
However, in many cases, the model suffers from an distributions across groups. The proposed method aims to by the Archimedean copula models, that is, different
identification problem. Without proper restrictions, the compare the cumulative incidence function (CIF) between Archimedean copula models with distinct dependent
masking probabilities in the model could be nonestima- subgroups of subjects from an observational study by a levels can have the same crude survival functions for de-
ble. Our work reveals that the identifiability of masking measure based on direct standardization that contrasts pendent censored data. This property directly shows the
probabilities depends on both the masking structure of the population average cumulative incidence under two nonidentifiability property of the Archimedean copula
data and the cause-specific hazard functions. Motivated scenarios: (i) subjects are distributed across groups as per models. The proposed procedure is then demonstrated by
the existing population (ii) all subjects are members of a two examples.
particular group. The proposed comparison of CIFs has a
strong connection to measures used in the causal inference email: aw2644@columbia.edu
email: tebbs@stat.sc.edu
Abstracts 52
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
email: mclaina@mailbox.sc.edu
ON
R
ST
and our temporal gradient process. After demonstrating lution model (SWADM) to estimate these parameters by
S
AB
the effectiveness of our new model via simulation, we accounting for the complex spatio-temporal dependence
reanalyze the asthma hospitalization data and compare our and patterns in the images adaptively. SWADM has three
SPATIAL-TEMPORAL MODELING OF THE CRITICAL findings to those from previous work. features: being spatial, being hierarchical, being adaptive.
WINDOWS OF AIR POLLUTION EXPOSURE FOR To hierarchically and spatially denoise functional images,
PRETERM BIRTH email: quic0038@umn.edu SWADM creates adaptive ellipsoids at each location to
Joshua Warren*, University of North Carolina, Chapel Hill capture spatio-temporal dependence among imaging ob-
Montserrat Fuentes, North Carolina State University servations in neighboring voxels and times. A simulation
Amy Herring, University of North Carolina, Chapel Hill BAYESIAN SEMIPARAMETRIC MODEL FOR SPATIAL study is used to demonstrate the meth and examine its
Peter Langlois, Texas Department of State Health Services INTERVAL-CENSORED DATA finite sample performance. Our simulation study confirms
Chun Pan*, University of South Carolina that SWADM outperforms the voxel-wise deconvolution
Exposure to high levels of air pollution during the pregnan- Bo Cai, University of South Carolina approach and SVD. Then the method is applied to the real
cy is associated with increased probability of preterm birth Lianming Wang, University of South Carolina data and compared with the results from the voxel-wise
(PTB), a major cause of infant morbidity and mortality. New Xiaoyan Lin, University of South Carolina approach and SVD.
statistical methodology is required to specifically determine
when a particular pollutant impacts the PTB outcome, to Interval-censored survival data are often recorded in email: jwang@bios.unc.edu
determine the role of different pollutants, and to character- medical practice. Although some methods have been
ize the spatial variability in these results. We introduce developed for analyzing such data, issues still remain in
a new Bayesian spatial model for PTB which identifies terms of efficiency and accuracy in estimation. In addi- 47. INNOVATIVE DESIGN AND ANALYSIS
tion, interval-censored data with spatial correlation are
susceptible windows throughout the pregnancy jointly for
not unusual but less studied. In this paper, we propose an
ISSUES IN FETAL GROWTH STUDIES
multiple pollutants while allowing these windows to vary
continuously across space and time. A directional Bayesian efficient Bayesian approach under proportional hazards
CLINICAL IMPLICATIONS OF THE NATIONAL
approach is implemented to correctly characterize the un- model to analyze general interval-censored survival data
STANDARD FOR NORMAL FETAL GROWTH
certainty of the climatic and pollution variables throughout with spatial correlation. Specifically, a linear combina-
S. Katherine Laughon*, Eunice Kennedy Shriver
the modeling process. We apply our methods to geo-coded tion of monotone splines is used to model the unknown
National Institute of Child Health and Human
birth outcome data from the state of Texas (2002-2004). baseline cumulative hazard function, leading to a finite
Development, National Institutes of Health
Our results indicate the susceptible window for higher number of parameters to estimate while maintain-
preterm probabilities is mid-first trimester for the fine PM ing adequate modeling flexibility. A two-step data
Normal fetal growth is a marker of an optimal intra-
and beginning of the first trimester for the ozone. augmentation through Poisson latent variables is used to
uterine environment and is important for the long-term
facilitate the computation of posterior distributions that
health of the offspring. Defining normal and abnormal
email: joshuawa@email.unc.edu are essential in the MCMC sampling algorithm proposed.
fetal growth in clinical practice and research has not been
A conditional autoregressive distribution is employed
straightforward. Many clinical and epidemiologic studies
to model the spatial dependency. A simulation study is
have classified abnormal growth as small for gesta-
HETEROSCEDASTIC VARIANCES IN AREALLY conducted to evaluate the performance of the proposed
tional age (SGA) or large for gestational age (LGA) using
REFERENCED TEMPORAL PROCESSES WITH method. The approach is illustrated through a geographi-
normative birth weight references that may not reflect
AN APPLICATION TO CALIFORNIA ASTHMA cally referenced smoking cessation data in southeastern
patterns of under- or overgrowth. An SGA neonate may
HOSPITALIZATION DATA Minnesota where time to relapse is modeled and spatial
be constitutionally small, while a normal birth weight
Harrison S. Quick*, University of Minnesota structure is examined.
percentile can occur in the setting of suboptimal fetal
Sudipto Banerjee, University of Minnesota growth. In addition, only several longitudinal ultrasound
Bradley P. Carlin, University of Minnesota email: chunpan2003@hotmail.com
studies have been conducted, and most of the larger
studies were performed in Europe with the majority of
Often in regionally aggregated spatial models, a single vari- subjects being Caucasian. I will discuss the study design
ance parameter is used to capture variability in the spatial SPATIO-TEMPORAL WEIGHTED ADAPTIVE
for the NICHD Fetal Growth Study which is enrolling
association structure of the model. In real world phenom- DECONVOLUTION MODEL TO ESTIMATE THE
2400 women to represent the diversity of the United
ena, however, spatially-varying factors such as climate and CEREBRAL BLOOD FLOW FUNCTION IN DYNAMIC
States. I will also discuss the importance of developing
geography may impact the variability in the underlying SUSCEPTIBILITY CONTRAST MRI
biostatistical methodology for establishing both distance
process. Here, our interest is in modeling monthly asthma Jiaping Wang*, University of North Texas, Denton
(cross-sectional) and velocity (longitudinal) reference
hospitalization rates over an 18 year period in the counties Hongtu Zhu, University of North Carolina, Chapel Hill
curves. Further, I will discuss the methodological chal-
of California. Earlier work has accounted for both spatial Hongyu An, University of North Carolina, Chapel Hill
lenges in establishing personalized reference curves and
and temporal association using a process-based method predictions of abnormal birth outcomes. This talk will
that permits inference on the underlying temporal rates of Dynamic susceptibility contrast MRI measures the perfu-
serve as a prelude to more technical talks on statistical
change, or gradients, and has revealed progressively muted sion in numerical diagnostic and therapy-monitoring
model development and design.
transitions into and out of the summer months. We extend settings. One approach to estimate the blood flow param-
this methodology to allow for region-specific variance eters assume a convolution relation between the arterial
email: laughonsk@mail.nih.gov
components and separate, purely temporal processes, both input function and the tissue enhancement profile of
of which we believe can simultaneously help avoid over- the ROIs via a residue function, then extract the residue
and undersmoothing in our overall spatiotemporal process functions by some deconvolution techniques like the
singular value decomposition (SVD), or Fourier transform
based method for each voxel independently. This paper
develops a spatio-temporal weighted adaptive deconvo-
Abstracts 56
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
AN FDR APPROACH FOR MULTIPLE responses to prior treatments received, and it is not
S
AB
CHANGE-POINT DETECTION clear how such a model can be adapted to clinical trials
Ning Hao, University of Arizona employing more than one randomization. Besides, since
treatment is modified post-baseline, the hazards are
49. HUNTING FOR SIGNIFICANCE IN The detection of change points has attracted a great deal unlikely to be proportional across treatment regimes.
HIGH-DIMENSIONAL DATA of attention in many fields. From the hypothesis testing Although Lokhnygina and Helterbrand (Biometrics. 2007
perspective, a multiple change-point problem can be Jun;63(2):422-8.) introduced the Cox regression method
DISCOVERING SIGNALS THROUGH viewed as a multiple testing problem by testing every for two-stage randomization designs, their method can
NONPARAMETRIC BAYES data point as a potential change point. The false discovery only be applied to test the equality of two treatment
Linda Zhao*, University of Pennsylvania rate (FDR) approach to multiple testing problems has regimes that share the same maintenance therapy.
been studied extensively since the seminal paper of Moreover, their method does not allow auxiliary variables
Many classification problems can be conveniently Benjamini and Hochberg (1995). However, the multiple to be included in the model nor does it account for
formulated in terms of Bayesian mixture prior models. testing problem derived from change-point detection treatment effects that are not constant over time. In this
The mixture prior structure lends itself especially well presents a problem that beyond the classical framework. article, we propose a model that assumes proportionality
for adapting to varying degrees of sparsity. Typically, In this talk, we will introduce an FDR approach for across covariates within each treatment regime but not
parametric assumptions are made about the components change-point detection, based on a screening and rank- across treatment regimes. Comparisons among treatment
of the mixture priors. In the following, we propose a ing algorithm. Both simulated and real data analyses will regimes are performed by testing the log ratio of the
parametric and a nonparametric classification procedures be presented to demonstrate the use of the SaRa. estimated cumulative hazards. The ratio of the cumulative
using a mixture prior Bayesian approach for a risk func- hazard across treatment regimes is estimated using a
tion that combines misclassification loss and an $L_2$ ninghao008@gmail.com weighted Breslow-type statistic. A simulation study was
penalty. While the parametric procedure is closer to conducted to evaluate the performance of the estimators
traditional approaches, in simulations, we show that the and proposed tests.
nonparametric classifier typically outperforms it when ESTIMATION OF FDP WITH UNKNOWN
the parametric prior is misspecified; the two procedures COVARIANCE DEPENDENCE email: wahed@pitt.edu
have comparable performance even when the shape of Jianqing Fan; Princeton University
the parametric prior is specified correctly. We illustrate Xu Han*, Temple University
the properties of the two classifiers on a publicly available ADAPTIVE TREATMENT POLICIES FOR
gene expression dataset. This is a joint work with Fuki, I. Multiple hypothesis testing is a fundamental problem INFUSION STUDIES
and Raykar, V. in high dimensional inference, with wide applications in Brent A. Johnson*, Emory University
many scientific fields. When test statistics are correlated,
email: lzhao@wharton.upenn.edu false discovery control becomes very challenging under In post-operative medical care, some drugs are
arbitrary dependence. In Fan, Han & Gu (2011), the administered intravenously through an infusion pump.
authors gave a method to consistently estimate false Comparing infusion drugs and rates of infusion are typi-
OPTIMAL MULTIPLE TESTING PROCEDURE discovery proportion when the covariance matrix of test cally conducted through randomized controlled clinical
FOR LINEAR REGRESSION MODEL statistics is known. The method is based on eigenvalues trials across two or more arms and summarized through
Jichun Xie, Temple University and eigenvectors of the covariance matrix. However, in standard statistical analyses. However, the presence of
Zhigen Zhao*, Temple University practice the covariance matrix is usually unknown. Con- infusion-terminating events can adversely affect primary
sistent estimate of an unknown covariance matrix is itself endpoints and complicate statistical analyses of second-
Multiple testing problems are thoroughly understood a difficult problem. In the current paper, we will derive ary endpoints. A secondary analysis of considerable
for independent normal vectors but remain vague for some results to consistently estimate FDP even when the interest is to assess the effects of infusion length once the
dependent data. In this paper, we construct an optimal covariance matrix is unknown. test drug has been shown superior to standard of care.
multiple testing procedure for the linear regression model This analysis is complicated due to presence or absence of
when the dimension $p$ is much larger than the sample email: hanxu3@temple.edu treatment-terminating events and potential time-varying
size $n$. Linear regression model can be viewed as a gen- confounding in treatment assignment. Connections to
eralization of the normal random vector model with an dynamic treatment regimes offer a principled approach
arbitrary dependency structure. The proposed procedure to this secondary analysis and related problems, such
can control FDR under any specified levels; meanwhile,
50. NEW DEVELOPMENTS IN THE
as adaptive, personalized infusion policies, robust and
it can asymptotically minimize the FNR. In other word, CONSTRUCTION AND OPTIMIZATION efficient estimation, and the analysis of infusion policies
it an achieve validity and efficiency at the same time. OF DYNAMIC TREATMENT REGIMES in continuous time. These concepts will be illustrated with
The numerical study shows that the proposed procedure data from Duke University Medical Center.
has better performance compared with the competitive COVARIATE-ADJUSTED COMPARISON OF DYNAMIC
methods. We also applied the procedure to a genome- TREATMENT REGIMES IN SEQUENTIALLY email: bajohn3@emory.edu
wide association study of hypertension for American RANDOMIZED CLINICAL TRIALS
African population and got interesting results. Xinyu Tang, University of Arkansas for Medical Sciences
Abdus S. Wahed*, University of Pittsburgh
email: zhaozhg@temple.edu
Cox proportional hazards model is widely used in survival
analysis to allow adjustment for baseline covariates.
The proportional hazard assumption may not be valid
for treatment regimes that depend on intermediate
Abstracts 58
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
Abstracts 60
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
and tests are very robust in terms of type I error evalua- tions using a probability model. We use the profile hidden
S
AB
tions, and are powerful by empirical power evaluations. Markov model to model a protein sequence. A kernel
The methods are applied to analyze cleft palate data of logistic model models the effect of protein sequences as a
PATHWAY SELECTION AND AGGREGATION USING TGFA gene of an Irish study. random effect whose covariance matrix is parameterized
MULTIPLE KERNEL LEARNING FOR RISK PREDICTION by the kernel. To test the null hypothesis that the protein
Jennifer A. Sinnott*, Harvard University email: fanr@mail.nih.gov sequence has an effect on the outcome, we approximate
Tianxi Cai, Harvard University the score test statistics with a chi-squared distribution and
take the maximum over a grid of a scale parameter which
Attempts to predict risk using high dimensional genomic GENOTYPE CALLING AND HAPLOTYPING FOR only exists under the alternative hypothesis. A parametric
data can be made difficult by the large number of fea- FAMILY-BASED SEQUENCE DATA bootstrap approach is used to obtain the reference distri-
tures and the potential complexity of the relationship Wei Chen*, University of Pittsburgh School of Medicine bution. We apply our method to the HIV-1 vaccine study to
between features and the outcome. Integrating prior Bingshan Li, Vanderbilt University Medical Center identify regions of the gp120 protein sequence where IgA
biological knowledge into risk prediction with such Zhen Zeng, University of Pittsburgh School antibody binding correlates with infection risk.
data by grouping genomic features into pathways and of Public Health
networks reduces the dimensionality of the problem and Serena Sanna, Centro Nazionale di Ricerca (CNR), Italy email: youyifong@gmail.com
could improve models by making them more biologically Carlo Sidore, Centro Nazionale di Ricerca (CNR), Italy
grounded and interpretable. Pathways could have com- Fabio Busonero, Centro Nazionale di Ricerca (CNR), Italy
plex signals, so our approach to model pathway effects Hyun Min Kang, University of Michigan ESTIMATION AND INFERENCE OF THE THREE-LEVEL
should allow for this complexity. The kernel machine Yun Li, University of North Carolina, Chapel Hill INTRACLASS CORRELATION COEFFICIENT
framework has been proposed to model pathway effects Gonçalo R. Abecasis, University of Michigan Mat D. Davis*, University of Pennsylvania and
because it allows for nonlinear relationships within Theorem Clinical Research
pathways; it has been used to make predictions for vari- Emerging sequencing technologies allow common and J. Richard Landis, University of Pennsylvania
ous types of outcomes from individual pathways. When rare variants to be systematically assayed across the Warren Bilker, University of Pennsylvania
multiple pathways are under consideration, we propose human genome in many individuals. In order to improve
a multiple kernel learning approach to select important variant detection and genotype calling, raw sequence Since the early 1900's, the intraclass correlation coefficient
pathways and efficiently combine information across data are typically examined across many individuals. We (ICC) has been used to quantify the level of agreement
pathways. We derive our approach for a general survival describe a method for genotype calling in settings where among different assessments on the same object. By
modeling framework with a convex objective function, sequence data are available for unrelated individuals comparing the level of variability that exists within
and illustrate its application under the Cox proportional and parent-offspring trios and show that modeling trio subjects to the overall error, a measure of the agreement
hazards and accelerated failure time (AFT) models. Nu- information can greatly increase the accuracy of inferred among the different assessments can be calculated.
merical studies with the AFT model demonstrate that this genotypes and haplotypes, especially on low to modest Historically, this has been performed using subject as the
approach performs well in predicting risk. The methods depth sequence data. Our method considers both linkage only random effect. However, there are many cases where
are illustrated with an application to breast cancer data. disequilibrium patterns and the constraints imposed by other nested effects, such as site, should be controlled
family structure when assigning individual genotypes for when calculating the ICC to determine the chance
email: jsinnott@hsph.harvard.edu and haplotypes. Using both simulations and real data, we corrected agreement adjusted for other nested factors.
show trios provide higher genotype calling and phasing We will present a unified framework to estimate both
accuracy across the frequency spectrum than the existing the two-level and three-level ICC for continuous and
ASSOCIATION ANALYSIS OF COMPLEX DISEASES methods that ignores family structure. Our method can categorical data. In addition, the corresponding standard
USING TRIADS, PARENT-CHILD PAIRS AND be extended to handle nuclear and multi-generational errors and confidence intervals for both continuous and
SINGLETON CASES families in a computationally feasible manner. We categorical ICC measurements will be presented. Finally,
Ruzong Fan*, Eunice Kennedy Shriver National Institute anticipate our method will facilitate genotype calling an example of the effect that controlling for site can have
of Child Health and Human Development, National and haplotype inference for many ongoing sequencing on ICC measures will be presented for subjects within
Institutes of Health projects. genotyping plates comparing genetically determined race
to patient reported race.
Separate analysis of triad families or case-control data email: chenw8@hotmail.com
is routinely used in association study. Triad studies are email: davismat@mail.med.upenn.edu
important since they are robust in terms of less prone to
false positive due to population structure. Case control 55. AGREEMENT MEASURES FOR
design is widely used in association study and it is EFFECTS AND DETECTION OF RANDOM-EFFECTS
LONGITUDINAL/SURVIVAL DATA MODEL MISSPECIFICATION IN GLMM
powerful but it is prone to false positive due to popula-
tion structure. By doing separate analysis using triads or Shun Yu*, University of South Carolina, Columbia
MUTUAL INFORMATION KERNEL LOGISTIC MODELS
case-control data, it does not fully take the advantage of Xianzheng (Shan) Huang, University of
WITH APPLICATION IN HIV VACCINE STUDIES
each study and it can be less powerful than a combined South Carolina, Columbia
Saheli Datta, Fred Hutchinson Cancer Research Center
analysis. In this paper, we develop likelihood-based Youyi Fong*, Fred Hutchinson Cancer Research Center
statistical models and likelihood ratio tests to test as- We develop a diagnostic method for identifying the
Georgia Tomaras, Duke University
sociation between complex diseases and genetic markers skewness of the true distribution of random effects in
by using combinations of full triads, parent-child pairs, generalized linear mixed models with binary responses.
We propose a mutual information kernel logistic model to
and affected singleton cases for a unified analysis. By We investigate large-sample properties of maximum
study the effect of protein sequences. A mutual informa-
simulation studies, we show that the proposed models likelihood estimators under different ways of misspecify-
tion kernel measures the similarity between two observa-
Abstracts 62
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
ON
R
ST
EXPERIENCE WITH STATISTICAL CONSULTING ON PERMUTATION TESTS FOR SUBGROUP ANALYSES WITH
S
AB
email: mmwall@columbia.edu
65 ENAR 2013 • Spring Meeting • March 10–13
COMPARISON OF ADDITIVE AND MULTIPLICATIVE TWO-SAMPLE NONPARAMETRIC COMPARISON 59. GRADUATE STUDENT AND RECENT
BAYESIAN MODELS FOR LONGITUDINAL COUNT FOR PANEL COUNT DATA WITH UNEQUAL
DATA WITH OVERDISPERSION PARAMETERS: A OBSERVATION PROCESSES
GRADUATE COUNCIL INVITED
SIMULATION STUDY Yang Li*, University of Missouri, Columbia SESSION: GETTING YOUR FIRST JOB
Mehreteab F. Aregay*, I-BioStat, Hasselt Universiteit Hui Zhao, Huazhong Normal University, China
& Katholieke Universiteit Leuven, Belgium Jianguo Sun, University of Missouri, Columbia THE GRADUATE STUDENT AND RECENT
Ziv Shkedy, I-BioStat, Hasselt Universiteit GRADUATE COUNCIL
& Katholieke Universiteit Leuven, Belgium This article considers two-sample nonparametric Victoria Liublinska*, Harvard University
Geert Molenberghs, I-BioStat, Hasselt Universiteit comparison based on panel count data. Most approaches
& Katholieke Universiteit Leuven, Belgium that have been developed in the literature require an The Graduate Student and Recent Graduate Council
equal observation process for all subjects. However, such (GSRGC) to allow ENAR to better serve the special needs
In applied statistical data analysis, overdispersion is a assumption may not hold in reality. A new class of test of students and recent graduates. I will describe the
common feature. It can be addressed using both multipli- procedures are proposed that allow unequal observa- activities we envision the GSRGC participating in, how it
cative and additive andom effects. A multiplicative model tion processes for the subjects from different treatment would be constituted, and how it will interface with RAB
for count data enters a gamma random effect as a multi- groups, and both univariate and multivariate panel count and ENAR leadership.
plicative factor into the mean, whereas an additive model data are considered. The asymptotic normality of the pro-
assume a normally distributed random effect, entered posed test statistics is established and a simulation study email: vliublin@fas.harvard.edu
into the linear predictor. Using Bayesian principles, these is conducted to evaluate the finite sample properties of
ideas are applied to longitudinal count data, based on the the proposed approach. The simulation results show that
work of Molenberghs, Verbeke, and Dem´etrio (2007). the proposed procedures work well for practical situations FINDING A POST-DOCTORAL FELLOWSHIP
The performance of the additive and multiplicative ap- and especially for sparsely distributed data. They are ap- OR A TENURE-TRACK JOB
proaches is compared using a simulation study. plied to a set of panel count data from a skin cancer study. Eric Bair*, University of North Carolina Center for
Neurosensory Disorders
email: mehreteabfantahun.aregay@med.kuleuven.be email: ylx33@mail.missouri.edu
This talk is primarily intended for doctoral students in
statistics and will discuss jobs for statisticians in academia
THE FOCUSED AND MODEL AVERAGE ESTIMATION A MARGINALIZED ZERO-INFLATED and strategies for finding jobs in academia. It will include
FOR PANEL COUNT DATA POISSON REGRESSION MODEL WITH OVERALL a discussion of the benefits and drawbacks of working in
HaiYing Wang*, University of Missouri EXPOSURE EFFECTS academia as well as the types of academic jobs that exist.
Jianguo Sun, University of Missouri D. Leann Long*, University of North Carolina, Chapel Hill It will also discuss strategies for finding job openings,
Nancy Flournoy, University of Missouri John S. Preisser, University of North Carolina, Chapel Hill preparing job applications, interviewing, negotiating
Amy H. Herring, University of North Carolina, Chapel Hill job offers, and other tactics for obtaining a job offer in
One of the main goals in model selection is to improve academia.
the quality of the estimators of interested parameters. For The zero-inflated Poisson (ZIP) regression model is often
this, Claeskens and Hjort proposed the focused informa- employed in public health research to examine the email: ebair@email.unc.edu
tion criterion (FIC) which emphasis on the accuracy of relationships between exposures of interest and a count
estimation of particular parameters of interest. In a com- outcome exhibiting many zeros, in excess of the amount
panion paper, they showed that the estimation efficiency expected under Poisson sampling. The regression coef- GETTING YOUR FIRST JOB IN THE
can be further improved by taking a weighted average ficients of the ZIP model have latent class interpretations FEDERAL GOVERNMENT
on sub-model estimators. The purpose of this paper is to that are not well suited for inference targeted at overall Lillian Lin*, Centers for Disease Control and Prevention
extend the aforementioned ideas to panel count data. exposure effects, specifically, in quantifying the effect of
Panel count data frequently occurs in long-term medical an explanatory variable in the overall mixture population. The federal government is the largest single employer of
follow-up studies, in which the primary object is often to We develop a marginalized ZIP model approach for inde- statisticians in the United States yet most statistics gradu-
evaluate the effectiveness of newly developed medicine pendent responses to model the population mean count ate students do not know which agencies hire statisti-
or treatments. In terms of statistical modeling, the directly, allowing straightforward inference for overall cians and are not familiar with the federal hiring process.
effectiveness is often depicted by only a few parameters, exposure effects and easy accommodation of offsets The presenter has managed a statistics group since 2002.
although the inclusion of other parameters and covariates representing individuals’ risk times and empirical robust She will introduce nomenclature, review the distribu-
affects the estimation of parameters of interest. So the variance estimation for overall log incidence density tion of federal statistician positions, describe typical job
focused and model average estimation fill the need of ratios. Through simulation studies, the performance of responsibilities, and advise on the application process.
this problem ideally. In the context of panel count data, maximum likelihood estimation of the marginalized ZIP
we define the FIC and derive the asymptotic distribution model is assessed and compared to existing post-hoc email: lel5@cdc.gov
of the model average estimator. A simulation study is methods for the estimation of overall effects in the
carried out to examine the finite sample performance and traditional ZIP model framework. The marginalized ZIP
a real data from a cancer study is analyzed to illustrate model is applied to a recent study of a safer sex counsel-
the practical application. ing intervention.
Abstracts 66
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
email: rajnrao@umich.edu
Abstracts 68
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
REDUCING EFFECT OF PLACEBO RESPONSE such modification has been devised for pseudo-likelihood
S
AB
WITH SEQUENTIAL PARALLEL COMPARISON DESIGN based strategies. We propose a suite of corrections to the
FOR CONTINUOUS OUTCOMES standard form of pseudo-likelihood, to ensure its validity
Michael J. Pencina*, Boston University and under missingness at random. Our corrections follow both
63. STATISTICAL METHODS FOR TRIALS single and double robustness ideas, and is relatively simple
Harvard Clinical Research Institute
WITH HIGH PLACEBO RESPONSE Gheorghe Doros, Boston University to apply. When missingness is in the form of dropout in
Denis Rybin, Boston University longitudinal data or incomplete clusters, such a structure
BEYOND CURRENT ENRICHMENT DESIGNS USING can be exploited towards further corrections. The proposed
Maurizio Fava, Massachusetts General Hospital
PLACEBO NON-RESPONDERS method is applied to data from a clinical trial in onychomy-
Yeh-Fong Chen*, U.S. Food and Drug Administration cosis and a developmental toxicity study.
The Sequential Parallel Comparison Design (SPCD) is a nov-
el approach intending to limit the effect of high placebo
Several designs based on the enriched population have email: geert.molenberghs@uhasselt.be
response in clinical trials. It can be applied to studies with
been proposed to deal with high placebo response com-
binary as well as ordinal or continuous outcomes. Analytic
monly seen in psychiatric trials. Among these are a design
methods proposed to date for continuous data included
with the placebo lead-in phase, a sequential parallel COMPOSITE LIKELIHOOD INFERENCE FOR
methods based on seemingly unrelated regression and
design (Fava et al., 2003), and a two-way enriched clinical COMPLEX EXTREMES
ordinary least squares. Both ignore some data in estimat-
trial design (Ivanova and Tamura, 2011). One common Emeric Thibaud*, Anthony Davison and Raphaël Huser
ing the analytic model and have to rely on imputation
feature of these designs is the focus on placebo non- École polytechnique fédérale de Lausanne
techniques to account for missing data. To overcome
responders in an attempt to eliminate the influence of
these issues we propose a repeated measures linear
placebo responders on the effect size. However, the onset Complex extreme events, such as heat-waves and flood-
mixed model which uses all outcome data collected in the
of treatments’ effect may vary by disease pathology, ing, have major effects on human populations and
trial and accounts for data that is missing at random. An
so an optimal duration for the placebo lead-in phase is environmental sustainability, and there is growing
appropriate contrast formulated based on the final model
uncertain, and in turn the effectiveness of these enrich- interest in modelling them realistically. Marginal
is used to test the primary hypothesis of no difference in
ment designs is debatable. In this presentation, we will modeling of extremes, through the use of the generalized
treatment effects between study arms pooled across the
share our evaluations of these enrichment designs and extreme-value and generalized Pareto distributions, is
two phases of the SPCD trial. Simulations show that our
compared them with our proposed new design strategy. well-developed, but spatial theory, which relies on max-
approach preserves the type I error even for small sample
sizes and offers adequate power under a wide variety of stable processes, is needed for more complex settings
email: yehfong.chen@fda.hhs.gov and is currently an active domain of research. Max-stable
assumptions.
models have been proposed for various types of data,
email: mpencina@bu.edu but unfortunately classical likelihood inference cannot
COMPARING STRATEGIES FOR PLACEBO CONTROLLED be used with them, because only their pairwise marginal
TRIALS WITH ENRICHMENT distributions can be calculated in general. The use of
Anastasia Ivanova*, University of North Carolina, composite likelihood makes inference feasible for such
Chapel Hill 64. COMPOSITE/PSEUDO LIKELIHOOD complex processes. This talk will describe the major issues
METHODS AND APPLICATIONS in such modelling, and illustrate them with an application
We describe several two-stage design strategies for a to extreme rainfall in Switzerland.
placebo controlled trial where treatment comparison in DOUBLY ROBUST PSEUDO-LIKELIHOOD ESTIMATION
the second stage is performed in an enriched population. FOR INCOMPLETE DATA email: e.thibaud@gmail.com
Examples include placebo lead-in, randomized with- Geert Molenberghs*, I-BioStat, Hasselt Universiteit
drawal and sequential parallel comparison design. Using & Katholieke Universiteit Leuven, Belgium
the framework of the recently proposed two-way enriched Geert Verbeke, I-BioStat, Hasselt Universiteit STANDARD ERROR ESTIMATION IN THE EM
design which includes all of these strategies as special & Katholieke Universiteit Leuven, Belgium ALGORITHM WHEN JOINT MODELING OF SURVIVAL
cases we give recommendation on which two-stage strat- Michael G. Kenward, London School of Hygiene AND LONGITUDINAL DATA
egy to use. Robustness of various designs is discussed. and Tropical Medicine, UK Cong Xu, University of California, Davis
Birhanu Teshome Ayele, I-BioStat, Hasselt Universiteit Paul Baines, University of California, Davis
email: aivanova@bios.unc.edu & Katholieke Universiteit Leuven, Belgium Jane-Ling Wang*, University of California, Davis
In applied statistical practice, incomplete measure- Joint modeling of survival and longitudinal data has been
ment sequences are the rule rather than the exception. studied extensively in recent literature. The likelihood
Fortunately, in a large variety of settings, the stochastic approach is one of the most popular estimation methods
mechanism governing the incompleteness can be ignored employed within the joint modeling framework. Typically
without hampering inferences about the measurement the parameters are estimated using maximum likelihood,
process. While ignorability only requires the relatively with computation performed by the EM algorithm. How-
general missing at random assumption for likelihood and ever, one drawback of this approach is that standard error
Bayesian inferences, this result cannot be invoked when (SE) estimates are not automatically produced when using
non-likelihood methods are used. A direct consequence of the EM algorithm. Many different procedures have been
this is that a popular non-likelihood-based method, such proposed to obtain the asymptotic variance-covariance
as generalized estimating equations, needs to be adapted matrix for the parameters when the number of parameters
towards a weighted version or doubly-robust version,
when a missing at random process operates. So far, no
Abstracts 70
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
time, battery voltage, and other unpredictable, but often analysis, which automatically takes into account the spa-
S
AB
occurring, events. The results of a small study of the asso- tial information among neighboring voxels. We conduct
ciation between our activity metrics and health outcomes extensive simulation studies to evaluate the prediction
shows promising initial results. performance of the proposed approach and its ability to
66. FUNCTIONAL DATA ANALYSIS identify related regions to the response variable, with the
email: jbai@jhsph.edu underlying assumption that only a few relatively small
TESTING THE EFFECT OF FUNCTIONAL COVARIATE FOR
subregions are associated with the response variable.
FUNCTIONAL LINEAR MODEL
We then apply the proposed approach to search for brain
Dehan Kong*, North Carolina State University
SPARSE SEMIPARAMETRIC NONLINEAR MODEL WITH subregions that are associated with cognitive impairment
Ana-Maria Staicu, North Carolina State University
APPLICATION TO CHROMATOGRAPHIC FINGERPRINTS using PET imaging data.
Arnab Maity, North Carolina State University
Michael R. Wierzbicki*, University of Pennsylvania
Li-bing Guo, Guangdong College of Pharmacy email: xuejwang@umich.edu
In this article, we consider the functional linear model
Qing-tao Du, Guangdong College of Pharmacy
with a scalar response. Our goal is to test for no effect of
Wensheng Guo, University of Pennsylvania
the model, that is to test whether the functional coeffi-
MECHANISTIC HIERARCHICAL GAUSSIAN PROCESSES
cient function equals zero. We use the functional principal
Chromatography is a popular tool in determining the Matthew W. Wheeler*, The National Institute for
component analysis and write the functional linear
chemical composition of biological samples. For example, Occupational Safety and Health and University of
model as a linear combination of the functional principal
traditional Chinese herbal medications are comprised of North Carolina, Chapel Hill
component scores. Various traditional tests such as Wald,
numerous compounds and identifying commonalities David B. Dunson, Duke University
score, likelihood ratio and F test are applied. We compare
across a set of samples is of interest in quality control and Amy H. Herring, University of North Carolina, Chapel Hill
the performance of these tests under both regular dense
identification of active compounds. Chromatographic Sudha P. Pandalai, The National Institute for
design and sparse irregular design. We also do some
experiments output a plot of all the detected abundances Occupational Safety and Health
research on how sample size affect the performance of
of the compounds over time. The resulting chromato- Brent A. Baker, The National Institute for Occupational
these tests. Both the asymptotic null distribution and
gram is characterized by a number of sharp spikes each Safety and Health
the alternative distribution are derived for those tests
of which corresponds to the presence of a different
that work well. We have also discussed about sample size
compound. Due to variation in experimental condi- The statistics literature on functional data analysis focuses
needed to achieve certain power. We demonstrate our
tions, a given spike is often not aligned in time across primarily on flexible black-box approaches, which are
results using simulations and real data.
a set of samples. We propose a sparse semiparametric designed to allow individual curves to have essentially
nonlinear model for the establishment of a standardized any shape while characterizing variability. Such methods
email: dkong2@ncsu.edu
chromatographic fingerprint from a set of chromatograms typically cannot incorporate mechanistic information,
under different experimental conditions. Our framework which is commonly expressed in terms of differential
results in simultaneous alignment, model selection, and equations. Motivated by studies of muscle activation, we
ACCELEROMETRY METRICS FOR EPIDEMIOLOGY
estimation of chromatograms. Wavelet basis expansion is propose a nonparametric Bayesian approach that takes
Jiawei Bai*, Johns Hopkins University
used to model the common shape function of the curves into account mechanistic understanding of muscle physi-
Bing He, Johns Hopkins University
nonparametrically. Curve registration is performed by ology. A novel class of hierarchical Gaussian processes
Thomas A. Glass, Johns Hopkins University
parametric modeling of the time transformations. Penal- is defined that favors curves consistent with differential
Ciprian M. Crainiceanu, Johns Hopkins University
ized likelihood with the adaptive lasso penalty provides a equations defined on motor, damper, spring systems. A
unified criterion for model selection and estimation. The Gibbs sampler is proposed to sample from the posterior
We introduce a set of metrics for human activity based
adaptive lasso estimators are shown to possess the oracle distribution and applied to a study of rats exposed to non-
on high density acceleration recordings from a hip worn
property. We apply the model to data of the medicinal injurious muscle activation protocols. Although motivated
three-axis accelerometer. Data were collected from 34
plant, rhubarb. by muscle force data, a parallel approach can be used to
older subjects who wore the devices for up to seven days
include mechanistic information in broad functional data
during their daily living activities. We propose simple
email: mwierz@mail.med.upenn.edu analysis applications.
metrics that are based on two concepts: 1) time active, a
measure of the length of time when the subject activity
email: mwheeler@cdc.gov
is distinguishable from rest; and 2) activity intensity, a
REGULARIZED 3D FUNCTIONAL REGRESSION
measure of relative amplitude of activity relative to rest.
FOR BRAIN IMAGING VIA HAAR WAVELETS
Both measurements are time dependent, but their means
Xuejing Wang*, University of Michigan VARIABILITY ANALYSIS ON REPEATABILITY
and standard deviations are reasonable and complemen-
Bin Nan, University of Michigan EXPERIMENT OF FLUORESCENCE SPECTROSCOPY
tary summaries of daily activity. All measurements are
Ji Zhu, University of Michigan DEVICES
normalized (have the same interpretation across subjects
Robert Koeppe, University of Michigan Lu Wang*, Rice University
and days), easy to explain and implement, and reproduc-
Dennis D. Cox, Rice University
ible across platforms and software implementations. This
There has been an increasing interest in the analysis
is a non-trivial task in an observational study where raw
of functional data in recent years. Samples of curves, This project is about to investigate the use of spectro-
acceleration can be dramatically affected by the location
images, or other functional observations are often scopic devices to detect cancerous and pre-cancerous
of the device, angle with respect to body, body geometry,
collected in many fields. Our primary motivation and lesions. One major problem with bio-medical applications
subject-specific size and direction of energy produced,
application come from brain imaging studies on cognitive of optical spectroscopy is the repeatability of the mea-
impairment in elderly subjects with brain disorders. We surements. The measured spectra cannot be accurately
propose a highly effective regularized Haar-wavelet- measured; they are functional data with variations in
based approach for the analysis of three-dimensional
brain imaging data in the framework of functional data
Abstracts 72
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
it without losing the interpretability. We focus on this issue is conducted to examine the association of BMI and
S
AB
by integrating random forests and the evolutionary algo- cause-specific mortalities based on a pooled data with
rithm into the interaction trees algorithm, while preserving 33,144 individuals. BMI was transformed to capture
TIME-SENSITIVE PREDICTION RULES FOR the tree structure. The advantage of this approach is that the curvature association of BMI and mortality. The
DISEASE RISK OR ONSET THROUGH LOCALIZED it allows for the identification of subgroups that benefit associations were analyzed using Cox model at first and
KERNEL MACHINE LEARNING most from the treatment. We evaluate the properties of the the proportional hazards (PH) assumptions were tested.
Tianle Chen*, Columbia University modified interaction trees algorithm and compare it with Time-dependent covariates Cox model was used to model
Huaihou Chen, New York University the original interaction trees algorithm via simulations. the dynamic associations when the PH assumptions for
Yuanjia Wang, Columbia University The strengths of the proposed method are demonstrated any BMI related term were violated. Three specific types
Donglin Zeng, University of North Carolina at Chapel Hill through a survival data example. of causes of mortalities were examined, including car-
diovascular disease (CVD), cancer, and other causes. For
Accurately classifying whether a presymptomatic subject email: yic33@pitt.edu women, no dynamic association of BMI and cause-specific
is at risk of a disease using genetic and clinical markers mortality was found. For men, the associations of BMI
offers the potential for early intervention well before and three cause-specific mortalities were all dynamic.
the onset of clinical symptoms. For many diseases, 68. E PIDEMIOLOGIC METHODS IN Time-dependent covariate Cox models were used to show
that the dynamic associations of BMI with CVD, cancer
the risk varies with age and varies with markers that SURVIVAL ANALYSIS and other causes mortalities were different.
are themselves dependent on age. This work aims at
identifying effective time-sensitive classification and MATCHING IN THE PRESENCE OF MISSING
prediction rules for age-dependent disease risk or email: winnie.huiquanzhang@gmail.com
DATA IN TIME-TO-EVENT STUDIES
onset using age-sensitive biomarkers through localized Ruta Brazauskas*, Medical College of Wisconsin
kernel machine learning. In particular, we develop a Mei-Jie Zhang, Medical College of Wisconsin
large-margin classifier implemented with localized kernel GENERALIZED CASE-COHORT STUDIES WITH
Brent R. Logan, Medical College of Wisconsin
support vector machine. We study the convergence rate MULTIPLE EVENTS
of the developed rules as a function of kernel bandwidth Soyoung Kim*, University of North Carolina, Chapel Hill
Matched pair studies in survival analysis are often done
and offer guidelines on the choice of the bandwidth as Jianwen Cai, University of North Carolina, Chapel Hill
to examine the survival experience of patients with rare
a function of the sample size. Ranking the biomarkers conditions or in the situation where extensive collection of
based on their cumulative effects provides an opportunity Case-cohort studies have been recommended for infre-
additional information on cases and/or controls is required.
to select important markers. We extend our approach to quent diseases or events in large epidemiologic studies.
Once matching is done, analysis is usually performed
longitudinal data through the use of a nonparametric This study design consists of a random sample of the
by using stratified or marginal Cox proportional hazards
decision function with random effects, where we model entire cohort, named the subcohort, and all the subjects
models. However, in many studies some patients will have
the main effects of biomarkers, time, and their interaction with the disease of interest. When the rate of disease is
missing values of the covariates they should be matched
through appropriate kernel functions. Subject-specific not low or the number of cases are not small, the general-
on. In this presentation, we will examine several methods
random intercepts and random slopes are included in the ized case-cohort study which selects subset of all cases is
that could be used to match cases and controls when some
decision function to account for patient heterogeneity used. When several diseases are of interest, several gener-
covariate values are missing. They range from matching
and improve prediction. We apply the proposed methods alized casecohort studies are usually conducted using the
using only individuals with complete data to more complex
to a real world Huntington’s disease data. same subcohort. The common practice is to analyze each
matching procedures performed after the imputation of the
disease separately ignoring data collected on sampled
missing values of the covariates. A simulation study is used
email: tc2411@columbia.edu subjects with the other diseases. This is not an efficient
to explore the performance of these matching techniques
use of the data. In this paper, we propose efficient esti-
under several patterns and proportions of missing data.
mation for proportional hazards model by making full use
IMPROVEMENTS TO THE INTERACTION of available covariate information for the other diseases.
email: ruta@mcw.edu
TREES ALGORITHM FOR SUBGROUP ANALYSIS We consider both joint analysis and separate analysis for
IN CLINICAL TRIALS the multiple diseases. We propose an estimating equation
Yi-Fan Chen*, University of Pittsburgh approach with a new weight function. We establish that
APPLICATION OF TIME-DEPENDENT COVARIATES
Lisa A. Weissfeld, University of Pittsburgh the proposed estimator is consistent and asymptotically
COX MODEL IN EXAMINING THE DYNAMIC
normally distributed. Simulation studies show that the
ASSOCIATIONS OF BODY MASS INDEX AND
With the advent of personalized medicine, the goal of proposed methods using all available information gain
CAUSE-SPECIFIC MORTALITIES
an analysis is to identify subgroups that will receive the efficiency. We apply our proposed method to the data
Jianghua He, University of Kansas Medical Center
greatest benefit from a given treatment. The approach con- from the Busselton Health Study.
Huiquan Zhang*, University of Kansas Medical Center
sidered here centers on methods for clinical trials that are
geared towards exploring heterogeneity between subjects email: kimso@live.unc.edu
Previous studies have shown that the association of body
and its impact on treatment response. One such approach mass index (BMI) and all-cause mortality is dynamic
is an extension of classification and regression trees (CART) that different study designs may lead to different or even
based on the development of interaction trees so that opposite results. To better understand the controversial
subgroups within treatment arms are better identified. association of BMI and all-cause mortality, this study
With these analyses it is possible to generate hypotheses
for future clinical trials and to present the results in a readily
accessible format. One major issue with this approach is the
greediness of the algorithm and the difficulty of addressing
Abstracts 74
OSTER PRESEN
S|P TAT
ACT IO
TR
NS
S
AB
A COMMENT ON SAMPLE SIZE CALCULATIONS STUDY DESIGN IN THE PRESENCE OF ERROR-PRONE
FOR BINOMIAL CONFIDENCE INTERVALS SELF-REPORTED OUTCOMES
Lai Wei*, State University of New York at Buffalo Xiangdong Gu*, University of Massachusetts, Amherst
Cai and Zeng (2004). However, the sample size/power Alan D. Hutson, State University of New York at Buffalo Raji Balasubramanian, University of Massachusetts,
calculation for stratified case-cohort (SCC) design has not Amherst
been addressed before. This article extends the results in We firstly examine sample size calculations for a binomial
Cai and Zeng (2004) to SCC by introducing the stratified proportion based on the confidence interval width of In this paper, we consider data settings in which the
log-rank type test statistic for the SCC design and deriving the Agresti-Coull, Wald and Wilson Score intervals. We outcome of interest is a time-to-event random variable
the sample size/power calculation formula. Simulation point out that the commonly used methods based on that is observed at intermittent time points through
studies show that the proposed test for SCC with small known and fixed standard errors cannot guarantee the error-prone tests. For this setting, using a likelihood-
sub-cohort sampling fractions is valid and efficient for the desired confidence interval width given a hypothesized based approach we develop methods for power and
situations where the disease rate is low. Furthermore, op- proportion. Therefore, a new adjusted sample size sample size calculations and compare the relative ef-
timization of sampling in SCC is discussed in comparison calculation method is introduced, which is based on the ficiency between perfect and imperfect diagnostic tests.
with the proportional and balanced sampling techniques, conditional expectation of the width of the confidence Our work is motivated by diabetes self-reports among the
and the corresponding sample size calculation formulas interval given the hypothesized proportion. This idea is approximately 160,000 women enrolled in the Women's
are derived for these three designs. Theoretical powers further extended to Poisson distribution and continuous Health Initiative. We also compare designs under different
from these three designs are compared for the situations distribution, such as exponential distribution. missing data mechanisms and evaluate the effect of
when the disease rates are homogeneous and hetero- error-prone disease ascertainment at baseline.
geneous over the strata. The results show that either the email: laiwei@buffalo.edu
proportional or balanced design can possibly yield higher email: xdgu@schoolph.umass.edu
power than the other. The optimal design yields the high-
est power with the smallest required sample size among OPTIMAL DESIGN FOR DIAGNOSTIC ACCURACY
all three designs. STUDIES WHEN THE BIOMARKER IS SUBJECT 70. MULTIPLE TESTING
TO MEASUREMENT ERROR
email: wenrong.hu@cslbehring.com Matthew T. White*, Boston Children’s Hospital MULTIPLICITY ADJUSTMENT OF MULTI-LEVEL
Sharon X. Xie, University of Pennsylvania HYPOTHESIS TESTING IN IMAGING BIOMARKER
RESEARCH
THE EFFECT OF INTERIM SAMPLE SIZE Biomarkers have become increasingly important in recent Shubing Wang*, Merck
RECALCULATION ON TYPE I AND II years for their ability to effectively distinguish diseased
ERRORS WHEN TESTING A HYPOTHESIS from non-diseased individuals, potentially avoiding In imaging biomarker research, longitudinal studies are
ON REGRESSION COEFFICIENTS unnecessary and invasive diagnostic testing. Given their often used for reproducibility and efficacy validation.
Sergey Tarima*, Medical College of Wisconsin importance, it is critical to design and analyze studies to The multiple biomarkers, multiple groups and repeated
Aniko Szabo, Medical College of Wisconsin produce reliable estimates of diagnostic performance. measures impose the multiple testing of biomarker ef-
Peng He, Medical College of Wisconsin Biomarkers are often obtained with measurement error, ficacies complex multiple-level structures. Less powerful
Tao Wang, Medical College of Wisconsin which may cause the biomarker to appear ineffective conventional multiplicity adjustment methods, either
if not taken into account in the analysis. We develop have no explicit assumption of correlation structures,
The dependence of sample size formulas on nuisance optimal design strategies for studying the effectiveness such as False Discovery Rate (FDR), or only assume simple
parameters inevitably forces investigators to use of an error prone biomarker in differentiating diseased one-dimensional correlations, such as Random Field
estimates of these nuisance parameters. We investigated from non-diseased individuals and focus on the area Theory. The author proposed a multi-level multiplicity
the effect of naive interim sample size recalculation based under the receiver operating characteristic curve (AUC) as adjustment method by first building the hierarchical
on re-estimated nuisance parameters on type I and II the primary measure of effectiveness. Using an internal structures of multiple tests. On the highest level are the
errors. More than 200 simulation studies with 10,000 reliability sample within the diseased and non-diseased biomarkers, which are usually correlated, but not ordered.
Monte-Carlo repetitions each were completed covering groups, we develop optimal study design strategies that Within a biomarker, the highly correlated tests, such
various scenarios: linear and logistic regressions; two 1) minimize the variance of the estimated AUC subject to as within-group change tests, are first identified. The
to ten predictors; binary and continuous predictors; no, constraints on the total number of observations or total joint distribution of these tests can be calculated based
moderate, and strong correlation between predictors; cost of the study or 2) achieve a pre-specified power. We on a linear mixed-effects model. Therefore, the explicit
different treatment effects; different internal pilot sample develop optimal allocations of the number of subjects distribution of the simultaneous confidence band can be
sizes and different upper bounds for the total sample size. in each group, the size of the reliability sample in each estimated, as well as the adjusted p-values. The rest of
Only Wald tests were considered in our investigation. For group, and the number of replicate observations per the tests belong to the less correlated test category. We
linear models we observed a minor positive inflation of subject in the reliability sample in each group under a then apply a double FDR procedure proposed by Mehrotra
the type I error increasing as the expected sample size variety of commonly seen study conditions. and Heyse to set up the different thresholds and to
getting closer to the sample size of the internal pilot. We calculate adjusted p-values at different levels.
also noted that power increases for these cases. If the email: matthew.thomas.white@gmail.com
expected sample size is getting closer to the upper bound email: shubing_wang@merck.com
of the total sample size, power decreases. Internal sample
size recalculation for logistic regression models often pro-
duces too conservative type I errors and/or overpowered
study designs.
email: sergey.s.tarima@gmail.com
e-mail: jmgt@umich.edu
Abstracts 76
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
an eight-parameter SWAT (Soil and Water Assessment to scale) yielding intensities (which are easy to scale).
S
AB
Tool) model where daily stream flows and phosphorus Fitting IPMs in our setting is quite challenging and is most
concentrations are modeled for the Town Brook watershed feasibly done either by working in the spectral domain
which is part of the New York City water supply. or with a pseudo-likelihood, in conjunction with Laplace
72. JABES SHOWCASE approximation. We illustrate with an investigation of
email: dr24@cornell.edu forest dynamics using data from Duke Forest as well as a
MODELING SPACE-TIME DYNAMICS OF AEROSOLS
U.S.national survey called the Forest Inventory Analysis.
USING SATELLITE DATA AND ATMOSPHERIC
TRANSPORT MODEL OUTPUT
IMPROVING CROP MODEL INFERENCE THROUGH email: alan@stat.duke.edu
Candace Berrett*, Brigham Young University
BAYESIAN MELDING WITH SPATIALLY-VARYING
Catherine A. Calder, The Ohio State University
PARAMETERS
Tao Shi, The Ohio State University
Andrew O. Finley*, Michigan State University 73. STATISTICAL CHALLENGES IN
Ningchuan Xiao, The Ohio State University
Sudipto Banerjee, University of Minnesota
Darla K.. Munroe, The Ohio State University
Bruno Basso, Michigan State University
LARGE-SCALE GENETIC STUDIES
OF COMPLEX DISEASES
Kernel-based models for space-time data offer a flexible
An objective for applying a Crop Simulation Model (CSM)
and descriptive framework for studying atmospheric GENE-GENE INTERACTION ANALYSIS FOR
in precision agriculture is to explain the spatial variability
processes. Nonstationary and anisotropic covariance NEXT-GENERATION SEQUENCING
of crop performance. CSMs require inputs related to soil,
structures can be readily accommodated by allow- Momiao Xiong*, University of Texas School of
climate, management, and crop genetic information
ing kernel parameters to vary over space and time. In Public Health
to simulate crop yield. In practice, however, measuring
addition, dimension reduction strategies make model Yun Zhu, Tulane University
these inputs at the desired high spatial resolution is
fitting computationally feasible for large datasets. Fitting Futao Zhang, University of Texas School of Public Health
prohibitively expensive. We propose a Bayesian modeling
these models to data derived from instruments onboard
framework that melds a CSM with sparse data from
satellites, which often contain significant amounts of Critical barriers in interaction analysis for rare variants
a yield monitoring system to deliver location specific
missingness due to cloud cover and retrieval errors, can is that most traditional statistical methods for testing
posterior predicted distributions of yield and associated
be difficult. In this presentation, we propose to overcome gene-gene interaction were originally designed for testing
unobserved spatially-varying CSM parameter inputs. The
the challenges of missing satellite-derived data by the interaction for common variants and are difficult to
proposed Bayesian melding model consists of a systemic
supplementing an analysis with output from a computer be applied to rare variants due to their low power. The
component representing output from the physical model
model, which contains valuable information about the great challenges for successful detection of interactions
and a residual spatial process that compensates for the
space-time dependence structure of the process of inter- with next-generation sequencing data are (1) lack of deep
bias in the physical model. The spatially-varying inputs
est. We illustrate our approach through a case study of understanding measure of interaction and statistics with
to the systemic component arise from a multivariate
aerosol optical depth across mainland Southeast Asia. We high power to detect interaction, (2) lack of concepts,
Gaussian process while the residual component is mod-
include a crossvalidation study to assess the strengths and methods and tools for detection of interactions for rare
eled using a univariate Gaussian process. Due to the large
weaknesses of our approach. variants, (3) severe multiple testing problems, and (4)
number of observed locations in the motivating dataset
we seek dimension reduction using low-rank predic- heavy computations. To meet this challenge, we take
email: cberrett@stat.byu.edu a genome region or a gene as a basic unit of interac-
tive processes to ease the computational burden. The
proposed model is illustrated using the Crop Environment tion analysis and use high dimensional data reduction
Resources Synthesis (CERES)-Wheat CSM and wheat yield techniques to develop a novel statistic for collectively test
UNCERTAINTY ANALYSIS FOR COMPUTATIONALLY interaction between all possible pairs of SNPs within two
data collected in Foggia, Italy.
EXPENSIVE MODELS genome regions or genes. By large-scale simulations,
David Ruppert*, Cornell University we demonstrate that the proposed new statistic has the
email: finleya@msu.edu
Christine A. Shoemaker, Cornell University correct type 1 error rates and much higher power than
Yilun Wang, University of Electronic Science and the existing methods to detect gene-gene. To further
Technology of China evaluate its performance, the developed statistic is applied
DEMOGRAPHIC ANALYSIS OF FOREST DYNAMICS
Yingxing Li, Xiamen University to the lip metabolism trait exome sequence data from
USING STOCHASTIC INTEGRAL PROJECTION MODELS
Nikolay Bliznyuk, University of Florida the NHLBI’s Exome Sequencing Project (ESP) and whole
Alan E. Gelfand*, Duke University
Souparno Ghosh, Texas Tech University genome-sequence data.
MCMC is infeasible for Bayesian calibration and uncer-
James S. Clark, Duke University
tainty analysis of computationally expensive models if one email: momiao.xiong@uth.tmc.edu
must compute the model at each iteration. To address this
Demographic analysis for plant and animal populations
problem we introduced SOARS (Statistical and Optimiza-
is a prominent problem in studying ecological processes,
tion Analysis using Response Surfaces) methodology. ASSOCIATION MAPPING OF RARE VARIANTS
typically using Matrix Projection Models. Integral projec-
SOARS uses an interpolator as a surrogate, also known IN SAMPLES WITH RELATED INDIVIDUALS
tion models (IPMs) offer a continuous version of this
as an emulator or meta-model, for the logarithm of the Duo Jiang, University of Chicago
approach. These models are a class of integro-differential
posterior density. To prevent wasteful evaluations of the Mary Sara McPeek*, University of Chicago
equations which, for demography, we specify a redistribu-
expensive model, the emulator is only built on a high pos-
tion kernel mechanistically using demographic functions,
terior density region (HPDR), which is located by a global One fundamental problem of interest is to identify
i.e., parametric models for demographic processes such
optimization algorithm. The set of points in the HPDR genetic variants that contribute to observed variation in
as survival, growth, and replenishment. With interest in
where the expensive model is evaluated is determined human complex traits. With the increasing availability of
scaling in space, we work with data in the form of point
sequentially by the GRIMA algorithm. A case study uses high-throughput sequencing data, there is the possibility
patterns rather than with individual level data (hopeless
of identifying rare variants that influence a trait, but there
Abstracts 78
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
HEART-TO-HEART DIARY OF PHYSICAL ACTIVITY on the criteria proposed by Troiano (2007), which has
S
AB
Vadim Zipunnikov*, Johns Hopkins Bloomberg School been used in several population based studies, including
of Public Health the National Health and Nutrition Examination Survey
usually involves an extensive search among a large Jennifer Schrack, Johns Hopkins Bloomberg School (NHANES). We evaluated and improved this algorithm for
number of hypotheses to separate signals of interest of Public Health both uniaxial and triaxial ActiGraph data. The improved
and also recognize their patterns. The situation can be Ciprian Crainiceanu, Johns Hopkins Bloomberg School algorithm classified wear and nonwear intervals more
described as finding needles of various shapes in a hay- of Public Health accurately, and may lead to more accurate estimation of
stack. Despite the enormous progress on methodological Jeff Goldsmith, Columbia University time spent in sedentary and active behaviors.
work in data screening, pattern recognition and related Luigi Ferrucci, National Institute on Aging, National
fields, there have been little theoretical studies on the Institutes of Health email: leena.choi@vanderbilt.edu
issues of optimality and error control in situations where
a large number of decisions are made sequentially and Physical activity energy expenditure is a modifiable risk
simultaneously. We develop a compound decision theo- factor for multiple chronic diseases. Accurate measure- QUANTIFYING PHYSICAL ACTIVITY USING
retic framework and propose a new loss matrix approach ment of free-living energy expenditure is vital to under- ACCELEROMETERS
to generalize the current multiple testing framework for standing and quantifying changes in energy metabolism Julia Kozlitina*, University of Texas Southwestern
error control in pattern recognition, by allowing more and their effects on activity, disability, and disease in Medical Center
than two states of nature, sequential decision-making population-based studies. Actiheart is a novel device that William R. Schucany, Southern Methodist University
and new concepts of false positive rates in large-scale monitors minute-by-minute activity counts and heart
simultaneous inference. rate and uses both to estimate energy expenditure. We Accelerometers have become a widely used tool for the
will first illustrate key statistical challenges with data objective assessment of physical activity (PA) in large
email: wenguans@marshall.usc.edu collected on 794 subjects wearing Actiheart for one week epidemiological and surveillance studies. Despite their
as a part of the Baltimore Longitudinal Study of Aging. widespread use, many questions remain regarding the
We will then describe the methods to address those processing and interpretation of accelerometer output
challenges and parsimoniously model a complex inter- in order to derive accurate measures of PA. Traditionally,
75. STATISTICAL BODY LANGUAGE: dependence between activity and heart rate. At the end, accelerometer data have been summarized as time spent
ANALYTICAL METHODS FOR we will show how our models can be used to construct in different PA intensities, using established cut-points
WEARABLE COMPUTING a population-based dynamic diary that describes and to classify activity into intensity levels. To reflect the
quantifies the most common daily patterns of activity. accumulation of activity that may be meaningful for
A NOVEL METHOD TO ESTIMATE FREE-LIVING energy balance, the time spent above various intensity
ENERGY EXPENDITURE FROM AN ACCELEROMETER email: vzipunni@jhsph.edu thresholds is further summarized into continuous 10-min
bouts. This suggests that local averaging may be helpful
John W. Staudenmayer*, University of Massachusetts, in identifying intervals of PA corresponding to different
Amherst A NEW ACCELEROMETER WEAR AND NONWEAR intensity levels. We use different smoothing techniques
Kate Lyden, University of Massachusetts, Amherst TIME CLASSIFICATION ALGORITHM when extracting PA bout information and examine their
impact on the resulting outcome variables. We illustrate
The purpose of this paper is to develop and validate a Leena Choi*, Vanderbilt University School of Medicine the analysis using objectively measured activity data from
novel method for estimating energy expenditure in free- Suzanne C. Ward, Vanderbilt University School a large population-based study.
living people. The method uses an accelerometer, a device of Medicine
that measures and records the quantity and intensity John F. Schnelle, Vanderbilt University School of Medicine email: Julia.Kozlitina@UTSouthwestern.edu
of movement, and an algorithm to estimate energy Maciej S. Buchowski, Vanderbilt University School
expenditure from the resulting accelerometer signals. The of Medicine
proposed method is a two-step process. In the first step, 76. BIOMARKER UTILITY
we use simple characteristics of the acceleration signal The use of accelerometers for measuring physical activity IN CLINICAL TRIALS
to identify where bouts activity and inactivity start and (PA) in intervention and population-based studies is
stop. In the second step, we estimate energy expenditure becoming a standard methodology for the objective DESIGN AND ANALYSIS OF BIOMARKER THRESHOLD
(METs) for each bout using methods that were previously measurement of sedentary and active behaviors. Data STUDIES IN RANDOMIZED CLINICAL TRIALS
developed and validated in the laboratory on several hun- collected by accelerometers such as ActiGraph in a natural Glen Laird*, Bristol-Myers Squibb
dred people. We compare the proposed algorithm, which free-living environment can be divided into wear and Yafeng Zhang, University of California, Los Angeles
we call the “sojourn algorithm,” to existing methods using nonwear time intervals. Since these accelerometer data
data from a group of individuals who were each directly are very large, it is not feasible to manually classify As the importance of individualized medicine grows, the
observed over the course of two 10-hour days. The nonwear and wear time intervals without using an use of biomarkers to guide population selection is becom-
sojourn algorithm is more accurate and precise than exist- automated algorithm. Thus, a vital step in PA measure- ing more common. Testing for significant effects in a
ing methods. The new algorithm is specifically designed ment is the classification of daily time into accelerometer biomarker-defined subpopulation can improve statistical
for use in free-living environments where behavior is not wear and nonwear intervals using its recordings (counts) power and permit focused treatment on patients most
planned and does not occur in intervals of known dura- and an accelerometer-specific algorithm. Typically, an likely to benefit. A key biomarker may be measured on a
tion. It also has the potential to provide more detailed automated algorithm uses monitor-specific criteria to continuous scale, but a binary classification of patients
information about information about the duration and detect and to eliminate the nonwear time intervals, is convenient for treatment purposes. Towards this goal,
frequency of bouts of activity and inactivity. during which no activity is detected. The most commonly determination of a biomarker threshold level may be be
used automated algorithm for ActiGraph data is based
email: jstauden@math.umass.edu
Abstracts 80
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
as a function of subject-specific random effects and time dinal studies. Simulation studies were performed which
S
AB
varying covariates. Our model allows us to measure how demonstrate that the proposed method performs well
covariates can influence both the mean and the variance and yields better estimations compared to other single
A LOCATION SCALE ITEM RESPONSE THEORY (IRT) of physical activity over time. It also allows us to measure time point meta-analysis methods. We apply our method
MODEL FOR ANALYSIS OF ORDINAL QUESTIONNAIRE whether changes in variability over the course of the to a set of studies from patients with type 2 diabetes.
DATA study are predictive of treatment relapse.
Donald Hedeker*, University of Illinois at Chicago email: fuhaoda@gmail.com
Robin J. Mermelstein, University of Illinois at Chicago email: siddique@northwestern.edu
Questionnaires are commonly used in studies of health to ADAPTIVE TRIAL DESIGN IN THE PRESENCE OF
measure severity of illness, for example, and the items are 78. EVIDENCE SYNTHESIS FOR HISTORICAL CONTROLS
Brian P. Hobbs*, University of Texas MD Anderson
often scored on an ordinal scale. For such questionnaires, ASSESSING BENEFIT AND RISK Cancer Center
item response theory (IRT) models provide a useful
approach for obtaining summary scores for subjects ( Bradley P. Carlin, University of Minnesota
SYSTEMATIC REVIEWS IN COMPARATIVE
the model's random subject effect) and characteristics Daniel J. Sargent, Mayo Clinic
EFFECTIVENESS RESEARCH
of the items ( item difficulty and discrimination). We Sally C. Morton*, University of Pittsburgh
describe an extended IRT model that allows the items to Prospective trial design often occurs in the presence of
exhibit different within-subject (WS) variance, and also ‘acceptable’ (Pocock, 1976) historical control data. Typi-
Systematic reviews in comparative effectiveness research
include a subject-level random effect to the WS variance cally this data is only utilized for treatment comparison
(CER) are essential to identify gaps in evidence for making
specification. This permits subjects to be characterized in a posteriori, retrospective analysis. We propose an
decisions that matter to stakeholders, most notably
in terms of their mean level, or location, and their vari- adaptive trial design in the context of an actual random-
patients. In this talk, I will discuss unique issues that
ability, or scale. We illustrate application of this location ized controlled colorectal cancer trial. The proposed trial
may arise in a CER systematic review. For example, such
scale IRT model using data from the Nicotine Dependence implements an adaptive randomization procedure for
systematic reviews may include observational data in
Syndrome Scale (NDSS) assessed in an adolescent smok- allocating patients aimed at balancing total information
order to assess both benefits and harms. Methodologi-
ing study. We show that there is an interaction between (concurrent and historical) among the study arms. This
cal techniques such as network meta-analysis may be
a subject’s mean and scale in predicting future smoking is accomplished by assigning more patients to receive
required to compare treatments that have the potential
level, such that, for low-level smokers, increased scale is the novel therapy in the absence of strong evidence
to be best practice. The risk of bias for individual studies
associated with subsequent higher smoking levels. The for heterogeneity among the concurrent and historical
must be assessed to estimate effectiveness in real-world
proposed location scale IRT model has useful applications controls. Allocation probabilities adapt as a function of
settings. Finally, the strength of the body of evidence will
in research where questionnaires are often rated on an or- the effective historical sample size (EHSS) character-
need to be determined to inform decision-making. Cur-
dinal scale, and there is interest in characterizing subjects izing relative informativeness defined in the context of
rent guidance from the Agency for Healthcare Research
in terms of both their mean and variance. a piecewise exponential model for evaluating time to
and Quality (AHRQ); the Cochrane Collaboration; and the
disease progression. A Bayesian hierarchical model is
Institute of Medicine (IOM) will be included.
email: hedeker@uic.edu used to assess historical and concurrent heterogeneity
at interim analyses and to borrow strength from the
email: scmorton@pitt.edu
historical data in the final analysis. Using the proposed
BAYESIAN MIXED-EFFECTS LOCATION SCALE hierarchical model to borrow strength from the historical
MODELS FOR THE ANALYSIS OF OBJECTIVELY data, after balancing total information with the adap-
BAYESIAN INDIRECT AND MIXED TREATMENT
MEASURED PHYSICAL ACTIVITY DATA FROM tive randomization procedure, provides preposterior
COMPARISONS ACROSS LONGITUDINAL TIME POINTS
A LIFESTYLE INTERVENTION TRIAL admissible estimators of the novel treatment effect with
Haoda Fu*, Eli Lilly and Company
Juned Siddique*, Northwestern University Feinberg desirable bias-variance trade-offs.
Ying Ding, Eli Lilly and Company
School of Medicine
Donald Hedeker, University of Illinois at Chicago email: bphobbs@mdanderson.org
Meta-analysis has become an acceptable and powerful
tool for pooling quantitative results from multiple studies
Objective measurement of physical activity using addressing the same question. It estimates the effect
wearable accelerometers is now used in large-scale INCORPORATING EXTERNAL INFORMATION TO ASSESS
difference between two treatments when they have been
epidemiological studies and clinical trials of lifestyle ROBUSTNESS OF COMPARATIVE EFFECTIVENESS
compared head-to-head. However, limitations occur
and exercise interventions. These devices measure the fre- ESTIMATES TO UNOBSERVED CONFOUNDING
when there are more than two treatments of interest
quency, intensity, and duration of physical activity at the Mary Beth Landrum*, Harvard Medical School
and some of them have not been compared in the same
momentary level, often using measurement epochs of 1 Alfa Alsane, Harvard Medical School
study. Indirect and mixed treatment modeling extends
minute or less yielding a large number of observations meta-analysis methods to enable data from different
per subject. In this talk, we describe a Bayesian mixed- Successful reform of the health care delivery system
treatments and trials to be synthesized, without requiring
effects location scale model for the analyses of physical relies on improved information about the effectiveness
head-to-head comparisons among all treatments; thus,
activity as measured by accelerometer using data from of therapies in real world practice. While comparative ef-
allowing different treatments can be compared. Tradi-
a randomized lifestyle intervention trial of 204 men and fectiveness research often relies on synthesis of evidence
tional indirect and mixed treatment comparison methods
women. We model both the mean and variance over time from randomized clinical trials to infer effectiveness of
consider a single endpoint for each trial. We extend the
current methods and propose a Bayesian indirect and
mixed treatment comparison longitudinal model. That
incorporates multiple time points and allows indirect
comparisons of treatment effects across different longitu-
Abstracts 82
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
gain statistical power. We propose novel statistical testing STATISTICAL INFERENCE OF COVARIATE-ADAPTIVE
S
AB
Abstracts 84
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
screen for possible incident cases and perform the ‘gold BAYESIAN REGRESSION ANALYSIS OF MULTIVARIATE
S
AB
Abstracts 86
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
of our method through a simulation study and an applica- the same study participants used to develop the original
S
AB
tion to a real multiple-batch data set and show that our prediction tool, necessitating the merging of a separate
method are useful, justifiable, and very robust in practice. study of different participants, which may be much
STATISTICAL CHARACTERIZATION AND EVALUATION The method is implemented in the 'ComBat' function in smaller in sample size and of a different design. This talk
OF MICROARRAY META-ANALYSIS METHODS: the ‘sva’ Bioconductor package. reports on the application of Bayes rule for updating risk
A PRACTICAL APPLICATION GUIDELINE prediction tools to include a set of biomarkers measured
Lun-Ching Chang*, University of Pittsburgh email: wej@bu.edu in an external study to the original study used to develop
Hui-Min Lin, University of Pittsburgh the risk prediction tool. The procedure is illustrated in the
George C. Tseng, University of Pittsburgh context of updating the online Prostate Cancer Prevention
83. STATISTICAL METHODS IN CANCER Trial Risk Calculator (PCPTRC) to incorporate the new
markers %freePSA and [-2]proPSA and single nucleotide
As high-throughput genomic technologies become more APPLICATIONS polymorphisms recently identified through genomewide
accurate and affordable, an increasing number of data
sets have accumulated in the public domain and genomic association studies. The updated PCPTRC models are
RECLASSIFICATION OF PREDICTIONS FOR COMPARING
information integration and meta-analysis have become compared to the existing PCPTRC through validation on
RISK PREDICTION MODELS
routine in biomedical research. In this paper, we focus an external data set, based on discrimination, calibration
Swati Biswas*, University of Texas, Dallas
on microarray meta-analysis, where multiple microarray and clinical net benefit metrics.
Banu Arun, University of Texas MD Anderson
studies with relevant biological hypotheses are combined Cancer Center
in order to improve candidate marker detection. Many email: ankerst@ma.tum.de
Giovanni Parmigiani, Dana Farber Cancer Institute
methods have been developed and applied, but their and Harvard School of Public Health
performance and properties have only been minimally
investigated. There is currently no clear conclusion or PARAMETRIC AND NON PARAMETRIC ANALYSIS
Risk prediction models play an important role in preven-
guideline as to the proper choice of a meta-analysis OF COLON CANCER
tion and treatment of several diseases. Models that are
method given an application. Here we perform a com- Venkateswara Rao Mudunuru*, University of
in clinical use are often refined and improved. In many
prehensive comparative analysis for twelve microarray South Florida
instances, the most efficient way to improve a ‘successful’
meta-analysis methods through simulations and six Chris P. Tsokos, University of South Florida
model is to identify subgroups of populations for which a
large-scale applications using four evaluation criteria. specific biological rationale exists and tailor the improved
We elucidate hypothesis settings behind the methods The object of the present study is to perform statistical
model to those subjects, an approach especially in line
and further apply multi-dimensional scaling (MDS) and analysis of malignant colon tumor with the tumor size
with personalized medicine. At present, we lack statistical
an entropy measure to characterize the meta-analysis being the response variable. We determined that the
tools to evaluate improvements targeted to specific
methods and data structure, respectively. The aggregated tumor sizes of whites, African Americans and other races
sub-groups. Here we propose simple tools to fill this gap.
results provide an insightful and practical guideline to the are statistically different. The probability distribution that
First, we extend a recently proposed measure, Integrated
choice of the most suitable method in a given application. characterize the behavior of the response variable was
Discrimination Improvement, using a linear model with
obtained along with the confidence limits. The malignant
covariates representing the sub-groups. Next, we develop
email: lunching@gmail.com tumor size as a function of age was partitioned into three
graphical and numerical tools that compare reclassifica-
significant age intervals and the mathematical function
tion of two models but focusing only on those subjects
that characterizes the size of the tumor as a function of
for whom the two models reclassify differently. We apply
UNCONFOUNDING THE CONFOUNDED: ADJUSTING age was determined for each age interval.
these approaches to the genetic risk prediction model
FOR BATCH EFFECTS IN COMPLETELY CONFOUNDED for breast cancer BRCAPRO, using clinical data from MD
DESIGNS IN GENOMIC STUDIES email: vmudunur@mail.usf.edu
Anderson Cancer Center. We also conduct a simulation
W. Evan Johnson*, Boston University School of Medicine study to investigate properties of the new reclassification
Timothy M. Bahr, University of Iowa School of Medicine measure and compare it with currently used measures.
ASSESSING INTERACTIONS FOR FIXED-DOSE DRUG
Our results show that the proposed tools can successfully
Batch effects are often observed across multiple batches COMBINATIONS IN TUMOR XENOGRAFT STUDIES
uncover sub-group specific model improvements.
of high-throughput data. Supervised and unsupervised Jianrong Wu*, St. Jude Children’s Research Hospital
methods have previously been developed to account for Lorriaine Tracey, St. Jude Children’s Research Hospital
email: swati.biswas@utdallas.edu
simple batch effects forexperiments where each batch Andrew Davidoff, National University of Singapore
contains multiple control and treatment samples. How-
ever, no method has been designed for data sets where Statistical methods for assessing the joint action of
UPDATING EXISTING RISK PREDICTION TOOLS
the control samples are completely contained in one set compounds administered in combination have been es-
FOR NEW BIOMARKERS
of batches and the treatment samples are contained in tablished for many years. However, there is little literature
Donna P. Ankerst*, Technical University, Munich
another set of batches. Data sets that are completely available on assessing the joint action of fixed-dose drug
Andreas Boeck, Technical University, Munich
confounded in this manner are generally discarded due combinations in tumor xenograft experiments. Here an
to lack of identifiability between treatment and batch ef- interaction index for fixed-dose two-drug combinations
Online risk prediction tools for common cancers are now
fects. Here we propose a method that uses a rank test and is proposed. Furthermore, a regression analysis is also
easily accessible and widely used by patients and doctors
an Empirical Bayes framework to model systematically discussed. Actual tumor xenograft data were analyzed to
for informed decision-making concerning screening. A
varying batch effects in completely confounded designs illustrate the proposed methods.
practical problem is as cancer research moves forward
or in batches that contain a single sample. We then are and new biomarkers are discovered, there is a need to
able to adjust data sets for batch effects and accurately email: jianrong.wu@stjude.org
update the risk algorithms to include them. Typically,
estimate treatment effects. We illustrate the robustness the new markers cannot be retrospectively measured on
Abstracts 88
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
high resolution found that most local genomic regions blood pressures were selected for whole-exome sequenc-
S
AB
exhibit homogeneous Hi-C dataset generated from mouse ing. In the NHLBI Cohorts for Heart and Aging Research in
embryonic stem cells, we 3D chromosomal structures. We Genomic Epidemiology (CHARGE) resequencing project,
MODELING THE DISTRIBUTION OF further constructed a model for the spatial arrangement of subjects with the highest values on one of twenty quan-
PERIODONTAL DISEASE WITH A GENERALIZED chromatin, which reveals structural properties associated titative traits were selected for target sequencing, along
VON MISES DISTRIBUTION with euchromatic and heterochromatic regions in the ge- with a random sample. Failure to account for such trait-
Thomas M. Braun*, University of Michigan nome. We observed strong associations between structural dependent sampling can lead to severe inflation of type I
Samopriyo Maitra, University of Michigan properties and several genomic and epigenetic features error and substantial loss of power in the quantitative trait
of the chromosome. Using BACH-MIX, we further found analysis. We present valid and efficient likelihood-based
Periodontal disease is a common cause of tooth loss that the structural variations of chromatin are correlated inference procedures under general trait-dependent
in adults. The severity of periodontal disease is usually with these genomic and epigenetic features. Our results sampling. Our methods can be used to perform quantita-
quantified based upon the magnitudes of several tooth- demonstrate that BACH and BACH-MIX have the potential tive trait analysis not only for the trait that is used to select
level clinical parameters, the most common of which is to provide new insights into the chromosomal architecture subjects for sequencing but also for any other traits that
clinical attachment level (CAL). Recent clinical studies of mammalian cells. are measured. We also investigate the relative efficiency
have presented data on the distribution of periodontal of various sampling strategies. The proposed methods
disease in hopes of providing information for localized email: jliu1600@gmail.com are demonstrated through simulation studies and the
treatments that can reduce the prevalence of periodontal aforementioned NHLBI sequencing projects.
disease. However, these findings have been descriptive
without consideration of statistical modeling for estima- MICROBIOME, METAGENOMICS AND HIGH email: lin@bios.unc.edu
tion and inference. To this end, we visualize the mouth as a DIMENSIONAL COMPOSITION DATA
circle and the teeth as points located on the circumference Hongzhe Li*, University of Pennsylvania
of the circle to allow the use of circular statistical methods AN EMPIRICAL BAYESIAN FRAMEWORK
to determine the mean locations of diseased teeth. We With the development of next generation sequencing FOR ASSESSMENT OF INDIVIDUAL-SPECIFIC
assume the directions of diseased teeth, as determined by technology, researchers have now been able to study the RISK OF RECURRENCE
their tooth averaged CAL values, to be observations from a microbiome composition using direct sequencing, whose Kevin Eng, University of Wisconsin, Madison
Generalized von Mises distribution. Because multiple teeth output are bacterial taxa counts for each microbiome Shuyun Ye, University of Wisconsin, Madison
from a subject are correlated, we use a bias-corrected sample. One goal of microbiome studies is to associate the Ning Leng, University of Wisconsin, Madison
generalized estimating equation approach to obtain microbiome composition with environmental covariates Christina Kendziorski*, University of Wisconsin, Madison
robust variance estimates for our parameter estimates. or clinical outcomes, including (1) identification of the
Via simulations of data motivated from an actual study of biological/environemental factors that are associated with Accurate assessment of an individual's risk for disease
periodontal disease, we demonstrate that our methods bacterial compositions; (2) identification of the bacterial recurrence following an initial treatment has implications
have excellent performance in the moderately small taxa that are associated with clinical outcomes. Statistical for patient health as it enables an individual to receive
sample sizes common to most periodontal studies. models to address these problems need to account for the interventionary measures, potentially enabling a later
high-dimensional , sparse and compositional nature of the recurrence, or preventing the recurrence altogether.
email: tombraun@umich.edu data. In addition, the prior phylogenetic tree among the Toward this end, we have developed an empirical Bayes-
bacterial species provides useful information on bacterial ian framework that combines measurements from high-
phylogeny. In this talk, I will present several statisti- throughput genomic technologies with medical records
cal methods we developed for analyzing the bacterial to estimate an individual’s risk of a time-to-event pheno-
85. FRONTIERS IN STATISTICAL compositional data, including kernel-based regression, type. The framework has high sensitivity and specificity
GENETICS AND GENOMICS sparse Dirichlet-multinomial regression, compositional for predicting those at risk for ovarian cancer recurrence,
data regression and construction of bacterial taxa network is validated in independent data sets and experimentally,
BAYESIAN INFERENCE OF SPATIAL ORGANIZATIONS OF based on compositional data. I demonstrate the methods and is used to suggest alternative treatments.
CHROMOSOMES using a data set that links human gut microbiome to diet
Ming Hu, Harvard University intake in order to identify the micro-nutrients that are as- email: kendzior@biostat.wisc.edu
Ke Deng, Harvard University sociated with the human gut microbiome and the bacteria
Zhaohui Qin, Emory University that are associated with body mass index.
Bing Ren, University of California, San Diego 86. BIG DATA: WEARABLE COMPUTING,
Jun S. Liu*, Harvard University email: hongzhe@upenn.edu
CROWDSOURCING, SPACE
Knowledge of spatial chromosomal organizations is critical TELESCOPES, AND BRAIN IMAGING
for the study of transcriptional regulation and other DESIGNS AND ANALYSIS OF SEQUENCING STUDIES
nuclear processes in the cell. Recently, chromosome con- WITH TRAIT-DEPENDENT SAMPLING STATISTICAL CHALLENGES IN LARGE ASTRONOMICAL
formation capture (3C) based technologies, such as Hi-C Danyu Lin*, University of North Carolina, Chapel Hill DATA SETS
and TCC, have been developed to provide a genome-wide, Alexander S. Szalay*, Johns Hopkins University
three-dimensional (3D) view of chromatin organiza- It is not economically feasible to sequence all study
tion. Here we describe a novel Bayesian probabilistic subjects in a large cohort. A cost-effective strategy is to Starting with the Sloan Digital Sky Survey, astronomy
approach, BACH, to infer the consensus 3D chromosomal sequence only the subjects with the extreme values of a has started to collect very large data sets, covering a
structure. In addition, we describe a variant algorithm quantitative trait. In the NHLBI Exome Sequencing Project, large part of the sky. Once these have been made publicly
BACH-MIX to study the structural variations of chromatin subjects with the highest or lowest values of BMI, LDL or available, their analyses have created several nontrivial
in a cell population. Applying BACH and BACH-MIX to a statistical and computational challenges. The talk will
Abstracts 90
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
88. SAMPLE SIZE PLANNING FOR SAMPLE SIZE EVALUATION IN CLINICAL TRIALS
S
AB
We aim for estimating the number and locations of STEPWISE SIGNAL EXTRACTION VIA SPARSE ESTIMATION OF CONDITIONAL GRAPHICAL
change-points of a piecewise constant regression func- MARGINAL LIKELIHOOD MODELS WITH APPLICATION TO GENE NETWORKS
tion by minimizing the number of change-points over the Chao Du*, Stanford University Bing Li*, The Pennsylvania State University
acceptance region of a multiscale test. Deviation bounds Samuel C. Kou, Harvard University Hyonho Chun, Purdue University
for the estimated number of change points as well as Hongyu Zhao, Yale University
convergence rates for the change-point locations close We propose a new method to estimate the number
to the sampling rate 1/n are proven. We further derive and locations of change-points in stepwise signal. Our In many applications the graph structure in a network
the asymptotic distribution of the test statistic which can approach treats each possible set of change-points as arises from two sources: intrinsic connections and
be used to construct asymptotically honest confidence an individual model and uses marginal likelihood as the connections due to external effects. We introduce a
bands for the regression function. We show how dynamic model section tool. Under an independence assumption sparse estimation procedure for graphical models that is
programming techniques can be employed for efficient of the parameters between successive change-points, capable of isolating the intrinsic connections by removing
computation of estimators and confidence regions and the computational complexity of this approach is at most the external effects. Technically, this is formulated as a
compare our method with state-of-the-art change-point quadratic in the number of observations using a dynamic conditional graphical model, in which the external effects
detection methods in the recent literature. Finally, the programming algorithm. The asymptotic properties of are modeled as predictors, and the graph is determined
performance of the proposed multiscale approach is il- the marginal likelihood are studied. This paper further by the conditional precision matrix. We introduce two
lustrated, when applied to cutting-edge applications such discusses the impact of the prior on the estimation sparse estimators of this matrix using reproducing kernel
as genetic engineering and photoemission spectroscopy. and provide guidelines in choosing the prior. Detailed Hilbert space combined with lasso and adaptive lasso.
simulation study is carried out to compare the effective- We establish the sparsity, variable selection consistency,
e-mail: munk@math.uni-goettingen.de ness of this method with other existing methods. We oracle property, and the asymptotic distributions of the
demonstrate this approach on DNA array CGH data and proposed estimators. We also develop their convergence
single molecule enzyme data. Our study shows that this rate when the dimension of the conditional precision
CHANGE POINT SEGMENTATION FOR TIME DYNAMIC method is capable of coping with a wide range of models matrix goes to infinity. The methods are compared with
VOLTAGE DEPENDENT ION CHANNEL RECORDINGS and has appealing properties in applications. sparse estimators for unconditional graphical models,
Rebecca von der Heide, Georgia Augusta and with the constrained maximum likelihood estimate
University of Goettingen e-mail: chaodu@stanford.edu that assumes a known graph structure. The methods are
Thomas Hotz*, Ilmenau University of Technology applied to a genetic data set to construct a gene network
Hannes Sieling, Georgia Augusta University of Goettingen conditioning on single-nucleotide polymorphisms.
Claudia Steinem, Georgia Augusta University 90. NEW CHALLENGES FOR NETWORK
of Goettingen e-mail: bing@stat.psu.edu
Ole Schuette, Georgia Augusta University of Goettingen
DATA AND GRAPHICAL MODELING
Ulf Diederichsen, Georgia Augusta University
CONSISTENCY OF COMMUNITY DETECTION
of Goettingen MODEL-BASED CLUSTERING OF LARGE NETWORKS
Yunpeng Zhao, George Mason University
Tatjana Polupanow, Georgia Augusta University Duy Q. Vu, University of Melbourne
Elizaveta Levina, University of Michigan
of Goettingen David R Hunter*, The Pennsylvania State University
Ji Zhu*, University of Michigan
Katarzyna Wasilczuk, Georgia Augusta University Michael Schweinberger, The Pennsylvania State University
of Goettingen
Community detection is a fundamental problem in
Axel Munk, Georgia Augusta University of Goettingen We describe a network clustering framework, based
network analysis, with applications in many diverse
on finite mixture models, that can be applied to
areas. The stochastic block model is a common tool for
The characterization and reconstruction of ion channel discrete-valued networks with hundreds of thousands
model-based community detection, and asymptotic tools
functionalities is an important issue to understand of nodes and billions of edge variables. Relative to other
for checking consistency of community detection under
cellular processes. For this purpose we suggest a new recent model-based clustering work for networks, we
the block model have been recently developed (Bickel
change-point detection technique to analyze current introduce a more flexible modeling framework, improve
and Chen, 2009). However, the block model is limited
traces of ion channels. The underlying ion channel the variational-approximation estimation algorithm,
by its assumption that all nodes within a community
signal is assumed to be piecewise constant. For many discuss and implement standard error estimation via a
are stochastically equivalent, and provides a poor fit
membrane gating proteins the ion channel signals are parametric bootstrap approach, and apply these methods
to networks with hubs or highly varying node degrees
in the order of picoampere resulting in a low signal-to- to much larger datasets than those seen elsewhere in
within communities, which are common in practice. The
noise ratio. To overcome this burden we suggest a locally
degree-corrected block model (Karrer and Newman,
adaptive approach which is built on a multiscale statistic
2010) was proposed to address this shortcoming, and
and only assumes a locally constant signal. Furthermore,
allows variation in node degrees within a community
we generalize our approach to time-varying exogenous
Abstracts 92
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
91. BAYESIAN ANALYSIS OF HIGH upon the probability of SC between brain regions, with
S
AB
Abstracts 94
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
e-mail: nrhyun@live.unc.edu
Abstracts 96
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
on empirical frequencies. When applied to the motivat- calibration adjustments valid when a covariate that is
S
AB
ing cigarette consumption data, the frequency-based associated only with the measurement error process and
gravity model produced the better fit. This method holds not the outcome itself is included in the primary regres-
PROPORTIONAL HAZARDS MODEL WITH potential for application to a wide range of self-reported sion model? Does the inclusion of such a variable in the
FUNCTIONAL COVARIATE MEASUREMENT ERROR count data. primary regression model induce extraneous variation in
AND INSTRUMENTAL VARIABLES the resulting estimator? Clear answers to these questions
Xiao Song*, University of Georgia e-mail: griffis5@ccf.org would provide valuable insight and improve estimation
Ching-Yun Wang, Fred Hutchinson Cancer Research Center of exposure disease associations measured with error. In
the paper, we address these questions analytically and
In biomedical studies, covariates with measurement error PATHWAY ANALYSIS OF GENE-ENVIRONMENT develop extended regression calibration estimators as
may occur in survival data. Existing approaches mostly INTERACTIONS IN THE PRESENCE OF MEASUREMENT needed based on assumptions about the underlying as-
require certain replications on the error-contaminated ERROR IN THE ENVIRONMENTAL EXPOSURE sociation between disease and covariates, for both linear
covariates, which may not be available in the data. In this Stacey E. Alexeeff*, Harvard School of Public Health and logistic regression models. The methods are applied
paper, we develop a simple nonparametric correction Xihong Lin, Harvard School of Public Health to data from the Nurses’ Health Study.
approach for the proportional hazards model using
measurements on instrumental variables observed in a Many complex disease processes are thought to be e-mail: stxia@channing.harvard.edu
subset of the sample. The instrumental variable is related influenced by a number of genetic and environmental
to the covariates through a general nonparametric model, factors. Pathway analysis is a growing area of method-
and no distributional assumptions are placed on the error ological research, where the objective is to identify a set DISK DIFFUSION BREAKPOINT DETERMINATION
and the underlying true covariates. We further propose a of genetic and environmental risk factors that can explain USING A BAYESIAN NONPARAMETRIC VARIATION
novel generalized methods of moments nonparametric a meaningful proportion of disease susceptibility. Since OF THE ERRORS-IN-VARIABLES MODEL
correction estimator to improve the efficiency over the genes and environmental exposures on the same biologi- Glen DePalma*, Purdue University
simple correction approach. The efficiency gain can be cal pathway may interact functionally, there is growing Bruce A. Craig, Purdue University
substantial when the calibration subsample is small scientific interest to study these sets of factors in pathway
compared to the whole sample. The estimators are shown analysis. A score test can account for the correlation Drug dilution (MIC) and disk diffusion (DIA) are the two
to be consistent and asymptotically normal. Performance among the covariates in a test for pathway effect. Genes antimicrobial susceptibility tests used by hospitals and
of the estimators is evaluated via simulation studies and on the same pathway are expected to be correlated and clinics to determine an unknown pathogen’s susceptibil-
by an application to data from an HIV clinical trial. environmental exposures may also be correlated with ge- ity to various antibiotics. Both tests classify the pathogen
netic covariates. We consider the impact of measurement as either being susceptible, indeterminant, or resistant
e-mail: xsong@uga.edu error in the environmental exposure in a pathway test for to a drug. Since only one of these tests will typically be
the effect of a gene-environment interaction. Measure- used in practice, it is imperative to have the two tests
ment error in the environmental exposure impacts the calibrated. The MIC test deals with concentrations of a
DISTANCE AND GRAVITY: MODELING CONDITIONAL gene-environment interaction terms. We investigate drug so its classification breakpoints are based primarily
DISTRIBUTIONS OF HEAPED SELF-REPORTED how this error propogates through a linear health effects on a drug’s pharmacokinetics and pharmacodynamics.
COUNT DATA model, biases the coefficients and ultimately affects the The DIA test, on the other hand, does not and therefore its
Sandra D. Griffith*, Cleveland Clinic score test for pathway effect. breakpoints are determined by minimizing classification
Saul Shiffman, University of Pittsburgh discrepancies between pairs of MIC and DIA test results. It
Daniel F. Heitjan, University of Pennsylvania e-mail: salexeeff@fas.harvard.edu has been shown that this minimization procedure does not
adequately account for the inherent variability and unique
Self-reported daily cigarette counts typically exhibit mea- properties of each test and as a result, produces biased and
surement error, often manifesting as a preponderance of VARIABLE SELECTION FOR MULTIVARIATE imprecise breakpoints. In this paper we present a hierar-
round numbers. Heaping, a form of measurement error REGRESSION CALIBRATION WITH ERROR-PRONE chical errors-in-variables model that explicitly accounts
that occurs when quantities are reported with varying AND ERROR-FREE COVARIATES for these various factors of uncertainty and uses estimated
levels of precision, offers one explanation. A doubly- Xiaomei Liao*, Harvard School of Public Health probabilities from the model to determine appropriate
coded data set with both a conventional retrospective Kathryn Fitzgerald, Harvard School of Public Health breakpoints. We show through a simulation study that this
recall measurement (timeline followback) and an Donna Spiegelman, Harvard School of Public Health method leads to more accurate and precise results.
instantaneous measurement with a smooth distribution
(ecological momentary assessment), allows us to model Regression calibration is a popular method for correcting e-mail: gdepalma@purdue.edu
the conditional distribution of a self-reported count for bias in effect estimates when a disease risk factor is
given the underlying true count. Our model incorporates measured with error. However, the development of such
notions from cognitive psychology to conceptualize a methods has thus far been focused on unbiased estima-
subject’s selection of a self-reported count as a function tion and inference for the primary exposure or exposures
of both its distance from the true value and an intrinsic of interest, and not on finer aspects of model building and
attractiveness of the reported numeral, which we variable selection. Adjusting for measurement error using
denote its gravity. We develop a flexible framework for the regression calibration method evokes several ques-
parameterizing the model, allowing gravities based on tions concerning valid model construction in the presence
the roundness of numerals or data-driven gravities based of covariates. For instance, are the standard regression
e-mail: yanxun.xu@rice.edu Estimation of the skeleton of a directed acyclic graph GRAPHICAL NETWORK MODELS FOR MULTI-
(DAG) is of great importance for understanding the DIMENSIONAL NEUROCOGNITIVE PHENOTYPES OF
underlying DAG and causal effects can be assessed from PEDIATRIC DISORDERS
BAYESIAN INFERENCE OF MULTIPLE GAUSSIAN the skeleton when the DAG is not identifiable. We propose Vivian H. Shih*, University of California, Los Angeles
GRAPHICAL MODELS a novel method named ‘PenPC’ to estimate the skeleton of Catherine A. Sugar, University of California, Los Angeles
Christine B. Peterson*, Rice University a high-dimensional DAG by a two-step approach. We first
Francesco C. Stingo, University of Texas estimate the non-zero entries of a concentration matrix The rapidly emerging field of phenomics -- the study of
MD Anderson Cancer Center using penalized regression, and then fix the difference dimensional patterns of deficits characterizing specific
Marina Vannucci, Rice University between the concentration matrix and the skeleton by disorders -- plays a transformative role in fostering
evaluating a set of conditional independence hypotheses. breakthroughs in neuropsychiatric research. Multi-di-
We propose a Bayesian method for inferring multiple As illustrated by extensive simulations and real data mensional relationships among neurocognitive constructs
Gaussian graphical models that are believed to share studies, PenPC has significantly higher sensitivity and and intricate interactions between genes and behaviors
common features, but may differ in scientifically specificity than the standard-of-the-art method, the PC appear not only within a specific disorder but also cut
important respects. In our model, we place a Markov algorithm. We systematically study the asymptotic prop- across current diagnostic boundaries. Traditional dimen-
Random Field prior on the network structure for each erty of PenPC on high dimensional problem (the number sion reduction approaches such as principle component
sample group that both encourages similar structure of vertices p is in either polynomial or exponential scale of analysis and factor analysis generate new domains by
between related groups and accounts for reference sample size n) of traditional random graph model where collapsing across phenotypes before analysis, possibly
networks established by previous research. This formula- the number of connections of each vertex is limited and losing substantial information. On the other hand,
tion improves the reliability of the estimated networks scale-free DAGs where one vertex may be connected to a graphical network models constructively search for sparse
by allowing us to borrow strength across related sample large number of neighbors. relational structures of phenotypes within and across
groups and encouraging similarity to a known network. groups without the need of collapsing. This sparse covari-
Applications include comparison of the cellular metabolic e-mail: mjha@live.unc.edu ance estimation technique provides a holistic view of the
networks for control vs. disease groups, and the inference interconnectedness of phenotypic measures as well as
of protein-protein interaction networks for multiple specific hotspots within the underlying data structure. We
cancer subtypes.
e-mail: cbpeterson@gmail.com
Abstracts 98
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
which have been proposed in the literature together weighted GEEs and joint modeling approaches, and
S
AB
with the corresponding robust inferential procedures. present our recent extensions that address binary data
Then we discuss robust variable selection procedures and incorporate latent variables.
uncover vital phenotypes for childhood neuropsychiatric (including a generalized version of Mallows’s Cp) and we
disorders (e.g., ADHD, autism, 22q11.2 deletion syndrome, examine their performance in the longitudinal setup. e-mail: atroxel@mail.med.upenn.edu
and tic disorder) using the conventional estimation and Finally, longitudinal data typically derive from medical
modify the algorithm to reflect adjustments for other or other large-scale studies where often large numbers
covariates and longitudinal patterns across time. of potential explanatory variables and hence even larger 97. COMPLEX DESIGN AND
numbers of candidate models must be considered. In this
case we discuss a cross-validation Markov Chain Monte
ANALYTIC ISSUES IN GENETIC
e-mail: vivianhshih@gmail.com
Carlo procedure as a general variable selection tool which EPIDEMIOLOGIC STUDIES
avoids the need to visit all candidate models. Inclusion of
a “one-standard error” rule provides users with a collec- USING FAMILY MEMBERS TO AUGMENT
96. ADVANCES IN ROBUST ANALYSIS OF tion of good models. GENETIC CASE-CONTROL STUDIES OF A
LONGITUDINAL DATA LIFE-THREATENING DISEASE
e-mail: Elvezio.Ronchetti@unige.ch Lu Chen*, University of Pennsylvania School of Medicine
NONPARAMETRIC RANDOM COEFFICIENT MODELS Clarice R. Weinberg, National Institute of Environmental
FOR LONGITUDINAL DATA ANALYSIS: ALGORITHMS; Health Sciences, National Institutes of Health
ROBUSTNESS, AND EFFICIENCY ROBUST ANALYSIS OF LONGITUDINAL DATA WITH Jinbo Chen*, University of Pennsylvania School
John M. Neuhaus*, University of California, San Francisco INFORMATIVE DROP-OUTS of Medicine
Charles E. McCulloch, University of California, Sanjoy Sinha*, Carleton University
San Francisco Abdus Sattar, Case Western Reserve University Survival bias in case-control genetic association studies
Mary Lesperance, University of Victoria, Canada may arise due to an association between survival and
Rabih Saab, University of Victoria, Canada In this talk, I will discuss a robust method for analyzing genetic variants under study. It is difficult to adjust for
longitudinal data when there are informative drop-outs. the bias if no genetic data are available from deceased
Generalized linear mixed models with random intercepts This robust method is developed in the framework of cases. We propose to incorporate genotype data from
and slopes provide useful analyses of longitudinal data. weighted generalized estimating equations, and is useful family members (such as offspring, spouses, or parents)
Since little information exists to guide the choice of a for bounding the influence of potential outliers in the of deceased cases into retrospective maximum likelihood
parametric model for the distribution of random effects, data when estimating the model parameters. The weights analysis. Our method provides a partial data approach
several investigators have proposed approaches that leave considered are inverse probabilities of responses, and are for correcting survival bias and for obtaining unbiased
this distribution unspecified, but these approaches have estimated robustly in the framework of a pseudo-likeli- estimates of association parameters with di-allelic SNPs.
focussed on models with only random intercepts. In this hood. The empirical properties of the robust estimators This method faces an identifiability issue under a co-
talk we present an algorithm for fitting mixed effects will be discussed using simulations. An application of the dominant model for both penetrance and survival given
models with a nonparametric joint distribution of slopes proposed robust method will also be presented using real disease, so model simplifications are required. We derived
and intercepts. Using analytic and simulation studies, we data from genetic and inflammatory markers of a sepsis closed-form maximum likelihood estimates for associa-
compare the performance of this approach to that of fully study, which is a large cohort clinical study. tion parameters under the widely used log-additive and
parametric models with regard to bias and efficiency/ dominant association models. Our proposed method
mean square error in settings with correct and incorrect e-mail: sinha@math.carleton.ca can improve both validity and study power by enabling
specification of the joint distribution of random intercepts inclusion of deceased cases, and we provide simulations
and slopes. Fits of the nonparametric and fully parametric to demonstrate achievable improvements in efficiency.
mixed effects models to example data from longitudinal INFORMATIVE OBSERVATION
studies further illustrate the findings. TIMES IN LONGITUDINAL STUDIES e-mail: jinboche@mail.med.upenn.edu
Kay-See Tan, University of Pennsylvania
e-mail: john@biostat.ucsf.edu Benjamin French*, University of Pennsylvania
Andrea B. Troxel, University of Pennsylvania CASE-SIBLING STUDIES THAT
ACKNOWLEDGE UNSTUDIED PARENTS AND
ROBUST INFERENCE FOR MARGINAL LONGITUDINAL In longitudinal studies in medicine, subjects may be PERMIT UNMATCHED INDIVIDUALS
GENERALIZED LINEAR MODELS repeatedly observed according to a specified schedule, Min Shi*, National Institute of Environmental Health
Elvezio M. Ronchetti*, University of Geneva, Switzerland but they may also have additional visits for a variety of Sciences, National Institutes of Health
reasons. In many applications, these additional visits, David M. Umbach, National Institute of Environmental
Longitudinal models are commonly used for studying and the times at which they occur, are informative in Health Sciences, National Institutes of Health
data collected on individuals repeatedly through time the sense that they are associated with the outcome of Clarice R. Weinberg, National Institute of Environmental
and classical statistical methods are readily available interest. An example is warfarin dosing and maintenance, Health Sciences, National Institutes of Health
to carry out estimation and inference. However, in the in which subjects must be assessed regularly to ensure
presence of small deviations from the assumed model, that their blood clotting activity is within the normal Family-based designs enable assessment of genetic
these techniques can lead to biased estimates, p-values, range. Subjects outside the normal range must return associations without bias from population stratification.
and confidence intervals. Robust statistics deals with this to the clinic at much closer intervals until they are once However, parents are not always available - especially
problem and develops techniques that are not unduly again within range. In this talk, I will review methods for diseases with onset later in life - and the case-sibling
influenced by such deviations. In this talk we first review for addressing this problem, including inverse-intensity- design, where each case is matched with one or more un-
several robust estimators for marginal longitudinal GLM affected siblings, is useful. Analysis typically accounts for
within-family dependencies by using conditional logistic
ON
R
ST
IMPLEMENTATION OF A BIVARIATE DECONVOLUTION lustrative models, we find the optimal doses and compare
S
AB
APPROACH TO ESTIMATE THE JOINT DISTRIBUTION OF the probability of success, for fixed total sample sizes,
TWO NON-NORMAL RANDOM VARIABLES OBSERVED when one or two active doses are included in phase III.
biomarkers for status. The remaining talks in the session WITH MEASUREMENT ERROR
address some of the methodological challenges that arise Alicia L. Carriquiry, Iowa State University e-mail: carl-fredrik.burman@astrazeneca.com
when analyzing and interpreting this type of information Guillermo Basulto-Elías, Iowa State University
in the context of nutrition epidemiology. Eduardo A. Trujillo-Rivera*, Iowa State University
SIMULATION-GUIDED DESIGN FOR MOLECULARLY
e-mail: alicia@iastate.edu Replicate observations of 25(OH)D (biomarker for vitamin TARGETED THERAPIES IN ONCOLOGY
D status) and iPTH are available on a sample of individu- Cyrus R. Mehta*, Cytel Inc.
als. We assume that measurements are subject to non-
BIOMARKERS OF NUTRITIONAL STATUS: normal measurement error. We estimate the joint density The development of molecularly targeted therapies for
METHODOLOGICAL CHALLENGES of these bivariate data via non-parametric deconvolution. certain types of cancers (e.g., Vemurafenib for advanced
Victor Kipnis*, National Cancer Institute, The estimated density is used to compute statistics of melanoma with mutant BRAF; Cetuximab for metastatic
National Institutes of Health public health interest, such as the proportion of persons colorectal cancer with KRAS wild type) has let to the
in a group with 25(OH)D values below iPTH, or the value consideration of population enrichement designs that
Studies in nutritional epidemiology often fail to produce of 25(OH)D above which iPTH is approximately constant. explicitly factor-in the possibility that the experimental
consistent associations between dietary intake and We use a bootstrap approach to compute confidence compound might differentially benefit different biomark-
chronic disease. One of the main reasons may be substan- intervals. Several bivariate kernel density estimators for er subgroups. In such designs, enrollment would initially
tial measurement error, both random and systematic, in the noisy data and estimators for the characteristic func- be open to a broad patient population with the option to
self-reported assessment of dietary consumption. In the tion of the error are compared. restrict future enrollment, following an interim analysis,
absence of true observed dietary intakes, statistical meth- to only those biomarker subgroups that appeared to be
ods for adjusting for this error usually require a substudy e-mail: eduardo@iastate.edu benefiting from the experimental therapy. While this
with additional unbiased measurements of intake. In the strategy could greatly improve the chances of success for
talk, I will consider three classes of objective biomarkers the trial, it poses several statistical and logistical design
of dietary consumption and discuss challenges involved 100. UTILITIES OF STATISTICAL challenges. Since late-stage oncology trials are typically
event driven, one faces a complex trade-off between
in their use for mitigating the effect of dietary measure- MODELING AND SIMULATION power, sample size, number of events and study duration.
ment error.
FOR DRUG DEVELOPMENT This trade-off is further compounded by the importance
e-mail: kipnisv@mail.nih.gov of maintaining statistical independence of the data
GUIDED CLINICAL TRIAL DESIGN: DOES IT IMPROVE before and after the interim analysis and of optimizing
THE FINAL DESIGN? the timing of the interim analysis. This talk will highlight
J. Kyle Wathen*, Janssen Research & Development the crucial role of simulation-guided design for resolving
A SEMIPARAMETRIC APPROACH TO ESTIMATION IN
MEASUREMENT ERROR MODELS WITH ERROR-IN- these difficulties while nevertheless maintaining strong
Many important details of a clinical trial are often ignored control of the type-1 error.
THE-EQUATION: APPLICATION TO SERUM VITAMIN D
in order to simplify the statistical design. In this talk I will
Maria L. Joseph*, Iowa State University
present a case study of a trial where simulation was used e-mail: mehta@cytel.com
Alicia L. Carriquiry, Iowa State University
to gain insight into the impact of various adaptations
Wayne A. Fuller, Iowa State University
such as 2-stage design vs a single stage design, safety
Christopher T. Sempos, Office of Dietary Supplements,
rules based on two correlated outcomes and futility/su-
National Institutes of Health
periority decisions based on a third outcome. Simulation
101. RECENT ADVANCES IN SURVIVAL
Bess Dawson-Hughes, Human Nutrition Research
was also used to investigate the impact of many design AND EVENT-HISTORY ANALYSIS
Center on Aging at Tufts University
considerations as well as statistical modeling. Through
the use of simulation several logistical and statistical ANALYSIS OF DIRECT AND INDIRECT EFFECTS
The nonlinear relationship between observed serum IN SURVIVAL ANALYSIS
issues were raised, however, did the simulation improve
intact parathyroid hormone (iPTH) and observed serum Odd O. Aalen*, University of Oslo, Norway
the final clinical trial design?
25-hydroxyvitamin D (25(OH)D) has been studied com-
prehensively. Many studies ignore the measurement error Mediation analysis of survival data has been sorely miss-
e-mail: kwathen@its.jnj.com
in these observed quantities. We use a nonlinear function ing. In many settings one runs Cox analyses with baseline
to model the relationship between usual iPTH and usual covariates while internal time-dependent covariates are
25(OH)D, where usual represents the long run average of not included in the analysis due to perceived difficulties
ON THE CHOICE OF DOSES FOR PHASE III
the daily observations of these quantities. A semipa- of interpreting results. At the same time it is clear that
CLINICAL TRIALS
rametric maximum likelihood approach is proposed to the time-dependent covariates may contain information
Carl-Fredrik Burman*, AstraZeneca Research
estimate the nonlinear relationship in a measurement about the mechanism of the treatment effects. We shall
& Development
error model with error-in-the-equation. The estimation discuss the use of dynamic path analysis for studying this
procedures are applied to sample data. issue. The relation to causal inference will be pointed out.
It is important to consider how to optimize the choice of
dose or doses that continue into the confirmatory phase.
e-mail: emme.jay11@gmail.com e-mail: o.o.aalen@medisin.uio.no
Phase IIB dose-finding trials are relatively small and often
lack the ability of precisely estimating the dose-response
curves for efficacy and tolerability. Using simple but il-
e-mail: george.whitmore@mcgill.ca
Abstracts 102
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
context of a sequential least squares recursion. By asymp- with a terminal event for each patient. The terminal event
S
AB
totic comparison and simulation study, we demonstrate is modeled by a process of two types of competing risks:
that assuming homoscedasticity achieves only a modest safety events of interest and other terminal events. Statis-
cholera vaccine trial indicates a significant indirect effect efficiency gain when compared to nonparametric vari- tical properties of the proposed method are investigated
of vaccination. For example, among placebo recipients ance estimation: when homoscedasticity in truth holds, via simulations. An application is presented with data from
the incidence of cholera was reduced by 5.5 cases per the latter is at worst 88% as efficient as the former in the a phase II oncology trial.
1000 individuals, (95% CI: 3.2, 7.7) in neighborhoods limiting case, and often achieves well over 90% efficiency
with 60% vaccine coverage compared to neighborhoods for most practical situations. e-mail: gongqi@gmail.com
with 33% coverage.
e-mail: fsnycfan@gmail.com
e-mail: mhudgens@bios.unc.edu SMALLER, FASTER PHASE III TRIALS: A BETTER WAY
TO ASSESS TARGETED AGENTS?
BAYESIAN ENROLLMENT AND STOPPING RULES FOR Karla V. Ballman*, Mayo Clinic
103. CLINICAL TRIALS MANAGING TOXICITY REQUIRING LONG FOLLOW-UP Marie-Cecile Le Deley, Institut Gustave Roussy,
IN PHASE II ONCOLOGY TRIALS Université Paris-Sud 11
A MULTISTAGE NON-INFERIORITY STUDY ANALYSIS Guochen Song*, Quintiles Daniel J. Sargent, Mayo Clinic
PLAN TO EVALUATE SUCCESSIVELY MORE STRINGENT Anastasia Ivanova, University of North Carolina,
CRITERIA FOR A CLINICAL TRIAL WITH RARE EVENTS Chapel Hill Traditional clinical trial designs aim to definitively estab-
Siying Li*, University of North Carolina, Chapel Hill lish the superiority, which results in large sample sizes.
Gary G. Koch, University of North Carolina, Chapel Hill Stopping rules for toxicity are routinely used in phase Increasingly, common cancers are recognized to consist of
II oncology trials. If the follow-up for toxicity is long, it small subgroups making large trials infeasible. We com-
We address a multistage clinical trial to assess a sequence is desirable to have a stopping rule that uses all toxicity pared trial design strategies with different combinations
of hypotheses in the non-inferiority and also rare events information available not only information from patients of sample size and alpha to determine which performs
setting. Three successive hypotheses are used to evaluate with full follow-up. Further, to prevent excessive toxicity best over a 15 yr research horizon. We simulated a series of
whether the new treatment meets the criteria for new in such trials we propose an enrollment rule that informs two-treatment superiority trials using different values for
drug approval. Sample sizes for a five stage trial for all an investigator about the maximum number of patients the alpha-level and trial sample size (SS). Different disease
hypotheses are calculated using Poisson and Logrank that can be enrolled depending on current enrollment scenarios, accrual rates, and distributions of treatment ef-
sample size methods. Three strategies and corresponding and all available information about toxicity. We give fects were used. Metrics used included: impact on hazard
analysis plans are developed to evaluate the sequential recommendations on how to construct Bayesian stopping ratio (comparing yr 15 vs yr 0), overall survival benefit
hypotheses. Simulations show the design is satisfactory and enrollment rules to monitor toxicity continuously in (difference in median survival between year 15 and year
with respect to controlled Type I error, sufficient power, Phase II oncology trials with a long follow-up. 0), and risk of worse survival at year 15 compared to year
and early success at interim analyses. 0. Overall survival gains were greater as alpha increased
e-mail: guochens@gmail.com from 0.025 to 0.20. Gains in survival were achieved with
e-mail: siying@live.unc.edu SSs smaller than required under traditional criteria. Reduc-
ing the SS and increasing alpha increased the likelihood of
ANALYSIS OF SAFETY DATA IN CLINICAL TRIALS having a poorer survival rate at yr 15, but this probability
ON THE EFFICIENCY OF NONPARAMETRIC VARIANCE USING A RECURRENT EVENT APPROACH remained small. Results were consistent under different
ESTIMATION IN SEQUENTIAL DOSE-FINDING Qi Gong*, Amgen Inc. assumed distributions for treatment effect. As patient
Chih-Chi Hu*, Columbia University Mailman School Yansheng Tong, Genentech Inc. populations become more restricted (and thus smaller),
of Public Health Alexander Strasak, F. Hoffmann-La Roche Ltd. the current risk adverse trial design strategy may slow long
Ying Kuen K. Cheung, Columbia University Mailman Liang Fang, Genentech Inc. term progress and deserves re-examination.
School of Public Health
As an important aspect of the clinical evaluation of an e-mail: ballman@mayo.edu
Typically, phase I trials are designed to determine the investigational therapy, safety data are routinely collected
maximum tolerated dose, defined as the maximum test in clinical trials. To date, the analysis of safety data has
dose that causes a toxicity with a target probability. In largely been limited to descriptive summaries of incidence SUPERIORITY TESTING IN GROUP SEQUENTIAL
this talk, we formulate dose finding as a quantile estima- rates, or contingency tables aiming to compare simple NON-INFERIORITY TRIALS
tion problem and focus on situations where toxicity is rates between treatment arms. Many have argued this Vandana Mukhi*, U.S. Food and Drug Administration
defined by dichotomizing a continuous outcome, for traditional approach failed to take into account important Heng Li, U.S. Food and Drug Administration
which a correct specification of the variance function information including severity, onset time, and duration
of the outcomes is important. This is especially true for of a safety signal. In this article, we propose a framework In non-inferiority clinical trials it is often of interest to
sequential study where the variance assumption directly to summarize safety data with mean frequency function pre-specify that if the non-inferiority null hypothesis is re-
involves in the generation of the design points and hence and compare safety profiles between treatments with a jected then superiority will be tested. In group-sequential
sensitivity analysis may not be performed after the data generalized log-rank test, taking into account the afore- non-inferiority trials, this pre-specification would need to
are collected. In this light, there is a strong reason for mentioned characteristics ignored in traditional analysis contain not only a decision boundary associated with the
avoiding parametric assumptions on the variance func- approaches. In addition, a multivariate generalized log- non-inferiority hypothesis, but also a rejection rule for the
tion, although this may incur efficiency loss. We investi- rank test to compare the overall safety profile of different superiority null hypothesis at interim and final stages. We
gate how much information one may retrieve by making treatments is proposed. In the proposed method, safety will consider some design issues in this setup, in particular
additional parametric assumptions on the variance in the events are considered to follow a recurrent event process type I error rate for the superiority test.
e-mail: vandana26@yahoo.com
Abstracts 104
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
develop a novel Autoregressive Hidden Markov Model is equivalent to the BIC method when the subsample
S
AB
(AR-HMM) accounting for covariate effects and violations size equals to n/(log n-1). The proposed cross-validation
of the independence assumption. We demonstrate that methodology is more generally applicable than AIC
variation into the model. The new model performs more our AR-HMM leads to improved performance in identify- and BIC. In addition, with an appropriate estimate for
reasonable normalization and offer more accurate esti- ing enriched regions in both simulated and real datasets, the variance of a general U-statistic, one can test which
mation of base-level expression. The corrected expression especially in those with broader regions of DAE-seq model has the smallest risk based on the proposed U
can be modelled as random functions and be expanded signal enrichment. We also introduce a variable selection model selection tool. A real data example is provided to
in orthogonal functional principal components through procedure in the context of the HMM when the mean of study our estimator. In addition to determining the low-
Karhunen-Loeve decomposition. Testing differential each state-specific emission distribution is modeled by est risk model in the BIC sense, we compare the proposed
expressions here is comparing FPCA scores, instead of some set of covariates. We study the theoretical proper- U-statistic cross-validation tool with the standard criteria.
difference in read-counts of gene. The proposed methods ties of this variable selection method and demonstrate its
are applied to drosophila and schizophrenia and bipolar efficacy in simulated and real DAE-seq data. e-mail: qing.w.wang@williams.edu
RNA-Seq datasets.
e-mail: naim@unc.edu
e-mail: xiongha@gmail.com MAXIMUM LIKELIHOOD ESTIMATION FOR
SEMIPARAMETRIC EXPONENTIAL TILT MODELS
105. NONPARAMETRIC METHODS WITH ADJUSTMENT OF COVARIATES
CAN HUMAN ETHNIC SUBGROUPS BE UNCOVERED Jinsong Chen*, University of Illinois at Chicago
BY NEXT GENERATION SEQUENCING DATA? VARIABLE SELECTION IN MONOTONE SINGLE-INDEX George R. Terrell, Virginia Tech University
Yiwei Zhang*, University of Minnesota MODELS VIA THE ADAPTIVE LASSO Inyoung Kim, Virginia Tech University
Wei Pan, University of Minnesota Jared Foster*, University of Michigan
We propose a semiparametric exponential tilt model al-
Population stratification is of primary interest in genetic We consider the problem of variable selection for mono- lowing the adjustment of covariates. Furthermore, we add
studies to imply human evolution history and to avoid tone single-index models. A single-index model assumes flexible log-concave qualitative constraint on nonpara-
spurious findings in association testing. Next generation that the expectation of the outcome is an unknown metric density estimation of proposed model. Maximum
sequencing data brings greater chance as well as chal- function of a linear combination of covariates. Assuming likelihood method is used for estimate exponential tilt
lenges to uncover population structure in finer scales. For monotonicity of the unknown function is often reason- parameters and density functions. Asymptotic normality
SNP data, the most commonly used method is principal able, and allows for more straightforward inference. of the estimates is developed. Likelihood ratio test, which
component analysis (PCA), while two recently proposed We present an adaptive LASSO penalized least squares is proved to follow a chi-square distribution, is constructed
methods are Spectral Clustering (Spectral-GEM) and approach to estimating the index parameter and the un- to test the significance of exponential tilt parameter
Locally Linear Embedding (LLE). In this talk we apply and known function in these models for continuous outcome. estimation. Simulation study is conducted to assess the
compare these three methods using the whole-genome Monotone function estimates are achieved using the performance of our method. Our model is also applied to
sequence data of the European and African samples from pooled adjacent violators algorithm, followed by kernel analyze the data from Chicago Healthy Aging Study.
the 1000 Genomes Project to uncover ethnic subgroups. regression. In the iterative estimation process, a linear
approximation to the unknown function is used, therefore e-mail: jschen24@hotmail.com
e-mail: zhan1447@umn.edu reducing the situation to that of linear regression, and
allowing for the use of standard LASSO algorithms, such
as coordinate descent. Results of a simulation study TWO STEP ESTIMATION OF PROPORTIONAL HAZARDS
AUTOREGRESSIVE MODELING AND VARIABLE indicate that the proposed methods perform well under REGRESSION MODELS WITH NONPARAMETRIC
SELECTION PROCEDURES IN HIDDEN MARKOV a variety of circumstances, and that an assumption of ADDITIVE EFFECTS
MODELS WITH COVARIATES, WITH APPLICATIONS monotonicity, when appropriate, noticeably improves Rong Liu*, University of Toledo
TO DAE-seq DATA performance. The proposed methods are applied to data
Naim U. Rashid*, University of North Carolina, Chapel Hill from a randomized clinical trial for the treatment of a The Cox proportional hazards model usually assumes that
Wei Sun, University of North Carolina, Chapel Hill critical illness in the intensive care unit (ICU). the covariate has a log-linear effect on the hazard func-
Joseph G. Ibrahim, University of North Carolina, tion. Many studies had been done for removing the linear
Chapel Hill e-mail: jaredcf@umich.edu restriction. Sleeper and Harrington (1990) used additive
models to model the nonlinear covariate effects in the Cox
In DAE (DNA After Enrichment)-seq experiments, DNA model. But the asymptotic properties of the estimation
related with certain biological processes are isolated and CROSS-VALIDATION AND A U-STATISTIC MODEL of component functions were not obtained. We propose
sequenced on a high-throughput sequencing platform SELECTION TOOL spline-backfitted kernel (SBK) estimator for the compo-
to determine their genomic positions. Statistical analysis Qing Wang*, Williams College nent functions, and establish oracle properties for the
of DAE-seq data aims to detect genomic regions with Bruce G. Lindsay, The Pennsylvania State University two-step estimator of each function component such that
significant aggregations of isolated DNA. However, sev- it performs as well as the univariate function estimator by
eral confounding factors and their interactions may bias In this talk we turn our attention to the problem of assuming that all other function components are known.
DAE-seq data, which leads to a challenging variable se- model selection and will propose an alternative model Asymptotic distributions and consistency properties of
lection problem. In addition, signals in adjacent genome selection method, akin to the BIC criterion. We construct a the estimators are obtained. Simulation evidence strongly
regions may exhibit strong correlations, invalidating the U-statistic form estimate for the likelihood risk that is the corroborates with the asymptotic theory. We illustrate the
independence assumption of many existing methods basis of the generalized AIC methods. The U-statistic risk method with a real data example.
for DAE-seq data analysis. To mitigate these issues, we estimate, sometimes called likelihood cross-validation,
is an alternative estimator to the generalized AIC and e-mail: rong.liu@utoledo.edu
Abstracts 106
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
JOINT MODELING OF LONGITUDINAL AND data. In longitudinal studies with potentially informative
S
AB
We describe a Bayesian model in a simulated clinical PRINCIPAL COMPONENT ANALYSIS ON HIGH 108. NEW STATISTICAL CHALLENGES
trial setting for a bivariate distribution with one binary DIMENSIONAL NON-GAUSSIAN DEPENDENT DATA FOR LONGITUDINAL/
and one continuous response. The marginal distribution Fang Han*, Johns Hopkins University
of the binary response is given a Bernoulli distribution Han Liu, Princeton University
MULTIVARIATE ANALYSIS
with a logit link function and the conditional distribution WITH MISSING DATA
of the continuous response given the binary response is In this paper, we propose a new principal component
given a normal distribution with a linear link function. In analysis (PCA) that has the potential to handle large, OUTCOME DEPENDENT SAMPLING FOR
the simulation, the Bayesian credible sets were obtained complex, and noisy datasets. In particular, we study the CONTINUOUS-RESPONSE LONGITUDINAL DATA
through Markov chain Monte Carlo methods using scenario where the observations are each from a semi- Paul J. Rathouz*, University of Wisconsin, Madison
OpenBUGS through R. Parameter estimation is fairly con- parametric model and drawn from non-i.i.d. processes Jonathan S. Schildcrout, Vanderbilt University School
sistent with respect to coverage of the 95% credible sets, (m-dependency or a more general phi mixing case). of Medicine
however, the posterior estimates of the parameters for We show that our method can allow weak dependence. Lee McDaniel, University of Wisconsin, Madison
the binary response vary more across simulated samples In particular, we provide the generalization bounds of
as the probability of the binary response decreases. convergence for both support recovery and parameter es- In outcome dependent sampling (ODS) designs for
The marginal posterior variances also increase in the timation of the proposed method for the non-i.i.d. data. longitudinal data, the subjects and/or the observations
parameters for the binary response as the probability of We provide explicit sufficient conditions on the degree are sampled as a stochastic function of the longitudinal
the binary response decreases, but the marginal posterior of dependence, under which the same parametric rate vector of responses. The sampling may for example be a
variances decrease in the parameters for the conditional can be achieved. To our knowledge, this is the first work variant on case-control sampling for extreme subjects, or
continuous response. analyzing the theoretical performance of PCA for the may alternatively sample individual observations from
dependent data in high dimensional settings. Our results subjects as a function of a surrogate process. ODS results
e-mail: ross_bray@baylor.edu strictly generalize the analysis in Liu et al. (2012) and the in a type of missingness-by-design, and important ques-
techniques we used have the separate interest for analyz- tions of optimal design and robust analysis ensue. Several
ing a variety of other multivariate statistical methods. methods have been developed recently for longitudinal
Our theoretical results are backed up by experiments on binary responses, but less work has been carried out for
synthetic data and real-world genomic and equities data.
e-mail: fhan@jhsph.edu
Abstracts 108
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
Abstracts 110
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
cardiovascular risk (e.g., minimum 2 years) for these BAYESIAN DESIGN OF SUPERIORITY CLINICAL
S
AB
chronically used therapies. In this context, we develop TRIALS FOR RECURRENT EVENTS DATA WITH
a new Bayesian sequential meta-analysis approach APPLICATIONS TO BLEEDING AND TRANSFUSION
PERSONALIZED EVALUATION OF BIOMARKER VALUE: using survival regression models to assess whether the EVENTS IN MYELODYPLASTIC SYNDROME
A COST-BENEFIT PERSPECTIVE size of a clinical development program is adequate to Ming-Hui Chen*, University of Connecticut
Ying Huang*, Fred Hutchinson Cancer Research Center evaluate a particular safety endpoint. We propose a Joseph G. Ibrahim, University of North Carolina,
Bayesian sample size determination methodology for Chapel Hill
A biomarker or medical test that has a potential to inform sequential meta-analysis clinical trial design with a focus Donglin Zeng, University of North Carolina, Chapel Hill
treatment decisions in clinical practice may be costly on controlling the family-wise type I error and power. The Kuolung Hu, Amgen Inc.
to measure. Understanding the extra benefit provided proposed methodology is applied to the design of a new Catherine Jia, Amgen Inc.
by the marker is therefore important to patients and anti-diabetic drug development program for evaluating
clinicians who are making the decisions about whether to cardiovascular risk. In many biomedical studies, patients may experience
have a patient’s biomarker measured. Common methods the same type of recurrent event repeatedly over time,
for evaluating a biomarker’s utility in a general popula- e-mail: ibrahim@bios.unc.edu such as bleeding, multiple infections and disease. In this
tion are not ideal for this purpose: a biomarker that is paper, we aim to design a pivotal clinical trial in which
useful for guiding treatment decisions to the general lower risk Myelodysplastic syndrome (MDS) patients are
population will have different values to different patients USING DATA AUGMENTATION TO FACILITATE treated with MDS disease modifying therapies. One of the
due to the individual differences in their response to CONDUCT OF PHASE I/II CLINICAL TRIALS WITH key study objectives is to demonstrate the investigational
treatment and in their tolerance of the disease harm and DELAYED OUTCOMES product (treatment) effect on reduction of platelet
the treatment cost. In this talk, we propose a new tool to Ying Yuan*, University of Texas MD Anderson transfusion and bleeding events while receiving MDS
quantify a biomarker's treatment-selection value to indi- Cancer Center therapies. In this context, we propose a new Bayesian
vidual patients, which integrates two pieces of personal Ick Hoon Jin, University of Texas MD Anderson approach for the design of superiority clinical trials using
information including a patient’s baseline risk factors Cancer Center recurrent events regression models. The recurrent events
and the patient’s input about the ratio of treatment cost Peter Thall, University of Texas MD Anderson data from a completed phase 2 trial is incorporated into
relative to disease cost. We develop estimation methods Cancer Center the Bayesian design via the power prior of Ibrahim and
for both randomized trials and cohort studies. Chen (2000). An efficient MCMC sampling algorithm, a
Phase I/II clinical trial designs combine conventional predictive data generation algorithm, and a simulation-
e-mail: yhuang@fhcrc.org phase I and phase II trials by using both toxicity and based algorithm are developed for sampling from the
efficacy to determine an optimal dose of a new agent. fitting posterior distribution, generating the predictive
While many phase I/II designs have been proposed, they recurrent events data, and computing various design
have seen very limited use. A major practical impedi- quantities such as type I error and power, respectively.
112. DESIGN OF CLINICAL TRIALS ment, for phase I/II and many other adaptive clinical trial Various properties of the proposed methodology are ex-
FOR TIME-TO-EVENT DATA designs, is that outcomes used by adaptive decision rules amined and an extensive simulation study is conducted.
must be observed soon after the start of therapy in order
BAYESIAN SEQUENTIAL META-ANALYSIS DESIGN to apply the rules to choose treatments or doses for new e-mail: ming-hui.chen@uconn.edu
IN EVALUATING CARDIOVASCULAR RISK IN A NEW patients. In phase I/II, a severe logistical problem occurs
ANTIDIABETIC DRUG DEVELOPMENT PROGRAM if either toxicity or efficacy cannot be scored quickly,
Joseph G. Ibrahim*, University of North Carolina, for example if either outcome takes up to six weeks to
Chapel Hill
113. STATISTICAL ANALYSIS OF
evaluate but two or more patients are accrued per month.
Ming-Hui Chen, University of Connecticut We propose a general methodology for this problem
SUBSTANCE ABUSE DATA
Amy Xia, Amgen Inc. that treats late-onset outcomes as missing data. Given a
Thomas Liu, Amgen Inc. TIME-VARYING COEFFICIENT MODELS
probability model for the times to toxicity and efficacy as
Violeta Hennessey, Amgen Inc. FOR LONGITUDINAL MIXED RESPONSES
functions of dose, we use data augmentation to impute
Esra Kurum, Istanbul Medeniyet University,
missing binary outcomes from their posterior predictive
Recently, the Center for Drug Evaluation and Research at Istanbul, Turkey
distributions based on both partial follow-up information
the Food and Drug Administration (FDA) released a guid- Runze Li*, The Pennsylvania State University
and complete outcome data. Using the completed data,
ance document that makes recommendations about how Saul Shiffman, University of Pittsburgh
we apply the phase I/II design's decision rules, subject
to demonstrate that a new anti-diabetic therapy to treat Weixin Yao, Kansas State University
to dose safety and efficacy admissibility requirements.
type 2 diabetes is not associated with an unacceptable We illustrate the method with two cancer clinical trials,
increase in cardiovascular risk. One of the recommenda- Motivated by an empirical analysis of ecological mo-
including computer stimulations.
tions from the guidance is that phase 2 and 3 trials should mentary assessment data (EMA) collected in a smoking
be appropriately designed and conducted so that a meta- cessation study, we propose a joint modeling technique
e-mail: yyuan@mdanderson.org
analysis can be performed; the phase 2 and 3 programs for estimating the time-varying association between
should include patients at higher risk of cardiovascular two intensively measured longitudinal responses: a
events; and it is likely that the controlled trials will need continuous one and a binary one. A major challenge in
to last more than the typical 3 to 6 months duration to joint modeling these responses is the lack of a multivari-
obtain enough events and to provide data on longer-term ate distribution. We suggest introducing a normal latent
variable underlying the binary response and factorizing
Abstracts 112
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
S
AB
e-mail: ridab2009@knights.ucf.edu
ON
R
ST
S
AB
Abstracts 116
TER PRESE
| PO S NT
C TS AT
A I
ON
R
ST
performance of these models regarding survival curve IMAGE DETAILS PRESERVING IMAGE DENOISING
S
AB
Abstracts 118
ENAR 2013
Spring Meeting
March 10 – 13
E X
IN D
Abstracts 120
IN DEX | INDE
D EX | X
| IN
X
DE
| IN
Han, Yu 5d
Abstracts 122
IN DEX | INDE
D EX | X
| IN
X
DE
| IN
Abstracts 124
IN DEX | INDE
D EX | X
| IN
X
DE
| IN
Abstracts 126
IN DEX | INDE
D EX | X
| IN
X
DE
| IN
Abstracts 128
IN DEX | INDE
D EX | X
| IN
X
DE
| IN
Abstracts 130
12100 Sunset Hills Road | Suite 130
Reston, Virginia 20190
Phone 703-437-4377 | Fax 703-435-4390