2015-12-02T01:28:21Z
http://oai.repec.openlib.org/oai.php
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:363-378
2014-11-17
RePEc:oup:biomet
article
Likelihood approaches for the invariant density ratio model with biased-sampling data
The full likelihood approach in statistical analysis is regarded as the most efficient means for estimation and inference. For complex length-biased failure time data, computational algorithms and theoretical properties are not readily available, especially when a likelihood function involves infinite-dimensional parameters. Relying on the invariance property of length-biased failure time data under the semiparametric density ratio model, we present two likelihood approaches for the estimation and assessment of the difference between two survival distributions. The most efficient maximum likelihood estimators are obtained by the EM algorithm and profile likelihood. We also provide a simple numerical method for estimation and inference based on conditional likelihood, which can be generalized to k-arm settings. Unlike conventional survival data, the mean of the population failure times can be consistently estimated given right-censored length-biased data under mild regularity conditions. To check the semiparametric density ratio model assumption, we use a test statistic based on the area between two survival distributions. Simulation studies confirm that the full likelihood estimators are more efficient than the conditional likelihood estimators. We analyse an epidemiological study to illustrate the proposed methods. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
363
378
http://hdl.handle.net/10.1093/biomet/ass008
application/pdf
Access to full text is restricted to subscribers.
Yu Shen
Jing Ning
Jing Qin
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:325-333
2014-11-17
RePEc:oup:biomet
article
Objective Bayesian analysis for the Student-t regression model
We develop a Bayesian analysis based on two different Jeffreys priors for the Student-t regression model with unknown degrees of freedom. It is typically difficult to estimate the number of degrees of freedom: improper prior distributions may lead to improper posterior distributions, whereas proper prior distributions may dominate the analysis. We show that Bayesian analysis with either of the two considered Jeffreys priors provides a proper posterior distribution. Finally, we show that Bayesian estimators based on Jeffreys analysis compare favourably to other Bayesian estimators based on priors previously proposed in the literature. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
325
333
http://hdl.handle.net/10.1093/biomet/asn001
application/pdf
Access to full text is restricted to subscribers.
Thaís C. O. Fonseca
Marco A. R. Ferreira
Helio S. Migon
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1005-1011
2014-11-17
RePEc:oup:biomet
article
A note on automatic variable selection using smooth-threshold estimating equations
This paper develops smooth-threshold estimating equations that can automatically eliminate irrelevant parameters by setting them to zero. The resulting estimator enjoys the oracle property in the sense of Fan & Li (2001), even in estimators for which the covariance assumption of Wang & Leng (2007) is violated, such as the Buckley–James estimator. Furthermore, the estimator can be obtained without solving a convex optimization problem. A BIC-type criterion for tuning parameter selection is also proposed. It is shown that the criterion achieves consistent model selection. A numerical study confirms the performance of the method. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1005
1011
http://hdl.handle.net/10.1093/biomet/asp060
application/pdf
Access to full text is restricted to subscribers.
Masao Ueki
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:387-402
2014-11-17
RePEc:oup:biomet
article
Estimating a treatment effect with repeated measurements accounting for varying effectiveness duration
To assess treatment efficacy in clinical trials, certain clinical outcomes are repeatedly measured over time for the same subject. The difference in their means may characterize a treatment effect. Since treatment effectiveness lag and saturation times may exist, erosion of treatment effect often occurs during the observation period. Instead of using models based on ad hoc parametric or purely nonparametric time-varying coefficients, we model the treatment effectiveness durations, which are the time intervals between the lag and saturation times. Then we use some mean response models to include such treatment effectiveness durations. Our methodology is demonstrated by simulations and analysis of a landmark HIV/AIDS clinical trial of short-course nevirapine against mother-to-child HIV vertical transmission during labour and delivery. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
387
402
http://hdl.handle.net/10.1093/biomet/asm019
application/pdf
Access to full text is restricted to subscribers.
Y. Q. Chen
J. Yang
S. Cheng
J. B. Jackson
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:135-152
2014-11-17
RePEc:oup:biomet
article
Extending conventional priors for testing general hypotheses in linear models
We consider that observations come from a general normal linear model and that it is desirable to test a simplifying null hypothesis about the parameters. We approach this problem from an objective Bayesian, model-selection perspective. Crucial ingredients for this approach are 'proper objective priors' to be used for deriving the Bayes factors. Jeffreys-Zellner-Siow priors have good properties for testing null hypotheses defined by specific values of the parameters in full-rank linear models. We extend these priors to deal with general hypotheses in general linear models, not necessarily of full rank. The resulting priors, which we call 'conventional priors', are expressed as a generalization of recently introduced 'partially informative distributions'. The corresponding Bayes factors are fully automatic, easily computed and very reasonable. The methodology is illustrated for the change-point problem and the equality of treatment effects problem. We compare the conventional priors derived for these problems with other objective Bayesian proposals like the intrinsic priors. It is concluded that both priors behave similarly although interesting subtle differences arise. We adapt the conventional priors to deal with nonnested model selection as well as multiple-model comparison. Finally, we briefly address a generalization of conventional priors to nonnormal scenarios. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
135
152
http://hdl.handle.net/10.1093/biomet/asm014
application/pdf
Access to full text is restricted to subscribers.
M.J. Bayarri
Gonzalo García-Donato
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:773-778
2014-11-17
RePEc:oup:biomet
article
A note on conditional AIC for linear mixed-effects models
The conventional model selection criterion, the Akaike information criterion, AIC, has been applied to choose candidate models in mixed-effects models by the consideration of marginal likelihood. Vaida & Blanchard (2005) demonstrated that such a marginal AIC and its small sample correction are inappropriate when the research focus is on clusters. Correspondingly, these authors suggested the use of conditional AIC. Their conditional AIC is derived under the assumption that the variance-covariance matrix or scaled variance-covariance matrix of random effects is known. This note provides a general conditional AIC but without these strong assumptions. Simulation studies show that the proposed method is promising. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
773
778
http://hdl.handle.net/10.1093/biomet/asn023
application/pdf
Access to full text is restricted to subscribers.
Hua Liang
Hulin Wu
Guohua Zou
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:767-767
2014-11-17
RePEc:oup:biomet
article
'Nonparametric inference in multivariate mixtures', Biometrika (2005), 92, pp. 667–678
The left-hand side of equation (2·8), on p. 671, should read {π<sub>1</sub> (1 − π<sub>1</sub>)}-super-−1/2 (2π<sub>1</sub> − 1) rather than {(1 − π<sub>1</sub>)/π<sub>1</sub>}-super-1/2 (2π<sub>1</sub> − 1). Reflecting this change, the left-hand side of equation (3·1) on the same page should be altered to {π̂<sub>1</sub> (1 − π̂<sub>1</sub>)}-super-−1/2 (2π̂<sub>1</sub> − 1), and the formula at the foot of p. 677 should be modified to {π<sub>1</sub> (1 − π<sub>1</sub>)}-super-−1/2 (2π<sub>1</sub> − 1) + O<sub>p</sub>(n-super-−1/2). No other formula is affected, and the left-hand side of (2·8) is still increasing in π<sub>1</sub>. The numerical results, discussed in §4, are influenced in minor ways. In the simulation study, absolute bias is reduced, and variance is either slightly increased or slightly decreased. In the real-data example, using the nonparametric approach to analysis, mean squared error is further reduced, from 0·0011 to 0·0004.
We are grateful to Hiro Kasahara and Katsumi Shimotsu for pointing out the error. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
767
767
http://hdl.handle.net/10.1093/biomet/asm042
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Amnon Neeman
Reza Pakyari
Ryan Elmore
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:513-527
2014-11-17
RePEc:oup:biomet
article
Adaptive regularization using the entire solution surface
Several sparseness penalties have been suggested for delivery of good predictive performance in automatic variable selection within the framework of regularization. All assume that the true model is sparse. We propose a penalty, a convex combination of the L<sub>1</sub>- and L<sub>∞</sub>-norms, that adapts to a variety of situations including sparseness and nonsparseness, grouping and nongrouping. The proposed penalty performs grouping and adaptive regularization. In addition, we introduce a novel homotopy algorithm utilizing subgradients for developing regularization solution surfaces involving multiple regularizers. This permits efficient computation and adaptive tuning. Numerical experiments are conducted using simulation. In simulated and real examples, the proposed penalty compares well against popular alternatives. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
513
527
http://hdl.handle.net/10.1093/biomet/asp038
application/pdf
Access to full text is restricted to subscribers.
S. Wu
X. Shen
C. J. Geyer
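The penalty named in the abstract above, a convex combination of the L<sub>1</sub>- and L<sub>∞</sub>-norms, can be illustrated with a minimal sketch. The function name `l1_linf_penalty`, the overall scale `lam`, and the mixing weight `alpha` are assumptions made for illustration only; the record does not give the paper's exact parameterization, and the homotopy algorithm is not reproduced here.

```python
def l1_linf_penalty(beta, lam, alpha):
    """Hypothetical penalty: lam * ((1 - alpha) * ||beta||_1 + alpha * ||beta||_inf).

    alpha in [0, 1] is an assumed mixing weight: alpha = 0 gives a pure
    L1 penalty, alpha = 1 gives a pure L-infinity penalty.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1_norm = sum(abs(b) for b in beta)
    # default=0.0 handles an empty coefficient vector
    linf_norm = max((abs(b) for b in beta), default=0.0)
    return lam * ((1.0 - alpha) * l1_norm + alpha * linf_norm)
```

With coefficients `[1.0, -2.0, 0.5]`, `lam=2.0` and `alpha=0.5`, the L<sub>1</sub> part is 3.5 and the L<sub>∞</sub> part is 2.0, so the sketch evaluates 2.0 × (1.75 + 1.0).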
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:583-598
2014-11-17
RePEc:oup:biomet
article
Functional mixed effects spectral analysis
In many experiments, time series data can be collected from multiple units and multiple time series segments can be collected from the same unit. This article introduces a mixed effects Cramér spectral representation which can be used to model the effects of design covariates on the second-order power spectrum while accounting for potential correlations among the time series segments collected from the same unit. The transfer function is composed of a deterministic component to account for the population-average effects and a random component to account for the unit-specific deviations. The resulting log-spectrum has a functional mixed effects representation where both the fixed effects and random effects are functions in the frequency domain. It is shown that, when the replicate-specific spectra are smooth, the log-periodograms converge to a functional mixed effects model. A data-driven iterative estimation procedure is offered for the periodic smoothing spline estimation of the fixed effects, penalized estimation of the functional covariance of the random effects, and unit-specific random effects prediction via the best linear unbiased predictor. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
583
598
http://hdl.handle.net/10.1093/biomet/asr032
application/pdf
Access to full text is restricted to subscribers.
Robert T. Krafty
Martica Hall
Wensheng Guo
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:679-694
2014-11-17
RePEc:oup:biomet
article
Improving the efficiency of the log-rank test using auxiliary covariates
Under the assumption of proportional hazards, the log-rank test is optimal for testing the null hypothesis that the logarithm of the hazard ratio is zero. However, if there are additional covariates that correlate with survival times, making use of their information will increase the efficiency of the log-rank test. We apply the theory of semiparametrics to characterize a class of regular and asymptotically linear estimators for the logarithm of the hazard ratio when auxiliary covariates are incorporated into the model, and derive estimators that are more efficient. The Wald tests induced by these estimators are shown to be more powerful than the log-rank test. Simulation studies are used to illustrate the gains in efficiency. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
679
694
http://hdl.handle.net/10.1093/biomet/asn003
application/pdf
Access to full text is restricted to subscribers.
Xiaomin Lu
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:877-891
2014-11-17
RePEc:oup:biomet
article
Adaptive cluster double sampling
We present a multi-phase variant of adaptive cluster sampling which allows the sampler to control the number of measurements of the variable of interest. A first-phase sample is selected using an adaptive cluster sampling design based on an inexpensive auxiliary variable associated with the survey variable. Then the network structure of the adaptive cluster sample is used to select an ordinary one-phase or two-phase subsample of units and the values of the survey variable associated with those units are recorded. The population mean is estimated by either a regression-type estimator or a Horvitz–Thompson-type estimator. The results of a simulation study show good performance of the proposed design, and suggest that in many real situations this design might be preferred to the ordinary adaptive cluster sampling design. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
877
891
http://hdl.handle.net/10.1093/biomet/91.4.877
text/html
Access to full text is restricted to subscribers.
Martín H. Felix-Medina
Steven K. Thompson
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:533-550
2014-11-17
RePEc:oup:biomet
article
Inferring stochastic dynamics from functional data
In most current data modelling for time-dynamic systems, one works with a prespecified differential equation and attempts to estimate its parameters. In contrast, we demonstrate that in the case of functional data, the equation itself can be inferred. Assuming only that the dynamics are described by a first-order nonlinear differential equation with a random component, we obtain data-adaptive dynamic equations from the observed data via a simple smoothing-based procedure. We prove consistency and introduce diagnostics to ascertain the fraction of variance that is explained by the deterministic part of the equation. This approach is shown to yield useful insights into the time-dynamic nature of human growth. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
533
550
http://hdl.handle.net/10.1093/biomet/ass015
application/pdf
Access to full text is restricted to subscribers.
Nicolas Verzelen
Wenwen Tao
Hans-Georg Müller
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:529-538
2014-11-17
RePEc:oup:biomet
article
Bayesian nonparametric multiple imputation of partially observed data with ignorable nonresponse
We present a new, nonparametric Bayesian method for multiple imputation of partially observed data for which the pattern of missingness is arbitrary and the data are missing at random with ignorable nonresponse with respect to the model specification. Motivation for the method is provided, followed by an overview of Pólya trees and their application to multiple imputation, and a comparison of the new method to existing approaches is presented. The method is illustrated on a dataset of colleges and universities in the United States. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
529
538
Susan M. Paddock
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:257-263
2014-11-17
RePEc:oup:biomet
article
Asymptotic inference for a nonstationary double AR(1) model
We investigate the nonstationary double AR(1) model, y<sub>t</sub> = φy<sub>t−1</sub> + η<sub>t</sub>√(ω + αy²<sub>t−1</sub>), where ω > 0, α > 0, the η<sub>t</sub> are independent standard normal random variables and E log |φ + η<sub>t</sub>√α| ⩾ 0. We show that the maximum likelihood estimator of (φ, α) is consistent and asymptotically normal. Combination of this result with that in Ling ([11]) for the stationary case gives the asymptotic normality of the maximum likelihood estimator of φ for any φ in the real line, with a root-n rate of convergence. This is in contrast to the results for the classical AR(1) model, corresponding to α = 0. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
257
263
http://hdl.handle.net/10.1093/biomet/asm084
application/pdf
Access to full text is restricted to subscribers.
Shiqing Ling
Dong Li
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:741-755
2014-11-17
RePEc:oup:biomet
article
Kernel smoothed profile likelihood estimation in the accelerated failure time frailty model for clustered survival data
Clustered survival data frequently arise in biomedical applications, where event times of interest are clustered into groups such as families. In this article we consider an accelerated failure time frailty model for clustered survival data and develop nonparametric maximum likelihood estimation for it via a kernel smoother-aided EM algorithm. We show that the proposed estimator for the regression coefficients is consistent, asymptotically normal, and semiparametric efficient when the kernel bandwidth is properly chosen. An EM-aided numerical differentiation method is derived for estimating its variance. Simulation studies evaluate the finite sample performance of the estimator, and it is applied to the diabetic retinopathy dataset. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
741
755
http://hdl.handle.net/10.1093/biomet/ast012
application/pdf
Access to full text is restricted to subscribers.
Bo Liu
Wenbin Lu
Jiajia Zhang
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:235-254
2014-11-17
RePEc:oup:biomet
article
Bayesian alignment using hierarchical models, with applications in protein bioinformatics
An important problem in shape analysis is to match configurations of points in space after filtering out some geometrical transformation. In this paper we introduce hierarchical models for such tasks, in which the points in the configurations are either unlabelled or have at most a partial labelling constraining the matching, and in which some points may only appear in one of the configurations. We derive procedures for simultaneous inference about the matching and the transformation, using a Bayesian approach. Our hierarchical model is based on a Poisson process for hidden true point locations; this leads to considerable mathematical simplification and efficiency of implementation of EM and Markov chain Monte Carlo algorithms. We find a novel use for classical distributions from directional statistics in a conditionally conjugate specification for the case where the geometrical transformation includes an unknown rotation. Throughout, we focus on the case of affine or rigid motion transformations. Under a broad parametric family of loss functions, an optimal Bayesian point estimate of the matching matrix can be constructed that depends only on a single parameter of the family. Our methods are illustrated by two applications from bioinformatics. The first problem is of matching protein gels in two dimensions, and the second consists of aligning active sites of proteins in three dimensions. In the latter case, we also use information related to the grouping of the amino acids, as an example of a more general capability of our methodology to include partial labelling information. We discuss some open problems and suggest directions for future work. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
235
254
http://hdl.handle.net/10.1093/biomet/93.2.235
text/html
Access to full text is restricted to subscribers.
Peter J. Green
Kanti V. Mardia
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:761-774
2014-11-17
RePEc:oup:biomet
article
Non-Gaussian spatiotemporal modelling through scale mixing
We construct non-Gaussian processes that vary continuously in space and time with nonseparable covariance functions. Starting from a general and flexible way of constructing valid nonseparable covariance functions through mixing over separable covariance functions, the resulting models are generalized by allowing for outliers as well as regions with larger variances. We induce this through scale mixing with separate positive-valued processes. Smooth mixing processes are applied to the underlying correlated processes in space and in time, thus leading to regions in space and time of increased spread. An uncorrelated mixing process on the nugget effect accommodates outliers. Posterior and predictive Bayesian inference with these models is implemented through a Markov chain Monte Carlo sampler. An application to temperature data in the Basque country illustrates the potential of this model in the identification of outliers and regions with inflated variance, and shows that this improves the predictive performance. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
761
774
http://hdl.handle.net/10.1093/biomet/asr047
application/pdf
Access to full text is restricted to subscribers.
Thaís C. O. Fonseca
Mark F. J. Steel
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:847-858
2014-11-17
RePEc:oup:biomet
article
Estimating equations for spatially correlated data in multi-dimensional space
We use the quasilikelihood concept to propose an estimating equation for spatial data with correlation across the study region in a multi-dimensional space. With appropriate mixing conditions, we develop a central limit theorem for a random field under various L<sub>p</sub> metrics. The consistency and asymptotic normality of quasilikelihood estimators can then be derived. We also conduct simulations to evaluate the performance of the proposed estimating equation, and a dataset from East Lansing Woods is used to illustrate the method. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
847
858
http://hdl.handle.net/10.1093/biomet/asn046
application/pdf
Access to full text is restricted to subscribers.
Pei-Sheng Lin
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:209-222
2014-11-17
RePEc:oup:biomet
article
Bürmann expansion and test for additivity
We propose a Lagrange multiplier test for additivity based on the Bürmann expansion of a conditional mean function. The asymptotic null distribution of the test is shown to be χ-super-2, under some regularity conditions. In contrast, the Lagrange multiplier test proposed by Chen et al. (1995) is based on the Volterra expansion of the conditional mean function. We discuss some desirable advantages of the Bürmann expansion over the Volterra expansion for nonlinear time series modelling. We also report an empirical study which shows that, in terms of empirical power, the Lagrange multiplier test motivated by the Bürmann expansion outperforms the test of Chen et al. (1995) for the cases for which the Lagrange multiplier test is designed. For other cases for which none of the tests is specifically designed, the empirical powers of the two tests are comparable. Finally, we illustrate the use of the Lagrange multiplier test with a blowfly experimental system. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
209
222
K. S. Chan
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:539-552
2014-11-17
RePEc:oup:biomet
article
A sequential particle filter method for static models
Particle filter methods are complex inference procedures, which combine importance sampling and Monte Carlo schemes in order to explore consistently a sequence of multiple distributions of interest. We show that such methods can also offer an efficient estimation tool in 'static' set-ups, in which case π(θ | y-sub-1, …, y-sub-N) is the only posterior distribution of interest but the preliminary exploration of partial posteriors π(θ | y-sub-1, …, y-sub-n) (n < N) makes it possible to save computing time. A complete algorithm is proposed for independent or Markov models. Our method is shown to challenge other common estimation procedures in terms of robustness and execution time, especially when the sample size is important. Two classes of examples, mixture models and discrete generalised linear models, are discussed and illustrated by numerical results. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
539
552
Nicolas Chopin
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:449-458
2014-11-17
RePEc:oup:biomet
article
Optimal design for additive partially nonlinear models
We develop optimal design theory for additive partially nonlinear regression models, showing that Bayesian and standardized maximin D-optimal designs can be found as the products of the corresponding optimal designs in one dimension. A sufficient condition under which analogous results hold for D<sub>s</sub>-optimality is derived to accommodate situations in which only a subset of the model parameters is of interest. To facilitate prediction of the response at unobserved locations, we prove similar results for Q-optimality in the class of all product designs. The usefulness of this approach is demonstrated through an application from the automotive industry, where optimal designs for least squares regression splines are determined and compared with designs commonly used in practice. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
449
458
http://hdl.handle.net/10.1093/biomet/asr001
application/pdf
Access to full text is restricted to subscribers.
S. Biedermann
H. Dette
D. C. Woods
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:261-268
2014-11-17
RePEc:oup:biomet
article
Non-restarting cumulative sum charts and control of the false discovery rate
Cumulative sum or CUSUM charts are typically used to detect a change in the distribution of a sequence of observations, e.g., shifts in the mean. Usually, after signalling, the chart is restarted by setting it to some value below the signalling threshold. We propose a non-restarting CUSUM chart which is able to detect periods during which the stream is out of control. Further, we advocate an upper boundary to prevent the CUSUM chart rising too high, which helps to detect a change back into control. We present an algorithm to control the false discovery rate when considering CUSUM charts based on multiple streams of data. We consider two definitions of a false discovery: signalling out-of-control when the observations have been in control since the start and signalling out-of-control when the observations have been in control since the last time the chart was at zero. We prove that the false discovery rate is controlled under both these definitions simultaneously. Simulations reveal the difference in false discovery rate control when using these and other desirable definitions of a false discovery. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
261
268
http://hdl.handle.net/10.1093/biomet/ass066
application/pdf
Access to full text is restricted to subscribers.
Axel Gandy
F. Din-Houn Lau
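The idea in the abstract above, a CUSUM chart that is not restarted after signalling and is capped by an upper boundary, can be sketched roughly as follows. This is a generic one-sided CUSUM recursion, not the authors' procedure: the function names, the reference value `k`, the cap `upper`, and the signalling `threshold` are all assumptions chosen for illustration, and the false discovery rate algorithm is not covered.

```python
def cusum_path(xs, k=0.5, upper=None):
    """One-sided CUSUM: S_t = max(0, S_{t-1} + x_t - k), optionally capped at `upper`.

    The chart is never reset after crossing a threshold (non-restarting).
    `k` and `upper` are illustrative tuning values, not taken from the paper.
    """
    s = 0.0
    path = []
    for x in xs:
        s = max(0.0, s + x - k)
        if upper is not None:
            s = min(s, upper)  # upper boundary keeps the chart from rising too high
        path.append(s)
    return path


def out_of_control_periods(path, threshold):
    """Return (start, end) index pairs during which the chart is at or above `threshold`."""
    periods = []
    start = None
    for t, s in enumerate(path):
        if s >= threshold and start is None:
            start = t
        elif s < threshold and start is not None:
            periods.append((start, t - 1))
            start = None
    if start is not None:
        periods.append((start, len(path) - 1))
    return periods
```

For example, `cusum_path([0, 0, 2, 2, 2, 0, 0, 0, 0], k=0.5, upper=3.0)` rises while the observations are shifted, is held at the cap, and then drifts back down once the stream returns to its in-control level; the cap shortens the time the chart needs to fall back below the threshold.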
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:111-128
2014-11-17
RePEc:oup:biomet
article
Varying-coefficient models and basis function approximations for the analysis of repeated measurements
A global smoothing procedure is developed using basis function approximations for estimating the parameters of a varying-coefficient model with repeated measurements. Inference procedures based on a resampling subject bootstrap are proposed to construct confidence regions and to perform hypothesis testing. Conditional biases and variances of our estimators and their asymptotic consistency are developed explicitly. Finite sample properties of our procedures are investigated through a simulation study. Application of the proposed approach is demonstrated through an example in epidemiology. In contrast to the existing methods, this approach applies whether or not the covariates are time-invariant and does not require binning of the data when observations are sparse at distinct observation times. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
111
128
Jianhua Z. Huang
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:875-889
2014-11-17
RePEc:oup:biomet
article
Pairwise curve synchronization for functional data
Data collected by scientists are increasingly in the form of trajectories or curves. Often these can be viewed as realizations of a composite process driven by both amplitude and time variation. We consider the situation in which functional variation is dominated by time variation, and develop a curve-synchronization method that uses every trajectory in the sample as a reference to obtain pairwise warping functions in the first step. These initial pairwise warping functions are then used to create improved estimators of the underlying individual warping functions in the second step. A truncated averaging process is used to obtain robust estimation of individual warping functions. The method compares well with other available time-synchronization approaches and is illustrated with Berkeley growth data and gene expression data for multiple sclerosis. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
875
889
http://hdl.handle.net/10.1093/biomet/asn047
application/pdf
Access to full text is restricted to subscribers.
Rong Tang
Hans-Georg Müller
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:181-198
2014-11-17
RePEc:oup:biomet
article
On Bayesian testimation and its application to wavelet thresholding
We consider the problem of estimating the unknown response function in the Gaussian white noise model. We first utilize the recently developed Bayesian maximum a posteriori testimation procedure of Abramovich et al. (2007) for recovering an unknown high-dimensional Gaussian mean vector. The existing results for its upper error bounds over various sparse l<sub>p</sub>-balls are extended to more general cases. We show that, for a properly chosen prior on the number of nonzero entries of the mean vector, the corresponding adaptive estimator is asymptotically minimax in a wide range of sparse and dense l<sub>p</sub>-balls. The proposed procedure is then applied in a wavelet context to derive adaptive global and level-wise wavelet estimators of the unknown response function in the Gaussian white noise model. These estimators are then proven to be, respectively, asymptotically near-minimax and minimax in a wide range of Besov balls. These results are also extended to the estimation of derivatives of the response function. Simulated examples are conducted to illustrate the performance of the proposed level-wise wavelet estimator in finite sample situations, and to compare it with several existing counterparts. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
181
198
http://hdl.handle.net/10.1093/biomet/asp080
application/pdf
Access to full text is restricted to subscribers.
Felix Abramovich
Vadim Grinshtein
Athanasia Petsa
Theofanis Sapatinas
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:647-6592014-11-17RePEc:oup:biomet
article
Simulation of hyper-inverse Wishart distributions in graphical models
We introduce and exemplify an efficient method for direct sampling from hyper-inverse Wishart distributions. The method relies very naturally on the use of standard junction-tree representation of graphs, and couples these with matrix results for inverse Wishart distributions. We describe the theory and resulting computational algorithms for both decomposable and nondecomposable graphical models. An example drawn from financial time series demonstrates application in a context where inferences on a structured covariance model are required. We discuss and investigate questions of scalability of the simulation methods to higher-dimensional distributions. The paper concludes with general comments about the approach, including its use in connection with existing Markov chain Monte Carlo methods that deal with uncertainty about the graphical model structure. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
647
659
http://hdl.handle.net/10.1093/biomet/asm056
application/pdf
Access to full text is restricted to subscribers.
Carlos M. Carvalho
Hélène Massam
Mike West
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:645-6612014-11-17RePEc:oup:biomet
article
Markov models for accumulating mutations
We introduce and analyze a waiting time model for the accumulation of genetic changes. The continuous-time conjunctive Bayesian network is defined by a partially ordered set of mutations and by the rate of fixation of each mutation. The partial order encodes constraints on the order in which mutations can fixate in the population, shedding light on the mutational pathways underlying the evolutionary process. We study a censored version of the model and derive equations for an <sc>em</sc> algorithm to perform maximum likelihood estimation of the model parameters. We also show how to select the maximum likelihood partially ordered set. The model is applied to genetic data from cancer cells and from drug resistant human immunodeficiency viruses, indicating implications for diagnosis and treatment. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
645
661
http://hdl.handle.net/10.1093/biomet/asp023
application/pdf
Access to full text is restricted to subscribers.
N. Beerenwinkel
S. Sullivant
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:741-7472014-11-17RePEc:oup:biomet
article
Construction of φ<sub>p</sub>-optimal exact designs with minimum experimental run size for a linear log contrast model in mixture experiments
We propose a new method with minimum experimental run size using the properties of Hadamard matrices through which some φ<sub>p</sub>-optimal exact designs including A-, D- and E-optimal designs are constructed for a linear log contrast model in mixture experiments. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/asr014
application/pdf
Access to full text is restricted to subscribers.
Baisuo Jin
Mong-Na Lo Huang
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:197-2102014-11-17RePEc:oup:biomet
article
Spectral methods for nonstationary spatial processes
We propose a nonstationary periodogram and various parametric approaches for estimating the spectral density of a nonstationary spatial process. We also study the asymptotic properties of the proposed estimators via shrinking asymptotics, assuming the distance between neighbouring observations tends to zero as the size of the observation region grows without bound. With this type of asymptotic model we can uniquely determine the spectral density, avoiding the aliasing problem. We also present a new class of nonstationary processes, based on a convolution of local stationary processes. This model has the advantage that the model is simultaneously defined everywhere, unlike 'moving window' approaches, but it retains the attractive property that, locally in small regions, it behaves like a stationary spatial process. Applications include the spatial analysis and modelling of air pollution data provided by the US Environmental Protection Agency. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
197
210
Montserrat Fuentes
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:791-8052014-11-17RePEc:oup:biomet
article
Additive modelling of functional gradients
We consider the problem of estimating functional derivatives and gradients in the framework of a regression setting where one observes functional predictors and scalar responses. Derivatives are then defined as functional directional derivatives that indicate how changes in the predictor function in a specified functional direction are associated with corresponding changes in the scalar response. For a model-free approach, navigating the curse of dimensionality requires the imposition of suitable structural constraints. Accordingly, we develop functional derivative estimation within an additive regression framework. Here, the additive components of functional derivatives correspond to derivatives of nonparametric one-dimensional regression functions with the functional principal components of predictor processes as arguments. This approach requires nothing more than estimating derivatives of one-dimensional nonparametric regressions, and thus is computationally very straightforward to implement, while it also provides substantial flexibility, fast computation and consistent estimation. We illustrate the consistent estimation and interpretation of the resulting functional derivatives and functional gradient fields in a study of the dependence of lifetime fertility of flies on early life reproductive trajectories. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
791
805
http://hdl.handle.net/10.1093/biomet/asq056
application/pdf
Access to full text is restricted to subscribers.
Hans-Georg Müller
Fang Yao
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:653-6662014-11-17RePEc:oup:biomet
article
Generalized varying coefficient models for longitudinal data
We propose a generalization of the varying coefficient model for longitudinal data to cases where not only current but also recent past values of the predictor process affect current response. More precisely, the targeted regression coefficient functions of the proposed model have sliding window supports around current time t. A variant of a recently proposed two-step estimation method for varying coefficient models is proposed for estimation in the context of these generalized varying coefficient models, and is found to lead to improvements, especially for the case of additive measurement errors in both response and predictors. The proposed methodology for estimation and inference is also applicable for the case of additive measurement error in the common versions of varying coefficient models that relate only current observations of predictor and response processes to each other. Asymptotic distributions of the proposed estimators are derived, and the model is applied to the problem of predicting protein concentrations in a longitudinal study. Simulation studies demonstrate the efficacy of the proposed estimation procedure. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
653
666
http://hdl.handle.net/10.1093/biomet/asn006
application/pdf
Access to full text is restricted to subscribers.
Damla Şentürk
Hans-Georg Müller
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:591-6022014-11-17RePEc:oup:biomet
article
Testing model adequacy for dynamic panel data with intercorrelation
We give several definitions of residual autocorrelations and derive their joint asymptotic distribution for the panel time series model of Hjellvik & Tjøstheim (1999a). A portmanteau goodness-of-fit test arises naturally from the asymptotic distribution. Simulation results show that the asymptotic standard errors compared satisfactorily with the empirical standard errors, that the goodness-of-fit test has reasonable empirical size, and that it is powerful enough to be useful with a modest sample size. The results of this paper are illustrated with a real-data example. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
591
602
Bo Fu
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:451-4582014-11-17RePEc:oup:biomet
article
An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants
Maximum likelihood parameter estimation and sampling from Bayesian posterior distributions are problematic when the probability density for the parameter of interest involves an intractable normalising constant which is also a function of that parameter. In this paper, an auxiliary variable method is presented which requires only that independent samples can be drawn from the unnormalised density at any particular parameter value. The proposal distribution is constructed so that the normalising constant cancels from the Metropolis-Hastings ratio. The method is illustrated by producing posterior samples for parameters of the Ising model given a particular lattice realisation. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
451
458
http://hdl.handle.net/10.1093/biomet/93.2.451
text/html
Access to full text is restricted to subscribers.
J. Møller
A. N. Pettitt
R. Reeves
K. K. Berthelsen
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:553-5682014-11-17RePEc:oup:biomet
article
Tuning parameter selectors for the smoothly clipped absolute deviation method
The penalized least squares approach with smoothly clipped absolute deviation penalty has been consistently demonstrated to be an attractive regression shrinkage and selection method. It not only automatically and consistently selects the important variables, but also produces estimators which are as efficient as the oracle estimator. However, these attractive features depend on appropriate choice of the tuning parameter. We show that the commonly used generalized crossvalidation cannot select the tuning parameter satisfactorily, with a nonignorable overfitting effect in the resulting model. In addition, we propose a <sc>BIC</sc> tuning parameter selector, which is shown to be able to identify the true model consistently. Simulation studies are presented to support theoretical findings, and an empirical example is given to illustrate its use in the Female Labor Supply data. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
553
568
http://hdl.handle.net/10.1093/biomet/asm053
application/pdf
Access to full text is restricted to subscribers.
Hansheng Wang
Runze Li
Chih-Ling Tsai
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:831-8452014-11-17RePEc:oup:biomet
article
A goodness-of-fit test for inhomogeneous spatial Poisson processes
We introduce a formal testing procedure to assess the fit of an inhomogeneous spatial Poisson process model, based on a discrepancy measure function <inline-formula><inline-graphic xlink:href="asn045ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> that is constructed from residuals obtained from the fitted model. We derive the asymptotic distributional properties of <inline-formula><inline-graphic xlink:href="asn045ilm2.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> and develop a test statistic based on them. Our test statistic has a limiting standard normal distribution, so that the test can be performed by simply comparing the test statistic with readily available critical values. We perform a simulation study to assess the performance of the proposed method and apply it to a real data example. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
831
845
http://hdl.handle.net/10.1093/biomet/asn045
application/pdf
Access to full text is restricted to subscribers.
Yongtao Guan
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:663-6842014-11-17RePEc:oup:biomet
article
Efficient restricted estimators for conditional mean models with missing data
Consider a conditional mean model with missing data on the response or explanatory variables due to two-phase sampling or nonresponse. Robins et al. (1994) introduced a class of augmented inverse-probability-weighted estimators, depending on a vector of functions of explanatory variables and a vector of functions of coarsened data. Tsiatis (2006) studied two classes of restricted estimators, class 1 with both vectors restricted to finite-dimensional linear subspaces and class 2 with the first vector of functions restricted to a finite-dimensional linear subspace. We introduce a third class of restricted estimators, class 3, with the second vector of functions restricted to a finite-dimensional subspace. We derive a new estimator, which is asymptotically optimal in class 1, by the methods of nonparametric and empirical likelihood. We propose a hybrid strategy to obtain estimators that are asymptotically optimal in class 1 and locally optimal in class 2 or class 3. The advantages of the hybrid, likelihood estimator based on classes 1 and 3 are shown in a simulation study and a real-data example. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
663
684
http://hdl.handle.net/10.1093/biomet/asr007
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:100:y:2013:i:4:p:781-8002014-11-17RePEc:oup:biomet
article
Bridging the ensemble Kalman and particle filters
In many applications of Monte Carlo nonlinear filtering, the propagation step is computationally expensive, and hence the sample size is limited. With small sample sizes, the update step becomes crucial. Particle filtering suffers from the well-known problem of sample degeneracy. Ensemble Kalman filtering avoids this, at the expense of treating non-Gaussian features of the forecast distribution incorrectly. Here we introduce a procedure that makes a continuous transition indexed by γ ∈ [0,1] between the ensemble and the particle filter update. We propose automatic choices of the parameter γ such that the update stays as close as possible to the particle filter update subject to avoiding degeneracy. In various examples, we show that this procedure leads to updates that are able to handle non-Gaussian features of the forecast sample even in high-dimensional situations. Copyright 2013, Oxford University Press.
4
2013
100
Biometrika
781
800
http://hdl.handle.net/10.1093/biomet/ast020
application/pdf
Access to full text is restricted to subscribers.
M. Frei
H. R. Künsch
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:805-8202014-11-17RePEc:oup:biomet
article
Inference on population size in binomial detectability models
Many models for biological populations, including simple mark-recapture models and distance sampling models, involve a binomially distributed number, n, of observations x<sub>1</sub>, …, x<sub>n</sub> on members of a population of size N. Two popular estimators of (N, θ), where θ is a vector parameter, are the maximum likelihood estimator <inline-formula><inline-graphic xlink:href="asp051ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> and the conditional maximum likelihood estimator <inline-formula><inline-graphic xlink:href="asp051ilm2.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> based on the conditional distribution of x<sub>1</sub>, …, x<sub>n</sub> given n. We derive the large-N asymptotic distributions of <inline-formula><inline-graphic xlink:href="asp051ilm3.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> and <inline-formula><inline-graphic xlink:href="asp051ilm4.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, and give formulae for the biases of <inline-formula><inline-graphic xlink:href="asp051ilm5.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> and <inline-formula><inline-graphic xlink:href="asp051ilm6.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>. We show that the difference <inline-formula><inline-graphic xlink:href="asp051ilm7.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> is, remarkably, of order 1 and we give a simple formula for the leading part of this difference. Simulations indicate that in many cases this formula is very accurate and that confidence intervals based on the asymptotic distribution have excellent coverage. An extension to product-binomial models is given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
805
820
http://hdl.handle.net/10.1093/biomet/asp051
application/pdf
Access to full text is restricted to subscribers.
R. M. Fewster
P. E. Jupp
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:748-7542014-11-17RePEc:oup:biomet
article
Designs of variable resolution
Prior information or background knowledge may suggest that interactions arise only within certain factors. When such knowledge is available, we propose using a new class of designs: designs of variable resolution. Several constructions are presented. Statistical justifications for using such designs from minimum G<sub>2</sub> aberration and design efficiency perspectives are provided. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
748
754
http://hdl.handle.net/10.1093/biomet/ass035
application/pdf
Access to full text is restricted to subscribers.
C. Devon Lin
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:489-4902014-11-17RePEc:oup:biomet
article
A counterexample to a claim about stochastic simulations
Engen & Lillegård (1997) presented a general method for doing Monte Carlo simulations conditioned on a sufficient statistic. The basic idea was to adjust the parameter values in the corresponding unconditional simulation so that the actual value of the sufficient statistic is obtained, and the claim was that if this adjustment is unique then the modified simulation is from the conditional distribution. Unfortunately the claim is not correct, as shown by a counterexample. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
489
490
Bo Henry Lindqvist
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:49-642014-11-17RePEc:oup:biomet
article
Functional quadratic regression
We extend the common linear functional regression model to the case where the dependency of a scalar response on a functional predictor is of polynomial rather than linear nature. Focusing on the quadratic case, we demonstrate the usefulness of the polynomial functional regression model, which encompasses linear functional regression as a special case. Our approach works under mild conditions for the case of densely spaced observations and also can be extended to the important practical situation where the functional predictors are derived from sparse and irregular measurements, as is the case in many longitudinal studies. A key observation is the equivalence of the functional polynomial model with a regression model that is a polynomial of the same order in the functional principal component scores of the predictor processes. Theoretical analysis as well as practical implementations are based on this equivalence and on basis representations of predictor processes. We also obtain an explicit representation of the regression surface that defines quadratic functional regression and provide functional asymptotic results for an increasing number of model components as the number of subjects in the study increases. The improvements that can be gained by adopting quadratic as compared to linear functional regression are illustrated with a case study that includes absorption spectra as functional predictors. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
49
64
http://hdl.handle.net/10.1093/biomet/asp069
application/pdf
Access to full text is restricted to subscribers.
Fang Yao
Hans-Georg Müller
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:295-3042014-11-17RePEc:oup:biomet
article
Sufficient dimension reduction through discretization-expectation estimation
In the context of sufficient dimension reduction, the goal is to parsimoniously recover the central subspace of a regression model. Many inverse regression methods use slicing estimation to recover the central subspace. The efficacy of slicing estimation depends heavily upon the number of slices. However, the selection of the number of slices is an open and long-standing problem. In this paper, we propose a discretization-expectation estimation method, which avoids selecting the number of slices, while preserving the integrity of the central subspace. This generic method assures root-n consistency and asymptotic normality of slicing estimators for many inverse regression methods, and can be applied to regressions with multivariate responses. A <sc>BIC</sc>-type criterion for the dimension of the central subspace is proposed. Comprehensive simulations and an illustrative application show that our method compares favourably with existing estimators. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
295
304
http://hdl.handle.net/10.1093/biomet/asq018
application/pdf
Access to full text is restricted to subscribers.
Liping Zhu
Tao Wang
Lixing Zhu
Louis Ferré
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:211-2242014-11-17RePEc:oup:biomet
article
Empty confidence sets for epidemics, branching processes and Brownian motion
This paper treats some examples where likelihood-based inference for certain model parameters may produce empty confidence sets. The first example concerns epidemics, and the parameter of interest is the basic reproduction number R<sub>0</sub>, which is to be estimated from the final size of an epidemic in a finite population. The second example treats estimation of the mean of the offspring distribution in a branching process, based on observing the total progeny, i.e. the total number of individuals ever born in the branching process. The final example considers estimation of the linear drift in a Brownian motion, based on observing the first hitting time of some horizontal barrier. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
211
224
Frank G. Ball
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:317-3352014-11-17RePEc:oup:biomet
article
A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models
A centred Gaussian model that is Markov with respect to an undirected graph G is characterised by the parameter set of its precision matrices which is the cone M<sup>+</sup>(G) of positive definite matrices with entries corresponding to the missing edges of G constrained to be equal to zero. In a Bayesian framework, the conjugate family for the precision parameter is the distribution with Wishart density with respect to the Lebesgue measure restricted to M<sup>+</sup>(G). We call this distribution the G-Wishart. When G is nondecomposable, the normalising constant of the G-Wishart cannot be computed in closed form. In this paper, we give a simple Monte Carlo method for computing this normalising constant. The main feature of our method is that the sampling distribution is exact and consists of a product of independent univariate standard normal and chi-squared distributions that can be read off the graph G. Computing this normalising constant is necessary for obtaining the posterior distribution of G or the marginal likelihood of the corresponding graphical Gaussian model. Our method also gives a way of sampling from the posterior distribution of the precision matrix. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
317
335
http://hdl.handle.net/10.1093/biomet/92.2.317
text/html
Access to full text is restricted to subscribers.
Aliye Atay-Kayis
Hélène Massam
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:357-3702014-11-17RePEc:oup:biomet
article
Covariate-adjusted generalized linear models
We propose covariate adjustment methodology for a situation where one wishes to study the dependence of a generalized response on predictors while both predictors and response are distorted by an observable covariate. The distorting covariate is thought of as a size measurement that affects predictors in a multiplicative fashion. The generalized response is modelled by means of a random threshold, where the subject-specific thresholds are affected by a multiplicative factor that is a function of the distorting covariate. While the various factors are modelled as smooth unknown functions of the distorting covariate, the underlying relationship between response and covariates is assumed to be governed by a generalized linear model with a known link function. This model provides an extension of a covariate-adjusted regression approach to the case of a generalized linear model. We demonstrate that this contamination model leads to a semiparametric varying-coefficient model. Numerical implementation is straightforward by combining binning, quasilikelihood, and smoothing steps. The asymptotic distribution of the proposed estimators for the regression coefficients of the latent generalized linear model is derived by means of a martingale central limit theorem. Combining this result with consistent estimators for the asymptotic variance makes it then possible to obtain asymptotic inference for the targeted parameters. Both real and simulated data are used in illustrating the proposed methodology. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
357
370
http://hdl.handle.net/10.1093/biomet/asp012
application/pdf
Access to full text is restricted to subscribers.
Damla Şentürk
Hans-Georg Müller
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:683-6982014-11-17RePEc:oup:biomet
article
Circular regression
A new model for an angular regression link function is introduced. The model employs an angular scale parameter, incorporates proper and improper rotations as special cases, and is equivalent to the Möbius circle mapping for complex variables. Desirable properties of the circle mapping carry over to angular regression. Parameter estimation and inferential methods are developed and illustrated. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
683
698
T. D. Downs
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:761-7802014-11-17RePEc:oup:biomet
article
Sinh-arcsinh distributions
We introduce the sinh-arcsinh transformation and hence, by applying it to a generating distribution with no parameters other than location and scale, usually the normal, a new family of sinh-arcsinh distributions. This four-parameter family has symmetric and skewed members and allows for tailweights that are both heavier and lighter than those of the generating distribution. The central place of the normal distribution in this family affords likelihood ratio tests of normality that are superior to the state-of-the-art in normality testing because of the range of alternatives against which they are very powerful. Likelihood ratio tests of symmetry are also available and are very successful. Three-parameter symmetric and asymmetric subfamilies of the full family are also of interest. Heavy-tailed symmetric sinh-arcsinh distributions behave like Johnson S<sub>U</sub> distributions, while their light-tailed counterparts behave like sinh-normal distributions, the sinh-arcsinh family allowing a seamless transition between the two, via the normal, controlled by a single parameter. The sinh-arcsinh family is very tractable and many properties are explored. Likelihood inference is pursued, including an attractive reparameterization. Illustrative examples are given. A multivariate version is considered. Options and extensions are discussed. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
761
780
http://hdl.handle.net/10.1093/biomet/asp053
application/pdf
Access to full text is restricted to subscribers.
M. C. Jones
Arthur Pewsey
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:225-2302014-11-17RePEc:oup:biomet
article
Data-driven selection of the spline dimension in penalized spline regression
A number of criteria exist to select the penalty in penalized spline regression, but the selection of the number of spline basis functions has received much less attention in the literature. We propose a likelihood-based criterion to select the number of basis functions in penalized spline regression. The criterion is easy to apply and we describe its theoretical and practical properties. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
225
230
http://hdl.handle.net/10.1093/biomet/asq081
application/pdf
Access to full text is restricted to subscribers.
Göran Kauermann
Jean D. Opsomer
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:661-6722014-11-17RePEc:oup:biomet
article
Recursive computing and simulation-free inference for general factorizable models
We illustrate how the recursive algorithm of Reeves & Pettitt (2004) for general factorizable models can be extended to allow exact sampling, maximization of distributions and computation of marginal distributions. All of the methods we describe apply to discrete-valued Markov random fields with nearest neighbour interactions defined on regular lattices; in particular we illustrate that exact inference can be performed for hidden autologistic models defined on moderately sized lattices. In this context we offer an extension of this methodology which allows approximate inference to be carried out for larger lattices without resorting to simulation techniques such as Markov chain Monte Carlo. In particular our work offers the basis for an automatic inference machine for such models. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
661
672
http://hdl.handle.net/10.1093/biomet/asm052
application/pdf
Access to full text is restricted to subscribers.
Nial Friel
Håvard Rue
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:333-3432014-11-17RePEc:oup:biomet
article
Modified estimating functions
In a parametric model the maximum likelihood estimator of a parameter of interest ψ may be viewed as the solution to the equation l′<sub>p</sub>(ψ) = 0, where l<sub>p</sub> denotes the profile loglikelihood function. It is well known that the estimating function l′<sub>p</sub>(ψ) is not unbiased and that this bias can, in some cases, lead to poor estimates of ψ. An alternative approach is to use the modified profile likelihood function, or an approximation to the modified profile likelihood function, which yields an estimating function that is approximately unbiased. In many cases, the maximum likelihood estimating functions are unbiased under more general assumptions than those used to construct the likelihood function, for example under first- or second-moment conditions. Although the likelihood function itself may provide valid estimates under moment conditions alone, the modified profile likelihood requires a full parametric model. In this paper, modifications to l′<sub>p</sub>(ψ) are presented that yield an approximately unbiased estimating function under more general conditions. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
333
343
Thomas A. Severini
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:71-862014-11-17RePEc:oup:biomet
article
A pseudolikelihood method for analyzing interval censored data
We introduce a method based on a pseudolikelihood ratio for estimating the distribution function of the survival time in a mixed-case interval censoring model. In a mixed-case model, an individual is observed a random number of times, and at each time it is recorded whether an event has happened or not. One seeks to estimate the distribution of time to event. We use a Poisson process as the basis of a likelihood function to construct a pseudolikelihood ratio statistic for testing the value of the distribution function at a fixed point, and show that this converges under the null hypothesis to a known limit distribution, that can be expressed as a functional of different convex minorants of a two-sided Brownian motion process with parabolic drift. Construction of confidence sets then proceeds by standard inversion. The computation of the confidence sets is simple, requiring the use of the pool-adjacent-violators algorithm or a standard isotonic regression algorithm. We also illustrate the superiority of the proposed method over competitors based on resampling techniques or on the limit distribution of the maximum pseudolikelihood estimator, through simulation studies, and illustrate the different methods on a dataset involving time to <sc>HIV</sc> seroconversion in a group of haemophiliacs. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
71
86
http://hdl.handle.net/10.1093/biomet/asm011
application/pdf
Access to full text is restricted to subscribers.
Bodhisattva Sen
Moulinath Banerjee
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:230-2372014-11-17RePEc:oup:biomet
article
Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys
Design weights in surveys are often adjusted to accommodate auxiliary information and to meet pre-specified range restrictions, typically via some ad hoc algorithmic adjustment to a generalised regression estimator. In this paper, we present a simple solution to this problem using empirical likelihood methods or generalised regression. We first develop algorithms for computing empirical likelihood estimators and model-calibrated empirical likelihood estimators. The first algorithm solves the computational problem of the empirical likelihood method in general, both in survey and non-survey settings, and theoretically guarantees its convergence. The second exploits properties of the model-calibration method and is particularly simple. The algorithms are adapted for handling benchmark constraints and pre-specified range restrictions on the weight adjustments. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
230
237
J. Chen
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:773-7892014-11-17RePEc:oup:biomet
article
On the behaviour of marginal and conditional AIC in linear mixed models
In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion, AIC, have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is not an asymptotically unbiased estimator of the Akaike information, and favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that can lead to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package (R Development Core Team, 2010) is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
773
789
http://hdl.handle.net/10.1093/biomet/asq042
application/pdf
Access to full text is restricted to subscribers.
Sonja Greven
Thomas Kneib
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:615-6252014-11-17RePEc:oup:biomet
article
Partial inverse regression
In regression with a vector of quantitative predictors, sufficient dimension reduction methods can effectively reduce the predictor dimension, while preserving full regression information and assuming no parametric model. However, all current reduction methods require the sample size n to be greater than the number of predictors p. It is well known that partial least squares can deal with problems with n < p. We first establish a link between partial least squares and sufficient dimension reduction. Motivated by this link, we then propose a new dimension reduction method, entitled partial inverse regression. We show that its sample estimator is consistent, and that its performance is similar to or superior to partial least squares when n < p, especially when the regression model is nonlinear or heteroscedastic. An example involving the spectroscopy analysis of biscuit dough is also given. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
615
625
http://hdl.handle.net/10.1093/biomet/asm043
application/pdf
Access to full text is restricted to subscribers.
Lexin Li
R. Dennis Cook
Chih-Ling Tsai
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:423-4342014-11-17RePEc:oup:biomet
article
Measures for designs in experiments with correlated errors
In this paper we consider optimal design of experiments in the case of correlated observations. We use and further develop the concept of design measures introduced by Pázman & Müller (1998) for the construction of a simple, quick and elegant design algorithm. We support the construction of this algorithm for a general correlation structure by an interpretation in terms of norms. Examples demonstrate that our results are useful for generating exact designs by sampling from the obtained design measures. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
423
434
Werner G. Müller
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:85-982014-11-17RePEc:oup:biomet
article
Covariance matrix selection and estimation via penalised normal likelihood
We propose a nonparametric method for identifying parsimony and for producing a statistically efficient estimator of a large covariance matrix. We reparameterise a covariance matrix through the modified Cholesky decomposition of its inverse or the one-step-ahead predictive representation of the vector of responses and reduce the nonintuitive task of modelling covariance matrices to the familiar task of model selection and estimation for a sequence of regression models. The Cholesky factor containing these regression coefficients is likely to have many off-diagonal elements that are zero or close to zero. Penalised normal likelihoods in this situation with L<sub>1</sub> and L<sub>2</sub> penalties are shown to be closely related to Tibshirani's (1996) LASSO approach and to ridge regression. Adding either penalty to the likelihood helps to produce more stable estimators by introducing shrinkage to the elements in the Cholesky factor, while, because of its singularity, the L<sub>1</sub> penalty will set some elements to zero and produce interpretable models. An algorithm is developed for computing the estimator and selecting the tuning parameter. The proposed maximum penalised likelihood estimator is illustrated using simulation and a real dataset involving estimation of a 102 × 102 covariance matrix. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
85
98
http://hdl.handle.net/10.1093/biomet/93.1.85
text/html
Access to full text is restricted to subscribers.
Jianhua Z. Huang
Naiping Liu
Mohsen Pourahmadi
Linxu Liu
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:107-1222014-11-17RePEc:oup:biomet
article
Analysis of least absolute deviation
We develop a unified L<sub>1</sub>-based analysis-of-variance-type method for testing linear hypotheses. Like the classical L<sub>2</sub>-based analysis of variance, the method is coordinate-free in the sense that it is invariant under any linear transformation of the covariates or regression parameters. Moreover, it allows singular design matrices and heterogeneous error terms. A simple approximation using stochastic perturbation is proposed to obtain cut-off values for the resulting test statistics. Both test statistics and distributional approximations can be computed using standard linear programming. An asymptotic theory is derived for the method. Special cases of one- and multi-way analysis of variance and analysis of covariance models are worked out in detail. The main results of this paper can be extended to general quantile regression. Extensive simulations show that the method works well in practical settings. The method is also applied to a dataset from General Social Surveys. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
107
122
http://hdl.handle.net/10.1093/biomet/asm082
application/pdf
Access to full text is restricted to subscribers.
Kani Chen
Zhiliang Ying
Hong Zhang
Lincheng Zhao
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:769-7862014-11-17RePEc:oup:biomet
article
Bayesian Nonparametric Estimation of the Probability of Discovering New Species
We consider the problem of evaluating the probability of discovering a certain number of new species in a new sample of population units, conditional on the number of species recorded in a basic sample. We use a Bayesian nonparametric approach. The different species proportions are assumed to be random and the observations from the population exchangeable. We provide a Bayesian estimator, under quadratic loss, for the probability of discovering new species which can be compared with well-known frequentist estimators. The results we obtain are illustrated through a numerical example and an application to a genomic dataset concerning the discovery of new genes by sequencing additional single-read sequences of cDNA fragments. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
769
786
http://hdl.handle.net/10.1093/biomet/asm061
application/pdf
Access to full text is restricted to subscribers.
Antonio Lijoi
Ramsés H. Mena
Igor Prünster
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:641-6542014-11-17RePEc:oup:biomet
article
Dimension reduction and predictor selection in semiparametric models
Dimension reduction in semiparametric regressions includes construction of informative linear combinations and selection of contributing predictors. To reduce the predictor dimension in semiparametric regressions, we propose an ℓ<sub>1</sub>-minimization of sliced inverse regression with the Dantzig selector, and establish a non-asymptotic error bound for the resulting estimator. We also generalize the regularization concept to sliced inverse regression with an adaptive Dantzig selector. This ensures that all contributing predictors are selected with high probability, and that the resulting estimator is asymptotically normal even when the predictor dimension diverges to infinity. Numerical studies confirm our theoretical observations and demonstrate that our proposals are superior to existing estimators in terms of both dimension reduction and predictor selection. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
641
654
http://hdl.handle.net/10.1093/biomet/ast005
application/pdf
Access to full text is restricted to subscribers.
Zhou Yu
Liping Zhu
Heng Peng
Lixing Zhu
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:341-3532014-11-17RePEc:oup:biomet
article
Rank-based inference for the accelerated failure time model
A broad class of rank-based monotone estimating functions is developed for the semiparametric accelerated failure time model with censored observations. The corresponding estimators can be obtained via linear programming, and are shown to be consistent and asymptotically normal. The limiting covariance matrices can be estimated by a resampling technique, which does not involve nonparametric density estimation or numerical derivatives. The new estimators represent consistent roots of the non-monotone estimating equations based on the familiar weighted log-rank statistics. Simulation studies demonstrate that the proposed methods perform well in practical settings. Two real examples are provided. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
341
353
Zhezhen Jin
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:779-7982014-11-17RePEc:oup:biomet
article
A multi-dimensional scaling approach to shape analysis
We propose an alternative to Kendall's shape space for reflection shapes of configurations in [formula] with k labelled vertices, where reflection shape consists of all the geometric information that is invariant under compositions of similarity and reflection transformations. The proposed approach embeds the space of such shapes into the space [formula] of (k - 1) × (k - 1) real symmetric positive semidefinite matrices, which is the closure of an open subset of a Euclidean space, and defines mean shape as the natural projection of Euclidean means in [formula] on to the embedded copy of the shape space. This approach has strong connections with multi-dimensional scaling, and the mean shape so defined gives good approximations to other commonly used definitions of mean shape. We also use standard perturbation arguments for eigenvalues and eigenvectors to obtain a central limit theorem which then enables the application of standard statistical techniques to shape analysis in two or more dimensions. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
779
798
http://hdl.handle.net/10.1093/biomet/asn050
application/pdf
Access to full text is restricted to subscribers.
Ian L. Dryden
Alfred Kume
Huiling Le
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:65-802014-11-17RePEc:oup:biomet
article
Particle approximations of the score and observed information matrix in state space models with application to parameter estimation
Particle methods are popular computational tools for Bayesian inference in nonlinear non-Gaussian state space models. For this class of models, we present two particle algorithms to compute the score vector and observed information matrix recursively. The first algorithm is implemented with computational complexity [formula] and the second with complexity [formula], where N is the number of particles. Although cheaper, the performance of the [formula] method degrades quickly, as it relies on the approximation of a sequence of probability distributions whose dimension increases linearly with time. In particular, even under strong mixing assumptions, the variance of the estimates computed with the [formula] method increases at least quadratically in time. The more expensive [formula] method relies on a nonstandard particle implementation and does not suffer from this rapid degradation. It is shown how both methods can be used to perform batch and recursive parameter estimation. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
65
80
http://hdl.handle.net/10.1093/biomet/asq062
application/pdf
Access to full text is restricted to subscribers.
George Poyiadjis
Arnaud Doucet
Sumeetpal S. Singh
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:149-1622014-11-17RePEc:oup:biomet
article
Bayesian nonparametric functional data analysis through density estimation
In many modern experimental settings, observations are obtained in the form of functions and interest focuses on inferences about a collection of such functions. We propose a hierarchical model that allows us simultaneously to estimate multiple curves nonparametrically by using dependent Dirichlet process mixtures of Gaussian distributions to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of conductivity and temperature at depth data in the north Atlantic. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
149
162
http://hdl.handle.net/10.1093/biomet/asn054
application/pdf
Access to full text is restricted to subscribers.
Abel Rodríguez
David B. Dunson
Alan E. Gelfand
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:313-3342014-11-17RePEc:oup:biomet
article
Inference on fractal processes using multiresolution approximation
We consider Bayesian inference via Markov chain Monte Carlo for a variety of fractal Gaussian processes on the real line. These models have unknown parameters in the covariance matrix, requiring inversion of a new covariance matrix at each Markov chain Monte Carlo iteration. The processes have no suitable independence properties so this becomes computationally prohibitive. We surmount these difficulties by developing a computational algorithm for likelihood evaluation based on a 'multiresolution approximation' to the original process. The method is computationally very efficient and widely applicable, making likelihood-based inference feasible for large datasets. A simulation study indicates that this approach leads to accurate estimates for underlying parameters in fractal models, including fractional Brownian motion and fractional Gaussian noise, and functional parameters in the recently introduced multifractional Brownian motion. We apply the method to a variety of real datasets and illustrate its application to prediction and to model selection. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
313
334
http://hdl.handle.net/10.1093/biomet/asm025
application/pdf
Access to full text is restricted to subscribers.
Kenneth Falconer
Carmen Fernández
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:673-6892014-11-17RePEc:oup:biomet
article
Optimal adaptive randomized designs for clinical trials
Optimal decision-analytic designs are deterministic. Such designs are appropriately criticized in the context of clinical trials because they are subject to assignment bias. On the other hand, balanced randomized designs may assign an excessive number of patients to a treatment arm that is performing relatively poorly. We propose a compromise between these two extremes, one that achieves some of the good characteristics of both. We introduce a constrained optimal adaptive design for a fully sequential randomized clinical trial with k arms and n patients. An r-design is one for which, at each allocation, each arm has probability at least r of being chosen, 0 ⩽ r ⩽ 1/k. An optimal design among all r-designs is called r-optimal. An r<sub>1</sub>-design is also an r<sub>2</sub>-design if r<sub>1</sub> ⩾ r<sub>2</sub>. A design without constraint is the special case r = 0 and a balanced randomized design is the special case r = 1/k. The optimization criterion is to maximize the expected overall utility in a Bayesian decision-analytic approach, where utility is the sum over the utilities for individual patients over a 'patient horizon' N. We prove analytically that there exists an r-optimal design such that each patient is assigned to a particular one of the arms with probability 1 − (k − 1)r, and to the remaining arms with probability r. We also show that the balanced design is asymptotically r-optimal for any given r, 0 ⩽ r < 1/k, as N/n → ∞. This implies that every r-optimal design is asymptotically optimal without constraint. Numerical computations using backward induction for k = 2 arms show that, in general, this asymptotic optimality feature for r-optimal designs can be accomplished with moderate trial size n if the patient horizon N is large relative to n. 
We also show that, in a trial with an r-optimal design, r < 1/2, fewer patients are assigned to an inferior arm than when following a balanced design, even for r-optimal designs having the same statistical power as a balanced design. We discuss extensions to various clinical trial settings. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
673
689
http://hdl.handle.net/10.1093/biomet/asm049
application/pdf
Access to full text is restricted to subscribers.
Yi Cheng
Donald A. Berry
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:221-2282014-11-17RePEc:oup:biomet
article
On p-values for smooth components of an extended generalized additive model
The problem of testing smooth components of an extended generalized additive model for equality to zero is considered. Confidence intervals for such components exhibit good across-the-function coverage probabilities if based on the approximate result [formula], where f is the vector of evaluated values for the smooth component of interest and V<sub>f</sub> is the covariance matrix for f according to the Bayesian view of the smoothing process. Based on this result, a Wald-type test of f = 0 is proposed. It is shown that care must be taken in selecting the rank used in the test statistic. The method complements previous work by extending applicability beyond the Gaussian case, while considering tests of zero effect rather than testing the parametric hypothesis given by the null space of the component's smoothing penalty. The proposed p-values are routine and efficient to compute from a fitted model, without requiring extra model fits or null distribution simulation. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
221
228
http://hdl.handle.net/10.1093/biomet/ass048
application/pdf
Access to full text is restricted to subscribers.
Simon N. Wood
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:73-842014-11-17RePEc:oup:biomet
article
Confidence regions when the Fisher information is zero
We examine the asymptotic behaviour of confidence regions in identifiable one-dimensional parametric models with smooth likelihood function and information equal to zero at a critical point of the parameter space. Confidence regions are based on inversion of the likelihood ratio test statistic and of some common forms of the score and Wald test statistics. For fixed parameter values other than the critical point, all these statistics have limiting χ<sup>2</sup><sub>(1)</sub> distributions, but for most of them the convergence is not uniform near the critical point. When it is not, confidence regions based on inverting the tests, using the χ<sup>2</sup><sub>(1)</sub> approximation, do not asymptotically have the nominal level. The exception to this lack of locally uniform convergence occurs with the score test standardised by expected, rather than observed, information. For the regions based on the score test standardised by observed information and on the likelihood ratio test, conservative procedures that do not rely on the χ<sup>2</sup><sub>(1)</sub> approximation can be developed, but they are much too conservative near the critical parameter value. The regions based on the Wald tests have asymptotic level less than ½, regardless of the procedure used. Our results suggest that no procedure based solely on the likelihood function will be satisfactory. Whether or not this is the case is an open problem. A simulation study illustrates the results of this paper. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
73
84
Matteo Bottai
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1006-10122014-11-17RePEc:oup:biomet
article
Marginal log-linear parameterization of conditional independence models
Models defined by a set of conditional independence restrictions play an important role in statistical theory and applications, especially, but not only, in graphical modelling. In this paper we identify a subclass of these consisting of hierarchical marginal log-linear models, as defined by Bergsma & Rudas (2002a). Such models are smooth, which implies the applicability of standard asymptotic theory and simplifies interpretation. Furthermore, we give a marginal log-linear parameterization and a minimal specification of the models in the subclass, which implies the applicability of standard methods to compute maximum likelihood estimates and simplifies the calculation of the degrees of freedom of chi-squared statistics to test goodness-of-fit. The utility of the results is illustrated by applying them to block-recursive Markov models associated with chain graphs. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
1006
1012
http://hdl.handle.net/10.1093/biomet/asq037
application/pdf
Access to full text is restricted to subscribers.
Tamás Rudas
Wicher P. Bergsma
Renáta Németh
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:827-8412014-11-17RePEc:oup:biomet
article
Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling
We consider parameter-driven models of time series of counts, where the observations are assumed to arise from a Poisson distribution with a mean changing over time according to a latent process. Estimation of these models is carried out within a Bayesian framework using data augmentation and Markov chain Monte Carlo methods. We suggest a new auxiliary mixture sampler, which possesses a Gibbsian transition kernel, where we draw from full conditional distributions belonging to standard distribution families only. Emphasis lies on application to state space modelling of time series of counts, but we show that auxiliary mixture sampling may be applied to a wider range of parameter-driven models, including random-effects models and panel data models based on the Poisson distribution. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
827
841
http://hdl.handle.net/10.1093/biomet/93.4.827
text/html
Access to full text is restricted to subscribers.
Sylvia Frühwirth-Schnatter
Helga Wagner
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:901-9182014-11-17RePEc:oup:biomet
article
Estimation of latent factors for high-dimensional time series
This paper deals with the dimension reduction of high-dimensional time series based on a lower-dimensional factor process. In particular, we allow the dimension of time series N to be as large as, or even larger than, the length of observed time series T. The estimation of the factor loading matrix and the factor process itself is carried out via an eigenanalysis of an N×N non-negative definite matrix. We show that when all the factors are strong in the sense that the norm of each column in the factor loading matrix is of the order N<sup>1/2</sup>, the estimator of the factor loading matrix is weakly consistent in L<sub>2</sub>-norm with the convergence rate independent of N. Thus the curse is cancelled out by the blessing of dimensionality. We also establish the asymptotic properties of the estimators when factors are not strong. The proposed method together with the asymptotic properties are illustrated in a simulation study. An application to an implied volatility data set, with a trading strategy derived from the fitted factor model, is also reported. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
901
918
http://hdl.handle.net/10.1093/biomet/asr048
application/pdf
Access to full text is restricted to subscribers.
Clifford Lam
Qiwei Yao
Neil Bathia
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:111-1242014-11-17RePEc:oup:biomet
article
Conditional simulation of max-stable processes
Since many environmental processes are spatial in extent, a single extreme event may affect several locations, and the spatial dependence must be taken into account in an appropriate way. This paper proposes a framework for conditional simulation of max-stable processes and gives closed forms for the regular conditional distributions of Brown–Resnick and Schlather processes. We test the method on simulated data and present applications to extreme rainfall around Zurich and extreme temperatures in Switzerland. The proposed framework provides accurate conditional simulations and can handle problems of realistic size. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
111
124
http://hdl.handle.net/10.1093/biomet/ass067
application/pdf
Access to full text is restricted to subscribers.
C. Dombry
F. Éyi-Minko
M. Ribatet
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:494-4962014-11-17RePEc:oup:biomet
article
Dimension reduction in time series and the dynamic factor model
This note shows that the dimension reduction method proposed by Li & Shedden (2002) is equivalent to the dynamic factor model introduced by Peña & Box (1987). Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
494
496
http://hdl.handle.net/10.1093/biomet/asp009
application/pdf
Access to full text is restricted to subscribers.
Daniel Peña
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:19-352014-11-17RePEc:oup:biomet
article
Model selection and estimation in the Gaussian graphical model
We propose penalized likelihood methods for estimating the concentration matrix in the Gaussian graphical model. The methods lead to a sparse and shrinkage estimator of the concentration matrix that is positive definite, and thus conduct model selection and estimation simultaneously. The implementation of the methods is nontrivial because of the positive definite constraint on the concentration matrix, but we show that the computation can be done effectively by taking advantage of the efficient maxdet algorithm developed in convex optimization. We propose a BIC-type criterion for the selection of the tuning parameter in the penalized likelihood methods. The connection between our methods and existing methods is illustrated. Simulations and real examples demonstrate the competitive performance of the new methods. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
19
35
http://hdl.handle.net/10.1093/biomet/asm018
application/pdf
Access to full text is restricted to subscribers.
Ming Yuan
Yi Lin
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:591-6002014-11-17RePEc:oup:biomet
article
Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models
A semiparametric transformation model comprises a parametric component for covariate effects and a nonparametric component for the baseline hazard/intensity. The Breslow-type estimator has been proposed for estimating the nonparametric component in some inefficient estimation procedures. We show that introducing weights into this estimator leads to nonparametric maximum likelihood estimation, with the weights depending on the martingale residuals. The weighted Breslow-type estimator suggests an iterative reweighting algorithm for nonparametric maximum likelihood estimation, which can be implemented by a weighted variant of the existing algorithms for inefficient estimation, and can be computationally more efficient than an EM-type algorithm. The weighting idea is further extended to semiparametric transformation models with mismeasured covariates. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
591
600
http://hdl.handle.net/10.1093/biomet/asp032
application/pdf
Access to full text is restricted to subscribers.
Yi-Hau Chen
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:742-7462014-11-17RePEc:oup:biomet
article
Simes' procedure is 'valid on average'
Although Simes' modification of the Bonferroni procedure tends to perform very well, albeit often being slightly liberal for negatively dependent hypotheses, there are special cases where it fails more dramatically. We prove that these special cases are indeed special, applying only to specific significance levels, and obtain a strong bound on the average deviation of the Simes corrected P-value from the true probability over any interval of P-values. From this, it is argued that Simes' procedure should be expected to perform well except for pathological examples. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
742
746
http://hdl.handle.net/10.1093/biomet/93.3.742
text/html
Access to full text is restricted to subscribers.
Einar Andreas Rødland
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:327-3392014-11-17RePEc:oup:biomet
article
Likelihood for component parameters
For a statistical model with data, likelihood for the scalar or vector full parameter θ, of dimension p say, is typically well defined and easily computed. In this paper, we investigate likelihood for a component parameter ψ(θ) of dimension d < p and make use of the recent likelihood theory that has been successful in producing highly accurate third-order p-values for scalar parameters of continuous models. The theory leads under moderate regularity to a definitive third-order determination of likelihood for a component parameter ψ(θ) of dimension d, where 1 ≤ d ≤ p. We use the simple location model on the plane with standard normal errors to motivate the development. The example exhibits most of the key characteristics of the general case and the recent theory then extends the determination of likelihood to the general context. For the scalar interest parameter case with d = 1, the usual determinations are typically of second-order accuracy; the example indicates how the new determination achieves third-order accuracy. The implementation is straightforward and uses familiar ingredients to other determinations, such as the full maximum likelihood value θ̂, the constrained value θ̃<sub>ψ</sub> given ψ(θ) = ψ, and the observed information j<sub>λλ</sub>(θ̂<sub>ψ</sub>) for a complementing nuisance parameter λ(θ). It does however require a special version of the nuisance information j<sub>λλ</sub>(θ̂<sub>ψ</sub>), a version calibrated relative to a symmetric choice of the exponential-type reparameterisation φ(θ) underlying the recent theory, but this is easily computed. Various examples are given and the motivating example is discussed in detail. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
327
339
D. A. S. Fraser
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:233-2382014-11-17RePEc:oup:biomet
article
Some theory for constructing minimum aberration fractional factorial designs
Minimum aberration is the most established criterion for selecting a regular fractional factorial design of maximum resolution. Minimum aberration designs for n runs and n/2 ≤ m < n factors have previously been constructed using the novel idea of complementary designs. In this paper, an alternative method of construction is developed by relating the wordlength pattern of designs to the so-called 'confounding between experimental runs'. This allows minimum aberration designs to be constructed for n runs and 5n/16 ≤ m ≤ n/2 factors as well as for n/2 ≤ m < n. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
233
238
Neil A. Butler
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:569-5842014-11-17RePEc:oup:biomet
article
Dimension reduction in regression without matrix inversion
Regressions in which the fixed number of predictors p exceeds the number of independent observational units n occur in a variety of scientific fields. Sufficient dimension reduction provides a promising approach to such problems, by restricting attention to d < n linear combinations of the original p predictors. However, standard methods of sufficient dimension reduction require inversion of the sample predictor covariance matrix. We propose a method for estimating the central subspace that eliminates the need for such inversion and is applicable regardless of the (n, p) relationship. Simulations show that our method compares favourably with standard large sample techniques when the latter are applicable. We illustrate our method with a genomics application. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
569
584
http://hdl.handle.net/10.1093/biomet/asm038
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Bing Li
Francesca Chiaromonte
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:691-7032014-11-17RePEc:oup:biomet
article
Adaptive Lasso for Cox's proportional hazards model
We investigate the variable selection problem for Cox's proportional hazards model, and propose a unified model selection and estimation procedure with desired theoretical properties and computational convenience. The new method is based on a penalized log partial likelihood with the adaptively weighted L<sub>1</sub> penalty on regression coefficients, providing what we call the adaptive Lasso estimator. The method incorporates different penalties for different coefficients: unimportant variables receive larger penalties than important ones, so that important variables tend to be retained in the selection process, whereas unimportant variables are more likely to be dropped. Theoretical properties, such as consistency and rate of convergence of the estimator, are studied. We also show that, with proper choice of regularization parameters, the proposed estimator has the oracle properties. The convex optimization nature of the method leads to an efficient algorithm. Both simulated and real examples show that the method performs competitively. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
691
703
http://hdl.handle.net/10.1093/biomet/asm037
application/pdf
Access to full text is restricted to subscribers.
Hao Helen Zhang
Wenbin Lu
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:129-1432014-11-17RePEc:oup:biomet
article
The discrimination power of projection pursuit with different density estimators
We explore the properties of projection pursuit discriminant analysis. This discriminant method is very powerful but relies heavily on a univariate density estimate. We show that the procedure based on wavelets maintains the same rate of convergence as with univariate wavelet density estimation. We also show the Bayes risk strong consistency of both the kernel- and wavelet-based methods. Simulated data and real data concerning character recognition show that the method is effective and robust against the curse of dimensionality. The wavelet alternative seems more likely than the kernel counterpart to find an interesting projection. Wavelets are often criticised for giving too wiggly an estimate and for being too localised to give good global properties. In the above context, these potential drawbacks do not weaken the method but the use of wavelets seems to enhance it. A multiple projection generalisation is also considered. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
129
143
Olivier Renaud
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:655-6682014-11-17RePEc:oup:biomet
article
Spherical regression
Methods are introduced for regressing points on the surface of one sphere on points on another. Complex variables and stereographic projection are used to deal with theoretical problems of directional statistics much as they have been used historically to deal with problems in non-Euclidean geometry. The complex plane harbours the group of Möbius transformations, and stereographic projection is used as a bridge to map these Möbius transforms to regression link functions on the surface of a unit sphere. A special form for these links is introduced which employs the complex plane and stereographic projection to effect angular scale changes on the sphere. The family of special forms is closed under orthogonal transformations of the dependent variable and Möbius transformations of the independent variable, and incorporates independence and proper and improper rotations as special cases. Parameter estimation and inference are exemplified using the von Mises–Fisher spherical distribution and vectorcardiogram data. All statistical results and calculations have been formulated in the real domain. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
655
668
T. D. Downs
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:513-5182014-11-17RePEc:oup:biomet
article
Optimal designs for the emax, log-linear and exponential models
We derive locally D- and ED<sub>p</sub>-optimal designs for the exponential, log-linear and three-parameter emax models. For each model the locally D- and ED<sub>p</sub>-optimal designs are supported at the same set of points, while the corresponding weights are different. This indicates that for a given model, D-optimal designs are efficient for estimating the smallest dose that achieves 100p% of the maximum effect in the observed dose range. Conversely, ED<sub>p</sub>-optimal designs also yield good D-efficiencies. We illustrate the results using several examples and demonstrate that locally D- and ED<sub>p</sub>-optimal designs for the emax, log-linear and exponential models are relatively robust with respect to misspecification of the model parameters. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
513
518
http://hdl.handle.net/10.1093/biomet/asq020
application/pdf
Access to full text is restricted to subscribers.
H. Dette
C. Kiss
M. Bevanda
F. Bretz
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:663-6742014-11-17RePEc:oup:biomet
article
On the stick-breaking representation of normalized inverse Gaussian priors
Random probability measures are the main tool for Bayesian nonparametric inference, with their laws acting as prior distributions. Many well-known priors used in practice admit different, though equivalent, representations. In terms of computational convenience, stick-breaking representations stand out. In this paper we focus on the normalized inverse Gaussian process and provide a completely explicit stick-breaking representation for it. This result is of interest both from a theoretical viewpoint and for statistical practice. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
663
674
http://hdl.handle.net/10.1093/biomet/ass023
application/pdf
Access to full text is restricted to subscribers.
S. Favaro
A. Lijoi
I. Prünster
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:933-9442014-11-17RePEc:oup:biomet
article
Some design properties of a rejective sampling procedure
Occasionally, a selected probability sample may appear undesirable with respect to the available auxiliary information. In such a situation, the practitioner might consider rejecting the sample and selecting a new set of sample elements. We consider a procedure in which the probability sample is rejected unless the sample mean of an auxiliary vector is within a specified distance of the population mean. It is proven that the large sample mean and variance of the regression estimator for the rejective sample are the same as those of the regression estimator for the original selection procedure. Likewise, the usual estimator of variance for the regression estimator is appropriate for the rejective sample. In a Monte Carlo experiment, the large sample properties hold for relatively small samples and the Monte Carlo results are in agreement with the theoretical orders of approximation. The efficiency effect of the described rejective sampling is o(n<sub>N</sub><sup>−1</sup>), where n<sub>N</sub> is the expected sample size, but the effect can be important for particular samples. For example, rejective sampling can be used to eliminate those samples that give negative weights for the regression estimator. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
933
944
http://hdl.handle.net/10.1093/biomet/asp042
application/pdf
Access to full text is restricted to subscribers.
Wayne A. Fuller
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:61-702014-11-17RePEc:oup:biomet
article
Interval censoring: identifiability and the constant-sum property
The constant-sum property given in Oller et al. (2004) for censoring models justifies the use of a simplified likelihood to obtain the nonparametric maximum likelihood estimator of the lifetime distribution. In this paper we study the relevance of the constant-sum property in the identifiability of the lifetime distribution. We show that the lifetime distribution is not identifiable outside the class of constant-sum models. We also show that the lifetime probabilities assigned to the observable intervals are identifiable inside the class of constant-sum models. We illustrate all these notions with several examples. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
61
70
http://hdl.handle.net/10.1093/biomet/asm002
application/pdf
Access to full text is restricted to subscribers.
Ramon Oller
Guadalupe Gómez
M. Luz Calle
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:551-5662014-11-17RePEc:oup:biomet
article
Penalized Bregman divergence for large-dimensional regression and classification
Regularization methods are characterized by loss functions measuring data fits and penalty terms constraining model parameters. The commonly used quadratic loss is not suitable for classification with binary responses, whereas the loglikelihood function is not readily applicable to models where the exact distribution of observations is unknown or not fully specified. We introduce the penalized Bregman divergence by replacing the negative loglikelihood in the conventional penalized likelihood with Bregman divergence, which encompasses many commonly used loss functions in the regression analysis, classification procedures and machine learning literature. We investigate new statistical properties of the resulting class of estimators with the number p<sub>n</sub> of parameters either diverging with the sample size n or even nearly comparable with n, and develop statistical inference tools. It is shown that the resulting penalized estimator, combined with appropriate penalties, achieves the same oracle property as the penalized likelihood estimator, but asymptotically does not rely on the complete specification of the underlying distribution. Furthermore, the choice of loss function in the penalized classifiers has an asymptotically relatively negligible impact on classification performance. We illustrate the proposed method for quasilikelihood regression and binary classification with simulation evaluation and real-data application. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
551
566
http://hdl.handle.net/10.1093/biomet/asq033
application/pdf
Access to full text is restricted to subscribers.
Chunming Zhang
Yuan Jiang
Yi Chai
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:323-3372014-11-17RePEc:oup:biomet
article
A generalized Dantzig selector with shrinkage tuning
The Dantzig selector performs variable selection and model fitting in linear regression. It uses an L<sub>1</sub> penalty to shrink the regression coefficients towards zero, in a similar fashion to the lasso. While both the lasso and Dantzig selector potentially do a good job of selecting the correct variables, they tend to overshrink the final coefficients. This results in an unfortunate trade-off. One can either select a high shrinkage tuning parameter that produces an accurate model but poor coefficient estimates or a low shrinkage parameter that produces more accurate coefficients but includes many irrelevant variables. We extend the Dantzig selector to fit generalized linear models while eliminating overshrinkage of the coefficient estimates, and develop a computationally efficient algorithm, similar in nature to least angle regression, to compute the entire path of coefficient estimates. A simulation study illustrates the advantages of our approach relative to others. We apply the methodology to two datasets. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
323
337
http://hdl.handle.net/10.1093/biomet/asp013
application/pdf
Access to full text is restricted to subscribers.
Gareth M. James
Peter Radchenko
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:363-3822014-11-17RePEc:oup:biomet
article
Estimating vaccine efficacy from small outbreaks
Let C<sub>V</sub> and C<sub>0</sub> denote the number of cases among vaccinated and unvaccinated individuals, respectively, and let υ be the proportion of individuals vaccinated. The quantity ê = 1 − (1 − υ)C<sub>V</sub>/(υC<sub>0</sub>) = 1 − (relative attack rate) is the most used estimator of the effectiveness of a vaccine to protect against infection. For a wide class of vaccine responses, a family of transmission models and three types of community settings, this paper investigates what ê actually estimates. It does so under the assumption that the community is large and the vaccination coverage is adequate to prevent major outbreaks of the infectious disease, so that only data on minor outbreaks are available. For a community of homogeneous individuals who mix uniformly, it is found that ê estimates a quantity with the interpretation of 1 − (mean susceptibility, per contact, of vaccinees relative to unvaccinated individuals). We provide a standard error for ê in this setting. For a community with some heterogeneity ê can be a very misleading estimator of the effectiveness of the vaccine. When individuals have inherent differences, ê estimates a quantity that depends also on the inherent susceptibilities of different types of individual and on the vaccination coverage for different types. For a community of households, ê estimates a quantity that depends on the rate of transmission within households and on the reduction in infectivity induced by the vaccine. In communities that are structured, into households or age-groups, it is possible that ê estimates a value that is negative even when the vaccine reduces both susceptibility and infectivity. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
363
382
Niels G. Becker
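The estimator quoted in the abstract above, ê = 1 − (1 − υ)C_V/(υC_0), can be evaluated directly from outbreak counts. The following is a minimal sketch only: the function name, argument names and the example counts are hypothetical and do not come from the paper, and the sketch computes the point estimate alone, not the standard error the paper provides.

```python
def vaccine_efficacy_estimate(cases_vaccinated, cases_unvaccinated, coverage):
    """Return e_hat = 1 - (relative attack rate).

    cases_vaccinated   -- C_V, number of cases among vaccinated individuals
    cases_unvaccinated -- C_0, number of cases among unvaccinated individuals
    coverage           -- proportion of individuals vaccinated, strictly between 0 and 1
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    if cases_unvaccinated <= 0:
        raise ValueError("at least one unvaccinated case is needed")
    # Relative attack rate: (C_V / coverage) divided by (C_0 / (1 - coverage)).
    relative_attack_rate = ((1.0 - coverage) * cases_vaccinated) / (
        coverage * cases_unvaccinated
    )
    return 1.0 - relative_attack_rate


# Hypothetical counts: 5 vaccinated cases, 20 unvaccinated cases, 80% coverage.
estimate = vaccine_efficacy_estimate(5, 20, 0.8)
print(round(estimate, 4))
```

As the abstract notes, in structured communities this quantity can be negative even when the vaccine is protective, so the sketch deliberately does not clip the result to the interval [0, 1].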
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:623-6402014-11-17RePEc:oup:biomet
article
Adaptive Bayesian multivariate density estimation with Dirichlet mixtures
We show that rate-adaptive multivariate density estimation can be performed using Bayesian methods based on Dirichlet mixtures of normal kernels with a prior distribution on the kernel's covariance matrix parameter. We derive sufficient conditions on the prior specification that guarantee convergence to a true density at a rate that is minimax optimal for the smoothness class to which the true density belongs. No prior knowledge of smoothness is assumed. The sufficient conditions are shown to hold for the Dirichlet location mixture-of-normals prior with a Gaussian base measure and an inverse Wishart prior on the covariance matrix parameter. Locally Hölder smoothness classes and their anisotropic extensions are considered. Our study involves several technical novelties, including sharp approximation of finitely differentiable multivariate densities by normal mixtures and a new sieve on the space of such densities. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
623
640
http://hdl.handle.net/10.1093/biomet/ast015
application/pdf
Access to full text is restricted to subscribers.
Weining Shen
Surya T. Tokdar
Subhashis Ghosal
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:173-1822014-11-17RePEc:oup:biomet
article
Power of edge exclusion tests in graphical Gaussian models
Asymptotic multivariate normal approximations to the joint distributions of edge exclusion test statistics for saturated graphical Gaussian models are derived. Non-signed and signed square-root versions of the likelihood ratio, Wald and score test statistics are considered. Noncentral chi-squared approximations are also considered for the non-signed versions. These approximations are used to estimate the power of edge exclusion tests and an example is presented. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
173
182
http://hdl.handle.net/10.1093/biomet/92.1.173
text/html
Access to full text is restricted to subscribers.
M. Fátima Salgueiro
Peter W. F. Smith
John W. McDonald
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:401-4092014-11-17RePEc:oup:biomet
article
A type of restricted maximum likelihood estimator of variance components in generalised linear mixed models
The maximum likelihood estimator of the variance components in a linear model can be biased downwards. Restricted maximum likelihood (REML) corrects this problem by using the likelihood of a set of residual contrasts and is generally considered superior. However, this original restricted maximum likelihood definition does not directly extend beyond linear models. We propose a REML-type estimator for generalised linear mixed models by correcting the bias in the profile score function of the variance components. The proposed estimator has the same consistency properties as the maximum likelihood estimator if the number of parameters in the mean and variance components models remains fixed. However, the estimator of the variance components has a smaller finite sample bias. A simulation study with a logistic mixed model shows that the proposed estimator is effective in correcting the downward bias in the maximum likelihood estimator. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
401
409
J. G. Liao
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:807-8242014-11-17RePEc:oup:biomet
article
Most-predictive design points for functional data predictors
We suggest a way of reducing the very high dimension of a functional predictor, X, to a low number of dimensions chosen so as to give the best predictive performance. Specifically, if X is observed on a fine grid of design points t<sub>1</sub>,…, t<sub>r</sub>, we propose a method for choosing a small subset of these, say t<sub>i<sub>1</sub></sub>,…, t<sub>i<sub>k</sub></sub>, to optimize the prediction of a response variable, Y. The values t<sub>i<sub>j</sub></sub> are referred to as the most predictive design points, or covariates, for a given value of k, and are computed using information contained in a set of independent observations (X<sub>i</sub>, Y<sub>i</sub>) of (X, Y). The algorithm is based on local linear regression, and calculations can be accelerated using linear regression to preselect the design points. Boosting can be employed to further improve the predictive performance. We illustrate the usefulness of our ideas through simulations and examples drawn from chemometrics, and we develop theoretical arguments showing that the methodology can be applied successfully in a range of settings. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
807
824
http://hdl.handle.net/10.1093/biomet/asq058
application/pdf
Access to full text is restricted to subscribers.
F. Ferraty
P. Hall
P. Vieu
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:587-6002014-11-17RePEc:oup:biomet
article
Robust functional estimation using the median and spherical principal components
We present robust estimators for the mean and the principal components of a stochastic process in <inline-formula><inline-graphic xlink:href="asn031ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>. Robustness and asymptotic properties of the estimators are studied theoretically, by simulation and by example. It is shown that the proposed estimators are generally more robust to outliers than the commonly used sample mean and principal components, although their properties depend on the spacings of the eigenvalues of the covariance function. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
587
600
http://hdl.handle.net/10.1093/biomet/asn031
application/pdf
Access to full text is restricted to subscribers.
Daniel Gervini
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:419-4332014-11-17RePEc:oup:biomet
article
Efficient scalable schemes for monitoring a large number of data streams
The sequential changepoint detection problem is studied in the context of global online monitoring of a large number of independent data streams. We are interested in detecting an occurring event as soon as possible, but we do not know when the event will occur, nor do we know which subset of data streams will be affected by the event. A family of scalable schemes is proposed based on the sum of the local cumulative sum, CUSUM, statistics from each individual data stream, and is shown to asymptotically minimize the detection delays for each and every possible combination of affected data streams, subject to the global false alarm constraint. The usefulness and limitations of our asymptotic optimality results are illustrated by numerical simulations and heuristic arguments. The Appendices contain a probabilistic result on the first epoch to simultaneous record values for multiple independent random walks. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
419
433
http://hdl.handle.net/10.1093/biomet/asq010
application/pdf
Access to full text is restricted to subscribers.
Y. Mei
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:403-4162014-11-17RePEc:oup:biomet
article
Maximum smoothed likelihood for multivariate mixtures
We introduce an algorithm for estimating the parameters in a finite mixture of completely unspecified multivariate components in at least three dimensions under the assumption of conditionally independent coordinate dimensions. We prove that this algorithm, based on a majorization-minimization idea, possesses a desirable descent property just as any EM algorithm does. We discuss the similarities between our algorithm and a related one, the so-called nonlinearly smoothed EM algorithm for the non-mixture setting. We also demonstrate via simulation studies that the new algorithm gives very similar results to another algorithm that has been shown empirically to be effective but that does not satisfy any descent property. We provide code for implementing the new algorithm in a publicly available R package. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
403
416
http://hdl.handle.net/10.1093/biomet/asq079
application/pdf
Access to full text is restricted to subscribers.
M. Levine
D. R. Hunter
D. Chauveau
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:261-2782014-11-17RePEc:oup:biomet
article
Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm
We consider variable selection in high-dimensional linear models where the number of covariates greatly exceeds the sample size. We introduce the new concept of partial faithfulness and use it to infer associations between the covariates and the response. Under partial faithfulness, we develop a simplified version of the PC algorithm (Spirtes et al., 2000), which is computationally feasible even with thousands of covariates and provides consistent variable selection under conditions on the random design matrix that are of a different nature than coherence conditions for penalty-based approaches like the lasso. Simulations and application to real data show that our method is competitive compared to penalty-based approaches. We provide an efficient implementation of the algorithm in the R-package pcalg. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
261
278
http://hdl.handle.net/10.1093/biomet/asq008
application/pdf
Access to full text is restricted to subscribers.
P. Bühlmann
M. Kalisch
M. H. Maathuis
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:607-6222014-11-17RePEc:oup:biomet
article
Continuously additive models for nonlinear functional regression
We introduce continuously additive models, which can be viewed as extensions of additive regression models with vector predictors to the case of infinite-dimensional predictors. This approach produces a class of flexible functional nonlinear regression models, where random predictor curves are coupled with scalar responses. In continuously additive modelling, integrals taken over a smooth surface along graphs of predictor functions relate the predictors to the responses in a nonlinear fashion. We use tensor product basis expansions to fit the smooth regression surface that characterizes the model. In a theoretical investigation, we show that the predictions obtained from fitting continuously additive models are consistent and asymptotically normal. We also consider extensions to generalized responses. The proposed class of models outperforms existing functional regression models in simulations and real-data examples. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
607
622
http://hdl.handle.net/10.1093/biomet/ast004
application/pdf
Access to full text is restricted to subscribers.
Hans-Georg Müller
Yichao Wu
Fang Yao
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:603-6202014-11-17RePEc:oup:biomet
article
On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems
This paper considers the asymptotic distribution of the likelihood ratio statistic T for testing a subset of parameter of interest θ, <inline-formula><inline-graphic xlink:href="asq031ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, <inline-formula><inline-graphic xlink:href="asq031ilm2.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, based on the pseudolikelihood <inline-formula><inline-graphic xlink:href="asq031ilm3.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="asq031ilm4.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> is a consistent estimator of <inline-formula><inline-graphic xlink:href="asq031ilm5.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, the nuisance parameter. We show that the asymptotic distribution of T under H<sub>0</sub> is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest, <inline-formula><inline-graphic xlink:href="asq031ilm6.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, or the true value of the nuisance parameter, <inline-formula><inline-graphic xlink:href="asq031ilm7.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, lies on the boundary of parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of T may be mixtures of chi-squared variables. We conducted simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
603
620
http://hdl.handle.net/10.1093/biomet/asq031
application/pdf
Access to full text is restricted to subscribers.
Yong Chen
Kung-Yee Liang
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:423-4362014-11-17RePEc:oup:biomet
article
Goodness of fit of biplots and correspondence analysis
The present paper examines proportional goodness of fit to variables recorded on individuals, the variances and covariances of the variables, and the form and distances between individuals. No single plot displays all three optimally in the sense of least squares. However, even aspects which are non-optimally fitted by biplots and Benzecri plots often closely preserve the optimal fit. This is shown by means of a preservation-of-fit function which depends on the type of display and on the ratio of the second to the first singular value of the data matrix. This function is never below 0·5, so at least half the fit is always preserved, and it is close to 1 unless the ratio of the singular values is small. That explains the frequently observed similarity of the various biplots and the Benzecri plot and the fact that they usually lead to the same conclusions. It follows that in many applications it is reasonable to use either the symmetric biplot or the Benzecri plot or a compromise maximin preservation plot, and that the difference between these three is usually unimportant. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
423
436
K. Ruben Gabriel
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:75-892014-11-17RePEc:oup:biomet
article
Covariate-adjusted regression
We introduce covariate-adjusted regression for situations where both predictors and response in a regression model are not directly observable, but are contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. We demonstrate how the regression coefficients can be estimated by establishing a connection to varying-coefficient regression. The proposed covariate-adjustment method is illustrated with an analysis of the regression of plasma fibrinogen concentration as response on serum transferrin level as predictor for 69 haemodialysis patients. In this example, both response and predictor are thought to be influenced in a multiplicative fashion by body mass index. A bootstrap hypothesis test enables us to test the significance of the regression parameters. We establish consistency and convergence rates of the parameter estimators for this new covariate-adjusted regression model. Simulation studies demonstrate the efficacy of the proposed method. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
75
89
http://hdl.handle.net/10.1093/biomet/92.1.75
text/html
Access to full text is restricted to subscribers.
Damla Şentürk
Hans-Georg Müller
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:187-2042014-11-17RePEc:oup:biomet
article
Two-stage sampling from a prediction point of view when the cluster sizes are unknown
We consider the problem of estimating the population total in two-stage cluster sampling when cluster sizes are known only for the sampled clusters, making use of a population model arising from a variance component model. The problem can be considered as one of predicting the unobserved part Z of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood. For a more general distribution-free model, by application of an analysis of variance approach instead of maximum likelihood for parameter estimation, the predictor obtained from the predictive likelihood is shown to be approximately uniformly optimal for large sample size and large number of clusters, in the sense of uniformly minimizing the mean-squared error in a partially linear class of model-unbiased predictors. Three prediction intervals for Z based on three similar predictive likelihoods are studied. For a small number n<sub>0</sub> of sampled clusters, they differ significantly, but for large n<sub>0</sub>, the three intervals are practically identical. Model-based and design-based coverage properties of the prediction intervals are studied based on a comprehensive simulation study. The simulation study indicates that for large sample sizes, the coverage measures achieve approximately the nominal level 1 - α and are slightly less than 1 - α for moderately large sample sizes. For small sample sizes, the coverage measures are about 1 - 2α, being raised to 1 - α for a modified interval based on the <inline-formula><inline-graphic xlink:href="asm098ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> distribution. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
187
204
http://hdl.handle.net/10.1093/biomet/asm098
application/pdf
Access to full text is restricted to subscribers.
Jan F. Bjørnstad
Elinor Ytterstad
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:999-10052014-11-17RePEc:oup:biomet
article
Positive Association Among Three Binary Variables and Cross-Product Ratios
We show that, when the three-way association level among the three binary variables, X, U<sub>1</sub> and U<sub>2</sub> is fixed, D<sub>P</sub> = pr(X = 1|U<sub>1</sub> = 1) - pr(X = 1|U<sub>1</sub> = 0) increases as the cross-product ratio of U<sub>1</sub> and U<sub>2</sub> increases under the assumption that X is positively associated with U<sub>1</sub> and U<sub>2</sub>. We then discuss some implications of this property. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
999
1005
http://hdl.handle.net/10.1093/biomet/asm075
application/pdf
Access to full text is restricted to subscribers.
Stephen E. Fienberg
Sung-Ho Kim
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:141-1502014-11-17RePEc:oup:biomet
article
A moving average Cholesky factor model in covariance modelling for longitudinal data
We propose new regression models for parameterizing covariance structures in longitudinal data analysis. Using a novel Cholesky factor, the entries in this decomposition have a moving average and log-innovation interpretation and are modelled as linear functions of covariates. We propose efficient maximum likelihood estimates for joint mean-covariance analysis based on this decomposition and derive the asymptotic distributions of the coefficient estimates. Furthermore, we study a local search algorithm, computationally more efficient than traditional all subset selection, based on <sc>bic</sc> for model selection, and show its model selection consistency. Thus, a conjecture of Pan & MacKenzie (2003) is verified. We demonstrate the finite-sample performance of the method via analysis of data on CD4 trajectories and through simulations. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
141
150
http://hdl.handle.net/10.1093/biomet/asr068
application/pdf
Access to full text is restricted to subscribers.
Weiping Zhang
Chenlei Leng
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:139-1562014-11-17RePEc:oup:biomet
article
Covariate-adjusted precision matrix estimation with an application in genetical genomics
Motivated by analysis of genetical genomics data, we introduce a sparse high-dimensional multivariate regression model for studying conditional independence relationships among a set of genes adjusting for possible genetic effects. The precision matrix in the model specifies a covariate-adjusted Gaussian graph, which presents the conditional dependence structure of gene expression after the confounding genetic effects on gene expression are taken into account. We present a covariate-adjusted precision matrix estimation method using a constrained ℓ<sub>1</sub> minimization, which can be easily implemented by linear programming. Asymptotic convergence rates in various matrix norms and sign consistency are established for the estimators of the regression coefficients and the precision matrix, allowing both the number of genes and the number of the genetic variants to diverge. Simulation shows that the proposed method results in significant improvements in both precision matrix estimation and graphical structure selection when compared to the standard Gaussian graphical model assuming constant means. The proposed method is applied to yeast genetical genomics data for the identification of the gene network among a set of genes in the mitogen-activated protein kinase pathway. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
139
156
http://hdl.handle.net/10.1093/biomet/ass058
application/pdf
Access to full text is restricted to subscribers.
T. Tony Cai
Hongzhe Li
Weidong Liu
Jichun Xie
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:411-4262014-11-17RePEc:oup:biomet
article
Non-finite Fisher information and homogeneity: an EM approach
Even simple examples of finite mixture models can fail to fulfil the regularity conditions that are routinely assumed in standard parametric inference problems. Many methods have been investigated for testing for homogeneity in finite mixture models, for example, but all rely on regularity conditions including the finiteness of the Fisher information and the space of the mixing parameter being a compact subset of some Euclidean space. Very simple examples where such assumptions fail include mixtures of two geometric distributions and two exponential distributions, and, more generally, mixture models in scale distribution families. To overcome these difficulties, we propose and study an <sc>em</sc>-test statistic, which has a simple limiting distribution for examples in this paper. Simulations show that the <sc>em</sc>-test has accurate Type I errors and is more efficient than existing methods when they are applicable. A real example is included. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
411
426
http://hdl.handle.net/10.1093/biomet/asp011
application/pdf
Access to full text is restricted to subscribers.
P. Li
J. Chen
P. Marriott
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:375-3882014-11-17RePEc:oup:biomet
article
Risk-adjusted monitoring of time to event
Recently there has been interest in risk-adjusted cumulative sum charts, <sc>CUSUMs</sc>, to monitor the performance of e.g. hospitals, taking into account the heterogeneity of patients. Even though many outcomes involve time, only conventional regression models are commonly used. In this article we investigate how time to event models may be used for monitoring purposes. We consider monitoring using <sc>CUSUMs</sc> based on the partial likelihood ratio between an out-of-control state and an in-control state. We consider both proportional and non-proportional alternatives, as well as a head start. Against proportional alternatives, we present an analytic method of computing the expected number of observed events before stopping or the probability of stopping before a given observed number of events. In a stationary set-up, the former is roughly proportional to the average run length in calendar time. Adding a head start changes the threshold only slightly if the expected number of events until hitting is used as a criterion. However, it changes the threshold substantially if a false alarm probability is used. In simulation studies, charts based on survival analysis perform better than simpler monitoring schemes. We present one example from retail finance and one medical application. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
375
388
http://hdl.handle.net/10.1093/biomet/asq004
application/pdf
Access to full text is restricted to subscribers.
A. Gandy
J. T. Kvaløy
A. Bottle
F. Zhou
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:17-342014-11-17RePEc:oup:biomet
article
The multivariate beta process and an extension of the Polya tree model
We introduce a novel stochastic process that we term the multivariate beta process. The process is defined for modelling dependent random probabilities and has beta marginal distributions. We use this process to define a probability model for a family of unknown distributions indexed by covariates. The marginal model for each distribution is a Polya tree prior. An important feature of the proposed prior is the easy centring of the nonparametric model around any parametric regression model. We use the model to implement nonparametric inference for survival distributions. The nonparametric model that we introduce can be adopted to extend the support of prior distributions for parametric regression models. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
17
34
http://hdl.handle.net/10.1093/biomet/asq072
application/pdf
Access to full text is restricted to subscribers.
Lorenzo Trippa
Peter Müller
Wesley Johnson
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-1025a2014-07-28RePEc:oup:biomet
article
Amendments and Corrections
The paper included comparison of a 12-factor, 16-run design to randomly generated Latin hypercube designs and U-designs, with respect to the properties of their alias matrices. An error in a computer program led to incorrect computation of the properties of the alias matrix of the orthogonal design. A corrected version of Table 2 is provided here. The orthogonal Latin hypercube design still has better properties than the best of 100 random designs, but the differences are less striking than those in our original table. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1025
http://hdl.handle.net/10.1093/biomet/93.4.1025-a
text/html
Access to full text is restricted to subscribers.
David M. Steinberg
Dennis K. J. Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:219-2252014-07-28RePEc:oup:biomet
article
Estimating genetic association parameters from family data
We consider the problem of estimating a parameter theta, reflecting association between a disease and genotypes of a genetic polymorphism, using nuclear family data. In many applications, some parental genotypes are missing, and the distribution of these genotypes is unknown. Since misspecification of this distribution can bias estimators for theta, we consider estimating functions that are unbiased, regardless of how the distribution is specified. We call the resulting estimators parental-genotype-robust. Rabinowitz (2002) has proposed a constrained optimisation method for obtaining locally optimal unbiased tests of the null hypothesis of no association. We use a similar method to derive estimating functions that yield parental-genotype-robust estimators with minimum variance in the class of all such estimators. We extend the estimating functions to obtain parental-genotype-robust estimators when theta is a vector of unknown parameters, and show that the estimating functions enjoy a certain optimality property. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
219
225
Alice S. Whittemore
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:505-505a2014-07-28RePEc:oup:biomet
article
"Equivalence of prospective and retrospective models in the Bayesian analysis of case-control studies"
2
2005
92
June
Biometrika
505
505
http://hdl.handle.net/10.1093/biomet/92.2.505-a
text/html
Access to full text is restricted to subscribers.
Shaun R. Seaman
Sylvia Richardson
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-10252014-07-28RePEc:oup:biomet
article
Amendments and Corrections
It has been brought to our attention that the implicit expression (6) for the estimator with general warping function had been derived earlier by B. Ronn, in an unpublished technical report of the Royal Veterinary and Agricultural University, Frederiksberg. However, the actual implementation and computation of the estimators in our paper are very different from those in the technical report. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1025
http://hdl.handle.net/10.1093/biomet/93.4.1025
text/html
Access to full text is restricted to subscribers.
D. Gervini
T. Gasser
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:505-5052014-07-28RePEc:oup:biomet
article
"A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families"
2
2005
92
June
Biometrika
505
505
http://hdl.handle.net/10.1093/biomet/92.2.505
text/html
Access to full text is restricted to subscribers.
Albert W. Marshall
Ingram Olkin
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:177-1932014-07-28RePEc:oup:biomet
article
Equivalent kernels of smoothing splines in nonparametric regression for clustered/longitudinal data
For independent data, it is well known that kernel methods and spline methods are essentially asymptotically equivalent (Silverman, 1984). However, recent work of Welsh et al. (2002) shows that the same is not true for clustered/longitudinal data. Splines and conventional kernels are different in localness and ability to account for the within-cluster correlation. We show that a smoothing spline estimator is asymptotically equivalent to a recently proposed seemingly unrelated kernel estimator of Wang (2003) for any working covariance matrix. We show that both estimators can be obtained iteratively by applying conventional kernel or spline smoothing to pseudo-observations. This result allows us to study the asymptotic properties of the smoothing spline estimator by deriving its asymptotic bias and variance. We show that smoothing splines are consistent for an arbitrary working covariance and have the smallest variance when assuming the true covariance. We further show that both the seemingly unrelated kernel estimator and the smoothing spline estimator are nonlocal unless working independence is assumed but have asymptotically negligible bias. Their finite sample performance is compared through simulations. Our results justify the use of efficient, non-local estimators such as smoothing splines for clustered/longitudinal data. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
177
193
Xihong Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:240-2452014-07-28RePEc:oup:biomet
article
Revisiting simple linear regression with autocorrelated errors
This paper studies properties of ordinary and generalised least squares estimators in a simple linear regression with stationary autocorrelated errors. Explicit expressions for the variances of the regression parameter estimators are derived for some common time series autocorrelation structures, including a first-order autoregression and general moving averages. Applications of the results include confidence intervals and an example where the variance of the trend slope estimator does not increase with increasing autocorrelation. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
240
245
Jaechoul Lee
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:899-9142013-01-01RePEc:oup:biomet
article
Simultaneous supervised clustering and feature selection over a graph
In this article, we propose a regression method for simultaneous supervised clustering and feature selection over a given undirected graph, where homogeneous groups or clusters are estimated as well as informative predictors, with each predictor corresponding to one node in the graph and a connecting path indicating a priori possible grouping among the corresponding predictors. The method seeks a parsimonious model with high predictive power through identifying and collapsing homogeneous groups of regression coefficients. To address computational challenges, we present an efficient algorithm integrating the augmented Lagrange multipliers, coordinate descent and difference convex methods. We prove that the proposed method not only identifies the true homogeneous groups and informative features consistently but also leads to accurate parameter estimation. A gene network dataset is analysed to demonstrate that the method can make a difference by exploring dependency structures among the genes. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
899
914
http://hdl.handle.net/10.1093/biomet/ass038
application/pdf
Access to full text is restricted to subscribers.
Xiaotong Shen
Hsin-Cheng Huang
Wei Pan
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:945-9582013-01-01RePEc:oup:biomet
article
Penalized balanced sampling
Linear mixed models cover a wide range of statistical methods, which have found many uses in the estimation for complex surveys. The purpose of this work is to consider methods by which linear mixed models may be used at the design stage of a survey to incorporate available auxiliary information. This paper reviews the ideas of balanced sampling and the cube algorithm, and proposes an implementation of the latter by which penalized balanced samples can be selected. Such samples can reduce or eliminate the need for linear mixed model weight adjustments, a result demonstrated theoretically and via simulation. Horvitz--Thompson estimators for such samples will be highly efficient for any responses well approximated by a linear mixed model in the auxiliary information. In Monte Carlo experiments using nonparametric and temporal linear mixed models, the strategy of penalized balanced sampling with Horvitz--Thompson estimation dominates a variety of standard strategies. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
945
958
http://hdl.handle.net/10.1093/biomet/ass033
application/pdf
Access to full text is restricted to subscribers.
F. J. Breidt
G. Chauvet
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:915-9282013-01-01RePEc:oup:biomet
article
On the sparsity of signals in a random sample
This article proposes a method of moments technique for estimating the sparsity of signals in a random sample. This involves estimating the largest eigenvalue of a large Hermitian trigonometric matrix under mild conditions. As illustration, the method is applied to two well-known problems. The first focuses on the sparsity of a large covariance matrix and the second investigates the sparsity of a sequence of signals observed with stationary, weakly dependent noise. Simulation shows that the proposed estimators can have significantly smaller mean absolute errors than their main competitors. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
915
928
http://hdl.handle.net/10.1093/biomet/ass039
application/pdf
Access to full text is restricted to subscribers.
Binyan Jiang
Wei-Liem Loh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:799-8112013-01-01RePEc:oup:biomet
article
Choosing trajectory and data type when classifying functional data
In some problems involving functional data, it is desired to undertake prediction or classification before the full trajectory of a function is observed. In such cases, it is often preferable to suffer somewhat greater error in return for making a decision relatively early. The prediction and classification problems can be treated similarly, using mean squared prediction error, or classification error, respectively, as the means for quantifying performance, so in this paper we focus principally on classification. We introduce a method for determining when an early decision can reasonably be made, using only part of the trajectory, and we show how to use the method to choose among data types. Our approach is fully nonparametric, and no specific model is required. Properties of error-rate are studied as functions of time and data type. The effectiveness of the proposed method is illustrated in both theoretical and numerical terms. The classification referred to in this paper would be termed supervised classification in machine learning, to distinguish it from unsupervised classification, or clustering. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
799
811
http://hdl.handle.net/10.1093/biomet/ass011
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Tapabrata Maiti
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:865-8772013-01-01RePEc:oup:biomet
article
A two-stage dimension-reduction method for transformed responses and its applications
Researchers in the biological sciences nowadays often encounter the curse of dimensionality. To tackle this, sufficient dimension reduction aims to estimate the central subspace, in which all the necessary information supplied by the covariates regarding the response of interest is contained. Subsequent statistical analysis can then be made in a lower-dimensional space while preserving relevant information. Many studies are concerned with the transformed response rather than the original one, but they may have different central subspaces. When estimating the central subspace of the transformed response, direct methods will be inefficient. In this article, we propose a more efficient two-stage estimator of the central subspace of a transformed response. This approach is extended to censored responses and is applied to combining multiple biomarkers. Simulation studies and data examples support the superiority of the procedure. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
865
877
http://hdl.handle.net/10.1093/biomet/ass042
application/pdf
Access to full text is restricted to subscribers.
Hung Hung
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:775-7862013-01-01RePEc:oup:biomet
article
Classification based on a permanental process with cyclic approximation
We introduce a doubly stochastic marked point process model for supervised classification problems. Regardless of the number of classes or the dimension of the feature space, the model requires only 2--3 parameters for the covariance function. The classification criterion involves a permanental ratio for which an approximation using a polynomial-time cyclic expansion is proposed. The approximation is effective even if the feature region occupied by one class is a patchwork interlaced with regions occupied by other classes. An application to DNA microarray analysis indicates that the cyclic approximation is effective even for high-dimensional data. It can employ feature variables in an efficient way to reduce the prediction error significantly. This is critical when the true classification relies on nonreducible high-dimensional features. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
775
786
http://hdl.handle.net/10.1093/biomet/ass047
application/pdf
Access to full text is restricted to subscribers.
J. Yang
K. Miescke
P. McCullagh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:929-9442013-01-01RePEc:oup:biomet
article
Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction
Several two-stage multiple testing procedures have been proposed to detect gene-environment interaction in genome-wide association studies. In this article, we elucidate general conditions that are required for validity and power of these procedures, and we propose extensions of two-stage procedures using the case-only estimator of gene-treatment interaction in randomized clinical trials. We develop a unified estimating equation approach to proving asymptotic independence between a filtering statistic and an interaction test statistic in a range of situations, including marginal association and interaction in a generalized linear model with a canonical link. We assess the performance of various two-stage procedures in simulations and in genetic studies from Women's Health Initiative clinical trials. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
929
944
http://hdl.handle.net/10.1093/biomet/ass044
application/pdf
Access to full text is restricted to subscribers.
James Y. Dai
Charles Kooperberg
Michael Leblanc
Ross L. Prentice
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:981-9882013-01-01RePEc:oup:biomet
article
Finite population estimators in stochastic search variable selection
Monte Carlo algorithms are commonly used to identify a set of models for Bayesian model selection or model averaging. Because empirical frequencies of models are often zero or one in high-dimensional problems, posterior probabilities calculated from the observed marginal likelihoods, renormalized over the sampled models, are often employed. Such estimates are the only recourse in several newer stochastic search algorithms. In this paper, we prove that renormalization of posterior probabilities over the set of sampled models generally leads to bias that may dominate mean squared error. Viewing the model space as a finite population, we propose a new estimator based on a ratio of Horvitz--Thompson estimators that incorporates observed marginal likelihoods, but is approximately unbiased. This is shown to lead to a reduction in mean squared error compared to the empirical or renormalized estimators, with little increase in computational cost. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
981
988
http://hdl.handle.net/10.1093/biomet/ass040
application/pdf
Access to full text is restricted to subscribers.
Merlise A. Clyde
Joyee Ghosh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:995-10002013-01-01RePEc:oup:biomet
article
Proportional mean residual life model for right-censored length-biased data
To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcome, however, such a sampling strategy may be length biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to induced informative censoring in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes & Dasu (Biometrika 77, 409--10, 1990) for analysis of censored length-biased survival data. Several nonstandard data structures, including censoring of onset time and cross-sectional data without follow-up, can also be handled by the proposed methodology. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
995
1000
http://hdl.handle.net/10.1093/biomet/ass049
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
Ying Qing Chen
Chong-Zhi Di
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:879-8982013-01-01RePEc:oup:biomet
article
Scaled sparse linear regression
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise-level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for least-squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
879
898
http://hdl.handle.net/10.1093/biomet/ass043
application/pdf
Access to full text is restricted to subscribers.
Tingni Sun
Cun-Hui Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:763-7742013-01-01RePEc:oup:biomet
article
Testing one hypothesis twice in observational studies
In a matched observational study of treatment effects, a sensitivity analysis asks about the magnitude of the departure from random assignment that would need to be present to alter the conclusions of an analysis that assumes that matching for measured covariates removes all bias. The reported degree of sensitivity to unmeasured biases depends on both the process that generated the data and the chosen methods of analysis, so a poor choice of method may lead to an exaggerated report of sensitivity to bias. This suggests the possibility of performing more than one analysis with a correction for multiple inference, say testing one null hypothesis using two or three different tests. In theory and in an example, it is shown that, in large samples, the gains from testing twice will often be large, because testing twice has the larger of the two design sensitivities of the component tests, and the losses due to correcting for two tests will often be small, because two tests of one hypothesis will typically be highly correlated, so a correction for multiple testing that takes this into account will be small. An illustration uses data from the U.S. National Health and Nutrition Examination Survey concerning lead in the blood of cigarette smokers. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
763
774
http://hdl.handle.net/10.1093/biomet/ass032
application/pdf
Access to full text is restricted to subscribers.
P. R. Rosenbaum
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:973-9802013-01-01RePEc:oup:biomet
article
Statistical properties of an early stopping rule for resampling-based multiple testing
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
973
980
http://hdl.handle.net/10.1093/biomet/ass051
application/pdf
Access to full text is restricted to subscribers.
Hui Jiang
Julia Salzman
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:1001-10072013-01-01RePEc:oup:biomet
article
An efficient empirical likelihood approach for estimating equations with missing data
We explore the use of estimating equations for efficient statistical inference in case of missing data. We propose a semiparametric efficient empirical likelihood approach, and show that the empirical likelihood ratio statistic and its profile counterpart asymptotically follow central chi-square distributions when evaluated at the true parameter. The theoretical properties and practical performance of our approach are demonstrated through numerical simulations and data analysis. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
1001
1007
http://hdl.handle.net/10.1093/biomet/ass045
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Yongsong Qin
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:851-8642013-01-01RePEc:oup:biomet
article
Bidirectional discrimination with application to data visualization
Linear classifiers are very popular, but can have limitations when classes have distinct subpopulations. General nonlinear kernel classifiers are very flexible, but do not give clear interpretations and may not be efficient in high dimensions. We propose the bidirectional discrimination classification method, which generalizes linear classifiers to two or more hyperplanes. This new family of classification methods gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability, and much of the parsimony, of linear classifiers. They provide a new visualization tool for high-dimensional, low-sample-size data. Although the idea is generally applicable, we focus on the generalization of the support vector machine and distance-weighted discrimination methods. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. Our method leads to better classification performance in high-dimensional situations where subclusters are present in the data. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
851
864
http://hdl.handle.net/10.1093/biomet/ass029
application/pdf
Access to full text is restricted to subscribers.
Hanwen Huang
Yufeng Liu
J. S. Marron
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:959-9722013-01-01RePEc:oup:biomet
article
Bootstrap confidence bands for sojourn distributions in multistate semi-Markov models with right censoring
Transient semi-Markov processes have traditionally been used to describe the transitions of a patient through the various states of a multistate survival model. A survival distribution in this context is a sojourn through the states until passage to a fatal absorbing state or certain endpoint states. Using complete sojourn data, this paper shows how such survival distributions and associated hazard functions can be estimated nonparametrically and also how nonparametric bootstrap pointwise confidence bands can be constructed for them when patients are subject to independent right censoring from each state during the sojourn. Limitations to the estimability of such survival distributions that result from random censoring with bounded support are clarified. The methods are applicable to any sort of sojourn through any finite state process of arbitrary complexity involving feedback into previously occupied states. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
959
972
http://hdl.handle.net/10.1093/biomet/ass036
application/pdf
Access to full text is restricted to subscribers.
Ronald W. Butler
Douglas A. Bronson
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:813-8322013-01-01RePEc:oup:biomet
article
Dispersion operators and resistant second-order functional data analysis
Inferences related to the second-order properties of functional data, as expressed by covariance structure, can become unreliable when the data are non-Gaussian or contain unusual observations. In the functional setting, it is often difficult to identify atypical observations, as their distinguishing characteristics can be manifold but subtle. In this paper, we introduce the notion of a dispersion operator, investigate its use in probing the second-order structure of functional data, and develop a test for comparing the second-order characteristics of two functional samples that is resistant to atypical observations and departures from normality. The proposed test is a regularized M-test based on a spectrally truncated version of the Hilbert-Schmidt norm of a score operator defined via the dispersion operator. We derive the asymptotic distribution of the test statistic, investigate the behaviour of the test in a simulation study and illustrate the method on a structural biology dataset. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
813
832
http://hdl.handle.net/10.1093/biomet/ass037
application/pdf
Access to full text is restricted to subscribers.
David Kraus
Victor M. Panaretos
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:989-9942013-01-01RePEc:oup:biomet
article
Compatible weighted proper scoring rules
Many proper scoring rules such as the Brier and log scoring rules implicitly reward a probability forecaster relative to a uniform baseline distribution. Recent work has motivated weighted proper scoring rules, which have an additional baseline parameter. To date two families of weighted proper scoring rules have been introduced, the weighted power and pseudospherical scoring families. These families are compatible with the log scoring rule: when the baseline maximizes the log scoring rule over some set of distributions, the baseline also maximizes the weighted power and pseudospherical scoring rules over the same set. We characterize all weighted proper scoring families and prove a general property: every proper scoring rule is compatible with some weighted scoring family, and every weighted scoring family is compatible with some proper scoring rule. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
989
994
http://hdl.handle.net/10.1093/biomet/ass046
application/pdf
Access to full text is restricted to subscribers.
P. G. M. Forbes
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:787-7982013-01-01RePEc:oup:biomet
article
Orthogonalization of vectors with minimal adjustment
Two transformations are proposed that give orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim is that each component should be close to the vector with which it is paired, orthogonality imposing a constraint. The transformations lead to a variety of new statistical methods, including a unified approach to the identification and diagnosis of collinearities, a method of setting prior weights for Bayesian model averaging, and a means of calculating an upper bound for a multivariate Chebychev inequality. One transformation has the property that duplicating a vector has no effect on the orthogonal components that correspond to nonduplicated vectors, and is determined using a new algorithm that also provides the decomposition of a positive-definite matrix in terms of a diagonal matrix and a correlation matrix. The algorithm is shown to converge to a global optimum. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
787
798
http://hdl.handle.net/10.1093/biomet/ass041
application/pdf
Access to full text is restricted to subscribers.
Paul H. Garthwaite
Frank Critchley
Karim Anaya-Izquierdo
Emmanuel Mubwandarikwa
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:833-8492013-01-01RePEc:oup:biomet
article
A geometric approach to projective shape and the cross ratio
Projective shape consists of the information about a configuration of points that is invariant under projective transformations. It is an important tool in machine vision to pick out features that are invariant to the choice of camera view. The simplest example is the cross ratio for a set of four collinear points. Recent work involving ideas from multivariate robustness enables us to introduce here a natural preshape on projective shape space. This makes it possible to adapt the Procrustes analysis that forms the basis of much methodology in the simpler setting of similarity shape space. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
833
849
http://hdl.handle.net/10.1093/biomet/ass055
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Kanti V. Mardia
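The abstract above names the cross ratio of four collinear points as the simplest projective invariant. A minimal sketch of that invariance follows, assuming points on a line are represented by scalar coordinates and a projective transformation of the line by a Mobius map; the function names `cross_ratio` and `mobius` and the chosen coefficients are illustrative and do not come from the paper.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four distinct collinear points,
    each given by its coordinate along the line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def mobius(x, p, q, r, s):
    """Projective map of the line, x -> (p*x + q) / (r*x + s).
    Requires p*s - q*r != 0 so that the map is invertible."""
    return (p * x + q) / (r * x + s)


# Four collinear points, in exact rational arithmetic.
points = [Fraction(0), Fraction(1), Fraction(3), Fraction(7)]

# An arbitrary invertible projective map (illustrative coefficients).
images = [mobius(x, 2, 1, 1, 3) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*images)
print(before, after)  # both cross ratios are the same rational number
```

Exact fractions are used so that the equality of the two cross ratios can be checked without floating-point tolerance.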
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:757-7642014-06-14RePEc:oup:biomet
article
Empirical likelihood methods for two-dimensional shape analysis
We consider empirical likelihood for the mean similarity shape of objects in two dimensions described by labelled landmarks. The restriction to two dimensions permits the representation of preshapes as complex unit vectors. We focus on the use of empirical likelihood techniques for the construction of confidence regions for the mean shape and for testing the hypothesis of a common mean shape across several populations. Theoretical properties and computational details are discussed and the results of a simulation study are presented. Our results show that bootstrap calibrated empirical likelihood performs well in practice in the planar shape setting. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
757
764
http://hdl.handle.net/10.1093/biomet/asq028
application/pdf
Access to full text is restricted to subscribers.
Getulio J. A. Amaral
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:567-5842014-06-14RePEc:oup:biomet
article
Shape curves and geodesic modelling
A family of shape curves is introduced that is useful for modelling the changes in shape in a series of geometrical objects. The relationship between the preshape sphere and the shape space is used to define a general family of curves based on horizontal geodesics on the preshape sphere. Methods for fitting geodesics and more general curves in the non-Euclidean shape space of point sets are discussed, based on minimizing sums of squares of Procrustes distances. Likelihood-based inference is considered. We illustrate the ideas by carrying out statistical analysis of two-dimensional landmarks on rats' skulls at various times in their development and three-dimensional landmarks on lumbar vertebrae from three primate species. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
567
584
http://hdl.handle.net/10.1093/biomet/asq027
application/pdf
Access to full text is restricted to subscribers.
Kim Kenobi
Ian L. Dryden
Huiling Le
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:361-3742014-06-14RePEc:oup:biomet
article
Efficient estimation in multi-phase case-control studies
In this paper we discuss the analysis of multi-phase, or multi-stage, case-control studies and present an efficient semiparametric maximum-likelihood approach that unifies and extends earlier work, including the seminal case-control paper by Prentice & Pyke (1979), work by Breslow & Cain (1988), Scott & Wild (1991), Breslow & Holubkov (1997) and others. The theoretical derivations apply to arbitrary binary regression models but we present results for logistic regression and show that the approach can be implemented by including additional intercept terms in the logistic model and then making some simple corrections to the score and information equations used in a Newton-Raphson or Fisher-scoring maximization of the prospective loglikelihood. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
361
374
http://hdl.handle.net/10.1093/biomet/asq009
application/pdf
Access to full text is restricted to subscribers.
A. J. Lee
A. J. Scott
C. J. Wild
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:765-7722014-06-14RePEc:oup:biomet
article
Strictly stationary solutions of autoregressive moving average equations
Necessary and sufficient conditions for the existence of a strictly stationary solution of the equations defining an autoregressive moving average process driven by an independent and identically distributed noise sequence are determined. No moment assumptions on the driving noise sequence are made. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
765
772
http://hdl.handle.net/10.1093/biomet/asq034
application/pdf
Access to full text is restricted to subscribers.
Peter J. Brockwell
Alexander Lindner
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:347-3602014-06-14RePEc:oup:biomet
article
A theory for testing hypotheses under covariate-adaptive randomization
The covariate-adaptive randomization method was proposed for clinical trials long ago but little theoretical work has been done for statistical inference associated with it. Practitioners often apply test procedures available for simple randomization, which is controversial since procedures valid under simple randomization may not be valid under other randomization schemes. In this paper, we provide some theoretical results for testing hypotheses after covariate-adaptive randomization. We show that one way to obtain a valid test procedure is to use a correct model between outcomes and covariates, including those used in randomization. We also show that the simple two sample t-test, without using any covariate, is conservative under covariate-adaptive biased coin randomization in terms of its Type I error, and that a valid bootstrap t-test can be constructed. The powers of several tests are examined theoretically and empirically. Our study provides guidance for applications and sheds light on further research in this area. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
347
360
http://hdl.handle.net/10.1093/biomet/asq014
application/pdf
Access to full text is restricted to subscribers.
Jun Shao
Xinxin Yu
Bob Zhong
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:405-4182014-06-14RePEc:oup:biomet
article
Interval estimation for drop-the-losers designs
In the first stage of a two-stage, drop-the-losers design, a candidate for the best treatment is selected. At the second stage, additional observations are collected to decide whether the candidate is actually better than the control. The design also allows the investigator to stop the trial for ethical reasons at the end of the first stage if there is already strong evidence of futility or superiority. Two types of tests have recently been developed, one based on the combined means and the other based on the combined p-values, but corresponding interval estimators are unavailable except in special cases. The problem is that, in most cases, the interval estimators depend on the mean configuration of all treatments in the first stage, which is unknown. In this paper, we prove a basic stochastic ordering lemma that enables us to bridge the gap between hypothesis testing and interval estimation. The proposed confidence intervals achieve the nominal confidence level in certain special cases. Simulations show that decisions based on our intervals are usually more powerful than those based on existing methods. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
405
418
http://hdl.handle.net/10.1093/biomet/asq003
application/pdf
Access to full text is restricted to subscribers.
Samuel S. Wu
Weizhen Wang
Mark C. K. Yang
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:585-6012014-06-14RePEc:oup:biomet
article
A class of grouped Brunk estimators and penalized spline estimators for monotone regression
We study a class of monotone univariate regression estimators. We use B-splines to approximate an underlying regression function and estimate spline coefficients based on grouped data. We investigate asymptotic properties of two monotone estimators: a grouped Brunk estimator and a penalized monotone estimator. These estimators are consistent at the boundary and their mean square errors achieve optimal convergence rates under suitable assumptions of the true regression function. Asymptotic distributions are developed and are shown to be independent of spline degrees and the number of knots. Simulation results and car data illustrate performance of the proposed estimators. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
585
601
http://hdl.handle.net/10.1093/biomet/asq029
application/pdf
Access to full text is restricted to subscribers.
Xiao Wang
Jinglai Shen
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:647-6592014-06-14RePEc:oup:biomet
article
Sufficient cause interactions for categorical and ordinal exposures with three levels
Definitions are given for weak and strong sufficient cause interactions in settings in which the outcome is binary and in which there are two exposures of interest that are categorical or ordinal. Weak sufficient cause interactions concern cases in which a mechanism will operate under certain values of the two exposures but not when one or the other of the exposures takes some other value. Strong sufficient cause interactions concern cases in which a mechanism will operate under certain values of the two exposures but not when one or the other of the exposures takes any other value. Empirical conditions are derived for such interactions when exposures have two or three levels and are related to regression coefficients in linear and log-linear models. When the exposures are binary, the notions of a weak and a strong sufficient cause interaction coincide, but not when the exposures are categorical or ordinal. The results are applied to examples concerning gene-gene and gene-environment interactions. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
647
659
http://hdl.handle.net/10.1093/biomet/asq030
application/pdf
Access to full text is restricted to subscribers.
Tyler J. Vanderweele
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:321-3322014-06-14RePEc:oup:biomet
article
On the relative efficiency of using summary statistics versus individual-level data in meta-analysis
Meta-analysis is widely used to synthesize the results of multiple studies. Although meta-analysis is traditionally carried out by combining the summary statistics of relevant studies, advances in technologies and communications have made it increasingly feasible to access the original data on individual participants. In the present paper, we investigate the relative efficiency of analyzing original data versus combining summary statistics. We show that, for all commonly used parametric and semiparametric models, there is no asymptotic efficiency gain by analyzing original data if the parameter of main interest has a common value across studies, the nuisance parameters have distinct values among studies, and the summary statistics are based on maximum likelihood. We also assess the relative efficiency of the two methods when the parameter of main interest has different values among studies or when there are common nuisance parameters across studies. We conduct simulation studies to confirm the theoretical results and provide empirical comparisons from a genetic association study. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
321
332
http://hdl.handle.net/10.1093/biomet/asq006
application/pdf
Access to full text is restricted to subscribers.
D. Y. Lin
D. Zeng
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:305-3192014-06-14RePEc:oup:biomet
article
Semiparametric dimension reduction estimation for mean response with missing data
Model misspecification can be a concern for high-dimensional data. Nonparametric regression obviates model specification but is impeded by the curse of dimensionality. This paper focuses on the estimation of the marginal mean response when there is missingness in the response and multiple covariates are available. We propose estimating the mean response through nonparametric functional estimation, where the dimension is reduced by a parametric working index. The proposed semiparametric estimator is robust to model misspecification: it is consistent for any working index if the missing mechanism of the response is known or correctly specified up to unknown parameters; even with misspecification in the missing mechanism, it is consistent so long as the working index can recover E(Y | X), the conditional mean response given the covariates. In addition, when the missing mechanism is correctly specified, the semiparametric estimator attains the optimal efficiency if E(Y | X) is recoverable through the working index. Robustness and efficiency of the proposed estimator are further investigated by simulations. We apply the proposed method to a clinical trial for HIV. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
305
319
http://hdl.handle.net/10.1093/biomet/asq005
application/pdf
Access to full text is restricted to subscribers.
Zonghui Hu
Dean A. Follmann
Jing Qin
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:481-4962014-06-14RePEc:oup:biomet
article
Likelihood ratio statistics based on an integrated likelihood
An integrated likelihood depends only on the parameter of interest and the data, so it can be used as a standard likelihood function for likelihood-based inference. In this paper, the higher-order asymptotic properties of the signed integrated likelihood ratio statistic for a scalar parameter of interest are considered. These results are used to construct a modified integrated likelihood ratio statistic and to suggest a class of prior densities to use in forming the integrated likelihood. The properties of the integrated likelihood ratio statistic are compared to those of the standard likelihood ratio statistic. Several examples show that the integrated likelihood ratio statistic can be a useful alternative to the standard likelihood ratio statistic. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
481
496
http://hdl.handle.net/10.1093/biomet/asq015
application/pdf
Access to full text is restricted to subscribers.
T. A. Severini
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:699-7122014-06-14RePEc:oup:biomet
article
A semiparametric additive rate model for recurrent events with an informative terminal event
We propose a semiparametric additive rate model for modelling recurrent events in the presence of a terminal event. The dependence between recurrent events and terminal event is nonparametric. A general transformation model is used to model the terminal event. We construct an estimating equation for parameter estimation and derive the asymptotic distributions of the proposed estimators. Simulation studies demonstrate that the proposed inference procedure performs well in realistic settings. Application to a medical study is presented. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
699
712
http://hdl.handle.net/10.1093/biomet/asq039
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Jianwen Cai
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:465-4802014-06-14RePEc:oup:biomet
article
The horseshoe estimator for sparse signals
This paper proposes a new approach to sparsity, called the horseshoe estimator, which arises from a prior based on multivariate-normal scale mixtures. We describe the estimator's advantages over existing approaches, including its robustness, adaptivity to different sparsity patterns and analytical tractability. We prove two theorems: one that characterizes the horseshoe estimator's tail robustness and the other that demonstrates a super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. Finally, using both real and simulated data, we show that the horseshoe estimator corresponds quite closely to the answers obtained by Bayesian model averaging under a point-mass mixture prior. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
465
480
http://hdl.handle.net/10.1093/biomet/asq017
application/pdf
Access to full text is restricted to subscribers.
Carlos M. Carvalho
Nicholas G. Polson
James G. Scott
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:727-7402014-06-14RePEc:oup:biomet
article
Estimating species richness by a Poisson-compound gamma model
We propose a Poisson-compound gamma approach for species richness estimation. Based on the denseness and nesting properties of the gamma mixture, we fix the shape parameter of each gamma component at a unified value, and estimate the mixture using nonparametric maximum likelihood. A least-squares crossvalidation procedure is proposed for the choice of the common shape parameter. The performance of the resulting estimator of N is assessed using numerical studies and genomic data. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
727
740
http://hdl.handle.net/10.1093/biomet/asq026
application/pdf
Access to full text is restricted to subscribers.
Ji-Ping Wang
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:631-6452014-06-14RePEc:oup:biomet
article
Detecting simultaneous changepoints in multiple sequences
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
631
645
http://hdl.handle.net/10.1093/biomet/asq025
application/pdf
Access to full text is restricted to subscribers.
Nancy R. Zhang
David O. Siegmund
Hanlee Ji
Jun Z. Li
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:279-2942014-06-14RePEc:oup:biomet
article
Dimension reduction for non-elliptically distributed predictors: second-order methods
Many classical dimension reduction methods, especially those based on inverse conditional moments, require the predictors to have elliptical distributions, or at least to satisfy a linearity condition. Such conditions, however, are too strong for some applications. Li and Dong (2009) introduced the notion of the central solution space and used it to modify first-order methods, such as sliced inverse regression, so that they no longer rely on these conditions. In this paper we generalize this idea to second-order methods, such as sliced average variance estimation and directional regression. In doing so we demonstrate that the central solution space is a versatile framework: we can use it to modify essentially all inverse conditional moment-based methods to relax the distributional assumption on the predictors. Simulation studies and an application show a substantial improvement of the modified methods over their classical counterparts. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
279
294
http://hdl.handle.net/10.1093/biomet/asq016
application/pdf
Access to full text is restricted to subscribers.
Yuexiao Dong
Bing Li
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:333-3452014-06-14RePEc:oup:biomet
article
Evidence factors in observational studies
Some experiments involve more than one random assignment of treatments to units. An analogous situation arises in certain observational studies, although randomization is not used, so each assignment may be biased. If each assignment is suspect, it is natural to ask whether there are separate pieces of information, dependent upon different assumptions, and perhaps whether conclusions about treatment effects are not critically dependent upon one or another suspect assumption. The design of an observational study contains evidence factors if it permits several statistically independent tests of the same null hypothesis about treatment effects, where these tests rely on different assumptions about treatment assignments at several levels of assignment. Two designs and two empirical examples are considered, one example of each design. In the dose-control design, there are matched pairs of a treated subject and an untreated control, and doses of treatment vary between pairs for treated subjects; this yields two evidence factors. In the varied intensity design, there are matched sets with two treated subjects and one or more untreated controls, where the two treated subjects within the same matched set receive different doses of treatment, and in a technically different way, the design yields two evidence factors. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
333
345
http://hdl.handle.net/10.1093/biomet/asq019
application/pdf
Access to full text is restricted to subscribers.
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:741-7552014-06-14RePEc:oup:biomet
article
Properties of nested sampling
Nested sampling is a simulation method for approximating marginal likelihoods. We establish that nested sampling has an approximation error that vanishes at the standard Monte Carlo rate and that this error is asymptotically Gaussian. It is shown that the asymptotic variance of the nested sampling approximation typically grows linearly with the dimension of the parameter. We discuss the applicability and efficiency of nested sampling in realistic problems, and compare it with two current methods for computing marginal likelihood. Finally, we propose an extension that avoids resorting to Markov chain Monte Carlo simulation to obtain the simulated points. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
741
755
http://hdl.handle.net/10.1093/biomet/asq021
application/pdf
Access to full text is restricted to subscribers.
Nicolas Chopin
Christian P. Robert
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:389-4042014-06-14RePEc:oup:biomet
article
Calibrating parametric subject-specific risk estimation
For modern evidence-based medicine, decisions on disease prevention or management strategies are often guided by a risk index system. For each individual, the system uses his/her baseline information to estimate the risk of experiencing a future disease-related clinical event. Such a risk scoring scheme is usually derived from an overly simplified parametric model. To validate a model-based procedure, one may perform a standard global evaluation via, for instance, a receiver operating characteristic analysis. In this article, we propose a method to calibrate the risk index system at a subject level. Specifically, we developed point and interval estimation procedures for t-year mortality rates conditional on the estimated parametric risk score. The proposals are illustrated with a dataset from a large clinical trial with post-myocardial infarction patients. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
389
404
http://hdl.handle.net/10.1093/biomet/asq012
application/pdf
Access to full text is restricted to subscribers.
T. Cai
L. Tian
Hajime Uno
Scott D. Solomon
L. J. Wei
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:661-6822014-06-14RePEc:oup:biomet
article
Bounded, efficient and doubly robust estimation with inverse weighting
Consider estimating the mean of an outcome in the presence of missing data or estimating population average treatment effects in causal inference. A doubly robust estimator remains consistent if an outcome regression model or a propensity score model is correctly specified. We build on a previous nonparametric likelihood approach and propose new doubly robust estimators, which have desirable properties in efficiency if the propensity score model is correctly specified, and in boundedness even if the inverse probability weights are highly variable. We compare the new and existing estimators in a simulation study and find that the robustified likelihood estimators yield overall the smallest mean squared errors. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
661
682
http://hdl.handle.net/10.1093/biomet/asq035
application/pdf
Access to full text is restricted to subscribers.
Zhiqiang Tan
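The abstract above states that a doubly robust estimator remains consistent if either the outcome regression model or the propensity score model is correctly specified. The sketch below illustrates one half of that property with the standard augmented inverse probability weighted (AIPW) estimator of a mean; this is the textbook form, not the new estimators proposed in the paper. The function name `aipw_mean`, the toy data and the deliberately wrong propensity values are all assumptions made for illustration.

```python
def aipw_mean(y, r, pi, m):
    """Standard AIPW estimate of E(Y) with missing outcomes.

    y  : outcomes (ignored where r == 0, so None is allowed there)
    r  : response indicators, 1 if y is observed and 0 otherwise
    pi : working propensity scores P(R = 1 | X), each in (0, 1]
    m  : working outcome-regression predictions of E(Y | X)
    """
    total = 0.0
    for yi, ri, pii, mi in zip(y, r, pi, m):
        weighted = ri * yi / pii if ri else 0.0   # inverse-weighted term
        total += weighted - (ri - pii) / pii * mi  # augmentation term
    return total / len(r)


# Toy data: the true outcome is y = 2x + 1 for x = 0, 1, 2, 3, so E(Y) = 4.
# The second outcome is missing.
y = [1.0, None, 5.0, 7.0]
r = [1, 0, 1, 1]

m_correct = [1.0, 3.0, 5.0, 7.0]   # outcome model is correct
pi_wrong = [0.5, 0.5, 0.5, 0.5]    # propensity model is arbitrary (hypothetical)

estimate = aipw_mean(y, r, pi_wrong, m_correct)
complete_case = sum(yi for yi, ri in zip(y, r) if ri) / sum(r)
print(estimate, complete_case)
```

Because the outcome predictions match the observed outcomes exactly in this toy case, the weighted residual terms cancel and the estimate reduces to the average of the predictions, whatever propensity values are supplied. The complete-case mean, by contrast, is pulled away from 4 by the missing observation.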
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:505-5122014-06-14RePEc:oup:biomet
article
Copula inference under censoring
This paper discusses copula model selection procedures and goodness-of-fit tests under censoring. The proposed methodology is based on a comparison of nonparametric and model-based estimators of the probability integral transformation, K. New weighted estimators for K are introduced. The resulting tests are compared to an existing approach by simulation and illustrated with an example involving bleeding changes in a woman's reproductive history. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
505
512
http://hdl.handle.net/10.1093/biomet/asq011
application/pdf
Access to full text is restricted to subscribers.
M. L. Lakhal-Chaieb
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:683-6982014-06-14RePEc:oup:biomet
article
Analysis of cohort studies with multivariate and partially observed disease classification data
Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
683
698
http://hdl.handle.net/10.1093/biomet/asq036
application/pdf
Access to full text is restricted to subscribers.
Nilanjan Chatterjee
Samiran Sinha
W. Ryan Diver
Heather Spencer Feigelson
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:519-5382014-06-14RePEc:oup:biomet
article
Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs
Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical and biological systems where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with the same structure may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimation of the adjacency matrix of directed acyclic graphs, when variables inherit a natural ordering. We study variable selection consistency of lasso and adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is only variable selection consistent under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
519
538
http://hdl.handle.net/10.1093/biomet/asq038
application/pdf
Access to full text is restricted to subscribers.
Ali Shojaie
George Michailidis
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:539-5502014-06-14RePEc:oup:biomet
article
A new approach to Cholesky-based covariance regularization in high dimensions
In this paper we propose a new regression interpretation of the Cholesky factor of the covariance matrix, as opposed to the well-known regression interpretation of the Cholesky factor of the inverse covariance, which leads to a new class of regularized covariance estimators suitable for high-dimensional problems. Regularizing the Cholesky factor of the covariance via this regression interpretation always results in a positive definite estimator. In particular, one can obtain a positive definite banded estimator of the covariance matrix at the same computational cost as the popular banded estimator of Bickel & Levina (2008b), which is not guaranteed to be positive definite. We also establish theoretical connections between banding Cholesky factors of the covariance matrix and its inverse and constrained maximum likelihood estimation under the banding constraint, and compare the numerical performance of several methods in simulations and on a sonar data example. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
539
550
http://hdl.handle.net/10.1093/biomet/asq022
application/pdf
Access to full text is restricted to subscribers.
Adam J. Rothman
Elizaveta Levina
Ji Zhu
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:713-7262014-06-14RePEc:oup:biomet
article
Attributable fraction functions for censored event times
Attributable fractions are commonly used to measure the impact of risk factors on disease incidence in the population. These static measures can be extended to functions of time when the time to disease occurrence or event time is of interest. The present paper deals with nonparametric and semiparametric estimation of attributable fraction functions for cohort studies with potentially censored event time data. The semiparametric models include the familiar proportional hazards model and a broad class of transformation models. The proposed estimators are shown to be consistent, asymptotically normal and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. A cardiovascular health study is provided. Connections to causal inference are discussed. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
713
726
http://hdl.handle.net/10.1093/biomet/asq023
application/pdf
Access to full text is restricted to subscribers.
Li Chen
D. Y. Lin
Donglin Zeng
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:447-4642014-06-14RePEc:oup:biomet
article
A sequential smoothing algorithm with linear computational cost
In this paper we propose a new particle smoother that has a computational complexity of O(N), where N is the number of particles. This compares favourably with the O(N^2) computational cost of most smoothers. The new method also overcomes some degeneracy problems in existing algorithms. Through simulation studies we show that substantial gains in efficiency are obtained for practical amounts of computational cost. It is shown both through these simulation studies, and by the analysis of an athletics dataset, that our new method also substantially outperforms the simple filter-smoother, the only other smoother with computational cost that is O(N). Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
447
464
http://hdl.handle.net/10.1093/biomet/asq013
application/pdf
Access to full text is restricted to subscribers.
Paul Fearnhead
David Wyncoll
Jonathan Tawn
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:435-4462014-06-14RePEc:oup:biomet
article
Estimating linear dependence between nonstationary time series using the locally stationary wavelet model
Large volumes of neuroscience data comprise multiple, nonstationary electrophysiological or neuroimaging time series recorded from different brain regions. Accurately estimating the dependence between such neural time series is critical, since changes in the dependence structure are presumed to reflect functional interactions between neuronal populations. We propose a new dependence measure, derived from a bivariate locally stationary wavelet time series model. Since wavelets are localized in both time and scale, this approach leads to a natural, local and multi-scale estimate of nonstationary dependence. Our methodology is illustrated by application to a simulated example, and to electrophysiological data relating to interactions between the rat hippocampus and prefrontal cortex during working memory and decision making. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
435
446
http://hdl.handle.net/10.1093/biomet/asq007
application/pdf
Access to full text is restricted to subscribers.
J. Sanderson
P. Fryzlewicz
M. W. Jones
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:621-6302014-06-14RePEc:oup:biomet
article
Accurate and robust tests for indirect inference
In this paper we propose accurate parameter and over-identification tests for indirect inference. Under the null hypothesis the new tests are asymptotically χ^2-distributed with a relative error of order n^{-1}. They exhibit better finite sample accuracy than classical tests for indirect inference, which have the same asymptotic distribution but an absolute error of order n^{-1/2}. Robust versions of the tests are also provided. We illustrate their accuracy in nonlinear regression, Poisson regression with overdispersion and diffusion models. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
621
630
http://hdl.handle.net/10.1093/biomet/asq040
application/pdf
Access to full text is restricted to subscribers.
Veronika Czellar
Elvezio Ronchetti
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:497-5042014-06-14RePEc:oup:biomet
article
Objective Bayes and conditional inference in exponential families
Objective Bayes methodology is considered for conditional frequentist inference about a canonical parameter in a multi-parameter exponential family. A condition is derived under which posterior Bayes quantiles match the conditional frequentist coverage to a higher-order approximation in terms of the sample size. This condition is on the model, not on the prior, and it ensures that any first-order probability matching prior in the unconditional sense automatically yields higher-order conditional probability matching. Objective Bayes methods are compared to parametric bootstrap and analytic methods for higher-order conditional frequentist inference. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
497
504
http://hdl.handle.net/10.1093/biomet/asq002
application/pdf
Access to full text is restricted to subscribers.
Thomas J. Diciccio
G. Alastair Young
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:861-8722013-08-16RePEc:oup:biomet
article
Nonparametric estimation of the probability of illness in the illness-death model under cross-sectional sampling
Cross-sectional sampling is an attractive design that saves resources but results in biased data. For proper inference, one should first discover the bias function and then weigh observations appropriately. We consider cross-sectioning of the illness-death model with the aim of estimating the probability of visiting the illness state before death. We develop simple consistent and asymptotically normal estimators under various assumptions on the model and data collection and, in particular, compare designs with and without a follow-up. These designs are common in surveillance of hospital acquired infections, but estimators currently in use do not properly correct the bias. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
861
872
http://hdl.handle.net/10.1093/biomet/asp046
application/pdf
Access to full text is restricted to subscribers.
M. Mandel
R. Fluss
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:975-9822013-08-16RePEc:oup:biomet
article
Maximum likelihood estimation using composite likelihoods for closed exponential families
In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same parameter values for any number of observations. Examples include log-linear models and multivariate normal models. In other cases the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate. An application is given to an example in directional data based on a bivariate von Mises distribution. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
975
982
http://hdl.handle.net/10.1093/biomet/asp056
application/pdf
Access to full text is restricted to subscribers.
Kanti V. Mardia
John T. Kent
Gareth Hughes
Charles C. Taylor
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:887-9012013-08-16RePEc:oup:biomet
article
Marginal hazards model for case-cohort studies with multiple disease outcomes
Case-cohort study designs are widely used to reduce the cost of large cohort studies while achieving the same goals, especially when the disease rate is low. A key advantage of the case-cohort study design is its capacity to use the same subcohort for several diseases or for several subtypes of disease. In order to compare the effect of a risk factor on different types of diseases, times to different events need to be modelled simultaneously. Valid statistical methods that take the correlations among the outcomes from the same subject into account need to be developed. To this end, we consider marginal proportional hazards regression models for case-cohort studies with multiple disease outcomes. We also consider generalized case-cohort designs that do not require sampling all the cases, which is more realistic for multiple disease outcomes. We propose an estimating equation approach for parameter estimation with two different types of weights. Consistency and asymptotic normality of the proposed estimators are established. Large sample approximation works well in small samples in simulation studies. The proposed methods are applied to the Busselton Health Study. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
887
901
http://hdl.handle.net/10.1093/biomet/asp059
application/pdf
Access to full text is restricted to subscribers.
S. Kang
J. Cai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1019-10232013-08-16RePEc:oup:biomet
article
A note on a conjectured sharpness principle for probabilistic forecasting with calibration
This note proves a weak sharpness principle as conjectured by Gneiting et al. (2007) in connection with probabilistic forecasting subject to calibration constraints. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1019
1023
http://hdl.handle.net/10.1093/biomet/asp054
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:835-8452013-08-16RePEc:oup:biomet
article
Bayesian lasso regression
The lasso estimate for linear regression corresponds to a posterior mode when independent, double-exponential prior distributions are placed on the regression coefficients. This paper introduces new aspects of the broader Bayesian treatment of lasso regression. A direct characterization of the regression coefficients' posterior distribution is provided, and computation and inference under this characterization is shown to be straightforward. Emphasis is placed on point estimation using the posterior mean, which facilitates prediction of future observations via the posterior predictive distribution. It is shown that the standard lasso prediction method does not necessarily agree with model-based, Bayesian predictions. A new Gibbs sampler for Bayesian lasso regression is introduced. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
835
845
http://hdl.handle.net/10.1093/biomet/asp047
application/pdf
Access to full text is restricted to subscribers.
Chris Hans
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:847-8602013-08-16RePEc:oup:biomet
article
Generalized fiducial inference for wavelet regression
We apply Fisher's fiducial idea to wavelet regression, first developing a general methodology for handling model selection problems within the fiducial framework. We propose fiducial-based methods for wavelet curve estimation and the construction of pointwise confidence intervals. We show that these confidence intervals have asymptotically correct coverage. Simulations demonstrate that they possess promising empirical properties. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
847
860
http://hdl.handle.net/10.1093/biomet/asp050
application/pdf
Access to full text is restricted to subscribers.
Jan Hannig
Thomas C. M. Lee
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1024-10242013-08-16RePEc:oup:biomet
article
'Generalized method of moments estimation for linear regression with clustered failure time data'
4
2009
96
Biometrika
1024
1024
http://hdl.handle.net/10.1093/biomet/asp061
application/pdf
Access to full text is restricted to subscribers.
Hui Li
Guosheng Yin
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:917-9322013-08-16RePEc:oup:biomet
article
A unified approach to linearization variance estimation from survey data after imputation for item nonresponse
Variance estimation after imputation is an important practical problem in survey sampling. When deterministic imputation or stochastic imputation is used, we show that the variance of the imputed estimator can be consistently estimated by a unifying linearize-and-reverse approach. We provide some applications of the approach to regression imputation, fractional categorical imputation, multiple imputation and composite imputation. Results from a simulation study, under a factorial structure for the sampling, response and imputation mechanisms, show that the proposed linearization variance estimator performs well in terms of relative bias, assuming a missing-at-random response mechanism. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
917
932
http://hdl.handle.net/10.1093/biomet/asp041
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:971-9742013-08-16RePEc:oup:biomet
article
Construction of orthogonal Latin hypercube designs
Latin hypercube designs have found wide application. Such designs guarantee uniform samples for the marginal distribution of each input variable. We propose a method for constructing orthogonal Latin hypercube designs in which all the linear terms are orthogonal not only to each other, but also to the quadratic terms. This construction method is convenient and flexible, and the resulting designs can accommodate many more factors than can existing ones. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
971
974
http://hdl.handle.net/10.1093/biomet/asp058
application/pdf
Access to full text is restricted to subscribers.
Fasheng Sun
Min-Qian Liu
Dennis K. J. Lin
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:983-9902013-08-16RePEc:oup:biomet
article
Adaptive approximate Bayesian computation
Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappé et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
983
990
http://hdl.handle.net/10.1093/biomet/asp052
application/pdf
Access to full text is restricted to subscribers.
Mark A. Beaumont
Jean-Marie Cornuet
Jean-Michel Marin
Christian P. Robert
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:873-8862013-08-16RePEc:oup:biomet
article
Nonparametric estimation for right-censored length-biased data: a pseudo-partial likelihood approach
To estimate the lifetime distribution of right-censored length-biased data, we propose a pseudo-partial likelihood approach that allows us to derive two nonparametric estimators. With its closed-form estimators and explicit limiting variances, this approach retains the simplicity of conditional analysis, and has only a small efficiency loss compared with the unconditional analysis. Under some regularity conditions, we show that the two estimators are uniformly consistent and converge weakly to Gaussian processes. A simulation study demonstrates that the proposed estimators have satisfactory finite-sample performance. Application to an Alzheimer's disease study is reported. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
873
886
http://hdl.handle.net/10.1093/biomet/asp064
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:821-8342013-08-16RePEc:oup:biomet
article
Bayesian analysis of matrix normal graphical models
We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
821
834
http://hdl.handle.net/10.1093/biomet/asp049
application/pdf
Access to full text is restricted to subscribers.
Hao Wang
Mike West
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:781-7922013-08-16RePEc:oup:biomet
article
A new look at time series of counts
This paper proposes a simple new model for stationary time series of integer counts. Previous work has focused on thinning methods and classical time series autoregressive moving-average difference equations; in contrast, our methods use a renewal process to generate a correlated sequence of Bernoulli trials. By superpositioning independent copies of such processes, stationary series with binomial, Poisson, geometric or any other discrete marginal distribution can be readily constructed. The model class proposed is parsimonious, non-Markov and readily generates series with either short- or long-memory autocovariances. The model can be fitted with linear prediction techniques for stationary series. As an example, a stationary series with binomial marginal distributions is fitted to the number of rainy days in 210 consecutive weeks at Key West, Florida. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
781
792
http://hdl.handle.net/10.1093/biomet/asp057
application/pdf
Access to full text is restricted to subscribers.
Yunwei Cui
Robert Lund
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:998-10042013-08-16RePEc:oup:biomet
article
A note on the variance of doubly-robust G-estimators
A recursive variance calculation is derived for doubly-robust G-estimators for dynamic treatment regimes in a multi-interval setting. Treatment decision parameters are not assumed to be shared across treatment intervals; this independence of parameters permits sequential estimation of the G-estimators' variance when G-estimation is performed in a sequential fashion. The recursive variance calculation is both natural and computationally feasible. This development can easily be adapted to other complex estimating procedures that require nuisance parameter estimation. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
998
1004
http://hdl.handle.net/10.1093/biomet/asp043
application/pdf
Access to full text is restricted to subscribers.
E. E. M. Moodie
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:945-9562013-08-16RePEc:oup:biomet
article
Sliced space-filling designs
We propose an approach to constructing a new type of design, a sliced space-filling design, intended for computer experiments with qualitative and quantitative factors. The approach starts with constructing a Latin hypercube design based on a special orthogonal array for the quantitative factors and then partitions the design into groups corresponding to different level combinations of the qualitative factors. The points in each group have good space-filling properties. Some illustrative examples are given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
945
956
http://hdl.handle.net/10.1093/biomet/asp044
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:903-9152013-08-16RePEc:oup:biomet
article
Tests and confidence intervals for secondary endpoints in sequential clinical trials
In a sequential clinical trial whose stopping rule depends on the primary endpoint, inference on secondary endpoints is an important long-standing problem. Ignoring the possibility of early stopping based on the primary endpoint may result in substantial bias. To address this problem, a commonly used approach is to develop bias correction by estimating the bias in the case of bivariate normal outcomes and appealing to joint asymptotic normality of the statistics associated with the primary and secondary endpoints. We propose herein a new approach that uses resampling and a novel ordering scheme in the sample space of sequential statistics observed up to a stopping time. This approach is shown to provide accurate inference in complex clinical trials, including time-sequential trials with survival endpoints and covariates. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
903
915
http://hdl.handle.net/10.1093/biomet/asp063
application/pdf
Access to full text is restricted to subscribers.
Tze Leung Lai
Mei-Chiung Shih
Zheng Su
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:957-9702013-08-16RePEc:oup:biomet
article
Nested Latin hypercube designs
We propose an approach to constructing nested Latin hypercube designs. Such designs are useful for conducting multiple computer experiments with different levels of accuracy. A nested Latin hypercube design with two layers is defined to be a special Latin hypercube design that contains a smaller Latin hypercube design as a subset. Our method is easy to implement and can accommodate any number of factors. We also extend this method to construct nested Latin hypercube designs with more than two layers. Illustrative examples are given. Some statistical properties of the constructed designs are derived. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
957
970
http://hdl.handle.net/10.1093/biomet/asp045
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1012-10182013-08-16RePEc:oup:biomet
article
A note on adaptive Bonferroni and Holm procedures under dependence
Hochberg & Benjamini (1990) first presented adaptive procedures for controlling familywise error rate. However, until now, it has not been proved that these procedures control the familywise error rate. We introduce a simplified version of Hochberg & Benjamini's adaptive Bonferroni and Holm procedures. Assuming a conditional dependence model, we prove that the former procedure controls the familywise error rate in finite samples while the latter controls it approximately. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1012
1018
http://hdl.handle.net/10.1093/biomet/asp048
application/pdf
Access to full text is restricted to subscribers.
Wenge Guo
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:793-8042013-08-16RePEc:oup:biomet
article
Bias reduction in exponential family nonlinear models
In Firth (1993, Biometrika) it was shown how the leading term in the asymptotic bias of the maximum likelihood estimator is removed by adjusting the score vector, and that in canonical-link generalized linear models the method is equivalent to maximizing a penalized likelihood that is easily implemented via iterative adjustment of the data. Here a more general family of bias-reducing adjustments is developed for a broad class of univariate and multivariate generalized nonlinear models. The resulting formulae for the adjusted score vector are computationally convenient, and in univariate models they directly suggest implementation through an iterative scheme of data adjustment. For generalized linear models a necessary and sufficient condition is given for the existence of a penalized likelihood interpretation of the method. An illustrative application to the Goodman row-column association model shows how the computational simplicity and statistical benefits of bias reduction extend beyond generalized linear models. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
793
804
http://hdl.handle.net/10.1093/biomet/asp055
application/pdf
Access to full text is restricted to subscribers.
Ioannis Kosmidis
David Firth
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:991-9972013-08-16RePEc:oup:biomet
article
Semiparametric methods for evaluating risk prediction markers in case-control studies
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
991
997
http://hdl.handle.net/10.1093/biomet/asp040
application/pdf
Access to full text is restricted to subscribers.
Ying Huang
Margaret Sullivan Pepe
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:421-437.2015-07-30RePEc:oup:biomet
article
Effective dimension reduction for sparse functional data
We propose a method of effective dimension reduction for functional data, emphasizing the sparse design where one observes only a few noisy and irregular measurements for some or all of the subjects. The proposed method borrows strength across the entire sample and provides a way to characterize the effective dimension reduction space, via functional cumulative slicing. Our theoretical study reveals a bias-variance trade-off associated with the regularizing truncation and decaying structures of the predictor process and the effective dimension reduction space. A simulation study and an application illustrate the superior finite-sample performance of the method.
2
2015
102
Biometrika
421
437
http://hdl.handle.net/10.1093/biomet/asv006
application/pdf
Access to full text is restricted to subscribers.
F. Yao
E. Lei
Y. Wu
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:831-847.2015-07-30RePEc:oup:biomet
article
Interactive model building for Q-learning
Evidence-based rules for optimal treatment allocation are key components in the quest for efficient, effective health-care delivery. Q-learning, an approximate dynamic programming algorithm, is a popular method for estimating optimal sequential decision rules from data. Q-learning requires the modelling of nonsmooth, nonmonotone transformations of the data, complicating the search for adequately expressive, yet parsimonious, statistical models. The default Q-learning working model is multiple linear regression, which not only is misspecified under most data-generating models but also results in nonregular regression estimators, complicating inference. We propose an alternative strategy for estimating optimal sequential decision rules for which the requisite statistical modelling does not depend on nonsmooth, nonmonotone transformed data, does not result in nonregular regression estimators, is consistent under more data-generation models than is Q-learning, results in estimated sequential decision rules that have better sampling properties, and is amenable to established statistical methods for exploratory data analysis, model building and validation. We derive the new method, IQ-learning, via an interchange in the order of certain steps in Q-learning. In simulated experiments, IQ-learning improves upon Q-learning in terms of integrated mean-squared error and power. The method is illustrated using data from a study of major depressive disorder.
4
2014
101
Biometrika
831
847
http://hdl.handle.net/10.1093/biomet/asu043
application/pdf
Access to full text is restricted to subscribers.
Eric B. Laber
Kristin A. Linn
Leonard A. Stefanski
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:267-280.2015-07-30RePEc:oup:biomet
article
Hierarchical recognition of sparse patterns in large-scale simultaneous inference
We study how to separate signals from noisy data accurately and determine the patterns of the selected signals. Controlling the inflation of false positive errors is important in large-scale simultaneous inference but has not been addressed in the pattern recognition literature. We develop a decision-theoretic framework and formulate the sparse pattern recognition problem as a simultaneous inference problem with multiple decision trees. Oracle and adaptive classifiers are proposed for maximizing the expected number of true positives subject to a constraint on the overall false positive rate. Existing results on multiple testing are extended by allowing more than two states of nature, hierarchical decision-making and new error rate concepts.
2
2015
102
Biometrika
267
280
http://hdl.handle.net/10.1093/biomet/asv012
application/pdf
Access to full text is restricted to subscribers.
Wenguang Sun
Zhi Wei
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:319-332.2015-07-30RePEc:oup:biomet
article
Latin hypercube designs with controlled correlations and multi-dimensional stratification
Various methods have been proposed to construct Latin hypercube designs with small correlations. Orthogonal arrays have been used to construct Latin hypercube designs with multi-dimensional stratification. To integrate these two ideas, we propose a method to construct Latin hypercube designs with both controlled correlations and multi-dimensional stratification. For numerical integration, the constructed designs not only filter out lower-dimensional variance components as effectively as ordinary orthogonal array-based Latin hypercube designs, but also filter out bilinear terms more effectively. The proposed construction method entails no iterative searches. Sampling properties of the constructed designs are derived. Examples are given to illustrate the proposed construction method and the theoretical results.
2
2014
101
Biometrika
319
332
http://hdl.handle.net/10.1093/biomet/ast062
application/pdf
Access to full text is restricted to subscribers.
Jiajie Chen
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:151-168.2015-07-30RePEc:oup:biomet
article
Doubly robust learning for estimating individualized treatment with censored data
Individualized treatment rules recommend treatments based on individual patient characteristics in order to maximize clinical benefit. When the clinical outcome of interest is survival time, estimation is often complicated by censoring. We develop nonparametric methods for estimating an optimal individualized treatment rule in the presence of censored data. To adjust for censoring, we propose a doubly robust estimator which requires correct specification of either the censoring model or survival model, but not both; the method is shown to be Fisher consistent when either model is correct. Furthermore, we establish the convergence rate of the expected survival under the estimated optimal individualized treatment rule to the expected survival under the optimal individualized treatment rule. We illustrate the proposed methods using a simulation study and data from a Phase III clinical trial on non-small cell lung cancer.
1
2015
102
Biometrika
151
168
http://hdl.handle.net/10.1093/biomet/asu050
application/pdf
Access to full text is restricted to subscribers.
Y. Q. Zhao
D. Zeng
E. B. Laber
R. Song
M. Yuan
M. R. Kosorok
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:205-218.2015-07-30RePEc:oup:biomet
article
Model averaging and weight choice in linear mixed-effects models
This article studies model averaging for linear mixed-effects models. We establish an unbiased estimator of the squared risk for the model averaging, and use the estimator as a criterion for choosing weights. The resulting model average estimator is proved to be asymptotically optimal under some regularity conditions. Simulation experiments show it is superior or comparable to estimators based on the final models selected by the commonly used methods and some existing averaging procedures. The proposed procedure is applied to data from an AIDS clinical trial.
1
2014
101
Biometrika
205
218
http://hdl.handle.net/10.1093/biomet/ast052
application/pdf
Access to full text is restricted to subscribers.
Xinyu Zhang
Guohua Zou
Hua Liang
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:978-984.2015-07-30RePEc:oup:biomet
article
Tests for Kronecker envelope models in multilinear principal components analysis
We develop likelihood methods for the Kronecker envelope model in the principal components analysis of matrix observations that have a multivariate normal distribution. Maximum likelihood estimates are derived and the associated likelihood ratio statistic for a test of this Kronecker envelope model is obtained. The asymptotic null distribution of the likelihood ratio statistic is derived as some nuisance parameters approach infinity, and a saddlepoint approximation for this limiting distribution is given. An alternative composite test for the Kronecker envelope model, which can be used when the sample size is too small to use the likelihood ratio test, is also given. Simulation results demonstrate the accuracy of our approximations.
4
2014
101
Biometrika
978
984
http://hdl.handle.net/10.1093/biomet/asu029
application/pdf
Access to full text is restricted to subscribers.
James R. Schott
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:393-408.2015-07-30RePEc:oup:biomet
article
A combined estimating function approach for fitting stationary point process models
A composite likelihood technique based on pairwise contributions provides a computationally simple but potentially inefficient approach for fitting spatial point process models. We propose a new estimation procedure that improves the efficiency. Our approach combines estimating functions derived from pairwise composite likelihood estimation and estimating functions that account for correlations among the pairwise contributions. Our method can be used to fit a variety of parametric spatial point process models and can yield more efficient estimators for the clustering parameters than pairwise composite likelihood estimation. We demonstrate the efficacy of our proposed method through a simulation study and an application to the longleaf pine data.
2
2014
101
Biometrika
393
408
http://hdl.handle.net/10.1093/biomet/ast069
application/pdf
Access to full text is restricted to subscribers.
C. Deng
R. P. Waagepetersen
Y. Guan
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:849-864.2015-07-30RePEc:oup:biomet
article
Estimation of a semiparametric natural direct effect model incorporating baseline covariates
Establishing cause-effect relationships is a standard goal of empirical science. Once the existence of a causal relationship is established, the precise causal mechanism involved becomes a topic of interest. A particularly popular type of mechanism analysis concerns questions of mediation, i.e., to what extent an effect is direct, and to what extent it is mediated by a third variable. A semiparametric theory has recently been proposed that allows multiply robust estimation of direct and mediated marginal effect functionals in observational studies (Tchetgen Tchetgen & Shpitser, 2012). In this paper we extend the theory to handle parametric models of natural direct and indirect effects within levels of pre-exposure variables with an identity or log link function, where the model for the observed data likelihood is otherwise unrestricted. We show that estimation is generally infeasible in such a model because of the curse of dimensionality associated with the required estimation of auxiliary conditional densities or expectations, given high-dimensional covariates. Thus, we consider multiply robust estimation and propose a more general model which assumes that a subset, but not the entirety, of several working models holds.
4
2014
101
Biometrika
849
864
http://hdl.handle.net/10.1093/biomet/asu044
application/pdf
Access to full text is restricted to subscribers.
E. J. Tchetgen Tchetgen
I. Shpitser
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:365-375.2015-07-30RePEc:oup:biomet
article
Locally ϕp-optimal designs for generalized linear models with a single-variable quadratic polynomial predictor
Finding optimal designs for generalized linear models is a challenging problem. Recent research has identified the structure of optimal designs for generalized linear models with single or multiple unrelated explanatory variables that appear as first-order terms in the predictor. We consider generalized linear models with a single-variable quadratic polynomial as the predictor under a popular family of optimality criteria. When the design region is unrestricted, our results establish that optimal designs can be found within a subclass of designs based on a small support with symmetric structure. We show that the same conclusion holds with certain restrictions on the design region, but in other cases a larger subclass may have to be considered. In addition, we derive explicit expressions for some D-optimal designs.
2
2014
101
Biometrika
365
375
http://hdl.handle.net/10.1093/biomet/ast071
application/pdf
Access to full text is restricted to subscribers.
Hsin-Ping Wu
John Stufken
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:711-718.2015-07-30RePEc:oup:biomet
article
Posterior expectation based on empirical likelihoods
Posterior expectation is widely used as a Bayesian point estimator. In this note we extend it from parametric models to nonparametric models using empirical likelihood, and develop a nonparametric analogue of James–Stein estimation. We use the Laplace method to establish asymptotic approximations to our proposed posterior expectations, and show by simulation that they are often more efficient than the corresponding classical nonparametric procedures, especially when the underlying data are skewed.
3
2014
101
Biometrika
711
718
http://hdl.handle.net/10.1093/biomet/asu018
application/pdf
Access to full text is restricted to subscribers.
A. Vexler
G. Tao
A. D. Hutson
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:37-55.2015-07-30RePEc:oup:biomet
article
Information criteria for variable selection under sparsity
The optimization of an information criterion in a variable selection procedure leads to an additional bias, which can be substantial for sparse, high-dimensional data. One can compensate for the bias by applying shrinkage while estimating within the selected models. This paper presents modified information criteria for use in variable selection and estimation without shrinkage. The analysis motivating the modified criteria follows two routes. The first, which we explore for signal-plus-noise observations only, proceeds by comparing estimators with and without shrinkage. The second, discussed for general regression models, describes the optimization or selection bias as a double-sided effect, which we call a mirror effect: among the numerous insignificant variables, those with large, noisy values appear more valuable than an arbitrary variable, while in fact they carry more noise than an arbitrary variable. The mirror effect is investigated for Akaike’s information criterion and for Mallows’ Cp, with special attention paid to the latter criterion as a stopping rule in a least-angle regression routine. The result is a new stopping rule, which focuses not on the quality of a lasso shrinkage selection but on the least-squares estimator without shrinkage within the same selection.
1
2014
101
Biometrika
37
55
http://hdl.handle.net/10.1093/biomet/ast055
application/pdf
Access to full text is restricted to subscribers.
Maarten Jansen
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:397-408.2015-07-30RePEc:oup:biomet
article
Jump information criterion for statistical inference in estimating discontinuous curves
Nonparametric regression analysis when the regression function is discontinuous has many applications. Existing methods for estimating a discontinuous regression curve usually assume that the number of jumps in the regression curve is known beforehand, which is unrealistic in some situations. Although there has been research on estimation of a discontinuous regression curve when the number of jumps is unknown, the problem remains mostly open because such research often requires assumptions on other related quantities, such as a known minimum jump size. In this paper we propose a jump information criterion which consists of a term measuring the fidelity of the estimated regression curve to the observed data and a penalty related to the number of jumps and the jump sizes. The number of jumps can then be determined by minimizing our criterion. Theoretical and numerical studies show that our method works well.
2
2015
102
Biometrika
397
408
http://hdl.handle.net/10.1093/biomet/asv018
application/pdf
Access to full text is restricted to subscribers.
Zhiming Xia
Peihua Qiu
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:971-977.2015-07-30RePEc:oup:biomet
article
Generalized Cornfield conditions for the risk difference
A central question in causal inference with observational studies is the sensitivity of conclusions to unmeasured confounding. The classical Cornfield condition allows us to assess whether an unmeasured binary confounder can explain away the observed relative risk of the exposure on the outcome. It states that for an unmeasured confounder to explain away an observed relative risk, the association between the unmeasured confounder and the exposure and the association between the unmeasured confounder and the outcome must both be larger than the observed relative risk. In this paper, we extend the classical Cornfield condition in three directions. First, we consider analogous conditions for the risk difference and allow for a categorical, not just a binary, unmeasured confounder. Second, we provide more stringent thresholds that the maximum of the above-mentioned associations must satisfy, rather than weaker conditions that both must satisfy. Third, we show that all the earlier results on Cornfield conditions hold under weaker assumptions than previously used. We illustrate the potential applications by real examples, where our new conditions give more information than the classical ones.
4
2014
101
Biometrika
971
977
http://hdl.handle.net/10.1093/biomet/asu030
application/pdf
Access to full text is restricted to subscribers.
Peng Ding
Tyler J. VanderWeele
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:494-499.2015-07-30RePEc:oup:biomet
article
Optimum designs for two treatments with unequal variances in the presence of covariates
Optimum designs are described for two treatments with different variances when covariates are included in the model. The designs, a generalization of Neyman allocation, are required in personalized medicine to model the effect of covariates on the choice of treatment. The use of the designs in clinical trials is indicated. D-optimality of the designs is established using results from Kiefer’s general equivalence theorem. The results are obtained with the use of surprisingly elementary algebra.
2
2015
102
Biometrika
494
499
http://hdl.handle.net/10.1093/biomet/asu071
application/pdf
Access to full text is restricted to subscribers.
A. C. Atkinson
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:1-14.2015-07-30RePEc:oup:biomet
article
Warped functional regression
A characteristic feature of functional data is the presence of phase variability in addition to amplitude variability. Existing functional regression methods do not handle time variability in an explicit and efficient way. In this paper we introduce a functional regression method that incorporates time warping as an intrinsic part of the model. The method achieves good predictive power in a parsimonious way and allows unified statistical inference about phase and amplitude components. The asymptotic distribution of the estimators is derived and their finite-sample properties are studied by simulation. An application involving ground-level ozone trajectories is presented.
1
2015
102
Biometrika
1
14
http://hdl.handle.net/10.1093/biomet/asu054
application/pdf
Access to full text is restricted to subscribers.
Daniel Gervini
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:377-392.2015-07-30RePEc:oup:biomet
article
Logistic regression for spatial Gibbs point processes
We propose a computationally efficient technique, based on logistic regression, for fitting Gibbs point process models to spatial point pattern data. The score of the logistic regression is an unbiased estimating function and is closely related to the pseudolikelihood score. Implementation of our technique does not require numerical quadrature, and thus avoids a source of bias inherent in other methods. For stationary processes, we prove that the parameter estimator is strongly consistent and asymptotically normal, and propose a variance estimator. We demonstrate the efficiency and practicability of the method on a real dataset and in a simulation study.
2
2014
101
Biometrika
377
392
http://hdl.handle.net/10.1093/biomet/ast060
application/pdf
Access to full text is restricted to subscribers.
Adrian Baddeley
Jean-François Coeurjolly
Ege Rubak
Rasmus Waagepetersen
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:219-228.2015-07-30RePEc:oup:biomet
article
Identifiability of Gaussian structural equation models with equal error variances
We consider structural equation models in which variables can be written as a function of their parents and noise terms, which are assumed to be jointly independent. Corresponding to each structural equation model is a directed acyclic graph describing the relationships between the variables. In Gaussian structural equation models with linear functions, the graph can be identified from the joint distribution only up to Markov equivalence classes, assuming faithfulness. In this work, we prove full identifiability in the case where all noise variables have the same variance: the directed acyclic graph can be recovered from the joint Gaussian distribution. Our result has direct implications for causal inference: if the data follow a Gaussian structural equation model with equal error variances, then, assuming that all variables are observed, the causal structure can be inferred from observational data only. We propose a statistical method and an algorithm based on our theoretical findings.
1
2014
101
Biometrika
219
228
http://hdl.handle.net/10.1093/biomet/ast043
application/pdf
Access to full text is restricted to subscribers.
J. Peters
P. Bühlmann
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:85-101.2015-07-30RePEc:oup:biomet
article
Graph estimation with joint additive models
In recent years, there has been considerable interest in estimating conditional independence graphs in high dimensions. Most previous work assumed that the variables are multivariate Gaussian or that the conditional means of the variables are linearly related. Unfortunately, if these assumptions are violated, the resulting conditional independence estimates can be inaccurate. We propose a semiparametric method, graph estimation with joint additive models, which allows the conditional means of the features to take an arbitrary additive form. We present an efficient algorithm for computation of our estimator, and prove that it is consistent. We extend our method to estimation of directed graphs with known causal ordering. Using simulated data, we show that our method performs better than existing methods when there are nonlinear relationships among the features, and is comparable to methods that assume multivariate normality when the conditional means are linear. We illustrate our method on a cell signalling dataset.
1
2014
101
Biometrika
85
101
http://hdl.handle.net/10.1093/biomet/ast053
application/pdf
Access to full text is restricted to subscribers.
Arend Voorman
Ali Shojaie
Daniela Witten
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:239-246.2015-07-30RePEc:oup:biomet
article
A Wilcoxon–Mann–Whitney-type test for infinite-dimensional data
The Wilcoxon–Mann–Whitney test is a robust competitor of the $t$ test in the univariate setting. For finite-dimensional multivariate non-Gaussian data, several extensions of the Wilcoxon–Mann–Whitney test have been shown to outperform Hotelling's $T^{2}$ test. In this paper, we study a Wilcoxon–Mann–Whitney-type test based on spatial ranks in infinite-dimensional spaces, investigate its asymptotic properties, and compare it with several existing tests. The proposed test is shown to be robust with respect to outliers and to have better power than some competitors for certain distributions with heavy tails. We study its performance using real and simulated data.
1
2015
102
Biometrika
239
246
http://hdl.handle.net/10.1093/biomet/asu072
application/pdf
Access to full text is restricted to subscribers.
Anirvan Chakraborty
Probal Chaudhuri
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:484-490.2015-07-30RePEc:oup:biomet
article
Convergence of sample eigenvalues, eigenvectors, and principal component scores for ultra-high dimensional data
The development of high-throughput biomedical technologies has led to increased interest in the analysis of high-dimensional data where the number of features is much larger than the sample size. In this paper, we investigate principal component analysis under the ultra-high dimensional regime, where both the number of features and the sample size increase as the ratio of the two quantities also increases. We bridge the existing results from the finite and the high-dimension low sample size regimes, embedding the two regimes in a more general framework. We also numerically demonstrate the universal application of the results from the finite regime.
2
2014
101
Biometrika
484
490
http://hdl.handle.net/10.1093/biomet/ast064
application/pdf
Access to full text is restricted to subscribers.
Seunggeun Lee
Fei Zou
Fred A. Wright
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:47-64.2015-07-30RePEc:oup:biomet
article
Selection and estimation for mixed graphical models
We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different exponential family form. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.
1
2015
102
Biometrika
47
64
http://hdl.handle.net/10.1093/biomet/asu051
application/pdf
Access to full text is restricted to subscribers.
Shizhe Chen
Daniela M. Witten
Ali Shojaie
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:371-380.2015-07-30RePEc:oup:biomet
article
Maximum projection designs for computer experiments
Space-filling properties are important in designing computer experiments. The traditional maximin and minimax distance designs consider only space-filling in the full-dimensional space; this can result in poor projections onto lower-dimensional spaces, which is undesirable when only a few factors are active. Restricting maximin distance design to the class of Latin hypercubes can improve one-dimensional projections but cannot guarantee good space-filling properties in larger subspaces. We propose designs that maximize space-filling properties on projections to all subsets of factors. We call our designs maximum projection designs. Our design criterion can be computed at no more cost than a design criterion that ignores projection properties.
2
2015
102
Biometrika
371
380
http://hdl.handle.net/10.1093/biomet/asv002
application/pdf
Access to full text is restricted to subscribers.
V. Roshan Joseph
Evren Gul
Shan Ba
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:229-236.2015-07-30RePEc:oup:biomet
article
Multivariate sign-based high-dimensional tests for sphericity
This article concerns tests for sphericity in cases where the data dimension is larger than the sample size. The existing multivariate sign-based procedure (Hallin & Paindaveine, 2006) for sphericity is not robust with respect to high dimensionality, producing tests with Type I error rates that are much larger than the nominal levels. This is mainly due to bias from estimating the location parameter. We develop a correction that makes the existing test statistic robust with respect to high dimensionality, and show that the proposed test statistic is asymptotically normal under elliptical distributions. The proposed method allows the dimensionality to increase as the square of the sample size. Simulations demonstrate that it has good size and power in a wide range of settings.
1
2014
101
Biometrika
229
236
http://hdl.handle.net/10.1093/biomet/ast040
application/pdf
Access to full text is restricted to subscribers.
Changliang Zou
Liuhua Peng
Long Feng
Zhaojun Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:733-740.2015-07-30RePEc:oup:biomet
article
Inadmissibility of the best equivariant predictive density in the unknown variance case
This work treats the problem of estimating the predictive density of a random vector when both the mean vector and the variance are unknown. We prove that the density of reference in this context is inadmissible under the Kullback–Leibler loss in a nonasymptotic framework. Our result holds even when the dimension of the vector is strictly lower than three, which is surprising compared to the known variance setting. Finally, we discuss the relationship between the prediction and the estimation problems.
3
2014
101
Biometrika
733
740
http://hdl.handle.net/10.1093/biomet/asu024
application/pdf
Access to full text is restricted to subscribers.
A. Boisbunon
Y. Maruyama
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:303-317.2015-07-30RePEc:oup:biomet
article
Bayesian monotone regression using Gaussian process projection
Shape-constrained regression analysis has applications in dose-response modelling, environmental risk assessment, disease screening and many other areas. Incorporating the shape constraints can improve estimation efficiency and avoid implausible results. We propose a novel method, focusing on monotone curve and surface estimation, which uses Gaussian process projections. Our inference is based on projecting posterior samples from the Gaussian process. We develop theory on continuity of the projection and rates of contraction. Our approach leads to simple computation with good performance in finite samples. The proposed projection method can also be applied to other constrained-function estimation problems, including those in multivariate settings.
2
2014
101
Biometrika
303
317
http://hdl.handle.net/10.1093/biomet/ast063
application/pdf
Access to full text is restricted to subscribers.
Lizhen Lin
David B. Dunson
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:465-476.2015-07-30RePEc:oup:biomet
article
Bootstrap for the case-cohort design
The case-cohort design facilitates economical investigation of risk factors in a large survival study, with covariate data collected only from the cases and a simple random subset of the full cohort. Methods that accommodate the design have been developed for various semiparametric models, but most inference procedures are based on asymptotic distribution theory. Such inference can be cumbersome to derive and implement, and does not permit confidence band construction. While the bootstrap is an obvious alternative, it is unclear how to resample because of complications from the two-stage sampling design. We establish an equivalent sampling scheme, and propose a novel and versatile nonparametric bootstrap for robust inference with an appealingly simple single-stage resampling. Theoretical justification and numerical assessment are provided for a number of procedures under the proportional hazards model.
2
2014
101
Biometrika
465
476
http://hdl.handle.net/10.1093/biomet/asu004
application/pdf
Access to full text is restricted to subscribers.
Yijian Huang
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:381-395.2015-07-30RePEc:oup:biomet
article
Automatic structure recovery for additive models
We propose an automatic structure recovery method for additive models, based on a backfitting algorithm coupled with local polynomial smoothing, in conjunction with a new kernel-based variable selection strategy. Our method produces estimates of the set of noise predictors, the sets of predictors that contribute polynomially at different degrees up to a specified degree M, and the set of predictors that contribute beyond polynomially of degree M. We prove consistency of the proposed method, and describe an extension to partially linear models. Finite-sample performance of the method is illustrated via Monte Carlo studies and a real-data example.
2
2015
102
Biometrika
381
395
http://hdl.handle.net/10.1093/biomet/asu070
application/pdf
Access to full text is restricted to subscribers.
Yichao Wu
Leonard A. Stefanski
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:169-180.2015-07-30RePEc:oup:biomet
article
Using covariate-specific disease prevalence information to increase the power of case-control studies
Public registration databases and large cohort studies provide vital information on disease prevalence at various levels of a risk factor. This auxiliary information can be helpful in conducting statistical inference in a new study. We aim to develop a statistical procedure that improves the efficiency of the logistic regression model for a case-control study by utilizing auxiliary information on covariate-specific disease prevalence via a series of unbiased estimating equations. We adopt empirical likelihood for statistical inference, and demonstrate its advantages through simulation and an application.
1
2015
102
Biometrika
169
180
http://hdl.handle.net/10.1093/biomet/asu048
application/pdf
Access to full text is restricted to subscribers.
Jing Qin
Han Zhang
Pengfei Li
Demetrius Albanes
Kai Yu
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:325-343.2015-07-30RePEc:oup:biomet
article
Information-theoretic optimality of observation-driven time series models for continuous responses
We investigate information-theoretic optimality properties of the score function of the predictive likelihood as a device for updating a real-valued time-varying parameter in a univariate observation-driven model with continuous responses. We restrict our attention to models with updates of one lag order. The results provide theoretical justification for a class of score-driven models which includes the generalized autoregressive conditional heteroskedasticity model as a special case. Our main contribution is to show that only parameter updates based on the score will always reduce the local Kullback–Leibler divergence between the true conditional density and the model-implied conditional density. This result holds irrespective of the severity of model misspecification. We also show that use of the score leads to a considerably smaller global Kullback–Leibler divergence in empirically relevant settings. We illustrate the theory with an application to time-varying volatility models. We show that the reduction in Kullback–Leibler divergence across a range of different settings can be substantial compared to updates based on, for example, squared lagged observations.
2
2015
102
Biometrika
325
343
http://hdl.handle.net/10.1093/biomet/asu076
application/pdf
Access to full text is restricted to subscribers.
F. Blasques
S. J. Koopman
A. Lucas
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:599-612.2015-07-30RePEc:oup:biomet
article
Characterization of the likelihood continual reassessment method
This paper deals with the design of the likelihood continual reassessment method, which is an increasingly widely used model-based method for dose-finding studies. It is common to implement the method in a two-stage approach, whereby the model-based stage is activated after an initial sequence of patients has been treated. While this two-stage approach is practically appealing, it lacks a theoretical framework, and it is often unclear how the design components should be specified. This paper develops a general framework based on the coherence principle, from which we derive a design calibration process. A real clinical-trial example is used to demonstrate that the proposed process can be implemented in a timely and reproducible manner, while offering competitive operating characteristics. We explore the operating characteristics of different models within this framework and show the performance to be insensitive to the choice of dose-toxicity model.
3
2014
101
Biometrika
599
612
http://hdl.handle.net/10.1093/biomet/asu012
application/pdf
Access to full text is restricted to subscribers.
Xiaoyu Jia
Shing M. Lee
Ying Kuen Cheung
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:121-140.2015-07-30RePEc:oup:biomet
article
Nonparametric estimation of a periodic sequence in the presence of a smooth trend
We investigate a nonparametric regression model including a periodic component, a smooth trend function, and a stochastic error term. We propose a procedure to estimate the unknown period and the function values of the periodic component as well as the nonparametric trend function. The theoretical part of the paper establishes the asymptotic properties of our estimators. In particular, we show that our estimator of the period is consistent. In addition, we derive the convergence rates and the limiting distributions of our estimators of the periodic component and the trend function. The asymptotic results are complemented with a simulation study and an application to global temperature anomaly data.
1
2014
101
Biometrika
121
140
http://hdl.handle.net/10.1093/biomet/ast051
application/pdf
Access to full text is restricted to subscribers.
Michael Vogt
Oliver Linton
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:245-251.2015-07-30RePEc:oup:biomet
article
The mode functional is not elicitable
This article is concerned with point forecasting of a real-valued random variable with a general Lebesgue density. Answering a question of Gneiting (2011), it is shown that the mode is not elicitable, or, in other words, that it is impossible to find a loss or scoring function under which the mode is the Bayes predictor.
1
2014
101
Biometrika
245
251
http://hdl.handle.net/10.1093/biomet/ast048
application/pdf
Access to full text is restricted to subscribers.
C. Heinrich
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:1-15.2015-07-30RePEc:oup:biomet
article
Efficient inference for spatial extreme value processes associated to log-Gaussian random functions
Max-stable processes arise as the only possible nontrivial limits for maxima of affinely normalized identically distributed stochastic processes, and thus form an important class of models for the extreme values of spatial processes. Until recently, inference for max-stable processes has been restricted to the use of pairwise composite likelihoods, due to intractability of higher-dimensional distributions. In this work we consider random fields that are in the domain of attraction of a widely used class of max-stable processes, namely those constructed via manipulation of log-Gaussian random functions. For this class, we exploit limiting d-dimensional multivariate Poisson process intensities of the underlying process for inference on all d-vectors exceeding a high marginal threshold in at least one component, employing a censoring scheme to incorporate information below the marginal threshold. We also consider the d-dimensional distributions for the equivalent max-stable process, and perform full likelihood inference by exploiting the methods of Stephenson & Tawn (2005), where information on the occurrence times of extreme events is shown to dramatically simplify the likelihood. The Stephenson–Tawn likelihood is in fact simply a special case of the censored Poisson process likelihood. We assess the improvements in inference from both methods over pairwise likelihood methodology by simulation.
1
2014
101
Biometrika
1
15
http://hdl.handle.net/10.1093/biomet/ast042
application/pdf
Access to full text is restricted to subscribers.
Jennifer L. Wadsworth
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:269-284.2015-07-30RePEc:oup:biomet
article
Variance estimation in high-dimensional linear models
The residual variance and the proportion of explained variation are important quantities in many statistical models and model fitting procedures. They play an important role in regression diagnostics and model selection procedures, as well as in determining the performance limits in many problems. In this paper we propose new method-of-moments-based estimators for the residual variance, the proportion of explained variation and other related quantities, such as the ℓ2 signal strength. The proposed estimators are consistent and asymptotically normal in high-dimensional linear models with Gaussian predictors and errors, where the number of predictors d is proportional to the number of observations n; in fact, consistency holds even in settings where d/n → ∞. Existing results on residual variance estimation in high-dimensional linear models depend on sparsity in the underlying signal. Our results require no sparsity assumptions and imply that the residual variance and the proportion of explained variation can be consistently estimated even when d>n and the underlying signal itself is nonestimable. Numerical work suggests that some of our distributional assumptions may be relaxed. A real-data analysis involving gene expression data and single nucleotide polymorphism data illustrates the performance of the proposed methods.
2
2014
101
Biometrika
269
284
http://hdl.handle.net/10.1093/biomet/ast065
application/pdf
Access to full text is restricted to subscribers.
Lee H. Dicker
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:457-477.2015-07-30RePEc:oup:biomet
article
On the degrees of freedom of reduced-rank estimators in multivariate regression
We study the effective degrees of freedom of a general class of reduced-rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation. A finite-sample exact unbiased estimator is derived that admits a closed-form expression in terms of the thresholded singular values of the least-squares solution and hence is readily computable. The results continue to hold in the high-dimensional setting where both the predictor and the response dimensions may be larger than the sample size. The derived analytical form facilitates the investigation of theoretical properties and provides new insights into the empirical behaviour of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly-used naive estimator. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example.
2
2015
102
Biometrika
457
477
http://hdl.handle.net/10.1093/biomet/asu067
application/pdf
Access to full text is restricted to subscribers.
A. Mukherjee
K. Chen
N. Wang
J. Zhu
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:655-671.2015-07-30RePEc:oup:biomet
article
Variance bounding and geometric ergodicity of Markov chain Monte Carlo kernels for approximate Bayesian computation
Approximate Bayesian computation has emerged as a standard computational tool when dealing with intractable likelihood functions in Bayesian inference. We show that many common Markov chain Monte Carlo kernels used to facilitate inference in this setting can fail to be variance bounding and hence geometrically ergodic, which can have consequences for the reliability of estimates in practice. This phenomenon is typically independent of the choice of tolerance in the approximation. We prove that a recently introduced Markov kernel can inherit the properties of variance bounding and geometric ergodicity from its intractable Metropolis–Hastings counterpart, under reasonably weak conditions. We show that the computational cost of this alternative kernel is bounded whenever the prior is proper, and present indicative results for an example where spectral gaps and asymptotic variances can be computed, as well as an example involving inference for a partially and discretely observed, time-homogeneous, pure jump Markov process. We also supply two general theorems, one providing a simple sufficient condition for lack of variance bounding for reversible kernels and the other providing a positive result concerning inheritance of variance bounding and geometric ergodicity for mixtures of reversible kernels.
3
2014
101
Biometrika
655
671
http://hdl.handle.net/10.1093/biomet/asu027
application/pdf
Access to full text is restricted to subscribers.
Anthony Lee
Krzysztof Łatuszyński
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:703-710.2015-07-30RePEc:oup:biomet
article
Extended empirical likelihood for estimating equations
We derive an extended empirical likelihood for parameters defined by estimating equations which generalizes the original empirical likelihood to the full parameter space. Under mild conditions, the extended empirical likelihood has all the asymptotic properties of the original empirical likelihood. The first-order extended empirical likelihood is easy to use and substantially more accurate than the original empirical likelihood.
3
2014
101
Biometrika
703
710
http://hdl.handle.net/10.1093/biomet/asu014
application/pdf
Access to full text is restricted to subscribers.
Min Tsao
Fan Wu
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:285-302.2015-07-30RePEc:oup:biomet
article
Bayes and empirical Bayes: do they merge?
Bayesian inference is attractive due to its internal coherence and for often having good frequentist properties. However, eliciting an honest prior may be difficult, and common practice is to take an empirical Bayes approach using an estimate of the prior hyperparameters. Although not rigorous, the underlying idea is that, for a sufficiently large sample size, empirical Bayes methods should lead to similar inferential answers as a proper Bayesian inference. However, precise mathematical results on this asymptotic agreement seem to be missing. In this paper, we give results in terms of merging Bayesian and empirical Bayes posterior distributions. We study two notions of merging: Bayesian weak merging and frequentist merging in total variation. We also show that, under regularity conditions, the empirical Bayes approach asymptotically gives an oracle selection of the prior hyperparameters. Examples include empirical Bayes density estimation with Dirichlet process mixtures.
2
2014
101
Biometrika
285
302
http://hdl.handle.net/10.1093/biomet/ast067
application/pdf
Access to full text is restricted to subscribers.
S. Petrone
J. Rousseau
C. Scricciolo
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:964-970.2015-07-30RePEc:oup:biomet
article
Analytical p-value calculation for the higher criticism test in finite-d problems
The higher criticism test is effective for testing a joint null hypothesis against a sparse alternative, e.g., for testing the effect of a gene or genetic pathway that consists of d genetic markers. Accurate p-value calculations for the higher criticism test based on the asymptotic distribution require a very large d, which is not the case for the number of genetic variants in a gene or a pathway. In this paper we propose an analytical method for accurately computing the p-value of the higher criticism test for finite-d problems. Unlike previous treatments, this method does not rely on asymptotics in d or on simulation, and is exact for arbitrary d when the test statistics are normally distributed. The method is particularly computationally advantageous when d is not large. We illustrate the proposed method with a case-control genome-wide association study of lung cancer and compare its power with competing methods through simulations.
4
2014
101
Biometrika
964
970
http://hdl.handle.net/10.1093/biomet/asu033
application/pdf
Access to full text is restricted to subscribers.
Ian J. Barnett
Xihong Lin
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:985-991.2015-07-30RePEc:oup:biomet
article
Semiparametric maximum likelihood inference by using failed contact attempts to adjust for nonignorable nonresponse
In marketing research, social science and epidemiological studies, call-back of nonrespondents is standard. If respondents and nonrespondents tend to give different answers, the missing data are called nonignorable, and using them alone may produce biased results. To extend earlier work on nonresponse in the presence of call-backs, Alho (1990) proposed modelling the probability of response at each attempt through logistic regression, where outcomes of interest and covariates are explanatory variables. In this paper we propose a semiparametric maximum likelihood approach, and discuss large-sample properties and the semiparametric likelihood ratio statistic used to test whether the data are missing completely at random. Simulations are conducted to evaluate this approach and a modification of the method of Alho (1990). Data from the National Health Interview Survey are used for illustration.
4
2014
101
Biometrika
985
991
http://hdl.handle.net/10.1093/biomet/asu046
application/pdf
Access to full text is restricted to subscribers.
Jing Qin
Dean A. Follmann
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:71-84.2015-07-30RePEc:oup:biomet
article
Better subset regression
This paper studies the relationship between model fitting and screening performance to find efficient screening methods for high-dimensional linear regression models. Under a sparsity assumption we show in a general asymptotic setting that a subset that includes the true submodel always yields a smaller residual sum of squares than those that do not. To seek such a subset, we consider the optimization problem associated with best subset regression. An EM algorithm, known as orthogonalizing subset screening, and its accelerated version are proposed for searching for the best subset. Although the algorithms do not always find the best subset, their monotonicity makes the subset fit the data better than initial subsets, and thus the subset can have better screening performance asymptotically. Simulation results show that our methods are very competitive.
1
2014
101
Biometrika
71
84
http://hdl.handle.net/10.1093/biomet/ast041
application/pdf
Access to full text is restricted to subscribers.
Shifeng Xiong
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:359-370.2015-07-30RePEc:oup:biomet
article
A Möbius transformation-induced distribution on the torus
We propose a five-parameter bivariate wrapped Cauchy distribution as a unimodal model for toroidal data. It is highly tractable, displays numerous desirable properties, including marginal and conditional distributions that are all wrapped Cauchy, and arises as an appealing submodel of a six-parameter distribution obtained by applying a Möbius transformation to a pre-existing bivariate circular model. Method of moments and maximum likelihood estimation of its parameters are fast, and tests for independence and goodness-of-fit are available. An analysis involving dihedral angles of the proteinogenic amino acid tyrosine illustrates the distribution’s application. A Markov process for circular data is also explored.
2
2015
102
Biometrika
359
370
http://hdl.handle.net/10.1093/biomet/asv003
application/pdf
Access to full text is restricted to subscribers.
Shogo Kato
Arthur Pewsey
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:625-640.2015-07-30RePEc:oup:biomet
article
Multicategory angle-based large-margin classification
Large-margin classifiers are popular methods for classification. Among existing simultaneous multicategory large-margin classifiers, a common approach is to learn k different functions for a k-class problem with a sum-to-zero constraint. Such a formulation can be inefficient. We propose a new multicategory angle-based large-margin classification framework. The proposed angle-based classifiers consider a simplex-based prediction rule without the sum-to-zero constraint, and enjoy more efficient computation. Many binary large-margin classifiers can be naturally generalized for multicategory problems through the angle-based framework. Theoretical and numerical studies demonstrate the usefulness of the angle-based methods.
3
2014
101
Biometrika
625
640
http://hdl.handle.net/10.1093/biomet/asu017
application/pdf
Access to full text is restricted to subscribers.
Chong Zhang
Yufeng Liu
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:333-350.2015-07-30RePEc:oup:biomet
article
Permuting regular fractional factorial designs for screening quantitative factors
Fractional factorial designs are widely used in screening experiments. They are often chosen by the minimum aberration criterion, which regards factor levels as symbols. For designs with quantitative factors, however, permuting the levels for one or more factors could alter their geometrical structures and statistical properties. We provide a justification of the minimum β-aberration criterion for quantitative factors and study level permutations for regular fractional factorial designs in order to improve their efficiency for screening quantitative factors. We show how regular designs can be linearly permuted to reduce contamination of nonnegligible interactions on the estimation of linear effects without increasing the run size. We further show that such linear permutations are unique under the minimum β-aberration criterion and the best level permutations can be determined without an exhaustive search. We establish additional theoretical results for three-level designs and obtain the best level permutations for regular designs with 27 and 81 runs. We illustrate the practical benefits of level permutation with an antiviral drug combination experiment.
2
2014
101
Biometrika
333
350
http://hdl.handle.net/10.1093/biomet/ast073
application/pdf
Access to full text is restricted to subscribers.
Yu Tang
Hongquan Xu
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:231-238.2015-07-30RePEc:oup:biomet
article
Generalized Ewens–Pitman model for Bayesian clustering
We propose a Bayesian method for clustering from discrete data structures that commonly arise in genetics and other applications. This method is equivariant with respect to relabelling units; unsampled units do not interfere with sampled data; and missing data do not hinder inference. Cluster inference using the posterior mode performs well on simulated and real datasets, and the posterior predictive distribution enables supervised learning based on a partial clustering of the sample.
1
2015
102
Biometrika
231
238
http://hdl.handle.net/10.1093/biomet/asu052
application/pdf
Access to full text is restricted to subscribers.
Harry Crane
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:203-214.2015-07-30RePEc:oup:biomet
article
Double-bootstrap methods that use a single double-bootstrap simulation
We show that, when the double bootstrap is used to improve performance of bootstrap methods for bias correction, techniques based on using a single double-bootstrap sample for each single-bootstrap sample can produce third-order accuracy for much less computational expense than is required by conventional double-bootstrap methods. However, this improved level of performance is not available for the single double-bootstrap methods that have been suggested to construct confidence intervals or distribution estimators.
1
2015
102
Biometrika
203
214
http://hdl.handle.net/10.1093/biomet/asu060
application/pdf
Access to full text is restricted to subscribers.
Jinyuan Chang
Peter Hall
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:439-456.2015-07-30RePEc:oup:biomet
article
Envelopes and reduced-rank regression
We incorporate the nascent idea of envelopes (Cook et al., Statist. Sinica 20, 927–1010) into reduced-rank regression by proposing a reduced-rank envelope model, which is a hybrid of reduced-rank and envelope regressions. The proposed model has a total number of parameters no greater than that of either reduced-rank regression or envelope regression. The resulting estimator is at least as efficient as both existing estimators. The methodology of this paper can be adapted to other envelope models, such as partial envelopes (Su & Cook, Biometrika 98, 133–46) and envelopes in predictor space (Cook et al., J. R. Statist. Soc. B 75, 851–77).
2
2015
102
Biometrika
439
456
http://hdl.handle.net/10.1093/biomet/asv001
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Liliana Forzani
Xin Zhang
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:191-202.2015-07-30RePEc:oup:biomet
article
Adaptive randomized trial designs that cannot be dominated by any standard design at the same total sample size
Prior work has shown that the power of adaptive designs with rules for modifying the sample size can always be matched or improved by suitably chosen, standard, group sequential designs. A natural question is whether analogous results hold for other types of adaptive designs. We focus on adaptive enrichment designs, which involve preplanned rules for modifying enrollment criteria based on accrued data in a randomized trial. Such designs often involve multiple hypotheses, e.g., one for the total population and one for a predefined subpopulation, such as those with high disease severity at baseline. We fix the total sample size, and consider overall power, defined as the probability of rejecting at least one false null hypothesis. We present adaptive enrichment designs whose overall power at two alternatives cannot simultaneously be matched by any standard design. In some scenarios there is a substantial gap between the overall power achieved by these adaptive designs and that of any standard design. We also prove that such gains in overall power come at a cost. To attain overall power above what is achievable by certain standard designs, it is necessary to increase power to reject some hypotheses and reduce power to reject others. We demonstrate that adaptive enrichment designs allow certain power trade-offs that are not available when restricting to standard designs.
1
2015
102
Biometrika
191
202
http://hdl.handle.net/10.1093/biomet/asu057
application/pdf
Access to full text is restricted to subscribers.
Michael Rosenblum
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:141-154.2015-07-30RePEc:oup:biomet
article
Frequentist estimation of an epidemic’s spreading potential when observations are scarce
We consider the problem of inferring the potential of an epidemic for escalating into a pandemic on the basis of limited observations in its initial stages. Classical results of Becker & Hasofer (J. R. Statist. Soc. B, 59, 415–29) illustrate that frequentist estimation of the complete set of parameters of an epidemic modelled as a birth and death process remains feasible even when one is able to observe only the deaths and the total number of births. These assumptions on the observation mechanism, however, are too strong to be met in practice. We consider a more realistic scenario where only temporally aggregated random proportions of the deaths are observed over time. We demonstrate that the frequentist estimation of the Malthusian parameter governing the growth of the epidemic is still feasible in this context. We construct explicit straightforwardly calculable estimators motivated heuristically by the martingale dynamics of the process, and show that they admit a rigorous quasilikelihood interpretation. We establish the consistency and asymptotic normality of these estimators, allowing for the construction of approximate confidence intervals that can be used to infer the spreading potential of the epidemic. A simulation study and an application to the initial outbreak data of the 2009 H1N1 influenza pandemic illustrate that the method can be expected to give reasonable results in practice.
1
2014
101
Biometrika
141
154
http://hdl.handle.net/10.1093/biomet/ast049
application/pdf
Access to full text is restricted to subscribers.
Andrea Kraus
Victor M. Panaretos
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:281-294.2015-07-30RePEc:oup:biomet
article
On random-effects meta-analysis
Meta-analysis is widely used to compare and combine the results of multiple independent studies. To account for between-study heterogeneity, investigators often employ random-effects models, under which the effect sizes of interest are assumed to follow a normal distribution. It is common to estimate the mean effect size by a weighted linear combination of study-specific estimators, with the weight for each study being inversely proportional to the sum of the variance of the effect-size estimator and the estimated variance component of the random-effects distribution. Because the estimator of the variance component involved in the weights is random and correlated with study-specific effect-size estimators, the commonly adopted asymptotic normal approximation to the meta-analysis estimator is grossly inaccurate unless the number of studies is large. When individual participant data are available, one can also estimate the mean effect size by maximizing the joint likelihood. We establish the asymptotic properties of the meta-analysis estimator and the joint maximum likelihood estimator when the number of studies is either fixed or increases at a slower rate than the study sizes and we discover a surprising result: the former estimator is always at least as efficient as the latter. We also develop a novel resampling technique that improves the accuracy of statistical inference. We demonstrate the benefits of the proposed inference procedures using simulated and empirical data.
2
2015
102
Biometrika
281
294
http://hdl.handle.net/10.1093/biomet/asv011
application/pdf
Access to full text is restricted to subscribers.
D. Zeng
D. Y. Lin
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:423-437.2015-07-30RePEc:oup:biomet
article
Measurement bias and effect restoration in causal inference
This paper highlights several areas where graphical techniques can be harnessed to address the problem of measurement errors in causal inference. In particular, it discusses the control of unmeasured confounders in parametric and nonparametric models and the computational problem of obtaining bias-free effect estimates in such models. We derive new conditions under which causal effects can be restored by observing proxy variables of unmeasured confounders with/without external studies.
2
2014
101
Biometrika
423
437
http://hdl.handle.net/10.1093/biomet/ast066
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
Judea Pearl
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:215-230.2015-07-30RePEc:oup:biomet
article
Multivariate max-stable spatial processes
Max-stable processes allow the spatial dependence of extremes to be modelled and quantified, so they are widely adopted in applications. For a better understanding of extremes, it may be useful to study several variables simultaneously. To this end, we study the maxima of independent replicates of multivariate processes, both in the Gaussian and Student-t cases. We define a Poisson process construction and introduce multivariate versions of the Smith Gaussian extreme-value, the Schlather extremal-Gaussian and extremal-t, and the Brown–Resnick models. We develop inference for the models based on composite likelihoods. We present results of Monte Carlo simulations and an application to daily maximum wind speed and wind gust.
1
2015
102
Biometrika
215
230
http://hdl.handle.net/10.1093/biomet/asu066
application/pdf
Access to full text is restricted to subscribers.
Marc G. Genton
Simone A. Padoan
Huiyan Sang
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:315-323.2015-07-30RePEc:oup:biomet
article
A useful variant of the Davis–Kahan theorem for statisticians
The Davis–Kahan theorem is used in the analysis of many statistical procedures to bound the distance between subspaces spanned by population eigenvectors and their sample versions. It relies on an eigenvalue separation condition between certain population and sample eigenvalues. We present a variant of this result that depends only on a population eigenvalue separation condition, making it more natural and convenient for direct application in statistical contexts, and provide an improvement in many cases to the usual bound in the statistical literature. We also give an extension to situations where the matrices under study may be asymmetric or even non-square, and where interest is in the distance between subspaces spanned by corresponding singular vectors.
2
2015
102
Biometrika
315
323
http://hdl.handle.net/10.1093/biomet/asv008
application/pdf
Access to full text is restricted to subscribers.
Y. Yu
T. Wang
R. J. Samworth
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:505-518.2015-07-30RePEc:oup:biomet
article
Self-consistent nonparametric maximum likelihood estimator of the bivariate survivor function
As usually formulated, the nonparametric likelihood for the bivariate survivor function is overparameterized, resulting in uniqueness problems for the corresponding nonparametric maximum likelihood estimator. Here the estimation problem is redefined to include parameters for marginal hazard rates, and for double failure hazard rates only at informative uncensored failure time grid points where there is pertinent empirical information. Double failure hazard rates at other grid points in the risk region are specified rather than estimated. With this approach the nonparametric maximum likelihood estimator is unique, and can be calculated using a two-step procedure. The first step involves setting aside all doubly censored observations that are interior to the risk region. The nonparametric maximum likelihood estimator from the remaining data turns out to be the Dabrowska (1988) estimator. The omitted doubly censored observations are included in the procedure in the second step using self-consistency, resulting in a noniterative nonparametric maximum likelihood estimator for the bivariate survivor function. Simulation evaluation and asymptotic distributional results are provided. Moderate sample size efficiency for the survivor function nonparametric maximum likelihood estimator is similar to that for the Dabrowska estimator as applied to the entire dataset, while some useful efficiency improvement arises for the corresponding distribution function estimator, presumably due to the avoidance of negative mass assignments.
3
2014
101
Biometrika
505
518
http://hdl.handle.net/10.1093/biomet/asu010
application/pdf
Access to full text is restricted to subscribers.
R. L. Prentice
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:181-190.2015-07-30RePEc:oup:biomet
article
A tractable and interpretable four-parameter family of unimodal distributions on the circle
This article presents a class of four-parameter distributions for circular data that are unimodal, possess simple characteristic and density functions and a tractable distribution function, can be interpretably parameterized directly in terms of their trigonometric moments, afford a very wide range of skewness and kurtosis, envelop numerous interesting submodels including the wrapped Cauchy and cardioid distributions, allow straightforward parameter estimation by both method of moments and maximum likelihood, and are closed under convolution. This class of distributions exhibits the widest range of attractive properties yet available while retaining unimodality.
1
2015
102
Biometrika
181
190
http://hdl.handle.net/10.1093/biomet/asu059
application/pdf
Access to full text is restricted to subscribers.
Shogo Kato
M. C. Jones
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:613-624.2015-07-30RePEc:oup:biomet
article
Estimation of mean response via the effective balancing score
We introduce the effective balancing score for estimation of the mean response under a missing-at-random mechanism. Unlike conventional balancing scores, the proposed score is constructed via dimension reduction free of model specification. Three types of such scores are introduced, distinguished by whether they carry the covariate information about the missingness, the response, or both. The effective balancing score leads to consistent estimation with little or no loss in efficiency. Compared to existing estimators, it reduces the burden of model specification and is more robust. It is a near-automatic procedure which is most appealing when high-dimensional covariates are involved. We investigate its asymptotic and numerical properties, and illustrate its application with an HIV disease study.
3
2014
101
Biometrika
613
624
http://hdl.handle.net/10.1093/biomet/asu022
application/pdf
Access to full text is restricted to subscribers.
Zonghui Hu
Dean A. Follmann
Naisyin Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:155-173.2015-07-30RePEc:oup:biomet
article
On the stationary distribution of iterative imputations
Iterative imputation, in which variables are imputed one at a time conditional on all the others, is a popular technique that can be convenient and flexible, as it replaces a potentially difficult multivariate modelling problem with relatively simple univariate regressions. In this paper, we begin to characterize the stationary distributions of iterative imputations and their statistical properties, accounting for the conditional models being iteratively estimated from data rather than being prespecified. When the families of conditional models are compatible, we provide sufficient conditions under which the imputation distribution converges in total variation to the posterior distribution of a Bayesian model. When the conditional models are incompatible but valid, we show that the combined imputation estimator is consistent.
1
2014
101
Biometrika
155
173
http://hdl.handle.net/10.1093/biomet/ast044
application/pdf
Access to full text is restricted to subscribers.
Jingchen Liu
Andrew Gelman
Jennifer Hill
Yu-Sung Su
Jonathan Kropko
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:77-94.2015-07-30RePEc:oup:biomet
article
Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems
We develop uniformly valid confidence regions for regression coefficients in a high-dimensional sparse median regression model with homoscedastic errors. Our methods are based on a moment equation that is immunized against nonregular estimation of the nuisance part of the median regression function by using Neyman’s orthogonalization. We establish that the resulting instrumental median regression estimator of a target regression coefficient is asymptotically normally distributed uniformly with respect to the underlying sparse model and is semiparametrically efficient. We also generalize our method to a general nonsmooth Z-estimation framework where the number of target parameters is possibly much larger than the sample size. We extend Huber's results on asymptotic normality to this setting, demonstrating uniform asymptotic normality of the proposed estimators over rectangles, constructing simultaneous confidence bands on all of the target parameters, and establishing asymptotic validity of the bands uniformly over underlying approximately sparse models.
1
2015
102
Biometrika
77
94
http://hdl.handle.net/10.1093/biomet/asu056
application/pdf
Access to full text is restricted to subscribers.
A. Belloni
V. Chernozhukov
K. Kato
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:121-134.2015-07-30RePEc:oup:biomet
article
Moment-type estimators for the proportional likelihood ratio model with longitudinal data
Luo & Tsai (Biometrika 99, 211–22, 2012) proposed a proportional likelihood ratio model and discussed a maximum likelihood method for its parameter estimation. In this paper, we use this model as the marginal distribution to analyse longitudinal data, where the maximum likelihood method is not directly applicable because the joint distribution is not fully specified. We propose a moment-type method that is an extension of the generalized estimating equation method. The resulting estimators are consistent and asymptotically normal, and they perform well in our simulation study.
1
2015
102
Biometrika
121
134
http://hdl.handle.net/10.1093/biomet/asu055
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:567-585.2015-07-30RePEc:oup:biomet
article
New approaches to nonparametric and semiparametric regression for univariate and multivariate group testing data
We consider nonparametric and semiparametric estimation of a conditional probability curve in the case of group testing data, where the individuals are pooled randomly into groups and only the pooled data are available. We derive a nonparametric weighted estimator that has optimality properties accounting for group sizes, and show how to extend it to multivariate settings, including the partially linear model. In the group testing context, it is natural to assume that the probability curve depends on the covariates only through a linear combination of them. Motivated by this condition, we develop a nonparametric estimator based on the single-index model. We study theoretical properties of the proposed estimators and derive data-driven procedures. Practical properties of the methods are demonstrated via real and simulated examples, and our estimators are shown to have smaller median integrated square error than existing competitors.
3
2014
101
Biometrika
567
585
http://hdl.handle.net/10.1093/biomet/asu025
application/pdf
Access to full text is restricted to subscribers.
A. Delaigle
P. Hall
J. R. Wishart
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:748-754.2015-07-30RePEc:oup:biomet
article
Inference on multiple correlation coefficients with moderately high dimensional data
When the multiple correlation coefficient is used to measure how strongly a given variable can be linearly associated with a set of covariates, it suffers from an upward bias that cannot be ignored in the presence of a moderately high dimensional covariate. Under an independent component model, we derive an asymptotic approximation to the distribution of the squared multiple correlation coefficient that depends on a simple correction factor. We show that this approximation enables us to construct reliable confidence intervals on the population coefficient even when the ratio of the dimension to the sample size is close to unity and the variables are non-Gaussian.
3
2014
101
Biometrika
748
754
http://hdl.handle.net/10.1093/biomet/asu023
application/pdf
Access to full text is restricted to subscribers.
Shurong Zheng
Dandan Jiang
Zhidong Bai
Xuming He
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:673-688.2015-07-30RePEc:oup:biomet
article
The asymptotic inadmissibility of the spatial sign covariance matrix for elliptically symmetric distributions
The asymptotic efficiency of the spatial sign covariance matrix relative to affine equivariant estimators of scatter is studied. In particular, the spatial sign covariance matrix is shown to be asymptotically inadmissible, i.e., the asymptotic covariance matrix of the consistency-corrected spatial sign covariance matrix is uniformly larger than that of its affine equivariant counterpart, namely Tyler’s scatter matrix. Although the spatial sign covariance matrix has often been recommended when one is interested in principal components analysis, its inefficiency is shown to be most severe in situations where principal components are of greatest interest. Simulation shows that the inefficiency of the spatial sign covariance matrix also holds for small sample sizes, and that the asymptotic relative efficiency is a good approximation to the finite-sample efficiency for relatively modest sample sizes.
3
2014
101
Biometrika
673
688
http://hdl.handle.net/10.1093/biomet/asu020
application/pdf
Access to full text is restricted to subscribers.
Andrew F. Magyar
David E. Tyler
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:719-725.2015-07-30RePEc:oup:biomet
article
Estimation from cross-sectional samples under bias and dependence
A population can be entered at a known sequence of discrete times; it is sampled cross-sectionally, and the sojourn times of individuals in the sample are observed. It is well known that cross-sectioning leads to length bias, but it is less well known, and often ignored, that it may also result in dependence among the observations. We show that observed sojourn times are independent only under a multinomial entrance process. We study asymptotic properties of parametric and nonparametric estimators of the sojourn time distribution that use the product of marginals in spite of dependence, and provide conditions under which this approach results in proper inference and conditions under which it results in improper, incorrect inference. We apply the proposed methods to data on hospitalization time after bowel and hernia surgeries collected by a cross-sectional design.
3
2014
101
Biometrika
719
725
http://hdl.handle.net/10.1093/biomet/asu013
application/pdf
Access to full text is restricted to subscribers.
Micha Mandel
Yosef Rinott
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:477-483.2015-07-30RePEc:oup:biomet
article
Hypothesis testing for band size detection of high-dimensional banded precision matrices
Many statistical analysis procedures require a good estimator for a high-dimensional covariance matrix or its inverse, the precision matrix. When the precision matrix is banded, the Cholesky-based method often yields a good estimator of the precision matrix. One important aspect of this method is determination of the band size of the precision matrix. In practice, crossvalidation is commonly used; however, we show that crossvalidation not only is computationally intensive but can be very unstable. In this paper, we propose a new hypothesis testing procedure to determine the band size in high dimensions. Our proposed test statistic is shown to be asymptotically normal under the null hypothesis, and its theoretical power is studied. Numerical examples demonstrate the effectiveness of our testing procedure.
2
2014
101
Biometrika
477
483
http://hdl.handle.net/10.1093/biomet/asu006
application/pdf
Access to full text is restricted to subscribers.
Baiguo An
Jianhua Guo
Yufeng Liu
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:237-244.2015-07-30RePEc:oup:biomet
article
On adjustment for auxiliary covariates in additive hazard models for the analysis of randomized experiments
We consider additive hazard models (Aalen, 1989) for the effect of a randomized treatment on a survival outcome, adjusting for auxiliary baseline covariates. We demonstrate that the Aalen least-squares estimator of the treatment effect parameter is asymptotically unbiased, even when the hazard's dependence on time or on the auxiliary covariates is misspecified, and even away from the null hypothesis of no treatment effect. We furthermore show that adjustment for auxiliary baseline covariates does not change the asymptotic variance of the estimator of the effect of a randomized treatment. We conclude that, in view of its robustness against model misspecification, Aalen least-squares estimation is attractive for evaluating treatment effects on a survival outcome in randomized experiments, and the primary reasons to consider baseline covariate adjustment in such settings could be interest in subgroup effects or the need to adjust for informative censoring or baseline imbalances. Our results also shed light on the robustness of Aalen least-squares estimators against model misspecification in observational studies.
1
2014
101
Biometrika
237
244
http://hdl.handle.net/10.1093/biomet/ast045
application/pdf
Access to full text is restricted to subscribers.
S. Vansteelandt
T. Martinussen
E. J. Tchetgen Tchetgen
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:103-120.2015-07-30RePEc:oup:biomet
article
Sparse precision matrix estimation via lasso penalized D-trace loss
We introduce a constrained empirical loss minimization framework for estimating high-dimensional sparse precision matrices and propose a new loss function, called the D-trace loss, for that purpose. A novel sparse precision matrix estimator is defined as the minimizer of the lasso penalized D-trace loss under a positive-definiteness constraint. Under a new irrepresentability condition, the lasso penalized D-trace estimator is shown to have the sparse recovery property. Examples demonstrate that the new condition can hold in situations where the irrepresentability condition for the lasso penalized Gaussian likelihood estimator fails. We establish rates of convergence for the new estimator in the elementwise maximum, Frobenius and operator norms. We develop a very efficient algorithm based on alternating direction methods for computing the proposed estimator. Simulated and real data are used to demonstrate the computational efficiency of our algorithm and the finite-sample performance of the new estimator. The lasso penalized D-trace estimator is found to compare favourably with the lasso penalized Gaussian likelihood estimator.
1
2014
101
Biometrika
103
120
http://hdl.handle.net/10.1093/biomet/ast059
application/pdf
Access to full text is restricted to subscribers.
Teng Zhang
Hui Zou
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:439-448.2015-07-30RePEc:oup:biomet
article
Propensity score adjustment with several follow-ups
Propensity score weighting adjustment is commonly used to handle unit nonresponse. When the response mechanism is nonignorable in the sense that the response probability depends directly on the study variable, a follow-up sample is commonly used to obtain an unbiased estimator using the framework of two-phase sampling, where the follow-up sample is assumed to respond completely. In practice, the follow-up sample is also subject to missingness. We consider propensity score weighting adjustment for nonignorable nonresponse when there are several follow-ups and the final follow-up sample is also subject to missingness. We propose a method-of-moments estimator for estimating parameters in the response probability. The proposed method can be implemented using the generalized method of moments and a consistent variance estimate can be obtained relatively easily. A limited simulation study shows the robustness of the proposed method. The proposed methods are applied to a Korean household survey of employment.
2
2014
101
Biometrika
439
448
http://hdl.handle.net/10.1093/biomet/asu003
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
Jongho Im
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:927-942.2015-07-30RePEc:oup:biomet
article
Testing independence and goodness-of-fit in linear models
We consider a linear regression model and propose an omnibus test to simultaneously check the assumption of independence between the error and predictor variables and the goodness-of-fit of the parametric model. Our approach is based on testing for independence between the predictor and the residual obtained from the parametric fit by using the Hilbert–Schmidt independence criterion (Gretton et al., 2008). The proposed method requires no user-defined regularization, is simple to compute based on only pairwise distances between points in the sample, and is consistent against all alternatives. We develop distribution theory for the proposed test statistic, under both the null and the alternative hypotheses, and devise a bootstrap scheme to approximate its null distribution. We prove the consistency of the bootstrap scheme. A simulation study shows that our method has better power than its main competitors. Two real datasets are analysed to demonstrate the scope and usefulness of our method.
4
2014
101
Biometrika
927
942
http://hdl.handle.net/10.1093/biomet/asu026
application/pdf
Access to full text is restricted to subscribers.
A. Sen
B. Sen
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:175-188.2015-07-30RePEc:oup:biomet
article
Protective estimation of mixed-effects logistic regression when data are not missing at random
We consider estimation of mixed-effects logistic regression models for longitudinal data when missing outcomes are not missing at random. A typology of missingness mechanisms is presented that includes missingness dependent on observed or missing current outcomes, observed or missing lagged outcomes and subject-specific effects. When data are not missing at random, consistent estimation by maximum marginal likelihood generally requires correct parametric modelling of the missingness mechanism, which hinges on unverifiable assumptions. We show that standard maximum conditional likelihood estimators are protective in the sense that they are consistent for monotone or intermittent missing data under a wide range of missingness mechanisms. Our approach requires neither specification of parametric models for the missingness mechanism nor refreshment samples and is straightforward to implement in standard software.
1
2014
101
Biometrika
175
188
http://hdl.handle.net/10.1093/biomet/ast054
application/pdf
Access to full text is restricted to subscribers.
A. Skrondal
S. Rabe-Hesketh
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:449-464.2015-07-30RePEc:oup:biomet
article
Testing equality of a large number of densities
The problem of testing equality of a large number of densities is considered. The classical k-sample problem compares a small, fixed number of distributions and allows the sample size from each distribution to increase without bound. In our asymptotic analysis the number of distributions tends to infinity but the size of individual samples remains fixed. The proposed test statistic is motivated by the simple idea of comparing kernel density estimators from the various samples to the average of all density estimators. However, a novel interpretation of this familiar type of statistic arises upon centring it. The asymptotic distribution of the statistic under the null hypothesis of equal densities is derived, and power against local alternatives is considered. It is shown that a consistent test is attainable in many situations where all but a vanishingly small proportion of densities are equal to each other. The test is studied via simulation, and an illustration involving microarray data is provided.
2
2014
101
Biometrika
449
464
http://hdl.handle.net/10.1093/biomet/asu002
application/pdf
Access to full text is restricted to subscribers.
D. Zhan
J. D. Hart
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:519-533.2015-07-30RePEc:oup:biomet
article
Nonparametric inference on bivariate survival data with interval sampling: association estimation and testing
In many biomedical applications, interest focuses on the occurrence of two or more consecutive failure events and the relationship between event times, such as age of disease onset and residual lifetime. Bivariate survival data with interval sampling arise frequently when disease registries or surveillance systems collect data based on disease incidence occurring within a specific calendar time interval. The initial event is then retrospectively confirmed and the subsequent failure event may be observed during follow-up. In life history studies, the initial and two consecutive failure events could correspond to birth, disease onset and death. The statistical features and bias of observed data in relation to interval sampling were discussed by Zhu & Wang (2012). Here we propose nonparametric estimation of the association between bivariate failure times based on Kendall’s tau for data collected with interval sampling. A nonparametric estimator is given, where the contribution of each comparable and orderable pair is weighted by the inverse of the associated selection probability. Analysis methods for bivariate survival data with interval sampling rely on the assumption of quasi-independence, i.e., that bivariate failure times and the time of the initial event are independent in the observable region. This paper develops a nonparametric test of quasi-independence based on a bivariate conditional Kendall’s tau for such data. Simulation studies demonstrate that the association estimator and testing procedure perform well with moderate sample sizes. Illustrations with two real datasets are provided.
3
2014
101
Biometrika
519
533
http://hdl.handle.net/10.1093/biomet/asu005
application/pdf
Access to full text is restricted to subscribers.
Hong Zhu
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:15-32.2015-07-30RePEc:oup:biomet
article
Varying-coefficient additive models for functional data
Both varying-coefficient and additive models have been studied extensively in the literature as extensions to linear models. They have also been extended to deal with functional response data. However, existing extensions are still not flexible enough to reflect the functional nature of the responses. In this paper, we extend varying-coefficient and additive models to obtain a much more flexible model and propose a simple algorithm to estimate its nonparametric additive and varying-coefficient components. We establish the asymptotic properties of each component function. We demonstrate the applicability of the new model through analysis of traffic data.
1
2015
102
Biometrika
15
32
http://hdl.handle.net/10.1093/biomet/asu053
application/pdf
Access to full text is restricted to subscribers.
Xiaoke Zhang
Jane-Ling Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:957-963.2015-07-30RePEc:oup:biomet
article
Nearly orthogonal arrays mappable into fully orthogonal arrays
We develop a method for construction of arrays which are nearly orthogonal, in the sense that each column is orthogonal to a large proportion of the other columns, and which are convertible to fully orthogonal arrays via a mapping of the symbols in each column to a possibly smaller set of symbols. These arrays can be useful in computer experiments as designs which accommodate a large number of factors and enjoy attractive space-filling properties. Our construction allows both the mappable nearly orthogonal array and the consequent fully orthogonal array to be either symmetric or asymmetric. Resolvable orthogonal arrays play a key role in the construction.
4
2014
101
Biometrika
957
963
http://hdl.handle.net/10.1093/biomet/asu042
application/pdf
Access to full text is restricted to subscribers.
Rahul Mukerjee
Fasheng Sun
Boxin Tang
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:553-566.2015-07-30RePEc:oup:biomet
article
Statistical inference methods for recurrent event processes with shape and size parameters
This paper proposes a unified framework to characterize the rate function of a recurrent event process through shape and size parameters. In contrast to the intensity function, which is the event occurrence rate conditional on the event history, the rate function is the occurrence rate unconditional on the event history, and thus it can be interpreted as a population-averaged count of events in unit time. In this paper, shape and size parameters are introduced and used to characterize the association between the rate function λ(⋅) and a random variable X. Measures of association between X and λ(⋅) are defined via shape- and size-based coefficients. Rate-independence of X and λ(⋅) is studied through tests of shape-independence and size-independence, where the shape- and size-based test statistics can be used separately or in combination. These tests can be applied when X is a covariable possibly correlated with the recurrent event process through λ(⋅) or, in the one-sample setting, when X is the censoring time at which the observation of N(⋅) is terminated. The proposed tests are shape- and size-based, so when a null hypothesis is rejected, the test results can serve to distinguish the source of violation.
3
2014
101
Biometrika
553
566
http://hdl.handle.net/10.1093/biomet/asu016
application/pdf
Access to full text is restricted to subscribers.
Mei-Cheng Wang
Chiung-Yu Huang
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:479-485.2015-07-30RePEc:oup:biomet
article
Effective degrees of freedom: a flawed metaphor
To most applied statisticians, a fitting procedure’s degrees of freedom is synonymous with its model complexity, or its capacity for overfitting to data. In particular, the degrees of freedom is often used to parameterize the bias-variance trade-off in model selection. We argue that, on the contrary, model complexity and degrees of freedom may correspond very poorly. We exhibit and theoretically explore various fitting procedures for which the degrees of freedom is not monotonic in the model complexity parameter and can exceed the total dimension of the ambient space even in very simple settings. We show that the degrees of freedom for any nonconvex projection method can be unbounded.
2
2015
102
Biometrika
479
485
http://hdl.handle.net/10.1093/biomet/asv019
application/pdf
Access to full text is restricted to subscribers.
Lucas Janson
William Fithian
Trevor J. Hastie
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:641-654.2015-07-30RePEc:oup:biomet
article
Latent factor models for density estimation
Although discrete mixture modelling has formed the backbone of the literature on Bayesian density estimation, there are some well-known disadvantages. As an alternative to discrete mixtures, we propose a class of priors based on random nonlinear functions of a uniform latent variable with an additive residual. The induced prior for the density is shown to have desirable properties, including ease of centring on an initial guess, large support, posterior consistency and straightforward computation via Gibbs sampling. Some advantages over discrete mixtures, such as Dirichlet process mixtures of Gaussian kernels, are discussed and illustrated via simulations and an application.
3
2014
101
Biometrika
641
654
http://hdl.handle.net/10.1093/biomet/asu019
application/pdf
Access to full text is restricted to subscribers.
S. Kundu
D. B. Dunson
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:57-70.2015-07-30RePEc:oup:biomet
article
Asymptotic properties for combined L1 and concave regularization
Two important goals of high-dimensional modelling are prediction and variable selection. In this article, we consider regularization with combined L1 and concave penalties, and study the sampling properties of the global optimum of the suggested method in ultrahigh-dimensional settings. The L1 penalty provides the minimum regularization needed for removing noise variables in order to achieve oracle prediction risk, while a concave penalty imposes additional regularization to control model sparsity. In the linear model setting, we prove that the global optimum of our method enjoys the same oracle inequalities as the lasso estimator and admits an explicit bound on the false sign rate, which can be asymptotically vanishing. Moreover, we establish oracle risk inequalities for the method and the sampling properties of computable solutions. Numerical studies suggest that our method yields more stable estimates than using a concave penalty alone.
1
2014
101
Biometrika
57
70
http://hdl.handle.net/10.1093/biomet/ast047
application/pdf
Access to full text is restricted to subscribers.
Yingying Fan
Jinchi Lv
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:992-998.2015-07-30RePEc:oup:biomet
article
Robust Bayesian variable selection in linear models with spherically symmetric errors
This paper studies Bayesian variable selection in linear models with general spherically symmetric error distributions. We construct the posterior odds based on a separable prior, which arises as a class of mixtures of Gaussian densities. The posterior odds for comparing among nonnull models are shown to be independent of the error distribution, if this is spherically symmetric. Because of this invariance, we refer to our method as a robust Bayesian variable selection method. We demonstrate that our posterior odds have model selection consistency, and that our class of prior functions are the only ones within a large class which are robust in our sense.
4
2014
101
Biometrika
992
998
http://hdl.handle.net/10.1093/biomet/asu039
application/pdf
Access to full text is restricted to subscribers.
Yuzo Maruyama
William E. Strawderman
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:587-598.2015-07-30RePEc:oup:biomet
article
Semiparametric group testing regression models
Group testing, through the use of pooling, has proven to be an efficient method of reducing the time and cost associated with screening for a binary characteristic of interest, such as infection status. A topic of key interest in the statistical literature involves the development of regression models that relate individual-level covariates to testing responses observed from pooled specimens. In this article, we propose a general semiparametric framework that allows for the inclusion of multi-dimensional covariates, decoding information, and imperfect testing. The asymptotic properties of our estimators are presented and guidance on finite sample implementation is provided. We illustrate the performance of our methods through simulation and by applying them to chlamydia and gonorrhea data collected by the Nebraska Public Health Laboratory as a part of the Infertility Prevention Project.
3
2014
101
Biometrika
587
598
http://hdl.handle.net/10.1093/biomet/asu007
application/pdf
Access to full text is restricted to subscribers.
D. Wang
C. S. McMahan
C. M. Gallagher
K. B. Kulasekera
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:409-422.2015-07-30RePEc:oup:biomet
article
Distances and inference for covariance operators
A framework is developed for inference concerning the covariance operator of a functional random process, where the covariance operator itself is an object of interest for statistical analysis. Distances for comparing positive-definite covariance matrices are either extended or shown to be inapplicable to functional data. In particular, an infinite-dimensional analogue of the Procrustes size-and-shape distance is developed. Convergence of finite-dimensional approximations to the infinite-dimensional distance metrics is also shown. For inference, a Fréchet estimator of both the covariance operator itself and the average covariance operator is introduced. A permutation procedure to test the equality of the covariance operators between two groups is also considered. Additionally, the use of such distances for extrapolation to make predictions is explored. As an example of the proposed methodology, the use of covariance operators has been suggested in a philological study of cross-linguistic dependence as a way to incorporate quantitative phonetic information. It is shown that distances between languages derived from phonetic covariance functions can provide insight into the relationships between the Romance languages.
2
2014
101
Biometrika
409
422
http://hdl.handle.net/10.1093/biomet/asu008
application/pdf
Access to full text is restricted to subscribers.
Davide Pigoli
John A. D. Aston
Ian L. Dryden
Piercesare Secchi
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:689-702.2015-07-30RePEc:oup:biomet
article
Multivariate functional-coefficient regression models for nonlinear vector time series data
Vector time series data are commonly encountered in practice. In this paper we propose a multivariate functional-coefficient regression model with heteroscedasticity for modelling such data. A local linear smoother is employed to estimate the unknown coefficient matrices. Asymptotic normality of the proposed estimators is established, and bandwidth selection is considered. To deal with the co-integration commonly observed in financial markets, we propose an error-corrected multivariate functional-coefficient model. Simulations show that our proposed estimation procedures capture nonlinear structures of coefficients well. Analysis of United States interest rates illustrates the proposed methods.
3
2014
101
Biometrika
689
702
http://hdl.handle.net/10.1093/biomet/asu011
application/pdf
Access to full text is restricted to subscribers.
Jiancheng Jiang
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:741-747.2015-07-30RePEc:oup:biomet
article
Construction of orthogonal and nearly orthogonal designs for computer experiments
This paper presents new infinite families of orthogonal designs for computer experiments. In cases where orthogonal designs cannot exist, we construct alternative, nearly orthogonal designs. Our designs can accommodate many factors and a large set of levels. No iterative computer search is required. To build up the desired orthogonal designs we develop and use new infinite classes of periodic Golay pairs.
3
2014
101
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/asu021
application/pdf
Access to full text is restricted to subscribers.
S. D. Georgiou
S. Stylianou
K. Drosou
C. Koukouvinos
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:726-732.2015-07-30RePEc:oup:biomet
article
Simple relaxed conditional likelihood
When the data are sparse but not exceedingly so, we face a trade-off between bias and precision that makes the usual choice between conducting either a fully unconditional inference or a fully conditional inference unduly restrictive. We propose a method to relax the conditional inference that relies upon commonly available computer outputs. In the rectangular array asymptotic setting, the relaxed conditional maximum likelihood estimator has smaller bias than the unconditional estimator and smaller mean square error than the conditional estimator.
3
2014
101
Biometrika
726
732
http://hdl.handle.net/10.1093/biomet/asu028
application/pdf
Access to full text is restricted to subscribers.
John J. Hanfelt
Lijia Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:253-268.2015-07-30RePEc:oup:biomet
article
Direct estimation of differential networks
It is often of interest to understand how the structure of a genetic network differs between two conditions. In this paper, each condition-specific network is modelled using the precision matrix of a multivariate normal random vector, and a method is proposed to directly estimate the difference of the precision matrices. In contrast to other approaches, such as separate or joint estimation of the individual matrices, direct estimation does not require those matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Under the assumption that the true differential network is sparse, the direct estimator is shown to be consistent in support recovery and estimation. It is also shown to outperform existing methods in simulations, and its properties are illustrated on gene expression data from late-stage ovarian cancer patients.
2
2014
101
Biometrika
253
268
http://hdl.handle.net/10.1093/biomet/asu009
application/pdf
Access to full text is restricted to subscribers.
Sihai Dave Zhao
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:785-797.2015-07-30RePEc:oup:biomet
article
Variable selection in regression with compositional covariates
Motivated by research problems arising in the analysis of gut microbiome and metagenomic data, we consider variable selection and estimation in high-dimensional regression with compositional covariates. We propose an ℓ1 regularization method for the linear log-contrast model that respects the unique features of compositional data. We formulate the proposed procedure as a constrained convex optimization problem and introduce a coordinate descent method of multipliers for efficient computation. In the high-dimensional setting where the dimensionality grows at most exponentially with the sample size, model selection consistency and ℓ∞ bounds for the resulting estimator are established under conditions that are mild and interpretable for compositional data. The numerical performance of our method is evaluated via simulation studies and its usefulness is illustrated by an application to a microbiome study relating human body mass index to gut microbiome composition.
4
2014
101
Biometrika
785
797
http://hdl.handle.net/10.1093/biomet/asu031
application/pdf
Access to full text is restricted to subscribers.
Wei Lin
Pixu Shi
Rui Feng
Hongzhe Li
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:65-76.2015-07-30RePEc:oup:biomet
article
Conditional quantile screening in ultrahigh-dimensional heterogeneous data
To accommodate the heterogeneity that is often present in ultrahigh-dimensional data, we propose a conditional quantile screening method, which enables us to select features that contribute to the conditional quantile of the response given the covariates. The method can naturally handle censored data by incorporating a weighting scheme through redistribution of the mass to the right; moreover, it is invariant to monotone transformation of the response and requires substantially weaker conditions than do alternative methods. We establish sure independent screening properties for both the complete and the censored response cases. We also conduct simulations to evaluate the finite-sample performance of the proposed method, and compare it with existing approaches.
1
2015
102
Biometrika
65
76
http://hdl.handle.net/10.1093/biomet/asu068
application/pdf
Access to full text is restricted to subscribers.
Yuanshan Wu
Guosheng Yin
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:95-106.2015-07-30RePEc:oup:biomet
article
Dimension reduction based on the Hellinger integral
Sufficient dimension reduction is a useful tool for studying the dependence between a response and a multi-dimensional predictor. In this article, a new formulation is proposed that is based on the Hellinger integral of order two, introduced as a natural measure of the regression information contained in the predictor subspace. The response may be either continuous or discrete. We establish links between local and global central subspaces, and propose an efficient local estimation algorithm. Simulations and an application show that our method compares favourably with existing approaches.
1
2015
102
Biometrika
95
106
http://hdl.handle.net/10.1093/biomet/asu062
application/pdf
Access to full text is restricted to subscribers.
Qin Wang
Xiangrong Yin
Frank Critchley
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:913-926.2015-07-30RePEc:oup:biomet
article
A distribution-free two-sample run test applicable to high-dimensional data
We propose a multivariate generalization of the univariate two-sample run test based on the shortest Hamiltonian path. The proposed test is distribution-free in finite samples. While most existing two-sample tests perform poorly or are even inapplicable to high-dimensional data, our test can be conveniently used in high-dimension, low-sample-size situations. We investigate its power when the sample size remains fixed and the dimension of the data grows to infinity. Simulated and real datasets demonstrate our method’s superiority over existing nonparametric two-sample tests.
4
2014
101
Biometrika
913
926
http://hdl.handle.net/10.1093/biomet/asu045
application/pdf
Access to full text is restricted to subscribers.
Munmun Biswas
Minerva Mukhopadhyay
Anil K. Ghosh
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:865-882.2015-07-30RePEc:oup:biomet
article
Robust estimators for nondecomposable elliptical graphical models
Robust estimators of the restricted covariance matrices associated with elliptical graphical models are studied. General asymptotic results, which apply to both decomposable and nondecomposable graphical models, are presented for robust plug-in type estimators. These extend results previously established only for the decomposable case. Furthermore, a class of graphical M-estimators for the restricted covariance matrices is introduced and compared with the corresponding plug-in M-estimators. The two approaches are shown to be asymptotically equivalent under random sampling from an elliptical distribution. A simulation study demonstrates the superiority of the graphical M-estimators for small samples.
4
2014
101
Biometrika
865
882
http://hdl.handle.net/10.1093/biomet/asu041
application/pdf
Access to full text is restricted to subscribers.
D. Vogel
D. E. Tyler
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:345-358.2015-07-30RePEc:oup:biomet
article
On the dependence structure of bivariate recurrent event processes: inference and estimation
Bivariate or multivariate recurrent event processes are often encountered in longitudinal studies in which more than one type of event is of interest. There has been much research on regression analysis for such data, but little has been done to measure the dependence between recurrent event processes. We propose a time-dependent measure, termed the rate ratio, to assess the local dependence between two types of recurrent event processes. We model the rate ratio as a parametric function of time, and leave unspecified all other aspects of the distribution. We develop a composite likelihood procedure for model fitting and parameter estimation. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is evaluated by simulation and illustrated by an application to a soft tissue sarcoma study.
2
2015
102
Biometrika
345
358
http://hdl.handle.net/10.1093/biomet/asu073
application/pdf
Access to full text is restricted to subscribers.
Jing Ning
Yong Chen
Chunyan Cai
Xuelin Huang
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:999-1002.2015-07-30RePEc:oup:biomet
article
General type-token distribution
We consider the problem of estimating the number of types in a corpus using the number of types observed in a sample of tokens from that corpus. We derive exact and asymptotic distributions for the number of observed types, conditioned on the number of tokens and the latent type distribution. We use the asymptotic distributions to derive an estimator of the latent number of types and validate this estimator numerically.
4
2014
101
Biometrika
999
1002
http://hdl.handle.net/10.1093/biomet/asu035
application/pdf
Access to full text is restricted to subscribers.
S. Hidaka
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:943-956.2015-07-30RePEc:oup:biomet
article
Circular designs balanced for neighbours at distances one and two
We define three types of neighbour-balanced designs for experiments where the units are arranged in a circle or single line in space or time. The designs are balanced with respect to neighbours at distance one and at distance two. The variants come from allowing or forbidding self-neighbours, and from considering neighbours to be directed or undirected. For two of the variants, we give a method of constructing a design for all values of the number of treatments, except for some small values where it is impossible. In the third case, we give a partial solution that covers all sizes likely to be used in practice.
4
2014
101
Biometrika
943
956
http://hdl.handle.net/10.1093/biomet/asu036
application/pdf
Access to full text is restricted to subscribers.
R. E. L. Aldred
R. A. Bailey
Brendan D. McKay
Ian M. Wanless
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:499-504.2015-07-30RePEc:oup:biomet
article
Multiscale variance stabilization via maximum likelihood
This article proposes maximum likelihood approaches for multiscale variance stabilization transformations for independently and identically distributed data. For two multiscale variance stabilization transformations we present new unified theoretical results on their Jacobians, a key component of the likelihood. The results provide a deeper understanding of the transformations and the ability to compute the likelihood in linear time. The transformations are shown empirically to compare favourably to the Box–Cox transformation.
2
2014
101
Biometrika
499
504
http://hdl.handle.net/10.1093/biomet/ast072
application/pdf
Access to full text is restricted to subscribers.
G. P. Nason
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:33-45.2015-07-30RePEc:oup:biomet
article
Covariance-enhanced discriminant analysis
Linear discriminant analysis has been widely used to characterize or separate multiple classes via linear combinations of features. However, the high dimensionality of features from modern biological experiments defies traditional discriminant analysis techniques. Possible interfeature correlations present additional challenges and are often underused in modelling. In this paper, by incorporating possible interfeature correlations, we propose a covariance-enhanced discriminant analysis method that simultaneously and consistently selects informative features and identifies the corresponding discriminable classes. Under mild regularity conditions, we show that the method can achieve consistent parameter estimation and model selection, and can attain an asymptotically optimal misclassification rate. Extensive simulations have verified the utility of the method, which we apply to a renal transplantation trial.
1
2015
102
Biometrika
33
45
http://hdl.handle.net/10.1093/biomet/asu049
application/pdf
Access to full text is restricted to subscribers.
Peirong Xu
Ji Zhu
Lixing Zhu
Yi Li
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:883-898.2015-07-30RePEc:oup:biomet
article
Nonparametric Bayes dynamic modelling of relational data
Symmetric binary matrices representing relations are collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being on inference on the relationship structure and prediction. We propose a nonparametric Bayesian dynamic model, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes. By using a logistic mapping function from the link probability matrix space to the latent relational space, we obtain a flexible and computationally tractable formulation. Employing Pólya-gamma data augmentation, an efficient Gibbs sampler is developed for posterior computation, with the dimension of the latent space automatically inferred. We provide theoretical results on flexibility of the model, and illustrate its performance via simulation experiments. We also consider an application to co-movements in world financial markets.
4
2014
101
Biometrika
883
898
http://hdl.handle.net/10.1093/biomet/asu040
application/pdf
Access to full text is restricted to subscribers.
Daniele Durante
David B. Dunson
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:247-266.2015-07-30RePEc:oup:biomet
article
Testing differential networks with applications to the detection of gene-gene interactions
Model organisms and human studies have yielded increasing empirical evidence that interactions among genes contribute broadly to genetic variation of complex traits. In the presence of gene-gene interactions, the dimensionality of the feature space becomes extremely high relative to the sample size. This poses a significant methodological challenge in the identification of gene-gene interactions. In this paper, by using a Gaussian graphical model framework, we translate the problem of identifying gene-gene interactions associated with a binary trait D into an inference problem on the difference of two high-dimensional precision matrices that summarize the conditional dependence network structures of the genes. We propose a procedure for testing the differential network globally, which is particularly powerful against sparse alternatives. In addition, a multiple testing procedure with false discovery rate control is developed to infer the specific structure of the differential network. Theoretical justification is provided to ensure the validity of the proposed tests, and optimality results are derived under sparsity assumptions. Through a simulation study we demonstrate that the proposed tests maintain the desired error rates under the null hypothesis and have good power under the alternative hypothesis. The methods are applied to a breast cancer gene expression study.
2
2015
102
Biometrika
247
266
http://hdl.handle.net/10.1093/biomet/asu074
application/pdf
Access to full text is restricted to subscribers.
Yin Xia
Tianxi Cai
T. Tony Cai
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:135-150.2015-07-30RePEc:oup:biomet
article
An extended hazard model with longitudinal covariates
In clinical trials and other medical studies, it has become increasingly common to observe simultaneously an event time of interest and longitudinal covariates. In the literature, joint modelling approaches have been employed to analyse both survival and longitudinal processes and to investigate their association. However, these approaches focus mostly on developing adaptive and flexible longitudinal processes based on a prespecified survival model, most commonly the Cox proportional hazards model. In this paper, we propose a general class of semiparametric hazard regression models, referred to as the extended hazard model, for the survival component. This class includes two popular survival models, the Cox proportional hazards model and the accelerated failure time model, as special cases. The proposed model is flexible for modelling event data, and its nested structure facilitates model selection for the survival component through likelihood ratio tests. A pseudo joint likelihood approach is proposed for estimating the unknown parameters and components via a Monte Carlo EM algorithm. Asymptotic theory for the estimators is developed together with theory for the semiparametric likelihood ratio tests. The performance of the procedure is demonstrated through simulation studies. A case study featuring data from a Taiwanese HIV/AIDS cohort study further illustrates the usefulness of the extended hazard model.
1
2015
102
Biometrika
135
150
http://hdl.handle.net/10.1093/biomet/asu058
application/pdf
Access to full text is restricted to subscribers.
Y. K. Tseng
Y. R. Su
M. Mao
J. L. Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:491-498.2015-07-30RePEc:oup:biomet
article
Sequential combination of weighted and nonparametric bagging for classification
We propose a simple sequential procedure for bagged classification, which modifies nonparametric bagging by randomizing class labels of resampled data points. The random labelling feature of the procedure also enables us to undertake unsupervised classification with the benefit of supervised learning. Theoretical properties are given for the nearest neighbour classifier in the case of supervised learning and a hard-thresholding indicator in the case of unsupervised learning, showing that sequential bagging accelerates convergence of the bagged predictor to the Bayes rule. Simulation results are provided in support of the proposed method.
2
2014
101
Biometrika
491
498
http://hdl.handle.net/10.1093/biomet/ast068
application/pdf
Access to full text is restricted to subscribers.
M. Soleymani
S. M. S. Lee
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:17-36.2015-07-30RePEc:oup:biomet
article
A sum characterization of hidden regular variation with likelihood inference via expectation-maximization
A fundamental deficiency of classical multivariate extreme value theory is the inability to distinguish between asymptotic independence and exact independence. In this work, we examine multivariate threshold modelling in the framework of regular variation on cones. Tail dependence is described by a limiting measure, which in some cases is degenerate on joint tail regions despite strong subasymptotic dependence in such regions. Hidden regular variation, a higher-order tail decay on these regions, offers a refinement of the classical theory. We develop a representation of random vectors possessing hidden regular variation as the sum of independent regular varying components. The representation is shown to be asymptotically valid via a multivariate tail equivalence result. We develop a likelihood-based estimation procedure from this representation via a Monte Carlo expectation-maximization algorithm which has been modified for tail estimation. The method is demonstrated on simulated data and applied to air pollution measurements.
1
2014
101
Biometrika
17
36
http://hdl.handle.net/10.1093/biomet/ast046
application/pdf
Access to full text is restricted to subscribers.
Grant B. Weller
Daniel Cooley
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:252.2015-07-30RePEc:oup:biomet
article
‘Objective Bayesian analysis for the Student-t regression model’
1
2014
101
Biometrika
252
252
http://hdl.handle.net/10.1093/biomet/asu001
application/pdf
Access to full text is restricted to subscribers.
T. C. O. Fonseca
M. A. R. Ferreira
H. S. Migon
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:295-313.2015-07-30RePEc:oup:biomet
article
Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator
When an unbiased estimator of the likelihood is used within a Metropolis–Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of the averages computed under this chain. Using many Monte Carlo samples will typically result in Metropolis–Hastings averages with lower asymptotic variances than the corresponding averages that use fewer samples; however, the computing time required to construct the likelihood estimator increases with the number of samples. Under the assumption that the distribution of the additive noise introduced by the loglikelihood estimator is Gaussian with variance inversely proportional to the number of samples and independent of the parameter value at which it is evaluated, we provide guidelines on the number of samples to select. We illustrate our results by considering a stochastic volatility model applied to stock index returns.
2
2015
102
Biometrika
295
313
http://hdl.handle.net/10.1093/biomet/asu075
application/pdf
Access to full text is restricted to subscribers.
A. Doucet
M. K. Pitt
G. Deligiannidis
R. Kohn
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:815-829.2015-07-30RePEc:oup:biomet
article
Transformed sufficient dimension reduction
We propose a general framework for dimension reduction in regression to fill the gap between linear and fully nonlinear dimension reduction. The main idea is to first transform each of the raw predictors monotonically and then search for a low-dimensional projection in the space defined by the transformed variables. Both user-specified and data-driven transformations are suggested. In each case, the methodology is first discussed in generality and then a representative method is proposed and evaluated by simulation. The proposed methods are applied to a real dataset.
4
2014
101
Biometrika
815
829
http://hdl.handle.net/10.1093/biomet/asu037
application/pdf
Access to full text is restricted to subscribers.
T. Wang
X. Guo
L. Zhu
P. Xu
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:189-204.2015-07-30RePEc:oup:biomet
article
Retrospective-prospective symmetry in the likelihood and Bayesian analysis of case-control studies
Prentice & Pyke (1979) established that the maximum likelihood estimate of an odds ratio in a case-control study is the same as would be found by fitting a logistic regression; in other words, for this specific target the incorrect prospective model is inferentially equivalent to the correct retrospective model. Similar results have been obtained for other models, and conditions have also been identified under which the corresponding Bayesian property holds, namely that the posterior distribution of the odds ratio is the same whether it is computed using the prospective or the retrospective likelihood. In this article we demonstrate how these results follow directly from certain parameter independence properties of the models and priors, and identify prior laws that support such reverse analysis, for both standard and stratified designs.
1
2014
101
Biometrika
189
204
http://hdl.handle.net/10.1093/biomet/ast050
application/pdf
Access to full text is restricted to subscribers.
Simon P. J. Byrne
A. Philip Dawid
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:771-784.2015-07-30RePEc:oup:biomet
article
When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples
Regularization aims to improve prediction performance by trading an increase in training error for better agreement between training and prediction errors, which is often captured through decreased degrees of freedom. In this paper we give examples which show that regularization can increase the degrees of freedom in common models, including the lasso and ridge regression. In such situations, both training error and degrees of freedom increase, making the regularization inherently without merit. Two important scenarios are described where the expected reduction in degrees of freedom is guaranteed: all symmetric linear smoothers and convex constrained linear regression models like ridge regression and the lasso, when compared to unconstrained linear regression.
4
2014
101
Biometrika
771
784
http://hdl.handle.net/10.1093/biomet/asu034
application/pdf
Access to full text is restricted to subscribers.
S. Kaufman
S. Rosset
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:799-814.2015-07-30RePEc:oup:biomet
article
Censored rank independence screening for high-dimensional survival data
In modern statistical applications, the dimension of covariates can be much larger than the sample size. In the context of linear models, correlation screening (Fan & Lv, J. R. Statist. Soc. B, 70, 849–911, 2008) has been shown to reduce the dimension of such data effectively while achieving the sure screening property, i.e., all of the active variables can be retained with high probability. However, screening based on the Pearson correlation does not perform well when applied to contaminated covariates and/or censored outcomes. In this paper, we study censored rank independence screening of high-dimensional survival data. The proposed method is robust to predictors that contain outliers, works for a general class of survival models, and enjoys the sure screening property. Simulations and an analysis of real data demonstrate that the proposed method performs competitively on survival datasets of moderate size and high-dimensional predictors, even when these are contaminated.
4
2014
101
Biometrika
799
814
http://hdl.handle.net/10.1093/biomet/asu047
application/pdf
Access to full text is restricted to subscribers.
Rui Song
Wenbin Lu
Shuangge Ma
X. Jessie Jeng
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:409-420.2015-07-30RePEc:oup:biomet
article
A validated information criterion to determine the structural dimension in dimension reduction models
A crucial component of performing sufficient dimension reduction is to determine the structural dimension of the reduction model. We propose a novel information criterion-based method for this purpose, a special feature of which is that when examining the goodness-of-fit of the current model, one needs to perform model evaluation by using an enlarged candidate model. Although the procedure does not require estimation under the enlarged model of dimension k+1, the decision as to how well the current model of dimension k fits relies on the validation provided by the enlarged model; thus we call this procedure the validated information criterion, VIC(k). Our method is different from existing information criterion-based model selection methods; it breaks free from dependence on the connection between dimension reduction models and their corresponding matrix eigenstructures, which relies heavily on a linearity condition that we no longer assume. We prove consistency of the proposed method, and its finite-sample performance is demonstrated numerically.
2
2015
102
Biometrika
409
420
http://hdl.handle.net/10.1093/biomet/asv004
application/pdf
Access to full text is restricted to subscribers.
Yanyuan Ma
Xinyu Zhang
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:899-911.2015-07-30RePEc:oup:biomet
article
A class of improved hybrid Hochberg–Hommel type step-up multiple test procedures
In this paper we derive a new p-value based multiple testing procedure that improves upon the Hommel procedure by gaining power as well as having a simpler step-up structure similar to the Hochberg procedure. The key to this improvement is that the Hommel procedure can be improved by a consonant procedure. Exact critical constants of this new procedure can be numerically determined. The zeroth-order approximations to the exact critical constants, albeit slightly conservative, are simple to use and need no tabling, and hence are recommended in practice. The proposed procedure is shown to control the familywise error rate under independence among the p-values. Simulations empirically demonstrate familywise error rate control under positive and negative dependence. Power superiority of the proposed procedure over competing ones is also empirically demonstrated. Illustrative examples are given.
4
2014
101
Biometrika
899
911
http://hdl.handle.net/10.1093/biomet/asu032
application/pdf
Access to full text is restricted to subscribers.
Jiangtao Gou
Ajit C. Tamhane
Dong Xi
Dror Rom
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:486-493.2015-07-30RePEc:oup:biomet
article
Semiparametric exponential families for heavy-tailed data
We propose a semiparametric method for fitting the tail of a heavy-tailed population given a relatively small sample from that population and a larger sample from a related background population. We model the tail of the small sample as an exponential tilt of the better-observed large-sample tail, using a robust sufficient statistic motivated by extreme value theory. In particular, our method induces an estimator of the small-population mean, and we give theoretical and empirical evidence that this estimator outperforms methods that do not use the background sample. We demonstrate substantial efficiency gains over competing methods in simulation and on data from a large controlled experiment conducted by Facebook.
2
2015
102
Biometrika
486
493
http://hdl.handle.net/10.1093/biomet/asu065
application/pdf
Access to full text is restricted to subscribers.
William Fithian
Stefan Wager
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:755-769.2015-07-30RePEc:oup:biomet
article
Classification with confidence
A framework for classification is developed with a notion of confidence. In this framework, a classifier consists of two tolerance regions in the predictor space, with a specified coverage level for each class. The classifier also produces an ambiguous region where the classification needs further investigation. Theoretical analysis reveals interesting structures of the confidence-ambiguity trade-off, and the optimal solution is characterized by extending the Neyman–Pearson lemma. We provide general estimating procedures, along with rates of convergence, based on estimates of the conditional probabilities. The method can be easily implemented with good robustness, as illustrated through theory, simulation and a data example.
4
2014
101
Biometrika
755
769
http://hdl.handle.net/10.1093/biomet/asu038
application/pdf
Access to full text is restricted to subscribers.
Jing Lei
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:351-363.2015-07-30RePEc:oup:biomet
article
Indicator functions and the algebra of the linear-quadratic parameterization
Indicator functions are constructed under the linear-quadratic parameterization for contrasts, and applied to the study of partial aliasing properties for three-level fractional factorial designs. An algebraic operation is introduced for the calculation of indicator function coefficients. This operation connects design construction methods to the analysis under the linear-quadratic system, and helps establish simple conditions for the estimability of interactions.
2
2014
101
Biometrika
351
363
http://hdl.handle.net/10.1093/biomet/ast070
application/pdf
Access to full text is restricted to subscribers.
Arman Sabbaghi
Tirthankar Dasgupta
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:1003.2015-07-30RePEc:oup:biomet
article
On exact forms of Taylor’s theorem for vector-valued functions
Exact forms of Taylor expansion for vector-valued functions have been incorrectly used in many statistical publications. We offer two methods to correct this error.
4
2014
101
Biometrika
1003
1003
http://hdl.handle.net/10.1093/biomet/asu061
application/pdf
Access to full text is restricted to subscribers.
Changyong Feng
Hongyue Wang
Tian Chen
Xin M. Tu
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:535-552.2015-07-30RePEc:oup:biomet
article
Tests for comparing estimated survival functions
We describe a class of statistical tests for the comparison of two or more survival curves, typically estimated using the Kaplan–Meier method. The class is based on the construction of O’Quigley (2003), and some special cases are of particular interest. Underlying the inferential development are the arguments of Efron & Hinkley (1978), leading to a theoretical sampling model that is in some sense closer to the observed data. The log-rank and weighted log-rank tests arise as special members of the class. In practice the log-rank test will often be a suboptimal, even poor, test due to the presence of non-proportional hazards. The proposed test maintains good power and, in all the cases considered, has greater power than the log-rank test under non-proportional hazards. The power will depend on the alternatives being considered, and under reasonable assumptions on the alternatives, we conclude that the proposed test is more powerful than the log-rank test. Simulations support these conclusions. An example is given as an illustration.
3
2014
101
Biometrika
535
552
http://hdl.handle.net/10.1093/biomet/asu015
application/pdf
Access to full text is restricted to subscribers.
C. Chauvel
J. O'Quigley
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:107-119.2015-07-30RePEc:oup:biomet
article
A transformation approach in linear mixed-effects models with informative missing responses
We consider a linear mixed-effects model in which the response panel vector has missing components and the missing data mechanism depends on observed data as well as missing responses through unobserved random effects. Using a transformation of the data that eliminates the random effects, we derive asymptotically unbiased and normally distributed estimators of certain model parameters. Estimators of model parameters that cannot be estimated using the transformed data are also constructed, and their asymptotic unbiasedness and normality are established. Simulation results are presented to examine the finite sample performance of the proposed estimators and a real data example is discussed.
1
2015
102
Biometrika
107
119
http://hdl.handle.net/10.1093/biomet/asu069
application/pdf
Access to full text is restricted to subscribers.
J. Shao
J. Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:43-552012-05-01RePEc:oup:biomet
article
Modelling the distribution of the cluster maxima of exceedances of subasymptotic thresholds
A standard approach to model the extreme values of a stationary process is the peaks over threshold method, which consists of imposing a high threshold, identifying clusters of exceedances of this threshold and fitting the maximum value from each cluster using the generalized Pareto distribution. This approach is strongly justified by underlying asymptotic theory. We propose an alternative model for the distribution of the cluster maxima that accounts for the subasymptotic theory of extremes of a stationary process. This new distribution is a product of two terms, one for the marginal distribution of exceedances and the other for the dependence structure of the exceedance values within a cluster. We illustrate the improvement in fit, measured by the root mean square error of the estimated quantiles, offered by the new distribution over the peaks over thresholds analysis using simulated and hydrological data, and we suggest a diagnostic tool to help identify when the proposed model is likely to lead to an improved fit. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
43
55
http://hdl.handle.net/10.1093/biomet/asr078
application/pdf
Access to full text is restricted to subscribers.
Emma F. Eastoe
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:230-2372012-05-01RePEc:oup:biomet
article
Estimating overdispersion when fitting a generalized linear model to sparse data
We consider the problem of fitting a generalized linear model to overdispersed data, focussing on a quasilikelihood approach in which the variance is assumed to be proportional to that specified by the model, and the constant of proportionality, φ, is used to obtain appropriate standard errors and model comparisons. It is common practice to base an estimate of φ on Pearson's lack-of-fit statistic, with or without Farrington's modification. We propose a new estimator that has a smaller variance, subject to a condition on the third moment of the response variable. We conjecture that this condition is likely to be achieved for the important special cases of count and binomial data. We illustrate the benefits of the new estimator using simulations for both count and binomial data. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
230
237
http://hdl.handle.net/10.1093/biomet/asr083
application/pdf
Access to full text is restricted to subscribers.
D. J. Fletcher
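For orientation, the common-practice baseline mentioned in the abstract above, Pearson's lack-of-fit statistic divided by the residual degrees of freedom, can be sketched as follows. This is only the baseline and not the new estimator proposed in the paper, whose form is not given in this record. The function name, the argument names and the default Poisson variance function V(mu) = mu are illustrative assumptions.

```python
def pearson_dispersion(y, mu, n_params, variance=lambda m: m):
    """Pearson-based estimate of the dispersion constant phi.

    y        : observed responses
    mu       : fitted means from the generalized linear model
    n_params : number of estimated regression parameters
    variance : model variance function V(mu); Poisson V(mu) = mu by default
               (an assumption made for this sketch)
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    dof = len(y) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    # Pearson statistic: sum of squared residuals scaled by the model variance
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / dof
```

A value well above 1 suggests overdispersion relative to the assumed variance function.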
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:85-1002012-05-01RePEc:oup:biomet
article
Combining data from two independent surveys: a model-assisted approach
Combining information from two or more independent surveys is a problem frequently encountered in survey sampling. We consider the case of two independent surveys, where a large sample from survey 1 collects only auxiliary information and a much smaller sample from survey 2 provides information on both the variables of interest and the auxiliary variables. We propose a model-assisted projection method of estimation based on a working model, but the reference distribution is design-based. We generate synthetic or proxy values of a variable of interest by first fitting the working model, relating the variable of interest to the auxiliary variables, to the data from survey 2 and then predicting the variable of interest associated with the auxiliary variables observed in survey 1. The projection estimator of a total is simply obtained from the survey 1 weights and associated synthetic values. We identify the conditions for the projection estimator to be asymptotically unbiased. Domain estimation using the projection method is also considered. Replication variance estimators are obtained by augmenting the synthetic data file for survey 1 with additional synthetic columns associated with the columns of replicate weights. Results from a simulation study are presented. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
85
100
http://hdl.handle.net/10.1093/biomet/asr063
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:167-1842012-05-01RePEc:oup:biomet
article
Estimating treatment effects with treatment switching via semicompeting risks models: an application to a colorectal cancer study
Treatment switching is a frequent occurrence in clinical trials, where, during the course of the trial, patients who fail on the control treatment may change to the experimental treatment. Analysing the data without accounting for switching yields highly biased and inefficient estimates of the treatment effect. In this paper, we propose a novel class of semiparametric semicompeting risks transition survival models to accommodate treatment switches. Theoretical properties of the proposed model are examined and an efficient expectation-maximization algorithm is derived for obtaining the maximum likelihood estimates. Simulation studies are conducted to demonstrate the superiority of the model compared with the intent-to-treat analysis and other methods proposed in the literature. The proposed method is applied to data from a colorectal cancer clinical trial. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
167
184
http://hdl.handle.net/10.1093/biomet/asr062
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Qingxia Chen
Ming-Hui Chen
Joseph G. Ibrahim
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:15-282012-05-01RePEc:oup:biomet
article
Factor profiled sure independence screening
We propose a method of factor profiled sure independence screening for ultrahigh-dimensional variable selection. The objective of this method is to identify nonzero components consistently from a sparse coefficient vector. The new method assumes that the correlation structure of the high-dimensional data can be well represented by a set of low-dimensional latent factors, which can be estimated consistently by eigenvalue-eigenvector decomposition. The estimated latent factors should then be profiled out from both the response and the predictors. Such an operation, referred to as factor profiling, produces uncorrelated predictors. Therefore, sure independence screening can be applied subsequently and the resulting screening result is consistent for model selection, a major advantage that standard sure independence screening does not share. We refer to the new method as factor profiled sure independence screening. Numerical studies confirm its outstanding performance. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
15
28
http://hdl.handle.net/10.1093/biomet/asr074
application/pdf
Access to full text is restricted to subscribers.
H. Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:101-1132012-05-01RePEc:oup:biomet
article
Optimal allocation to maximize the power of two-sample tests for binary response
We study allocations that maximize the power of tests of equality of two treatments having binary outcomes. When a normal approximation applies, the asymptotic power is maximized by minimizing the variance, leading to a Neyman allocation that assigns observations in proportion to the standard deviations. This allocation, which in general requires knowledge of the parameters of the problem, is recommended in a large body of literature. Under contiguous alternatives the normal approximation indeed applies, and in this case the Neyman allocation reduces to a balanced design. However, when studying the power under a noncontiguous alternative, a large deviations approximation is needed, and the Neyman allocation is no longer asymptotically optimal. In the latter case, the optimal allocation depends on the parameters, but is rather close to a balanced design. Thus, a balanced design is a viable option for both contiguous and noncontiguous alternatives. Finite sample studies show that a balanced design is indeed generally quite close to being optimal for power maximization. This is good news as implementation of a balanced design does not require knowledge of the parameters. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
101
113
http://hdl.handle.net/10.1093/biomet/asr077
application/pdf
Access to full text is restricted to subscribers.
D. Azriel
M. Mandel
Y. Rinott
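The Neyman allocation described in the abstract above assigns observations in proportion to the standard deviations, which for binary outcomes are sqrt(p(1 - p)). A minimal sketch, with a hypothetical function name, shows why it reduces to a balanced design when the two success probabilities are close. It does not implement the large deviations optimum discussed in the paper.

```python
import math


def neyman_fraction(p1, p2):
    """Fraction of observations assigned to arm 1 under Neyman allocation.

    p1, p2 are the success probabilities of the two treatments, which in
    practice are unknown; they are treated as known here for illustration.
    """
    s1 = math.sqrt(p1 * (1.0 - p1))  # standard deviation of a Bernoulli(p1)
    s2 = math.sqrt(p2 * (1.0 - p2))  # standard deviation of a Bernoulli(p2)
    if s1 + s2 == 0.0:
        raise ValueError("both outcomes are degenerate")
    return s1 / (s1 + s2)
```

With p1 = p2 the fraction is exactly one half, and it moves away from one half only as the two standard deviations diverge.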
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:71-842012-05-01RePEc:oup:biomet
article
Optimal fractions of two-level factorials under a baseline parameterization
Two-level fractional factorial designs are considered under a baseline parameterization. The criterion of minimum aberration is formulated in this context and optimal designs under this criterion are investigated. The underlying theory and the concept of isomorphism turn out to be significantly different from their counterparts under orthogonal parameterization, and this is reflected in the optimal designs obtained. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
71
84
http://hdl.handle.net/10.1093/biomet/asr071
application/pdf
Access to full text is restricted to subscribers.
Rahul Mukerjee
Boxin Tang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:151-1652012-05-01RePEc:oup:biomet
article
A functional generalized method of moments approach for longitudinal studies with missing responses and covariate measurement error
Covariate measurement error and missing responses are typical features in longitudinal data analysis. There has been extensive research on either covariate measurement error or missing responses, but relatively little work has been done to address both simultaneously. In this paper, we propose a simple method for the marginal analysis of longitudinal data with time-varying covariates, some of which are measured with error, while the response is subject to missingness. Our method has a number of appealing properties: assumptions on the model are minimal, with none needed about the distribution of the mismeasured covariate; implementation is straightforward and its applicability is broad. We provide both theoretical justification and numerical results. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
151
165
http://hdl.handle.net/10.1093/biomet/asr076
application/pdf
Access to full text is restricted to subscribers.
Grace Y. Yi
Yanyuan Ma
Raymond J. Carroll
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:245-2512012-05-01RePEc:oup:biomet
article
Optimality of group testing in the presence of misclassification
Several optimality properties of Dorfman's (1943) group testing procedure are derived for estimation of the prevalence of a rare disease whose status is classified with error. Exact ranges of disease prevalence are obtained for which group testing provides more efficient estimation when group size increases. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
245
251
http://hdl.handle.net/10.1093/biomet/asr064
application/pdf
Access to full text is restricted to subscribers.
Aiyi Liu
Chunling Liu
Zhiwei Zhang
Paul S. Albert
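As background to the abstract above, the standard relationship between disease prevalence and the probability that a pooled group tests positive under misclassification can be sketched as follows. The sketch assumes that sensitivity and specificity are known and do not depend on group size; that assumption, and all function and argument names, are illustrative and not taken from the paper.

```python
def group_positive_prob(p, k, se, sp):
    """Probability that a pooled group of size k tests positive.

    p  : disease prevalence
    se : test sensitivity, sp : test specificity (assumed known)
    """
    q = (1.0 - p) ** k  # probability that the group is truly disease-free
    return se * (1.0 - q) + (1.0 - sp) * q


def prevalence_from_group_rate(pi_hat, k, se, sp):
    """Invert group_positive_prob to estimate prevalence from the
    observed proportion pi_hat of positive groups."""
    j = se + sp - 1.0
    if j <= 0.0:
        raise ValueError("test must be better than chance (se + sp > 1)")
    q = (se - pi_hat) / j
    q = min(1.0, max(0.0, q))  # keep the estimate inside [0, 1]
    return 1.0 - q ** (1.0 / k)
```

For example, feeding the exact group-positive probability back into the second function recovers the prevalence used to generate it.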
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:238-2442012-05-01RePEc:oup:biomet
article
On robust estimation via pseudo-additive information
We consider a robust parameter estimator minimizing an empirical approximation to the q-entropy and show its relationship to minimization of power divergences through a simple parameter transformation. The estimator balances robustness and efficiency through a tuning constant q and avoids kernel density smoothing. We derive an upper bound to the estimator mean squared error under a contaminated reference model and use it as a min-max criterion for selecting q. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
238
244
http://hdl.handle.net/10.1093/biomet/asr061
application/pdf
Access to full text is restricted to subscribers.
Davide Ferrari
Davide La Vecchia
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:57-692012-05-01RePEc:oup:biomet
article
Conservative hypothesis tests and confidence intervals using importance sampling
Importance sampling is a common technique for Monte Carlo approximation, including that of p-values. Here it is shown that a simple correction of the usual importance sampling p-values provides valid p-values, meaning that a hypothesis test created by rejecting the null hypothesis when the p-value is at most α will also have a Type I error rate of at most α. This correction uses the importance weight of the original observation, which gives valuable diagnostic information under the null hypothesis. Using the corrected p-values can be crucial for multiple testing and also in problems where evaluating the accuracy of importance sampling approximations is difficult. Inverting the corrected p-values provides a useful way to create Monte Carlo confidence intervals that maintain the nominal significance level and use only a single Monte Carlo sample. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
57
69
http://hdl.handle.net/10.1093/biomet/asr079
application/pdf
Access to full text is restricted to subscribers.
Matthew T. Harrison
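The abstract above describes a correction to importance sampling p-values that uses the importance weight of the original observation. The record does not give the formula, so the sketch below is an assumption in that spirit: the observation's own weight is added to the numerator and one is added to the sample count. The function and argument names are hypothetical.

```python
def importance_sampling_p_values(t_obs, w_obs, t_samples, w_samples):
    """Return (usual, corrected) importance sampling p-values.

    t_obs     : test statistic of the observed data
    w_obs     : importance weight (null density over proposal density)
                evaluated at the observed data
    t_samples : test statistics of draws from the proposal distribution
    w_samples : importance weights of those draws

    The 'corrected' form is an assumed illustration, not a formula quoted
    from the paper.
    """
    n = len(t_samples)
    if n == 0 or n != len(w_samples):
        raise ValueError("need equally many statistics and weights")
    # Weighted count of draws at least as extreme as the observation
    tail = sum(w for t, w in zip(t_samples, w_samples) if t >= t_obs)
    usual = min(1.0, tail / n)
    corrected = min(1.0, (w_obs + tail) / (n + 1))
    return usual, corrected
```

When all weights equal one, the corrected value reduces to the familiar Monte Carlo p-value (1 + count) / (n + 1).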
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:211-2222012-05-01RePEc:oup:biomet
article
A proportional likelihood ratio model
We propose a semiparametric proportional likelihood ratio model which is particularly suitable for modelling a nonlinear monotonic relationship between the outcome variable and a covariate. This model extends the generalized linear model by leaving the distribution unspecified, and has a strong connection with semiparametric models such as the selection bias model (Gilbert et al., 1999), the density ratio model (Qin, 1998; Fokianos & Kaimi, 2006), the single-index model (Ichimura, 1993) and the exponential tilt regression model (Rathouz & Gao, 2009). A maximum likelihood estimator is obtained for the new model and its asymptotic properties are derived. An example and simulation study illustrate the use of the model. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
211
222
http://hdl.handle.net/10.1093/biomet/asr060
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:223-2292012-05-01RePEc:oup:biomet
article
Proportional likelihood ratio models for mean regression
The proportional likelihood ratio model introduced in Luo & Tsai (2012) is adapted to explicitly model the means of observations. This is useful for the estimation of and inference on treatment effects, particularly in designed experiments and allows the data analyst greater control over model specification and parameter interpretation. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
223
229
http://hdl.handle.net/10.1093/biomet/asr075
application/pdf
Access to full text is restricted to subscribers.
Alan Huang
Paul J. Rathouz
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:115-1262012-05-01RePEc:oup:biomet
article
Directed acyclic graphs with edge-specific bounds
We give a definition of a bounded edge within the causal directed acyclic graph framework. A bounded edge generalizes the notion of a signed edge and is defined in terms of bounds on a ratio of survivor probabilities. We derive rules concerning the propagation of bounds. Bounds on causal effects in the presence of unmeasured confounding are also derived using bounds related to specific edges on a graph. We illustrate the theory developed by an example concerning estimating the effect of antihistamine treatment on asthma in the presence of unmeasured confounding. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
115
126
http://hdl.handle.net/10.1093/biomet/asr059
application/pdf
Access to full text is restricted to subscribers.
Tyler J. VanderWeele
Zhiqiang Tan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:127-1402012-05-01RePEc:oup:biomet
article
Bayesian analysis of multistate event history data: beta-Dirichlet process prior
Bayesian analysis of a finite state Markov process, which is popularly used to model multistate event history data, is considered. A new prior process, called a beta-Dirichlet process, is introduced for the cumulative intensity functions and is proved to be conjugate. In addition, the beta-Dirichlet prior is applied to a Bayesian semiparametric regression model. To illustrate the application of the proposed model, we analyse a dataset of credit histories. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
127
140
http://hdl.handle.net/10.1093/biomet/asr067
application/pdf
Access to full text is restricted to subscribers.
Yongdai Kim
Lancelot James
Rafael Weissbach
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:185-1972012-05-01RePEc:oup:biomet
article
Mean residual life models with time-dependent coefficients under right censoring
The mean residual life provides the remaining life expectancy of a subject who has survived to a certain time-point. When covariates are present, regression models are needed to study the association between the mean residual life function and potential regression covariates. In this paper, we propose a flexible class of semiparametric mean residual life models where some effects may be time-varying and some may be constant over time. In the presence of right censoring, we use the inverse probability of censoring weighting approach and develop inference procedures for estimating the model parameters. In addition, we provide graphical and numerical methods for model checking and tests for examining whether or not the covariate effects vary with time. Asymptotic and finite sample properties of the proposed estimators are established and the approach is applied to real life datasets collected from clinical trials. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
185
197
http://hdl.handle.net/10.1093/biomet/asr065
application/pdf
Access to full text is restricted to subscribers.
Liuquan Sun
Xinyuan Song
Zhigang Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:29-422012-05-01RePEc:oup:biomet
article
A direct approach to sparse discriminant analysis in ultra-high dimensions
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among features and thus could produce misleading feature selection and inferior classification. We propose a new procedure for sparse discriminant analysis, motivated by the least squares formulation of linear discriminant analysis. To demonstrate our proposal, we study the numerical and theoretical properties of discriminant analysis constructed via lasso penalized least squares. Our theory shows that the method proposed can consistently identify the subset of discriminative features contributing to the Bayes rule and at the same time consistently estimate the Bayes classification direction, even when the dimension can grow faster than any polynomial order of the sample size. The theory allows for general dependence among features. Simulated and real data examples show that lassoed discriminant analysis compares favourably with other popular sparse discriminant proposals. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
29
42
http://hdl.handle.net/10.1093/biomet/asr066
application/pdf
Access to full text is restricted to subscribers.
Qing Mai
Hui Zou
Ming Yuan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:199-2102012-05-01RePEc:oup:biomet
article
A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling
This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
199
210
http://hdl.handle.net/10.1093/biomet/asr072
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
Dean A. Follmann
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:1-142012-05-01RePEc:oup:biomet
article
Studies in the history of probability and statistics, L: Karl Pearson and the Rule of Three
Karl Pearson's role in the transformation that took the 19th century statistics of Laplace and Gauss into the modern era of 20th century multivariate analysis is examined from a new point of view. By viewing Pearson's work in the context of a motto he adopted from Charles Darwin, a philosophical theme is identified in Pearson's statistical work, and his three major achievements are briefly described. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
1
14
http://hdl.handle.net/10.1093/biomet/asr046
application/pdf
Access to full text is restricted to subscribers.
Stephen M. Stigler
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:253-2722012-08-31RePEc:oup:biomet
article
Dependence modelling for spatial extremes
Current dependence models for spatial extremes are based upon max-stable processes. Within this class, there are few inferentially viable models available, and we propose one further model. More problematic are the restrictive assumptions that must be made when using max-stable processes to model dependence for spatial extremes: it must be assumed that the dependence structure of the observed extremes is compatible with a limiting model that holds for all events more extreme than those that have already occurred. This problem has long been acknowledged in the context of finite-dimensional multivariate extremes, in particular when data display dependence at observable levels, but are independent in the limit. We propose a flexible class of models that is suitable for such data in a spatial context. In addition, we consider the situation where the extremal dependence structure may vary with distance. We apply our models to spatially referenced significant wave height data from the North Sea, finding evidence that their extremal structure is not compatible with a limiting dependence model. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
253
272
http://hdl.handle.net/10.1093/biomet/asr080
application/pdf
Access to full text is restricted to subscribers.
Jennifer L. Wadsworth
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:473-4802012-08-31RePEc:oup:biomet
article
A new residual for ordinal outcomes
We propose a new residual for regression models of ordinal outcomes, defined as E{sign(y,Y)}, where y is the observed outcome and Y is a random variable from the fitted distribution. This new residual is a single value per subject irrespective of the number of categories of the ordinal outcome, contains directional information between the observed value and the fitted distribution, and does not require the assignment of arbitrary numbers to categories. We study its properties, describe its connections with other residuals, ranks and ridits, and demonstrate its use in model diagnostics. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
473
480
http://hdl.handle.net/10.1093/biomet/asr073
application/pdf
Access to full text is restricted to subscribers.
Chun Li
Bryan E. Shepherd
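The residual E{sign(y,Y)} defined in the abstract above equals P(Y < y) - P(Y > y) under the fitted distribution, taking sign(y,Y) to be -1, 0 or 1 according to whether y is below, equal to or above Y. A minimal sketch follows, in which the categories are assumed to be indexed 0, 1, ... in increasing order and the function name is hypothetical.

```python
def sign_residual(y_index, probs):
    """Residual E{sign(y, Y)} for one subject with an ordinal outcome.

    y_index : index of the observed category (0-based, ordered)
    probs   : fitted probabilities for each category, in the same order
    """
    if not 0 <= y_index < len(probs):
        raise ValueError("observed category is out of range")
    below = sum(probs[:y_index])      # P(Y < y) under the fitted model
    above = sum(probs[y_index + 1:])  # P(Y > y) under the fitted model
    return below - above
```

The result lies in [-1, 1], is a single number per subject however many categories there are, and uses only the ordering of the categories rather than numeric scores.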
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:327-3432012-08-31RePEc:oup:biomet
article
Pointwise nonparametric maximum likelihood estimator of stochastically ordered survivor functions
In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests better small and large sample properties than for alternative estimators. An example using prostate cancer data illustrates the method. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
327
343
http://hdl.handle.net/10.1093/biomet/ass006
application/pdf
Access to full text is restricted to subscribers.
Yongseok Park
Jeremy M. G. Taylor
John D. Kalbfleisch
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:569-5832012-08-31RePEc:oup:biomet
article
On multilinear principal component analysis of order-two tensors
Principal component analysis is commonly used for dimension reduction in analysing high-dimensional data. Multilinear principal component analysis aims to serve a similar function for analysing tensor structure data, and has empirically been shown effective in reducing dimensionality. In this paper, we investigate its statistical properties and demonstrate its advantages. Conventional principal component analysis, which vectorizes the tensor data, may lead to inefficient and unstable prediction due to the often extremely large dimensionality involved. Multilinear principal component analysis, in trying to preserve the data structure, searches for low-dimensional projections and, thereby, decreases dimensionality more efficiently. The asymptotic theory of order-two multilinear principal component analysis, including asymptotic efficiency and distributions of principal components, associated projections, and the explained variance, is developed. A test of dimensionality is also proposed. Finally, multilinear principal component analysis is shown to improve conventional principal component analysis in analysing the Olivetti faces dataset, which is achieved by extracting a more modularly oriented basis set in reconstructing the test faces. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
569
583
http://hdl.handle.net/10.1093/biomet/ass019
application/pdf
Access to full text is restricted to subscribers.
Hung Hung
Peishien Wu
Iping Tu
Suyun Huang
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:615-6302012-08-31RePEc:oup:biomet
article
Predictive accuracy of covariates for event times
We propose a graphical measure, the generalized negative predictive function, to quantify the predictive accuracy of covariates for survival time or recurrent event times. This new measure characterizes the event-free probabilities over time conditional on a thresholded linear combination of covariates and has direct clinical utility. We show that this function is maximized at the set of covariates truly related to event times and thus can be used to compare the predictive accuracy of different sets of covariates. We construct nonparametric estimators for this function under right censoring and prove that the proposed estimators, upon proper normalization, converge weakly to zero-mean Gaussian processes. To bypass the estimation of complex density functions involved in the asymptotic variances, we adopt the bootstrap approach and establish its validity. Simulation studies demonstrate that the proposed methods perform well in practical situations. Two clinical studies are presented. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
615
630
http://hdl.handle.net/10.1093/biomet/ass018
application/pdf
Access to full text is restricted to subscribers.
Li Chen
D. Y. Lin
Donglin Zeng
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:741-7472012-08-31RePEc:oup:biomet
article
The fitting of complex parametric models
Consider parametric models that are too complicated to allow calculation of a likelihood but from which observations can be simulated. We examine parameter estimators that are linear functions of a possibly large set of candidate features. A combination of simulations based on a fractional design and sets of discriminant analyses is then used to find an optimal estimator of the vector parameter and its covariance matrix. The procedure is an alternative to the approximate Bayesian computation scheme. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/ass030
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
Christiana Kartsonaki
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:687-7022012-08-31RePEc:oup:biomet
article
Inner envelopes: efficient estimation in multivariate linear regression
In this article we propose a new model, called the inner envelope model, which leads to efficient estimation in the context of multivariate normal linear regression. The asymptotic distribution and the consistency of its maximum likelihood estimators are established. Theoretical results, simulation studies and examples all show that the efficiency gains can be substantial relative to standard methods and to the maximum likelihood estimators from the envelope model introduced recently by Cook et al. (2010). Compared to the envelope model, the inner envelope model is based on a different construction and it can produce substantial efficiency gains in situations where the envelope model offers no gains. In effect, inner envelopes open a new frontier to the way in which reducing subspaces can be used to improve efficiency in multivariate problems. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
687
702
http://hdl.handle.net/10.1093/biomet/ass024
application/pdf
Access to full text is restricted to subscribers.
Zhihua Su
R. Dennis Cook
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:405-4212012-08-31RePEc:oup:biomet
article
Corrected-loss estimation for quantile regression with covariate measurement errors
We study estimation in quantile regression when covariates are measured with errors. Existing methods require stringent assumptions, such as spherically symmetric joint distribution of the regression and measurement error variables, or linearity of all quantile functions, which restrict model flexibility and complicate computation. In this paper, we develop a new estimation approach based on corrected scores to account for a class of covariate measurement errors in quantile regression. The proposed method is simple to implement. Its validity requires only linearity of the particular quantile function of interest, and it requires no parametric assumptions on the regression error distributions. Finite-sample results demonstrate that the proposed estimators are more efficient than the existing methods in various models considered. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
405
421
http://hdl.handle.net/10.1093/biomet/ass005
application/pdf
Access to full text is restricted to subscribers.
Huixia Judy Wang
Leonard A. Stefanski
Zhongyi Zhu
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:649-6622012-08-31RePEc:oup:biomet
article
Modelling covariance structure in bivariate marginal models for longitudinal data
It can be more challenging to efficiently model the covariance matrices for multivariate longitudinal data than for the univariate case, due to the correlations arising between multiple responses. The positive-definiteness constraint and the high dimensionality are further obstacles in covariance modelling. In this paper, we develop a data-based method by which the parameters in the covariance matrices are replaced by unconstrained and interpretable parameters with reduced dimensions. The maximum likelihood estimators for the mean and covariance parameters are shown to be consistent and asymptotically normally distributed. Simulations and real data analysis show that the new approach performs very well even when modelling bivariate nonstationary dependence structures. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
649
662
http://hdl.handle.net/10.1093/biomet/ass031
application/pdf
Access to full text is restricted to subscribers.
Jing Xu
Gilbert Mackenzie
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:511-5312012-08-31RePEc:oup:biomet
article
Nonparametric estimation of diffusions: a differential equations approach
We consider estimation of scalar functions that determine the dynamics of diffusion processes. It has been recently shown that nonparametric maximum likelihood estimation is ill-posed in this context. We adopt a probabilistic approach to regularize the problem by the adoption of a prior distribution for the unknown functional. A Gaussian prior measure is chosen in the function space by specifying its precision operator as an appropriate differential operator. We establish that a Bayesian-Gaussian conjugate analysis for the drift of one-dimensional nonlinear diffusions is feasible using high-frequency data, by expressing the loglikelihood as a quadratic function of the drift, with sufficient statistics given by the local time process and the end points of the observed path. Computationally efficient posterior inference is carried out using a finite element method. We embed this technology in partially observed situations and adopt a data augmentation approach whereby we iteratively generate missing data paths and draws from the unknown functional. Our methodology is applied to estimate the drift of models used in molecular dynamics and financial econometrics using high- and low-frequency observations. We discuss extensions to other partially observed schemes and connections to other types of nonparametric inference. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
511
531
http://hdl.handle.net/10.1093/biomet/ass034
application/pdf
Access to full text is restricted to subscribers.
Omiros Papaspiliopoulos
Yvo Pokern
Gareth O. Roberts
Andrew M. Stuart
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:551-5682012-08-31RePEc:oup:biomet
article
Analysis of principal nested spheres
A general framework for a novel non-geodesic decomposition of high-dimensional spheres or high-dimensional shape spaces for planar landmarks is discussed. The decomposition, principal nested spheres, leads to a sequence of submanifolds with decreasing intrinsic dimensions, which can be interpreted as an analogue of principal component analysis. In a number of real datasets, an apparent one-dimensional mode of variation curving through more than one geodesic component is captured in the one-dimensional component of principal nested spheres. While analysis of principal nested spheres provides an intuitive and flexible decomposition of the high-dimensional sphere, an interesting special case of the analysis results in finding principal geodesics, similar to those from previous approaches to manifold principal component analysis. An adaptation of our method to Kendall's shape space is discussed, and a computational algorithm for fitting principal nested spheres is proposed. The result provides a coordinate system to visualize the data structure and an intuitive summary of principal modes of variation, as exemplified by several datasets. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
551
568
http://hdl.handle.net/10.1093/biomet/ass022
application/pdf
Access to full text is restricted to subscribers.
Sungkyu Jung
Ian L. Dryden
J. S. Marron
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:502-5082012-08-31RePEc:oup:biomet
article
Inference for additive interaction under exposure misclassification
Results are given concerning inferences that can be drawn about interaction when binary exposures are subject to certain forms of independent nondifferential misclassification. Tests for interaction, using the misclassified exposures, are valid provided the probability of misclassification satisfies certain bounds. Results are given for additive statistical interactions, for causal interactions corresponding to synergism in the sufficient cause framework and for so-called compositional epistasis. Both two-way and three-way interactions are considered. The results require only that the probability of misclassification be no larger than 1/2 or 1/4, depending on the test. For additive statistical interaction, a method to correct estimates and confidence intervals for misclassification is described. The consequences for power of interaction tests under exposure misclassification are explored through simulations. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
502
508
http://hdl.handle.net/10.1093/biomet/ass012
application/pdf
Access to full text is restricted to subscribers.
Tyler J. VanderWeele
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:631-6482012-08-31RePEc:oup:biomet
article
An efficient method of estimation for longitudinal surveys with monotone missing data
Panel attrition is frequently encountered in panel sample surveys. When it is related to the observed study variable, the classical approach of nonresponse adjustment using a covariate-dependent dropout mechanism can be biased. We consider an efficient method of estimation with monotone panel attrition when the response probability depends on the previous values of study variable as well as other covariates. Because of the monotone structure of the missing pattern, the response mechanism is missing at random. The proposed estimator is asymptotically optimal in the sense that it minimizes the asymptotic variance of a class of estimators that can be written as a linear combination of the unbiased estimators of the panel estimates for each wave, and incorporates all available information using generalized least squares. Variance estimation is discussed and results from a simulation study are presented. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
631
648
http://hdl.handle.net/10.1093/biomet/ass026
application/pdf
Access to full text is restricted to subscribers.
Ming Zhou
Jae Kwang Kim
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:457-4722012-08-31RePEc:oup:biomet
article
Empirical bootstrap bias correction and estimation of prediction mean square error in small area estimation
We develop a method for bias correction, which models the error of the target estimator as a function of the corresponding estimator obtained from bootstrap samples, and the original estimators and bootstrap estimators of the parameters governing the model fitted to the sample data. This is achieved by considering a number of plausible parameter values, generating a pseudo original sample for each parameter and bootstrap samples for each such sample, and then searching for an appropriate functional relationship. Under certain conditions, the procedure also permits estimation of the mean square error of the bias corrected estimator. The method is applied for estimating the prediction mean square error in small area estimation of proportions under a generalized mixed model. Empirical comparisons with jackknife and bootstrap methods are presented. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
457
472
http://hdl.handle.net/10.1093/biomet/ass010
application/pdf
Access to full text is restricted to subscribers.
D. Pfeffermann
S. Correa
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:703-7162012-08-31RePEc:oup:biomet
article
Penalized empirical likelihood and growing dimensional general estimating equations
When a parametric likelihood function is not specified for a model, estimating equations may provide an instrument for statistical inference. Qin and Lawless (1994) illustrated that empirical likelihood makes optimal use of these equations in inferences for fixed low-dimensional unknown parameters. In this paper, we study empirical likelihood for general estimating equations with growing high dimensionality and propose a penalized empirical likelihood approach for parameter estimation and variable selection. We quantify the asymptotic properties of empirical likelihood and its penalized version, and show that penalized empirical likelihood has the oracle property. The performance of the proposed method is illustrated via simulated applications and a data analysis. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
703
716
http://hdl.handle.net/10.1093/biomet/ass014
application/pdf
Access to full text is restricted to subscribers.
Chenlei Leng
Cheng Yong Tang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:481-4872012-08-31RePEc:oup:biomet
article
Structuring shrinkage: some correlated priors for regression
This paper develops a rich class of sparsity priors for regression effects that encourage shrinkage of both regression effects and contrasts between effects to zero whilst leaving sizeable real effects largely unshrunk. The construction of these priors uses some properties of normal-gamma distributions to include design features in the prior specification, but has general relevance to any continuous sparsity prior. Specific prior distributions are developed for serial dependence between regression effects and correlation within groups of regression effects. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
481
487
http://hdl.handle.net/10.1093/biomet/asr082
application/pdf
Access to full text is restricted to subscribers.
J. E. Griffin
P. J. Brown
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:733-7402012-08-31RePEc:oup:biomet
article
Positive definite estimators of large covariance matrices
Using convex optimization, we construct a sparse estimator of the covariance matrix that is positive definite and performs well in high-dimensional settings. A lasso-type penalty is used to encourage sparsity and a logarithmic barrier function is used to enforce positive definiteness. Consistency and convergence rate bounds are established as both the number of variables and sample size diverge. An efficient computational algorithm is developed and the merits of the approach are illustrated with simulations and a speech signal classification example. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
733
740
http://hdl.handle.net/10.1093/biomet/ass025
application/pdf
Access to full text is restricted to subscribers.
Adam J. Rothman
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:299-3132012-08-31RePEc:oup:biomet
article
Componentwise classification and clustering of functional data
The infinite dimension of functional data can challenge conventional methods for classification and clustering. A variety of techniques have been introduced to address this problem, particularly in the case of prediction, but the structural models that they involve can be too inaccurate, or too abstract, or too difficult to interpret, for practitioners. In this paper, we develop approaches to adaptively choose components, enabling classification and clustering to be reduced to finite-dimensional problems. We explore and discuss properties of these methodologies. Our techniques involve methods for estimating classifier error rate and cluster tightness, and for choosing both the number of components, and their locations, to optimize these quantities. A major attraction of this approach is that it allows identification of parts of the function domain that convey important information for classification and clustering. It also permits us to determine regions that are relevant to one of these analyses but not the other. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
299
313
http://hdl.handle.net/10.1093/biomet/ass003
application/pdf
Access to full text is restricted to subscribers.
A. Delaigle
P. Hall
N. Bathia
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:599-6132012-08-31RePEc:oup:biomet
article
Nonparametric incidence estimation from prevalent cohort survival data
Incidence is an important epidemiological concept most suitably studied using an incident cohort study. However, data are often collected from the more feasible prevalent cohort study, whereby diseased individuals are recruited through a cross-sectional survey and followed in time. In the absence of temporal trends in survival, we derive an efficient nonparametric estimator of the cumulative incidence based on such data and study its asymptotic properties. Arbitrary calendar time variations in disease incidence are allowed. Age-specific incidence and adjustments for both stratified sampling and temporal variations in survival are also discussed. Simulation results are presented and data from the Canadian Study of Health and Aging are analysed to infer the incidence of dementia in the Canadian elderly population. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
599
613
http://hdl.handle.net/10.1093/biomet/ass017
application/pdf
Access to full text is restricted to subscribers.
Marco Carone
Masoud Asgharian
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:273-2842012-08-31RePEc:oup:biomet
article
Stochastic blockmodels with a growing number of classes
We present asymptotic and finite-sample results on the use of stochastic blockmodels for the analysis of network data. We show that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least poly-logarithmically in this size. We also establish finite-sample confidence bounds on maximum-likelihood blockmodel parameter estimates from data comprising independent Bernoulli random variates; these results hold uniformly over class assignment. We provide simulations verifying the conditions sufficient for our results, and conclude by fitting a logit parameterization of a stochastic blockmodel with covariates to a network data example comprising self-reported school friendships, resulting in block estimates that reveal residual structure. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
273
284
http://hdl.handle.net/10.1093/biomet/asr053
application/pdf
Access to full text is restricted to subscribers.
D. S. Choi
P. J. Wolfe
E. M. Airoldi
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:717-7312012-08-31RePEc:oup:biomet
article
On the robustness of the adaptive lasso to model misspecification
Penalization methods have been shown to yield both consistent variable selection and oracle parameter estimation under correct model specification. In this article, we study such methods under model misspecification, where the assumed form of the regression function is incorrect, including generalized linear models for uncensored outcomes and the proportional hazards model for censored responses. Estimation with the adaptive least absolute shrinkage and selection operator, lasso, penalty is proven to achieve sparse estimation of regression coefficients under misspecification. The resulting estimators are selection consistent, asymptotically normal and oracle, where the selection is based on the limiting values of the parameter estimators obtained using the misspecified model without penalization. We further derive conditions under which the penalized estimators from the misspecified model may yield selection consistency under the true model. The robustness is explored numerically via simulation and an application to the Wisconsin Epidemiological Study of Diabetic Retinopathy. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
717
731
http://hdl.handle.net/10.1093/biomet/ass027
application/pdf
Access to full text is restricted to subscribers.
W. Lu
Y. Goldberg
J. P. Fine
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:315-3252012-08-31RePEc:oup:biomet
article
Global optimality of nonconvex penalized estimators
Nonconvex penalties such as the smoothly clipped absolute deviation or minimax concave penalties have desirable properties such as the oracle property, even when the dimension of the predictive variables is large. However, checking whether a given local minimizer has such properties is not easy since there can be many local minimizers. In this paper, we give sufficient conditions under which a local minimizer is unique, and show that the oracle estimator becomes the unique local minimizer with probability tending to one. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
315
325
http://hdl.handle.net/10.1093/biomet/asr084
application/pdf
Access to full text is restricted to subscribers.
Yongdai Kim
Sunghoon Kwon
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:379-3922012-08-31RePEc:oup:biomet
article
Efficient estimation for the Cox model with varying coefficients
A proportional hazards model with varying coefficients allows one to examine the extent to which covariates interact nonlinearly with an exposure variable. A global partial likelihood method, in contrast with the local partial likelihood method of Fan et al. (2006), is proposed for estimation of varying coefficient functions. The proposed estimators are proved to be consistent and asymptotically normal. Semiparametric efficiency of the estimators is demonstrated in terms of their linear functionals. Evidence in support of the superiority of the method is presented in numerical studies and real examples. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
379
392
http://hdl.handle.net/10.1093/biomet/asr081
application/pdf
Access to full text is restricted to subscribers.
Kani Chen
Huazhen Lin
Yong Zhou
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:439-4562012-08-31RePEc:oup:biomet
article
Improved double-robust estimation in missing data and causal inference models
Recently proposed double-robust estimators for a population mean from incomplete data and for a finite number of counterfactual means can have much higher efficiency than the usual double-robust estimators under misspecification of the outcome model. In this paper, we derive a new class of double-robust estimators for the parameters of regression models with incomplete cross-sectional or longitudinal data, and of marginal structural mean models for cross-sectional data with similar efficiency properties. Unlike the recent proposals, our estimators solve outcome regression estimating equations. In a simulation study, the new estimator shows improvements in variance relative to the standard double-robust estimator that are in agreement with those suggested by asymptotic theory. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
439
456
http://hdl.handle.net/10.1093/biomet/ass013
application/pdf
Access to full text is restricted to subscribers.
Andrea Rotnitzky
Quanhong Lei
Mariela Sued
James M. Robins
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:285-2982012-08-31RePEc:oup:biomet
article
Doubly misspecified models
Estimation bias arising from local model uncertainty and incomplete data has been studied by Copas & Eguchi (2005) under the assumption of a correctly specified marginal model. We extend the approach to allow additional local uncertainty in the assumed marginal model, arguing that this is almost unavoidable for nonlinear problems. We present a general bias analysis and sensitivity procedure for such doubly misspecified models and illustrate the breadth of application through three examples: logistic regression with a missing confounder, measurement error for binary responses and survival analysis with frailty. We show that a double-the-variance rule is not conservative under double misspecification. The ideas are brought together in a meta-analysis of studies of rehabilitation rates for juvenile offenders. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
285
298
http://hdl.handle.net/10.1093/biomet/asr085
application/pdf
Access to full text is restricted to subscribers.
N. X. Lin
J. Q. Shi
R. Henderson
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:393-4042012-08-31RePEc:oup:biomet
article
Nonparametric inference for assessing treatment efficacy in randomized clinical trials with a time-to-event outcome and all-or-none compliance
To evaluate the biological efficacy of a treatment in a randomized clinical trial, one needs to compare patients in the treatment arm who actually received treatment with the subgroup of patients in the control arm who would have received treatment had they been randomized into the treatment arm. In practice, subgroup membership in the control arm is usually unobservable. This paper develops a nonparametric inference procedure to compare subgroup probabilities with right-censored time-to-event data and unobservable subgroup membership in the control arm. We also present a procedure to estimate the onset and duration of treatment effect. The performance of our method is evaluated by simulation. An illustration is given using a randomized clinical trial for melanoma. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
393
404
http://hdl.handle.net/10.1093/biomet/ass004
application/pdf
Access to full text is restricted to subscribers.
Robert M. Elashoff
Gang Li
Ying Zhou
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:345-3612012-08-31RePEc:oup:biomet
article
Analysing bivariate survival data with interval sampling and application to cancer epidemiology
In biomedical studies, ordered bivariate survival data are frequently encountered when bivariate failure events are used as outcomes to identify the progression of a disease. In cancer studies, interest could be focused on bivariate failure times, for example, time from birth to cancer onset and time from cancer onset to death. This paper considers a sampling scheme, termed interval sampling, in which the first failure event is identified within a calendar time interval, the time of the initiating event can be retrospectively confirmed and the occurrence of the second failure event is observed subject to right censoring. In a cancer data application, the initiating, first and second events could correspond to birth, cancer onset and death. The fact that the data are collected conditional on the first failure event occurring within a time interval induces bias. Interval sampling is widely used for collection of disease registry data by governments and medical institutions, though the interval sampling bias is frequently overlooked by researchers. This paper develops statistical methods for analysing such data. Semiparametric methods are proposed under semi-stationarity and stationarity. Numerical studies demonstrate that the proposed estimation approaches perform well with moderate sample sizes. We apply the proposed methods to ovarian cancer registry data. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
345
361
http://hdl.handle.net/10.1093/biomet/ass009
application/pdf
Access to full text is restricted to subscribers.
Hong Zhu
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:488-4932012-08-31RePEc:oup:biomet
article
Information dynamics and optimal sampling in capture-recapture
The build-up of information in a continued capture-recapture experiment of simple random sampling of an open population is studied by predicting the conditional approximate Fisher information for abundance in data from one survey given the previous data. By neglecting the stochasticity in survival, a simple approximate likelihood is obtained. Optimal temporal allocation of a given total effort is found by numerical optimization for various objective functions based on the approximate Fisher information. For aerial photographic surveys of bowhead whales, the performance of estimates of abundance and of demographic parameters is compared between constant yearly survey effort and nominally optimal sampling by simulating a realistic model over 50 years. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
488
493
http://hdl.handle.net/10.1093/biomet/ass001
application/pdf
Access to full text is restricted to subscribers.
T. Schweder
D. Sadykova
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:675-6862012-08-31RePEc:oup:biomet
article
Objective Bayes, conditional inference and the signed root likelihood ratio statistic
Bayesian properties of the signed root likelihood ratio statistic are analysed. Conditions for first-order probability matching are derived by the examination of the Bayesian posterior and frequentist means of this statistic. Second-order matching conditions are shown to arise from matching of the Bayesian posterior and frequentist variances of a mean-adjusted version of the signed root statistic. Conditions for conditional probability matching in ancillary statistic models are derived and discussed. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
675
686
http://hdl.handle.net/10.1093/biomet/ass028
application/pdf
Access to full text is restricted to subscribers.
Thomas J. DiCiccio
Todd A. Kuffner
G. Alastair Young
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:755-7622012-08-31RePEc:oup:biomet
article
Quadratic inference function approach to merging longitudinal studies: validation and joint estimation
Merging data from multiple studies has been widely adopted in biomedical research. In this paper, we consider two major issues related to merging longitudinal datasets. We first develop a rigorous hypothesis testing procedure to assess the validity of data merging, and then propose a flexible joint estimation procedure that enables us to analyse merged data and to account for different within-subject correlations and follow-up schedules in different studies. We establish large sample properties for the proposed procedures. We compare our method with meta-analysis and generalized estimating equations and show that our test provides robust control of Type I error against both misspecification of working correlation structures and heterogeneous dispersion parameters. Our joint estimating procedure leads to an improvement in estimation efficiency on all regression coefficients after data merging is validated. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
755
762
http://hdl.handle.net/10.1093/biomet/ass021
application/pdf
Access to full text is restricted to subscribers.
Fei Wang
Lu Wang
Peter X.-K. Song
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:423-4382012-08-31RePEc:oup:biomet
article
Multiple imputation in quantile regression
We propose a multiple imputation estimator for parameter estimation in a quantile regression model when some covariates are missing at random. The estimation procedure fully utilizes the entire dataset to achieve increased efficiency, and the resulting coefficient estimators are root-n consistent and asymptotically normal. To protect against possible model misspecification, we further propose a shrinkage estimator, which automatically adjusts for possible bias. The finite sample performance of our estimator is investigated in a simulation study. Finally, we apply our methodology to part of the Eating at America's Table Study data, investigating the association between two measures of dietary intake. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
423
438
http://hdl.handle.net/10.1093/biomet/ass007
application/pdf
Access to full text is restricted to subscribers.
Ying Wei
Yanyuan Ma
Raymond J. Carroll
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:494-5012012-08-31RePEc:oup:biomet
article
A generalized Dunnett test for multi-arm multi-stage clinical studies with treatment selection
We generalize the Dunnett test to derive efficacy and futility boundaries for a flexible multi-arm multi-stage clinical trial for a normally distributed endpoint with known variance. We show that the boundaries control the familywise error rate in the strong sense. The method is applicable for any number of treatment arms, number of stages and number of patients per treatment per stage. It can be used for a wide variety of boundary types or rules derived from α-spending functions. Additionally, we show how sample size can be computed under a least favourable configuration power requirement and derive formulae for expected sample sizes. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
494
501
http://hdl.handle.net/10.1093/biomet/ass002
application/pdf
Access to full text is restricted to subscribers.
D. Magirr
T. Jaki
J. Whitehead
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:509-5092012-08-31RePEc:oup:biomet
article
'On measuring the variability of small area estimators under a basic area level model'
2
2012
99
Biometrika
509
509
http://hdl.handle.net/10.1093/biomet/ass016
application/pdf
Access to full text is restricted to subscribers.
Gauri Sankar Datta
J. N. K. Rao
David Daniel Smith
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:385-3982013-08-02RePEc:oup:biomet
article
Weighting in survey analysis under informative sampling
Sampling related to the outcome variable of a regression analysis conditional on covariates is called informative sampling and may lead to bias in ordinary least squares estimation. Weighting by the reciprocal of the inclusion probability approximately removes such bias but may inflate variance. This paper investigates two ways of modifying such weights to improve efficiency while retaining consistency. One approach is to multiply the inverse probability weights by functions of the covariates. The second is to smooth the weights given values of the outcome variable and covariates. Optimal ways of constructing weights by these two approaches are explored. Both approaches require the fitting of auxiliary weight models. The asymptotic properties of the resulting estimators are investigated and linearization variance estimators are obtained. The approach is extended to pseudo maximum likelihood estimation for generalized linear models. The properties of the different weighted estimators are compared in a limited simulation study. The robustness of the estimators to misspecification of the auxiliary weight model or of the regression model of interest is discussed. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
385
398
http://hdl.handle.net/10.1093/biomet/ass085
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
C. J. Skinner
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:519-5242013-08-02RePEc:oup:biomet
article
A central limit theorem in the β-model for undirected random graphs with a diverging number of vertices
Chatterjee et al. (2011) established the consistency of the maximum likelihood estimator in the β-model for undirected random graphs when the number of vertices goes to infinity. By approximating the inverse of the Fisher information matrix, we prove asymptotic normality of the maximum likelihood estimator under mild conditions. Simulation studies and a data example illustrate the theoretical results. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
519
524
http://hdl.handle.net/10.1093/biomet/ass084
application/pdf
Access to full text is restricted to subscribers.
Ting Yan
Jinfeng Xu
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:485-4942013-08-02RePEc:oup:biomet
article
Log-mean linear models for binary data
This paper introduces a novel class of models for binary data, which we call log-mean linear models. They are specified by linear constraints on the log-mean linear parameter, defined as a log-linear expansion of the mean parameter of the multivariate Bernoulli distribution. We show that marginal independence relationships between variables can be specified by setting certain log-mean linear interactions to zero and, more specifically, that graphical models of marginal independence are log-mean linear models. Our approach overcomes some drawbacks of the existing parameterizations of graphical models of marginal independence. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
485
494
http://hdl.handle.net/10.1093/biomet/ass080
application/pdf
Access to full text is restricted to subscribers.
A. Roverato
M. Lupparelli
L. La Rocca
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:473-4842013-08-02RePEc:oup:biomet
article
The role of the range parameter for estimation and prediction in geostatistics
Two canonical problems in geostatistics are estimating the parameters in a specified family of stochastic process models and predicting the process at new locations. We show that asymptotic results for a Gaussian process over a fixed domain with Matérn covariance function, previously proven only in the case of a fixed range parameter, can be extended to the case of jointly estimating the range and the variance of the process. Moreover, we show that intuition and approximations derived from asymptotics using a fixed range parameter can be problematic when applied to finite samples, even for large sample sizes. In contrast, we show via simulation that performance is improved and asymptotic approximations are applicable for smaller sample sizes when the parameters are jointly estimated. These effects are particularly apparent when the process is mean square differentiable or the effective range of spatial correlation is small. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
473
484
http://hdl.handle.net/10.1093/biomet/ass079
application/pdf
Access to full text is restricted to subscribers.
C. G. Kaufman
B. A. Shaby
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:269-2762013-08-02RePEc:oup:biomet
article
Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation
We show that the proportional likelihood ratio model proposed recently by Luo & Tsai (2012) enjoys model-invariant properties under certain forms of nonignorable missing mechanisms and randomly double-truncated data, so that target parameters in the population can be estimated consistently from those biased samples. We also construct an alternative estimator for the target parameters by maximizing a pseudolikelihood that eliminates a functional nuisance parameter in the model. The corresponding estimating equation has a U-statistic structure. As an added advantage of the proposed method, a simple score-type test is developed to test a null hypothesis on the regression coefficients. Simulations show that the proposed estimator has a small-sample efficiency similar to that of the nonparametric likelihood estimator and performs well for certain nonignorable missing data problems. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
269
276
http://hdl.handle.net/10.1093/biomet/ass056
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:431-4452013-08-02RePEc:oup:biomet
article
Simple tiered classifiers
In this paper we propose simple, general tiered classifiers for relatively complex data. Empirical studies on real and simulated data show that three two-tier classifiers, which are respective extensions of linear discriminant analysis, linear logistic regression and support vector machines, can reduce noticeably the relatively high misclassification error of their original single-tier counterparts, without significantly increasing computational labour. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
431
445
http://hdl.handle.net/10.1093/biomet/ass086
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Yingcun Xia
Jing-Hao Xue
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:417-4302013-08-02RePEc:oup:biomet
article
Estimation with missing data: beyond double robustness
We propose an estimator that is more robust than doubly robust estimators, based on weighting complete cases using weights other than inverse probability when estimating the population mean of a response variable subject to ignorable missingness. We allow multiple models for both the propensity score and the outcome regression. Our estimator is consistent if any of the multiple models is correctly specified. Such multiple robustness against model misspecification is a significant improvement over double robustness, which allows only one propensity score model and one outcome regression model. Our estimator attains the semiparametric efficiency bound when one propensity score model and one outcome regression model are correctly specified, without requiring knowledge of which models are correct. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
417
430
http://hdl.handle.net/10.1093/biomet/ass087
application/pdf
Access to full text is restricted to subscribers.
Peisong Han
Lu Wang
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:91-1102013-08-02RePEc:oup:biomet
article
Sampling decomposable graphs using a Markov chain on junction trees
Full Bayesian computational inference for model determination in undirected graphical models is currently restricted to decomposable graphs or other special cases, except for small-scale problems, say up to 15 variables. In this paper we develop new, more efficient methodology for such inference, by making two contributions to the computational geometry of decomposable graphs. The first of these provides sufficient conditions under which it is possible to completely connect two disconnected complete subsets of vertices, or perform the reverse procedure, yet maintain decomposability of the graph. The second is a new Markov chain Monte Carlo sampler for arbitrary positive distributions on decomposable graphs, taking a junction tree representing the graph as its state variable. The resulting methodology is illustrated with numerical experiments on three models. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
91
110
http://hdl.handle.net/10.1093/biomet/ass052
application/pdf
Access to full text is restricted to subscribers.
Peter J. Green
Alun Thomas
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:3-152013-08-02RePEc:oup:biomet
article
Karl Pearson's Biometrika: 1901--36
Karl Pearson edited Biometrika for the first 35 years of its existence. Not only did he shape the journal, he also contributed over 200 pieces and inspired, more or less directly, most of the other contributions. The journal could not be separated from the man. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
3
15
http://hdl.handle.net/10.1093/biomet/ass077
application/pdf
Access to full text is restricted to subscribers.
John Aldrich
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:525-5302013-08-02RePEc:oup:biomet
article
Efficient estimation of the censored linear regression model
In linear regression or accelerated failure time models, complications in efficient estimation arise from the multiple roots of the efficient score and density estimation. This paper proposes a one-step efficient estimation method based on a counting process martingale, which has several advantages: it avoids the multiple-root problem, the initial estimator is easily available and the variance estimator can be obtained by employing plug-in rules. A simple and effective data-driven bandwidth selector is provided. The proposed estimator is proved to be semiparametric efficient, with the same asymptotic variance as the efficient estimator when the error distribution is known up to a location shift. Numerical studies with supportive evidence are presented. The proposal is applied to the Colorado Plateau uranium miners data. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
525
530
http://hdl.handle.net/10.1093/biomet/ass073
application/pdf
Access to full text is restricted to subscribers.
Yuanyuan Lin
Kani Chen
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:511-5182013-08-02RePEc:oup:biomet
article
Composite likelihood estimation for the Brown--Resnick process
Genton et al. (2011) investigated the gain in efficiency when triplewise, rather than pairwise, likelihood is used to fit the popular Smith max-stable model for spatial extremes. We generalize their results to the Brown--Resnick model and show that the efficiency gain is substantial only for very smooth processes, which are generally unrealistic in applications. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
511
518
http://hdl.handle.net/10.1093/biomet/ass089
application/pdf
Access to full text is restricted to subscribers.
R. Huser
A. C. Davison
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:319-3382013-08-02RePEc:oup:biomet
article
Using shared genetic controls in studies of gene-environment interactions
With the advent of modern genomic methods to adjust for population stratification, the use of external or publicly available controls has become an attractive option for reducing the cost of large-scale case-control genetic association studies. In this article, we study the estimation of joint effects of genetic and environmental exposures from a case-control study where data on genome-wide markers are available on the cases and a set of external controls while data on environmental exposures are available on the cases and a set of internal controls. We show that under such a design, one can exploit an assumption of gene-environment independence in the underlying population to estimate the gene-environment joint effects, after adjustment for population stratification. We develop a semiparametric profile likelihood method and related pseudolikelihood and working likelihood methods that are easy to implement in practice. We propose variance estimators for the methods based on asymptotic theory. Simulation is used to study the performance of the methods, and data from a multi-centre genome-wide association study of bladder cancer is further used to illustrate their application. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
319
338
http://hdl.handle.net/10.1093/biomet/ass078
application/pdf
Access to full text is restricted to subscribers.
Yi-Hau Chen
Nilanjan Chatterjee
Raymond J. Carroll
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:213-2202013-08-02RePEc:oup:biomet
article
Spatially varying cross-correlation coefficients in the presence of nugget effects
We derive sufficient conditions for the cross-correlation coefficient of a multivariate spatial process to vary with location when the spatial model is augmented with nugget effects. The derived class is valid for any choice of covariance functions, and yields substantial flexibility between multiple processes. The key is to identify the cross-correlation coefficient matrix with a contraction matrix, which can be either diagonal, implying a parsimonious formulation, or a fully general contraction matrix, yielding greater flexibility but added model complexity. We illustrate the approach with a bivariate minimum and maximum temperature dataset in Colorado, allowing the two variables to be positively correlated at low elevations and nearly independent at high elevations, while still yielding a positive definite covariance matrix. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
213
220
http://hdl.handle.net/10.1093/biomet/ass057
application/pdf
Access to full text is restricted to subscribers.
William Kleiber
Marc G. Genton
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:189-2022013-08-02RePEc:oup:biomet
article
Benchmarking small area estimators
This paper considers benchmarking issues in the context of small area estimation. We find optimal estimators within the class of benchmarked linear estimators under linear constraints. This extends existing results for external and internal benchmarking, and also links the two. Necessary and sufficient conditions for self-benchmarking are found for an augmented model. Most results of this paper are found using ideas of orthogonal projection. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
189
202
http://hdl.handle.net/10.1093/biomet/ass063
application/pdf
Access to full text is restricted to subscribers.
W. R. Bell
G. S. Datta
M. Ghosh
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:339-3542013-08-02RePEc:oup:biomet
article
Estimating time-varying effects for overdispersed recurrent events data with treatment switching
In the analysis of multivariate event times, frailty models assuming time-independent regression coefficients are often considered, mainly due to their mathematical convenience. In practice, regression coefficients are often time dependent and the temporal effects are of clinical interest. Motivated by a phase III clinical trial in multiple sclerosis, we develop a semiparametric frailty modelling approach to estimate time-varying effects for overdispersed recurrent events data with treatment switching. The proposed model incorporates the treatment switching time in the time-varying coefficients. Theoretical properties of the proposed model are established and an efficient expectation-maximization algorithm is derived to obtain the maximum likelihood estimates. Simulation studies evaluate the numerical performance of the proposed model under various temporal treatment effect curves. The ideas in this paper can also be used for time-varying coefficient frailty models without treatment switching as well as for alternative models when the proportional hazard assumption is violated. A multiple sclerosis dataset is analysed to illustrate our methodology. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
339
354
http://hdl.handle.net/10.1093/biomet/ass091
application/pdf
Access to full text is restricted to subscribers.
Qingxia Chen
Donglin Zeng
Joseph G. Ibrahim
Mouna Akacha
Heinz Schmidli
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:229-2342013-08-02RePEc:oup:biomet
article
The Kolmogorov filter for variable screening in high-dimensional binary classification
Variable screening techniques have been proposed to mitigate the impact of high dimensionality in classification problems, including t-test marginal screening (Fan & Fan, 2008) and maximum marginal likelihood screening (Fan & Song, 2010). However, these methods rely on strong modelling assumptions that are easily violated in real applications. To circumvent the parametric modelling assumptions, we propose a new variable screening technique for binary classification based on the Kolmogorov--Smirnov statistic. We prove that this so-called Kolmogorov filter enjoys the sure screening property under much weakened model assumptions. We supplement our theoretical study by a simulation study. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
229
234
http://hdl.handle.net/10.1093/biomet/ass062
application/pdf
Access to full text is restricted to subscribers.
Qing Mai
Hui Zou
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:459-4712013-08-02RePEc:oup:biomet
article
Data augmentation for non-Gaussian regression models using variance-mean mixtures
We use the theory of normal variance-mean mixtures to derive a data-augmentation scheme for a class of common regularization problems. This generalizes existing theory on normal variance mixtures for priors in regression and classification. It also allows variants of the expectation-maximization algorithm to be brought to bear on a wider range of models than previously appreciated. We demonstrate the method on several examples, focusing on the case of binary logistic regression. We also show that quasi-Newton acceleration can substantially improve the speed of the algorithm without compromising its robustness. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
459
471
http://hdl.handle.net/10.1093/biomet/ass081
application/pdf
Access to full text is restricted to subscribers.
N. G. Polson
J. G. Scott
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:1-12013-08-02RePEc:oup:biomet
article
Editorial
1
2013
100
Biometrika
1
1
http://hdl.handle.net/10.1093/biomet/ast003
application/pdf
Access to full text is restricted to subscribers.
A. C. Davison
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:249-2532013-08-02RePEc:oup:biomet
article
Blocked two-level regular factorial designs with weak minimum aberration
This paper considers the construction of blocked two-level regular designs with weak minimum aberration. We first obtain the minimum value of the number of two-factor interactions which are aliased with the block effects. Based on this result, two methods are then proposed in two different scenarios to construct weak minimum aberration blocked two-level designs with respect to some existing combined wordlength patterns. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
249
253
http://hdl.handle.net/10.1093/biomet/ass061
application/pdf
Access to full text is restricted to subscribers.
Shengli Zhao
Pengfei Li
Rohana Karunamuni
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:241-2482013-08-02RePEc:oup:biomet
article
Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders
Suppose we are interested in the effect of a binary treatment on an outcome where that relationship is confounded by an ordinal confounder. We assume that the true confounder is not observed but, rather, we observe a nondifferentially mismeasured version of it. We show that, under certain monotonicity assumptions about its effect on the treatment and on the outcome, an effect measure controlling for the mismeasured confounder will fall between the corresponding crude and true effect measures. We also present results for coarsened and, under further assumptions, multiple misclassified confounders. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
241
248
http://hdl.handle.net/10.1093/biomet/ass054
application/pdf
Access to full text is restricted to subscribers.
Elizabeth L. Ogburn
Tyler J. Vanderweele
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:157-1722013-08-02RePEc:oup:biomet
article
Simultaneous discovery of rare and common segment variants
Copy number variant is an important type of genetic structural variation appearing in germline DNA, ranging from common to rare in a population. Both rare and common copy number variants have been reported to be associated with complex diseases, so it is important to identify both simultaneously based on a large set of population samples. We develop a proportion adaptive segment selection procedure that automatically adjusts to the unknown proportions of the carriers of the segment variants. We characterize the detection boundary that separates the region where a segment variant is detectable by some method from the region where it cannot be detected. Although the detection boundaries are very different for the rare and common segment variants, it is shown that the proposed procedure can reliably identify both whenever they are detectable. Compared with methods for single-sample analysis, this procedure gains power by pooling information from multiple samples. The method is applied to analyse neuroblastoma samples and identifies a large number of copy number variants that are missed by single-sample methods. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
157
172
http://hdl.handle.net/10.1093/biomet/ass059
application/pdf
Access to full text is restricted to subscribers.
X. Jessie Jeng
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:447-4582013-08-02RePEc:oup:biomet
article
Penalized multivariate Whittle likelihood for power spectrum estimation
Nonparametric estimation procedures that can flexibly account for varying levels of smoothness among different functional parameters, such as penalized likelihoods, have been developed in a variety of settings. However, geometric constraints on power spectra have limited the development of such methods when estimating the power spectrum of a vector-valued time series. This article introduces a penalized likelihood approach to nonparametric multivariate spectral analysis through the minimization of a penalized Whittle negative loglikelihood. This likelihood is derived from the large-sample distribution of the periodogram and includes a penalty function that forms a measure of regularity on multivariate power spectra. The approach allows for varying levels of smoothness among spectral components while accounting for the positive definiteness of spectral matrices and the Hermitian and periodic structures of power spectra as functions of frequency. The consistency of the proposed estimator is derived and its empirical performance is demonstrated in a simulation study and in an analysis of indoor air quality. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
447
458
http://hdl.handle.net/10.1093/biomet/ass088
application/pdf
Access to full text is restricted to subscribers.
Robert T. Krafty
William O. Collinge
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:203-2122013-08-02RePEc:oup:biomet
article
Unified inference for sparse and dense longitudinal models
In longitudinal data analysis, statistical inference for sparse data and dense data could be substantially different. For kernel smoothing, the estimate of the mean function, the convergence rates and the limiting variance functions are different in the two scenarios. This phenomenon poses challenges for statistical inference, as a subjective choice between the sparse and dense cases may lead to wrong conclusions. We develop methods based on self-normalization that can adapt to the sparse and dense cases in a unified framework. Simulations show that the proposed methods outperform some existing methods. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
203
212
http://hdl.handle.net/10.1093/biomet/ass050
application/pdf
Access to full text is restricted to subscribers.
Seonjin Kim
Zhibiao Zhao
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:235-2402013-08-02RePEc:oup:biomet
article
Interval estimation of population means under unknown but bounded probabilities of sample selection
Applying concepts from partial identification to the domain of finite population sampling, we propose a method for interval estimation of a population mean when the probabilities of sample selection lie within a posited interval. The interval estimate is derived from sharp bounds on the Hajek (1971) estimator of the population mean. We demonstrate the method's utility for sensitivity analysis by applying it to a sample of needles collected as part of a syringe tracking and testing programme in New Haven, Connecticut. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
235
240
http://hdl.handle.net/10.1093/biomet/ass064
application/pdf
Access to full text is restricted to subscribers.
Peter M. Aronow
Donald K. K. Lee
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:75-892013-08-02RePEc:oup:biomet
article
Efficient Gaussian process regression for large datasets
Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typically on the order of n^3 where n is the number of data points, in performing the necessary matrix inversions. For large datasets, storage and processing also lead to computational bottlenecks, and numerical stability of the estimates and predicted values degrades with increasing n. Various methods have been proposed to address these problems, including predictive processes in spatial data analysis and the subset-of-regressors technique in machine learning. The idea underlying these approaches is to use a subset of the data, but this raises questions concerning sensitivity to the choice of subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
75
89
http://hdl.handle.net/10.1093/biomet/ass068
application/pdf
Access to full text is restricted to subscribers.
Anjishnu Banerjee
David B. Dunson
Surya T. Tokdar
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:399-4152013-08-02RePEc:oup:biomet
article
Simple design-efficient calibration estimators for rejective and high-entropy sampling
For survey calibration, consider the situation where the population totals of auxiliary variables are known or where auxiliary variables are measured for all population units. For each situation, we develop design-efficient calibration estimators under rejective or high-entropy sampling. A general approach is to extend efficient estimators for missing-data problems with independent and identically distributed data to the survey setting. We show that this approach effectively resolves two long-standing issues in existing approaches: how to achieve design efficiency regardless of a linear superpopulation model in generalized regression and calibration estimation, and how to find a simple approximation in optimal regression estimation. Moreover, the proposed approach sheds light on several issues that seem not to be well studied in the literature. Examples include use of the weighted Kullback--Leibler distance in calibration estimation, and efficient estimation allowing for misspecification of a nonlinear superpopulation model. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
399
415
http://hdl.handle.net/10.1093/biomet/ass090
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:355-3702013-08-02RePEc:oup:biomet
article
Estimation of a sparse group of sparse vectors
We consider estimating a sparse group of sparse normal mean vectors, based on penalized likelihood estimation with complexity penalties on the number of nonzero mean vectors and the numbers of their significant components, which can be performed by a fast algorithm. The resulting estimators are developed within a Bayesian framework and can be viewed as maximum a posteriori estimators. We establish their adaptive minimaxity over a wide range of sparse and dense settings. A simulation study demonstrates the efficiency of the proposed approach, which successfully competes with the sparse group lasso estimator. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
355
370
http://hdl.handle.net/10.1093/biomet/ass082
application/pdf
Access to full text is restricted to subscribers.
Felix Abramovich
Vadim Grinshtein
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:277-2812013-08-02RePEc:oup:biomet
article
Optimal estimation of Poisson intensity with partially observed covariates
Rathbun et al. (2007) and Waagepetersen (2008) propose estimating functions for parameters of Poisson point process intensity that may be applied when space- and/or time-varying covariates are sampled from a probability-based sampling design. This paper demonstrates that Waagepetersen's estimating function is optimal in a class of weighted estimating functions. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
277
281
http://hdl.handle.net/10.1093/biomet/ass069
application/pdf
Access to full text is restricted to subscribers.
S. L. Rathbun
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:495-5022013-08-02RePEc:oup:biomet
article
The optimal power puzzle: scrutiny of the monotone likelihood ratio assumption in multiple testing
In single hypothesis testing, power is a nondecreasing function of Type I error rate; hence it is desirable to test at the nominal level exactly to achieve optimal power. The optimal power puzzle arises from the fact that for multiple testing under the false discovery rate paradigm, such a monotonic relationship may not hold. In particular, exact false discovery rate control may lead to a less powerful testing procedure if a test statistic fails to fulfil the monotone likelihood ratio condition. In this article, we identify different scenarios wherein the condition fails and give caveats for conducting multiple testing in practical settings. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
495
502
http://hdl.handle.net/10.1093/biomet/ast001
application/pdf
Access to full text is restricted to subscribers.
Hongyuan Cao
Wenguang Sun
Michael R. Kosorok
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:503-5102013-08-02RePEc:oup:biomet
article
A consistent multivariate test of association based on ranks of distances
We consider the problem of detecting associations between random vectors of any dimension. Few tests of independence exist that are consistent against all dependent alternatives. We propose a powerful test that is applicable in all dimensions and consistent against all alternatives. The test has a simple form, is easy to implement, and has good power. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
503
510
http://hdl.handle.net/10.1093/biomet/ass070
application/pdf
Access to full text is restricted to subscribers.
Ruth Heller
Yair Heller
Malka Gorfine
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:301-3172013-08-02RePEc:oup:biomet
article
A multiple comparison procedure for hypotheses with gatekeeping structure
We develop gatekeeping procedures that focus on comparing multiple treatments with a control when there are multiple endpoints. Our procedures utilize estimated correlations among individual test statistics without parametric assumptions. We make comparisons with other gatekeeping procedures with respect to properties of the trade-off in statistical power between families of hypotheses. We introduce a reward function to facilitate these comparisons. We illustrate our methods by simulation and an analysis of data from a randomized, multi-armed clinical trial. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
301
317
http://hdl.handle.net/10.1093/biomet/ass083
application/pdf
Access to full text is restricted to subscribers.
Xiaolong Luo
Guang Chen
S. Peter Ouyang
Bruce W. Turnbull
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:125-1382013-08-02RePEc:oup:biomet
article
A nonparametric prior for simultaneous covariance estimation
In the modelling of longitudinal data from several groups, appropriate handling of the dependence structure is of central importance. Standard methods include specifying a single covariance matrix for all groups or independently estimating the covariance matrix for each group without regard to the others, but when these model assumptions are incorrect, these techniques can lead to biased mean effects or loss of efficiency, respectively. Thus, it is desirable to develop methods for simultaneously estimating the covariance matrix for each group that will borrow strength across groups in a way that is ultimately informed by the data. In addition, for several groups with covariance matrices of even medium dimension, it is difficult to manually select a single best parametric model among the huge number of possibilities given by incorporating structural zeros and/or commonality of individual parameters across groups. In this paper we develop a family of nonparametric priors using the matrix stick-breaking process of Dunson et al. (2008) that seeks to accomplish this task by parameterizing the covariance matrices in terms of their modified Cholesky decompositions (Pourahmadi, 1999). We establish some theoretical properties of these priors, examine their effectiveness via a simulation study, and illustrate the priors using data from a longitudinal clinical trial. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
125
138
http://hdl.handle.net/10.1093/biomet/ass060
application/pdf
Access to full text is restricted to subscribers.
Jeremy T. Gaskins
Michael J. Daniels
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:17-732013-08-02RePEc:oup:biomet
article
Biometrika highlights from volume 28 onwards
Highlights, trends and influences associated with the pages of Biometrika subsequent to the editorship of Karl Pearson are identified. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
17
73
http://hdl.handle.net/10.1093/biomet/ass076
application/pdf
Access to full text is restricted to subscribers.
D. M. Titterington
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:254-2602013-08-02RePEc:oup:biomet
article
Strong orthogonal arrays and associated Latin hypercubes for computer experiments
This paper introduces, constructs and studies a new class of arrays, called strong orthogonal arrays, as suitable designs for computer experiments. A strong orthogonal array of strength t enjoys better space-filling properties than a comparable orthogonal array in all dimensions lower than t while retaining the space-filling properties of the latter in t dimensions. Latin hypercubes based on strong orthogonal arrays of strength t are more space-filling than comparable orthogonal array-based Latin hypercubes in all g dimensions for any 2 ≤ g ≤ t - 1. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
254
260
http://hdl.handle.net/10.1093/biomet/ass065
application/pdf
Access to full text is restricted to subscribers.
Yuanzhen He
Boxin Tang
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:173-1872013-08-02RePEc:oup:biomet
article
Smoothed nonparametric estimation for current status competing risks data
We study the nonparametric estimation of the cumulative incidence function and the cause-specific hazard function for current status data with competing risks via kernel smoothing. A smoothed naive nonparametric maximum likelihood estimator and a smoothed full nonparametric maximum likelihood estimator are shown to have pointwise asymptotic normality and faster convergence rates than the corresponding unsmoothed nonparametric likelihood estimators. Using the smoothed estimators and the plug-in principle, we can estimate the cause-specific hazard function, which has not been studied previously. We also propose semi-smoothed estimators of the cause-specific hazard as an alternative to the smoothed estimator and demonstrate that neither is uniformly more efficient than the other. Numerical studies show that a smoothed bootstrap method works well for selecting the bandwidths in the smoothed nonparametric estimation. The use of the estimators is exemplified by an application to cumulative incidence and hazard of subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
173
187
http://hdl.handle.net/10.1093/biomet/ass053
application/pdf
Access to full text is restricted to subscribers.
Chenxi Li
Jason P. Fine
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:283-3002013-08-02RePEc:oup:biomet
article
Simultaneous confidence intervals uniformly more likely to determine signs
Many studies draw inferences about multiple endpoints but ignore the statistical implications of multiplicity. Effects inferred to be positive when there is no adjustment for multiplicity can lose their statistical significance when multiplicity is taken into account, perhaps explaining why such adjustments are so often omitted. We develop new simultaneous confidence intervals that mitigate this problem; these are uniformly more likely to determine signs than are standard simultaneous confidence intervals. When one or more of the parameter estimates are small, the new intervals sacrifice some length to avoid crossing zero; but when all the parameter estimates are large, the new intervals coincide with standard simultaneous confidence intervals, so there is no loss of precision. When only a small fraction of the estimates are small, the procedure can determine signs essentially as well as one-sided tests with prespecified directions, incurring only a modest penalty in maximum length. The intervals are constructed by inverting level-α tests to form a 1 - α confidence set, and then projecting that set onto the coordinate axes to get confidence intervals. The tests have hyper-rectangular acceptance regions that minimize the maximum amount by which the acceptance region protrudes from the orthant that contains the hypothesized parameter value, subject to a constraint on the maximum side-length of the hyper-rectangle. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
283
300
http://hdl.handle.net/10.1093/biomet/ass074
application/pdf
Access to full text is restricted to subscribers.
Yoav Benjamini
Vered Madar
Philip B. Stark
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:371-3832013-08-02RePEc:oup:biomet
article
Efficiency loss and the linearity condition in dimension reduction
Linearity, sometimes jointly with constant variance, is routinely assumed in the context of sufficient dimension reduction. It is well understood that, when these conditions do not hold, blindly using them may lead to inconsistency in estimating the central subspace and the central mean subspace. Surprisingly, we discover that even if these conditions do hold, using them will bring efficiency loss. This paradoxical phenomenon is illustrated through sliced inverse regression and principal Hessian directions. The efficiency loss also applies to other dimension reduction procedures. We explain this empirical discovery by theoretical investigation. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
371
383
http://hdl.handle.net/10.1093/biomet/ass075
application/pdf
Access to full text is restricted to subscribers.
Yanyuan Ma
Liping Zhu
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:531-5372013-08-02RePEc:oup:biomet
article
On the likelihood ratio test for envelope models in multivariate linear regression
We investigate the likelihood ratio test for a hypothesis regarding the dimension of the Σ-envelope of span(β) in a multivariate linear regression model. The asymptotic null distribution of the likelihood ratio statistic is obtained as some nuisance parameters approach infinity. A saddlepoint approximation is also given for this limiting distribution. The accuracy of this approximation and its comparison to the standard chi-squared approximation are assessed via simulation. The results can be used in a similar test for partial envelope models. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
531
537
http://hdl.handle.net/10.1093/biomet/ast002
application/pdf
Access to full text is restricted to subscribers.
James R. Schott
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:571-5862013-11-29RePEc:oup:biomet
article
Statistics of orthogonal axial frames
An orthogonal axial frame is a set of orthonormal unit vectors which are known only up to sign. Such frames arise in crystallography and seismology and as principal axes of multivariate data or of some physical tensors. We develop methods for analysing data of this form. A test of uniformity is given. Parametric models for orthogonal axial frames are presented and tests of location are considered. A brief illustrative example involving earthquakes is given. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
571
586
http://hdl.handle.net/10.1093/biomet/ast017
application/pdf
Access to full text is restricted to subscribers.
R. Arnold
P. E. Jupp
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:778-7802013-11-29RePEc:oup:biomet
article
Convergence of Luo and Tsai's iterative algorithm for estimation in proportional likelihood ratio models
Luo & Tsai (2012, Biometrika) introduced the proportional likelihood ratio model. They proposed an iterative algorithm for the estimation of the baseline distribution function but did not establish its convergence. Here we provide a proof of convergence. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
778
780
http://hdl.handle.net/10.1093/biomet/ast019
application/pdf
Access to full text is restricted to subscribers.
O. Davidov
G. Iliopoulos
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:695-7082013-11-29RePEc:oup:biomet
article
More efficient estimators for case-cohort studies
The case-cohort study design, used to reduce costs in large cohort studies, involves a random sample of the entire cohort, called the subcohort, augmented with subjects having the disease of interest but not in the subcohort sample. When several diseases are of interest, multiple case-cohort studies may be conducted using the same subcohort, with each disease analysed separately, ignoring the additional exposure measurements collected on subjects with the other diseases. This is not an efficient use of the data, and in this paper we propose more efficient estimators. We consider both joint and separate analyses for the multiple diseases. We propose an estimating equation approach with a new weight function, and we establish the consistency and asymptotic normality of the resulting estimator. Simulation studies show that the proposed methods using all available information lead to gains in efficiency. We apply our proposed method to data from the Busselton Health Study. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
695
708
http://hdl.handle.net/10.1093/biomet/ast018
application/pdf
Access to full text is restricted to subscribers.
S. Kim
J. Cai
W. Lu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:727-7402013-11-29RePEc:oup:biomet
article
Nonparametric estimation of the mean function for recurrent event data with missing event category
Recurrent event data frequently arise in longitudinal studies when study subjects possibly experience more than one event during the observation period. Often, such recurrent events can be categorized. However, part of the categorization may be missing due to technical difficulties. If the event types are missing completely at random, then a complete case analysis may provide consistent estimates of regression parameters in certain regression models, but estimates of the baseline event rates are generally biased. Previous work on nonparametric estimation of these rates has utilized parametric missingness models. In this paper, we develop fully nonparametric methods in which the missingness mechanism is completely unspecified. Consistency and asymptotic normality of the nonparametric estimators of the mean event functions accommodate nonparametric estimators of the event category probabilities, which converge more slowly than the parametric rate. Plug-in variance estimators are provided and perform well in simulation studies, where complete case estimators may exhibit large biases and parametric estimators generally have a larger mean squared error when the model is misspecified. The proposed methods are applied to data from a cystic fibrosis registry. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
727
740
http://hdl.handle.net/10.1093/biomet/ast016
application/pdf
Access to full text is restricted to subscribers.
Feng-Chang Lin
Jianwen Cai
Jason P. Fine
Huichuan J. Lai
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:655-6702013-11-29RePEc:oup:biomet
article
High-dimensional semiparametric bigraphical models
In multivariate analysis, a Gaussian bigraphical model is commonly used for modelling matrix-valued data. In this paper, we propose a semiparametric extension of the Gaussian bigraphical model, called the nonparanormal bigraphical model. A projected nonparametric rank-based regularization approach is employed to estimate sparse precision matrices and produce graphs under a penalized likelihood framework. Theoretically, our semiparametric procedure achieves the parametric rates of convergence for both matrix estimation and graph recovery. Empirically, our approach outperforms the parametric Gaussian model for non-Gaussian data and is competitive with its parametric counterpart for Gaussian data. Extensions to the categorical bigraphical model and the missing data problem are discussed. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
655
670
http://hdl.handle.net/10.1093/biomet/ast009
application/pdf
Access to full text is restricted to subscribers.
Yang Ning
Han Liu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:671-6802013-11-29RePEc:oup:biomet
article
Inverse probability weighting with error-prone covariates
Inverse probability-weighted estimators are widely used in applications where data are missing due to nonresponse or censoring and in the estimation of causal effects from observational studies. Current estimators rely on ignorability assumptions for response indicators or treatment assignment and outcomes being conditional on observed covariates which are assumed to be measured without error. However, measurement error is common for the variables collected in many applications. For example, in studies of educational interventions, student achievement as measured by standardized tests is almost always used as the key covariate for removing hidden biases, but standardized test scores may have substantial measurement errors. We provide several expressions for a weighting function that can yield a consistent estimator for population means using incomplete data and covariates measured with error. We propose a method to estimate the weighting function from data. The results of a simulation study show that the estimator is consistent and has no bias and small variance. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
671
680
http://hdl.handle.net/10.1093/biomet/ast022
application/pdf
Access to full text is restricted to subscribers.
Daniel F. McCaffrey
J. R. Lockwood
Claude M. Setodji
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:587-6062013-11-29RePEc:oup:biomet
article
Automatic declustering of rare events
The analysis of events with low probability but disastrous impact entails understanding how they cluster in time. We present an automatic three-step procedure for identifying clusters, estimating the cluster size distribution and constructing confidence intervals for the extremal index, which measures the degree of clustering of rare events. The third step combines empirical likelihood and parametric likelihood approaches. Simulations show that our new procedure performs very well for finite samples and outperforms previous methods in constructing confidence intervals for the extremal index when there is clustering in the data, as well as in estimating probabilities for small clusters. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
587
606
http://hdl.handle.net/10.1093/biomet/ast013
application/pdf
Access to full text is restricted to subscribers.
C. Y. Robert
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:555-5692013-11-29RePEc:oup:biomet
article
A unified approach to robust estimation in finite population sampling
We argue that the conditional bias associated with a sample unit can be a useful measure of influence in finite population sampling. We use the conditional bias to derive robust estimators that are obtained by downweighting the most influential sample units. Under the model-based approach to inference, our proposed robust estimator is closely related to the well-known estimator of Chambers (1986). Under the design-based approach, it possesses the desirable feature of being applicable with most sampling designs used in practice. For stratified simple random sampling, it is essentially equivalent to the estimator of Kokic & Bell (1994). The proposed robust estimator depends on a tuning constant. In this paper, we propose a method for determining the tuning constant and show that the resulting estimator is consistent. Results from a simulation study suggest that our approach improves the efficiency of standard nonrobust estimators when the population contains units that may be influential if selected in the sample. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
555
569
http://hdl.handle.net/10.1093/biomet/ast010
application/pdf
Access to full text is restricted to subscribers.
J.-F. Beaumont
D. Haziza
A. Ruiz-Gazen
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:771-7772013-11-29RePEc:oup:biomet
article
Species sampling models: consistency for the number of species
This paper considers species sampling models using constructions that arise from Bayesian nonparametric prior distributions. A discrete random measure, used to generate a species sampling model, can have either a countable infinite number of atoms, which has been the emphasis in the recent literature, or a finite number of atoms K, while allowing K to be assigned a prior probability distribution on the positive integers. It is the latter class of model we consider here, due to the interpretation of K as the number of species. We demonstrate the consistency of the posterior distribution of K as the sample size increases. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
771
777
http://hdl.handle.net/10.1093/biomet/ast006
application/pdf
Access to full text is restricted to subscribers.
P. G. Bissiri
A. Ongaro
S. G. Walker
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:681-6942013-11-29RePEc:oup:biomet
article
Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
A dynamic treatment regime is a list of sequential decision rules for assigning treatment based on a patient's history. Q- and A-learning are two main approaches for estimating the optimal regime, i.e., that yielding the most beneficial outcome in the patient population, using data from a clinical trial or observational study. Q-learning requires postulated regression models for the outcome, while A-learning involves models for that part of the outcome regression representing treatment contrasts and for treatment assignment. We propose an alternative to Q- and A-learning that maximizes a doubly robust augmented inverse probability weighted estimator for population mean outcome over a restricted class of regimes. Simulations demonstrate the method's performance and robustness to model misspecification, which is a key concern. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
681
694
http://hdl.handle.net/10.1093/biomet/ast014
application/pdf
Access to full text is restricted to subscribers.
Baqun Zhang
Anastasios A. Tsiatis
Eric B. Laber
Marie Davidian
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:539-5532013-11-29RePEc:oup:biomet
article
A general modelling framework for multivariate disease mapping
This paper deals with multivariate disease mapping. We propose a novel framework that encompasses most of the models already proposed. Our framework starts with a simple identity, reformulating Kronecker products of covariance matrices as simple matrix products. This formula is computationally convenient, and its generalizations reproduce most of the proposals in the disease mapping literature. Use of the identity leads to a flexible, general and computationally convenient modelling framework, making it possible to combine spatial dependence structures and different relationships between diseases with limited effort. Moreover, as the proposed modelling framework covers most of the Gaussian Markov random field-based multivariate disease mapping models in the literature, it allows comparison of all these models in a common context, thus helping us to understand them better. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
539
553
http://hdl.handle.net/10.1093/biomet/ast023
application/pdf
Access to full text is restricted to subscribers.
Miguel A. Martinez-Beneito
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:757-7632013-11-29RePEc:oup:biomet
article
Adjusted regression estimation for time-to-event data with differential measurement error
Differential measurement error data plausibly arise in epidemiology and biomedical studies but have been rarely dealt with explicitly, especially for time-to-event data. We propose an estimation equation correction method in semiparametric censored linear regression to deal with differential measurement error for time-to-event data with validation samples. The method does not require explicit modelling of the error-prone covariates and leads to unbiased estimation. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
757
763
http://hdl.handle.net/10.1093/biomet/ast007
application/pdf
Access to full text is restricted to subscribers.
Menggang Yu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:764-7702013-11-29RePEc:oup:biomet
article
Survival analysis without survival data: connecting length-biased and case-control data
We show that relative mean survival parameters of a semiparametric log-linear model can be estimated using covariate data from an incident sample and a prevalent sample, even when there is no prospective follow-up to collect any survival data. Estimation is based on an induced semiparametric density ratio model for covariates from the two samples, and it shares the same structure as for a logistic regression model for case-control data. Likelihood inference coincides with well-established methods for case-control data. We show two further related results. First, estimation of interaction parameters in a survival model can be performed using covariate information only from a prevalent sample, analogous to a case-only analysis. Furthermore, propensity score and conditional exposure effect parameters on survival can be estimated using only covariate data collected from incident and prevalent samples. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
764
770
http://hdl.handle.net/10.1093/biomet/ast008
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:709-7262013-11-29RePEc:oup:biomet
article
Robust analysis of semiparametric renewal process models
A rate model is proposed for a modulated renewal process comprising a single long sequence, where the covariate process may not capture the dependencies in the sequence as in standard intensity models. We consider partial likelihood-based inferences under a semiparametric multiplicative rate model, which has been widely studied in the context of independent and identical data. Under an intensity model, gap times in a single long sequence may be used naively in the partial likelihood, with variance estimation utilizing the observed information matrix. Under a rate model, the gap times cannot be treated as independent and studying the partial likelihood is much more challenging. We employ a mixing condition in the application of limit theory for stationary sequences to obtain consistency and asymptotic normality. The estimator's variance is quite complicated, owing to the unknown gap times dependence structure. We adapt block bootstrapping and cluster variance estimators to the partial likelihood. Simulation studies and an analysis of a semiparametric extension of a popular model for neural spike train data demonstrate the practical utility of the rate approach in comparison with the intensity approach. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
709
726
http://hdl.handle.net/10.1093/biomet/ast011
application/pdf
Access to full text is restricted to subscribers.
Feng-Chang Lin
Young K. Truong
Jason P. Fine
oai:RePEc:oup:biomet:v:100:y:2013:i:4:p:1024-10242015-03-30RePEc:oup:biomet
article
'Biometrika highlights from volume 28 onwards'
4
2013
100
Biometrika
1024
1024
http://hdl.handle.net/10.1093/biomet/ast061
application/pdf
Access to full text is restricted to subscribers.
D. M. Titterington
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:65-782013-11-15RePEc:oup:biomet
article
Marginal analyses of longitudinal data with an informative pattern of observations
We consider solutions to generalized estimating equations with singular working correlation matrices, of which the estimator of Diggle et al. (2007) is a special case. We give explicit conditions for consistent estimation when the pattern of observations may be informative. In such cases, simulations reveal reduced bias and reduced mean squared error compared with existing alternatives. A study of peritoneal dialysis is used to illustrate the methodology. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
65
78
http://hdl.handle.net/10.1093/biomet/asp068
application/pdf
Access to full text is restricted to subscribers.
D. M. Farewell
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:31-482013-11-15RePEc:oup:biomet
article
Incorporating prior probabilities into high-dimensional classifiers
In standard parametric classifiers, or classifiers based on nonparametric methods but where there is an opportunity for estimating population densities, the prior probabilities of the respective populations play a key role. However, those probabilities are largely ignored in the construction of high-dimensional classifiers, partly because there are no likelihoods to be constructed or Bayes risks to be estimated. Nevertheless, including information about prior probabilities can reduce the overall error rate, particularly in cases where doing so is most important, i.e. when the classification problem is particularly challenging and error rates are not close to zero. In this paper we suggest a general approach to reducing error rate in this way, by using a method derived from Breiman's bagging idea. The potential improvements in performance are identified in theoretical and numerical work, the latter involving both applications to real data and simulations. The method is simple and explicit to apply, and does not involve choice of any tuning parameters. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
31
48
http://hdl.handle.net/10.1093/biomet/asp081
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Jing-Hao Xue
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:254-2592013-11-15RePEc:oup:biomet
article
The maximal data piling direction for discrimination
We study a discriminant direction vector that generally exists only in high-dimension, low sample size settings. Projections of data onto this direction vector take on only two distinct values, one for each class. There exist infinitely many such directions in the subspace generated by the data; but the maximal data piling vector has the longest distance between the projections. This paper investigates mathematical properties and classification performance of this discrimination method. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
254
259
http://hdl.handle.net/10.1093/biomet/asp084
application/pdf
Access to full text is restricted to subscribers.
Jeongyoun Ahn
J. S. Marron
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:223-2302013-11-15RePEc:oup:biomet
article
Global and local spectral-based tests for periodicities
We investigate tests for periodicity based on a spectral analysis of a time series, differentiating between global and local spectral-based tests. Global tests use information across the entire frequency band, whereas local tests are based on a window around the test frequency. We show that many spectral-based tests can be expressed in terms of a regression-based F test, which allows for approximate size and power calculations. Since global tests are usually derived assuming white noise errors, we extend to the correlated noise case. We demonstrate via a Monte Carlo study that although the global test may have better size and power, local tests are easier to use, and are comparable or better in terms of the power to detect periodicities, especially for spectra with a large dynamic range. We apply this methodology to a nonbehavioural test of hearing. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
223
230
http://hdl.handle.net/10.1093/biomet/asp079
application/pdf
Access to full text is restricted to subscribers.
L. Wei
P. F. Craigmile
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:209-2142013-11-15RePEc:oup:biomet
article
A note on the sensitivity to assumptions of a generalized linear mixed model
A simple case of Poisson regression is used to study the potential gain in efficiency from using a mixed model representation. Possible systematic errors arising from misspecification of the random terms in the model are examined. It is shown in particular that for a special but realistic problem, appreciable bias may arise from misspecification of a random component. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
209
214
http://hdl.handle.net/10.1093/biomet/asp083
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
M. Y. Wong
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:199-2082013-11-15RePEc:oup:biomet
article
Forecasting for quantile self-exciting threshold autoregressive time series models
Self-exciting threshold autoregressive time series models have been used extensively, and the conditional mean obtained from these models can be used to predict the future value of a random variable. In this paper we consider quantile forecasts of a time series based on the quantile self-exciting threshold autoregressive time series models proposed by Cai and Stander (2008) and present a new forecasting method for them. Simulation studies and application to real time series show that the method works very well. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
199
208
http://hdl.handle.net/10.1093/biomet/asp070
application/pdf
Access to full text is restricted to subscribers.
Yuzhi Cai
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:123-1322013-11-15RePEc:oup:biomet
article
Sharp bounds on causal effects in case-control and cohort studies
Evaluating the causal effect of an exposure on a response from case-control and cohort studies is a major concern in epidemiological and medical research. Since causal effects are in general nonidentifiable from such studies, this paper derives bounds on two causal measures: the causal risk difference and the causal risk ratio. We use the potential response approach and a linear programming method to derive sharp bounds on the causal risk difference, and a novel fractional programming method to derive bounds on the causal risk ratio. In addition, in the presence of missing data, we consider three different missingness mechanisms and propose sharp bounds under these situations. The results provide new guidance on causal inference in case-control and cohort studies. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
123
132
http://hdl.handle.net/10.1093/biomet/asp076
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
Zhihong Cai
Zhi Geng
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:215-2222013-11-15RePEc:oup:biomet
article
Pseudo-score confidence intervals for parameters in discrete statistical models
We propose pseudo-score confidence intervals for parameters in models for discrete data. The confidence interval is obtained by inverting a test that uses a Pearson chi-squared statistic to compare fitted values for the working model with fitted values of the model when a parameter of interest takes various fixed values. For multinomial models, the pseudo-score method simplifies to the score method when the model is saturated and otherwise it is asymptotically equivalent to score and likelihood ratio test-based inferences. For cases in which ordinary score methods are impractical, such as when the likelihood function is not an explicit function of model parameters, the pseudo-score method is feasible. We illustrate the method for four such examples. Generalizations of the method are also presented for future research, including inference for complex sampling designs using a quasilikelihood Pearson statistic that compares fitted values for two models relative to the variance of the observations under the simpler model. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
215
222
http://hdl.handle.net/10.1093/biomet/asp074
application/pdf
Access to full text is restricted to subscribers.
Alan Agresti
Euijung Ryu
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:238-2452013-11-15RePEc:oup:biomet
article
Nonparametric Bayesian inference for the spectral density function of a random field
A powerful technique for inference concerning spatial dependence in a random field is to use spectral methods based on frequency domain analysis. Here we develop a nonparametric Bayesian approach to statistical inference for the spectral density of a random field. We construct a multi-dimensional Bernstein polynomial prior for the spectral density and devise a Markov chain Monte Carlo algorithm to simulate from the posterior of the spectral density. The posterior sampling enables us to obtain a smoothed estimate of the spectral density as well as credible bands at desired levels. Simulation shows that our proposed method is more robust than a parametric approach. For illustration, we analyse a soil data example. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
238
245
http://hdl.handle.net/10.1093/biomet/asp066
application/pdf
Access to full text is restricted to subscribers.
Yanbing Zheng
Jun Zhu
Anindya Roy
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:159-1702013-11-15RePEc:oup:biomet
article
Mean loglikelihood and higher-order approximations
Higher-order approximations to p-values can be obtained from the loglikelihood function and a reparameterization that can be viewed as a canonical parameter in an exponential family approximation to the model. This approach clarifies the connection between Skovgaard (1996) and Fraser et al. (1999a), and shows that the Skovgaard approximation can be obtained directly using the mean loglikelihood function. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
159
170
http://hdl.handle.net/10.1093/biomet/asq001
application/pdf
Access to full text is restricted to subscribers.
N. Reid
D. A. S. Fraser
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:133-1452013-11-15RePEc:oup:biomet
article
A semiparametric random effects model for multivariate competing risks data
We propose a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model. The associations between the cause-specific failure times can be studied through dependence parameters of copula functions that are allowed to depend on cluster-level covariates. A cross-odds ratio-type measure is proposed to describe the associations between cause-specific failure times, and its relationship to the dependence parameters is explored. We develop a two-stage estimation procedure where the marginal models are estimated in the first stage and the dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The proposed procedures are applied to Danish twin data to model the cumulative incidence for the age of natural menopause and to investigate the association in the onset of natural menopause between monozygotic and dizygotic twins. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
133
145
http://hdl.handle.net/10.1093/biomet/asp082
application/pdf
Access to full text is restricted to subscribers.
Thomas H. Scheike
Yanqing Sun
Mei-Jie Zhang
Tina Kold Jensen
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:1-132013-11-15RePEc:oup:biomet
article
Systematic sampling with errors in sample locations
Systematic sampling of points in continuous space is widely used in microscopy and spatial surveys. Classical theory provides asymptotic expressions for the variance of estimators based on systematic sampling as the grid spacing decreases. However, the classical theory assumes that the sample grid is exactly periodic; real physical sampling procedures may introduce errors in the placement of the sample points. This paper studies the effect of errors in sample positioning on the variance of estimators in the case of one-dimensional systematic sampling. First we sketch a general approach to variance analysis using point process methods. We then analyze three different models for the error process, calculate exact expressions for the variances, and derive asymptotic variances. Errors in the placement of sample points can lead to substantial inflation of the variance, dampening of zitterbewegung, that is, fluctuation effects, and a slower order of convergence. This suggests that the current practice in some areas of microscopy may be based on over-optimistic predictions of estimator accuracy. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
1
13
http://hdl.handle.net/10.1093/biomet/asp067
application/pdf
Access to full text is restricted to subscribers.
Johanna Ziegel
Adrian Baddeley
Karl-Anton Dorph-Petersen
Eva B. Vedel Jensen
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:95-1082013-11-15RePEc:oup:biomet
article
On the use of stochastic ordering to test for trend with clustered binary data
We introduce the use of stochastic ordering for defining treatment-related trend in clustered exchangeable binary data, both when cluster sizes are fixed and when they vary randomly. In the latter case, there is a well-documented tendency for such data to be sparse, a problem we address by making an assumption of interpretability or, equivalently, marginal compatibility. Our procedures are based on a representation of the joint distribution of binary exchangeable random variables by a saturated model, and may hence be considered nonparametric. The definition of trend by stochastic ordering is proposed to ensure a flexibility that allows for various forms of monotone increases in response to the cluster as a whole to be included in the evaluation of the trend. We obtain maximum likelihood estimates of probability functions under stochastic ordering using mixture-likelihood-based algorithms. Since the data are sparse, we avoid the use of asymptotic results and obtain p-values of the likelihood ratio procedures by permutation resampling. We demonstrate how the proposed framework can be used in risk assessment, and provide comparisons with existing procedures. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
95
108
http://hdl.handle.net/10.1093/biomet/asp077
application/pdf
Access to full text is restricted to subscribers.
Aniko Szabo
E. Olusegun George
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:79-932013-11-15RePEc:oup:biomet
article
Generalized empirical likelihood methods for analyzing longitudinal data
Efficient estimation of parameters is a major objective in analyzing longitudinal data. We propose two generalized empirical likelihood-based methods that take into consideration within-subject correlations. A nonparametric version of the Wilks theorem for the limiting distributions of the empirical likelihood ratios is derived. It is shown that one of the proposed methods is locally efficient among a class of within-subject variance-covariance matrices. A simulation study is conducted to investigate the finite sample properties of the proposed methods and to compare them with the block empirical likelihood method by You et al. (2006) and the normal approximation with a correctly estimated variance-covariance. The results suggest that the proposed methods are generally more efficient than existing methods that ignore the correlation structure, and are better in coverage compared to the normal approximation with correctly specified within-subject correlation. An application illustrating our methods and supporting the simulation study results is presented. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
79
93
http://hdl.handle.net/10.1093/biomet/asp073
application/pdf
Access to full text is restricted to subscribers.
Suojin Wang
Lianfen Qian
Raymond J. Carroll
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:109-1212013-11-15RePEc:oup:biomet
article
Stochastic approximation with virtual observations for dose-finding on discrete levels
Phase I clinical studies are experiments in which a new drug is administered to humans to determine the maximum dose that causes toxicity with a target probability. Phase I dose-finding is often formulated as a quantile estimation problem. For studies with a biological endpoint, it is common to define toxicity by dichotomizing the continuous biomarker expression. In this article, we propose a novel variant of the Robbins–Monro stochastic approximation that utilizes the continuous measurements for quantile estimation. The Robbins–Monro method has seldom seen clinical applications, because it does not perform well for quantile estimation with binary data and it works with a continuum of doses that are generally not available in practice. To address these issues, we formulate the dose-finding problem as root-finding for the mean of a continuous variable, for which the stochastic approximation procedure is efficient. To accommodate the use of discrete doses, we introduce the idea of virtual observation that is defined on a continuous dosage range. Our proposed method inherits the convergence properties of the stochastic approximation algorithm and its computational simplicity. Simulations based on real trial data show that our proposed method improves accuracy compared with the continual re-assessment method and produces results robust to model misspecification. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
109
121
http://hdl.handle.net/10.1093/biomet/asp065
application/pdf
Access to full text is restricted to subscribers.
Ying Kuen Cheung
Mitchell S. V. Elkind
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:15-302013-11-15RePEc:oup:biomet
article
Cross-covariance functions for multivariate random fields based on latent dimensions
The problem of constructing valid parametric cross-covariance functions is challenging. We propose a simple methodology, based on latent dimensions and existing covariance models for univariate random fields, to develop flexible, interpretable and computationally feasible classes of cross-covariance functions in closed form. We focus on spatio-temporal cross-covariance functions that can be nonseparable, asymmetric and can have different covariance structures, for instance different smoothness parameters, in each component. We discuss estimation of these models and perform a small simulation study to demonstrate our approach. We illustrate our methodology on a trivariate spatio-temporal pollution dataset from California and demonstrate that our cross-covariance performs better than other competing models. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
15
30
http://hdl.handle.net/10.1093/biomet/asp078
application/pdf
Access to full text is restricted to subscribers.
Tatiyana V. Apanasovich
Marc G. Genton
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:246-2532013-11-15RePEc:oup:biomet
article
The distribution-based p-value for the outlier sum in differential gene expression analysis
Outlier sums were proposed by Tibshirani & Hastie (2007) and Wu (2007) for detecting outlier genes where only a small subset of disease samples shows unusually high gene expression, but they did not develop their distributional properties and formal statistical inference. In this study, a new outlier sum for detection of outlier genes is proposed, its asymptotic distribution theory is developed, and the p-value based on this outlier sum is formulated. Its analytic form is derived on the basis of the large-sample theory. We compare the proposed method with existing outlier sum methods by power comparisons. Our method is applied to DNA microarray data from samples of primary breast tumors examined by Huang et al. (2003). The results show that the proposed method is more efficient in detecting outlier genes. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
246
253
http://hdl.handle.net/10.1093/biomet/asp075
application/pdf
Access to full text is restricted to subscribers.
Lin-An Chen
Dung-Tsa Chen
Wenyaw Chan
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:147-1582013-11-15RePEc:oup:biomet
article
Estimation of the retransformed conditional mean in health care cost studies
We propose a new approach for analyzing skewed and heteroscedastic health care cost data through regression of the conditional quantiles of the transformed cost. Using the appealing equivariance property of quantiles to monotone transformations, we propose a distribution-free estimator of the conditional mean cost on the original scale. The proposed method is extended to a two-part heteroscedastic model to account for zero costs commonly seen in health care cost studies. Simulation studies indicate that the proposed estimator has competitive and more robust performance than existing estimators in various heteroscedastic models. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
147
158
http://hdl.handle.net/10.1093/biomet/asp072
application/pdf
Access to full text is restricted to subscribers.
Huixia Judy Wang
Xiao-Hua Zhou
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:171-1802013-11-15RePEc:oup:biomet
article
On doubly robust estimation in a semiparametric odds ratio model
We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007). Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
171
180
http://hdl.handle.net/10.1093/biomet/asp062
application/pdf
Access to full text is restricted to subscribers.
Eric J. Tchetgen Tchetgen
James M. Robins
Andrea Rotnitzky
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:231-2372013-11-15RePEc:oup:biomet
article
Weighted least squares approximate restricted likelihood estimation for vector autoregressive processes
We derive a weighted least squares approximate restricted likelihood estimator for a k-dimensional pth-order autoregressive model with intercept. Exact likelihood optimization of this model is generally infeasible due to the parameter space, which is complicated and high-dimensional, involving pk^2 parameters. The weighted least squares estimator has significantly lower bias and mean squared error than the ordinary least squares estimator for both stationary and nonstationary processes. Furthermore, at the unit root, the limiting distribution of the weighted least squares approximate restricted likelihood estimator is shown to be the zero-intercept Dickey–Fuller distribution, unlike the ordinary least squares with intercept estimator that has a different distribution with significantly higher bias. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
231
237
http://hdl.handle.net/10.1093/biomet/asp071
application/pdf
Access to full text is restricted to subscribers.
Willa W. Chen
Rohit S. Deo
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:601-6152013-06-14RePEc:oup:biomet
article
Pseudo-partial likelihood for proportional hazards models with biased-sampling data
We obtain a pseudo-partial likelihood for proportional hazards models with biased-sampling data by embedding the biased-sampling data into left-truncated data. The log pseudo-partial likelihood of the biased-sampling data is the expectation of the log partial likelihood of the left-truncated data conditioned on the observed data. In addition, asymptotic properties of the estimator that maximizes the pseudo-partial likelihood are derived. Applications to length-biased data, biased samples with right censoring and proportional hazards models with missing covariates are discussed. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
601
615
http://hdl.handle.net/10.1093/biomet/asp026
application/pdf
Access to full text is restricted to subscribers.
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:677-6902013-06-14RePEc:oup:biomet
article
Optimal repeated measurement designs for a model with partial interactions
We consider crossover designs for a model with partial interactions. In this model, the carryover effect depends on whether the treatment is preceded by itself or not. When the aim of the experiment is to study the total effects corresponding to a single treatment, we obtain approximate optimal symmetric designs, within the competing class of circular designs, by generalizing the method introduced by Kushner (1997) and Kunert & Martin (2000). This generalization places the method proposed by Bailey & Druilhet (2004) into Kushner's context. The optimal designs obtained are not binary, as in Kunert & Martin (2000). We also propose efficient designs generated by only one sequence. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
677
690
http://hdl.handle.net/10.1093/biomet/asp034
application/pdf
Access to full text is restricted to subscribers.
P. Druilhet
W. Tinsson
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:723-7342013-06-14RePEc:oup:biomet
article
Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data
Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
723
734
http://hdl.handle.net/10.1093/biomet/asp033
application/pdf
Access to full text is restricted to subscribers.
Weihua Cao
Anastasios A. Tsiatis
Marie Davidian
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:497-5122013-06-14RePEc:oup:biomet
article
Objective Bayesian model selection in Gaussian graphical models
This paper presents a default model-selection procedure for Gaussian graphical models that involves two new developments. First, we develop a default version of the hyper-inverse Wishart prior for restricted covariance matrices, called the hyper-inverse Wishart g-prior, and show how it corresponds to the implied fractional prior for selecting a graph using fractional Bayes factors. Second, we apply a class of priors that automatically handles the problem of multiple hypothesis testing. We demonstrate our methods on a variety of simulated examples, concluding with a real example analyzing covariation in mutual-fund returns. These studies reveal that the combined use of a multiplicity-correction prior on graphs and fractional Bayes factors for computing marginal likelihoods yields better performance than existing Bayesian methods. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
497
512
http://hdl.handle.net/10.1093/biomet/asp017
application/pdf
Access to full text is restricted to subscribers.
C. M. Carvalho
J. G. Scott
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:529-5442013-06-14RePEc:oup:biomet
article
Asymptotic properties of penalized spline estimators
We study the class of penalized spline estimators, which enjoy similarities to both regression splines, without penalty and with fewer knots than data points, and smoothing splines, with knots equal to the data points and a penalty controlling the roughness of the fit. Depending on the number of knots, sample size and penalty, we show that the theoretical properties of penalized regression spline estimators are either similar to those of regression splines or to those of smoothing splines, with a clear breakpoint distinguishing the cases. We prove that using fewer knots results in better asymptotic rates than when using a large number of knots. We obtain expressions for bias and variance and asymptotic rates for the number of knots and penalty parameter. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
529
544
http://hdl.handle.net/10.1093/biomet/asp035
application/pdf
Access to full text is restricted to subscribers.
Gerda Claeskens
Tatyana Krivobokova
Jean D. Opsomer
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:751-7602013-06-14RePEc:oup:biomet
article
A Student t-mixture autoregressive model with applications to heavy-tailed financial data
We introduce the class of Student t-mixture autoregressive models, which is promising for financial time series modelling. The model is able to capture serial correlations, time-varying means and volatilities, and the shape of the conditional distributions can be time varied from short-tailed to long-tailed, or from unimodal to multimodal. The use of t-distributed errors in each component of the model allows conditional leptokurtic distributions that account for the commonly observed excess unconditional kurtosis in financial data. Methods of parameter estimation and model selection are given. Finally, the proposed modelling procedure is illustrated through a real example. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
751
760
http://hdl.handle.net/10.1093/biomet/asp031
application/pdf
Access to full text is restricted to subscribers.
C. S. Wong
W. S. Chan
P. L. Kam
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:617-6332013-06-14RePEc:oup:biomet
article
Pseudo-partial likelihood estimators for the Cox regression model with missing covariates
By embedding the missing covariate data into a left-truncated and right-censored survival model, we propose a new class of weighted estimating functions for the Cox regression model with missing covariates. The resulting estimators, called the pseudo-partial likelihood estimators, are shown to be consistent and asymptotically normal. A simulation study demonstrates that, compared with the popular inverse-probability weighted estimators, the new estimators perform better when the observation probability is small and improve efficiency of estimating the missing covariate effects. Application to a practical example is reported. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
617
633
http://hdl.handle.net/10.1093/biomet/asp027
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
Qiang Xu
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:691-7092013-06-14RePEc:oup:biomet
article
Use of functionals in linearization and composite estimation with application to two-sample survey data
An important problem associated with two-sample surveys is the estimation of nonlinear functions of finite population totals such as ratios, correlation coefficients or measures of income inequality. Computation and estimation of the variance of such complex statistics are made more difficult by the existence of overlapping units. In one-sample surveys, the linearization method based on the influence function approach is a powerful tool for variance estimation. We introduce a two-sample linearization technique that can be viewed as a generalization of the one-sample influence function approach. Our technique is based on expressing the parameters of interest as multivariate functionals of finite and discrete measures and then using partial influence functions to compute the linearized variables. Under broad assumptions, the asymptotic variance of the substitution estimator, derived from Deville (1999), is shown to be the variance of a weighted sum of the linearized variables. The paper then focuses on a general class of composite substitution estimators, and from this class the optimal estimator for minimizing the asymptotic variance is obtained. The efficiency of the optimal composite estimator is demonstrated through an empirical study. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
691
709
http://hdl.handle.net/10.1093/biomet/asp039
application/pdf
Access to full text is restricted to subscribers.
C. Goga
J.-C. Deville
A. Ruiz-Gazen
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:577-5902013-06-14RePEc:oup:biomet
article
Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data
This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton–Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
577
590
http://hdl.handle.net/10.1093/biomet/asp025
application/pdf
Access to full text is restricted to subscribers.
Lynn M. Johnson
Robert L. Strawderman
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:545-5582013-06-14RePEc:oup:biomet
article
Empirical Bayes estimation for additive hazards regression models
We develop a novel empirical Bayesian framework for the semiparametric additive hazards regression model. The integrated likelihood, obtained by integration over the unknown prior of the nonparametric baseline cumulative hazard, can be maximized using standard statistical software. Unlike the corresponding full Bayes method, our empirical Bayes estimators of regression parameters, survival curves and their corresponding standard errors have easily computed closed-form expressions and require no elicitation of hyperparameters of the prior. The method guarantees a monotone estimator of the survival function and accommodates time-varying regression coefficients and covariates. To facilitate frequentist-type inference based on large-sample approximation, we present the asymptotic properties of the semiparametric empirical Bayes estimates. We illustrate the implementation and advantages of our methodology with a reanalysis of a survival dataset and a simulation study. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
545
558
http://hdl.handle.net/10.1093/biomet/asp024
application/pdf
Access to full text is restricted to subscribers.
Debajyoti Sinha
M. Brent McHenry
Stuart R. Lipsitz
Malay Ghosh
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:635-6442013-06-14RePEc:oup:biomet
article
Approximating the α-permanent
The standard matrix permanent is the solution to a number of combinatorial and graph-theoretic problems, and the α-weighted permanent is the density function for a class of Cox processes called boson processes. The exact computation of the ordinary permanent is known to be #P-complete, and the same appears to be the case for the α-permanent for most values of α. At present, the lack of a satisfactory algorithm for approximating the α-permanent is a formidable obstacle to the use of boson processes in applied work. This paper proposes an importance-sampling estimator using nonuniform random permutations generated in a cycle format. Empirical investigation reveals that the estimator works well for the sorts of matrices that arise in point-process applications, involving up to a few hundred points. We conclude with a numerical illustration of the Bayes estimate of the intensity function of a boson point process, which is a ratio of α-permanents. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
635
644
http://hdl.handle.net/10.1093/biomet/asp036
application/pdf
Access to full text is restricted to subscribers.
S. C. Kou
P. McCullagh
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:711-7222013-06-14RePEc:oup:biomet
article
Effects of data dimension on empirical likelihood
We evaluate the effects of data dimension on the asymptotic normality of the empirical likelihood ratio for high-dimensional data under a general multivariate model. Data dimension and dependence among components of the multivariate random vector affect the empirical likelihood directly through the trace and the eigenvalues of the covariance matrix. The growth rates to infinity we obtain for the data dimension improve the rates of Hjort et al. (2008). Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
711
722
http://hdl.handle.net/10.1093/biomet/asp037
application/pdf
Access to full text is restricted to subscribers.
Song Xi Chen
Liang Peng
Ying-Li Qin
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:735-7492013-06-14RePEc:oup:biomet
article
A negative binomial model for time series of counts
We study generalized linear models for time series of counts, where serial dependence is introduced through a dependent latent process in the link function. Conditional on the covariates and the latent process, the observation is modelled by a negative binomial distribution. To estimate the regression coefficients, we maximize the pseudolikelihood that is based on a generalized linear model with the latent process suppressed. We show the consistency and asymptotic normality of the generalized linear model estimator when the latent process is a stationary strongly mixing process. We extend the asymptotic results to generalized linear models for time series, where the observation variable, conditional on covariates and a latent process, is assumed to have a distribution from a one-parameter exponential family. Thus, we unify in a common framework the results for Poisson log-linear regression models of Davis et al. (2000), negative binomial logit regression models and other similarly specified generalized linear models. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
735
749
http://hdl.handle.net/10.1093/biomet/asp029
application/pdf
Access to full text is restricted to subscribers.
Richard A. Davis
Rongning Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:663-6762013-06-14RePEc:oup:biomet
article
Gaussian process emulation of dynamic computer codes
Computer codes are used in scientific research to study and predict the behaviour of complex systems. Their run times often make uncertainty and sensitivity analyses impractical because of the thousands of runs that are conventionally required, so efficient techniques have been developed based on a statistical representation of the code. The approach is less straightforward for dynamic codes, which represent time-evolving systems. We develop a novel iterative system to build a statistical model of dynamic computer codes, which is demonstrated on a rainfall-runoff simulator. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
663
676
http://hdl.handle.net/10.1093/biomet/asp028
application/pdf
Access to full text is restricted to subscribers.
S. Conti
J. P. Gosling
J. E. Oakley
A. O'Hagan
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:559-5752013-06-14RePEc:oup:biomet
article
Improving point and interval estimators of monotone functions by rearrangement
Suppose that a target function is monotonic and an available original estimate of this target function is not monotonic. Rearrangements, univariate and multivariate, transform the original estimate to a monotonic estimate that always lies closer in common metrics to the target function. Furthermore, suppose an original confidence interval, which covers the target function with probability at least 1-α, is defined by upper and lower endpoint functions that are not monotonic. Then the rearranged confidence interval, defined by the rearranged upper and lower endpoint functions, is monotonic, shorter in length in common norms than the original interval, and covers the target function with probability at least 1-α. We illustrate the results with a growth chart example. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
559
575
http://hdl.handle.net/10.1093/biomet/asp030
application/pdf
Access to full text is restricted to subscribers.
V. Chernozhukov
I. Fernández-Val
A. Galichon
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:273-2902015-03-25RePEc:oup:biomet
article
Sample size and power analysis for sparse signal recovery in genome-wide association studies
Genome-wide association studies have successfully identified hundreds of novel genetic variants associated with many complex human diseases. However, there is a lack of rigorous work on evaluating the statistical power for identifying these variants. In this paper, we consider sparse signal identification in genome-wide association studies and present two analytical frameworks for detailed analysis of the statistical power for detecting and identifying the disease-associated variants. We present an explicit sample size formula for achieving a given false non-discovery rate while controlling the false discovery rate based on an optimal procedure. Sparse genetic variant recovery is also considered and a boundary condition is established in terms of sparsity and signal strength for almost exact recovery of both disease-associated variants and nondisease-associated variants. A data-adaptive procedure is proposed to achieve this bound. The analytical results are illustrated with a genome-wide association study of neuroblastoma. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
273
290
http://hdl.handle.net/10.1093/biomet/asr003
application/pdf
Access to full text is restricted to subscribers.
Jichun Xie
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:243-2502015-03-25RePEc:oup:biomet
article
Testing a linear time series model against its threshold extension
This paper derives the asymptotic null distribution of a quasilikelihood ratio test statistic for an autoregressive moving average model against its threshold extension. The null hypothesis is that of no threshold, and the error term could be dependent. The asymptotic distribution is rather complicated, and all existing methods for approximating a distribution in the related literature fail to work. Hence, a novel bootstrap approximation based on stochastic permutation is proposed in this paper. Besides being robust to the assumptions on the error term, our method enjoys more flexibility and needs less computation when compared with methods currently used in the literature. Monte Carlo experiments give further support to the new approach, and an illustration is reported. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
243
250
http://hdl.handle.net/10.1093/biomet/asq074
application/pdf
Access to full text is restricted to subscribers.
Guodong Li
Wai Keung Li
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:489-4942015-03-25RePEc:oup:biomet
article
The dimple in Gneiting's spatial-temporal covariance model
Gneiting (2002) proposed a nonseparable covariance model for spatial-temporal data. In the present paper we show that in certain circumstances his model possesses a counterintuitive dimple. In some cases, the magnitude of the dimple can be nontrivial. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
489
494
http://hdl.handle.net/10.1093/biomet/asr006
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Mohsen Mohammadzadeh
Ali M. Mosammam
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:355-3702015-03-25RePEc:oup:biomet
article
Efficient semiparametric regression for longitudinal data with nonparametric covariance estimation
For longitudinal data, when the within-subject covariance is misspecified, the semiparametric regression estimator may be inefficient. We propose a method that combines the efficient semiparametric estimator with nonparametric covariance estimation, and is robust against misspecification of covariance models. We show that kernel covariance estimation provides uniformly consistent estimators for the within-subject covariance matrices, and the semiparametric profile estimator with substituted nonparametric covariance is still semiparametrically efficient. The finite sample performance of the proposed estimator is illustrated by simulation. In an application to CD4 count data from an AIDS clinical trial, we extend the proposed method to a functional analysis of the covariance model. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
355
370
http://hdl.handle.net/10.1093/biomet/asq080
application/pdf
Access to full text is restricted to subscribers.
Yehua Li
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:481-4882015-03-25RePEc:oup:biomet
article
On the likelihood function of Gaussian max-stable processes
We derive a closed-form expression for the likelihood function of a Gaussian max-stable process indexed by ℝ^d at p≤d+1 sites, d≥1. We demonstrate the gain in efficiency in the maximum composite likelihood estimators of the covariance matrix from p=2 to p=3 sites in ℝ^2 by means of a Monte Carlo simulation study. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
481
488
http://hdl.handle.net/10.1093/biomet/asr020
application/pdf
Access to full text is restricted to subscribers.
Marc G. Genton
Yanyuan Ma
Huiyan Sang
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:997-10012015-03-25RePEc:oup:biomet
article
A note on overadjustment in inverse probability weighted estimation
Standardized means, commonly used in observational studies in epidemiology to adjust for potential confounders, are equal to inverse probability weighted means with inverse weights equal to the empirical propensity scores. More refined standardization corresponds with empirical propensity scores computed under more flexible models. Unnecessary standardization induces efficiency loss. However, according to the theory of inverse probability weighted estimation, propensity scores estimated under more flexible models induce improvement in the precision of inverse probability weighted means. This apparent contradiction is clarified by explicitly stating the assumptions under which the improvement in precision is attained. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
997
1001
http://hdl.handle.net/10.1093/biomet/asq049
application/pdf
Access to full text is restricted to subscribers.
Andrea Rotnitzky
Lingling Li
Xiaochun Li
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:851-8652015-03-25RePEc:oup:biomet
article
Nonparametric Bayesian density estimation on manifolds with applications to planar shapes
Statistical analysis on landmark-based shape spaces has diverse applications in morphometrics, medical diagnostics, machine vision and other areas. These shape spaces are non-Euclidean quotient manifolds. To conduct nonparametric inferences, one may define notions of centre and spread on this manifold and work with their estimates. However, it is useful to consider full likelihood-based methods, which allow nonparametric estimation of the probability density. This article proposes a broad class of mixture models constructed using suitable kernels on a general compact metric space and then on the planar shape space in particular. Following a Bayesian approach with a nonparametric prior on the mixing distribution, conditions are obtained under which the Kullback--Leibler property holds, implying large support and weak posterior consistency. Gibbs sampling methods are developed for posterior computation, and the methods are applied to problems in density estimation and classification with shape-based predictors. Simulation studies show improved estimation performance relative to existing approaches. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
851
865
http://hdl.handle.net/10.1093/biomet/asq044
application/pdf
Access to full text is restricted to subscribers.
Abhishek Bhattacharya
David B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:237-2422015-03-25RePEc:oup:biomet
article
Recapture models under equality constraints for the conditional capture probabilities
We introduce a general class of capture-recapture models in which capture probabilities depend on capture history. We discuss constrained versions of the saturated model based on equality constraints. Inference can be performed through a simple estimating equation. The approach is illustrated on a dataset concerning Great Copper butterflies in the Willamette Valley of Oregon. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
237
242
http://hdl.handle.net/10.1093/biomet/asq068
application/pdf
Access to full text is restricted to subscribers.
A. Farcomeni
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:147-1622015-03-25RePEc:oup:biomet
article
Estimation of covariate effects in generalized linear mixed models with informative cluster sizes
In standard regression analyses of clustered data, one typically assumes that the expected value of the response is independent of cluster size. However, this is often false. For example, in studies of surgical interventions, investigators have frequently found surgery volume and outcomes to be related to the skill level of the surgeons. This paper examines the effect of ignoring response-dependent, informative, cluster sizes on standard analytical methods such as mixed-effects models and conditional likelihood methods using analytic calculations, simulation studies and an example from a study of periodontal disease. We consider the case in which cluster sizes and responses share random effects which we assume to be independent of the covariates. Our focus is on maximum likelihood methods that ignore informative cluster sizes, and we show that they exhibit little bias in estimating covariate effects that are uncorrelated with the random effects associated with cluster sizes. However, estimation of covariate effects that are associated with the random effects can be biased. In particular, for models with random intercepts only, ignoring informative cluster sizes can yield biased estimators of the intercept but little bias in estimation of all covariate effects. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
147
162
http://hdl.handle.net/10.1093/biomet/asq066
application/pdf
Access to full text is restricted to subscribers.
John M. Neuhaus
Charles E. McCulloch
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:81-902015-03-25RePEc:oup:biomet
article
A self-normalized confidence interval for the mean of a class of nonstationary processes
We construct an asymptotic confidence interval for the mean of a class of nonstationary processes with constant mean and time-varying variances. Due to the large number of unknown parameters, traditional approaches based on consistent estimation of the limiting variance of sample mean through moving block or non-overlapping block methods are not applicable. Under a block-wise asymptotically equal cumulative variance assumption, we propose a self-normalized confidence interval that is robust against the nonstationarity and dependence structure of the data. We also apply the same idea to construct an asymptotic confidence interval for the mean difference of nonstationary processes with piecewise constant means. The proposed methods are illustrated through simulations and an application to global temperature series. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
81
90
http://hdl.handle.net/10.1093/biomet/asq076
application/pdf
Access to full text is restricted to subscribers.
Zhibiao Zhao
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:905-9202015-03-25RePEc:oup:biomet
article
Penalized high-dimensional empirical likelihood
We propose penalized empirical likelihood for parameter estimation and variable selection for problems with diverging numbers of parameters. Our results are demonstrated for estimating the mean vector in multivariate analysis and regression coefficients in linear models. By using an appropriate penalty function, we show that penalized empirical likelihood has the oracle property. That is, with probability tending to 1, penalized empirical likelihood identifies the true model and estimates the nonzero coefficients as efficiently as if the sparsity of the true model were known in advance. The advantage of penalized empirical likelihood as a nonparametric likelihood approach is illustrated by testing hypotheses and constructing confidence regions. Numerical simulations confirm our theoretical findings. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
905
920
http://hdl.handle.net/10.1093/biomet/asq057
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Chenlei Leng
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1013-10132015-03-25RePEc:oup:biomet
article
Amendments and Corrections
4
2010
97
Biometrika
1013
1013
http://hdl.handle.net/10.1093/biomet/asq052
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:867-8802015-03-25RePEc:oup:biomet
article
A weighted estimating equation approach for inhomogeneous spatial point processes
We introduce a new estimation method for parametric intensity function models of inhomogeneous spatial point processes based on weighted estimating equations. The weights can incorporate information on both inhomogeneity and dependence of the process. Simulations show that significant efficiency gains can be achieved for non-Poisson processes, compared to the Poisson maximum likelihood estimator. An application to tropical forest data illustrates the use of the proposed method. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
867
880
http://hdl.handle.net/10.1093/biomet/asq043
application/pdf
Access to full text is restricted to subscribers.
Yongtao Guan
Ye Shen
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:825-8382015-03-25RePEc:oup:biomet
article
Noncrossing quantile regression curve estimation
Since quantile regression curves are estimated individually, the quantile curves can cross, leading to an invalid distribution for the response. A simple constrained version of quantile regression is proposed to avoid the crossing problem for both linear and nonparametric quantile curves. A simulation study and a reanalysis of tropical cyclone intensity data show the usefulness of the procedure. Asymptotic properties of the estimator are equivalent to the typical approach under standard conditions, and the proposed estimator reduces to the classical one if there is no crossing. The performance of the constrained estimator has shown significant improvement by adding smoothing and stability across the quantile levels. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
825
838
http://hdl.handle.net/10.1093/biomet/asq048
application/pdf
Access to full text is restricted to subscribers.
Howard D. Bondell
Brian J. Reich
Huixia Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:49-632015-03-25RePEc:oup:biomet
article
Bootstrap inference for mean reflection shape and size-and-shape with three-dimensional landmark data
Working within the framework of a multi-dimensional scaling approach to shape analysis, we develop bootstrap methods for inference about mean reflection shape and size-and-shape based on labelled landmark data. The approach is developed in general dimensions though we focus on the three-dimensional case. We consider two pivotal statistics which we use to construct bootstrap confidence regions for the mean reflection shape or size-and-shape, and present simulation results which show that these statistics perform well in a variety of examples. We also suggest regularized versions of the test statistics that are suitable for more challenging cases where sample size is not sufficiently large in relation to the number of landmarks and present numerical results confirming that regularization indeed leads to better performance. An algorithm for producing a graphical representation of the confidence region for the mean reflection shape is presented and applied in an example involving molecular dynamics simulation data. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
49
63
http://hdl.handle.net/10.1093/biomet/asq065
application/pdf
Access to full text is restricted to subscribers.
S. P. Preston
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:231-2362015-03-25RePEc:oup:biomet
article
A novel reversible jump algorithm for generalized linear models
We propose a novel methodology to construct proposal densities in reversible jump algorithms that obtain samples from parameter subspaces of competing generalized linear models with differing dimensions. The derived proposal densities are not restricted to moves between nested models and are applicable even to models that share no common parameters. We illustrate our methodology on competing logistic regression and log-linear graphical models, demonstrating how our suggested proposal densities, together with the resulting freedom to propose moves between any models, improve the performance of the reversible jump algorithm. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
231
236
http://hdl.handle.net/10.1093/biomet/asq071
application/pdf
Access to full text is restricted to subscribers.
M. Papathomas
P. Dellaportas
V. G. S. Vasdekis
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:921-9342015-03-25RePEc:oup:biomet
article
Estimation of controlled direct effects on a dichotomous outcome using logistic structural direct effect models
We consider the problem of assessing whether an exposure affects a dichotomous outcome other than by modifying a given mediator. The standard approach, logistic regression adjusting for both exposure and the mediator, is known to be biased in the presence of confounders for the mediator-outcome relationship. Because additional regression adjustment for such confounders is only justified when they are not affected by the exposure, inverse probability weighting has been advocated, but is not ideally tailored to mediators that are continuous or have strong measured predictors. We overcome this limitation by developing inference for a novel class of causal models that are closely related to Robins' logistic structural direct effect models, but do not inherit their difficulties of estimation. We study identification and efficient estimation under the assumption that all confounders for the exposure-outcome and mediator-outcome relationships have been measured, and find adequate performance in simulation studies. We discuss extensions to case-control studies and relevant implications for the generic problem of adjustment for time-varying confounding. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
921
934
http://hdl.handle.net/10.1093/biomet/asq053
application/pdf
Access to full text is restricted to subscribers.
Stijn Vansteelandt
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:391-4012015-03-25RePEc:oup:biomet
article
The union closure method for testing a fixed sequence of families of hypotheses
Statistical analyses often involve testing multiple hypotheses that are naturally grouped into a fixed sequence of families. An effective approach to control the familywise error rate is to prioritize the importance of prespecification in the testing order. A gatekeeping testing procedure examines the first family with no multiple adjustment and then examines the subsequent family depending on the decision made with respect to the previous one. In this paper, we describe the union closure method that can be used to design gatekeeping procedures. A bipolar disorder trial with three primary and two secondary outcomes is presented as an example. Power comparisons based on the bipolar disorder trial show that the proposed gatekeeping procedures under the union closure framework are more powerful than competing methods. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
391
401
http://hdl.handle.net/10.1093/biomet/asr015
application/pdf
Access to full text is restricted to subscribers.
Han-Joo Kim
A. Richard Entsuah
Justine Shults
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:433-4482015-03-25RePEc:oup:biomet
article
Maximum likelihood estimation of a generalized threshold stochastic regression model
There is hardly any literature on modelling nonlinear dynamic relations involving nonnormal time series data. This is a serious lacuna because nonnormal data are far more abundant than normal ones, for example, time series of counts and positive time series. While there are various forms of nonlinearities, the class of piecewise-linear models is particularly appealing for its relative ease of tractability and interpretation. We propose to study the generalized threshold model which specifies that the conditional probability distribution of the response variable belongs to an exponential family, and the conditional mean response is linked to some piecewise-linear stochastic regression function. We introduce a likelihood-based estimation scheme, and the consistency and limiting distribution of the maximum likelihood estimator are derived. We illustrate the proposed approach with an analysis of a hare abundance time series, which gives new insights on how phase-dependent predator-prey-climate interactions shaped the ten-year hare population cycle. A simulation study is conducted to examine the finite-sample performance of the proposed estimation method. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
433
448
http://hdl.handle.net/10.1093/biomet/asr008
application/pdf
Access to full text is restricted to subscribers.
Noelle I. Samia
Kung-Sik Chan
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:119-1322015-03-25RePEc:oup:biomet
article
Parametric fractional imputation for missing data analysis
Parametric fractional imputation is proposed as a general tool for missing data analysis. Using fractional weights, the observed likelihood can be approximated by the weighted mean of the imputed data likelihood. Computational efficiency can be achieved using the idea of importance sampling and calibration weighting. The proposed imputation method provides efficient parameter estimates for the model parameters specified in the imputation model and also provides reasonable estimates for parameters that are not part of the imputation model. Variance estimation is discussed and results from a limited simulation study are presented. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
119
132
http://hdl.handle.net/10.1093/biomet/asq073
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:107-1182015-03-25RePEc:oup:biomet
article
Horvitz--Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling
When dealing with very large datasets of functional data, survey sampling approaches are useful in order to obtain estimators of simple functional quantities, without being obliged to store all the data. We propose a Horvitz--Thompson estimator of the mean trajectory. In the context of a superpopulation framework, we prove, under mild regularity conditions, that we obtain uniformly consistent estimators of the mean function and of its variance function. With additional assumptions on the sampling design we state a functional central limit theorem and obtain asymptotic confidence bands. Stratified sampling is studied in detail, and we also obtain a functional version of the usual optimal allocation rule, considering a mean variance criterion. These techniques are illustrated by a test population of N=18 902 electricity meters for which we have individual electricity consumption measures every 30 minutes over one week. We show that stratification can substantially improve the accuracy of the estimators and reduce the width of the global confidence bands compared with simple random sampling without replacement. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
107
118
http://hdl.handle.net/10.1093/biomet/asq070
application/pdf
Access to full text is restricted to subscribers.
Hervé Cardot
Etienne Josserand
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:839-8502015-03-25RePEc:oup:biomet
article
Censored quantile regression with partially functional effects
Quantile regression offers a flexible approach to analyzing survival data, allowing each covariate effect to vary with quantiles. In practice, constancy is often found to be adequate for some covariates. In this paper, we study censored quantile regression tailored to the partially functional effect setting with a mixture of varying and constant effects. Such a model can offer a simpler view regarding covariate-survival association and, moreover, can enable improvement in estimation efficiency. We propose profile estimating equations and present an iterative algorithm that can be readily and stably implemented. Asymptotic properties of the resultant estimators are established. A simple resampling-based inference procedure is developed and justified. Extensive simulation studies demonstrate efficiency gains of the proposed method over a naive two-stage procedure. The proposed method is illustrated via an application to a recent renal dialysis study. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
839
850
http://hdl.handle.net/10.1093/biomet/asq050
application/pdf
Access to full text is restricted to subscribers.
Jing Qian
Limin Peng
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:215-2242015-03-25RePEc:oup:biomet
article
Assessing the validity of weighted generalized estimating equations
The inverse probability weighted generalized estimating equations approach (Robins et al. 1994; Robins et al. 1995) effectively removes bias and provides valid statistical inference for regression parameter estimation in marginal models when longitudinal data contain missing values. The validity of the weighted generalized estimating equations regarding consistent estimation depends on whether the underlying missing data process is properly modelled. However, there is little work available to examine whether or not this condition holds. In this paper we propose a test constructed from two sets of estimating equations: one set is known to be unbiased, but the other set is not known to be. We utilize the quadratic inference function (Qu et al. 2000) method to assess their compatibility, which is equivalent to testing for the validity of the weighted generalized estimating equations approach. We conduct simulation studies to assess the performance of the proposed method. The test procedure is illustrated through a real data example. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
215
224
http://hdl.handle.net/10.1093/biomet/asq078
application/pdf
Access to full text is restricted to subscribers.
A. Qu
G. Y. Yi
P. X.-K. Song
P. Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:177-1862015-03-25RePEc:oup:biomet
article
Nonparametric estimation for length-biased and right-censored data
This paper considers survival data arising from length-biased sampling, where the survival times are left truncated by uniformly distributed random truncation times. We propose a nonparametric estimator that incorporates the information about the length-biased sampling scheme. The new estimator retains the simplicity of the truncation product-limit estimator with a closed-form expression, and has a small efficiency loss compared with the nonparametric maximum likelihood estimator, which requires an iterative algorithm. Moreover, the asymptotic variance of the proposed estimator has a closed form, and a variance estimator is easily obtained by plug-in methods. Numerical simulation studies with practical sample sizes are conducted to compare the performance of the proposed method with its competitors. A data analysis of the Canadian Study of Health and Aging is conducted to illustrate the methods and theory. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
177
186
http://hdl.handle.net/10.1093/biomet/asq069
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:163-1752015-03-25RePEc:oup:biomet
article
A unified framework for studying parameter identifiability and estimation in biased sampling designs
Based on the odds ratio representation of a joint density, we propose a unified framework to study parameter identifiability in biased sampling designs. It is shown that most of these designs encountered in practice can be reformulated within the proposed framework and, as a result, the question of parameter identifiability can be largely clarified. Estimation of the identifiable parameters is considered and traditional results on the equivalence of the prospective and retrospective likelihoods are extended. Information contained in data on certain identifiable parameters is often very limited. Such parameters can be poorly estimated by the likelihood approach with practically attainable sample sizes, which can substantially affect the estimates of parameters of primary interest. A partially penalized likelihood approach is proposed to address this. Simulation results suggest that the proposed approach has good performance. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
163
175
http://hdl.handle.net/10.1093/biomet/asq059
application/pdf
Access to full text is restricted to subscribers.
Hua Yun Chen
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:199-2142015-03-25RePEc:oup:biomet
article
The effect of correlation in false discovery rate estimation
The objective of this paper is to quantify the effect of correlation in false discovery rate analysis. Specifically, we derive approximations for the mean, variance, distribution and quantiles of the standard false discovery rate estimator for arbitrarily correlated data. This is achieved using a negative binomial model for the number of false discoveries, where the parameters are found empirically from the data. We show that correlation may increase the bias and variance of the estimator substantially with respect to the independent case, and that in some cases, such as an exchangeable correlation structure, the estimator fails to be consistent as the number of tests becomes large. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
199
214
http://hdl.handle.net/10.1093/biomet/asq075
application/pdf
Access to full text is restricted to subscribers.
Armin Schwartzman
Xihong Lin
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:251-2712015-03-25RePEc:oup:biomet
article
False discovery rates and copy number variation
Copy number changes, the gains and losses of chromosome segments, are a common type of genetic variation among healthy individuals as well as an important feature in tumour genomes. Microarray technology enables us to simultaneously measure, with moderate accuracy, copy number variation at more than a million chromosome locations and for hundreds of subjects. This leads to massive data sets and complicated inference problems concerning which locations are more likely to vary. In this paper we consider a relatively simple false discovery rate approach to copy number analysis. More careful parametric change-point methods can then be focused on promising regions of the genome. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
251
271
http://hdl.handle.net/10.1093/biomet/asr018
application/pdf
Access to full text is restricted to subscribers.
Bradley Efron
Nancy R. Zhang
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:473-4802015-03-25RePEc:oup:biomet
article
Empirical likelihood for small area estimation
Current methodologies in small area estimation are mostly either parametric or heavily dependent on the assumed linearity of the estimators of the small area means. We discuss an alternative empirical likelihood-based Bayesian approach, which neither requires a parametric likelihood nor assumes linearity of the estimators, and can handle both discrete and continuous data in a unified manner. Empirical likelihoods for both area- and unit-level models are introduced. We discuss the suitability of the proposed likelihoods in Bayesian inference and illustrate their performances on a real dataset and a simulation study. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
473
480
http://hdl.handle.net/10.1093/biomet/asr004
application/pdf
Access to full text is restricted to subscribers.
Sanjay Chaudhuri
Malay Ghosh
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:947-9602015-03-25RePEc:oup:biomet
article
Enhancing the sample average approximation method with U designs
Many computational problems in statistics can be cast as stochastic programs that are optimization problems whose objective functions are multi-dimensional integrals. The sample average approximation method is widely used for solving such a problem, which first constructs a sampling-based approximation to the objective function and then finds the solution to the approximated problem. Independent and identically distributed sampling is a prevailing choice for constructing such approximations. Recently it was found that the use of Latin hypercube designs can improve sample average approximations. In computer experiments, U designs are known to possess better space-filling properties than Latin hypercube designs. Inspired by this fact, we propose to use U designs to further enhance the accuracy of the sample average approximation method. Theoretical results are derived to show that sample average approximations with U designs can significantly outperform those with Latin hypercube designs. Numerical examples are provided to corroborate the developed theoretical results. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
947
960
http://hdl.handle.net/10.1093/biomet/asq046
application/pdf
Access to full text is restricted to subscribers.
Qi Tang
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:990-9962015-03-25RePEc:oup:biomet
article
On the equivalence of prospective and retrospective likelihood methods in case-control studies
We present new approaches to analyzing case-control studies using prospective likelihood methods. In the classical framework, we extend the equality of the profile likelihoods to the Barndorff-Nielsen modified profile likelihoods for prospective and retrospective models. This enables simple and accurate approximate conditional inference for stratified case-control studies of moderate stratum size. In the Bayesian framework, we provide sufficient conditions on priors for the prospective model parameters to yield a prospective marginal posterior density equal to its retrospective counterpart. Our results extend the prospective-retrospective equivalence in the Bayesian paradigm with a more general class of priors than has previously been investigated. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
990
996
http://hdl.handle.net/10.1093/biomet/asq054
application/pdf
Access to full text is restricted to subscribers.
Ana-Maria Staicu
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:371-3802015-03-25RePEc:oup:biomet
article
Sure independence screening and compressed random sensing
Compressed sensing is a very powerful and popular tool for sparse recovery of high dimensional signals. Random sensing matrices are often employed in compressed sensing. In this paper we introduce a new method named aggressive betting using sure independence screening for sparse noiseless signal recovery. The proposal exploits the randomness structure of random sensing matrices to greatly boost computation speed. When using sub-Gaussian sensing matrices, which include the Gaussian and Bernoulli sensing matrices as special cases, our proposal has the exact recovery property with overwhelming probability. We also consider sparse recovery with noise and explicitly reveal the impact of noise-to-signal ratio on the probability of sure screening. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
371
380
http://hdl.handle.net/10.1093/biomet/asr010
application/pdf
Access to full text is restricted to subscribers.
Lingzhou Xue
Hui Zou
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:325-3402015-03-25RePEc:oup:biomet
article
Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times
New methods and theory have recently been developed to nonparametrically estimate cumulative incidence functions for competing risks survival data subject to current status censoring. In particular, the limiting distribution of the nonparametric maximum likelihood estimator and a simplified naive estimator have been established under certain smoothness conditions. In this paper, we establish the large-sample behaviour of these estimators in two additional models, namely when the observation time distribution has discrete support and when the observation times are grouped. These asymptotic results are applied to the construction of confidence intervals in the three different models. The methods are illustrated on two datasets regarding the cumulative incidence of different types of menopause from a cross-sectional sample of women in the United States and subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
325
340
http://hdl.handle.net/10.1093/biomet/asq083
application/pdf
Access to full text is restricted to subscribers.
M. H. Maathuis
M. G. Hudgens
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:381-3902015-03-25RePEc:oup:biomet
article
Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control
Testing a low-dimensional null hypothesis against a high-dimensional alternative in a generalized linear model may lead to a test statistic that is a quadratic form in the residuals under the null model. Using asymptotic arguments, we show that the distribution of such a test statistic can be approximated by a ratio of quadratic forms in normal variables, for which algorithms are readily available. For generalized linear models, the asymptotic distribution shows good control of type I error for moderate to small samples, even when the number of covariates in the model far exceeds the sample size. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
381
390
http://hdl.handle.net/10.1093/biomet/asr016
application/pdf
Access to full text is restricted to subscribers.
Jelle J. Goeman
Hans C. van Houwelingen
Livio Finos
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:495-5012015-03-25RePEc:oup:biomet
article
An Akaike-type information criterion for model selection under inequality constraints
The Akaike information criterion for model selection presupposes that the parameter space is not subject to order restrictions or inequality constraints. Anraku (1999) proposed a modified version of this criterion, called the order-restricted information criterion, for model selection in the one-way analysis of variance model when the population means are monotonic. We propose a generalization of this to the case when the population means may be restricted by a mixture of linear equality and inequality constraints. If the model has no inequality constraints, then the generalized order-restricted information criterion coincides with the Akaike information criterion. Thus, the former extends the applicability of the latter to model selection in multi-way analysis of variance models when some models may have inequality constraints while others may not. Simulation shows that the information criterion proposed in this paper performs well in selecting the correct model. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
495
501
http://hdl.handle.net/10.1093/biomet/asr002
application/pdf
Access to full text is restricted to subscribers.
R. M. Kuiper
H. Hoijtink
M. J. Silvapulle
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:935-9462015-03-25RePEc:oup:biomet
article
Compound optimal allocation for individual and collective ethics in binary clinical trials
In recent years, several authors have investigated response-adaptive allocation rules for comparative clinical trials, in order to favour, at each stage of the trial, the treatment that appears to be best. In this paper, we define admissible allocations, namely treatment assignments that cannot be simultaneously improved upon with respect to both a specific design criterion, reflecting the inferential properties of the experiment, and the proportion of patients assigned to the best treatment or treatments; we survey existing designs from this viewpoint. We also suggest combining information and ethical considerations by taking a suitable weighted mean of two corresponding standardized criteria, with weights that depend on the actual treatment effects. This compound criterion leads to a locally optimal allocation that can be targeted by some response-adaptive randomization rule. The paper mainly deals with the case of two treatments, but the suggested methodology is shown to extend to more than two. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
935
946
http://hdl.handle.net/10.1093/biomet/asq055
application/pdf
Access to full text is restricted to subscribers.
Alessandro Baldi Antognini
Alessandra Giovagnoli
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1002-10052015-03-25RePEc:oup:biomet
article
Parameter redundancy with covariates
We show how to determine the parameter redundancy status of a model with covariates from that of the same model without covariates, thereby simplifying the calculation considerably. A matrix decomposition is necessary to ensure that the symbolic computation computer programmes return correct results. The paper is illustrated by mark-recovery and latent-class models, with associated Maple code. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
1002
1005
http://hdl.handle.net/10.1093/biomet/asq041
application/pdf
Access to full text is restricted to subscribers.
Diana J. Cole
Byron J. T. Morgan
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:985-9892015-03-25RePEc:oup:biomet
article
Some insights into continuum regression and its asymptotic properties
Continuum regression encompasses ordinary least squares regression, partial least squares regression and principal component regression under the same umbrella using a nonnegative parameter Gamma. However, there seems to be no literature discussing the asymptotic properties for arbitrary continuum regression parameter Gamma. This article establishes a relation between continuum regression and sufficient dimension reduction and studies the asymptotic properties of continuum regression for arbitrary Gamma under inverse regression models. Theoretical and simulation results show that the continuum seems unnecessary when the conditional distribution of the predictors given the response follows the multivariate normal distribution. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
985
989
http://hdl.handle.net/10.1093/biomet/asq024
application/pdf
Access to full text is restricted to subscribers.
Xin Chen
R. Dennis Cook
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:893-9042015-03-25RePEc:oup:biomet
article
Consistent selection of the number of clusters via crossvalidation
In cluster analysis, one of the major challenges is to estimate the number of clusters. Most existing approaches attempt to minimize some distance-based dissimilarity measure within clusters. This article proposes a novel selection criterion that is applicable to all kinds of clustering algorithms, including distance based or non-distance based algorithms. The key idea is to select the number of clusters that minimizes the algorithm's instability, which measures the robustness of any given clustering algorithm against the randomness in sampling. A novel estimation scheme for clustering instability is developed based on crossvalidation. The proposed selection criterion's effectiveness is demonstrated on a variety of numerical experiments, and its asymptotic selection consistency is established when the dataset is properly split. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
893
904
http://hdl.handle.net/10.1093/biomet/asq061
application/pdf
Access to full text is restricted to subscribers.
Junhui Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:291-3062015-03-25RePEc:oup:biomet
article
Sparse Bayesian infinite factor models
We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
291
306
http://hdl.handle.net/10.1093/biomet/asr013
application/pdf
Access to full text is restricted to subscribers.
A. Bhattacharya
D. B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:1-152015-03-25RePEc:oup:biomet
article
Joint estimation of multiple graphical models
Gaussian graphical models explore dependence relationships between random variables, through the estimation of the corresponding inverse covariance matrices. In this paper we develop an estimator for such models appropriate for data from several graphical models that share the same variables and some of the dependence structure. In this setting, estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category does not take advantage of the common structure. We propose a method that jointly estimates the graphical models corresponding to the different categories present in the data, aiming to preserve the common structure, while allowing for differences between the categories. This is achieved through a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. We establish the asymptotic consistency and sparsity of the proposed estimator in the high-dimensional case, and illustrate its performance on a number of simulated networks. An application to learning semantic connections between terms from webpages collected from computer science departments is included. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
1
15
http://hdl.handle.net/10.1093/biomet/asq060
application/pdf
Access to full text is restricted to subscribers.
Jian Guo
Elizaveta Levina
George Michailidis
Ji Zhu
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:341-3542015-03-25RePEc:oup:biomet
article
Time-dependent cross ratio estimation for bivariate failure times
In the analysis of bivariate correlated failure time data, it is important to measure the strength of association among the correlated failure times. One commonly used measure is the cross ratio. Motivated by Cox's partial likelihood idea, we propose a novel parametric cross ratio estimator that is a flexible continuous function of both components of the bivariate survival times. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is examined using simulation studies, and it is applied to the Australian twin data. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
341
354
http://hdl.handle.net/10.1093/biomet/asr005
application/pdf
Access to full text is restricted to subscribers.
Tianle Hu
Bin Nan
Xihong Lin
James M. Robins
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:187-1982015-03-25RePEc:oup:biomet
article
Variance estimation for generalized Cavalieri estimators
The precision of stereological estimators based on systematic sampling is of great practical importance. This paper presents methods of data-based variance estimation for generalized Cavalieri estimators where errors in sampling positions may occur. Variance estimators are derived under perturbed systematic sampling, systematic sampling with cumulative errors and systematic sampling with random dropouts. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
187
198
http://hdl.handle.net/10.1093/biomet/asq064
application/pdf
Access to full text is restricted to subscribers.
Johanna Ziegel
Eva B. Vedel Jensen
Karl-Anton Dorph-Petersen
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:459-4712015-03-25RePEc:oup:biomet
article
On balanced random imputation in surveys
Random imputation methods are often used in practice because they tend to preserve the distribution of the variable being imputed, which is an important property when the goal is to estimate population quantiles. However, this type of imputation method introduces additional variability, the imputation variance, due to the random selection of residuals. In this paper, we propose a class of random balanced imputation methods under which the imputation variance is eliminated while the distribution of the variable being imputed is preserved. The rationale behind balanced imputation is to select residuals at random so that appropriate constraints are satisfied. We describe an algorithm for selecting the random residuals that can be viewed as an adaptation of the cube algorithm proposed in the context of balanced sampling (Deville & Tillé, 2004). Results of a simulation study support our findings. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
459
471
http://hdl.handle.net/10.1093/biomet/asr011
application/pdf
Access to full text is restricted to subscribers.
G. Chauvet
J.-C. Deville
D. Haziza
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:881-8922015-03-25RePEc:oup:biomet
article
Bootstrap confidence intervals and hypothesis tests for extrema of parameters
The bootstrap provides effective and accurate methodology for a wide variety of statistical problems which might not otherwise enjoy practicable solutions. However, there still exist important problems where standard bootstrap estimators are not consistent, and where alternative approaches, for example the m-out-of-n bootstrap and asymptotic methods, also face significant challenges. One of these is the problem of constructing confidence intervals or hypothesis tests for extrema of parameters, for example for the maximum of p parameters where each has to be estimated from data. In the present paper we suggest approaches to solving this problem. We use the bootstrap to construct an accurate estimator of the joint distribution of centred parameter estimators, and we base the procedure, either a confidence interval or a hypothesis test, on that distribution estimator. Our methodology is designed so that it errs on the side of conservatism, modulo the small inaccuracy of the bootstrap step. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
881
892
http://hdl.handle.net/10.1093/biomet/asq045
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Hugh Miller
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:417-4312015-03-25RePEc:oup:biomet
article
Distribution estimators and confidence intervals for stereological volumes
Assessing the precision of volume estimates from systematic samples is a question of great practical importance, but statistically a challenging task due to the strong spatial dependence of the data and typically small sample sizes. The approach taken in this paper is more ambitious than earlier methodologies, the goal of which was estimation of the variance of a volume estimator v̂, rather than estimation of the distribution of v̂. We shall show that bootstrap methods yield consistent estimators of the distribution of v̂, and also suggest a variety of confidence intervals for the true volume. Our new methodology covers cases where serial sections are exactly periodic, as well as instances where the physical slicing procedure introduces errors in the placement of the sampling points. Measurement errors within sections are also taken into account. The performance of the method is illustrated by a simulation study with synthetic data, and also applied to real datasets. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
417
431
http://hdl.handle.net/10.1093/biomet/asr012
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Johanna Ziegel
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:307-3232015-03-25RePEc:oup:biomet
article
Bayesian influence analysis: a geometric approach
In this paper we develop a general framework of Bayesian influence analysis for assessing various perturbation schemes to the data, the prior and the sampling distribution for a class of statistical models. We introduce a perturbation model to characterize these various perturbation schemes. We develop a geometric framework, called the Bayesian perturbation manifold, and use its associated geometric quantities including the metric tensor and geodesic to characterize the intrinsic structure of the perturbation model. We develop intrinsic influence measures and local influence measures based on the Bayesian perturbation manifold to quantify the effect of various perturbations to statistical models. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of this local influence method in a formal Bayesian analysis. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
307
323
http://hdl.handle.net/10.1093/biomet/asr009
application/pdf
Access to full text is restricted to subscribers.
Hongtu Zhu
Joseph G. Ibrahim
Niansheng Tang
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:961-9682015-03-25RePEc:oup:biomet
article
Probability-based Latin hypercube designs for slid-rectangular regions
Existing space-filling designs are based on the assumption that the experimental region is rectangular, while in practice this assumption can be violated. Motivated by a data centre thermal management study, a class of probability-based Latin hypercube designs is proposed to accommodate a specific type of irregular region. A heuristic algorithm is proposed to search efficiently for optimal designs. Unbiased estimators are proposed, their variances are given and their performances are compared empirically. The proposed method is applied to obtain an optimal sensor placement plan to monitor and study the thermal distribution in a data centre. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
961
968
http://hdl.handle.net/10.1093/biomet/asq051
application/pdf
Access to full text is restricted to subscribers.
Ying Hung
Yasuo Amemiya
Chien-Fu Jeff Wu
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:133-1462015-03-25RePEc:oup:biomet
article
Partial envelopes for efficient estimation in multivariate linear regression
We introduce the partial envelope model, which leads to a parsimonious method for multivariate linear regression when some of the predictors are of special interest. It has the potential to achieve massive efficiency gains compared with the standard model in the estimation of the coefficients for the selected predictors. The partial envelope model is a variation on the envelope model proposed by Cook et al. (2010) but, as it focuses on part of the predictors, it has looser restrictions and can further improve the efficiency. We develop maximum likelihood estimation for the partial envelope model and discuss applications of the bootstrap. An example is provided to illustrate some of its operating characteristics. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
133
146
http://hdl.handle.net/10.1093/biomet/asq063
application/pdf
Access to full text is restricted to subscribers.
Zhihua Su
R. Dennis Cook
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:977-9842015-03-25RePEc:oup:biomet
article
On the Voronoi estimator for the intensity of an inhomogeneous planar Poisson process
The Voronoi estimator may be defined for any location as the inverse of the area of the corresponding Voronoi cell. We investigate the statistical properties of this estimator for the intensity of an inhomogeneous Poisson process, and demonstrate it is approximately unbiased with a gamma sampling distribution. We also introduce the centroidal Voronoi estimator, a simple extension based on spatial regularization of the point pattern. Simulations show the Voronoi estimator has remarkably low bias, while the centroidal Voronoi estimator has slightly more bias but is much less variable. The performance is compared to kernel estimators using two simulated datasets and a dataset consisting of earthquakes within the continental United States. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
977
984
http://hdl.handle.net/10.1093/biomet/asq047
application/pdf
Access to full text is restricted to subscribers.
C. D. Barr
F. P. Schoenberg
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:91-1062015-03-25RePEc:oup:biomet
article
On asymptotic normality and variance estimation for nondifferentiable survey estimators
Survey estimators of population quantities such as distribution functions and quantiles contain nondifferentiable functions of estimated quantities. The theoretical properties of such estimators are substantially more complicated to derive than those of differentiable estimators. In this article, we provide a unified framework for obtaining the asymptotic design-based properties of two common types of nondifferentiable estimators. Estimators of the first type have an explicit expression, while those of the second are defined only as the solution to estimating equations. We propose both analytical and replication-based design-consistent variance estimators for both cases, based on kernel regression. The practical behaviour of the variance estimators is demonstrated in a simulation experiment. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
91
106
http://hdl.handle.net/10.1093/biomet/asq077
application/pdf
Access to full text is restricted to subscribers.
Jianqiang C. Wang
J. D. Opsomer
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:35-482015-03-25RePEc:oup:biomet
article
Bayesian geostatistical modelling with informative sampling locations
We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modelling the outcomes conditionally on the locations as Gaussian with a Gaussian process spatial random effect and adjustment for the location intensity process. We prove posterior propriety under an improper prior on the parameter controlling the degree of informative sampling, demonstrating that the data are informative. In addition, we show that the density of the locations and mean function of the outcome process can be estimated consistently under mild assumptions. The methods show significant evidence of informative sampling when applied to ozone data over Eastern U.S.A. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
35
48
http://hdl.handle.net/10.1093/biomet/asq067
application/pdf
Access to full text is restricted to subscribers.
D. Pati
B. J. Reich
D. B. Dunson
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:969-9762015-03-25RePEc:oup:biomet
article
Varying coefficient transformation models with censored data
A maximum likelihood method with spline smoothing is proposed for linear transformation models with varying coefficients. The estimation and inference procedures are computat