2015-05-23T13:21:18Z
http://oai.repec.openlib.org/oai.php
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:363-378
2014-11-17
RePEc:oup:biomet
article
Likelihood approaches for the invariant density ratio model with biased-sampling data
The full likelihood approach in statistical analysis is regarded as the most efficient means for estimation and inference. For complex length-biased failure time data, computational algorithms and theoretical properties are not readily available, especially when a likelihood function involves infinite-dimensional parameters. Relying on the invariance property of length-biased failure time data under the semiparametric density ratio model, we present two likelihood approaches for the estimation and assessment of the difference between two survival distributions. The most efficient maximum likelihood estimators are obtained by the EM algorithm and profile likelihood. We also provide a simple numerical method for estimation and inference based on conditional likelihood, which can be generalized to k-arm settings. Unlike conventional survival data, the mean of the population failure times can be consistently estimated given right-censored length-biased data under mild regularity conditions. To check the semiparametric density ratio model assumption, we use a test statistic based on the area between two survival distributions. Simulation studies confirm that the full likelihood estimators are more efficient than the conditional likelihood estimators. We analyse an epidemiological study to illustrate the proposed methods. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
363
378
http://hdl.handle.net/10.1093/biomet/ass008
application/pdf
Access to full text is restricted to subscribers.
Yu Shen
Jing Ning
Jing Qin
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:325-333
2014-11-17
RePEc:oup:biomet
article
Objective Bayesian analysis for the Student-t regression model
We develop a Bayesian analysis based on two different Jeffreys priors for the Student-t regression model with unknown degrees of freedom. It is typically difficult to estimate the number of degrees of freedom: improper prior distributions may lead to improper posterior distributions, whereas proper prior distributions may dominate the analysis. We show that Bayesian analysis with either of the two considered Jeffreys priors provides a proper posterior distribution. Finally, we show that Bayesian estimators based on Jeffreys analysis compare favourably to other Bayesian estimators based on priors previously proposed in the literature. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
325
333
http://hdl.handle.net/10.1093/biomet/asn001
application/pdf
Access to full text is restricted to subscribers.
Thaís C. O. Fonseca
Marco A. R. Ferreira
Helio S. Migon
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1005-1011
2014-11-17
RePEc:oup:biomet
article
A note on automatic variable selection using smooth-threshold estimating equations
This paper develops smooth-threshold estimating equations that can automatically eliminate irrelevant parameters by setting them to zero. The resulting estimator enjoys the oracle property in the sense of Fan & Li (2001), even for estimators for which the covariance assumption of Wang & Leng (2007) is violated, such as the Buckley–James estimator. Furthermore, the estimator can be obtained without solving a convex optimization problem. A BIC-type criterion for tuning parameter selection is also proposed. It is shown that the criterion achieves consistent model selection. A numerical study confirms the performance of the method. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1005
1011
http://hdl.handle.net/10.1093/biomet/asp060
application/pdf
Access to full text is restricted to subscribers.
Masao Ueki
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:387-402
2014-11-17
RePEc:oup:biomet
article
Estimating a treatment effect with repeated measurements accounting for varying effectiveness duration
To assess treatment efficacy in clinical trials, certain clinical outcomes are repeatedly measured over time for the same subject. The difference in their means may characterize a treatment effect. Since treatment effectiveness lag and saturation times may exist, erosion of treatment effect often occurs during the observation period. Instead of using models based on ad hoc parametric or purely nonparametric time-varying coefficients, we model the treatment effectiveness durations, which are the time intervals between the lag and saturation times. Then we use some mean response models to include such treatment effectiveness durations. Our methodology is demonstrated by simulations and analysis of a landmark HIV/AIDS clinical trial of short-course nevirapine against mother-to-child HIV vertical transmission during labour and delivery. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
387
402
http://hdl.handle.net/10.1093/biomet/asm019
application/pdf
Access to full text is restricted to subscribers.
Y. Q. Chen
J. Yang
S. Cheng
J. B. Jackson
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:135-152
2014-11-17
RePEc:oup:biomet
article
Extending conventional priors for testing general hypotheses in linear models
We consider that observations come from a general normal linear model and that it is desirable to test a simplifying null hypothesis about the parameters. We approach this problem from an objective Bayesian, model-selection perspective. Crucial ingredients for this approach are 'proper objective priors' to be used for deriving the Bayes factors. Jeffreys-Zellner-Siow priors have good properties for testing null hypotheses defined by specific values of the parameters in full-rank linear models. We extend these priors to deal with general hypotheses in general linear models, not necessarily of full rank. The resulting priors, which we call 'conventional priors', are expressed as a generalization of recently introduced 'partially informative distributions'. The corresponding Bayes factors are fully automatic, easily computed and very reasonable. The methodology is illustrated for the change-point problem and the equality of treatment effects problem. We compare the conventional priors derived for these problems with other objective Bayesian proposals like the intrinsic priors. It is concluded that both priors behave similarly although interesting subtle differences arise. We adapt the conventional priors to deal with nonnested model selection as well as multiple-model comparison. Finally, we briefly address a generalization of conventional priors to nonnormal scenarios. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
135
152
http://hdl.handle.net/10.1093/biomet/asm014
application/pdf
Access to full text is restricted to subscribers.
M.J. Bayarri
Gonzalo García-Donato
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:773-778
2014-11-17
RePEc:oup:biomet
article
A note on conditional AIC for linear mixed-effects models
The conventional model selection criterion, the Akaike information criterion, AIC, has been applied to choose candidate models in mixed-effects models by the consideration of marginal likelihood. Vaida & Blanchard (2005) demonstrated that such a marginal AIC and its small sample correction are inappropriate when the research focus is on clusters. Correspondingly, these authors suggested the use of conditional AIC. Their conditional AIC is derived under the assumption that the variance-covariance matrix or scaled variance-covariance matrix of random effects is known. This note provides a general conditional AIC but without these strong assumptions. Simulation studies show that the proposed method is promising. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
773
778
http://hdl.handle.net/10.1093/biomet/asn023
application/pdf
Access to full text is restricted to subscribers.
Hua Liang
Hulin Wu
Guohua Zou
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:767-767
2014-11-17
RePEc:oup:biomet
article
'Nonparametric inference in multivariate mixtures', Biometrika (2005), 92, pp. 667–678
The left-hand side of equation (2·8), on p. 671, should read $\{\pi_1 (1 - \pi_1)\}^{-1/2} (2\pi_1 - 1)$ rather than $\{(1 - \pi_1)/\pi_1\}^{1/2} (2\pi_1 - 1)$. Reflecting this change, the left-hand side of equation (3·1) on the same page should be altered to $\{\hat{\pi}_1 (1 - \hat{\pi}_1)\}^{-1/2} (2\hat{\pi}_1 - 1)$, and the formula at the foot of p. 677 should be modified to $\{\pi_1 (1 - \pi_1)\}^{-1/2} (2\pi_1 - 1) + O_p(n^{-1/2})$. No other formula is affected, and the left-hand side of (2·8) is still increasing in $\pi_1$. The numerical results, discussed in §4, are influenced in minor ways. In the simulation study, absolute bias is reduced, and variance is either slightly increased or slightly decreased. In the real-data example, using the nonparametric approach to analysis, mean squared error is further reduced, from 0·0011 to 0·0004.
We are grateful to Hiro Kasahara and Katsumi Shimotsu for pointing out the error. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
767
767
http://hdl.handle.net/10.1093/biomet/asm042
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Amnon Neeman
Reza Pakyari
Ryan Elmore
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:513-527
2014-11-17
RePEc:oup:biomet
article
Adaptive regularization using the entire solution surface
Several sparseness penalties have been suggested for delivery of good predictive performance in automatic variable selection within the framework of regularization. All assume that the true model is sparse. We propose a penalty, a convex combination of the $L_1$- and $L_\infty$-norms, that adapts to a variety of situations including sparseness and nonsparseness, grouping and nongrouping. The proposed penalty performs grouping and adaptive regularization. In addition, we introduce a novel homotopy algorithm utilizing subgradients for developing regularization solution surfaces involving multiple regularizers. This permits efficient computation and adaptive tuning. Numerical experiments are conducted using simulation. In simulated and real examples, the proposed penalty compares well against popular alternatives. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
513
527
http://hdl.handle.net/10.1093/biomet/asp038
application/pdf
Access to full text is restricted to subscribers.
S. Wu
X. Shen
C. J. Geyer
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:583-598
2014-11-17
RePEc:oup:biomet
article
Functional mixed effects spectral analysis
In many experiments, time series data can be collected from multiple units and multiple time series segments can be collected from the same unit. This article introduces a mixed effects Cramér spectral representation which can be used to model the effects of design covariates on the second-order power spectrum while accounting for potential correlations among the time series segments collected from the same unit. The transfer function is composed of a deterministic component to account for the population-average effects and a random component to account for the unit-specific deviations. The resulting log-spectrum has a functional mixed effects representation where both the fixed effects and random effects are functions in the frequency domain. It is shown that, when the replicate-specific spectra are smooth, the log-periodograms converge to a functional mixed effects model. A data-driven iterative estimation procedure is offered for the periodic smoothing spline estimation of the fixed effects, penalized estimation of the functional covariance of the random effects, and unit-specific random effects prediction via the best linear unbiased predictor. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
583
598
http://hdl.handle.net/10.1093/biomet/asr032
application/pdf
Access to full text is restricted to subscribers.
Robert T. Krafty
Martica Hall
Wensheng Guo
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:679-694
2014-11-17
RePEc:oup:biomet
article
Improving the efficiency of the log-rank test using auxiliary covariates
Under the assumption of proportional hazards, the log-rank test is optimal for testing the null hypothesis [inline formula: asn003ilm1.gif], where [inline formula: asn003ilm2.gif] denotes the logarithm of the hazard ratio. However, if there are additional covariates that correlate with survival times, making use of their information will increase the efficiency of the log-rank test. We apply the theory of semiparametrics to characterize a class of regular and asymptotically linear estimators for [inline formula: asn003ilm3.gif] when auxiliary covariates are incorporated into the model, and derive estimators that are more efficient. The Wald tests induced by these estimators are shown to be more powerful than the log-rank test. Simulation studies are used to illustrate the gains in efficiency. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
679
694
http://hdl.handle.net/10.1093/biomet/asn003
application/pdf
Access to full text is restricted to subscribers.
Xiaomin Lu
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:877-891
2014-11-17
RePEc:oup:biomet
article
Adaptive cluster double sampling
We present a multi-phase variant of adaptive cluster sampling which allows the sampler to control the number of measurements of the variable of interest. A first-phase sample is selected using an adaptive cluster sampling design based on an inexpensive auxiliary variable associated with the survey variable. Then the network structure of the adaptive cluster sample is used to select an ordinary one-phase or two-phase subsample of units and the values of the survey variable associated with those units are recorded. The population mean is estimated by either a regression-type estimator or a Horvitz–Thompson-type estimator. The results of a simulation study show good performance of the proposed design, and suggest that in many real situations this design might be preferred to the ordinary adaptive cluster sampling design. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
877
891
http://hdl.handle.net/10.1093/biomet/91.4.877
text/html
Access to full text is restricted to subscribers.
Martín H. Felix-Medina
Steven K. Thompson
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:533-550
2014-11-17
RePEc:oup:biomet
article
Inferring stochastic dynamics from functional data
In most current data modelling for time-dynamic systems, one works with a prespecified differential equation and attempts to estimate its parameters. In contrast, we demonstrate that in the case of functional data, the equation itself can be inferred. Assuming only that the dynamics are described by a first-order nonlinear differential equation with a random component, we obtain data-adaptive dynamic equations from the observed data via a simple smoothing-based procedure. We prove consistency and introduce diagnostics to ascertain the fraction of variance that is explained by the deterministic part of the equation. This approach is shown to yield useful insights into the time-dynamic nature of human growth. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
533
550
http://hdl.handle.net/10.1093/biomet/ass015
application/pdf
Access to full text is restricted to subscribers.
Nicolas Verzelen
Wenwen Tao
Hans-Georg Müller
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:529-538
2014-11-17
RePEc:oup:biomet
article
Bayesian nonparametric multiple imputation of partially observed data with ignorable nonresponse
We present a new, nonparametric Bayesian method for multiple imputation of partially observed data for which the pattern of missingness is arbitrary and the data are missing at random with ignorable nonresponse with respect to the model specification. Motivation for the method is provided, followed by an overview of Pólya trees and their application to multiple imputation, and a comparison of the new method to existing approaches is presented. The method is illustrated on a dataset of colleges and universities in the United States. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
529
538
Susan M. Paddock
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:257-263
2014-11-17
RePEc:oup:biomet
article
Asymptotic inference for a nonstationary double AR(1) model
We investigate the nonstationary double AR(1) model, [display formula: asm084ueq1.gif] where $\omega > 0$, $\alpha > 0$, the $\eta_t$ are independent standard normal random variables and $E \log |\phi + \eta_t \sqrt{\alpha}| \geq 0$. We show that the maximum likelihood estimator of $(\phi, \alpha)$ is consistent and asymptotically normal. Combination of this result with that in Ling ([11]) for the stationary case gives the asymptotic normality of the maximum likelihood estimator of $\phi$ for any $\phi$ in the real line, with a root-n rate of convergence. This is in contrast to the results for the classical AR(1) model, corresponding to $\alpha = 0$. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
257
263
http://hdl.handle.net/10.1093/biomet/asm084
application/pdf
Access to full text is restricted to subscribers.
Shiqing Ling
Dong Li
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:741-755
2014-11-17
RePEc:oup:biomet
article
Kernel smoothed profile likelihood estimation in the accelerated failure time frailty model for clustered survival data
Clustered survival data frequently arise in biomedical applications, where event times of interest are clustered into groups such as families. In this article we consider an accelerated failure time frailty model for clustered survival data and develop nonparametric maximum likelihood estimation for it via a kernel smoother-aided EM algorithm. We show that the proposed estimator for the regression coefficients is consistent, asymptotically normal, and semiparametric efficient when the kernel bandwidth is properly chosen. An EM-aided numerical differentiation method is derived for estimating its variance. Simulation studies evaluate the finite sample performance of the estimator, and it is applied to the diabetic retinopathy dataset. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
741
755
http://hdl.handle.net/10.1093/biomet/ast012
application/pdf
Access to full text is restricted to subscribers.
Bo Liu
Wenbin Lu
Jiajia Zhang
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:235-254
2014-11-17
RePEc:oup:biomet
article
Bayesian alignment using hierarchical models, with applications in protein bioinformatics
An important problem in shape analysis is to match configurations of points in space after filtering out some geometrical transformation. In this paper we introduce hierarchical models for such tasks, in which the points in the configurations are either unlabelled or have at most a partial labelling constraining the matching, and in which some points may only appear in one of the configurations. We derive procedures for simultaneous inference about the matching and the transformation, using a Bayesian approach. Our hierarchical model is based on a Poisson process for hidden true point locations; this leads to considerable mathematical simplification and efficiency of implementation of EM and Markov chain Monte Carlo algorithms. We find a novel use for classical distributions from directional statistics in a conditionally conjugate specification for the case where the geometrical transformation includes an unknown rotation. Throughout, we focus on the case of affine or rigid motion transformations. Under a broad parametric family of loss functions, an optimal Bayesian point estimate of the matching matrix can be constructed that depends only on a single parameter of the family. Our methods are illustrated by two applications from bioinformatics. The first problem is of matching protein gels in two dimensions, and the second consists of aligning active sites of proteins in three dimensions. In the latter case, we also use information related to the grouping of the amino acids, as an example of a more general capability of our methodology to include partial labelling information. We discuss some open problems and suggest directions for future work. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
235
254
http://hdl.handle.net/10.1093/biomet/93.2.235
text/html
Access to full text is restricted to subscribers.
Peter J. Green
Kanti V. Mardia
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:761-774
2014-11-17
RePEc:oup:biomet
article
Non-Gaussian spatiotemporal modelling through scale mixing
We construct non-Gaussian processes that vary continuously in space and time with nonseparable covariance functions. Starting from a general and flexible way of constructing valid nonseparable covariance functions through mixing over separable covariance functions, the resulting models are generalized by allowing for outliers as well as regions with larger variances. We induce this through scale mixing with separate positive-valued processes. Smooth mixing processes are applied to the underlying correlated processes in space and in time, thus leading to regions in space and time of increased spread. An uncorrelated mixing process on the nugget effect accommodates outliers. Posterior and predictive Bayesian inference with these models is implemented through a Markov chain Monte Carlo sampler. An application to temperature data in the Basque country illustrates the potential of this model in the identification of outliers and regions with inflated variance, and shows that this improves the predictive performance. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
761
774
http://hdl.handle.net/10.1093/biomet/asr047
application/pdf
Access to full text is restricted to subscribers.
Thaís C. O. Fonseca
Mark F. J. Steel
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:847-858
2014-11-17
RePEc:oup:biomet
article
Estimating equations for spatially correlated data in multi-dimensional space
We use the quasilikelihood concept to propose an estimating equation for spatial data with correlation across the study region in a multi-dimensional space. With appropriate mixing conditions, we develop a central limit theorem for a random field under various $L_p$ metrics. The consistency and asymptotic normality of quasilikelihood estimators can then be derived. We also conduct simulations to evaluate the performance of the proposed estimating equation, and a dataset from East Lansing Woods is used to illustrate the method. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
847
858
http://hdl.handle.net/10.1093/biomet/asn046
application/pdf
Access to full text is restricted to subscribers.
Pei-Sheng Lin
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:209-222
2014-11-17
RePEc:oup:biomet
article
Bürmann expansion and test for additivity
We propose a Lagrange multiplier test for additivity based on the Bürmann expansion of a conditional mean function. The asymptotic null distribution of the test is shown to be $\chi^2$, under some regularity conditions. In contrast, the Lagrange multiplier test proposed by Chen et al. (1995) is based on the Volterra expansion of the conditional mean function. We discuss some desirable advantages of the Bürmann expansion over the Volterra expansion for nonlinear time series modelling. We also report an empirical study which shows that, in terms of empirical power, the Lagrange multiplier test motivated by the Bürmann expansion outperforms the test of Chen et al. (1995) for the cases for which the Lagrange multiplier test is designed. For other cases for which none of the tests is specifically designed, the empirical powers of the two tests are comparable. Finally, we illustrate the use of the Lagrange multiplier test with a blowfly experimental system. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
209
222
K. S. Chan
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:539-552
2014-11-17
RePEc:oup:biomet
article
A sequential particle filter method for static models
Particle filter methods are complex inference procedures, which combine importance sampling and Monte Carlo schemes in order to explore consistently a sequence of multiple distributions of interest. We show that such methods can also offer an efficient estimation tool in 'static' set-ups, in which case $\pi(\theta \mid y_1, \ldots, y_N)$ is the only posterior distribution of interest but the preliminary exploration of partial posteriors $\pi(\theta \mid y_1, \ldots, y_n)$ $(n < N)$ makes it possible to save computing time. A complete algorithm is proposed for independent or Markov models. Our method is shown to challenge other common estimation procedures in terms of robustness and execution time, especially when the sample size is large. Two classes of examples, mixture models and discrete generalised linear models, are discussed and illustrated by numerical results. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
539
552
Nicolas Chopin
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:449-458
2014-11-17
RePEc:oup:biomet
article
Optimal design for additive partially nonlinear models
We develop optimal design theory for additive partially nonlinear regression models, showing that Bayesian and standardized maximin D-optimal designs can be found as the products of the corresponding optimal designs in one dimension. A sufficient condition under which analogous results hold for $D_s$-optimality is derived to accommodate situations in which only a subset of the model parameters is of interest. To facilitate prediction of the response at unobserved locations, we prove similar results for Q-optimality in the class of all product designs. The usefulness of this approach is demonstrated through an application from the automotive industry, where optimal designs for least squares regression splines are determined and compared with designs commonly used in practice. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
449
458
http://hdl.handle.net/10.1093/biomet/asr001
application/pdf
Access to full text is restricted to subscribers.
S. Biedermann
H. Dette
D. C. Woods
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:261-268
2014-11-17
RePEc:oup:biomet
article
Non-restarting cumulative sum charts and control of the false discovery rate
Cumulative sum or CUSUM charts are typically used to detect a change in the distribution of a sequence of observations, e.g., shifts in the mean. Usually, after signalling, the chart is restarted by setting it to some value below the signalling threshold. We propose a non-restarting CUSUM chart which is able to detect periods during which the stream is out of control. Further, we advocate an upper boundary to prevent the CUSUM chart rising too high, which helps to detect a change back into control. We present an algorithm to control the false discovery rate when considering CUSUM charts based on multiple streams of data. We consider two definitions of a false discovery: signalling out-of-control when the observations have been in control since the start and signalling out-of-control when the observations have been in control since the last time the chart was at zero. We prove that the false discovery rate is controlled under both these definitions simultaneously. Simulations reveal the difference in false discovery rate control when using these and other desirable definitions of a false discovery. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
261
268
http://hdl.handle.net/10.1093/biomet/ass066
application/pdf
Access to full text is restricted to subscribers.
Axel Gandy
F. Din-Houn Lau
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:111-128
2014-11-17
RePEc:oup:biomet
article
Varying-coefficient models and basis function approximations for the analysis of repeated measurements
A global smoothing procedure is developed using basis function approximations for estimating the parameters of a varying-coefficient model with repeated measurements. Inference procedures based on a resampling subject bootstrap are proposed to construct confidence regions and to perform hypothesis testing. Conditional biases and variances of our estimators and their asymptotic consistency are developed explicitly. Finite sample properties of our procedures are investigated through a simulation study. Application of the proposed approach is demonstrated through an example in epidemiology. In contrast to the existing methods, this approach applies whether or not the covariates are time-invariant and does not require binning of the data when observations are sparse at distinct observation times. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
111
128
Jianhua Z. Huang
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:875-889
2014-11-17
RePEc:oup:biomet
article
Pairwise curve synchronization for functional data
Data collected by scientists are increasingly in the form of trajectories or curves. Often these can be viewed as realizations of a composite process driven by both amplitude and time variation. We consider the situation in which functional variation is dominated by time variation, and develop a curve-synchronization method that uses every trajectory in the sample as a reference to obtain pairwise warping functions in the first step. These initial pairwise warping functions are then used to create improved estimators of the underlying individual warping functions in the second step. A truncated averaging process is used to obtain robust estimation of individual warping functions. The method compares well with other available time-synchronization approaches and is illustrated with Berkeley growth data and gene expression data for multiple sclerosis. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
875
889
http://hdl.handle.net/10.1093/biomet/asn047
application/pdf
Access to full text is restricted to subscribers.
Rong Tang
Hans-Georg Müller
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:181-198
2014-11-17
RePEc:oup:biomet
article
On Bayesian testimation and its application to wavelet thresholding
We consider the problem of estimating the unknown response function in the Gaussian white noise model. We first utilize the recently developed Bayesian maximum a posteriori testimation procedure of Abramovich et al. (2007) for recovering an unknown high-dimensional Gaussian mean vector. The existing results for its upper error bounds over various sparse $l_p$-balls are extended to more general cases. We show that, for a properly chosen prior on the number of nonzero entries of the mean vector, the corresponding adaptive estimator is asymptotically minimax in a wide range of sparse and dense $l_p$-balls. The proposed procedure is then applied in a wavelet context to derive adaptive global and level-wise wavelet estimators of the unknown response function in the Gaussian white noise model. These estimators are then proven to be, respectively, asymptotically near-minimax and minimax in a wide range of Besov balls. These results are also extended to the estimation of derivatives of the response function. Simulated examples are conducted to illustrate the performance of the proposed level-wise wavelet estimator in finite sample situations, and to compare it with several existing counterparts. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
181
198
http://hdl.handle.net/10.1093/biomet/asp080
application/pdf
Access to full text is restricted to subscribers.
Felix Abramovich
Vadim Grinshtein
Athanasia Petsa
Theofanis Sapatinas
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:647-6592014-11-17RePEc:oup:biomet
article
Simulation of hyper-inverse Wishart distributions in graphical models
We introduce and exemplify an efficient method for direct sampling from hyper-inverse Wishart distributions. The method relies very naturally on the use of standard junction-tree representation of graphs, and couples these with matrix results for inverse Wishart distributions. We describe the theory and resulting computational algorithms for both decomposable and nondecomposable graphical models. An example drawn from financial time series demonstrates application in a context where inferences on a structured covariance model are required. We discuss and investigate questions of scalability of the simulation methods to higher-dimensional distributions. The paper concludes with general comments about the approach, including its use in connection with existing Markov chain Monte Carlo methods that deal with uncertainty about the graphical model structure. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
647
659
http://hdl.handle.net/10.1093/biomet/asm056
application/pdf
Access to full text is restricted to subscribers.
Carlos M. Carvalho
Hélène Massam
Mike West
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:645-6612014-11-17RePEc:oup:biomet
article
Markov models for accumulating mutations
We introduce and analyze a waiting time model for the accumulation of genetic changes. The continuous-time conjunctive Bayesian network is defined by a partially ordered set of mutations and by the rate of fixation of each mutation. The partial order encodes constraints on the order in which mutations can fixate in the population, shedding light on the mutational pathways underlying the evolutionary process. We study a censored version of the model and derive equations for an <sc>em</sc> algorithm to perform maximum likelihood estimation of the model parameters. We also show how to select the maximum likelihood partially ordered set. The model is applied to genetic data from cancer cells and from drug resistant human immunodeficiency viruses, indicating implications for diagnosis and treatment. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
645
661
http://hdl.handle.net/10.1093/biomet/asp023
application/pdf
Access to full text is restricted to subscribers.
N. Beerenwinkel
S. Sullivant
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:741-7472014-11-17RePEc:oup:biomet
article
Construction of φ<sub>p</sub>-optimal exact designs with minimum experimental run size for a linear log contrast model in mixture experiments
We propose a new method with minimum experimental run size using the properties of Hadamard matrices through which some φ<sub>p</sub>-optimal exact designs including A-, D- and E-optimal designs are constructed for a linear log contrast model in mixture experiments. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/asr014
application/pdf
Access to full text is restricted to subscribers.
Baisuo Jin
Mong-Na Lo Huang
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:197-2102014-11-17RePEc:oup:biomet
article
Spectral methods for nonstationary spatial processes
We propose a nonstationary periodogram and various parametric approaches for estimating the spectral density of a nonstationary spatial process. We also study the asymptotic properties of the proposed estimators via shrinking asymptotics, assuming the distance between neighbouring observations tends to zero as the size of the observation region grows without bound. With this type of asymptotic model we can uniquely determine the spectral density, avoiding the aliasing problem. We also present a new class of nonstationary processes, based on a convolution of local stationary processes. This model has the advantage that the model is simultaneously defined everywhere, unlike 'moving window' approaches, but it retains the attractive property that, locally in small regions, it behaves like a stationary spatial process. Applications include the spatial analysis and modelling of air pollution data provided by the US Environmental Protection Agency. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
197
210
Montserrat Fuentes
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:791-8052014-11-17RePEc:oup:biomet
article
Additive modelling of functional gradients
We consider the problem of estimating functional derivatives and gradients in the framework of a regression setting where one observes functional predictors and scalar responses. Derivatives are then defined as functional directional derivatives that indicate how changes in the predictor function in a specified functional direction are associated with corresponding changes in the scalar response. For a model-free approach, navigating the curse of dimensionality requires the imposition of suitable structural constraints. Accordingly, we develop functional derivative estimation within an additive regression framework. Here, the additive components of functional derivatives correspond to derivatives of nonparametric one-dimensional regression functions with the functional principal components of predictor processes as arguments. This approach requires nothing more than estimating derivatives of one-dimensional nonparametric regressions, and thus is computationally very straightforward to implement, while it also provides substantial flexibility, fast computation and consistent estimation. We illustrate the consistent estimation and interpretation of the resulting functional derivatives and functional gradient fields in a study of the dependence of lifetime fertility of flies on early life reproductive trajectories. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
791
805
http://hdl.handle.net/10.1093/biomet/asq056
application/pdf
Access to full text is restricted to subscribers.
Hans-Georg Müller
Fang Yao
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:653-6662014-11-17RePEc:oup:biomet
article
Generalized varying coefficient models for longitudinal data
We propose a generalization of the varying coefficient model for longitudinal data to cases where not only current but also recent past values of the predictor process affect current response. More precisely, the targeted regression coefficient functions of the proposed model have sliding window supports around current time t. A variant of a recently proposed two-step estimation method for varying coefficient models is proposed for estimation in the context of these generalized varying coefficient models, and is found to lead to improvements, especially for the case of additive measurement errors in both response and predictors. The proposed methodology for estimation and inference is also applicable for the case of additive measurement error in the common versions of varying coefficient models that relate only current observations of predictor and response processes to each other. Asymptotic distributions of the proposed estimators are derived, and the model is applied to the problem of predicting protein concentrations in a longitudinal study. Simulation studies demonstrate the efficacy of the proposed estimation procedure. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
653
666
http://hdl.handle.net/10.1093/biomet/asn006
application/pdf
Access to full text is restricted to subscribers.
Damla Şentürk
Hans-Georg Müller
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:591-6022014-11-17RePEc:oup:biomet
article
Testing model adequacy for dynamic panel data with intercorrelation
We give several definitions of residual autocorrelations and derive their joint asymptotic distribution for the panel time series model of Hjellvik & Tjøstheim (1999a). A portmanteau goodness-of-fit test arises naturally from the asymptotic distribution. Simulation results show that the asymptotic standard errors compared satisfactorily with the empirical standard errors, that the goodness-of-fit test has reasonable empirical size, and that it is powerful enough to be useful with a modest sample size. The results of this paper are illustrated with a real-data example. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
591
602
Bo Fu
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:451-4582014-11-17RePEc:oup:biomet
article
An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants
Maximum likelihood parameter estimation and sampling from Bayesian posterior distributions are problematic when the probability density for the parameter of interest involves an intractable normalising constant which is also a function of that parameter. In this paper, an auxiliary variable method is presented which requires only that independent samples can be drawn from the unnormalised density at any particular parameter value. The proposal distribution is constructed so that the normalising constant cancels from the Metropolis-Hastings ratio. The method is illustrated by producing posterior samples for parameters of the Ising model given a particular lattice realisation. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
451
458
http://hdl.handle.net/10.1093/biomet/93.2.451
text/html
Access to full text is restricted to subscribers.
J. Møller
A. N. Pettitt
R. Reeves
K. K. Berthelsen
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:553-5682014-11-17RePEc:oup:biomet
article
Tuning parameter selectors for the smoothly clipped absolute deviation method
The penalized least squares approach with smoothly clipped absolute deviation penalty has been consistently demonstrated to be an attractive regression shrinkage and selection method. It not only automatically and consistently selects the important variables, but also produces estimators which are as efficient as the oracle estimator. However, these attractive features depend on appropriate choice of the tuning parameter. We show that the commonly used generalized crossvalidation cannot select the tuning parameter satisfactorily, with a nonignorable overfitting effect in the resulting model. In addition, we propose a <sc>BIC</sc> tuning parameter selector, which is shown to be able to identify the true model consistently. Simulation studies are presented to support theoretical findings, and an empirical example is given to illustrate its use in the Female Labor Supply data. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
553
568
http://hdl.handle.net/10.1093/biomet/asm053
application/pdf
Access to full text is restricted to subscribers.
Hansheng Wang
Runze Li
Chih-Ling Tsai
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:831-8452014-11-17RePEc:oup:biomet
article
A goodness-of-fit test for inhomogeneous spatial Poisson processes
We introduce a formal testing procedure to assess the fit of an inhomogeneous spatial Poisson process model, based on a discrepancy measure function <inline-formula><inline-graphic xlink:href="asn045ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> that is constructed from residuals obtained from the fitted model. We derive the asymptotic distributional properties of <inline-formula><inline-graphic xlink:href="asn045ilm2.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> and develop a test statistic based on them. Our test statistic has a limiting standard normal distribution, so that the test can be performed by simply comparing the test statistic with readily available critical values. We perform a simulation study to assess the performance of the proposed method and apply it to a real data example. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
831
845
http://hdl.handle.net/10.1093/biomet/asn045
application/pdf
Access to full text is restricted to subscribers.
Yongtao Guan
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:663-6842014-11-17RePEc:oup:biomet
article
Efficient restricted estimators for conditional mean models with missing data
Consider a conditional mean model with missing data on the response or explanatory variables due to two-phase sampling or nonresponse. Robins et al. (1994) introduced a class of augmented inverse-probability-weighted estimators, depending on a vector of functions of explanatory variables and a vector of functions of coarsened data. Tsiatis (2006) studied two classes of restricted estimators, class 1 with both vectors restricted to finite-dimensional linear subspaces and class 2 with the first vector of functions restricted to a finite-dimensional linear subspace. We introduce a third class of restricted estimators, class 3, with the second vector of functions restricted to a finite-dimensional subspace. We derive a new estimator, which is asymptotically optimal in class 1, by the methods of nonparametric and empirical likelihood. We propose a hybrid strategy to obtain estimators that are asymptotically optimal in class 1 and locally optimal in class 2 or class 3. The advantages of the hybrid, likelihood estimator based on classes 1 and 3 are shown in a simulation study and a real-data example. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
663
684
http://hdl.handle.net/10.1093/biomet/asr007
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:100:y:2013:i:4:p:781-8002014-11-17RePEc:oup:biomet
article
Bridging the ensemble Kalman and particle filters
In many applications of Monte Carlo nonlinear filtering, the propagation step is computationally expensive, and hence the sample size is limited. With small sample sizes, the update step becomes crucial. Particle filtering suffers from the well-known problem of sample degeneracy. Ensemble Kalman filtering avoids this, at the expense of treating non-Gaussian features of the forecast distribution incorrectly. Here we introduce a procedure that makes a continuous transition indexed by γ ∈ [0,1] between the ensemble and the particle filter update. We propose automatic choices of the parameter γ such that the update stays as close as possible to the particle filter update subject to avoiding degeneracy. In various examples, we show that this procedure leads to updates that are able to handle non-Gaussian features of the forecast sample even in high-dimensional situations. Copyright 2013, Oxford University Press.
4
2013
100
Biometrika
781
800
http://hdl.handle.net/10.1093/biomet/ast020
application/pdf
Access to full text is restricted to subscribers.
M. Frei
H. R. Künsch
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:805-8202014-11-17RePEc:oup:biomet
article
Inference on population size in binomial detectability models
Many models for biological populations, including simple mark-recapture models and distance sampling models, involve a binomially distributed number, n, of observations x<sub>1</sub>, …, x<sub>n</sub> on members of a population of size N. Two popular estimators of (N, θ), where θ is a vector parameter, are the maximum likelihood estimator <inline-formula><inline-graphic xlink:href="asp051ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> and the conditional maximum likelihood estimator <inline-formula><inline-graphic xlink:href="asp051ilm2.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> based on the conditional distribution of x<sub>1</sub>, …, x<sub>n</sub> given n. We derive the large-N asymptotic distributions of <inline-formula><inline-graphic xlink:href="asp051ilm3.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> and <inline-formula><inline-graphic xlink:href="asp051ilm4.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, and give formulae for the biases of <inline-formula><inline-graphic xlink:href="asp051ilm5.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> and <inline-formula><inline-graphic xlink:href="asp051ilm6.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>. We show that the difference <inline-formula><inline-graphic xlink:href="asp051ilm7.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> is, remarkably, of order 1 and we give a simple formula for the leading part of this difference. Simulations indicate that in many cases this formula is very accurate and that confidence intervals based on the asymptotic distribution have excellent coverage. An extension to product-binomial models is given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
805
820
http://hdl.handle.net/10.1093/biomet/asp051
application/pdf
Access to full text is restricted to subscribers.
R. M. Fewster
P. E. Jupp
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:748-7542014-11-17RePEc:oup:biomet
article
Designs of variable resolution
Prior information or background knowledge may suggest that interactions arise only within certain factors. When such knowledge is available, we propose using a new class of designs: designs of variable resolution. Several constructions are presented. Statistical justifications for using such designs from minimum G<sub>2</sub> aberration and design efficiency perspectives are provided. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
748
754
http://hdl.handle.net/10.1093/biomet/ass035
application/pdf
Access to full text is restricted to subscribers.
C. Devon Lin
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:489-4902014-11-17RePEc:oup:biomet
article
A counterexample to a claim about stochastic simulations
Engen & Lillegård (1997) presented a general method for doing Monte Carlo simulations conditioned on a sufficient statistic. The basic idea was to adjust the parameter values in the corresponding unconditional simulation so that the actual value of the sufficient statistic is obtained, and the claim was that if this adjustment is unique then the modified simulation is from the conditional distribution. Unfortunately the claim is not correct, as shown by a counterexample. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
489
490
Bo Henry Lindqvist
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:49-642014-11-17RePEc:oup:biomet
article
Functional quadratic regression
We extend the common linear functional regression model to the case where the dependency of a scalar response on a functional predictor is of polynomial rather than linear nature. Focusing on the quadratic case, we demonstrate the usefulness of the polynomial functional regression model, which encompasses linear functional regression as a special case. Our approach works under mild conditions for the case of densely spaced observations and also can be extended to the important practical situation where the functional predictors are derived from sparse and irregular measurements, as is the case in many longitudinal studies. A key observation is the equivalence of the functional polynomial model with a regression model that is a polynomial of the same order in the functional principal component scores of the predictor processes. Theoretical analysis as well as practical implementations are based on this equivalence and on basis representations of predictor processes. We also obtain an explicit representation of the regression surface that defines quadratic functional regression and provide functional asymptotic results for an increasing number of model components as the number of subjects in the study increases. The improvements that can be gained by adopting quadratic as compared to linear functional regression are illustrated with a case study that includes absorption spectra as functional predictors. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
49
64
http://hdl.handle.net/10.1093/biomet/asp069
application/pdf
Access to full text is restricted to subscribers.
Fang Yao
Hans-Georg Müller
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:295-3042014-11-17RePEc:oup:biomet
article
Sufficient dimension reduction through discretization-expectation estimation
In the context of sufficient dimension reduction, the goal is to parsimoniously recover the central subspace of a regression model. Many inverse regression methods use slicing estimation to recover the central subspace. The efficacy of slicing estimation depends heavily upon the number of slices. However, the selection of the number of slices is an open and long-standing problem. In this paper, we propose a discretization-expectation estimation method, which avoids selecting the number of slices, while preserving the integrity of the central subspace. This generic method assures root-n consistency and asymptotic normality of slicing estimators for many inverse regression methods, and can be applied to regressions with multivariate responses. A <sc>BIC</sc>-type criterion for the dimension of the central subspace is proposed. Comprehensive simulations and an illustrative application show that our method compares favourably with existing estimators. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
295
304
http://hdl.handle.net/10.1093/biomet/asq018
application/pdf
Access to full text is restricted to subscribers.
Liping Zhu
Tao Wang
Lixing Zhu
Louis Ferré
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:211-2242014-11-17RePEc:oup:biomet
article
Empty confidence sets for epidemics, branching processes and Brownian motion
This paper treats some examples where likelihood-based inference for certain model parameters may produce empty confidence sets. The first example concerns epidemics, and the parameter of interest is the basic reproduction number R<sub>0</sub>, which is to be estimated from the final size of an epidemic in a finite population. The second example treats estimation of the mean of the offspring distribution in a branching process, based on observing the total progeny, i.e. the total number of individuals ever born in the branching process. The final example considers estimation of the linear drift in a Brownian motion, based on observing the first hitting time of some horizontal barrier. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
211
224
Frank G Ball
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:317-3352014-11-17RePEc:oup:biomet
article
A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models
A centred Gaussian model that is Markov with respect to an undirected graph G is characterised by the parameter set of its precision matrices which is the cone M<sup>+</sup>(G) of positive definite matrices with entries corresponding to the missing edges of G constrained to be equal to zero. In a Bayesian framework, the conjugate family for the precision parameter is the distribution with Wishart density with respect to the Lebesgue measure restricted to M<sup>+</sup>(G). We call this distribution the G-Wishart. When G is nondecomposable, the normalising constant of the G-Wishart cannot be computed in closed form. In this paper, we give a simple Monte Carlo method for computing this normalising constant. The main feature of our method is that the sampling distribution is exact and consists of a product of independent univariate standard normal and chi-squared distributions that can be read off the graph G. Computing this normalising constant is necessary for obtaining the posterior distribution of G or the marginal likelihood of the corresponding graphical Gaussian model. Our method also gives a way of sampling from the posterior distribution of the precision matrix. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
317
335
http://hdl.handle.net/10.1093/biomet/92.2.317
text/html
Access to full text is restricted to subscribers.
Aliye Atay-Kayis
Hélène Massam
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:357-3702014-11-17RePEc:oup:biomet
article
Covariate-adjusted generalized linear models
We propose covariate adjustment methodology for a situation where one wishes to study the dependence of a generalized response on predictors while both predictors and response are distorted by an observable covariate. The distorting covariate is thought of as a size measurement that affects predictors in a multiplicative fashion. The generalized response is modelled by means of a random threshold, where the subject-specific thresholds are affected by a multiplicative factor that is a function of the distorting covariate. While the various factors are modelled as smooth unknown functions of the distorting covariate, the underlying relationship between response and covariates is assumed to be governed by a generalized linear model with a known link function. This model provides an extension of a covariate-adjusted regression approach to the case of a generalized linear model. We demonstrate that this contamination model leads to a semiparametric varying-coefficient model. Numerical implementation is straightforward by combining binning, quasilikelihood, and smoothing steps. The asymptotic distribution of the proposed estimators for the regression coefficients of the latent generalized linear model is derived by means of a martingale central limit theorem. Combining this result with consistent estimators for the asymptotic variance makes it then possible to obtain asymptotic inference for the targeted parameters. Both real and simulated data are used in illustrating the proposed methodology. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
357
370
http://hdl.handle.net/10.1093/biomet/asp012
application/pdf
Access to full text is restricted to subscribers.
Damla Şentürk
Hans-Georg Müller
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:683-6982014-11-17RePEc:oup:biomet
article
Circular regression
A new model for an angular regression link function is introduced. The model employs an angular scale parameter, incorporates proper and improper rotations as special cases, and is equivalent to the Möbius circle mapping for complex variables. Desirable properties of the circle mapping carry over to angular regression. Parameter estimation and inferential methods are developed and illustrated. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
683
698
T. D. Downs
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:761-7802014-11-17RePEc:oup:biomet
article
Sinh-arcsinh distributions
We introduce the sinh-arcsinh transformation and hence, by applying it to a generating distribution with no parameters other than location and scale, usually the normal, a new family of sinh-arcsinh distributions. This four-parameter family has symmetric and skewed members and allows for tailweights that are both heavier and lighter than those of the generating distribution. The central place of the normal distribution in this family affords likelihood ratio tests of normality that are superior to the state-of-the-art in normality testing because of the range of alternatives against which they are very powerful. Likelihood ratio tests of symmetry are also available and are very successful. Three-parameter symmetric and asymmetric subfamilies of the full family are also of interest. Heavy-tailed symmetric sinh-arcsinh distributions behave like Johnson S<sub>U</sub> distributions, while their light-tailed counterparts behave like sinh-normal distributions, the sinh-arcsinh family allowing a seamless transition between the two, via the normal, controlled by a single parameter. The sinh-arcsinh family is very tractable and many properties are explored. Likelihood inference is pursued, including an attractive reparameterization. Illustrative examples are given. A multivariate version is considered. Options and extensions are discussed. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
761
780
http://hdl.handle.net/10.1093/biomet/asp053
application/pdf
Access to full text is restricted to subscribers.
M. C. Jones
Arthur Pewsey
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:225-2302014-11-17RePEc:oup:biomet
article
Data-driven selection of the spline dimension in penalized spline regression
A number of criteria exist to select the penalty in penalized spline regression, but the selection of the number of spline basis functions has received much less attention in the literature. We propose a likelihood-based criterion to select the number of basis functions in penalized spline regression. The criterion is easy to apply and we describe its theoretical and practical properties. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
225
230
http://hdl.handle.net/10.1093/biomet/asq081
application/pdf
Access to full text is restricted to subscribers.
Göran Kauermann
Jean D. Opsomer
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:661-6722014-11-17RePEc:oup:biomet
article
Recursive computing and simulation-free inference for general factorizable models
We illustrate how the recursive algorithm of Reeves & Pettitt (2004) for general factorizable models can be extended to allow exact sampling, maximization of distributions and computation of marginal distributions. All of the methods we describe apply to discrete-valued Markov random fields with nearest neighbour interactions defined on regular lattices; in particular we illustrate that exact inference can be performed for hidden autologistic models defined on moderately sized lattices. In this context we offer an extension of this methodology which allows approximate inference to be carried out for larger lattices without resorting to simulation techniques such as Markov chain Monte Carlo. In particular our work offers the basis for an automatic inference machine for such models. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
661
672
http://hdl.handle.net/10.1093/biomet/asm052
application/pdf
Access to full text is restricted to subscribers.
Nial Friel
Håvard Rue
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:333-3432014-11-17RePEc:oup:biomet
article
Modified estimating functions
In a parametric model the maximum likelihood estimator of a parameter of interest ψ may be viewed as the solution to the equation l′<sub>p</sub>(ψ) = 0, where l<sub>p</sub> denotes the profile loglikelihood function. It is well known that the estimating function l′<sub>p</sub>(ψ) is not unbiased and that this bias can, in some cases, lead to poor estimates of ψ. An alternative approach is to use the modified profile likelihood function, or an approximation to the modified profile likelihood function, which yields an estimating function that is approximately unbiased. In many cases, the maximum likelihood estimating functions are unbiased under more general assumptions than those used to construct the likelihood function, for example under first- or second-moment conditions. Although the likelihood function itself may provide valid estimates under moment conditions alone, the modified profile likelihood requires a full parametric model. In this paper, modifications to l′<sub>p</sub>(ψ) are presented that yield an approximately unbiased estimating function under more general conditions. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
333
343
Thomas A. Severini
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:71-862014-11-17RePEc:oup:biomet
article
A pseudolikelihood method for analyzing interval censored data
We introduce a method based on a pseudolikelihood ratio for estimating the distribution function of the survival time in a mixed-case interval censoring model. In a mixed-case model, an individual is observed a random number of times, and at each time it is recorded whether an event has happened or not. One seeks to estimate the distribution of time to event. We use a Poisson process as the basis of a likelihood function to construct a pseudolikelihood ratio statistic for testing the value of the distribution function at a fixed point, and show that this converges under the null hypothesis to a known limit distribution, that can be expressed as a functional of different convex minorants of a two-sided Brownian motion process with parabolic drift. Construction of confidence sets then proceeds by standard inversion. The computation of the confidence sets is simple, requiring the use of the pool-adjacent-violators algorithm or a standard isotonic regression algorithm. We also illustrate the superiority of the proposed method over competitors based on resampling techniques or on the limit distribution of the maximum pseudolikelihood estimator, through simulation studies, and illustrate the different methods on a dataset involving time to <sc>HIV</sc> seroconversion in a group of haemophiliacs. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
71
86
http://hdl.handle.net/10.1093/biomet/asm011
application/pdf
Access to full text is restricted to subscribers.
Bodhisattva Sen
Moulinath Banerjee
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:230-2372014-11-17RePEc:oup:biomet
article
Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys
Design weights in surveys are often adjusted to accommodate auxiliary information and to meet pre-specified range restrictions, typically via some ad hoc algorithmic adjustment to a generalised regression estimator. In this paper, we present a simple solution to this problem using empirical likelihood methods or generalised regression. We first develop algorithms for computing empirical likelihood estimators and model-calibrated empirical likelihood estimators. The first algorithm solves the computational problem of the empirical likelihood method in general, both in survey and non-survey settings, and theoretically guarantees its convergence. The second exploits properties of the model-calibration method and is particularly simple. The algorithms are adapted for handling benchmark constraints and pre-specified range restrictions on the weight adjustments. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
230
237
J. Chen
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:773-7892014-11-17RePEc:oup:biomet
article
On the behaviour of marginal and conditional AIC in linear mixed models
In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion, AIC, have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is not an asymptotically unbiased estimator of the Akaike information, and favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that can lead to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package (R Development Core Team, 2010) is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
773
789
http://hdl.handle.net/10.1093/biomet/asq042
application/pdf
Access to full text is restricted to subscribers.
Sonja Greven
Thomas Kneib
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:615-6252014-11-17RePEc:oup:biomet
article
Partial inverse regression
In regression with a vector of quantitative predictors, sufficient dimension reduction methods can effectively reduce the predictor dimension, while preserving full regression information and assuming no parametric model. However, all current reduction methods require the sample size n to be greater than the number of predictors p. It is well known that partial least squares can deal with problems with n < p. We first establish a link between partial least squares and sufficient dimension reduction. Motivated by this link, we then propose a new dimension reduction method, entitled partial inverse regression. We show that its sample estimator is consistent, and that its performance is similar to or superior to partial least squares when n < p, especially when the regression model is nonlinear or heteroscedastic. An example involving the spectroscopy analysis of biscuit dough is also given. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
615
625
http://hdl.handle.net/10.1093/biomet/asm043
application/pdf
Access to full text is restricted to subscribers.
Lexin Li
R. Dennis Cook
Chih-Ling Tsai
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:423-4342014-11-17RePEc:oup:biomet
article
Measures for designs in experiments with correlated errors
In this paper we consider optimal design of experiments in the case of correlated observations. We use and further develop the concept of design measures introduced by Pázman & Müller (1998) for the construction of a simple, quick and elegant design algorithm. We support the construction of this algorithm for a general correlation structure by an interpretation in terms of norms. Examples demonstrate that our results are useful for generating exact designs by sampling from the obtained design measures. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
423
434
Werner G. Müller
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:85-982014-11-17RePEc:oup:biomet
article
Covariance matrix selection and estimation via penalised normal likelihood
We propose a nonparametric method for identifying parsimony and for producing a statistically efficient estimator of a large covariance matrix. We reparameterise a covariance matrix through the modified Cholesky decomposition of its inverse or the one-step-ahead predictive representation of the vector of responses and reduce the nonintuitive task of modelling covariance matrices to the familiar task of model selection and estimation for a sequence of regression models. The Cholesky factor containing these regression coefficients is likely to have many off-diagonal elements that are zero or close to zero. Penalised normal likelihoods in this situation with L<sub>1</sub> and L<sub>2</sub> penalties are shown to be closely related to Tibshirani's (1996) LASSO approach and to ridge regression. Adding either penalty to the likelihood helps to produce more stable estimators by introducing shrinkage to the elements in the Cholesky factor, while, because of its singularity, the L<sub>1</sub> penalty will set some elements to zero and produce interpretable models. An algorithm is developed for computing the estimator and selecting the tuning parameter. The proposed maximum penalised likelihood estimator is illustrated using simulation and a real dataset involving estimation of a 102 × 102 covariance matrix. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
85
98
http://hdl.handle.net/10.1093/biomet/93.1.85
text/html
Access to full text is restricted to subscribers.
Jianhua Z. Huang
Naiping Liu
Mohsen Pourahmadi
Linxu Liu
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:107-1222014-11-17RePEc:oup:biomet
article
Analysis of least absolute deviation
We develop a unified L<sub>1</sub>-based analysis-of-variance-type method for testing linear hypotheses. Like the classical L<sub>2</sub>-based analysis of variance, the method is coordinate-free in the sense that it is invariant under any linear transformation of the covariates or regression parameters. Moreover, it allows singular design matrices and heterogeneous error terms. A simple approximation using stochastic perturbation is proposed to obtain cut-off values for the resulting test statistics. Both test statistics and distributional approximations can be computed using standard linear programming. An asymptotic theory is derived for the method. Special cases of one- and multi-way analysis of variance and analysis of covariance models are worked out in detail. The main results of this paper can be extended to general quantile regression. Extensive simulations show that the method works well in practical settings. The method is also applied to a dataset from General Social Surveys. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
107
122
http://hdl.handle.net/10.1093/biomet/asm082
application/pdf
Access to full text is restricted to subscribers.
Kani Chen
Zhiliang Ying
Hong Zhang
Lincheng Zhao
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:769-7862014-11-17RePEc:oup:biomet
article
Bayesian Nonparametric Estimation of the Probability of Discovering New Species
We consider the problem of evaluating the probability of discovering a certain number of new species in a new sample of population units, conditional on the number of species recorded in a basic sample. We use a Bayesian nonparametric approach. The different species proportions are assumed to be random and the observations from the population exchangeable. We provide a Bayesian estimator, under quadratic loss, for the probability of discovering new species which can be compared with well-known frequentist estimators. The results we obtain are illustrated through a numerical example and an application to a genomic dataset concerning the discovery of new genes by sequencing additional single-read sequences of cDNA fragments. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
769
786
http://hdl.handle.net/10.1093/biomet/asm061
application/pdf
Access to full text is restricted to subscribers.
Antonio Lijoi
Ramsés H. Mena
Igor Prünster
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:641-6542014-11-17RePEc:oup:biomet
article
Dimension reduction and predictor selection in semiparametric models
Dimension reduction in semiparametric regressions includes construction of informative linear combinations and selection of contributing predictors. To reduce the predictor dimension in semiparametric regressions, we propose an ℓ<sub>1</sub>-minimization of sliced inverse regression with the Dantzig selector, and establish a non-asymptotic error bound for the resulting estimator. We also generalize the regularization concept to sliced inverse regression with an adaptive Dantzig selector. This ensures that all contributing predictors are selected with high probability, and that the resulting estimator is asymptotically normal even when the predictor dimension diverges to infinity. Numerical studies confirm our theoretical observations and demonstrate that our proposals are superior to existing estimators in terms of both dimension reduction and predictor selection. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
641
654
http://hdl.handle.net/10.1093/biomet/ast005
application/pdf
Access to full text is restricted to subscribers.
Zhou Yu
Liping Zhu
Heng Peng
Lixing Zhu
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:341-3532014-11-17RePEc:oup:biomet
article
Rank-based inference for the accelerated failure time model
A broad class of rank-based monotone estimating functions is developed for the semiparametric accelerated failure time model with censored observations. The corresponding estimators can be obtained via linear programming, and are shown to be consistent and asymptotically normal. The limiting covariance matrices can be estimated by a resampling technique, which does not involve nonparametric density estimation or numerical derivatives. The new estimators represent consistent roots of the non-monotone estimating equations based on the familiar weighted log-rank statistics. Simulation studies demonstrate that the proposed methods perform well in practical settings. Two real examples are provided. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
341
353
Zhezhen Jin
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:779-7982014-11-17RePEc:oup:biomet
article
A multi-dimensional scaling approach to shape analysis
We propose an alternative to Kendall's shape space for reflection shapes of configurations in [formula] with k labelled vertices, where reflection shape consists of all the geometric information that is invariant under compositions of similarity and reflection transformations. The proposed approach embeds the space of such shapes into the space [formula] of (k - 1) × (k - 1) real symmetric positive semidefinite matrices, which is the closure of an open subset of a Euclidean space, and defines mean shape as the natural projection of Euclidean means in [formula] on to the embedded copy of the shape space. This approach has strong connections with multi-dimensional scaling, and the mean shape so defined gives good approximations to other commonly used definitions of mean shape. We also use standard perturbation arguments for eigenvalues and eigenvectors to obtain a central limit theorem which then enables the application of standard statistical techniques to shape analysis in two or more dimensions. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
779
798
http://hdl.handle.net/10.1093/biomet/asn050
application/pdf
Access to full text is restricted to subscribers.
Ian L. Dryden
Alfred Kume
Huiling Le
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:65-802014-11-17RePEc:oup:biomet
article
Particle approximations of the score and observed information matrix in state space models with application to parameter estimation
Particle methods are popular computational tools for Bayesian inference in nonlinear non-Gaussian state space models. For this class of models, we present two particle algorithms to compute the score vector and observed information matrix recursively. The first algorithm is implemented with computational complexity O(N) and the second with complexity O(N<sup>2</sup>), where N is the number of particles. Although cheaper, the performance of the O(N) method degrades quickly, as it relies on the approximation of a sequence of probability distributions whose dimension increases linearly with time. In particular, even under strong mixing assumptions, the variance of the estimates computed with the O(N) method increases at least quadratically in time. The more expensive O(N<sup>2</sup>) method relies on a nonstandard particle implementation and does not suffer from this rapid degradation. It is shown how both methods can be used to perform batch and recursive parameter estimation. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
65
80
http://hdl.handle.net/10.1093/biomet/asq062
application/pdf
Access to full text is restricted to subscribers.
George Poyiadjis
Arnaud Doucet
Sumeetpal S. Singh
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:149-1622014-11-17RePEc:oup:biomet
article
Bayesian nonparametric functional data analysis through density estimation
In many modern experimental settings, observations are obtained in the form of functions and interest focuses on inferences about a collection of such functions. We propose a hierarchical model that allows us simultaneously to estimate multiple curves nonparametrically by using dependent Dirichlet process mixtures of Gaussian distributions to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of conductivity and temperature at depth data in the north Atlantic. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
149
162
http://hdl.handle.net/10.1093/biomet/asn054
application/pdf
Access to full text is restricted to subscribers.
Abel Rodríguez
David B. Dunson
Alan E. Gelfand
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:313-3342014-11-17RePEc:oup:biomet
article
Inference on fractal processes using multiresolution approximation
We consider Bayesian inference via Markov chain Monte Carlo for a variety of fractal Gaussian processes on the real line. These models have unknown parameters in the covariance matrix, requiring inversion of a new covariance matrix at each Markov chain Monte Carlo iteration. The processes have no suitable independence properties so this becomes computationally prohibitive. We surmount these difficulties by developing a computational algorithm for likelihood evaluation based on a 'multiresolution approximation' to the original process. The method is computationally very efficient and widely applicable, making likelihood-based inference feasible for large datasets. A simulation study indicates that this approach leads to accurate estimates for underlying parameters in fractal models, including fractional Brownian motion and fractional Gaussian noise, and functional parameters in the recently introduced multifractional Brownian motion. We apply the method to a variety of real datasets and illustrate its application to prediction and to model selection. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
313
334
http://hdl.handle.net/10.1093/biomet/asm025
application/pdf
Access to full text is restricted to subscribers.
Kenneth Falconer
Carmen Fernández
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:673-6892014-11-17RePEc:oup:biomet
article
Optimal adaptive randomized designs for clinical trials
Optimal decision-analytic designs are deterministic. Such designs are appropriately criticized in the context of clinical trials because they are subject to assignment bias. On the other hand, balanced randomized designs may assign an excessive number of patients to a treatment arm that is performing relatively poorly. We propose a compromise between these two extremes, one that achieves some of the good characteristics of both. We introduce a constrained optimal adaptive design for a fully sequential randomized clinical trial with k arms and n patients. An r-design is one for which, at each allocation, each arm has probability at least r of being chosen, 0 ⩽ r ⩽ 1/k. An optimal design among all r-designs is called r-optimal. An r<sub>1</sub>-design is also an r<sub>2</sub>-design if r<sub>1</sub> ⩾ r<sub>2</sub>. A design without constraint is the special case r = 0 and a balanced randomized design is the special case r = 1/k. The optimization criterion is to maximize the expected overall utility in a Bayesian decision-analytic approach, where utility is the sum over the utilities for individual patients over a 'patient horizon' N. We prove analytically that there exists an r-optimal design such that each patient is assigned to a particular one of the arms with probability 1 − (k − 1)r, and to the remaining arms with probability r. We also show that the balanced design is asymptotically r-optimal for any given r, 0 ⩽ r < 1/k, as N/n → ∞. This implies that every r-optimal design is asymptotically optimal without constraint. Numerical computations using backward induction for k = 2 arms show that, in general, this asymptotic optimality feature for r-optimal designs can be accomplished with moderate trial size n if the patient horizon N is large relative to n. We also show that, in a trial with an r-optimal design, r < 1/2, fewer patients are assigned to an inferior arm than when following a balanced design, even for r-optimal designs having the same statistical power as a balanced design. We discuss extensions to various clinical trial settings. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
673
689
http://hdl.handle.net/10.1093/biomet/asm049
application/pdf
Access to full text is restricted to subscribers.
Yi Cheng
Donald A. Berry
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:221-2282014-11-17RePEc:oup:biomet
article
On p-values for smooth components of an extended generalized additive model
The problem of testing smooth components of an extended generalized additive model for equality to zero is considered. Confidence intervals for such components exhibit good across-the-function coverage probabilities if based on the approximate result [formula], where f is the vector of evaluated values for the smooth component of interest and V<sub>f</sub> is the covariance matrix for f according to the Bayesian view of the smoothing process. Based on this result, a Wald-type test of f = 0 is proposed. It is shown that care must be taken in selecting the rank used in the test statistic. The method complements previous work by extending applicability beyond the Gaussian case, while considering tests of zero effect rather than testing the parametric hypothesis given by the null space of the component's smoothing penalty. The proposed p-values are routine and efficient to compute from a fitted model, without requiring extra model fits or null distribution simulation. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
221
228
http://hdl.handle.net/10.1093/biomet/ass048
application/pdf
Access to full text is restricted to subscribers.
Simon N. Wood
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:73-842014-11-17RePEc:oup:biomet
article
Confidence regions when the Fisher information is zero
We examine the asymptotic behaviour of confidence regions in identifiable one-dimensional parametric models with smooth likelihood function and information equal to zero at a critical point of the parameter space. Confidence regions are based on inversion of the likelihood ratio test statistic and of some common forms of the score and Wald test statistics. For fixed parameter values other than the critical point, all these statistics have limiting χ<sup>2</sup><sub>(1)</sub> distributions, but for most of them the convergence is not uniform near the critical point. When it is not, confidence regions based on inverting the tests, using the χ<sup>2</sup><sub>(1)</sub> approximation, do not asymptotically have the nominal level. The exception to this lack of locally uniform convergence occurs with the score test standardised by expected, rather than observed, information. For the regions based on the score test standardised by observed information and on the likelihood ratio test, conservative procedures that do not rely on the χ<sup>2</sup><sub>(1)</sub> approximation can be developed, but they are much too conservative near the critical parameter value. The regions based on the Wald tests have asymptotic level less than ½, regardless of the procedure used. Our results suggest that no procedure based solely on the likelihood function will be satisfactory. Whether or not this is the case is an open problem. A simulation study illustrates the results of this paper. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
73
84
Matteo Bottai
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1006-10122014-11-17RePEc:oup:biomet
article
Marginal log-linear parameterization of conditional independence models
Models defined by a set of conditional independence restrictions play an important role in statistical theory and applications, especially, but not only, in graphical modelling. In this paper we identify a subclass of these consisting of hierarchical marginal log-linear models, as defined by Bergsma & Rudas (2002a). Such models are smooth, which implies the applicability of standard asymptotic theory and simplifies interpretation. Furthermore, we give a marginal log-linear parameterization and a minimal specification of the models in the subclass, which implies the applicability of standard methods to compute maximum likelihood estimates and simplifies the calculation of the degrees of freedom of chi-squared statistics to test goodness-of-fit. The utility of the results is illustrated by applying them to block-recursive Markov models associated with chain graphs. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
1006
1012
http://hdl.handle.net/10.1093/biomet/asq037
application/pdf
Access to full text is restricted to subscribers.
Tamás Rudas
Wicher P. Bergsma
Renáta Németh
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:827-8412014-11-17RePEc:oup:biomet
article
Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling
We consider parameter-driven models of time series of counts, where the observations are assumed to arise from a Poisson distribution with a mean changing over time according to a latent process. Estimation of these models is carried out within a Bayesian framework using data augmentation and Markov chain Monte Carlo methods. We suggest a new auxiliary mixture sampler, which possesses a Gibbsian transition kernel, where we draw from full conditional distributions belonging to standard distribution families only. Emphasis lies on application to state space modelling of time series of counts, but we show that auxiliary mixture sampling may be applied to a wider range of parameter-driven models, including random-effects models and panel data models based on the Poisson distribution. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
827
841
http://hdl.handle.net/10.1093/biomet/93.4.827
text/html
Access to full text is restricted to subscribers.
Sylvia Frühwirth-Schnatter
Helga Wagner
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:901-9182014-11-17RePEc:oup:biomet
article
Estimation of latent factors for high-dimensional time series
This paper deals with the dimension reduction of high-dimensional time series based on a lower-dimensional factor process. In particular, we allow the dimension of time series N to be as large as, or even larger than, the length of observed time series T. The estimation of the factor loading matrix and the factor process itself is carried out via an eigenanalysis of an N × N non-negative definite matrix. We show that when all the factors are strong in the sense that the norm of each column in the factor loading matrix is of the order N<sup>1/2</sup>, the estimator of the factor loading matrix is weakly consistent in L<sub>2</sub>-norm with the convergence rate independent of N. Thus the curse is cancelled out by the blessing of dimensionality. We also establish the asymptotic properties of the estimators when factors are not strong. The proposed method together with the asymptotic properties are illustrated in a simulation study. An application to an implied volatility data set, with a trading strategy derived from the fitted factor model, is also reported. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
901
918
http://hdl.handle.net/10.1093/biomet/asr048
application/pdf
Access to full text is restricted to subscribers.
Clifford Lam
Qiwei Yao
Neil Bathia
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:111-1242014-11-17RePEc:oup:biomet
article
Conditional simulation of max-stable processes
Since many environmental processes are spatial in extent, a single extreme event may affect several locations, and the spatial dependence must be taken into account in an appropriate way. This paper proposes a framework for conditional simulation of max-stable processes and gives closed forms for the regular conditional distributions of Brown–Resnick and Schlather processes. We test the method on simulated data and present applications to extreme rainfall around Zurich and extreme temperatures in Switzerland. The proposed framework provides accurate conditional simulations and can handle problems of realistic size. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
111
124
http://hdl.handle.net/10.1093/biomet/ass067
application/pdf
Access to full text is restricted to subscribers.
C. Dombry
F. Éyi-Minko
M. Ribatet
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:494-4962014-11-17RePEc:oup:biomet
article
Dimension reduction in time series and the dynamic factor model
This note shows that the dimension reduction method proposed by Li & Shedden (2002) is equivalent to the dynamic factor model introduced by Peña & Box (1987). Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
494
496
http://hdl.handle.net/10.1093/biomet/asp009
application/pdf
Access to full text is restricted to subscribers.
Daniel Peña
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:19-352014-11-17RePEc:oup:biomet
article
Model selection and estimation in the Gaussian graphical model
We propose penalized likelihood methods for estimating the concentration matrix in the Gaussian graphical model. The methods lead to a sparse and shrinkage estimator of the concentration matrix that is positive definite, and thus conduct model selection and estimation simultaneously. The implementation of the methods is nontrivial because of the positive definite constraint on the concentration matrix, but we show that the computation can be done effectively by taking advantage of the efficient maxdet algorithm developed in convex optimization. We propose a BIC-type criterion for the selection of the tuning parameter in the penalized likelihood methods. The connection between our methods and existing methods is illustrated. Simulations and real examples demonstrate the competitive performance of the new methods. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
19
35
http://hdl.handle.net/10.1093/biomet/asm018
application/pdf
Access to full text is restricted to subscribers.
Ming Yuan
Yi Lin
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:591-6002014-11-17RePEc:oup:biomet
article
Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models
A semiparametric transformation model comprises a parametric component for covariate effects and a nonparametric component for the baseline hazard/intensity. The Breslow-type estimator has been proposed for estimating the nonparametric component in some inefficient estimation procedures. We show that introducing weights into this estimator leads to nonparametric maximum likelihood estimation, with the weights depending on the martingale residuals. The weighted Breslow-type estimator suggests an iterative reweighting algorithm for nonparametric maximum likelihood estimation, which can be implemented by a weighted variant of the existing algorithms for inefficient estimation, and can be computationally more efficient than an EM-type algorithm. The weighting idea is further extended to semiparametric transformation models with mismeasured covariates. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
591
600
http://hdl.handle.net/10.1093/biomet/asp032
application/pdf
Access to full text is restricted to subscribers.
Yi-Hau Chen
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:742-7462014-11-17RePEc:oup:biomet
article
Simes' procedure is 'valid on average'
Although Simes' modification of the Bonferroni procedure tends to perform very well, albeit often being slightly liberal for negatively dependent hypotheses, there are special cases where it fails more dramatically. We prove that these special cases are indeed special, applying only to specific significance levels, and obtain a strong bound on the average deviation of the Simes corrected P-value from the true probability over any interval of P-values. From this, it is argued that Simes' procedure should be expected to perform well except for pathological examples. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
742
746
http://hdl.handle.net/10.1093/biomet/93.3.742
text/html
Access to full text is restricted to subscribers.
Einar Andreas Rødland
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:327-3392014-11-17RePEc:oup:biomet
article
Likelihood for component parameters
For a statistical model with data, likelihood for the scalar or vector full parameter θ, of dimension p say, is typically well defined and easily computed. In this paper, we investigate likelihood for a component parameter ψ(θ) of dimension d < p and make use of the recent likelihood theory that has been successful in producing highly accurate third-order p-values for scalar parameters of continuous models. The theory leads under moderate regularity to a definitive third-order determination of likelihood for a component parameter ψ(θ) of dimension d, where 1 ≤ d ≤ p. We use the simple location model on the plane with standard normal errors to motivate the development. The example exhibits most of the key characteristics of the general case and the recent theory then extends the determination of likelihood to the general context. For the scalar interest parameter case with d = 1, the usual determinations are typically of second-order accuracy; the example indicates how the new determination achieves third-order accuracy. The implementation is straightforward and uses familiar ingredients to other determinations, such as the full maximum likelihood value θ̂, the constrained value θ̃<sub>ψ</sub> given ψ(θ) = ψ, and the observed information j<sub>λλ</sub>(θ̂<sub>ψ</sub>) for a complementing nuisance parameter λ(θ). It does however require a special version of the nuisance information j<sub>λλ</sub>(θ̂<sub>ψ</sub>), a version calibrated relative to a symmetric choice of the exponential-type reparameterisation φ(θ) underlying the recent theory, but this is easily computed. Various examples are given and the motivating example is discussed in detail. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
327
339
D. A. S. Fraser
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:233-2382014-11-17RePEc:oup:biomet
article
Some theory for constructing minimum aberration fractional factorial designs
Minimum aberration is the most established criterion for selecting a regular fractional factorial design of maximum resolution. Minimum aberration designs for n runs and n/2 <= m < n factors have previously been constructed using the novel idea of complementary designs. In this paper, an alternative method of construction is developed by relating the wordlength pattern of designs to the so-called 'confounding between experimental runs'. This allows minimum aberration designs to be constructed for n runs and 5n/16 <= m <= n/2 factors as well as for n/2 <= m < n. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
233
238
Neil A. Butler
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:569-5842014-11-17RePEc:oup:biomet
article
Dimension reduction in regression without matrix inversion
Regressions in which the fixed number of predictors p exceeds the number of independent observational units n occur in a variety of scientific fields. Sufficient dimension reduction provides a promising approach to such problems, by restricting attention to d < n linear combinations of the original p predictors. However, standard methods of sufficient dimension reduction require inversion of the sample predictor covariance matrix. We propose a method for estimating the central subspace that eliminates the need for such inversion and is applicable regardless of the (n, p) relationship. Simulations show that our method compares favourably with standard large sample techniques when the latter are applicable. We illustrate our method with a genomics application. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
569
584
http://hdl.handle.net/10.1093/biomet/asm038
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Bing Li
Francesca Chiaromonte
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:691-7032014-11-17RePEc:oup:biomet
article
Adaptive Lasso for Cox's proportional hazards model
We investigate the variable selection problem for Cox's proportional hazards model, and propose a unified model selection and estimation procedure with desired theoretical properties and computational convenience. The new method is based on a penalized log partial likelihood with the adaptively weighted L<sub>1</sub> penalty on regression coefficients, providing what we call the adaptive Lasso estimator. The method incorporates different penalties for different coefficients: unimportant variables receive larger penalties than important ones, so that important variables tend to be retained in the selection process, whereas unimportant variables are more likely to be dropped. Theoretical properties, such as consistency and rate of convergence of the estimator, are studied. We also show that, with proper choice of regularization parameters, the proposed estimator has the oracle properties. The convex optimization nature of the method leads to an efficient algorithm. Both simulated and real examples show that the method performs competitively. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
691
703
http://hdl.handle.net/10.1093/biomet/asm037
application/pdf
Access to full text is restricted to subscribers.
Hao Helen Zhang
Wenbin Lu
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:129-1432014-11-17RePEc:oup:biomet
article
The discrimination power of projection pursuit with different density estimators
We explore the properties of projection pursuit discriminant analysis. This discriminant method is very powerful but relies heavily on a univariate density estimate. We show that the procedure based on wavelets maintains the same rate of convergence as with univariate wavelet density estimation. We also show the Bayes risk strong consistency of both the kernel- and wavelet-based methods. Simulated data and real data concerning character recognition show that the method is effective and robust against the curse of dimensionality. The wavelet alternative seems more likely than the kernel counterpart to find an interesting projection. Wavelets are often criticised for giving too wiggly an estimate and for being too localised to give good global properties. In the above context, these potential drawbacks do not weaken the method but the use of wavelets seems to enhance it. A multiple projection generalisation is also considered. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
129
143
Olivier Renaud
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:655-6682014-11-17RePEc:oup:biomet
article
Spherical regression
Methods are introduced for regressing points on the surface of one sphere on points on another. Complex variables and stereographic projection are used to deal with theoretical problems of directional statistics much as they have been used historically to deal with problems in non-Euclidean geometry. The complex plane harbours the group of Möbius transformations, and stereographic projection is used as a bridge to map these Möbius transforms to regression link functions on the surface of a unit sphere. A special form for these links is introduced which employs the complex plane and stereographic projection to effect angular scale changes on the sphere. The family of special forms is closed under orthogonal transformations of the dependent variable and Möbius transformations of the independent variable, and incorporates independence and proper and improper rotations as special cases. Parameter estimation and inference are exemplified using the von Mises–Fisher spherical distribution and vectorcardiogram data. All statistical results and calculations have been formulated in the real domain. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
655
668
T. D. Downs
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:513-5182014-11-17RePEc:oup:biomet
article
Optimal designs for the emax, log-linear and exponential models
We derive locally D- and ED<sub>p</sub>-optimal designs for the exponential, log-linear and three-parameter emax models. For each model the locally D- and ED<sub>p</sub>-optimal designs are supported at the same set of points, while the corresponding weights are different. This indicates that for a given model, D-optimal designs are efficient for estimating the smallest dose that achieves 100p% of the maximum effect in the observed dose range. Conversely, ED<sub>p</sub>-optimal designs also yield good D-efficiencies. We illustrate the results using several examples and demonstrate that locally D- and ED<sub>p</sub>-optimal designs for the emax, log-linear and exponential models are relatively robust with respect to misspecification of the model parameters. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
513
518
http://hdl.handle.net/10.1093/biomet/asq020
application/pdf
Access to full text is restricted to subscribers.
H. Dette
C. Kiss
M. Bevanda
F. Bretz
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:663-6742014-11-17RePEc:oup:biomet
article
On the stick-breaking representation of normalized inverse Gaussian priors
Random probability measures are the main tool for Bayesian nonparametric inference, with their laws acting as prior distributions. Many well-known priors used in practice admit different, though equivalent, representations. In terms of computational convenience, stick-breaking representations stand out. In this paper we focus on the normalized inverse Gaussian process and provide a completely explicit stick-breaking representation for it. This result is of interest both from a theoretical viewpoint and for statistical practice. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
663
674
http://hdl.handle.net/10.1093/biomet/ass023
application/pdf
Access to full text is restricted to subscribers.
S. Favaro
A. Lijoi
I. Prünster
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:933-9442014-11-17RePEc:oup:biomet
article
Some design properties of a rejective sampling procedure
Occasionally, a selected probability sample may appear undesirable with respect to the available auxiliary information. In such a situation, the practitioner might consider rejecting the sample and selecting a new set of sample elements. We consider a procedure in which the probability sample is rejected unless the sample mean of an auxiliary vector is within a specified distance of the population mean. It is proven that the large sample mean and variance of the regression estimator for the rejective sample are the same as those of the regression estimator for the original selection procedure. Likewise, the usual estimator of variance for the regression estimator is appropriate for the rejective sample. In a Monte Carlo experiment, the large sample properties hold for relatively small samples and the Monte Carlo results are in agreement with the theoretical orders of approximation. The efficiency effect of the described rejective sampling is o(n<sub>N</sub><sup>-1</sup>), where n<sub>N</sub> is the expected sample size, but the effect can be important for particular samples. For example, rejective sampling can be used to eliminate those samples that give negative weights for the regression estimator. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
933
944
http://hdl.handle.net/10.1093/biomet/asp042
application/pdf
Access to full text is restricted to subscribers.
Wayne A. Fuller
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:61-702014-11-17RePEc:oup:biomet
article
Interval censoring: identifiability and the constant-sum property
The constant-sum property given in Oller et al. (2004) for censoring models justifies the use of a simplified likelihood to obtain the nonparametric maximum likelihood estimator of the lifetime distribution. In this paper we study the relevance of the constant-sum property in the identifiability of the lifetime distribution. We show that the lifetime distribution is not identifiable outside the class of constant-sum models. We also show that the lifetime probabilities assigned to the observable intervals are identifiable inside the class of constant-sum models. We illustrate all these notions with several examples. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
61
70
http://hdl.handle.net/10.1093/biomet/asm002
application/pdf
Access to full text is restricted to subscribers.
Ramon Oller
Guadalupe Gómez
M. Luz Calle
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:551-5662014-11-17RePEc:oup:biomet
article
Penalized Bregman divergence for large-dimensional regression and classification
Regularization methods are characterized by loss functions measuring data fits and penalty terms constraining model parameters. The commonly used quadratic loss is not suitable for classification with binary responses, whereas the loglikelihood function is not readily applicable to models where the exact distribution of observations is unknown or not fully specified. We introduce the penalized Bregman divergence by replacing the negative loglikelihood in the conventional penalized likelihood with Bregman divergence, which encompasses many commonly used loss functions in the regression analysis, classification procedures and machine learning literature. We investigate new statistical properties of the resulting class of estimators with the number p<sub>n</sub> of parameters either diverging with the sample size n or even nearly comparable with n, and develop statistical inference tools. It is shown that the resulting penalized estimator, combined with appropriate penalties, achieves the same oracle property as the penalized likelihood estimator, but asymptotically does not rely on the complete specification of the underlying distribution. Furthermore, the choice of loss function in the penalized classifiers has an asymptotically relatively negligible impact on classification performance. We illustrate the proposed method for quasilikelihood regression and binary classification with simulation evaluation and real-data application. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
551
566
http://hdl.handle.net/10.1093/biomet/asq033
application/pdf
Access to full text is restricted to subscribers.
Chunming Zhang
Yuan Jiang
Yi Chai
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:323-3372014-11-17RePEc:oup:biomet
article
A generalized Dantzig selector with shrinkage tuning
The Dantzig selector performs variable selection and model fitting in linear regression. It uses an L<sub>1</sub> penalty to shrink the regression coefficients towards zero, in a similar fashion to the lasso. While both the lasso and Dantzig selector potentially do a good job of selecting the correct variables, they tend to overshrink the final coefficients. This results in an unfortunate trade-off. One can either select a high shrinkage tuning parameter that produces an accurate model but poor coefficient estimates or a low shrinkage parameter that produces more accurate coefficients but includes many irrelevant variables. We extend the Dantzig selector to fit generalized linear models while eliminating overshrinkage of the coefficient estimates, and develop a computationally efficient algorithm, similar in nature to least angle regression, to compute the entire path of coefficient estimates. A simulation study illustrates the advantages of our approach relative to others. We apply the methodology to two datasets. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
323
337
http://hdl.handle.net/10.1093/biomet/asp013
application/pdf
Access to full text is restricted to subscribers.
Gareth M. James
Peter Radchenko
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:363-3822014-11-17RePEc:oup:biomet
article
Estimating vaccine efficacy from small outbreaks
Let C<sub>V</sub> and C<sub>0</sub> denote the number of cases among vaccinated and unvaccinated individuals, respectively, and let υ be the proportion of individuals vaccinated. The quantity ê = 1 - (1 - υ)C<sub>V</sub>/(υC<sub>0</sub>) = 1 - (relative attack rate) is the most used estimator of the effectiveness of a vaccine to protect against infection. For a wide class of vaccine responses, a family of transmission models and three types of community settings, this paper investigates what ê actually estimates. It does so under the assumption that the community is large and the vaccination coverage is adequate to prevent major outbreaks of the infectious disease, so that only data on minor outbreaks are available. For a community of homogeneous individuals who mix uniformly, it is found that ê estimates a quantity with the interpretation of 1 - (mean susceptibility, per contact, of vaccinees relative to unvaccinated individuals). We provide a standard error for ê in this setting. For a community with some heterogeneity ê can be a very misleading estimator of the effectiveness of the vaccine. When individuals have inherent differences, ê estimates a quantity that depends also on the inherent susceptibilities of different types of individual and on the vaccination coverage for different types. For a community of households, ê estimates a quantity that depends on the rate of transmission within households and on the reduction in infectivity induced by the vaccine. In communities that are structured, into households or age-groups, it is possible that ê estimates a value that is negative even when the vaccine reduces both susceptibility and infectivity. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
363
382
Niels G. Becker
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:623-6402014-11-17RePEc:oup:biomet
article
Adaptive Bayesian multivariate density estimation with Dirichlet mixtures
We show that rate-adaptive multivariate density estimation can be performed using Bayesian methods based on Dirichlet mixtures of normal kernels with a prior distribution on the kernel's covariance matrix parameter. We derive sufficient conditions on the prior specification that guarantee convergence to a true density at a rate that is minimax optimal for the smoothness class to which the true density belongs. No prior knowledge of smoothness is assumed. The sufficient conditions are shown to hold for the Dirichlet location mixture-of-normals prior with a Gaussian base measure and an inverse Wishart prior on the covariance matrix parameter. Locally Hölder smoothness classes and their anisotropic extensions are considered. Our study involves several technical novelties, including sharp approximation of finitely differentiable multivariate densities by normal mixtures and a new sieve on the space of such densities. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
623
640
http://hdl.handle.net/10.1093/biomet/ast015
application/pdf
Access to full text is restricted to subscribers.
Weining Shen
Surya T. Tokdar
Subhashis Ghosal
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:173-1822014-11-17RePEc:oup:biomet
article
Power of edge exclusion tests in graphical Gaussian models
Asymptotic multivariate normal approximations to the joint distributions of edge exclusion test statistics for saturated graphical Gaussian models are derived. Non-signed and signed square-root versions of the likelihood ratio, Wald and score test statistics are considered. Noncentral chi-squared approximations are also considered for the non-signed versions. These approximations are used to estimate the power of edge exclusion tests and an example is presented. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
173
182
http://hdl.handle.net/10.1093/biomet/92.1.173
text/html
Access to full text is restricted to subscribers.
M. Fátima Salgueiro
Peter W. F. Smith
John W. McDonald
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:401-4092014-11-17RePEc:oup:biomet
article
A type of restricted maximum likelihood estimator of variance components in generalised linear mixed models
The maximum likelihood estimator of the variance components in a linear model can be biased downwards. Restricted maximum likelihood (REML) corrects this problem by using the likelihood of a set of residual contrasts and is generally considered superior. However, this original restricted maximum likelihood definition does not directly extend beyond linear models. We propose a REML-type estimator for generalised linear mixed models by correcting the bias in the profile score function of the variance components. The proposed estimator has the same consistency properties as the maximum likelihood estimator if the number of parameters in the mean and variance components models remains fixed. However, the estimator of the variance components has a smaller finite sample bias. A simulation study with a logistic mixed model shows that the proposed estimator is effective in correcting the downward bias in the maximum likelihood estimator. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
401
409
J. G. Liao
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:807-8242014-11-17RePEc:oup:biomet
article
Most-predictive design points for functional data predictors
We suggest a way of reducing the very high dimension of a functional predictor, X, to a low number of dimensions chosen so as to give the best predictive performance. Specifically, if X is observed on a fine grid of design points t<sub>1</sub>,…, t<sub>r</sub>, we propose a method for choosing a small subset of these, say t<sub>i<sub>1</sub></sub>,…, t<sub>i<sub>k</sub></sub>, to optimize the prediction of a response variable, Y. The values t<sub>i<sub>j</sub></sub> are referred to as the most predictive design points, or covariates, for a given value of k, and are computed using information contained in a set of independent observations (X<sub>i</sub>, Y<sub>i</sub>) of (X, Y). The algorithm is based on local linear regression, and calculations can be accelerated using linear regression to preselect the design points. Boosting can be employed to further improve the predictive performance. We illustrate the usefulness of our ideas through simulations and examples drawn from chemometrics, and we develop theoretical arguments showing that the methodology can be applied successfully in a range of settings. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
807
824
http://hdl.handle.net/10.1093/biomet/asq058
application/pdf
Access to full text is restricted to subscribers.
F. Ferraty
P. Hall
P. Vieu
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:587-6002014-11-17RePEc:oup:biomet
article
Robust functional estimation using the median and spherical principal components
We present robust estimators for the mean and the principal components of a stochastic process in <inline-formula><inline-graphic xlink:href="asn031ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>. Robustness and asymptotic properties of the estimators are studied theoretically, by simulation and by example. It is shown that the proposed estimators are generally more robust to outliers than the commonly used sample mean and principal components, although their properties depend on the spacings of the eigenvalues of the covariance function. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
587
600
http://hdl.handle.net/10.1093/biomet/asn031
application/pdf
Access to full text is restricted to subscribers.
Daniel Gervini
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:419-4332014-11-17RePEc:oup:biomet
article
Efficient scalable schemes for monitoring a large number of data streams
The sequential changepoint detection problem is studied in the context of global online monitoring of a large number of independent data streams. We are interested in detecting an occurring event as soon as possible, but we do not know when the event will occur, nor do we know which subset of data streams will be affected by the event. A family of scalable schemes is proposed based on the sum of the local cumulative sum, <sc>cusum</sc>, statistics from each individual data stream, and is shown to asymptotically minimize the detection delays for each and every possible combination of affected data streams, subject to the global false alarm constraint. The usefulness and limitations of our asymptotic optimality results are illustrated by numerical simulations and heuristic arguments. The Appendices contain a probabilistic result on the first epoch to simultaneous record values for multiple independent random walks. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
419
433
http://hdl.handle.net/10.1093/biomet/asq010
application/pdf
Access to full text is restricted to subscribers.
Y. Mei
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:403-4162014-11-17RePEc:oup:biomet
article
Maximum smoothed likelihood for multivariate mixtures
We introduce an algorithm for estimating the parameters in a finite mixture of completely unspecified multivariate components in at least three dimensions under the assumption of conditionally independent coordinate dimensions. We prove that this algorithm, based on a majorization-minimization idea, possesses a desirable descent property just as any <sc>em</sc> algorithm does. We discuss the similarities between our algorithm and a related one, the so-called nonlinearly smoothed <sc>em</sc> algorithm for the non-mixture setting. We also demonstrate via simulation studies that the new algorithm gives very similar results to another algorithm that has been shown empirically to be effective but that does not satisfy any descent property. We provide code for implementing the new algorithm in a publicly available R package. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
403
416
http://hdl.handle.net/10.1093/biomet/asq079
application/pdf
Access to full text is restricted to subscribers.
M. Levine
D. R. Hunter
D. Chauveau
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:261-2782014-11-17RePEc:oup:biomet
article
Variable selection in high-dimensional linear models: partially faithful distributions and the <sc>pc</sc>-simple algorithm
We consider variable selection in high-dimensional linear models where the number of covariates greatly exceeds the sample size. We introduce the new concept of partial faithfulness and use it to infer associations between the covariates and the response. Under partial faithfulness, we develop a simplified version of the <sc>pc</sc> algorithm (Spirtes et al., 2000), which is computationally feasible even with thousands of covariates and provides consistent variable selection under conditions on the random design matrix that are of a different nature than coherence conditions for penalty-based approaches like the lasso. Simulations and application to real data show that our method is competitive compared to penalty-based approaches. We provide an efficient implementation of the algorithm in the R-package pcalg. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
261
278
http://hdl.handle.net/10.1093/biomet/asq008
application/pdf
Access to full text is restricted to subscribers.
P. Bühlmann
M. Kalisch
M. H. Maathuis
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:607-6222014-11-17RePEc:oup:biomet
article
Continuously additive models for nonlinear functional regression
We introduce continuously additive models, which can be viewed as extensions of additive regression models with vector predictors to the case of infinite-dimensional predictors. This approach produces a class of flexible functional nonlinear regression models, where random predictor curves are coupled with scalar responses. In continuously additive modelling, integrals taken over a smooth surface along graphs of predictor functions relate the predictors to the responses in a nonlinear fashion. We use tensor product basis expansions to fit the smooth regression surface that characterizes the model. In a theoretical investigation, we show that the predictions obtained from fitting continuously additive models are consistent and asymptotically normal. We also consider extensions to generalized responses. The proposed class of models outperforms existing functional regression models in simulations and real-data examples. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
607
622
http://hdl.handle.net/10.1093/biomet/ast004
application/pdf
Access to full text is restricted to subscribers.
Hans-Georg Müller
Yichao Wu
Fang Yao
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:603-6202014-11-17RePEc:oup:biomet
article
On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems
This paper considers the asymptotic distribution of the likelihood ratio statistic T for testing a subset of parameter of interest θ, <inline-formula><inline-graphic xlink:href="asq031ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, <inline-formula><inline-graphic xlink:href="asq031ilm2.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, based on the pseudolikelihood <inline-formula><inline-graphic xlink:href="asq031ilm3.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="asq031ilm4.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> is a consistent estimator of <inline-formula><inline-graphic xlink:href="asq031ilm5.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, the nuisance parameter. We show that the asymptotic distribution of T under H<sub>0</sub> is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest, <inline-formula><inline-graphic xlink:href="asq031ilm6.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, or the true value of the nuisance parameter, <inline-formula><inline-graphic xlink:href="asq031ilm7.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula>, lies on the boundary of parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of T may be mixtures of chi-squared variables. 
We conducted simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
603
620
http://hdl.handle.net/10.1093/biomet/asq031
application/pdf
Access to full text is restricted to subscribers.
Yong Chen
Kung-Yee Liang
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:423-4362014-11-17RePEc:oup:biomet
article
Goodness of fit of biplots and correspondence analysis
The present paper examines proportional goodness of fit to variables recorded on individuals, the variances and covariances of the variables, and the form and distances between individuals. No single plot displays all three optimally in the sense of least squares. However, even aspects which are non-optimally fitted by biplots and Benzecri plots often closely preserve the optimal fit. This is shown by means of a preservation-of-fit function which depends on the type of display and on the ratio of the second to the first singular value of the data matrix. This function is never below 0·5, so at least half the fit is always preserved, and it is close to 1 unless the ratio of the singular values is small. That explains the frequently observed similarity of the various biplots and the Benzecri plot and the fact that they usually lead to the same conclusions. It follows that in many applications it is reasonable to use either the symmetric biplot or the Benzecri plot or a compromise maximin preservation plot, and that the difference between these three is usually unimportant. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
423
436
K. Ruben Gabriel
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:75-892014-11-17RePEc:oup:biomet
article
Covariate-adjusted regression
We introduce covariate-adjusted regression for situations where both predictors and response in a regression model are not directly observable, but are contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. We demonstrate how the regression coefficients can be estimated by establishing a connection to varying-coefficient regression. The proposed covariate-adjustment method is illustrated with an analysis of the regression of plasma fibrinogen concentration as response on serum transferrin level as predictor for 69 haemodialysis patients. In this example, both response and predictor are thought to be influenced in a multiplicative fashion by body mass index. A bootstrap hypothesis test enables us to test the significance of the regression parameters. We establish consistency and convergence rates of the parameter estimators for this new covariate-adjusted regression model. Simulation studies demonstrate the efficacy of the proposed method. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
75
89
http://hdl.handle.net/10.1093/biomet/92.1.75
text/html
Access to full text is restricted to subscribers.
Damla Şentürk
Hans-Georg Müller
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:187-2042014-11-17RePEc:oup:biomet
article
Two-stage sampling from a prediction point of view when the cluster sizes are unknown
We consider the problem of estimating the population total in two-stage cluster sampling when cluster sizes are known only for the sampled clusters, making use of a population model arising from a variance component model. The problem can be considered as one of predicting the unobserved part Z of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood. For a more general distribution-free model, by application of an analysis of variance approach instead of maximum likelihood for parameter estimation, the predictor obtained from the predictive likelihood is shown to be approximately uniformly optimal for large sample size and large number of clusters, in the sense of uniformly minimizing the mean-squared error in a partially linear class of model-unbiased predictors. Three prediction intervals for Z based on three similar predictive likelihoods are studied. For a small number n<sub>0</sub> of sampled clusters, they differ significantly, but for large n<sub>0</sub>, the three intervals are practically identical. Model-based and design-based coverage properties of the prediction intervals are studied based on a comprehensive simulation study. The simulation study indicates that for large sample sizes, the coverage measures achieve approximately the nominal level 1 - α and are slightly less than 1 - α for moderately large sample sizes. For small sample sizes, the coverage measures are about 1 - 2α, being raised to 1 - α for a modified interval based on the <inline-formula><inline-graphic xlink:href="asm098ilm1.gif" xmlns:xlink="http://www.w3.org/1999/xlink"/></inline-formula> distribution. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
187
204
http://hdl.handle.net/10.1093/biomet/asm098
application/pdf
Access to full text is restricted to subscribers.
Jan F. Bjørnstad
Elinor Ytterstad
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:999-10052014-11-17RePEc:oup:biomet
article
Positive Association Among Three Binary Variables and Cross-Product Ratios
We show that, when the three-way association level among the three binary variables, X, U<sub>1</sub> and U<sub>2</sub> is fixed, D<sub>P</sub> = pr(X = 1|U<sub>1</sub> = 1) - pr(X = 1|U<sub>1</sub> = 0) increases as the cross-product ratio of U<sub>1</sub> and U<sub>2</sub> increases under the assumption that X is positively associated with U<sub>1</sub> and U<sub>2</sub>. We then discuss some implications of this property. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
999
1005
http://hdl.handle.net/10.1093/biomet/asm075
application/pdf
Access to full text is restricted to subscribers.
Stephen E. Fienberg
Sung-Ho Kim
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:141-1502014-11-17RePEc:oup:biomet
article
A moving average Cholesky factor model in covariance modelling for longitudinal data
We propose new regression models for parameterizing covariance structures in longitudinal data analysis. Using a novel Cholesky factor, the entries in this decomposition have a moving average and log-innovation interpretation and are modelled as linear functions of covariates. We propose efficient maximum likelihood estimates for joint mean-covariance analysis based on this decomposition and derive the asymptotic distributions of the coefficient estimates. Furthermore, we study a local search algorithm, computationally more efficient than traditional all subset selection, based on <sc>bic</sc> for model selection, and show its model selection consistency. Thus, a conjecture of Pan & MacKenzie (2003) is verified. We demonstrate the finite-sample performance of the method via analysis of data on CD4 trajectories and through simulations. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
141
150
http://hdl.handle.net/10.1093/biomet/asr068
application/pdf
Access to full text is restricted to subscribers.
Weiping Zhang
Chenlei Leng
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:139-1562014-11-17RePEc:oup:biomet
article
Covariate-adjusted precision matrix estimation with an application in genetical genomics
Motivated by analysis of genetical genomics data, we introduce a sparse high-dimensional multivariate regression model for studying conditional independence relationships among a set of genes adjusting for possible genetic effects. The precision matrix in the model specifies a covariate-adjusted Gaussian graph, which presents the conditional dependence structure of gene expression after the confounding genetic effects on gene expression are taken into account. We present a covariate-adjusted precision matrix estimation method using a constrained ℓ<sub>1</sub> minimization, which can be easily implemented by linear programming. Asymptotic convergence rates in various matrix norms and sign consistency are established for the estimators of the regression coefficients and the precision matrix, allowing both the number of genes and the number of the genetic variants to diverge. Simulation shows that the proposed method results in significant improvements in both precision matrix estimation and graphical structure selection when compared to the standard Gaussian graphical model assuming constant means. The proposed method is applied to yeast genetical genomics data for the identification of the gene network among a set of genes in the mitogen-activated protein kinase pathway. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
139
156
http://hdl.handle.net/10.1093/biomet/ass058
application/pdf
Access to full text is restricted to subscribers.
T. Tony Cai
Hongzhe Li
Weidong Liu
Jichun Xie
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:411-4262014-11-17RePEc:oup:biomet
article
Non-finite Fisher information and homogeneity: an EM approach
Even simple examples of finite mixture models can fail to fulfil the regularity conditions that are routinely assumed in standard parametric inference problems. Many methods have been investigated for testing for homogeneity in finite mixture models, for example, but all rely on regularity conditions including the finiteness of the Fisher information and the space of the mixing parameter being a compact subset of some Euclidean space. Very simple examples where such assumptions fail include mixtures of two geometric distributions and two exponential distributions, and, more generally, mixture models in scale distribution families. To overcome these difficulties, we propose and study an <sc>em</sc>-test statistic, which has a simple limiting distribution for examples in this paper. Simulations show that the <sc>em</sc>-test has accurate Type I errors and is more efficient than existing methods when they are applicable. A real example is included. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
411
426
http://hdl.handle.net/10.1093/biomet/asp011
application/pdf
Access to full text is restricted to subscribers.
P. Li
J. Chen
P. Marriott
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:375-3882014-11-17RePEc:oup:biomet
article
Risk-adjusted monitoring of time to event
Recently there has been interest in risk-adjusted cumulative sum charts, <sc>CUSUMs</sc>, to monitor the performance of e.g. hospitals, taking into account the heterogeneity of patients. Even though many outcomes involve time, only conventional regression models are commonly used. In this article we investigate how time to event models may be used for monitoring purposes. We consider monitoring using <sc>CUSUMs</sc> based on the partial likelihood ratio between an out-of-control state and an in-control state. We consider both proportional and non-proportional alternatives, as well as a head start. Against proportional alternatives, we present an analytic method of computing the expected number of observed events before stopping or the probability of stopping before a given observed number of events. In a stationary set-up, the former is roughly proportional to the average run length in calendar time. Adding a head start changes the threshold only slightly if the expected number of events until hitting is used as a criterion. However, it changes the threshold substantially if a false alarm probability is used. In simulation studies, charts based on survival analysis perform better than simpler monitoring schemes. We present one example from retail finance and one medical application. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
375
388
http://hdl.handle.net/10.1093/biomet/asq004
application/pdf
Access to full text is restricted to subscribers.
A. Gandy
J. T. Kvaløy
A. Bottle
F. Zhou
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:17-342014-11-17RePEc:oup:biomet
article
The multivariate beta process and an extension of the Polya tree model
We introduce a novel stochastic process that we term the multivariate beta process. The process is defined for modelling dependent random probabilities and has beta marginal distributions. We use this process to define a probability model for a family of unknown distributions indexed by covariates. The marginal model for each distribution is a Polya tree prior. An important feature of the proposed prior is the easy centring of the nonparametric model around any parametric regression model. We use the model to implement nonparametric inference for survival distributions. The nonparametric model that we introduce can be adopted to extend the support of prior distributions for parametric regression models. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
17
34
http://hdl.handle.net/10.1093/biomet/asq072
application/pdf
Access to full text is restricted to subscribers.
Lorenzo Trippa
Peter Müller
Wesley Johnson
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-1025a2014-07-28RePEc:oup:biomet
article
Amendments and Corrections
The paper included comparison of a 12-factor, 16-run design to randomly generated Latin hypercube designs and U-designs, with respect to the properties of their alias matrices. An error in a computer program led to incorrect computation of the properties of the alias matrix of the orthogonal design. A corrected version of Table 2 is provided here. The orthogonal Latin hypercube design still has better properties than the best of 100 random designs, but the differences are less striking than those in our original table. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1025
http://hdl.handle.net/10.1093/biomet/93.4.1025-a
text/html
Access to full text is restricted to subscribers.
David M. Steinberg
Dennis K. J. Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:219-2252014-07-28RePEc:oup:biomet
article
Estimating genetic association parameters from family data
We consider the problem of estimating a parameter theta, reflecting association between a disease and genotypes of a genetic polymorphism, using nuclear family data. In many applications, some parental genotypes are missing, and the distribution of these genotypes is unknown. Since misspecification of this distribution can bias estimators for theta, we consider estimating functions that are unbiased, regardless of how the distribution is specified. We call the resulting estimators parental-genotype-robust. Rabinowitz (2002) has proposed a constrained optimisation method for obtaining locally optimal unbiased tests of the null hypothesis of no association. We use a similar method to derive estimating functions that yield parental-genotype-robust estimators with minimum variance in the class of all such estimators. We extend the estimating functions to obtain parental-genotype-robust estimators when theta is a vector of unknown parameters, and show that the estimating functions enjoy a certain optimality property. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
219
225
Alice S. Whittemore
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:505-505a2014-07-28RePEc:oup:biomet
article
"Equivalence of prospective and retrospective models in the Bayesian analysis of case-control studies"
2
2005
92
June
Biometrika
505
505
http://hdl.handle.net/10.1093/biomet/92.2.505-a
text/html
Access to full text is restricted to subscribers.
Shaun R. Seaman
Sylvia Richardson
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-10252014-07-28RePEc:oup:biomet
article
Amendments and Corrections
It has been brought to our attention that the implicit expression (6) for the estimator with general warping function had been derived earlier by B. Ronn, in an unpublished technical report of the Royal Veterinary and Agricultural University, Frederiksberg. However, the actual implementation and computation of the estimators are very different in our paper from in the technical report. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1025
http://hdl.handle.net/10.1093/biomet/93.4.1025
text/html
Access to full text is restricted to subscribers.
D. Gervini
T. Gasser
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:505-5052014-07-28RePEc:oup:biomet
article
"A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families"
2
2005
92
June
Biometrika
505
505
http://hdl.handle.net/10.1093/biomet/92.2.505
text/html
Access to full text is restricted to subscribers.
Albert W. Marshall
Ingram Olkin
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:177-1932014-07-28RePEc:oup:biomet
article
Equivalent kernels of smoothing splines in nonparametric regression for clustered/longitudinal data
For independent data, it is well known that kernel methods and spline methods are essentially asymptotically equivalent (Silverman, 1984). However, recent work of Welsh et al. (2002) shows that the same is not true for clustered/longitudinal data. Splines and conventional kernels are different in localness and ability to account for the within-cluster correlation. We show that a smoothing spline estimator is asymptotically equivalent to a recently proposed seemingly unrelated kernel estimator of Wang (2003) for any working covariance matrix. We show that both estimators can be obtained iteratively by applying conventional kernel or spline smoothing to pseudo-observations. This result allows us to study the asymptotic properties of the smoothing spline estimator by deriving its asymptotic bias and variance. We show that smoothing splines are consistent for an arbitrary working covariance and have the smallest variance when assuming the true covariance. We further show that both the seemingly unrelated kernel estimator and the smoothing spline estimator are nonlocal unless working independence is assumed but have asymptotically negligible bias. Their finite sample performance is compared through simulations. Our results justify the use of efficient, non-local estimators such as smoothing splines for clustered/longitudinal data. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
177
193
Xihong Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:240-2452014-07-28RePEc:oup:biomet
article
Revisiting simple linear regression with autocorrelated errors
This paper studies properties of ordinary and generalised least squares estimators in a simple linear regression with stationary autocorrelated errors. Explicit expressions for the variances of the regression parameter estimators are derived for some common time series autocorrelation structures, including a first-order autoregression and general moving averages. Applications of the results include confidence intervals and an example where the variance of the trend slope estimator does not increase with increasing autocorrelation. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
240
245
Jaechoul Lee
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:899-9142013-01-01RePEc:oup:biomet
article
Simultaneous supervised clustering and feature selection over a graph
In this article, we propose a regression method for simultaneous supervised clustering and feature selection over a given undirected graph, where homogeneous groups or clusters are estimated as well as informative predictors, with each predictor corresponding to one node in the graph and a connecting path indicating a priori possible grouping among the corresponding predictors. The method seeks a parsimonious model with high predictive power through identifying and collapsing homogeneous groups of regression coefficients. To address computational challenges, we present an efficient algorithm integrating the augmented Lagrange multipliers, coordinate descent and difference convex methods. We prove that the proposed method not only identifies the true homogeneous groups and informative features consistently but also leads to accurate parameter estimation. A gene network dataset is analysed to demonstrate that the method can make a difference by exploring dependency structures among the genes. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
899
914
http://hdl.handle.net/10.1093/biomet/ass038
application/pdf
Access to full text is restricted to subscribers.
Xiaotong Shen
Hsin-Cheng Huang
Wei Pan
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:945-9582013-01-01RePEc:oup:biomet
article
Penalized balanced sampling
Linear mixed models cover a wide range of statistical methods, which have found many uses in the estimation for complex surveys. The purpose of this work is to consider methods by which linear mixed models may be used at the design stage of a survey to incorporate available auxiliary information. This paper reviews the ideas of balanced sampling and the cube algorithm, and proposes an implementation of the latter by which penalized balanced samples can be selected. Such samples can reduce or eliminate the need for linear mixed model weight adjustments, a result demonstrated theoretically and via simulation. Horvitz–Thompson estimators for such samples will be highly efficient for any responses well approximated by a linear mixed model in the auxiliary information. In Monte Carlo experiments using nonparametric and temporal linear mixed models, the strategy of penalized balanced sampling with Horvitz–Thompson estimation dominates a variety of standard strategies. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
945
958
http://hdl.handle.net/10.1093/biomet/ass033
application/pdf
Access to full text is restricted to subscribers.
F. J. Breidt
G. Chauvet
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:915-9282013-01-01RePEc:oup:biomet
article
On the sparsity of signals in a random sample
This article proposes a method of moments technique for estimating the sparsity of signals in a random sample. This involves estimating the largest eigenvalue of a large Hermitian trigonometric matrix under mild conditions. As illustration, the method is applied to two well-known problems. The first focuses on the sparsity of a large covariance matrix and the second investigates the sparsity of a sequence of signals observed with stationary, weakly dependent noise. Simulation shows that the proposed estimators can have significantly smaller mean absolute errors than their main competitors. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
915
928
http://hdl.handle.net/10.1093/biomet/ass039
application/pdf
Access to full text is restricted to subscribers.
Binyan Jiang
Wei-Liem Loh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:799-8112013-01-01RePEc:oup:biomet
article
Choosing trajectory and data type when classifying functional data
In some problems involving functional data, it is desired to undertake prediction or classification before the full trajectory of a function is observed. In such cases, it is often preferable to suffer somewhat greater error in return for making a decision relatively early. The prediction and classification problems can be treated similarly, using mean squared prediction error, or classification error, respectively, as the means for quantifying performance, so in this paper we focus principally on classification. We introduce a method for determining when an early decision can reasonably be made, using only part of the trajectory, and we show how to use the method to choose among data types. Our approach is fully nonparametric, and no specific model is required. Properties of error-rate are studied as functions of time and data type. The effectiveness of the proposed method is illustrated in both theoretical and numerical terms. The classification referred to in this paper would be termed supervised classification in machine learning, to distinguish it from unsupervised classification, or clustering. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
799
811
http://hdl.handle.net/10.1093/biomet/ass011
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Tapabrata Maiti
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:865-8772013-01-01RePEc:oup:biomet
article
A two-stage dimension-reduction method for transformed responses and its applications
Researchers in the biological sciences nowadays often encounter the curse of dimensionality. To tackle this, sufficient dimension reduction aims to estimate the central subspace, in which all the necessary information supplied by the covariates regarding the response of interest is contained. Subsequent statistical analysis can then be made in a lower-dimensional space while preserving relevant information. Many studies are concerned with the transformed response rather than the original one, but they may have different central subspaces. When estimating the central subspace of the transformed response, direct methods will be inefficient. In this article, we propose a more efficient two-stage estimator of the central subspace of a transformed response. This approach is extended to censored responses and is applied to combining multiple biomarkers. Simulation studies and data examples support the superiority of the procedure. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
865
877
http://hdl.handle.net/10.1093/biomet/ass042
application/pdf
Access to full text is restricted to subscribers.
Hung Hung
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:775-7862013-01-01RePEc:oup:biomet
article
Classification based on a permanental process with cyclic approximation
We introduce a doubly stochastic marked point process model for supervised classification problems. Regardless of the number of classes or the dimension of the feature space, the model requires only 2–3 parameters for the covariance function. The classification criterion involves a permanental ratio for which an approximation using a polynomial-time cyclic expansion is proposed. The approximation is effective even if the feature region occupied by one class is a patchwork interlaced with regions occupied by other classes. An application to DNA microarray analysis indicates that the cyclic approximation is effective even for high-dimensional data. It can employ feature variables in an efficient way to reduce the prediction error significantly. This is critical when the true classification relies on nonreducible high-dimensional features. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
775
786
http://hdl.handle.net/10.1093/biomet/ass047
application/pdf
Access to full text is restricted to subscribers.
J. Yang
K. Miescke
P. McCullagh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:929-9442013-01-01RePEc:oup:biomet
article
Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction
Several two-stage multiple testing procedures have been proposed to detect gene-environment interaction in genome-wide association studies. In this article, we elucidate general conditions that are required for validity and power of these procedures, and we propose extensions of two-stage procedures using the case-only estimator of gene-treatment interaction in randomized clinical trials. We develop a unified estimating equation approach to proving asymptotic independence between a filtering statistic and an interaction test statistic in a range of situations, including marginal association and interaction in a generalized linear model with a canonical link. We assess the performance of various two-stage procedures in simulations and in genetic studies from Women's Health Initiative clinical trials. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
929
944
http://hdl.handle.net/10.1093/biomet/ass044
application/pdf
Access to full text is restricted to subscribers.
James Y. Dai
Charles Kooperberg
Michael Leblanc
Ross L. Prentice
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:981-9882013-01-01RePEc:oup:biomet
article
Finite population estimators in stochastic search variable selection
Monte Carlo algorithms are commonly used to identify a set of models for Bayesian model selection or model averaging. Because empirical frequencies of models are often zero or one in high-dimensional problems, posterior probabilities calculated from the observed marginal likelihoods, renormalized over the sampled models, are often employed. Such estimates are the only recourse in several newer stochastic search algorithms. In this paper, we prove that renormalization of posterior probabilities over the set of sampled models generally leads to bias that may dominate mean squared error. Viewing the model space as a finite population, we propose a new estimator based on a ratio of Horvitz–Thompson estimators that incorporates observed marginal likelihoods, but is approximately unbiased. This is shown to lead to a reduction in mean squared error compared to the empirical or renormalized estimators, with little increase in computational cost. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
981
988
http://hdl.handle.net/10.1093/biomet/ass040
application/pdf
Access to full text is restricted to subscribers.
Merlise A. Clyde
Joyee Ghosh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:995-10002013-01-01RePEc:oup:biomet
article
Proportional mean residual life model for right-censored length-biased data
To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcome, however, such a sampling strategy may be length biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to induced informative censoring in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes & Dasu (Biometrika 77, 409–10, 1990) for analysis of censored length-biased survival data. Several nonstandard data structures, including censoring of onset time and cross-sectional data without follow-up, can also be handled by the proposed methodology. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
995
1000
http://hdl.handle.net/10.1093/biomet/ass049
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
Ying Qing Chen
Chong-Zhi Di
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:879-8982013-01-01RePEc:oup:biomet
article
Scaled sparse linear regression
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise-level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for least-squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
879
898
http://hdl.handle.net/10.1093/biomet/ass043
application/pdf
Access to full text is restricted to subscribers.
Tingni Sun
Cun-Hui Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:763-7742013-01-01RePEc:oup:biomet
article
Testing one hypothesis twice in observational studies
In a matched observational study of treatment effects, a sensitivity analysis asks about the magnitude of the departure from random assignment that would need to be present to alter the conclusions of an analysis that assumes that matching for measured covariates removes all bias. The reported degree of sensitivity to unmeasured biases depends on both the process that generated the data and the chosen methods of analysis, so a poor choice of method may lead to an exaggerated report of sensitivity to bias. This suggests the possibility of performing more than one analysis with a correction for multiple inference, say testing one null hypothesis using two or three different tests. In theory and in an example, it is shown that, in large samples, the gains from testing twice will often be large, because testing twice has the larger of the two design sensitivities of the component tests, and the losses due to correcting for two tests will often be small, because two tests of one hypothesis will typically be highly correlated, so a correction for multiple testing that takes this into account will be small. An illustration uses data from the U.S. National Health and Nutrition Examination Survey concerning lead in the blood of cigarette smokers. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
763
774
http://hdl.handle.net/10.1093/biomet/ass032
application/pdf
Access to full text is restricted to subscribers.
P. R. Rosenbaum
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:973-9802013-01-01RePEc:oup:biomet
article
Statistical properties of an early stopping rule for resampling-based multiple testing
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
973
980
http://hdl.handle.net/10.1093/biomet/ass051
application/pdf
Access to full text is restricted to subscribers.
Hui Jiang
Julia Salzman
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:1001-10072013-01-01RePEc:oup:biomet
article
An efficient empirical likelihood approach for estimating equations with missing data
We explore the use of estimating equations for efficient statistical inference in case of missing data. We propose a semiparametric efficient empirical likelihood approach, and show that the empirical likelihood ratio statistic and its profile counterpart asymptotically follow central chi-square distributions when evaluated at the true parameter. The theoretical properties and practical performance of our approach are demonstrated through numerical simulations and data analysis. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
1001
1007
http://hdl.handle.net/10.1093/biomet/ass045
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Yongsong Qin
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:851-8642013-01-01RePEc:oup:biomet
article
Bidirectional discrimination with application to data visualization
Linear classifiers are very popular, but can have limitations when classes have distinct subpopulations. General nonlinear kernel classifiers are very flexible, but do not give clear interpretations and may not be efficient in high dimensions. We propose the bidirectional discrimination classification method, which generalizes linear classifiers to two or more hyperplanes. This new family of classification methods gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability, and much of the parsimony, of linear classifiers. They provide a new visualization tool for high-dimensional, low-sample-size data. Although the idea is generally applicable, we focus on the generalization of the support vector machine and distance-weighted discrimination methods. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. Our method leads to better classification performance in high-dimensional situations where subclusters are present in the data. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
851
864
http://hdl.handle.net/10.1093/biomet/ass029
application/pdf
Access to full text is restricted to subscribers.
Hanwen Huang
Yufeng Liu
J. S. Marron
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:959-9722013-01-01RePEc:oup:biomet
article
Bootstrap confidence bands for sojourn distributions in multistate semi-Markov models with right censoring
Transient semi-Markov processes have traditionally been used to describe the transitions of a patient through the various states of a multistate survival model. A survival distribution in this context is a sojourn through the states until passage to a fatal absorbing state or certain endpoint states. Using complete sojourn data, this paper shows how such survival distributions and associated hazard functions can be estimated nonparametrically and also how nonparametric bootstrap pointwise confidence bands can be constructed for them when patients are subject to independent right censoring from each state during the sojourn. Limitations to the estimability of such survival distributions that result from random censoring with bounded support are clarified. The methods are applicable to any sort of sojourn through any finite state process of arbitrary complexity involving feedback into previously occupied states. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
959
972
http://hdl.handle.net/10.1093/biomet/ass036
application/pdf
Access to full text is restricted to subscribers.
Ronald W. Butler
Douglas A. Bronson
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:813-8322013-01-01RePEc:oup:biomet
article
Dispersion operators and resistant second-order functional data analysis
Inferences related to the second-order properties of functional data, as expressed by covariance structure, can become unreliable when the data are non-Gaussian or contain unusual observations. In the functional setting, it is often difficult to identify atypical observations, as their distinguishing characteristics can be manifold but subtle. In this paper, we introduce the notion of a dispersion operator, investigate its use in probing the second-order structure of functional data, and develop a test for comparing the second-order characteristics of two functional samples that is resistant to atypical observations and departures from normality. The proposed test is a regularized M-test based on a spectrally truncated version of the Hilbert–Schmidt norm of a score operator defined via the dispersion operator. We derive the asymptotic distribution of the test statistic, investigate the behaviour of the test in a simulation study and illustrate the method on a structural biology dataset. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
813
832
http://hdl.handle.net/10.1093/biomet/ass037
application/pdf
Access to full text is restricted to subscribers.
David Kraus
Victor M. Panaretos
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:989-9942013-01-01RePEc:oup:biomet
article
Compatible weighted proper scoring rules
Many proper scoring rules such as the Brier and log scoring rules implicitly reward a probability forecaster relative to a uniform baseline distribution. Recent work has motivated weighted proper scoring rules, which have an additional baseline parameter. To date two families of weighted proper scoring rules have been introduced, the weighted power and pseudospherical scoring families. These families are compatible with the log scoring rule: when the baseline maximizes the log scoring rule over some set of distributions, the baseline also maximizes the weighted power and pseudospherical scoring rules over the same set. We characterize all weighted proper scoring families and prove a general property: every proper scoring rule is compatible with some weighted scoring family, and every weighted scoring family is compatible with some proper scoring rule. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
989
994
http://hdl.handle.net/10.1093/biomet/ass046
application/pdf
Access to full text is restricted to subscribers.
P. G. M. Forbes
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:787-7982013-01-01RePEc:oup:biomet
article
Orthogonalization of vectors with minimal adjustment
Two transformations are proposed that give orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim is that each component should be close to the vector with which it is paired, orthogonality imposing a constraint. The transformations lead to a variety of new statistical methods, including a unified approach to the identification and diagnosis of collinearities, a method of setting prior weights for Bayesian model averaging, and a means of calculating an upper bound for a multivariate Chebychev inequality. One transformation has the property that duplicating a vector has no effect on the orthogonal components that correspond to nonduplicated vectors, and is determined using a new algorithm that also provides the decomposition of a positive-definite matrix in terms of a diagonal matrix and a correlation matrix. The algorithm is shown to converge to a global optimum. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
787
798
http://hdl.handle.net/10.1093/biomet/ass041
application/pdf
Access to full text is restricted to subscribers.
Paul H. Garthwaite
Frank Critchley
Karim Anaya-Izquierdo
Emmanuel Mubwandarikwa
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:833-8492013-01-01RePEc:oup:biomet
article
A geometric approach to projective shape and the cross ratio
Projective shape consists of the information about a configuration of points that is invariant under projective transformations. It is an important tool in machine vision to pick out features that are invariant to the choice of camera view. The simplest example is the cross ratio for a set of four collinear points. Recent work involving ideas from multivariate robustness enables us to introduce here a natural preshape on projective shape space. This makes it possible to adapt the Procrustes analysis that forms the basis of much methodology in the simpler setting of similarity shape space. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
833
849
http://hdl.handle.net/10.1093/biomet/ass055
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Kanti V. Mardia
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:757-7642014-06-14RePEc:oup:biomet
article
Empirical likelihood methods for two-dimensional shape analysis
We consider empirical likelihood for the mean similarity shape of objects in two dimensions described by labelled landmarks. The restriction to two dimensions permits the representation of preshapes as complex unit vectors. We focus on the use of empirical likelihood techniques for the construction of confidence regions for the mean shape and for testing the hypothesis of a common mean shape across several populations. Theoretical properties and computational details are discussed and the results of a simulation study are presented. Our results show that bootstrap calibrated empirical likelihood performs well in practice in the planar shape setting. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
757
764
http://hdl.handle.net/10.1093/biomet/asq028
application/pdf
Access to full text is restricted to subscribers.
Getulio J. A. Amaral
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:567-5842014-06-14RePEc:oup:biomet
article
Shape curves and geodesic modelling
A family of shape curves is introduced that is useful for modelling the changes in shape in a series of geometrical objects. The relationship between the preshape sphere and the shape space is used to define a general family of curves based on horizontal geodesics on the preshape sphere. Methods for fitting geodesics and more general curves in the non-Euclidean shape space of point sets are discussed, based on minimizing sums of squares of Procrustes distances. Likelihood-based inference is considered. We illustrate the ideas by carrying out statistical analysis of two-dimensional landmarks on rats' skulls at various times in their development and three-dimensional landmarks on lumbar vertebrae from three primate species. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
567
584
http://hdl.handle.net/10.1093/biomet/asq027
application/pdf
Access to full text is restricted to subscribers.
Kim Kenobi
Ian L. Dryden
Huiling Le
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:361-3742014-06-14RePEc:oup:biomet
article
Efficient estimation in multi-phase case-control studies
In this paper we discuss the analysis of multi-phase, or multi-stage, case-control studies and present an efficient semiparametric maximum-likelihood approach that unifies and extends earlier work, including the seminal case-control paper by Prentice & Pyke (1979), work by Breslow & Cain (1988), Scott & Wild (1991), Breslow & Holubkov (1997) and others. The theoretical derivations apply to arbitrary binary regression models but we present results for logistic regression and show that the approach can be implemented by including additional intercept terms in the logistic model and then making some simple corrections to the score and information equations used in a Newton–Raphson or Fisher-scoring maximization of the prospective loglikelihood. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
361
374
http://hdl.handle.net/10.1093/biomet/asq009
application/pdf
Access to full text is restricted to subscribers.
A. J. Lee
A. J. Scott
C. J. Wild
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:765-7722014-06-14RePEc:oup:biomet
article
Strictly stationary solutions of autoregressive moving average equations
Necessary and sufficient conditions for the existence of a strictly stationary solution of the equations defining an autoregressive moving average process driven by an independent and identically distributed noise sequence are determined. No moment assumptions on the driving noise sequence are made. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
765
772
http://hdl.handle.net/10.1093/biomet/asq034
application/pdf
Access to full text is restricted to subscribers.
Peter J. Brockwell
Alexander Lindner
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:347-3602014-06-14RePEc:oup:biomet
article
A theory for testing hypotheses under covariate-adaptive randomization
The covariate-adaptive randomization method was proposed for clinical trials long ago but little theoretical work has been done for statistical inference associated with it. Practitioners often apply test procedures available for simple randomization, which is controversial since procedures valid under simple randomization may not be valid under other randomization schemes. In this paper, we provide some theoretical results for testing hypotheses after covariate-adaptive randomization. We show that one way to obtain a valid test procedure is to use a correct model between outcomes and covariates, including those used in randomization. We also show that the simple two-sample t-test, without using any covariate, is conservative under covariate-adaptive biased coin randomization in terms of its Type I error, and that a valid bootstrap t-test can be constructed. The powers of several tests are examined theoretically and empirically. Our study provides guidance for applications and sheds light on further research in this area. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
347
360
http://hdl.handle.net/10.1093/biomet/asq014
application/pdf
Access to full text is restricted to subscribers.
Jun Shao
Xinxin Yu
Bob Zhong
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:405-4182014-06-14RePEc:oup:biomet
article
Interval estimation for drop-the-losers designs
In the first stage of a two-stage, drop-the-losers design, a candidate for the best treatment is selected. At the second stage, additional observations are collected to decide whether the candidate is actually better than the control. The design also allows the investigator to stop the trial for ethical reasons at the end of the first stage if there is already strong evidence of futility or superiority. Two types of tests have recently been developed, one based on the combined means and the other based on the combined p-values, but corresponding interval estimators are unavailable except in special cases. The problem is that, in most cases, the interval estimators depend on the mean configuration of all treatments in the first stage, which is unknown. In this paper, we prove a basic stochastic ordering lemma that enables us to bridge the gap between hypothesis testing and interval estimation. The proposed confidence intervals achieve the nominal confidence level in certain special cases. Simulations show that decisions based on our intervals are usually more powerful than those based on existing methods. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
405
418
http://hdl.handle.net/10.1093/biomet/asq003
application/pdf
Access to full text is restricted to subscribers.
Samuel S. Wu
Weizhen Wang
Mark C. K. Yang
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:585-6012014-06-14RePEc:oup:biomet
article
A class of grouped Brunk estimators and penalized spline estimators for monotone regression
We study a class of monotone univariate regression estimators. We use B-splines to approximate an underlying regression function and estimate spline coefficients based on grouped data. We investigate asymptotic properties of two monotone estimators: a grouped Brunk estimator and a penalized monotone estimator. These estimators are consistent at the boundary and their mean square errors achieve optimal convergence rates under suitable assumptions on the true regression function. Asymptotic distributions are developed and are shown to be independent of spline degrees and the number of knots. Simulation results and car data illustrate the performance of the proposed estimators. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
585
601
http://hdl.handle.net/10.1093/biomet/asq029
application/pdf
Access to full text is restricted to subscribers.
Xiao Wang
Jinglai Shen
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:647-6592014-06-14RePEc:oup:biomet
article
Sufficient cause interactions for categorical and ordinal exposures with three levels
Definitions are given for weak and strong sufficient cause interactions in settings in which the outcome is binary and in which there are two exposures of interest that are categorical or ordinal. Weak sufficient cause interactions concern cases in which a mechanism will operate under certain values of the two exposures but not when one or the other of the exposures takes some other value. Strong sufficient cause interactions concern cases in which a mechanism will operate under certain values of the two exposures but not when one or the other of the exposures takes any other value. Empirical conditions are derived for such interactions when exposures have two or three levels and are related to regression coefficients in linear and log-linear models. When the exposures are binary, the notions of a weak and a strong sufficient cause interaction coincide, but not when the exposures are categorical or ordinal. The results are applied to examples concerning gene-gene and gene-environment interactions. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
647
659
http://hdl.handle.net/10.1093/biomet/asq030
application/pdf
Access to full text is restricted to subscribers.
Tyler J. VanderWeele
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:321-3322014-06-14RePEc:oup:biomet
article
On the relative efficiency of using summary statistics versus individual-level data in meta-analysis
Meta-analysis is widely used to synthesize the results of multiple studies. Although meta-analysis is traditionally carried out by combining the summary statistics of relevant studies, advances in technologies and communications have made it increasingly feasible to access the original data on individual participants. In the present paper, we investigate the relative efficiency of analyzing original data versus combining summary statistics. We show that, for all commonly used parametric and semiparametric models, there is no asymptotic efficiency gain by analyzing original data if the parameter of main interest has a common value across studies, the nuisance parameters have distinct values among studies, and the summary statistics are based on maximum likelihood. We also assess the relative efficiency of the two methods when the parameter of main interest has different values among studies or when there are common nuisance parameters across studies. We conduct simulation studies to confirm the theoretical results and provide empirical comparisons from a genetic association study. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
321
332
http://hdl.handle.net/10.1093/biomet/asq006
application/pdf
Access to full text is restricted to subscribers.
D. Y. Lin
D. Zeng
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:305-3192014-06-14RePEc:oup:biomet
article
Semiparametric dimension reduction estimation for mean response with missing data
Model misspecification can be a concern for high-dimensional data. Nonparametric regression obviates model specification but is impeded by the curse of dimensionality. This paper focuses on the estimation of the marginal mean response when there is missingness in the response and multiple covariates are available. We propose estimating the mean response through nonparametric functional estimation, where the dimension is reduced by a parametric working index. The proposed semiparametric estimator is robust to model misspecification: it is consistent for any working index if the missing mechanism of the response is known or correctly specified up to unknown parameters; even with misspecification in the missing mechanism, it is consistent so long as the working index can recover E(Y | X), the conditional mean response given the covariates. In addition, when the missing mechanism is correctly specified, the semiparametric estimator attains the optimal efficiency if E(Y | X) is recoverable through the working index. Robustness and efficiency of the proposed estimator are further investigated by simulations. We apply the proposed method to a clinical trial for HIV. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
305
319
http://hdl.handle.net/10.1093/biomet/asq005
application/pdf
Access to full text is restricted to subscribers.
Zonghui Hu
Dean A. Follmann
Jing Qin
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:481-4962014-06-14RePEc:oup:biomet
article
Likelihood ratio statistics based on an integrated likelihood
An integrated likelihood depends only on the parameter of interest and the data, so it can be used as a standard likelihood function for likelihood-based inference. In this paper, the higher-order asymptotic properties of the signed integrated likelihood ratio statistic for a scalar parameter of interest are considered. These results are used to construct a modified integrated likelihood ratio statistic and to suggest a class of prior densities to use in forming the integrated likelihood. The properties of the integrated likelihood ratio statistic are compared to those of the standard likelihood ratio statistic. Several examples show that the integrated likelihood ratio statistic can be a useful alternative to the standard likelihood ratio statistic. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
481
496
http://hdl.handle.net/10.1093/biomet/asq015
application/pdf
Access to full text is restricted to subscribers.
T. A. Severini
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:699-7122014-06-14RePEc:oup:biomet
article
A semiparametric additive rate model for recurrent events with an informative terminal event
We propose a semiparametric additive rate model for modelling recurrent events in the presence of a terminal event. The dependence between the recurrent events and the terminal event is nonparametric. A general transformation model is used to model the terminal event. We construct an estimating equation for parameter estimation and derive the asymptotic distributions of the proposed estimators. Simulation studies demonstrate that the proposed inference procedure performs well in realistic settings. An application to a medical study is presented. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
699
712
http://hdl.handle.net/10.1093/biomet/asq039
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Jianwen Cai
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:465-4802014-06-14RePEc:oup:biomet
article
The horseshoe estimator for sparse signals
This paper proposes a new approach to sparsity, called the horseshoe estimator, which arises from a prior based on multivariate-normal scale mixtures. We describe the estimator's advantages over existing approaches, including its robustness, adaptivity to different sparsity patterns and analytical tractability. We prove two theorems: one that characterizes the horseshoe estimator's tail robustness and the other that demonstrates a super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. Finally, using both real and simulated data, we show that the horseshoe estimator corresponds quite closely to the answers obtained by Bayesian model averaging under a point-mass mixture prior. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
465
480
http://hdl.handle.net/10.1093/biomet/asq017
application/pdf
Access to full text is restricted to subscribers.
Carlos M. Carvalho
Nicholas G. Polson
James G. Scott
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:727-7402014-06-14RePEc:oup:biomet
article
Estimating species richness by a Poisson-compound gamma model
We propose a Poisson-compound gamma approach for species richness estimation. Based on the denseness and nesting properties of the gamma mixture, we fix the shape parameter of each gamma component at a unified value, and estimate the mixture using nonparametric maximum likelihood. A least-squares crossvalidation procedure is proposed for the choice of the common shape parameter. The performance of the resulting estimator of N is assessed using numerical studies and genomic data. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
727
740
http://hdl.handle.net/10.1093/biomet/asq026
application/pdf
Access to full text is restricted to subscribers.
Ji-Ping Wang
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:631-6452014-06-14RePEc:oup:biomet
article
Detecting simultaneous changepoints in multiple sequences
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
631
645
http://hdl.handle.net/10.1093/biomet/asq025
application/pdf
Access to full text is restricted to subscribers.
Nancy R. Zhang
David O. Siegmund
Hanlee Ji
Jun Z. Li
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:279-2942014-06-14RePEc:oup:biomet
article
Dimension reduction for non-elliptically distributed predictors: second-order methods
Many classical dimension reduction methods, especially those based on inverse conditional moments, require the predictors to have elliptical distributions, or at least to satisfy a linearity condition. Such conditions, however, are too strong for some applications. Li and Dong (2009) introduced the notion of the central solution space and used it to modify first-order methods, such as sliced inverse regression, so that they no longer rely on these conditions. In this paper we generalize this idea to second-order methods, such as sliced average variance estimation and directional regression. In doing so we demonstrate that the central solution space is a versatile framework: we can use it to modify essentially all inverse conditional moment-based methods to relax the distributional assumption on the predictors. Simulation studies and an application show a substantial improvement of the modified methods over their classical counterparts. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
279
294
http://hdl.handle.net/10.1093/biomet/asq016
application/pdf
Access to full text is restricted to subscribers.
Yuexiao Dong
Bing Li
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:333-3452014-06-14RePEc:oup:biomet
article
Evidence factors in observational studies
Some experiments involve more than one random assignment of treatments to units. An analogous situation arises in certain observational studies, although randomization is not used, so each assignment may be biased. If each assignment is suspect, it is natural to ask whether there are separate pieces of information, dependent upon different assumptions, and perhaps whether conclusions about treatment effects are not critically dependent upon one or another suspect assumption. The design of an observational study contains evidence factors if it permits several statistically independent tests of the same null hypothesis about treatment effects, where these tests rely on different assumptions about treatment assignments at several levels of assignment. Two designs and two empirical examples are considered, one example of each design. In the dose-control design, there are matched pairs of a treated subject and an untreated control, and doses of treatment vary between pairs for treated subjects; this yields two evidence factors. In the varied intensity design, there are matched sets with two treated subjects and one or more untreated controls, where the two treated subjects within the same matched set receive different doses of treatment, and in a technically different way, the design yields two evidence factors. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
333
345
http://hdl.handle.net/10.1093/biomet/asq019
application/pdf
Access to full text is restricted to subscribers.
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:741-7552014-06-14RePEc:oup:biomet
article
Properties of nested sampling
Nested sampling is a simulation method for approximating marginal likelihoods. We establish that nested sampling has an approximation error that vanishes at the standard Monte Carlo rate and that this error is asymptotically Gaussian. It is shown that the asymptotic variance of the nested sampling approximation typically grows linearly with the dimension of the parameter. We discuss the applicability and efficiency of nested sampling in realistic problems, and compare it with two current methods for computing marginal likelihood. Finally, we propose an extension that avoids resorting to Markov chain Monte Carlo simulation to obtain the simulated points. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
741
755
http://hdl.handle.net/10.1093/biomet/asq021
application/pdf
Access to full text is restricted to subscribers.
Nicolas Chopin
Christian P. Robert
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:389-4042014-06-14RePEc:oup:biomet
article
Calibrating parametric subject-specific risk estimation
For modern evidence-based medicine, decisions on disease prevention or management strategies are often guided by a risk index system. For each individual, the system uses his/her baseline information to estimate the risk of experiencing a future disease-related clinical event. Such a risk scoring scheme is usually derived from an overly simplified parametric model. To validate a model-based procedure, one may perform a standard global evaluation via, for instance, a receiver operating characteristic analysis. In this article, we propose a method to calibrate the risk index system at a subject level. Specifically, we developed point and interval estimation procedures for t-year mortality rates conditional on the estimated parametric risk score. The proposals are illustrated with a dataset from a large clinical trial with post-myocardial infarction patients. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
389
404
http://hdl.handle.net/10.1093/biomet/asq012
application/pdf
Access to full text is restricted to subscribers.
T. Cai
L. Tian
Hajime Uno
Scott D. Solomon
L. J. Wei
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:661-6822014-06-14RePEc:oup:biomet
article
Bounded, efficient and doubly robust estimation with inverse weighting
Consider estimating the mean of an outcome in the presence of missing data or estimating population average treatment effects in causal inference. A doubly robust estimator remains consistent if an outcome regression model or a propensity score model is correctly specified. We build on a previous nonparametric likelihood approach and propose new doubly robust estimators, which have desirable properties in efficiency if the propensity score model is correctly specified, and in boundedness even if the inverse probability weights are highly variable. We compare the new and existing estimators in a simulation study and find that the robustified likelihood estimators yield overall the smallest mean squared errors. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
661
682
http://hdl.handle.net/10.1093/biomet/asq035
application/pdf
Access to full text is restricted to subscribers.
Zhiqiang Tan
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:505-5122014-06-14RePEc:oup:biomet
article
Copula inference under censoring
This paper discusses copula model selection procedures and goodness-of-fit tests under censoring. The proposed methodology is based on a comparison of nonparametric and model-based estimators of the probability integral transformation, K. New weighted estimators for K are introduced. The resulting tests are compared to an existing approach by simulation and illustrated with an example involving bleeding changes in a woman's reproductive history. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
505
512
http://hdl.handle.net/10.1093/biomet/asq011
application/pdf
Access to full text is restricted to subscribers.
M. L. Lakhal-Chaieb
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:683-6982014-06-14RePEc:oup:biomet
article
Analysis of cohort studies with multivariate and partially observed disease classification data
Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
683
698
http://hdl.handle.net/10.1093/biomet/asq036
application/pdf
Access to full text is restricted to subscribers.
Nilanjan Chatterjee
Samiran Sinha
W. Ryan Diver
Heather Spencer Feigelson
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:519-5382014-06-14RePEc:oup:biomet
article
Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs
Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical and biological systems where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with the same structure may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimation of the adjacency matrix of directed acyclic graphs, when variables inherit a natural ordering. We study variable selection consistency of lasso and adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is only variable selection consistent under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
519
538
http://hdl.handle.net/10.1093/biomet/asq038
application/pdf
Access to full text is restricted to subscribers.
Ali Shojaie
George Michailidis
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:539-5502014-06-14RePEc:oup:biomet
article
A new approach to Cholesky-based covariance regularization in high dimensions
In this paper we propose a new regression interpretation of the Cholesky factor of the covariance matrix, as opposed to the well-known regression interpretation of the Cholesky factor of the inverse covariance, which leads to a new class of regularized covariance estimators suitable for high-dimensional problems. Regularizing the Cholesky factor of the covariance via this regression interpretation always results in a positive definite estimator. In particular, one can obtain a positive definite banded estimator of the covariance matrix at the same computational cost as the popular banded estimator of Bickel & Levina (2008b), which is not guaranteed to be positive definite. We also establish theoretical connections between banding Cholesky factors of the covariance matrix and its inverse and constrained maximum likelihood estimation under the banding constraint, and compare the numerical performance of several methods in simulations and on a sonar data example. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
539
550
http://hdl.handle.net/10.1093/biomet/asq022
application/pdf
Access to full text is restricted to subscribers.
Adam J. Rothman
Elizaveta Levina
Ji Zhu
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:713-7262014-06-14RePEc:oup:biomet
article
Attributable fraction functions for censored event times
Attributable fractions are commonly used to measure the impact of risk factors on disease incidence in the population. These static measures can be extended to functions of time when the time to disease occurrence or event time is of interest. The present paper deals with nonparametric and semiparametric estimation of attributable fraction functions for cohort studies with potentially censored event time data. The semiparametric models include the familiar proportional hazards model and a broad class of transformation models. The proposed estimators are shown to be consistent, asymptotically normal and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. A cardiovascular health study is provided. Connections to causal inference are discussed. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
713
726
http://hdl.handle.net/10.1093/biomet/asq023
application/pdf
Access to full text is restricted to subscribers.
Li Chen
D. Y. Lin
Donglin Zeng
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:447-4642014-06-14RePEc:oup:biomet
article
A sequential smoothing algorithm with linear computational cost
In this paper we propose a new particle smoother that has a computational complexity of O(N), where N is the number of particles. This compares favourably with the O(N^2) computational cost of most smoothers. The new method also overcomes some degeneracy problems in existing algorithms. Through simulation studies we show that substantial gains in efficiency are obtained for practical amounts of computational cost. It is shown both through these simulation studies, and by the analysis of an athletics dataset, that our new method also substantially outperforms the simple filter-smoother, the only other smoother with computational cost that is O(N). Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
447
464
http://hdl.handle.net/10.1093/biomet/asq013
application/pdf
Access to full text is restricted to subscribers.
Paul Fearnhead
David Wyncoll
Jonathan Tawn
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:435-4462014-06-14RePEc:oup:biomet
article
Estimating linear dependence between nonstationary time series using the locally stationary wavelet model
Large volumes of neuroscience data comprise multiple, nonstationary electrophysiological or neuroimaging time series recorded from different brain regions. Accurately estimating the dependence between such neural time series is critical, since changes in the dependence structure are presumed to reflect functional interactions between neuronal populations. We propose a new dependence measure, derived from a bivariate locally stationary wavelet time series model. Since wavelets are localized in both time and scale, this approach leads to a natural, local and multi-scale estimate of nonstationary dependence. Our methodology is illustrated by application to a simulated example, and to electrophysiological data relating to interactions between the rat hippocampus and prefrontal cortex during working memory and decision making. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
435
446
http://hdl.handle.net/10.1093/biomet/asq007
application/pdf
Access to full text is restricted to subscribers.
J. Sanderson
P. Fryzlewicz
M. W. Jones
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:621-6302014-06-14RePEc:oup:biomet
article
Accurate and robust tests for indirect inference
In this paper we propose accurate parameter and over-identification tests for indirect inference. Under the null hypothesis the new tests are asymptotically χ^2-distributed with a relative error of order n^{-1}. They exhibit better finite sample accuracy than classical tests for indirect inference, which have the same asymptotic distribution but an absolute error of order n^{-1/2}. Robust versions of the tests are also provided. We illustrate their accuracy in nonlinear regression, Poisson regression with overdispersion and diffusion models. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
621
630
http://hdl.handle.net/10.1093/biomet/asq040
application/pdf
Access to full text is restricted to subscribers.
Veronika Czellar
Elvezio Ronchetti
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:497-5042014-06-14RePEc:oup:biomet
article
Objective Bayes and conditional inference in exponential families
Objective Bayes methodology is considered for conditional frequentist inference about a canonical parameter in a multi-parameter exponential family. A condition is derived under which posterior Bayes quantiles match the conditional frequentist coverage to a higher-order approximation in terms of the sample size. This condition is on the model, not on the prior, and it ensures that any first-order probability matching prior in the unconditional sense automatically yields higher-order conditional probability matching. Objective Bayes methods are compared to parametric bootstrap and analytic methods for higher-order conditional frequentist inference. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
497
504
http://hdl.handle.net/10.1093/biomet/asq002
application/pdf
Access to full text is restricted to subscribers.
Thomas J. Diciccio
G. Alastair Young
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:861-8722013-08-16RePEc:oup:biomet
article
Nonparametric estimation of the probability of illness in the illness-death model under cross-sectional sampling
Cross-sectional sampling is an attractive design that saves resources but results in biased data. For proper inference, one should first discover the bias function and then weigh observations appropriately. We consider cross-sectioning of the illness-death model with the aim of estimating the probability of visiting the illness state before death. We develop simple consistent and asymptotically normal estimators under various assumptions on the model and data collection and, in particular, compare designs with and without a follow-up. These designs are common in surveillance of hospital acquired infections, but estimators currently in use do not properly correct the bias. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
861
872
http://hdl.handle.net/10.1093/biomet/asp046
application/pdf
Access to full text is restricted to subscribers.
M. Mandel
R. Fluss
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:975-9822013-08-16RePEc:oup:biomet
article
Maximum likelihood estimation using composite likelihoods for closed exponential families
In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same parameter values for any number of observations. Examples include log-linear models and multivariate normal models. In other cases the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate. An application is given to an example in directional data based on a bivariate von Mises distribution. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
975
982
http://hdl.handle.net/10.1093/biomet/asp056
application/pdf
Access to full text is restricted to subscribers.
Kanti V. Mardia
John T. Kent
Gareth Hughes
Charles C. Taylor
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:887-9012013-08-16RePEc:oup:biomet
article
Marginal hazards model for case-cohort studies with multiple disease outcomes
Case-cohort study designs are widely used to reduce the cost of large cohort studies while achieving the same goals, especially when the disease rate is low. A key advantage of the case-cohort study design is its capacity to use the same subcohort for several diseases or for several subtypes of disease. In order to compare the effect of a risk factor on different types of diseases, times to different events need to be modelled simultaneously. Valid statistical methods that take the correlations among the outcomes from the same subject into account need to be developed. To this end, we consider marginal proportional hazards regression models for case-cohort studies with multiple disease outcomes. We also consider generalized case-cohort designs that do not require sampling all the cases, which is more realistic for multiple disease outcomes. We propose an estimating equation approach for parameter estimation with two different types of weights. Consistency and asymptotic normality of the proposed estimators are established. Large sample approximation works well in small samples in simulation studies. The proposed methods are applied to the Busselton Health Study. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
887
901
http://hdl.handle.net/10.1093/biomet/asp059
application/pdf
Access to full text is restricted to subscribers.
S. Kang
J. Cai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1019-10232013-08-16RePEc:oup:biomet
article
A note on a conjectured sharpness principle for probabilistic forecasting with calibration
This note proves a weak sharpness principle as conjectured by Gneiting et al. (2007) in connection with probabilistic forecasting subject to calibration constraints. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1019
1023
http://hdl.handle.net/10.1093/biomet/asp054
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:835-8452013-08-16RePEc:oup:biomet
article
Bayesian lasso regression
The lasso estimate for linear regression corresponds to a posterior mode when independent, double-exponential prior distributions are placed on the regression coefficients. This paper introduces new aspects of the broader Bayesian treatment of lasso regression. A direct characterization of the regression coefficients' posterior distribution is provided, and computation and inference under this characterization is shown to be straightforward. Emphasis is placed on point estimation using the posterior mean, which facilitates prediction of future observations via the posterior predictive distribution. It is shown that the standard lasso prediction method does not necessarily agree with model-based, Bayesian predictions. A new Gibbs sampler for Bayesian lasso regression is introduced. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
835
845
http://hdl.handle.net/10.1093/biomet/asp047
application/pdf
Access to full text is restricted to subscribers.
Chris Hans
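The correspondence stated in the opening sentence of the abstract above (the lasso estimate as a posterior mode under independent double-exponential priors) can be written out as a short derivation. The notation (y, X, β, σ^2, τ, λ) is assumed here for illustration and is not taken from the paper; σ^2 is treated as known, which is the standard textbook simplification.

```latex
\begin{align*}
y \mid \beta &\sim N(X\beta,\ \sigma^2 I_n), \qquad
p(\beta_j) = \frac{\tau}{2}\exp(-\tau\,|\beta_j|), \quad j = 1,\dots,p, \\
-\log p(\beta \mid y) &= \frac{1}{2\sigma^2}\,\|y - X\beta\|_2^2
  + \tau \sum_{j=1}^{p} |\beta_j| + \text{const}, \\
\hat\beta_{\text{mode}} &= \arg\min_{\beta}\Bigl\{ \|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1 \Bigr\},
  \qquad \lambda = 2\sigma^2\tau .
\end{align*}
```

Multiplying the negative log posterior by 2σ^2 gives the usual lasso objective, so the posterior mode coincides with the lasso estimate at penalty λ = 2σ^2τ. The posterior mean emphasized in the abstract is a different point estimate and generally does not coincide with this mode.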
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:847-8602013-08-16RePEc:oup:biomet
article
Generalized fiducial inference for wavelet regression
We apply Fisher's fiducial idea to wavelet regression, first developing a general methodology for handling model selection problems within the fiducial framework. We propose fiducial-based methods for wavelet curve estimation and the construction of pointwise confidence intervals. We show that these confidence intervals have asymptotically correct coverage. Simulations demonstrate that they possess promising empirical properties. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
847
860
http://hdl.handle.net/10.1093/biomet/asp050
application/pdf
Access to full text is restricted to subscribers.
Jan Hannig
Thomas C. M. Lee
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1024-10242013-08-16RePEc:oup:biomet
article
'Generalized method of moments estimation for linear regression with clustered failure time data'
4
2009
96
Biometrika
1024
1024
http://hdl.handle.net/10.1093/biomet/asp061
application/pdf
Access to full text is restricted to subscribers.
Hui Li
Guosheng Yin
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:917-9322013-08-16RePEc:oup:biomet
article
A unified approach to linearization variance estimation from survey data after imputation for item nonresponse
Variance estimation after imputation is an important practical problem in survey sampling. When deterministic imputation or stochastic imputation is used, we show that the variance of the imputed estimator can be consistently estimated by a unifying linearize and reverse approach. We provide some applications of the approach to regression imputation, fractional categorical imputation, multiple imputation and composite imputation. Results from a simulation study, under a factorial structure for the sampling, response and imputation mechanisms, show that the proposed linearization variance estimator performs well in terms of relative bias, assuming a missing at random response mechanism. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
917
932
http://hdl.handle.net/10.1093/biomet/asp041
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:971-9742013-08-16RePEc:oup:biomet
article
Construction of orthogonal Latin hypercube designs
Latin hypercube designs have found wide application. Such designs guarantee uniform samples for the marginal distribution of each input variable. We propose a method for constructing orthogonal Latin hypercube designs in which all the linear terms are orthogonal not only to each other, but also to the quadratic terms. This construction method is convenient and flexible, and the resulting designs can accommodate many more factors than can existing ones. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
971
974
http://hdl.handle.net/10.1093/biomet/asp058
application/pdf
Access to full text is restricted to subscribers.
Fasheng Sun
Min-Qian Liu
Dennis K. J. Lin
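The marginal uniformity property mentioned in the abstract above can be illustrated with a minimal sketch of a plain random Latin hypercube design. This is not the orthogonal construction proposed in the paper; the function names `latin_hypercube` and `strata_of` are hypothetical and chosen only for this example.

```python
import random

def latin_hypercube(n, d, rng):
    """Return n points in [0, 1)^d forming a random Latin hypercube design."""
    columns = []
    for _ in range(d):
        # Each of the n strata [k/n, (k+1)/n) is used exactly once per dimension.
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(k + rng.random()) / n for k in strata])
    # Transpose the per-dimension columns into a list of n points.
    return [tuple(col[i] for col in columns) for i in range(n)]

def strata_of(design, j):
    """Sorted stratum indices occupied by the points in dimension j."""
    n = len(design)
    return sorted(int(point[j] * n) for point in design)

design = latin_hypercube(8, 3, random.Random(0))
for j in range(3):
    # Every stratum 0..7 appears exactly once in each dimension.
    print(j, strata_of(design, j))
```

Because each one-dimensional projection places exactly one point in every stratum, the marginal sample of each input variable is evenly spread. Orthogonality between linear and quadratic terms, as targeted by the paper, requires additional structure that this sketch does not attempt.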
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:983-9902013-08-16RePEc:oup:biomet
article
Adaptive approximate Bayesian computation
Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappé et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
983
990
http://hdl.handle.net/10.1093/biomet/asp052
application/pdf
Access to full text is restricted to subscribers.
Mark A. Beaumont
Jean-Marie Cornuet
Jean-Michel Marin
Christian P. Robert
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:873-8862013-08-16RePEc:oup:biomet
article
Nonparametric estimation for right-censored length-biased data: a pseudo-partial likelihood approach
To estimate the lifetime distribution of right-censored length-biased data, we propose a pseudo-partial likelihood approach that allows us to derive two nonparametric estimators. With its closed-form estimators and explicit limiting variances, this approach retains the simplicity of conditional analysis, and has only a small efficiency loss compared with the unconditional analysis. Under some regularity conditions, we show that the two estimators are uniformly consistent and converge weakly to Gaussian processes. A simulation study demonstrates that the proposed estimators have satisfactory finite-sample performance. Application to an Alzheimer's disease study is reported. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
873
886
http://hdl.handle.net/10.1093/biomet/asp064
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:821-8342013-08-16RePEc:oup:biomet
article
Bayesian analysis of matrix normal graphical models
We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
821
834
http://hdl.handle.net/10.1093/biomet/asp049
application/pdf
Access to full text is restricted to subscribers.
Hao Wang
Mike West
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:781-7922013-08-16RePEc:oup:biomet
article
A new look at time series of counts
This paper proposes a simple new model for stationary time series of integer counts. Previous work has focused on thinning methods and classical time series autoregressive moving-average difference equations; in contrast, our methods use a renewal process to generate a correlated sequence of Bernoulli trials. By superpositioning independent copies of such processes, stationary series with binomial, Poisson, geometric or any other discrete marginal distribution can be readily constructed. The model class proposed is parsimonious, non-Markov and readily generates series with either short- or long-memory autocovariances. The model can be fitted with linear prediction techniques for stationary series. As an example, a stationary series with binomial marginal distributions is fitted to the number of rainy days in 210 consecutive weeks at Key West, Florida. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
781
792
http://hdl.handle.net/10.1093/biomet/asp057
application/pdf
Access to full text is restricted to subscribers.
Yunwei Cui
Robert Lund
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:998-10042013-08-16RePEc:oup:biomet
article
A note on the variance of doubly-robust G-estimators
A recursive variance calculation is derived for doubly-robust G-estimators for dynamic treatment regimes in a multi-interval setting. Treatment decision parameters are not assumed to be shared across treatment intervals; this independence of parameters permits sequential estimation of the G-estimators' variance when G-estimation is performed in a sequential fashion. The recursive variance calculation is both natural and computationally feasible. This development can easily be adapted to other complex estimating procedures that require nuisance parameter estimation. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
998
1004
http://hdl.handle.net/10.1093/biomet/asp043
application/pdf
Access to full text is restricted to subscribers.
E. E. M. Moodie
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:945-9562013-08-16RePEc:oup:biomet
article
Sliced space-filling designs
We propose an approach to constructing a new type of design, a sliced space-filling design, intended for computer experiments with qualitative and quantitative factors. The approach starts with constructing a Latin hypercube design based on a special orthogonal array for the quantitative factors and then partitions the design into groups corresponding to different level combinations of the qualitative factors. The points in each group have good space-filling properties. Some illustrative examples are given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
945
956
http://hdl.handle.net/10.1093/biomet/asp044
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:903-9152013-08-16RePEc:oup:biomet
article
Tests and confidence intervals for secondary endpoints in sequential clinical trials
In a sequential clinical trial whose stopping rule depends on the primary endpoint, inference on secondary endpoints is an important long-standing problem. Ignoring the possibility of early stopping based on the primary endpoint may result in substantial bias. To address this problem, a commonly used approach is to develop bias correction by estimating the bias in the case of bivariate normal outcomes and appealing to joint asymptotic normality of the statistics associated with the primary and secondary endpoints. We propose herein a new approach that uses resampling and a novel ordering scheme in the sample space of sequential statistics observed up to a stopping time. This approach is shown to provide accurate inference in complex clinical trials, including time-sequential trials with survival endpoints and covariates. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
903
915
http://hdl.handle.net/10.1093/biomet/asp063
application/pdf
Access to full text is restricted to subscribers.
Tze Leung Lai
Mei-Chiung Shih
Zheng Su
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:957-9702013-08-16RePEc:oup:biomet
article
Nested Latin hypercube designs
We propose an approach to constructing nested Latin hypercube designs. Such designs are useful for conducting multiple computer experiments with different levels of accuracy. A nested Latin hypercube design with two layers is defined to be a special Latin hypercube design that contains a smaller Latin hypercube design as a subset. Our method is easy to implement and can accommodate any number of factors. We also extend this method to construct nested Latin hypercube designs with more than two layers. Illustrative examples are given. Some statistical properties of the constructed designs are derived. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
957
970
http://hdl.handle.net/10.1093/biomet/asp045
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1012-10182013-08-16RePEc:oup:biomet
article
A note on adaptive Bonferroni and Holm procedures under dependence
Hochberg & Benjamini (1990) first presented adaptive procedures for controlling familywise error rate. However, until now, it has not been proved that these procedures control the familywise error rate. We introduce a simplified version of Hochberg & Benjamini's adaptive Bonferroni and Holm procedures. Assuming a conditional dependence model, we prove that the former procedure controls the familywise error rate in finite samples while the latter controls it approximately. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1012
1018
http://hdl.handle.net/10.1093/biomet/asp048
application/pdf
Access to full text is restricted to subscribers.
Wenge Guo
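For orientation, the classical (non-adaptive) Bonferroni and Holm procedures that the adaptive versions in the abstract above build on can be sketched as follows. This is a minimal illustration only: the adaptive procedures additionally estimate the number of true null hypotheses, which is not done here, and the function names are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure; returns True where a hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

p = [0.001, 0.013, 0.02, 0.04]
print(bonferroni_reject(p))  # [True, False, False, False]
print(holm_reject(p))        # [True, True, True, True]
```

Holm's procedure rejects at least as many hypotheses as Bonferroni at the same level, since its thresholds alpha / (m - rank) are never smaller than alpha / m.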
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:793-8042013-08-16RePEc:oup:biomet
article
Bias reduction in exponential family nonlinear models
In Firth (1993, Biometrika) it was shown how the leading term in the asymptotic bias of the maximum likelihood estimator is removed by adjusting the score vector, and that in canonical-link generalized linear models the method is equivalent to maximizing a penalized likelihood that is easily implemented via iterative adjustment of the data. Here a more general family of bias-reducing adjustments is developed for a broad class of univariate and multivariate generalized nonlinear models. The resulting formulae for the adjusted score vector are computationally convenient, and in univariate models they directly suggest implementation through an iterative scheme of data adjustment. For generalized linear models a necessary and sufficient condition is given for the existence of a penalized likelihood interpretation of the method. An illustrative application to the Goodman row-column association model shows how the computational simplicity and statistical benefits of bias reduction extend beyond generalized linear models. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
793
804
http://hdl.handle.net/10.1093/biomet/asp055
application/pdf
Access to full text is restricted to subscribers.
Ioannis Kosmidis
David Firth
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:991-9972013-08-16RePEc:oup:biomet
article
Semiparametric methods for evaluating risk prediction markers in case-control studies
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
991
997
http://hdl.handle.net/10.1093/biomet/asp040
application/pdf
Access to full text is restricted to subscribers.
Ying Huang
Margaret Sullivan Pepe
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:43-552012-05-01RePEc:oup:biomet
article
Modelling the distribution of the cluster maxima of exceedances of subasymptotic thresholds
A standard approach to model the extreme values of a stationary process is the peaks over threshold method, which consists of imposing a high threshold, identifying clusters of exceedances of this threshold and fitting the maximum value from each cluster using the generalized Pareto distribution. This approach is strongly justified by underlying asymptotic theory. We propose an alternative model for the distribution of the cluster maxima that accounts for the subasymptotic theory of extremes of a stationary process. This new distribution is a product of two terms, one for the marginal distribution of exceedances and the other for the dependence structure of the exceedance values within a cluster. We illustrate the improvement in fit, measured by the root mean square error of the estimated quantiles, offered by the new distribution over the peaks over thresholds analysis using simulated and hydrological data, and we suggest a diagnostic tool to help identify when the proposed model is likely to lead to an improved fit. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
43
55
http://hdl.handle.net/10.1093/biomet/asr078
application/pdf
Access to full text is restricted to subscribers.
Emma F. Eastoe
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:230-2372012-05-01RePEc:oup:biomet
article
Estimating overdispersion when fitting a generalized linear model to sparse data
We consider the problem of fitting a generalized linear model to overdispersed data, focussing on a quasilikelihood approach in which the variance is assumed to be proportional to that specified by the model, and the constant of proportionality, φ, is used to obtain appropriate standard errors and model comparisons. It is common practice to base an estimate of φ on Pearson's lack-of-fit statistic, with or without Farrington's modification. We propose a new estimator that has a smaller variance, subject to a condition on the third moment of the response variable. We conjecture that this condition is likely to be achieved for the important special cases of count and binomial data. We illustrate the benefits of the new estimator using simulations for both count and binomial data. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
230
237
http://hdl.handle.net/10.1093/biomet/asr083
application/pdf
Access to full text is restricted to subscribers.
D. J. Fletcher
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:85-1002012-05-01RePEc:oup:biomet
article
Combining data from two independent surveys: a model-assisted approach
Combining information from two or more independent surveys is a problem frequently encountered in survey sampling. We consider the case of two independent surveys, where a large sample from survey 1 collects only auxiliary information and a much smaller sample from survey 2 provides information on both the variables of interest and the auxiliary variables. We propose a model-assisted projection method of estimation based on a working model, but the reference distribution is design-based. We generate synthetic or proxy values of a variable of interest by first fitting the working model, relating the variable of interest to the auxiliary variables, to the data from survey 2 and then predicting the variable of interest associated with the auxiliary variables observed in survey 1. The projection estimator of a total is simply obtained from the survey 1 weights and associated synthetic values. We identify the conditions for the projection estimator to be asymptotically unbiased. Domain estimation using the projection method is also considered. Replication variance estimators are obtained by augmenting the synthetic data file for survey 1 with additional synthetic columns associated with the columns of replicate weights. Results from a simulation study are presented. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
85
100
http://hdl.handle.net/10.1093/biomet/asr063
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:167-1842012-05-01RePEc:oup:biomet
article
Estimating treatment effects with treatment switching via semicompeting risks models: an application to a colorectal cancer study
Treatment switching is a frequent occurrence in clinical trials, where, during the course of the trial, patients who fail on the control treatment may change to the experimental treatment. Analysing the data without accounting for switching yields highly biased and inefficient estimates of the treatment effect. In this paper, we propose a novel class of semiparametric semicompeting risks transition survival models to accommodate treatment switches. Theoretical properties of the proposed model are examined and an efficient expectation-maximization algorithm is derived for obtaining the maximum likelihood estimates. Simulation studies are conducted to demonstrate the superiority of the model compared with the intent-to-treat analysis and other methods proposed in the literature. The proposed method is applied to data from a colorectal cancer clinical trial. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
167
184
http://hdl.handle.net/10.1093/biomet/asr062
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Qingxia Chen
Ming-Hui Chen
Joseph G. Ibrahim
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:15-282012-05-01RePEc:oup:biomet
article
Factor profiled sure independence screening
We propose a method of factor profiled sure independence screening for ultrahigh-dimensional variable selection. The objective of this method is to identify nonzero components consistently from a sparse coefficient vector. The new method assumes that the correlation structure of the high-dimensional data can be well represented by a set of low-dimensional latent factors, which can be estimated consistently by eigenvalue-eigenvector decomposition. The estimated latent factors should then be profiled out from both the response and the predictors. Such an operation, referred to as factor profiling, produces uncorrelated predictors. Therefore, sure independence screening can be applied subsequently and the resulting screening result is consistent for model selection, a major advantage that standard sure independence screening does not share. We refer to the new method as factor profiled sure independence screening. Numerical studies confirm its outstanding performance. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
15
28
http://hdl.handle.net/10.1093/biomet/asr074
application/pdf
Access to full text is restricted to subscribers.
H. Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:101-1132012-05-01RePEc:oup:biomet
article
Optimal allocation to maximize the power of two-sample tests for binary response
We study allocations that maximize the power of tests of equality of two treatments having binary outcomes. When a normal approximation applies, the asymptotic power is maximized by minimizing the variance, leading to a Neyman allocation that assigns observations in proportion to the standard deviations. This allocation, which in general requires knowledge of the parameters of the problem, is recommended in a large body of literature. Under contiguous alternatives the normal approximation indeed applies, and in this case the Neyman allocation reduces to a balanced design. However, when studying the power under a noncontiguous alternative, a large deviations approximation is needed, and the Neyman allocation is no longer asymptotically optimal. In the latter case, the optimal allocation depends on the parameters, but is rather close to a balanced design. Thus, a balanced design is a viable option for both contiguous and noncontiguous alternatives. Finite sample studies show that a balanced design is indeed generally quite close to being optimal for power maximization. This is good news as implementation of a balanced design does not require knowledge of the parameters. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
101
113
http://hdl.handle.net/10.1093/biomet/asr077
application/pdf
Access to full text is restricted to subscribers.
D. Azriel
M. Mandel
Y. Rinott
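The Neyman allocation mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes two arms with binary success probabilities p1 and p2, so each arm's standard deviation is sqrt(p(1 - p)), and the function names are hypothetical.

```python
import math

def neyman_fraction(p1, p2):
    """Fraction of observations given to arm 1 when allocating in
    proportion to the arms' standard deviations (Neyman allocation)."""
    sd1 = math.sqrt(p1 * (1.0 - p1))
    sd2 = math.sqrt(p2 * (1.0 - p2))
    if sd1 + sd2 == 0.0:
        return 0.5  # both arms degenerate: fall back to a balanced split
    return sd1 / (sd1 + sd2)

def diff_variance(p1, p2, frac1, n):
    """Variance of the difference in sample proportions for a total
    sample size n with a fraction frac1 assigned to arm 1."""
    n1 = frac1 * n
    n2 = (1.0 - frac1) * n
    return p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2
```

When p1 equals p2 the two standard deviations coincide and the rule returns 0.5, which is the balanced design the abstract describes for contiguous alternatives.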
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:71-842012-05-01RePEc:oup:biomet
article
Optimal fractions of two-level factorials under a baseline parameterization
Two-level fractional factorial designs are considered under a baseline parameterization. The criterion of minimum aberration is formulated in this context and optimal designs under this criterion are investigated. The underlying theory and the concept of isomorphism turn out to be significantly different from their counterparts under orthogonal parameterization, and this is reflected in the optimal designs obtained. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
71
84
http://hdl.handle.net/10.1093/biomet/asr071
application/pdf
Access to full text is restricted to subscribers.
Rahul Mukerjee
Boxin Tang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:151-1652012-05-01RePEc:oup:biomet
article
A functional generalized method of moments approach for longitudinal studies with missing responses and covariate measurement error
Covariate measurement error and missing responses are typical features in longitudinal data analysis. There has been extensive research on either covariate measurement error or missing responses, but relatively little work has been done to address both simultaneously. In this paper, we propose a simple method for the marginal analysis of longitudinal data with time-varying covariates, some of which are measured with error, while the response is subject to missingness. Our method has a number of appealing properties: assumptions on the model are minimal, with none needed about the distribution of the mismeasured covariate; implementation is straightforward and its applicability is broad. We provide both theoretical justification and numerical results. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
151
165
http://hdl.handle.net/10.1093/biomet/asr076
application/pdf
Access to full text is restricted to subscribers.
Grace Y. Yi
Yanyuan Ma
Raymond J. Carroll
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:245-2512012-05-01RePEc:oup:biomet
article
Optimality of group testing in the presence of misclassification
Several optimality properties of Dorfman's (1943) group testing procedure are derived for estimation of the prevalence of a rare disease whose status is classified with error. Exact ranges of disease prevalence are obtained for which group testing provides more efficient estimation when group size increases. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
245
251
http://hdl.handle.net/10.1093/biomet/asr064
application/pdf
Access to full text is restricted to subscribers.
Aiyi Liu
Chunling Liu
Zhiwei Zhang
Paul S. Albert
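As a rough illustration of prevalence estimation from pooled tests with misclassification, the sketch below uses a standard textbook model that is assumed here rather than taken from the paper: a pool of k individuals is truly positive if at least one member is diseased, and the assay has sensitivity `se` and specificity `sp` that do not depend on k. All names are hypothetical.

```python
def pool_positive_prob(p, k, se, sp):
    """Probability that a pool of size k tests positive when the
    prevalence is p and the test has sensitivity se, specificity sp."""
    all_negative = (1.0 - p) ** k
    return se * (1.0 - all_negative) + (1.0 - sp) * all_negative

def prevalence_from_pools(pos_rate, k, se, sp):
    """Invert pool_positive_prob: recover p from the observed
    proportion of positive pools (requires se + sp > 1)."""
    if se + sp <= 1.0:
        raise ValueError("test must be informative: se + sp > 1")
    ratio = (se - pos_rate) / (se + sp - 1.0)
    ratio = min(1.0, max(0.0, ratio))  # keep the estimate inside [0, 1]
    return 1.0 - ratio ** (1.0 / k)
```

Plugging the observed share of positive pools into `prevalence_from_pools` gives a moment-type estimate; how its efficiency varies with k is the question the abstract addresses.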
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:238-2442012-05-01RePEc:oup:biomet
article
On robust estimation via pseudo-additive information
We consider a robust parameter estimator minimizing an empirical approximation to the q-entropy and show its relationship to minimization of power divergences through a simple parameter transformation. The estimator balances robustness and efficiency through a tuning constant q and avoids kernel density smoothing. We derive an upper bound to the estimator mean squared error under a contaminated reference model and use it as a min-max criterion for selecting q. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
238
244
http://hdl.handle.net/10.1093/biomet/asr061
application/pdf
Access to full text is restricted to subscribers.
Davide Ferrari
Davide La Vecchia
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:57-692012-05-01RePEc:oup:biomet
article
Conservative hypothesis tests and confidence intervals using importance sampling
Importance sampling is a common technique for Monte Carlo approximation, including that of p-values. Here it is shown that a simple correction of the usual importance sampling p-values provides valid p-values, meaning that a hypothesis test created by rejecting the null hypothesis when the p-value is at most α will also have a Type I error rate of at most α. This correction uses the importance weight of the original observation, which gives valuable diagnostic information under the null hypothesis. Using the corrected p-values can be crucial for multiple testing and also in problems where evaluating the accuracy of importance sampling approximations is difficult. Inverting the corrected p-values provides a useful way to create Monte Carlo confidence intervals that maintain the nominal significance level and use only a single Monte Carlo sample. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
57
69
http://hdl.handle.net/10.1093/biomet/asr079
application/pdf
Access to full text is restricted to subscribers.
Matthew T. Harrison
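One way to read the correction described in the abstract above is to add the importance weight of the observed data to the numerator and count the observation in the denominator. The sketch below follows that reading; the exact formula is an assumption and should be checked against the paper. The names `t_obs`, `w_obs`, `t_sim` and `w_sim` are hypothetical: the test statistic and importance weight of the observation, and those of the n Monte Carlo draws from the proposal distribution.

```python
def is_pvalues(t_obs, w_obs, t_sim, w_sim):
    """Return (usual, corrected) importance sampling p-values.

    usual     = sum of weights of draws at least as extreme, over n
    corrected = (w_obs + that sum) / (n + 1), capped at 1
    """
    n = len(t_sim)
    tail = sum(w for t, w in zip(t_sim, w_sim) if t >= t_obs)
    usual = tail / n
    corrected = min(1.0, (w_obs + tail) / (n + 1))
    return usual, corrected
```

With all weights equal to one, the corrected value reduces to the familiar (1 + count) / (n + 1) Monte Carlo p-value.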
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:211-2222012-05-01RePEc:oup:biomet
article
A proportional likelihood ratio model
We propose a semiparametric proportional likelihood ratio model which is particularly suitable for modelling a nonlinear monotonic relationship between the outcome variable and a covariate. This model extends the generalized linear model by leaving the distribution unspecified, and has a strong connection with semiparametric models such as the selection bias model (Gilbert et al., 1999), the density ratio model (Qin, 1998; Fokianos & Kaimi, 2006), the single-index model (Ichimura, 1993) and the exponential tilt regression model (Rathouz & Gao, 2009). A maximum likelihood estimator is obtained for the new model and its asymptotic properties are derived. An example and simulation study illustrate the use of the model. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
211
222
http://hdl.handle.net/10.1093/biomet/asr060
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:223-2292012-05-01RePEc:oup:biomet
article
Proportional likelihood ratio models for mean regression
The proportional likelihood ratio model introduced in Luo & Tsai (2012) is adapted to explicitly model the means of observations. This is useful for the estimation of and inference on treatment effects, particularly in designed experiments, and allows the data analyst greater control over model specification and parameter interpretation. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
223
229
http://hdl.handle.net/10.1093/biomet/asr075
application/pdf
Access to full text is restricted to subscribers.
Alan Huang
Paul J. Rathouz
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:115-1262012-05-01RePEc:oup:biomet
article
Directed acyclic graphs with edge-specific bounds
We give a definition of a bounded edge within the causal directed acyclic graph framework. A bounded edge generalizes the notion of a signed edge and is defined in terms of bounds on a ratio of survivor probabilities. We derive rules concerning the propagation of bounds. Bounds on causal effects in the presence of unmeasured confounding are also derived using bounds related to specific edges on a graph. We illustrate the theory developed by an example concerning estimating the effect of antihistamine treatment on asthma in the presence of unmeasured confounding. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
115
126
http://hdl.handle.net/10.1093/biomet/asr059
application/pdf
Access to full text is restricted to subscribers.
Tyler J. Vanderweele
Zhiqiang Tan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:127-1402012-05-01RePEc:oup:biomet
article
Bayesian analysis of multistate event history data: beta-Dirichlet process prior
Bayesian analysis of a finite state Markov process, which is popularly used to model multistate event history data, is considered. A new prior process, called a beta-Dirichlet process, is introduced for the cumulative intensity functions and is proved to be conjugate. In addition, the beta-Dirichlet prior is applied to a Bayesian semiparametric regression model. To illustrate the application of the proposed model, we analyse a dataset of credit histories. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
127
140
http://hdl.handle.net/10.1093/biomet/asr067
application/pdf
Access to full text is restricted to subscribers.
Yongdai Kim
Lancelot James
Rafael Weissbach
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:185-1972012-05-01RePEc:oup:biomet
article
Mean residual life models with time-dependent coefficients under right censoring
The mean residual life provides the remaining life expectancy of a subject who has survived to a certain time-point. When covariates are present, regression models are needed to study the association between the mean residual life function and potential regression covariates. In this paper, we propose a flexible class of semiparametric mean residual life models where some effects may be time-varying and some may be constant over time. In the presence of right censoring, we use the inverse probability of censoring weighting approach and develop inference procedures for estimating the model parameters. In addition, we provide graphical and numerical methods for model checking and tests for examining whether or not the covariate effects vary with time. Asymptotic and finite sample properties of the proposed estimators are established and the approach is applied to real life datasets collected from clinical trials. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
185
197
http://hdl.handle.net/10.1093/biomet/asr065
application/pdf
Access to full text is restricted to subscribers.
Liuquan Sun
Xinyuan Song
Zhigang Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:29-422012-05-01RePEc:oup:biomet
article
A direct approach to sparse discriminant analysis in ultra-high dimensions
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among features and thus could produce misleading feature selection and inferior classification. We propose a new procedure for sparse discriminant analysis, motivated by the least squares formulation of linear discriminant analysis. To demonstrate our proposal, we study the numerical and theoretical properties of discriminant analysis constructed via lasso penalized least squares. Our theory shows that the method proposed can consistently identify the subset of discriminative features contributing to the Bayes rule and at the same time consistently estimate the Bayes classification direction, even when the dimension can grow faster than any polynomial order of the sample size. The theory allows for general dependence among features. Simulated and real data examples show that lassoed discriminant analysis compares favourably with other popular sparse discriminant proposals. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
29
42
http://hdl.handle.net/10.1093/biomet/asr066
application/pdf
Access to full text is restricted to subscribers.
Qing Mai
Hui Zou
Ming Yuan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:199-2102012-05-01RePEc:oup:biomet
article
A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling
This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
199
210
http://hdl.handle.net/10.1093/biomet/asr072
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
Dean A. Follmann
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:1-142012-05-01RePEc:oup:biomet
article
Studies in the history of probability and statistics, L: Karl Pearson and the Rule of Three
Karl Pearson's role in the transformation that took the 19th century statistics of Laplace and Gauss into the modern era of 20th century multivariate analysis is examined from a new point of view. By viewing Pearson's work in the context of a motto he adopted from Charles Darwin, a philosophical theme is identified in Pearson's statistical work, and his three major achievements are briefly described. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
1
14
http://hdl.handle.net/10.1093/biomet/asr046
application/pdf
Access to full text is restricted to subscribers.
Stephen M. Stigler
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:253-2722012-08-31RePEc:oup:biomet
article
Dependence modelling for spatial extremes
Current dependence models for spatial extremes are based upon max-stable processes. Within this class, there are few inferentially viable models available, and we propose one further model. More problematic are the restrictive assumptions that must be made when using max-stable processes to model dependence for spatial extremes: it must be assumed that the dependence structure of the observed extremes is compatible with a limiting model that holds for all events more extreme than those that have already occurred. This problem has long been acknowledged in the context of finite-dimensional multivariate extremes, in particular when data display dependence at observable levels, but are independent in the limit. We propose a flexible class of models that is suitable for such data in a spatial context. In addition, we consider the situation where the extremal dependence structure may vary with distance. We apply our models to spatially referenced significant wave height data from the North Sea, finding evidence that their extremal structure is not compatible with a limiting dependence model. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
253
272
http://hdl.handle.net/10.1093/biomet/asr080
application/pdf
Access to full text is restricted to subscribers.
Jennifer L. Wadsworth
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:473-4802012-08-31RePEc:oup:biomet
article
A new residual for ordinal outcomes
We propose a new residual for regression models of ordinal outcomes, defined as E{sign(y,Y)}, where y is the observed outcome and Y is a random variable from the fitted distribution. This new residual is a single value per subject irrespective of the number of categories of the ordinal outcome, contains directional information between the observed value and the fitted distribution, and does not require the assignment of arbitrary numbers to categories. We study its properties, describe its connections with other residuals, ranks and ridits, and demonstrate its use in model diagnostics. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
473
480
http://hdl.handle.net/10.1093/biomet/asr073
application/pdf
Access to full text is restricted to subscribers.
Chun Li
Bryan E. Shepherd
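The residual E{sign(y,Y)} defined in the abstract above can be computed directly from fitted category probabilities. The sketch assumes the convention that sign(y, Y) is 1 when y > Y, 0 when y = Y and -1 when y < Y, so the expectation equals P(Y < y) - P(Y > y); categories are coded 0, 1, ..., K-1 and the function name is hypothetical.

```python
def sign_residual(y, probs):
    """Residual E{sign(y, Y)} for observed category y (an index) and
    fitted probabilities probs[k] = P(Y = k), k = 0..K-1."""
    below = sum(probs[:y])       # P(Y < y)
    above = sum(probs[y + 1:])   # P(Y > y)
    return below - above
```

The result is a single number in [-1, 1] per subject, whatever the number of categories, and uses only the ordering of the categories rather than numeric scores.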
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:327-3432012-08-31RePEc:oup:biomet
article
Pointwise nonparametric maximum likelihood estimator of stochastically ordered survivor functions
In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests better small and large sample properties than for alternative estimators. An example using prostate cancer data illustrates the method. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
327
343
http://hdl.handle.net/10.1093/biomet/ass006
application/pdf
Access to full text is restricted to subscribers.
Yongseok Park
Jeremy M. G. Taylor
John D. Kalbfleisch
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:569-5832012-08-31RePEc:oup:biomet
article
On multilinear principal component analysis of order-two tensors
Principal component analysis is commonly used for dimension reduction in analysing high-dimensional data. Multilinear principal component analysis aims to serve a similar function for analysing tensor structure data, and has empirically been shown effective in reducing dimensionality. In this paper, we investigate its statistical properties and demonstrate its advantages. Conventional principal component analysis, which vectorizes the tensor data, may lead to inefficient and unstable prediction due to the often extremely large dimensionality involved. Multilinear principal component analysis, in trying to preserve the data structure, searches for low-dimensional projections and, thereby, decreases dimensionality more efficiently. The asymptotic theory of order-two multilinear principal component analysis, including asymptotic efficiency and distributions of principal components, associated projections, and the explained variance, is developed. A test of dimensionality is also proposed. Finally, multilinear principal component analysis is shown to improve conventional principal component analysis in analysing the Olivetti faces dataset, which is achieved by extracting a more modularly oriented basis set in reconstructing the test faces. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
569
583
http://hdl.handle.net/10.1093/biomet/ass019
application/pdf
Access to full text is restricted to subscribers.
Hung Hung
Peishien Wu
Iping Tu
Suyun Huang
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:615-6302012-08-31RePEc:oup:biomet
article
Predictive accuracy of covariates for event times
We propose a graphical measure, the generalized negative predictive function, to quantify the predictive accuracy of covariates for survival time or recurrent event times. This new measure characterizes the event-free probabilities over time conditional on a thresholded linear combination of covariates and has direct clinical utility. We show that this function is maximized at the set of covariates truly related to event times and thus can be used to compare the predictive accuracy of different sets of covariates. We construct nonparametric estimators for this function under right censoring and prove that the proposed estimators, upon proper normalization, converge weakly to zero-mean Gaussian processes. To bypass the estimation of complex density functions involved in the asymptotic variances, we adopt the bootstrap approach and establish its validity. Simulation studies demonstrate that the proposed methods perform well in practical situations. Two clinical studies are presented. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
615
630
http://hdl.handle.net/10.1093/biomet/ass018
application/pdf
Access to full text is restricted to subscribers.
Li Chen
D. Y. Lin
Donglin Zeng
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:741-7472012-08-31RePEc:oup:biomet
article
The fitting of complex parametric models
Consider parametric models that are too complicated to allow calculation of a likelihood but from which observations can be simulated. We examine parameter estimators that are linear functions of a possibly large set of candidate features. A combination of simulations based on a fractional design and sets of discriminant analyses is then used to find an optimal estimator of the vector parameter and its covariance matrix. The procedure is an alternative to the approximate Bayesian computation scheme. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/ass030
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
Christiana Kartsonaki
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:687-7022012-08-31RePEc:oup:biomet
article
Inner envelopes: efficient estimation in multivariate linear regression
In this article we propose a new model, called the inner envelope model, which leads to efficient estimation in the context of multivariate normal linear regression. The asymptotic distribution and the consistency of its maximum likelihood estimators are established. Theoretical results, simulation studies and examples all show that the efficiency gains can be substantial relative to standard methods and to the maximum likelihood estimators from the envelope model introduced recently by Cook et al. (2010). Compared to the envelope model, the inner envelope model is based on a different construction and it can produce substantial efficiency gains in situations where the envelope model offers no gains. In effect, inner envelopes open a new frontier to the way in which reducing subspaces can be used to improve efficiency in multivariate problems. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
687
702
http://hdl.handle.net/10.1093/biomet/ass024
application/pdf
Access to full text is restricted to subscribers.
Zhihua Su
R. Dennis Cook
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:405-4212012-08-31RePEc:oup:biomet
article
Corrected-loss estimation for quantile regression with covariate measurement errors
We study estimation in quantile regression when covariates are measured with errors. Existing methods require stringent assumptions, such as spherically symmetric joint distribution of the regression and measurement error variables, or linearity of all quantile functions, which restrict model flexibility and complicate computation. In this paper, we develop a new estimation approach based on corrected scores to account for a class of covariate measurement errors in quantile regression. The proposed method is simple to implement. Its validity requires only linearity of the particular quantile function of interest, and it requires no parametric assumptions on the regression error distributions. Finite-sample results demonstrate that the proposed estimators are more efficient than the existing methods in various models considered. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
405
421
http://hdl.handle.net/10.1093/biomet/ass005
application/pdf
Access to full text is restricted to subscribers.
Huixia Judy Wang
Leonard A. Stefanski
Zhongyi Zhu
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:649-6622012-08-31RePEc:oup:biomet
article
Modelling covariance structure in bivariate marginal models for longitudinal data
It can be more challenging to efficiently model the covariance matrices for multivariate longitudinal data than for the univariate case, due to the correlations arising between multiple responses. The positive-definiteness constraint and the high dimensionality are further obstacles in covariance modelling. In this paper, we develop a data-based method by which the parameters in the covariance matrices are replaced by unconstrained and interpretable parameters with reduced dimensions. The maximum likelihood estimators for the mean and covariance parameters are shown to be consistent and asymptotically normally distributed. Simulations and real data analysis show that the new approach performs very well even when modelling bivariate nonstationary dependence structures. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
649
662
http://hdl.handle.net/10.1093/biomet/ass031
application/pdf
Access to full text is restricted to subscribers.
Jing Xu
Gilbert Mackenzie
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:511-5312012-08-31RePEc:oup:biomet
article
Nonparametric estimation of diffusions: a differential equations approach
We consider estimation of scalar functions that determine the dynamics of diffusion processes. It has been recently shown that nonparametric maximum likelihood estimation is ill-posed in this context. We adopt a probabilistic approach to regularize the problem by the adoption of a prior distribution for the unknown functional. A Gaussian prior measure is chosen in the function space by specifying its precision operator as an appropriate differential operator. We establish that a Bayesian-Gaussian conjugate analysis for the drift of one-dimensional nonlinear diffusions is feasible using high-frequency data, by expressing the loglikelihood as a quadratic function of the drift, with sufficient statistics given by the local time process and the end points of the observed path. Computationally efficient posterior inference is carried out using a finite element method. We embed this technology in partially observed situations and adopt a data augmentation approach whereby we iteratively generate missing data paths and draws from the unknown functional. Our methodology is applied to estimate the drift of models used in molecular dynamics and financial econometrics using high- and low-frequency observations. We discuss extensions to other partially observed schemes and connections to other types of nonparametric inference. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
511
531
http://hdl.handle.net/10.1093/biomet/ass034
application/pdf
Access to full text is restricted to subscribers.
Omiros Papaspiliopoulos
Yvo Pokern
Gareth O. Roberts
Andrew M. Stuart
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:551-5682012-08-31RePEc:oup:biomet
article
Analysis of principal nested spheres
A general framework for a novel non-geodesic decomposition of high-dimensional spheres or high-dimensional shape spaces for planar landmarks is discussed. The decomposition, principal nested spheres, leads to a sequence of submanifolds with decreasing intrinsic dimensions, which can be interpreted as an analogue of principal component analysis. In a number of real datasets, an apparent one-dimensional mode of variation curving through more than one geodesic component is captured in the one-dimensional component of principal nested spheres. While analysis of principal nested spheres provides an intuitive and flexible decomposition of the high-dimensional sphere, an interesting special case of the analysis results in finding principal geodesics, similar to those from previous approaches to manifold principal component analysis. An adaptation of our method to Kendall's shape space is discussed, and a computational algorithm for fitting principal nested spheres is proposed. The result provides a coordinate system to visualize the data structure and an intuitive summary of principal modes of variation, as exemplified by several datasets. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
551
568
http://hdl.handle.net/10.1093/biomet/ass022
application/pdf
Access to full text is restricted to subscribers.
Sungkyu Jung
Ian L. Dryden
J. S. Marron
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:502-5082012-08-31RePEc:oup:biomet
article
Inference for additive interaction under exposure misclassification
Results are given concerning inferences that can be drawn about interaction when binary exposures are subject to certain forms of independent nondifferential misclassification. Tests for interaction, using the misclassified exposures, are valid provided the probability of misclassification satisfies certain bounds. Results are given for additive statistical interactions, for causal interactions corresponding to synergism in the sufficient cause framework and for so-called compositional epistasis. Both two-way and three-way interactions are considered. The results require only that the probability of misclassification be no larger than 1/2 or 1/4, depending on the test. For additive statistical interaction, a method to correct estimates and confidence intervals for misclassification is described. The consequences for power of interaction tests under exposure misclassification are explored through simulations. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
502
508
http://hdl.handle.net/10.1093/biomet/ass012
application/pdf
Access to full text is restricted to subscribers.
Tyler J. Vanderweele
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:631-6482012-08-31RePEc:oup:biomet
article
An efficient method of estimation for longitudinal surveys with monotone missing data
Panel attrition is frequently encountered in panel sample surveys. When it is related to the observed study variable, the classical approach of nonresponse adjustment using a covariate-dependent dropout mechanism can be biased. We consider an efficient method of estimation with monotone panel attrition when the response probability depends on the previous values of the study variable as well as other covariates. Because of the monotone structure of the missing pattern, the response mechanism is missing at random. The proposed estimator is asymptotically optimal in the sense that it minimizes the asymptotic variance of a class of estimators that can be written as a linear combination of the unbiased estimators of the panel estimates for each wave, and incorporates all available information using generalized least squares. Variance estimation is discussed and results from a simulation study are presented. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
631
648
http://hdl.handle.net/10.1093/biomet/ass026
application/pdf
Access to full text is restricted to subscribers.
Ming Zhou
Jae Kwang Kim
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:457-4722012-08-31RePEc:oup:biomet
article
Empirical bootstrap bias correction and estimation of prediction mean square error in small area estimation
We develop a method for bias correction, which models the error of the target estimator as a function of the corresponding estimator obtained from bootstrap samples, and the original estimators and bootstrap estimators of the parameters governing the model fitted to the sample data. This is achieved by considering a number of plausible parameter values, generating a pseudo original sample for each parameter and bootstrap samples for each such sample, and then searching for an appropriate functional relationship. Under certain conditions, the procedure also permits estimation of the mean square error of the bias corrected estimator. The method is applied for estimating the prediction mean square error in small area estimation of proportions under a generalized mixed model. Empirical comparisons with jackknife and bootstrap methods are presented. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
457
472
http://hdl.handle.net/10.1093/biomet/ass010
application/pdf
Access to full text is restricted to subscribers.
D. Pfeffermann
S. Correa
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:703-7162012-08-31RePEc:oup:biomet
article
Penalized empirical likelihood and growing dimensional general estimating equations
When a parametric likelihood function is not specified for a model, estimating equations may provide an instrument for statistical inference. Qin and Lawless (1994) illustrated that empirical likelihood makes optimal use of these equations in inferences for fixed low-dimensional unknown parameters. In this paper, we study empirical likelihood for general estimating equations with growing high dimensionality and propose a penalized empirical likelihood approach for parameter estimation and variable selection. We quantify the asymptotic properties of empirical likelihood and its penalized version, and show that penalized empirical likelihood has the oracle property. The performance of the proposed method is illustrated via simulated applications and a data analysis. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
703
716
http://hdl.handle.net/10.1093/biomet/ass014
application/pdf
Access to full text is restricted to subscribers.
Chenlei Leng
Cheng Yong Tang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:481-4872012-08-31RePEc:oup:biomet
article
Structuring shrinkage: some correlated priors for regression
This paper develops a rich class of sparsity priors for regression effects that encourage shrinkage of both regression effects and contrasts between effects to zero whilst leaving sizeable real effects largely unshrunk. The construction of these priors uses some properties of normal-gamma distributions to include design features in the prior specification, but has general relevance to any continuous sparsity prior. Specific prior distributions are developed for serial dependence between regression effects and correlation within groups of regression effects. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
481
487
http://hdl.handle.net/10.1093/biomet/asr082
application/pdf
Access to full text is restricted to subscribers.
J. E. Griffin
P. J. Brown
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:733-7402012-08-31RePEc:oup:biomet
article
Positive definite estimators of large covariance matrices
Using convex optimization, we construct a sparse estimator of the covariance matrix that is positive definite and performs well in high-dimensional settings. A lasso-type penalty is used to encourage sparsity and a logarithmic barrier function is used to enforce positive definiteness. Consistency and convergence rate bounds are established as both the number of variables and sample size diverge. An efficient computational algorithm is developed and the merits of the approach are illustrated with simulations and a speech signal classification example. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
733
740
http://hdl.handle.net/10.1093/biomet/ass025
application/pdf
Access to full text is restricted to subscribers.
Adam J. Rothman
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:299-3132012-08-31RePEc:oup:biomet
article
Componentwise classification and clustering of functional data
The infinite dimension of functional data can challenge conventional methods for classification and clustering. A variety of techniques have been introduced to address this problem, particularly in the case of prediction, but the structural models that they involve can be too inaccurate, or too abstract, or too difficult to interpret, for practitioners. In this paper, we develop approaches to adaptively choose components, enabling classification and clustering to be reduced to finite-dimensional problems. We explore and discuss properties of these methodologies. Our techniques involve methods for estimating classifier error rate and cluster tightness, and for choosing both the number of components, and their locations, to optimize these quantities. A major attraction of this approach is that it allows identification of parts of the function domain that convey important information for classification and clustering. It also permits us to determine regions that are relevant to one of these analyses but not the other. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
299
313
http://hdl.handle.net/10.1093/biomet/ass003
application/pdf
Access to full text is restricted to subscribers.
A. Delaigle
P. Hall
N. Bathia
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:599-6132012-08-31RePEc:oup:biomet
article
Nonparametric incidence estimation from prevalent cohort survival data
Incidence is an important epidemiological concept most suitably studied using an incident cohort study. However, data are often collected from the more feasible prevalent cohort study, whereby diseased individuals are recruited through a cross-sectional survey and followed in time. In the absence of temporal trends in survival, we derive an efficient nonparametric estimator of the cumulative incidence based on such data and study its asymptotic properties. Arbitrary calendar time variations in disease incidence are allowed. Age-specific incidence and adjustments for both stratified sampling and temporal variations in survival are also discussed. Simulation results are presented and data from the Canadian Study of Health and Aging are analysed to infer the incidence of dementia in the Canadian elderly population. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
599
613
http://hdl.handle.net/10.1093/biomet/ass017
application/pdf
Access to full text is restricted to subscribers.
Marco Carone
Masoud Asgharian
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:273-2842012-08-31RePEc:oup:biomet
article
Stochastic blockmodels with a growing number of classes
We present asymptotic and finite-sample results on the use of stochastic blockmodels for the analysis of network data. We show that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least poly-logarithmically in this size. We also establish finite-sample confidence bounds on maximum-likelihood blockmodel parameter estimates from data comprising independent Bernoulli random variates; these results hold uniformly over class assignment. We provide simulations verifying the conditions sufficient for our results, and conclude by fitting a logit parameterization of a stochastic blockmodel with covariates to a network data example comprising self-reported school friendships, resulting in block estimates that reveal residual structure. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
273
284
http://hdl.handle.net/10.1093/biomet/asr053
application/pdf
Access to full text is restricted to subscribers.
D. S. Choi
P. J. Wolfe
E. M. Airoldi
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:717-7312012-08-31RePEc:oup:biomet
article
On the robustness of the adaptive lasso to model misspecification
Penalization methods have been shown to yield both consistent variable selection and oracle parameter estimation under correct model specification. In this article, we study such methods under model misspecification, where the assumed form of the regression function is incorrect, including generalized linear models for uncensored outcomes and the proportional hazards model for censored responses. Estimation with the adaptive least absolute shrinkage and selection operator, lasso, penalty is proven to achieve sparse estimation of regression coefficients under misspecification. The resulting estimators are selection consistent, asymptotically normal and oracle, where the selection is based on the limiting values of the parameter estimators obtained using the misspecified model without penalization. We further derive conditions under which the penalized estimators from the misspecified model may yield selection consistency under the true model. The robustness is explored numerically via simulation and an application to the Wisconsin Epidemiological Study of Diabetic Retinopathy. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
717
731
http://hdl.handle.net/10.1093/biomet/ass027
application/pdf
Access to full text is restricted to subscribers.
W. Lu
Y. Goldberg
J. P. Fine
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:315-3252012-08-31RePEc:oup:biomet
article
Global optimality of nonconvex penalized estimators
Nonconvex penalties such as the smoothly clipped absolute deviation or minimax concave penalties have desirable properties such as the oracle property, even when the dimension of the predictive variables is large. However, checking whether a given local minimizer has such properties is not easy since there can be many local minimizers. In this paper, we give sufficient conditions under which a local minimizer is unique, and show that the oracle estimator becomes the unique local minimizer with probability tending to one. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
315
325
http://hdl.handle.net/10.1093/biomet/asr084
application/pdf
Access to full text is restricted to subscribers.
Yongdai Kim
Sunghoon Kwon
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:379-3922012-08-31RePEc:oup:biomet
article
Efficient estimation for the Cox model with varying coefficients
A proportional hazards model with varying coefficients allows one to examine the extent to which covariates interact nonlinearly with an exposure variable. A global partial likelihood method, in contrast with the local partial likelihood method of Fan et al. (2006), is proposed for estimation of varying coefficient functions. The proposed estimators are proved to be consistent and asymptotically normal. Semiparametric efficiency of the estimators is demonstrated in terms of their linear functionals. Evidence in support of the superiority of the method is presented in numerical studies and real examples. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
379
392
http://hdl.handle.net/10.1093/biomet/asr081
application/pdf
Access to full text is restricted to subscribers.
Kani Chen
Huazhen Lin
Yong Zhou
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:439-4562012-08-31RePEc:oup:biomet
article
Improved double-robust estimation in missing data and causal inference models
Recently proposed double-robust estimators for a population mean from incomplete data and for a finite number of counterfactual means can have much higher efficiency than the usual double-robust estimators under misspecification of the outcome model. In this paper, we derive a new class of double-robust estimators for the parameters of regression models with incomplete cross-sectional or longitudinal data, and of marginal structural mean models for cross-sectional data with similar efficiency properties. Unlike the recent proposals, our estimators solve outcome regression estimating equations. In a simulation study, the new estimator shows improvements in variance relative to the standard double-robust estimator that are in agreement with those suggested by asymptotic theory. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
439
456
http://hdl.handle.net/10.1093/biomet/ass013
application/pdf
Access to full text is restricted to subscribers.
Andrea Rotnitzky
Quanhong Lei
Mariela Sued
James M. Robins
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:285-2982012-08-31RePEc:oup:biomet
article
Doubly misspecified models
Estimation bias arising from local model uncertainty and incomplete data has been studied by Copas & Eguchi (2005) under the assumption of a correctly specified marginal model. We extend the approach to allow additional local uncertainty in the assumed marginal model, arguing that this is almost unavoidable for nonlinear problems. We present a general bias analysis and sensitivity procedure for such doubly misspecified models and illustrate the breadth of application through three examples: logistic regression with a missing confounder, measurement error for binary responses and survival analysis with frailty. We show that a double-the-variance rule is not conservative under double misspecification. The ideas are brought together in a meta-analysis of studies of rehabilitation rates for juvenile offenders. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
285
298
http://hdl.handle.net/10.1093/biomet/asr085
application/pdf
Access to full text is restricted to subscribers.
N. X. Lin
J. Q. Shi
R. Henderson
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:393-4042012-08-31RePEc:oup:biomet
article
Nonparametric inference for assessing treatment efficacy in randomized clinical trials with a time-to-event outcome and all-or-none compliance
To evaluate the biological efficacy of a treatment in a randomized clinical trial, one needs to compare patients in the treatment arm who actually received treatment with the subgroup of patients in the control arm who would have received treatment had they been randomized into the treatment arm. In practice, subgroup membership in the control arm is usually unobservable. This paper develops a nonparametric inference procedure to compare subgroup probabilities with right-censored time-to-event data and unobservable subgroup membership in the control arm. We also present a procedure to estimate the onset and duration of treatment effect. The performance of our method is evaluated by simulation. An illustration is given using a randomized clinical trial for melanoma. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
393
404
http://hdl.handle.net/10.1093/biomet/ass004
application/pdf
Access to full text is restricted to subscribers.
Robert M. Elashoff
Gang Li
Ying Zhou
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:345-3612012-08-31RePEc:oup:biomet
article
Analysing bivariate survival data with interval sampling and application to cancer epidemiology
In biomedical studies, ordered bivariate survival data are frequently encountered when bivariate failure events are used as outcomes to identify the progression of a disease. In cancer studies, interest could be focused on bivariate failure times, for example, time from birth to cancer onset and time from cancer onset to death. This paper considers a sampling scheme, termed interval sampling, in which the first failure event is identified within a calendar time interval, the time of the initiating event can be retrospectively confirmed and the occurrence of the second failure event is observed subject to right censoring. In a cancer data application, the initiating, first and second events could correspond to birth, cancer onset and death. The fact that the data are collected conditional on the first failure event occurring within a time interval induces bias. Interval sampling is widely used for collection of disease registry data by governments and medical institutions, though the interval sampling bias is frequently overlooked by researchers. This paper develops statistical methods for analysing such data. Semiparametric methods are proposed under semi-stationarity and stationarity. Numerical studies demonstrate that the proposed estimation approaches perform well with moderate sample sizes. We apply the proposed methods to ovarian cancer registry data. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
345
361
http://hdl.handle.net/10.1093/biomet/ass009
application/pdf
Access to full text is restricted to subscribers.
Hong Zhu
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:488-4932012-08-31RePEc:oup:biomet
article
Information dynamics and optimal sampling in capture-recapture
The build-up of information in a continued capture-recapture experiment of simple random sampling of an open population is studied by predicting the conditional approximate Fisher information for abundance in data from one survey given the previous data. By neglecting the stochasticity in survival, a simple approximate likelihood is obtained. Optimal temporal allocation of a given total effort is found by numerical optimization for various objective functions based on the approximate Fisher information. For aerial photographic surveys of bowhead whales, the performance of estimates of abundance and of demographic parameters is compared between constant yearly survey effort and nominally optimal sampling by simulating a realistic model over 50 years. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
488
493
http://hdl.handle.net/10.1093/biomet/ass001
application/pdf
Access to full text is restricted to subscribers.
T. Schweder
D. Sadykova
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:675-6862012-08-31RePEc:oup:biomet
article
Objective Bayes, conditional inference and the signed root likelihood ratio statistic
Bayesian properties of the signed root likelihood ratio statistic are analysed. Conditions for first-order probability matching are derived by the examination of the Bayesian posterior and frequentist means of this statistic. Second-order matching conditions are shown to arise from matching of the Bayesian posterior and frequentist variances of a mean-adjusted version of the signed root statistic. Conditions for conditional probability matching in ancillary statistic models are derived and discussed. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
675
686
http://hdl.handle.net/10.1093/biomet/ass028
application/pdf
Access to full text is restricted to subscribers.
Thomas J. Diciccio
Todd A. Kuffner
G. Alastair Young
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:755-7622012-08-31RePEc:oup:biomet
article
Quadratic inference function approach to merging longitudinal studies: validation and joint estimation
Merging data from multiple studies has been widely adopted in biomedical research. In this paper, we consider two major issues related to merging longitudinal datasets. We first develop a rigorous hypothesis testing procedure to assess the validity of data merging, and then propose a flexible joint estimation procedure that enables us to analyse merged data and to account for different within-subject correlations and follow-up schedules in different studies. We establish large sample properties for the proposed procedures. We compare our method with meta-analysis and generalized estimating equations and show that our test provides robust control of Type I error against both misspecification of working correlation structures and heterogeneous dispersion parameters. Our joint estimating procedure leads to an improvement in estimation efficiency on all regression coefficients after data merging is validated. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
755
762
http://hdl.handle.net/10.1093/biomet/ass021
application/pdf
Access to full text is restricted to subscribers.
Fei Wang
Lu Wang
Peter X.-K. Song
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:423-4382012-08-31RePEc:oup:biomet
article
Multiple imputation in quantile regression
We propose a multiple imputation estimator for parameter estimation in a quantile regression model when some covariates are missing at random. The estimation procedure fully utilizes the entire dataset to achieve increased efficiency, and the resulting coefficient estimators are root-n consistent and asymptotically normal. To protect against possible model misspecification, we further propose a shrinkage estimator, which automatically adjusts for possible bias. The finite sample performance of our estimator is investigated in a simulation study. Finally, we apply our methodology to part of the Eating at America's Table Study data, investigating the association between two measures of dietary intake. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
423
438
http://hdl.handle.net/10.1093/biomet/ass007
application/pdf
Access to full text is restricted to subscribers.
Ying Wei
Yanyuan Ma
Raymond J. Carroll
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:494-5012012-08-31RePEc:oup:biomet
article
A generalized Dunnett test for multi-arm multi-stage clinical studies with treatment selection
We generalize the Dunnett test to derive efficacy and futility boundaries for a flexible multi-arm multi-stage clinical trial for a normally distributed endpoint with known variance. We show that the boundaries control the familywise error rate in the strong sense. The method is applicable for any number of treatment arms, number of stages and number of patients per treatment per stage. It can be used for a wide variety of boundary types or rules derived from α-spending functions. Additionally, we show how sample size can be computed under a least favourable configuration power requirement and derive formulae for expected sample sizes. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
494
501
http://hdl.handle.net/10.1093/biomet/ass002
application/pdf
Access to full text is restricted to subscribers.
D. Magirr
T. Jaki
J. Whitehead
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:509-5092012-08-31RePEc:oup:biomet
article
'On measuring the variability of small area estimators under a basic area level model'
2
2012
99
Biometrika
509
509
http://hdl.handle.net/10.1093/biomet/ass016
application/pdf
Access to full text is restricted to subscribers.
Gauri Sankar Datta
J. N. K. Rao
David Daniel Smith
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:385-3982013-08-02RePEc:oup:biomet
article
Weighting in survey analysis under informative sampling
Sampling related to the outcome variable of a regression analysis conditional on covariates is called informative sampling and may lead to bias in ordinary least squares estimation. Weighting by the reciprocal of the inclusion probability approximately removes such bias but may inflate variance. This paper investigates two ways of modifying such weights to improve efficiency while retaining consistency. One approach is to multiply the inverse probability weights by functions of the covariates. The second is to smooth the weights given values of the outcome variable and covariates. Optimal ways of constructing weights by these two approaches are explored. Both approaches require the fitting of auxiliary weight models. The asymptotic properties of the resulting estimators are investigated and linearization variance estimators are obtained. The approach is extended to pseudo maximum likelihood estimation for generalized linear models. The properties of the different weighted estimators are compared in a limited simulation study. The robustness of the estimators to misspecification of the auxiliary weight model or of the regression model of interest is discussed. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
385
398
http://hdl.handle.net/10.1093/biomet/ass085
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
C. J. Skinner
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:519-5242013-08-02RePEc:oup:biomet
article
A central limit theorem in the β-model for undirected random graphs with a diverging number of vertices
Chatterjee et al. (2011) established the consistency of the maximum likelihood estimator in the β-model for undirected random graphs when the number of vertices goes to infinity. By approximating the inverse of the Fisher information matrix, we prove asymptotic normality of the maximum likelihood estimator under mild conditions. Simulation studies and a data example illustrate the theoretical results. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
519
524
http://hdl.handle.net/10.1093/biomet/ass084
application/pdf
Access to full text is restricted to subscribers.
Ting Yan
Jinfeng Xu
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:485-4942013-08-02RePEc:oup:biomet
article
Log-mean linear models for binary data
This paper introduces a novel class of models for binary data, which we call log-mean linear models. They are specified by linear constraints on the log-mean linear parameter, defined as a log-linear expansion of the mean parameter of the multivariate Bernoulli distribution. We show that marginal independence relationships between variables can be specified by setting certain log-mean linear interactions to zero and, more specifically, that graphical models of marginal independence are log-mean linear models. Our approach overcomes some drawbacks of the existing parameterizations of graphical models of marginal independence. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
485
494
http://hdl.handle.net/10.1093/biomet/ass080
application/pdf
Access to full text is restricted to subscribers.
A. Roverato
M. Lupparelli
L. La Rocca
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:473-4842013-08-02RePEc:oup:biomet
article
The role of the range parameter for estimation and prediction in geostatistics
Two canonical problems in geostatistics are estimating the parameters in a specified family of stochastic process models and predicting the process at new locations. We show that asymptotic results for a Gaussian process over a fixed domain with Matérn covariance function, previously proven only in the case of a fixed range parameter, can be extended to the case of jointly estimating the range and the variance of the process. Moreover, we show that intuition and approximations derived from asymptotics using a fixed range parameter can be problematic when applied to finite samples, even for large sample sizes. In contrast, we show via simulation that performance is improved and asymptotic approximations are applicable for smaller sample sizes when the parameters are jointly estimated. These effects are particularly apparent when the process is mean square differentiable or the effective range of spatial correlation is small. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
473
484
http://hdl.handle.net/10.1093/biomet/ass079
application/pdf
Access to full text is restricted to subscribers.
C. G. Kaufman
B. A. Shaby
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:269-2762013-08-02RePEc:oup:biomet
article
Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation
We show that the proportional likelihood ratio model proposed recently by Luo & Tsai (2012) enjoys model-invariant properties under certain forms of nonignorable missing mechanisms and randomly double-truncated data, so that target parameters in the population can be estimated consistently from those biased samples. We also construct an alternative estimator for the target parameters by maximizing a pseudolikelihood that eliminates a functional nuisance parameter in the model. The corresponding estimating equation has a U-statistic structure. As an added advantage of the proposed method, a simple score-type test is developed to test a null hypothesis on the regression coefficients. Simulations show that the proposed estimator has a small-sample efficiency similar to that of the nonparametric likelihood estimator and performs well for certain nonignorable missing data problems. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
269
276
http://hdl.handle.net/10.1093/biomet/ass056
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:431-4452013-08-02RePEc:oup:biomet
article
Simple tiered classifiers
In this paper we propose simple, general tiered classifiers for relatively complex data. Empirical studies on real and simulated data show that three two-tier classifiers, which are respective extensions of linear discriminant analysis, linear logistic regression and support vector machines, can reduce noticeably the relatively high misclassification error of their original single-tier counterparts, without significantly increasing computational labour. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
431
445
http://hdl.handle.net/10.1093/biomet/ass086
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Yingcun Xia
Jing-Hao Xue
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:417-4302013-08-02RePEc:oup:biomet
article
Estimation with missing data: beyond double robustness
We propose an estimator that is more robust than doubly robust estimators, based on weighting complete cases using weights other than inverse probability when estimating the population mean of a response variable subject to ignorable missingness. We allow multiple models for both the propensity score and the outcome regression. Our estimator is consistent if any of the multiple models is correctly specified. Such multiple robustness against model misspecification is a significant improvement over double robustness, which allows only one propensity score model and one outcome regression model. Our estimator attains the semiparametric efficiency bound when one propensity score model and one outcome regression model are correctly specified, without requiring knowledge of which models are correct. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
417
430
http://hdl.handle.net/10.1093/biomet/ass087
application/pdf
Access to full text is restricted to subscribers.
Peisong Han
Lu Wang
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:91-1102013-08-02RePEc:oup:biomet
article
Sampling decomposable graphs using a Markov chain on junction trees
Full Bayesian computational inference for model determination in undirected graphical models is currently restricted to decomposable graphs or other special cases, except for small-scale problems, say up to 15 variables. In this paper we develop new, more efficient methodology for such inference, by making two contributions to the computational geometry of decomposable graphs. The first of these provides sufficient conditions under which it is possible to completely connect two disconnected complete subsets of vertices, or perform the reverse procedure, yet maintain decomposability of the graph. The second is a new Markov chain Monte Carlo sampler for arbitrary positive distributions on decomposable graphs, taking a junction tree representing the graph as its state variable. The resulting methodology is illustrated with numerical experiments on three models. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
91
110
http://hdl.handle.net/10.1093/biomet/ass052
application/pdf
Access to full text is restricted to subscribers.
Peter J. Green
Alun Thomas
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:3-152013-08-02RePEc:oup:biomet
article
Karl Pearson's Biometrika: 1901–36
Karl Pearson edited Biometrika for the first 35 years of its existence. Not only did he shape the journal, he also contributed over 200 pieces and inspired, more or less directly, most of the other contributions. The journal could not be separated from the man. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
3
15
http://hdl.handle.net/10.1093/biomet/ass077
application/pdf
Access to full text is restricted to subscribers.
John Aldrich
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:525-5302013-08-02RePEc:oup:biomet
article
Efficient estimation of the censored linear regression model
In linear regression or accelerated failure time models, complications in efficient estimation arise from the multiple roots of the efficient score and density estimation. This paper proposes a one-step efficient estimation method based on a counting process martingale, which has several advantages: it avoids the multiple-root problem, the initial estimator is easily available and the variance estimator can be obtained by employing plug-in rules. A simple and effective data-driven bandwidth selector is provided. The proposed estimator is proved to be semiparametric efficient, with the same asymptotic variance as the efficient estimator when the error distribution is known up to a location shift. Numerical studies with supportive evidence are presented. The proposal is applied to the Colorado Plateau uranium miners data. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
525
530
http://hdl.handle.net/10.1093/biomet/ass073
application/pdf
Access to full text is restricted to subscribers.
Yuanyuan Lin
Kani Chen
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:511-5182013-08-02RePEc:oup:biomet
article
Composite likelihood estimation for the Brown--Resnick process
Genton et al. (2011) investigated the gain in efficiency when triplewise, rather than pairwise, likelihood is used to fit the popular Smith max-stable model for spatial extremes. We generalize their results to the Brown--Resnick model and show that the efficiency gain is substantial only for very smooth processes, which are generally unrealistic in applications. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
511
518
http://hdl.handle.net/10.1093/biomet/ass089
application/pdf
Access to full text is restricted to subscribers.
R. Huser
A. C. Davison
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:319-3382013-08-02RePEc:oup:biomet
article
Using shared genetic controls in studies of gene-environment interactions
With the advent of modern genomic methods to adjust for population stratification, the use of external or publicly available controls has become an attractive option for reducing the cost of large-scale case-control genetic association studies. In this article, we study the estimation of joint effects of genetic and environmental exposures from a case-control study where data on genome-wide markers are available on the cases and a set of external controls while data on environmental exposures are available on the cases and a set of internal controls. We show that under such a design, one can exploit an assumption of gene-environment independence in the underlying population to estimate the gene-environment joint effects, after adjustment for population stratification. We develop a semiparametric profile likelihood method and related pseudolikelihood and working likelihood methods that are easy to implement in practice. We propose variance estimators for the methods based on asymptotic theory. Simulation is used to study the performance of the methods, and data from a multi-centre genome-wide association study of bladder cancer is further used to illustrate their application. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
319
338
http://hdl.handle.net/10.1093/biomet/ass078
application/pdf
Access to full text is restricted to subscribers.
Yi-Hau Chen
Nilanjan Chatterjee
Raymond J. Carroll
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:213-2202013-08-02RePEc:oup:biomet
article
Spatially varying cross-correlation coefficients in the presence of nugget effects
We derive sufficient conditions for the cross-correlation coefficient of a multivariate spatial process to vary with location when the spatial model is augmented with nugget effects. The derived class is valid for any choice of covariance functions, and yields substantial flexibility between multiple processes. The key is to identify the cross-correlation coefficient matrix with a contraction matrix, which can be either diagonal, implying a parsimonious formulation, or a fully general contraction matrix, yielding greater flexibility but added model complexity. We illustrate the approach with a bivariate minimum and maximum temperature dataset in Colorado, allowing the two variables to be positively correlated at low elevations and nearly independent at high elevations, while still yielding a positive definite covariance matrix. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
213
220
http://hdl.handle.net/10.1093/biomet/ass057
application/pdf
Access to full text is restricted to subscribers.
William Kleiber
Marc G. Genton
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:189-2022013-08-02RePEc:oup:biomet
article
Benchmarking small area estimators
This paper considers benchmarking issues in the context of small area estimation. We find optimal estimators within the class of benchmarked linear estimators under linear constraints. This extends existing results for external and internal benchmarking, and also links the two. Necessary and sufficient conditions for self-benchmarking are found for an augmented model. Most results of this paper are found using ideas of orthogonal projection. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
189
202
http://hdl.handle.net/10.1093/biomet/ass063
application/pdf
Access to full text is restricted to subscribers.
W. R. Bell
G. S. Datta
M. Ghosh
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:339-3542013-08-02RePEc:oup:biomet
article
Estimating time-varying effects for overdispersed recurrent events data with treatment switching
In the analysis of multivariate event times, frailty models assuming time-independent regression coefficients are often considered, mainly due to their mathematical convenience. In practice, regression coefficients are often time dependent and the temporal effects are of clinical interest. Motivated by a phase III clinical trial in multiple sclerosis, we develop a semiparametric frailty modelling approach to estimate time-varying effects for overdispersed recurrent events data with treatment switching. The proposed model incorporates the treatment switching time in the time-varying coefficients. Theoretical properties of the proposed model are established and an efficient expectation-maximization algorithm is derived to obtain the maximum likelihood estimates. Simulation studies evaluate the numerical performance of the proposed model under various temporal treatment effect curves. The ideas in this paper can also be used for time-varying coefficient frailty models without treatment switching as well as for alternative models when the proportional hazard assumption is violated. A multiple sclerosis dataset is analysed to illustrate our methodology. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
339
354
http://hdl.handle.net/10.1093/biomet/ass091
application/pdf
Access to full text is restricted to subscribers.
Qingxia Chen
Donglin Zeng
Joseph G. Ibrahim
Mouna Akacha
Heinz Schmidli
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:229-2342013-08-02RePEc:oup:biomet
article
The Kolmogorov filter for variable screening in high-dimensional binary classification
Variable screening techniques have been proposed to mitigate the impact of high dimensionality in classification problems, including t-test marginal screening (Fan & Fan, 2008) and maximum marginal likelihood screening (Fan & Song, 2010). However, these methods rely on strong modelling assumptions that are easily violated in real applications. To circumvent the parametric modelling assumptions, we propose a new variable screening technique for binary classification based on the Kolmogorov--Smirnov statistic. We prove that this so-called Kolmogorov filter enjoys the sure screening property under much weakened model assumptions. We supplement our theoretical study by a simulation study. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
229
234
http://hdl.handle.net/10.1093/biomet/ass062
application/pdf
Access to full text is restricted to subscribers.
Qing Mai
Hui Zou
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:459-4712013-08-02RePEc:oup:biomet
article
Data augmentation for non-Gaussian regression models using variance-mean mixtures
We use the theory of normal variance-mean mixtures to derive a data-augmentation scheme for a class of common regularization problems. This generalizes existing theory on normal variance mixtures for priors in regression and classification. It also allows variants of the expectation-maximization algorithm to be brought to bear on a wider range of models than previously appreciated. We demonstrate the method on several examples, focusing on the case of binary logistic regression. We also show that quasi-Newton acceleration can substantially improve the speed of the algorithm without compromising its robustness. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
459
471
http://hdl.handle.net/10.1093/biomet/ass081
application/pdf
Access to full text is restricted to subscribers.
N. G. Polson
J. G. Scott
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:1-12013-08-02RePEc:oup:biomet
article
Editorial
1
2013
100
Biometrika
1
1
http://hdl.handle.net/10.1093/biomet/ast003
application/pdf
Access to full text is restricted to subscribers.
A. C. Davison
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:249-2532013-08-02RePEc:oup:biomet
article
Blocked two-level regular factorial designs with weak minimum aberration
This paper considers the construction of blocked two-level regular designs with weak minimum aberration. We first obtain the minimum value of the number of two-factor interactions which are aliased with the block effects. Based on this result, two methods are then proposed in two different scenarios to construct weak minimum aberration blocked two-level designs with respect to some existing combined wordlength patterns. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
249
253
http://hdl.handle.net/10.1093/biomet/ass061
application/pdf
Access to full text is restricted to subscribers.
Shengli Zhao
Pengfei Li
Rohana Karunamuni
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:241-2482013-08-02RePEc:oup:biomet
article
Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders
Suppose we are interested in the effect of a binary treatment on an outcome where that relationship is confounded by an ordinal confounder. We assume that the true confounder is not observed but, rather, we observe a nondifferentially mismeasured version of it. We show that, under certain monotonicity assumptions about its effect on the treatment and on the outcome, an effect measure controlling for the mismeasured confounder will fall between the corresponding crude and true effect measures. We also present results for coarsened and, under further assumptions, multiple misclassified confounders. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
241
248
http://hdl.handle.net/10.1093/biomet/ass054
application/pdf
Access to full text is restricted to subscribers.
Elizabeth L. Ogburn
Tyler J. VanderWeele
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:157-1722013-08-02RePEc:oup:biomet
article
Simultaneous discovery of rare and common segment variants
Copy number variant is an important type of genetic structural variation appearing in germline DNA, ranging from common to rare in a population. Both rare and common copy number variants have been reported to be associated with complex diseases, so it is important to identify both simultaneously based on a large set of population samples. We develop a proportion adaptive segment selection procedure that automatically adjusts to the unknown proportions of the carriers of the segment variants. We characterize the detection boundary that separates the region where a segment variant is detectable by some method from the region where it cannot be detected. Although the detection boundaries are very different for the rare and common segment variants, it is shown that the proposed procedure can reliably identify both whenever they are detectable. Compared with methods for single-sample analysis, this procedure gains power by pooling information from multiple samples. The method is applied to analyse neuroblastoma samples and identifies a large number of copy number variants that are missed by single-sample methods. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
157
172
http://hdl.handle.net/10.1093/biomet/ass059
application/pdf
Access to full text is restricted to subscribers.
X. Jessie Jeng
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:447-4582013-08-02RePEc:oup:biomet
article
Penalized multivariate Whittle likelihood for power spectrum estimation
Nonparametric estimation procedures that can flexibly account for varying levels of smoothness among different functional parameters, such as penalized likelihoods, have been developed in a variety of settings. However, geometric constraints on power spectra have limited the development of such methods when estimating the power spectrum of a vector-valued time series. This article introduces a penalized likelihood approach to nonparametric multivariate spectral analysis through the minimization of a penalized Whittle negative loglikelihood. This likelihood is derived from the large-sample distribution of the periodogram and includes a penalty function that forms a measure of regularity on multivariate power spectra. The approach allows for varying levels of smoothness among spectral components while accounting for the positive definiteness of spectral matrices and the Hermitian and periodic structures of power spectra as functions of frequency. The consistency of the proposed estimator is derived and its empirical performance is demonstrated in a simulation study and in an analysis of indoor air quality. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
447
458
http://hdl.handle.net/10.1093/biomet/ass088
application/pdf
Access to full text is restricted to subscribers.
Robert T. Krafty
William O. Collinge
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:203-2122013-08-02RePEc:oup:biomet
article
Unified inference for sparse and dense longitudinal models
In longitudinal data analysis, statistical inference for sparse data and dense data could be substantially different. For kernel smoothing, the estimate of the mean function, the convergence rates and the limiting variance functions are different in the two scenarios. This phenomenon poses challenges for statistical inference, as a subjective choice between the sparse and dense cases may lead to wrong conclusions. We develop methods based on self-normalization that can adapt to the sparse and dense cases in a unified framework. Simulations show that the proposed methods outperform some existing methods. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
203
212
http://hdl.handle.net/10.1093/biomet/ass050
application/pdf
Access to full text is restricted to subscribers.
Seonjin Kim
Zhibiao Zhao
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:235-2402013-08-02RePEc:oup:biomet
article
Interval estimation of population means under unknown but bounded probabilities of sample selection
Applying concepts from partial identification to the domain of finite population sampling, we propose a method for interval estimation of a population mean when the probabilities of sample selection lie within a posited interval. The interval estimate is derived from sharp bounds on the Hajek (1971) estimator of the population mean. We demonstrate the method's utility for sensitivity analysis by applying it to a sample of needles collected as part of a syringe tracking and testing programme in New Haven, Connecticut. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
235
240
http://hdl.handle.net/10.1093/biomet/ass064
application/pdf
Access to full text is restricted to subscribers.
Peter M. Aronow
Donald K. K. Lee
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:75-892013-08-02RePEc:oup:biomet
article
Efficient Gaussian process regression for large datasets
Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typically on the order of n^3 where n is the number of data points, in performing the necessary matrix inversions. For large datasets, storage and processing also lead to computational bottlenecks, and numerical stability of the estimates and predicted values degrades with increasing n. Various methods have been proposed to address these problems, including predictive processes in spatial data analysis and the subset-of-regressors technique in machine learning. The idea underlying these approaches is to use a subset of the data, but this raises questions concerning sensitivity to the choice of subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
75
89
http://hdl.handle.net/10.1093/biomet/ass068
application/pdf
Access to full text is restricted to subscribers.
Anjishnu Banerjee
David B. Dunson
Surya T. Tokdar
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:399-4152013-08-02RePEc:oup:biomet
article
Simple design-efficient calibration estimators for rejective and high-entropy sampling
For survey calibration, consider the situation where the population totals of auxiliary variables are known or where auxiliary variables are measured for all population units. For each situation, we develop design-efficient calibration estimators under rejective or high-entropy sampling. A general approach is to extend efficient estimators for missing-data problems with independent and identically distributed data to the survey setting. We show that this approach effectively resolves two long-standing issues in existing approaches: how to achieve design efficiency regardless of a linear superpopulation model in generalized regression and calibration estimation, and how to find a simple approximation in optimal regression estimation. Moreover, the proposed approach sheds light on several issues that seem not to be well studied in the literature. Examples include use of the weighted Kullback--Leibler distance in calibration estimation, and efficient estimation allowing for misspecification of a nonlinear superpopulation model. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
399
415
http://hdl.handle.net/10.1093/biomet/ass090
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:355-3702013-08-02RePEc:oup:biomet
article
Estimation of a sparse group of sparse vectors
We consider estimating a sparse group of sparse normal mean vectors, based on penalized likelihood estimation with complexity penalties on the number of nonzero mean vectors and the numbers of their significant components, which can be performed by a fast algorithm. The resulting estimators are developed within a Bayesian framework and can be viewed as maximum a posteriori estimators. We establish their adaptive minimaxity over a wide range of sparse and dense settings. A simulation study demonstrates the efficiency of the proposed approach, which successfully competes with the sparse group lasso estimator. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
355
370
http://hdl.handle.net/10.1093/biomet/ass082
application/pdf
Access to full text is restricted to subscribers.
Felix Abramovich
Vadim Grinshtein
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:277-2812013-08-02RePEc:oup:biomet
article
Optimal estimation of Poisson intensity with partially observed covariates
Rathbun et al. (2007) and Waagepetersen (2008) propose estimating functions for parameters of Poisson point process intensity that may be applied when space- and/or time-varying covariates are sampled from a probability-based sampling design. This paper demonstrates that Waagepetersen's estimating function is optimal in a class of weighted estimating functions. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
277
281
http://hdl.handle.net/10.1093/biomet/ass069
application/pdf
Access to full text is restricted to subscribers.
S. L. Rathbun
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:495-5022013-08-02RePEc:oup:biomet
article
The optimal power puzzle: scrutiny of the monotone likelihood ratio assumption in multiple testing
In single hypothesis testing, power is a nondecreasing function of Type I error rate; hence it is desirable to test at the nominal level exactly to achieve optimal power. The optimal power puzzle arises from the fact that for multiple testing under the false discovery rate paradigm, such a monotonic relationship may not hold. In particular, exact false discovery rate control may lead to a less powerful testing procedure if a test statistic fails to fulfil the monotone likelihood ratio condition. In this article, we identify different scenarios wherein the condition fails and give caveats for conducting multiple testing in practical settings. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
495
502
http://hdl.handle.net/10.1093/biomet/ast001
application/pdf
Access to full text is restricted to subscribers.
Hongyuan Cao
Wenguang Sun
Michael R. Kosorok
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:503-5102013-08-02RePEc:oup:biomet
article
A consistent multivariate test of association based on ranks of distances
We consider the problem of detecting associations between random vectors of any dimension. Few tests of independence exist that are consistent against all dependent alternatives. We propose a powerful test that is applicable in all dimensions and consistent against all alternatives. The test has a simple form, is easy to implement, and has good power. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
503
510
http://hdl.handle.net/10.1093/biomet/ass070
application/pdf
Access to full text is restricted to subscribers.
Ruth Heller
Yair Heller
Malka Gorfine
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:301-3172013-08-02RePEc:oup:biomet
article
A multiple comparison procedure for hypotheses with gatekeeping structure
We develop gatekeeping procedures that focus on comparing multiple treatments with a control when there are multiple endpoints. Our procedures utilize estimated correlations among individual test statistics without parametric assumptions. We make comparisons with other gatekeeping procedures with respect to properties of the trade-off in statistical power between families of hypotheses. We introduce a reward function to facilitate these comparisons. We illustrate our methods by simulation and an analysis of data from a randomized, multi-armed clinical trial. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
301
317
http://hdl.handle.net/10.1093/biomet/ass083
application/pdf
Access to full text is restricted to subscribers.
Xiaolong Luo
Guang Chen
S. Peter Ouyang
Bruce W. Turnbull
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:125-1382013-08-02RePEc:oup:biomet
article
A nonparametric prior for simultaneous covariance estimation
In the modelling of longitudinal data from several groups, appropriate handling of the dependence structure is of central importance. Standard methods include specifying a single covariance matrix for all groups or independently estimating the covariance matrix for each group without regard to the others, but when these model assumptions are incorrect, these techniques can lead to biased mean effects or loss of efficiency, respectively. Thus, it is desirable to develop methods for simultaneously estimating the covariance matrix for each group that will borrow strength across groups in a way that is ultimately informed by the data. In addition, for several groups with covariance matrices of even medium dimension, it is difficult to manually select a single best parametric model among the huge number of possibilities given by incorporating structural zeros and/or commonality of individual parameters across groups. In this paper we develop a family of nonparametric priors using the matrix stick-breaking process of Dunson et al. (2008) that seeks to accomplish this task by parameterizing the covariance matrices in terms of their modified Cholesky decompositions (Pourahmadi, 1999). We establish some theoretical properties of these priors, examine their effectiveness via a simulation study, and illustrate the priors using data from a longitudinal clinical trial. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
125
138
http://hdl.handle.net/10.1093/biomet/ass060
application/pdf
Access to full text is restricted to subscribers.
Jeremy T. Gaskins
Michael J. Daniels
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:17-732013-08-02RePEc:oup:biomet
article
Biometrika highlights from volume 28 onwards
Highlights, trends and influences are identified associated with the pages of Biometrika subsequent to the editorship of Karl Pearson. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
17
73
http://hdl.handle.net/10.1093/biomet/ass076
application/pdf
Access to full text is restricted to subscribers.
D. M. Titterington
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:254-2602013-08-02RePEc:oup:biomet
article
Strong orthogonal arrays and associated Latin hypercubes for computer experiments
This paper introduces, constructs and studies a new class of arrays, called strong orthogonal arrays, as suitable designs for computer experiments. A strong orthogonal array of strength t enjoys better space-filling properties than a comparable orthogonal array in all dimensions lower than t while retaining the space-filling properties of the latter in t dimensions. Latin hypercubes based on strong orthogonal arrays of strength t are more space-filling than comparable orthogonal array-based Latin hypercubes in all g dimensions for any 2 ≤ g ≤ t - 1. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
254
260
http://hdl.handle.net/10.1093/biomet/ass065
application/pdf
Access to full text is restricted to subscribers.
Yuanzhen He
Boxin Tang
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:173-1872013-08-02RePEc:oup:biomet
article
Smoothed nonparametric estimation for current status competing risks data
We study the nonparametric estimation of the cumulative incidence function and the cause-specific hazard function for current status data with competing risks via kernel smoothing. A smoothed naive nonparametric maximum likelihood estimator and a smoothed full nonparametric maximum likelihood estimator are shown to have pointwise asymptotic normality and faster convergence rates than the corresponding unsmoothed nonparametric likelihood estimators. Using the smoothed estimators and the plug-in principle, we can estimate the cause-specific hazard function, which has not been studied previously. We also propose semi-smoothed estimators of the cause-specific hazard as an alternative to the smoothed estimator and demonstrate that neither is uniformly more efficient than the other. Numerical studies show that a smoothed bootstrap method works well for selecting the bandwidths in the smoothed nonparametric estimation. The use of the estimators is exemplified by an application to cumulative incidence and hazard of subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
173
187
http://hdl.handle.net/10.1093/biomet/ass053
application/pdf
Access to full text is restricted to subscribers.
Chenxi Li
Jason P. Fine
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:283-3002013-08-02RePEc:oup:biomet
article
Simultaneous confidence intervals uniformly more likely to determine signs
Many studies draw inferences about multiple endpoints but ignore the statistical implications of multiplicity. Effects inferred to be positive when there is no adjustment for multiplicity can lose their statistical significance when multiplicity is taken into account, perhaps explaining why such adjustments are so often omitted. We develop new simultaneous confidence intervals that mitigate this problem; these are uniformly more likely to determine signs than are standard simultaneous confidence intervals. When one or more of the parameter estimates are small, the new intervals sacrifice some length to avoid crossing zero; but when all the parameter estimates are large, the new intervals coincide with standard simultaneous confidence intervals, so there is no loss of precision. When only a small fraction of the estimates are small, the procedure can determine signs essentially as well as one-sided tests with prespecified directions, incurring only a modest penalty in maximum length. The intervals are constructed by inverting level-α tests to form a 1 - α confidence set, and then projecting that set onto the coordinate axes to get confidence intervals. The tests have hyper-rectangular acceptance regions that minimize the maximum amount by which the acceptance region protrudes from the orthant that contains the hypothesized parameter value, subject to a constraint on the maximum side-length of the hyper-rectangle. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
283
300
http://hdl.handle.net/10.1093/biomet/ass074
application/pdf
Access to full text is restricted to subscribers.
Yoav Benjamini
Vered Madar
Philip B. Stark
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:371-3832013-08-02RePEc:oup:biomet
article
Efficiency loss and the linearity condition in dimension reduction
Linearity, sometimes jointly with constant variance, is routinely assumed in the context of sufficient dimension reduction. It is well understood that, when these conditions do not hold, blindly using them may lead to inconsistency in estimating the central subspace and the central mean subspace. Surprisingly, we discover that even if these conditions do hold, using them will bring efficiency loss. This paradoxical phenomenon is illustrated through sliced inverse regression and principal Hessian directions. The efficiency loss also applies to other dimension reduction procedures. We explain this empirical discovery by theoretical investigation. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
371
383
http://hdl.handle.net/10.1093/biomet/ass075
application/pdf
Access to full text is restricted to subscribers.
Yanyuan Ma
Liping Zhu
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:531-5372013-08-02RePEc:oup:biomet
article
On the likelihood ratio test for envelope models in multivariate linear regression
We investigate the likelihood ratio test for a hypothesis regarding the dimension of the Σ-envelope of span(β) in a multivariate linear regression model. The asymptotic null distribution of the likelihood ratio statistic is obtained as some nuisance parameters approach infinity. A saddlepoint approximation is also given for this limiting distribution. The accuracy of this approximation and its comparison to the standard chi-squared approximation are assessed via simulation. The results can be used in a similar test for partial envelope models. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
531
537
http://hdl.handle.net/10.1093/biomet/ast002
application/pdf
Access to full text is restricted to subscribers.
James R. Schott
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:571-5862013-11-29RePEc:oup:biomet
article
Statistics of orthogonal axial frames
An orthogonal axial frame is a set of orthonormal unit vectors which are known only up to sign. Such frames arise in crystallography and seismology and as principal axes of multivariate data or of some physical tensors. We develop methods for analysing data of this form. A test of uniformity is given. Parametric models for orthogonal axial frames are presented and tests of location are considered. A brief illustrative example involving earthquakes is given. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
571
586
http://hdl.handle.net/10.1093/biomet/ast017
application/pdf
Access to full text is restricted to subscribers.
R. Arnold
P. E. Jupp
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:778-7802013-11-29RePEc:oup:biomet
article
Convergence of Luo and Tsai's iterative algorithm for estimation in proportional likelihood ratio models
Luo & Tsai (2012, Biometrika) introduced the proportional likelihood ratio model. They proposed an iterative algorithm for the estimation of the baseline distribution function but did not establish its convergence. Here we provide a proof of convergence. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
778
780
http://hdl.handle.net/10.1093/biomet/ast019
application/pdf
Access to full text is restricted to subscribers.
O. Davidov
G. Iliopoulos
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:695-7082013-11-29RePEc:oup:biomet
article
More efficient estimators for case-cohort studies
The case-cohort study design, used to reduce costs in large cohort studies, involves a random sample of the entire cohort, called the subcohort, augmented with subjects having the disease of interest but not in the subcohort sample. When several diseases are of interest, multiple case-cohort studies may be conducted using the same subcohort, with each disease analysed separately, ignoring the additional exposure measurements collected on subjects with the other diseases. This is not an efficient use of the data, and in this paper we propose more efficient estimators. We consider both joint and separate analyses for the multiple diseases. We propose an estimating equation approach with a new weight function, and we establish the consistency and asymptotic normality of the resulting estimator. Simulation studies show that the proposed methods using all available information lead to gains in efficiency. We apply our proposed method to data from the Busselton Health Study. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
695
708
http://hdl.handle.net/10.1093/biomet/ast018
application/pdf
Access to full text is restricted to subscribers.
S. Kim
J. Cai
W. Lu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:727-7402013-11-29RePEc:oup:biomet
article
Nonparametric estimation of the mean function for recurrent event data with missing event category
Recurrent event data frequently arise in longitudinal studies when study subjects possibly experience more than one event during the observation period. Often, such recurrent events can be categorized. However, part of the categorization may be missing due to technical difficulties. If the event types are missing completely at random, then a complete case analysis may provide consistent estimates of regression parameters in certain regression models, but estimates of the baseline event rates are generally biased. Previous work on nonparametric estimation of these rates has utilized parametric missingness models. In this paper, we develop fully nonparametric methods in which the missingness mechanism is completely unspecified. Consistency and asymptotic normality of the nonparametric estimators of the mean event functions accommodate nonparametric estimators of the event category probabilities, which converge more slowly than the parametric rate. Plug-in variance estimators are provided and perform well in simulation studies, where complete case estimators may exhibit large biases and parametric estimators generally have a larger mean squared error when the model is misspecified. The proposed methods are applied to data from a cystic fibrosis registry. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
727
740
http://hdl.handle.net/10.1093/biomet/ast016
application/pdf
Access to full text is restricted to subscribers.
Feng-Chang Lin
Jianwen Cai
Jason P. Fine
Huichuan J. Lai
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:655-6702013-11-29RePEc:oup:biomet
article
High-dimensional semiparametric bigraphical models
In multivariate analysis, a Gaussian bigraphical model is commonly used for modelling matrix-valued data. In this paper, we propose a semiparametric extension of the Gaussian bigraphical model, called the nonparanormal bigraphical model. A projected nonparametric rank-based regularization approach is employed to estimate sparse precision matrices and produce graphs under a penalized likelihood framework. Theoretically, our semiparametric procedure achieves the parametric rates of convergence for both matrix estimation and graph recovery. Empirically, our approach outperforms the parametric Gaussian model for non-Gaussian data and is competitive with its parametric counterpart for Gaussian data. Extensions to the categorical bigraphical model and the missing data problem are discussed. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
655
670
http://hdl.handle.net/10.1093/biomet/ast009
application/pdf
Access to full text is restricted to subscribers.
Yang Ning
Han Liu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:671-6802013-11-29RePEc:oup:biomet
article
Inverse probability weighting with error-prone covariates
Inverse probability-weighted estimators are widely used in applications where data are missing due to nonresponse or censoring and in the estimation of causal effects from observational studies. Current estimators rely on ignorability assumptions for response indicators or treatment assignment and outcomes being conditional on observed covariates which are assumed to be measured without error. However, measurement error is common for the variables collected in many applications. For example, in studies of educational interventions, student achievement as measured by standardized tests is almost always used as the key covariate for removing hidden biases, but standardized test scores may have substantial measurement errors. We provide several expressions for a weighting function that can yield a consistent estimator for population means using incomplete data and covariates measured with error. We propose a method to estimate the weighting function from data. The results of a simulation study show that the estimator is consistent and has no bias and small variance. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
671
680
http://hdl.handle.net/10.1093/biomet/ast022
application/pdf
Access to full text is restricted to subscribers.
Daniel F. McCaffrey
J. R. Lockwood
Claude M. Setodji
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:587-6062013-11-29RePEc:oup:biomet
article
Automatic declustering of rare events
The analysis of events with low probability but disastrous impact entails understanding how they cluster in time. We present an automatic three-step procedure for identifying clusters, estimating the cluster size distribution and constructing confidence intervals for the extremal index, which measures the degree of clustering of rare events. The third step combines empirical likelihood and parametric likelihood approaches. Simulations show that our new procedure performs very well for finite samples and outperforms previous methods in constructing confidence intervals for the extremal index when there is clustering in the data, as well as in estimating probabilities for small clusters. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
587
606
http://hdl.handle.net/10.1093/biomet/ast013
application/pdf
Access to full text is restricted to subscribers.
C. Y. Robert
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:555-5692013-11-29RePEc:oup:biomet
article
A unified approach to robust estimation in finite population sampling
We argue that the conditional bias associated with a sample unit can be a useful measure of influence in finite population sampling. We use the conditional bias to derive robust estimators that are obtained by downweighting the most influential sample units. Under the model-based approach to inference, our proposed robust estimator is closely related to the well-known estimator of Chambers (1986). Under the design-based approach, it possesses the desirable feature of being applicable with most sampling designs used in practice. For stratified simple random sampling, it is essentially equivalent to the estimator of Kokic & Bell (1994). The proposed robust estimator depends on a tuning constant. In this paper, we propose a method for determining the tuning constant and show that the resulting estimator is consistent. Results from a simulation study suggest that our approach improves the efficiency of standard nonrobust estimators when the population contains units that may be influential if selected in the sample. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
555
569
http://hdl.handle.net/10.1093/biomet/ast010
application/pdf
Access to full text is restricted to subscribers.
J.-F. Beaumont
D. Haziza
A. Ruiz-Gazen
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:771-7772013-11-29RePEc:oup:biomet
article
Species sampling models: consistency for the number of species
This paper considers species sampling models using constructions that arise from Bayesian nonparametric prior distributions. A discrete random measure, used to generate a species sampling model, can have either a countable infinite number of atoms, which has been the emphasis in the recent literature, or a finite number of atoms K, while allowing K to be assigned a prior probability distribution on the positive integers. It is the latter class of model we consider here, due to the interpretation of K as the number of species. We demonstrate the consistency of the posterior distribution of K as the sample size increases. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
771
777
http://hdl.handle.net/10.1093/biomet/ast006
application/pdf
Access to full text is restricted to subscribers.
P. G. Bissiri
A. Ongaro
S. G. Walker
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:681-6942013-11-29RePEc:oup:biomet
article
Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
A dynamic treatment regime is a list of sequential decision rules for assigning treatment based on a patient's history. Q- and A-learning are two main approaches for estimating the optimal regime, i.e., that yielding the most beneficial outcome in the patient population, using data from a clinical trial or observational study. Q-learning requires postulated regression models for the outcome, while A-learning involves models for that part of the outcome regression representing treatment contrasts and for treatment assignment. We propose an alternative to Q- and A-learning that maximizes a doubly robust augmented inverse probability weighted estimator for population mean outcome over a restricted class of regimes. Simulations demonstrate the method's performance and robustness to model misspecification, which is a key concern. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
681
694
http://hdl.handle.net/10.1093/biomet/ast014
application/pdf
Access to full text is restricted to subscribers.
Baqun Zhang
Anastasios A. Tsiatis
Eric B. Laber
Marie Davidian
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:539-5532013-11-29RePEc:oup:biomet
article
A general modelling framework for multivariate disease mapping
This paper deals with multivariate disease mapping. We propose a novel framework that encompasses most of the models already proposed. Our framework starts with a simple identity, reformulating Kronecker products of covariance matrices as simple matrix products. This formula is computationally convenient, and its generalizations reproduce most of the proposals in the disease mapping literature. Use of the identity leads to a flexible, general and computationally convenient modelling framework, making it possible to combine spatial dependence structures and different relationships between diseases with limited effort. Moreover, as the proposed modelling framework covers most of the Gaussian Markov random field-based multivariate disease mapping models in the literature, it allows comparison of all these models in a common context, thus helping us to understand them better. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
539
553
http://hdl.handle.net/10.1093/biomet/ast023
application/pdf
Access to full text is restricted to subscribers.
Miguel A. Martinez-Beneito
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:757-7632013-11-29RePEc:oup:biomet
article
Adjusted regression estimation for time-to-event data with differential measurement error
Differential measurement error data plausibly arise in epidemiology and biomedical studies but have been rarely dealt with explicitly, especially for time-to-event data. We propose an estimation equation correction method in semiparametric censored linear regression to deal with differential measurement error for time-to-event data with validation samples. The method does not require explicit modelling of the error-prone covariates and leads to unbiased estimation. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
757
763
http://hdl.handle.net/10.1093/biomet/ast007
application/pdf
Access to full text is restricted to subscribers.
Menggang Yu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:764-7702013-11-29RePEc:oup:biomet
article
Survival analysis without survival data: connecting length-biased and case-control data
We show that relative mean survival parameters of a semiparametric log-linear model can be estimated using covariate data from an incident sample and a prevalent sample, even when there is no prospective follow-up to collect any survival data. Estimation is based on an induced semiparametric density ratio model for covariates from the two samples, and it shares the same structure as for a logistic regression model for case-control data. Likelihood inference coincides with well-established methods for case-control data. We show two further related results. First, estimation of interaction parameters in a survival model can be performed using covariate information only from a prevalent sample, analogous to a case-only analysis. Furthermore, propensity score and conditional exposure effect parameters on survival can be estimated using only covariate data collected from incident and prevalent samples. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
764
770
http://hdl.handle.net/10.1093/biomet/ast008
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:709-7262013-11-29RePEc:oup:biomet
article
Robust analysis of semiparametric renewal process models
A rate model is proposed for a modulated renewal process comprising a single long sequence, where the covariate process may not capture the dependencies in the sequence as in standard intensity models. We consider partial likelihood-based inferences under a semiparametric multiplicative rate model, which has been widely studied in the context of independent and identical data. Under an intensity model, gap times in a single long sequence may be used naively in the partial likelihood, with variance estimation utilizing the observed information matrix. Under a rate model, the gap times cannot be treated as independent and studying the partial likelihood is much more challenging. We employ a mixing condition in the application of limit theory for stationary sequences to obtain consistency and asymptotic normality. The estimator's variance is quite complicated, owing to the unknown gap times dependence structure. We adapt block bootstrapping and cluster variance estimators to the partial likelihood. Simulation studies and an analysis of a semiparametric extension of a popular model for neural spike train data demonstrate the practical utility of the rate approach in comparison with the intensity approach. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
709
726
http://hdl.handle.net/10.1093/biomet/ast011
application/pdf
Access to full text is restricted to subscribers.
Feng-Chang Lin
Young K. Truong
Jason P. Fine
oai:RePEc:oup:biomet:v:100:y:2013:i:4:p:1024-10242015-03-30RePEc:oup:biomet
article
'Biometrika highlights from volume 28 onwards'
4
2013
100
Biometrika
1024
1024
http://hdl.handle.net/10.1093/biomet/ast061
application/pdf
Access to full text is restricted to subscribers.
D. M. Titterington
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:65-782013-11-15RePEc:oup:biomet
article
Marginal analyses of longitudinal data with an informative pattern of observations
We consider solutions to generalized estimating equations with singular working correlation matrices, of which the estimator of Diggle et al. (2007) is a special case. We give explicit conditions for consistent estimation when the pattern of observations may be informative. In such cases, simulations reveal reduced bias and reduced mean squared error compared with existing alternatives. A study of peritoneal dialysis is used to illustrate the methodology. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
65
78
http://hdl.handle.net/10.1093/biomet/asp068
application/pdf
Access to full text is restricted to subscribers.
D. M. Farewell
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:31-482013-11-15RePEc:oup:biomet
article
Incorporating prior probabilities into high-dimensional classifiers
In standard parametric classifiers, or classifiers based on nonparametric methods but where there is an opportunity for estimating population densities, the prior probabilities of the respective populations play a key role. However, those probabilities are largely ignored in the construction of high-dimensional classifiers, partly because there are no likelihoods to be constructed or Bayes risks to be estimated. Nevertheless, including information about prior probabilities can reduce the overall error rate, particularly in cases where doing so is most important, i.e. when the classification problem is particularly challenging and error rates are not close to zero. In this paper we suggest a general approach to reducing error rate in this way, by using a method derived from Breiman's bagging idea. The potential improvements in performance are identified in theoretical and numerical work, the latter involving both applications to real data and simulations. The method is simple and explicit to apply, and does not involve choice of any tuning parameters. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
31
48
http://hdl.handle.net/10.1093/biomet/asp081
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Jing-Hao Xue
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:254-2592013-11-15RePEc:oup:biomet
article
The maximal data piling direction for discrimination
We study a discriminant direction vector that generally exists only in high-dimension, low sample size settings. Projections of data onto this direction vector take on only two distinct values, one for each class. There exist infinitely many such directions in the subspace generated by the data; but the maximal data piling vector has the longest distance between the projections. This paper investigates mathematical properties and classification performance of this discrimination method. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
254
259
http://hdl.handle.net/10.1093/biomet/asp084
application/pdf
Access to full text is restricted to subscribers.
Jeongyoun Ahn
J. S. Marron
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:223-2302013-11-15RePEc:oup:biomet
article
Global and local spectral-based tests for periodicities
We investigate tests for periodicity based on a spectral analysis of a time series, differentiating between global and local spectral-based tests. Global tests use information across the entire frequency band, whereas local tests are based on a window around the test frequency. We show that many spectral-based tests can be expressed in terms of a regression-based F test, which allows for approximate size and power calculations. Since global tests are usually derived assuming white noise errors, we extend to the correlated noise case. We demonstrate via a Monte Carlo study that although the global test may have better size and power, local tests are easier to use, and are comparable or better in terms of the power to detect periodicities, especially for spectra with a large dynamic range. We apply this methodology to a nonbehavioural test of hearing. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
223
230
http://hdl.handle.net/10.1093/biomet/asp079
application/pdf
Access to full text is restricted to subscribers.
L. Wei
P. F. Craigmile
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:209-2142013-11-15RePEc:oup:biomet
article
A note on the sensitivity to assumptions of a generalized linear mixed model
A simple case of Poisson regression is used to study the potential gain in efficiency from using a mixed model representation. Possible systematic errors arising from misspecification of the random terms in the model are examined. It is shown in particular that for a special but realistic problem, appreciable bias may arise from misspecification of a random component. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
209
214
http://hdl.handle.net/10.1093/biomet/asp083
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
M. Y. Wong
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:199-2082013-11-15RePEc:oup:biomet
article
Forecasting for quantile self-exciting threshold autoregressive time series models
Self-exciting threshold autoregressive time series models have been used extensively, and the conditional mean obtained from these models can be used to predict the future value of a random variable. In this paper we consider quantile forecasts of a time series based on the quantile self-exciting threshold autoregressive time series models proposed by Cai and Stander (2008) and present a new forecasting method for them. Simulation studies and application to real time series show that the method works very well. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
199
208
http://hdl.handle.net/10.1093/biomet/asp070
application/pdf
Access to full text is restricted to subscribers.
Yuzhi Cai
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:123-1322013-11-15RePEc:oup:biomet
article
Sharp bounds on causal effects in case-control and cohort studies
Evaluating the causal effect of an exposure on a response from case-control and cohort studies is a major concern in epidemiological and medical research. Since causal effects are in general nonidentifiable from such studies, this paper derives bounds on two causal measures: the causal risk difference and the causal risk ratio. We use the potential response approach and a linear programming method to derive sharp bounds on the causal risk difference, and a novel fractional programming method to derive bounds on the causal risk ratio. In addition, in the presence of missing data, we consider three different missingness mechanisms and propose sharp bounds under these situations. The results provide new guidance on causal inference in case-control and cohort studies. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
123
132
http://hdl.handle.net/10.1093/biomet/asp076
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
Zhihong Cai
Zhi Geng
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:215-2222013-11-15RePEc:oup:biomet
article
Pseudo-score confidence intervals for parameters in discrete statistical models
We propose pseudo-score confidence intervals for parameters in models for discrete data. The confidence interval is obtained by inverting a test that uses a Pearson chi-squared statistic to compare fitted values for the working model with fitted values of the model when a parameter of interest takes various fixed values. For multinomial models, the pseudo-score method simplifies to the score method when the model is saturated and otherwise it is asymptotically equivalent to score and likelihood ratio test-based inferences. For cases in which ordinary score methods are impractical, such as when the likelihood function is not an explicit function of model parameters, the pseudo-score method is feasible. We illustrate the method for four such examples. Generalizations of the method are also presented for future research, including inference for complex sampling designs using a quasilikelihood Pearson statistic that compares fitted values for two models relative to the variance of the observations under the simpler model. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
215
222
http://hdl.handle.net/10.1093/biomet/asp074
application/pdf
Access to full text is restricted to subscribers.
Alan Agresti
Euijung Ryu
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:238-2452013-11-15RePEc:oup:biomet
article
Nonparametric Bayesian inference for the spectral density function of a random field
A powerful technique for inference concerning spatial dependence in a random field is to use spectral methods based on frequency domain analysis. Here we develop a nonparametric Bayesian approach to statistical inference for the spectral density of a random field. We construct a multi-dimensional Bernstein polynomial prior for the spectral density and devise a Markov chain Monte Carlo algorithm to simulate from the posterior of the spectral density. The posterior sampling enables us to obtain a smoothed estimate of the spectral density as well as credible bands at desired levels. Simulation shows that our proposed method is more robust than a parametric approach. For illustration, we analyse a soil data example. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
238
245
http://hdl.handle.net/10.1093/biomet/asp066
application/pdf
Access to full text is restricted to subscribers.
Yanbing Zheng
Jun Zhu
Anindya Roy
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:159-1702013-11-15RePEc:oup:biomet
article
Mean loglikelihood and higher-order approximations
Higher-order approximations to p-values can be obtained from the loglikelihood function and a reparameterization that can be viewed as a canonical parameter in an exponential family approximation to the model. This approach clarifies the connection between Skovgaard (1996) and Fraser et al. (1999a), and shows that the Skovgaard approximation can be obtained directly using the mean loglikelihood function. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
159
170
http://hdl.handle.net/10.1093/biomet/asq001
application/pdf
Access to full text is restricted to subscribers.
N. Reid
D. A. S. Fraser
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:133-1452013-11-15RePEc:oup:biomet
article
A semiparametric random effects model for multivariate competing risks data
We propose a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model. The associations between the cause-specific failure times can be studied through dependence parameters of copula functions that are allowed to depend on cluster-level covariates. A cross-odds ratio-type measure is proposed to describe the associations between cause-specific failure times, and its relationship to the dependence parameters is explored. We develop a two-stage estimation procedure where the marginal models are estimated in the first stage and the dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The proposed procedures are applied to Danish twin data to model the cumulative incidence for the age of natural menopause and to investigate the association in the onset of natural menopause between monozygotic and dizygotic twins. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
133
145
http://hdl.handle.net/10.1093/biomet/asp082
application/pdf
Access to full text is restricted to subscribers.
Thomas H. Scheike
Yanqing Sun
Mei-Jie Zhang
Tina Kold Jensen
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:1-132013-11-15RePEc:oup:biomet
article
Systematic sampling with errors in sample locations
Systematic sampling of points in continuous space is widely used in microscopy and spatial surveys. Classical theory provides asymptotic expressions for the variance of estimators based on systematic sampling as the grid spacing decreases. However, the classical theory assumes that the sample grid is exactly periodic; real physical sampling procedures may introduce errors in the placement of the sample points. This paper studies the effect of errors in sample positioning on the variance of estimators in the case of one-dimensional systematic sampling. First we sketch a general approach to variance analysis using point process methods. We then analyze three different models for the error process, calculate exact expressions for the variances, and derive asymptotic variances. Errors in the placement of sample points can lead to substantial inflation of the variance, dampening of zitterbewegung, that is, fluctuation effects, and a slower order of convergence. This suggests that the current practice in some areas of microscopy may be based on over-optimistic predictions of estimator accuracy. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
1
13
http://hdl.handle.net/10.1093/biomet/asp067
application/pdf
Access to full text is restricted to subscribers.
Johanna Ziegel
Adrian Baddeley
Karl-Anton Dorph-Petersen
Eva B. Vedel Jensen
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:95-1082013-11-15RePEc:oup:biomet
article
On the use of stochastic ordering to test for trend with clustered binary data
We introduce the use of stochastic ordering for defining treatment-related trend in clustered exchangeable binary data, both when cluster sizes are fixed and when they vary randomly. In the latter case, there is a well-documented tendency for such data to be sparse, a problem we address by making an assumption of interpretability or, equivalently, marginal compatibility. Our procedures are based on a representation of the joint distribution of binary exchangeable random variables by a saturated model, and may hence be considered nonparametric. The definition of trend by stochastic ordering is proposed to ensure a flexibility that allows for various forms of monotone increases in response to the cluster as a whole to be included in the evaluation of the trend. We obtain maximum likelihood estimates of probability functions under stochastic ordering using mixture-likelihood-based algorithms. Since the data are sparse, we avoid the use of asymptotic results and obtain p-values of the likelihood ratio procedures by permutation resampling. We demonstrate how the proposed framework can be used in risk assessment, and provide comparisons with existing procedures. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
95
108
http://hdl.handle.net/10.1093/biomet/asp077
application/pdf
Access to full text is restricted to subscribers.
Aniko Szabo
E. Olusegun George
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:79-932013-11-15RePEc:oup:biomet
article
Generalized empirical likelihood methods for analyzing longitudinal data
Efficient estimation of parameters is a major objective in analyzing longitudinal data. We propose two generalized empirical likelihood-based methods that take into consideration within-subject correlations. A nonparametric version of the Wilks theorem for the limiting distributions of the empirical likelihood ratios is derived. It is shown that one of the proposed methods is locally efficient among a class of within-subject variance-covariance matrices. A simulation study is conducted to investigate the finite sample properties of the proposed methods and to compare them with the block empirical likelihood method by You et al. (2006) and the normal approximation with a correctly estimated variance-covariance. The results suggest that the proposed methods are generally more efficient than existing methods that ignore the correlation structure, and are better in coverage compared to the normal approximation with correctly specified within-subject correlation. An application illustrating our methods and supporting the simulation study results is presented. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
79
93
http://hdl.handle.net/10.1093/biomet/asp073
application/pdf
Access to full text is restricted to subscribers.
Suojin Wang
Lianfen Qian
Raymond J. Carroll
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:109-1212013-11-15RePEc:oup:biomet
article
Stochastic approximation with virtual observations for dose-finding on discrete levels
Phase I clinical studies are experiments in which a new drug is administered to humans to determine the maximum dose that causes toxicity with a target probability. Phase I dose-finding is often formulated as a quantile estimation problem. For studies with a biological endpoint, it is common to define toxicity by dichotomizing the continuous biomarker expression. In this article, we propose a novel variant of the Robbins--Monro stochastic approximation that utilizes the continuous measurements for quantile estimation. The Robbins--Monro method has seldom seen clinical applications, because it does not perform well for quantile estimation with binary data and it works with a continuum of doses that are generally not available in practice. To address these issues, we formulate the dose-finding problem as root-finding for the mean of a continuous variable, for which the stochastic approximation procedure is efficient. To accommodate the use of discrete doses, we introduce the idea of virtual observation that is defined on a continuous dosage range. Our proposed method inherits the convergence properties of the stochastic approximation algorithm and its computational simplicity. Simulations based on real trial data show that our proposed method improves accuracy compared with the continual re-assessment method and produces results robust to model misspecification. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
109
121
http://hdl.handle.net/10.1093/biomet/asp065
application/pdf
Access to full text is restricted to subscribers.
Ying Kuen Cheung
Mitchell S. V. Elkind
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:15-302013-11-15RePEc:oup:biomet
article
Cross-covariance functions for multivariate random fields based on latent dimensions
The problem of constructing valid parametric cross-covariance functions is challenging. We propose a simple methodology, based on latent dimensions and existing covariance models for univariate random fields, to develop flexible, interpretable and computationally feasible classes of cross-covariance functions in closed form. We focus on spatio-temporal cross-covariance functions that can be nonseparable, asymmetric and can have different covariance structures, for instance different smoothness parameters, in each component. We discuss estimation of these models and perform a small simulation study to demonstrate our approach. We illustrate our methodology on a trivariate spatio-temporal pollution dataset from California and demonstrate that our cross-covariance performs better than other competing models. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
15
30
http://hdl.handle.net/10.1093/biomet/asp078
application/pdf
Access to full text is restricted to subscribers.
Tatiyana V. Apanasovich
Marc G. Genton
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:246-2532013-11-15RePEc:oup:biomet
article
The distribution-based p-value for the outlier sum in differential gene expression analysis
Outlier sums were proposed by Tibshirani & Hastie (2007) and Wu (2007) for detecting outlier genes where only a small subset of disease samples shows unusually high gene expression, but they did not develop their distributional properties and formal statistical inference. In this study, a new outlier sum for detection of outlier genes is proposed, its asymptotic distribution theory is developed, and the p-value based on this outlier sum is formulated. Its analytic form is derived on the basis of the large-sample theory. We compare the proposed method with existing outlier sum methods by power comparisons. Our method is applied to DNA microarray data from samples of primary breast tumors examined by Huang et al. (2003). The results show that the proposed method is more efficient in detecting outlier genes. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
246
253
http://hdl.handle.net/10.1093/biomet/asp075
application/pdf
Access to full text is restricted to subscribers.
Lin-An Chen
Dung-Tsa Chen
Wenyaw Chan
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:147-1582013-11-15RePEc:oup:biomet
article
Estimation of the retransformed conditional mean in health care cost studies
We propose a new approach for analyzing skewed and heteroscedastic health care cost data through regression of the conditional quantiles of the transformed cost. Using the appealing equivariance property of quantiles to monotone transformations, we propose a distribution-free estimator of the conditional mean cost on the original scale. The proposed method is extended to a two-part heteroscedastic model to account for zero costs commonly seen in health care cost studies. Simulation studies indicate that the proposed estimator is competitive with, and more robust than, existing estimators in various heteroscedastic models. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
147
158
http://hdl.handle.net/10.1093/biomet/asp072
application/pdf
Access to full text is restricted to subscribers.
Huixia Judy Wang
Xiao-Hua Zhou
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:171-1802013-11-15RePEc:oup:biomet
article
On doubly robust estimation in a semiparametric odds ratio model
We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation-independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007). Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
171
180
http://hdl.handle.net/10.1093/biomet/asp062
application/pdf
Access to full text is restricted to subscribers.
Eric J. Tchetgen Tchetgen
James M. Robins
Andrea Rotnitzky
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:231-2372013-11-15RePEc:oup:biomet
article
Weighted least squares approximate restricted likelihood estimation for vector autoregressive processes
We derive a weighted least squares approximate restricted likelihood estimator for a k-dimensional pth-order autoregressive model with intercept. Exact likelihood optimization of this model is generally infeasible due to the parameter space, which is complicated and high-dimensional, involving pk^2 parameters. The weighted least squares estimator has significantly lower bias and mean squared error than the ordinary least squares estimator for both stationary and nonstationary processes. Furthermore, at the unit root, the limiting distribution of the weighted least squares approximate restricted likelihood estimator is shown to be the zero-intercept Dickey--Fuller distribution, unlike the ordinary least squares with intercept estimator that has a different distribution with significantly higher bias. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
231
237
http://hdl.handle.net/10.1093/biomet/asp071
application/pdf
Access to full text is restricted to subscribers.
Willa W. Chen
Rohit S. Deo
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:601-6152013-06-14RePEc:oup:biomet
article
Pseudo-partial likelihood for proportional hazards models with biased-sampling data
We obtain a pseudo-partial likelihood for proportional hazards models with biased-sampling data by embedding the biased-sampling data into left-truncated data. The log pseudo-partial likelihood of the biased-sampling data is the expectation of the log partial likelihood of the left-truncated data conditioned on the observed data. In addition, asymptotic properties of the estimator that maximizes the pseudo-partial likelihood are derived. Applications to length-biased data, biased samples with right censoring and proportional hazards models with missing covariates are discussed. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
601
615
http://hdl.handle.net/10.1093/biomet/asp026
application/pdf
Access to full text is restricted to subscribers.
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:677-6902013-06-14RePEc:oup:biomet
article
Optimal repeated measurement designs for a model with partial interactions
We consider crossover designs for a model with partial interactions. In this model, the carryover effect depends on whether the treatment is preceded by itself or not. When the aim of the experiment is to study the total effects corresponding to a single treatment, we obtain approximate optimal symmetric designs, within the competing class of circular designs, by generalizing the method introduced by Kushner (1997) and Kunert & Martin (2000). This generalization places the method proposed by Bailey & Druilhet (2004) into Kushner's context. The optimal designs obtained are not binary, as in Kunert & Martin (2000). We also propose efficient designs generated by only one sequence. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
677
690
http://hdl.handle.net/10.1093/biomet/asp034
application/pdf
Access to full text is restricted to subscribers.
P. Druilhet
W. Tinsson
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:723-7342013-06-14RePEc:oup:biomet
article
Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data
Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
723
734
http://hdl.handle.net/10.1093/biomet/asp033
application/pdf
Access to full text is restricted to subscribers.
Weihua Cao
Anastasios A. Tsiatis
Marie Davidian
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:497-5122013-06-14RePEc:oup:biomet
article
Objective Bayesian model selection in Gaussian graphical models
This paper presents a default model-selection procedure for Gaussian graphical models that involves two new developments. First, we develop a default version of the hyper-inverse Wishart prior for restricted covariance matrices, called the hyper-inverse Wishart g-prior, and show how it corresponds to the implied fractional prior for selecting a graph using fractional Bayes factors. Second, we apply a class of priors that automatically handles the problem of multiple hypothesis testing. We demonstrate our methods on a variety of simulated examples, concluding with a real example analyzing covariation in mutual-fund returns. These studies reveal that the combined use of a multiplicity-correction prior on graphs and fractional Bayes factors for computing marginal likelihoods yields better performance than existing Bayesian methods. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
497
512
http://hdl.handle.net/10.1093/biomet/asp017
application/pdf
Access to full text is restricted to subscribers.
C. M. Carvalho
J. G. Scott
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:529-5442013-06-14RePEc:oup:biomet
article
Asymptotic properties of penalized spline estimators
We study the class of penalized spline estimators, which enjoy similarities to both regression splines, without penalty and with fewer knots than data points, and smoothing splines, with knots equal to the data points and a penalty controlling the roughness of the fit. Depending on the number of knots, sample size and penalty, we show that the theoretical properties of penalized regression spline estimators are either similar to those of regression splines or to those of smoothing splines, with a clear breakpoint distinguishing the cases. We prove that using fewer knots results in better asymptotic rates than when using a large number of knots. We obtain expressions for bias and variance and asymptotic rates for the number of knots and penalty parameter. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
529
544
http://hdl.handle.net/10.1093/biomet/asp035
application/pdf
Access to full text is restricted to subscribers.
Gerda Claeskens
Tatyana Krivobokova
Jean D. Opsomer
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:751-7602013-06-14RePEc:oup:biomet
article
A Student t-mixture autoregressive model with applications to heavy-tailed financial data
We introduce the class of Student t-mixture autoregressive models, which is promising for financial time series modelling. The model is able to capture serial correlations, time-varying means and volatilities, and the shape of the conditional distributions can be time varied from short-tailed to long-tailed, or from unimodal to multimodal. The use of t-distributed errors in each component of the model allows conditional leptokurtic distributions that account for the commonly observed excess unconditional kurtosis in financial data. Methods of parameter estimation and model selection are given. Finally, the proposed modelling procedure is illustrated through a real example. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
751
760
http://hdl.handle.net/10.1093/biomet/asp031
application/pdf
Access to full text is restricted to subscribers.
C. S. Wong
W. S. Chan
P. L. Kam
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:617-6332013-06-14RePEc:oup:biomet
article
Pseudo-partial likelihood estimators for the Cox regression model with missing covariates
By embedding the missing covariate data into a left-truncated and right-censored survival model, we propose a new class of weighted estimating functions for the Cox regression model with missing covariates. The resulting estimators, called the pseudo-partial likelihood estimators, are shown to be consistent and asymptotically normal. A simulation study demonstrates that, compared with the popular inverse-probability weighted estimators, the new estimators perform better when the observation probability is small and improve efficiency of estimating the missing covariate effects. Application to a practical example is reported. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
617
633
http://hdl.handle.net/10.1093/biomet/asp027
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
Qiang Xu
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:691-7092013-06-14RePEc:oup:biomet
article
Use of functionals in linearization and composite estimation with application to two-sample survey data
An important problem associated with two-sample surveys is the estimation of nonlinear functions of finite population totals such as ratios, correlation coefficients or measures of income inequality. Computation and estimation of the variance of such complex statistics are made more difficult by the existence of overlapping units. In one-sample surveys, the linearization method based on the influence function approach is a powerful tool for variance estimation. We introduce a two-sample linearization technique that can be viewed as a generalization of the one-sample influence function approach. Our technique is based on expressing the parameters of interest as multivariate functionals of finite and discrete measures and then using partial influence functions to compute the linearized variables. Under broad assumptions, the asymptotic variance of the substitution estimator, derived from Deville (1999), is shown to be the variance of a weighted sum of the linearized variables. The paper then focuses on a general class of composite substitution estimators, and from this class the optimal estimator for minimizing the asymptotic variance is obtained. The efficiency of the optimal composite estimator is demonstrated through an empirical study. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
691
709
http://hdl.handle.net/10.1093/biomet/asp039
application/pdf
Access to full text is restricted to subscribers.
C. Goga
J.-C. Deville
A. Ruiz-Gazen
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:577-5902013-06-14RePEc:oup:biomet
article
Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data
This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton--Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
577
590
http://hdl.handle.net/10.1093/biomet/asp025
application/pdf
Access to full text is restricted to subscribers.
Lynn M. Johnson
Robert L. Strawderman
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:545-5582013-06-14RePEc:oup:biomet
article
Empirical Bayes estimation for additive hazards regression models
We develop a novel empirical Bayesian framework for the semiparametric additive hazards regression model. The integrated likelihood, obtained by integration over the unknown prior of the nonparametric baseline cumulative hazard, can be maximized using standard statistical software. Unlike the corresponding full Bayes method, our empirical Bayes estimators of regression parameters, survival curves and their corresponding standard errors have easily computed closed-form expressions and require no elicitation of hyperparameters of the prior. The method guarantees a monotone estimator of the survival function and accommodates time-varying regression coefficients and covariates. To facilitate frequentist-type inference based on large-sample approximation, we present the asymptotic properties of the semiparametric empirical Bayes estimates. We illustrate the implementation and advantages of our methodology with a reanalysis of a survival dataset and a simulation study. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
545
558
http://hdl.handle.net/10.1093/biomet/asp024
application/pdf
Access to full text is restricted to subscribers.
Debajyoti Sinha
M. Brent McHenry
Stuart R. Lipsitz
Malay Ghosh
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:635-6442013-06-14RePEc:oup:biomet
article
Approximating the α-permanent
The standard matrix permanent is the solution to a number of combinatorial and graph-theoretic problems, and the α-weighted permanent is the density function for a class of Cox processes called boson processes. The exact computation of the ordinary permanent is known to be #P-complete, and the same appears to be the case for the α-permanent for most values of α. At present, the lack of a satisfactory algorithm for approximating the α-permanent is a formidable obstacle to the use of boson processes in applied work. This paper proposes an importance-sampling estimator using nonuniform random permutations generated in a cycle format. Empirical investigation reveals that the estimator works well for the sorts of matrices that arise in point-process applications, involving up to a few hundred points. We conclude with a numerical illustration of the Bayes estimate of the intensity function of a boson point process, which is a ratio of α-permanents. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
635
644
http://hdl.handle.net/10.1093/biomet/asp036
application/pdf
Access to full text is restricted to subscribers.
S. C. Kou
P. McCullagh
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:711-7222013-06-14RePEc:oup:biomet
article
Effects of data dimension on empirical likelihood
We evaluate the effects of data dimension on the asymptotic normality of the empirical likelihood ratio for high-dimensional data under a general multivariate model. Data dimension and dependence among components of the multivariate random vector affect the empirical likelihood directly through the trace and the eigenvalues of the covariance matrix. The growth rates to infinity we obtain for the data dimension improve the rates of Hjort et al. (2008). Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
711
722
http://hdl.handle.net/10.1093/biomet/asp037
application/pdf
Access to full text is restricted to subscribers.
Song Xi Chen
Liang Peng
Ying-Li Qin
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:735-7492013-06-14RePEc:oup:biomet
article
A negative binomial model for time series of counts
We study generalized linear models for time series of counts, where serial dependence is introduced through a dependent latent process in the link function. Conditional on the covariates and the latent process, the observation is modelled by a negative binomial distribution. To estimate the regression coefficients, we maximize the pseudolikelihood that is based on a generalized linear model with the latent process suppressed. We show the consistency and asymptotic normality of the generalized linear model estimator when the latent process is a stationary strongly mixing process. We extend the asymptotic results to generalized linear models for time series, where the observation variable, conditional on covariates and a latent process, is assumed to have a distribution from a one-parameter exponential family. Thus, we unify in a common framework the results for Poisson log-linear regression models of Davis et al. (2000), negative binomial logit regression models and other similarly specified generalized linear models. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
735
749
http://hdl.handle.net/10.1093/biomet/asp029
application/pdf
Access to full text is restricted to subscribers.
Richard A. Davis
Rongning Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:663-6762013-06-14RePEc:oup:biomet
article
Gaussian process emulation of dynamic computer codes
Computer codes are used in scientific research to study and predict the behaviour of complex systems. Their run times often make uncertainty and sensitivity analyses impractical because of the thousands of runs that are conventionally required, so efficient techniques have been developed based on a statistical representation of the code. The approach is less straightforward for dynamic codes, which represent time-evolving systems. We develop a novel iterative system to build a statistical model of dynamic computer codes, which is demonstrated on a rainfall-runoff simulator. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
663
676
http://hdl.handle.net/10.1093/biomet/asp028
application/pdf
Access to full text is restricted to subscribers.
S. Conti
J. P. Gosling
J. E. Oakley
A. O'Hagan
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:559-5752013-06-14RePEc:oup:biomet
article
Improving point and interval estimators of monotone functions by rearrangement
Suppose that a target function is monotonic and an available original estimate of this target function is not monotonic. Rearrangements, univariate and multivariate, transform the original estimate to a monotonic estimate that always lies closer in common metrics to the target function. Furthermore, suppose an original confidence interval, which covers the target function with probability at least 1-α, is defined by upper and lower endpoint functions that are not monotonic. Then the rearranged confidence interval, defined by the rearranged upper and lower endpoint functions, is monotonic, shorter in length in common norms than the original interval, and covers the target function with probability at least 1-α. We illustrate the results with a growth chart example. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
559
575
http://hdl.handle.net/10.1093/biomet/asp030
application/pdf
Access to full text is restricted to subscribers.
V. Chernozhukov
I. Fernández-Val
A. Galichon
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:273-2902015-03-25RePEc:oup:biomet
article
Sample size and power analysis for sparse signal recovery in genome-wide association studies
Genome-wide association studies have successfully identified hundreds of novel genetic variants associated with many complex human diseases. However, there is a lack of rigorous work on evaluating the statistical power for identifying these variants. In this paper, we consider sparse signal identification in genome-wide association studies and present two analytical frameworks for detailed analysis of the statistical power for detecting and identifying the disease-associated variants. We present an explicit sample size formula for achieving a given false non-discovery rate while controlling the false discovery rate based on an optimal procedure. Sparse genetic variant recovery is also considered and a boundary condition is established in terms of sparsity and signal strength for almost exact recovery of both disease-associated variants and nondisease-associated variants. A data-adaptive procedure is proposed to achieve this bound. The analytical results are illustrated with a genome-wide association study of neuroblastoma. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
273
290
http://hdl.handle.net/10.1093/biomet/asr003
application/pdf
Access to full text is restricted to subscribers.
Jichun Xie
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:243-2502015-03-25RePEc:oup:biomet
article
Testing a linear time series model against its threshold extension
This paper derives the asymptotic null distribution of a quasilikelihood ratio test statistic for an autoregressive moving average model against its threshold extension. The null hypothesis is that of no threshold, and the error term could be dependent. The asymptotic distribution is rather complicated, and all existing methods for approximating a distribution in the related literature fail to work. Hence, a novel bootstrap approximation based on stochastic permutation is proposed in this paper. Besides being robust to the assumptions on the error term, our method enjoys more flexibility and needs less computation when compared with methods currently used in the literature. Monte Carlo experiments give further support to the new approach, and an illustration is reported. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
243
250
http://hdl.handle.net/10.1093/biomet/asq074
application/pdf
Access to full text is restricted to subscribers.
Guodong Li
Wai Keung Li
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:489-4942015-03-25RePEc:oup:biomet
article
The dimple in Gneiting's spatial-temporal covariance model
Gneiting (2002) proposed a nonseparable covariance model for spatial-temporal data. In the present paper we show that in certain circumstances his model possesses a counterintuitive dimple. In some cases, the magnitude of the dimple can be nontrivial. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
489
494
http://hdl.handle.net/10.1093/biomet/asr006
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Mohsen Mohammadzadeh
Ali M. Mosammam
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:355-3702015-03-25RePEc:oup:biomet
article
Efficient semiparametric regression for longitudinal data with nonparametric covariance estimation
For longitudinal data, when the within-subject covariance is misspecified, the semiparametric regression estimator may be inefficient. We propose a method that combines the efficient semiparametric estimator with nonparametric covariance estimation, and is robust against misspecification of covariance models. We show that kernel covariance estimation provides uniformly consistent estimators for the within-subject covariance matrices, and the semiparametric profile estimator with substituted nonparametric covariance is still semiparametrically efficient. The finite sample performance of the proposed estimator is illustrated by simulation. In an application to CD4 count data from an AIDS clinical trial, we extend the proposed method to a functional analysis of the covariance model. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
355
370
http://hdl.handle.net/10.1093/biomet/asq080
application/pdf
Access to full text is restricted to subscribers.
Yehua Li
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:481-4882015-03-25RePEc:oup:biomet
article
On the likelihood function of Gaussian max-stable processes
We derive a closed form expression for the likelihood function of a Gaussian max-stable process indexed by ℝ^d at p≤d+1 sites, d≥1. We demonstrate the gain in efficiency in the maximum composite likelihood estimators of the covariance matrix from p=2 to p=3 sites in ℝ^2 by means of a Monte Carlo simulation study. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
481
488
http://hdl.handle.net/10.1093/biomet/asr020
application/pdf
Access to full text is restricted to subscribers.
Marc G. Genton
Yanyuan Ma
Huiyan Sang
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:997-10012015-03-25RePEc:oup:biomet
article
A note on overadjustment in inverse probability weighted estimation
Standardized means, commonly used in observational studies in epidemiology to adjust for potential confounders, are equal to inverse probability weighted means with inverse weights equal to the empirical propensity scores. More refined standardization corresponds with empirical propensity scores computed under more flexible models. Unnecessary standardization induces efficiency loss. However, according to the theory of inverse probability weighted estimation, propensity scores estimated under more flexible models induce improvement in the precision of inverse probability weighted means. This apparent contradiction is clarified by explicitly stating the assumptions under which the improvement in precision is attained. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
997
1001
http://hdl.handle.net/10.1093/biomet/asq049
application/pdf
Access to full text is restricted to subscribers.
Andrea Rotnitzky
Lingling Li
Xiaochun Li
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:851-8652015-03-25RePEc:oup:biomet
article
Nonparametric Bayesian density estimation on manifolds with applications to planar shapes
Statistical analysis on landmark-based shape spaces has diverse applications in morphometrics, medical diagnostics, machine vision and other areas. These shape spaces are non-Euclidean quotient manifolds. To conduct nonparametric inferences, one may define notions of centre and spread on this manifold and work with their estimates. However, it is useful to consider full likelihood-based methods, which allow nonparametric estimation of the probability density. This article proposes a broad class of mixture models constructed using suitable kernels on a general compact metric space and then on the planar shape space in particular. Following a Bayesian approach with a nonparametric prior on the mixing distribution, conditions are obtained under which the Kullback--Leibler property holds, implying large support and weak posterior consistency. Gibbs sampling methods are developed for posterior computation, and the methods are applied to problems in density estimation and classification with shape-based predictors. Simulation studies show improved estimation performance relative to existing approaches. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
851
865
http://hdl.handle.net/10.1093/biomet/asq044
application/pdf
Access to full text is restricted to subscribers.
Abhishek Bhattacharya
David B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:237-2422015-03-25RePEc:oup:biomet
article
Recapture models under equality constraints for the conditional capture probabilities
We introduce a general class of capture-recapture models in which capture probabilities depend on capture history. We discuss constrained versions of the saturated model based on equality constraints. Inference can be performed through a simple estimating equation. The approach is illustrated on a dataset concerning Great Copper butterflies in Willamette Valley of Oregon. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
237
242
http://hdl.handle.net/10.1093/biomet/asq068
application/pdf
Access to full text is restricted to subscribers.
A. Farcomeni
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:147-1622015-03-25RePEc:oup:biomet
article
Estimation of covariate effects in generalized linear mixed models with informative cluster sizes
In standard regression analyses of clustered data, one typically assumes that the expected value of the response is independent of cluster size. However, this is often false. For example, in studies of surgical interventions, investigators have frequently found surgery volume and outcomes to be related to the skill level of the surgeons. This paper examines the effect of ignoring response-dependent, informative, cluster sizes on standard analytical methods such as mixed-effects models and conditional likelihood methods using analytic calculations, simulation studies and an example from a study of periodontal disease. We consider the case in which cluster sizes and responses share random effects which we assume to be independent of the covariates. Our focus is on maximum likelihood methods that ignore informative cluster sizes, and we show that they exhibit little bias in estimating covariate effects that are uncorrelated with the random effects associated with cluster sizes. However, estimation of covariate effects that are associated with the random effects can be biased. In particular, for models with random intercepts only, ignoring informative cluster sizes can yield biased estimators of the intercept but little bias in estimation of all covariate effects. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
147
162
http://hdl.handle.net/10.1093/biomet/asq066
application/pdf
Access to full text is restricted to subscribers.
John M. Neuhaus
Charles E. McCulloch
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:81-902015-03-25RePEc:oup:biomet
article
A self-normalized confidence interval for the mean of a class of nonstationary processes
We construct an asymptotic confidence interval for the mean of a class of nonstationary processes with constant mean and time-varying variances. Due to the large number of unknown parameters, traditional approaches based on consistent estimation of the limiting variance of sample mean through moving block or non-overlapping block methods are not applicable. Under a block-wise asymptotically equal cumulative variance assumption, we propose a self-normalized confidence interval that is robust against the nonstationarity and dependence structure of the data. We also apply the same idea to construct an asymptotic confidence interval for the mean difference of nonstationary processes with piecewise constant means. The proposed methods are illustrated through simulations and an application to global temperature series. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
81
90
http://hdl.handle.net/10.1093/biomet/asq076
application/pdf
Access to full text is restricted to subscribers.
Zhibiao Zhao
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:905-9202015-03-25RePEc:oup:biomet
article
Penalized high-dimensional empirical likelihood
We propose penalized empirical likelihood for parameter estimation and variable selection for problems with diverging numbers of parameters. Our results are demonstrated for estimating the mean vector in multivariate analysis and regression coefficients in linear models. By using an appropriate penalty function, we show that penalized empirical likelihood has the oracle property. That is, with probability tending to 1, penalized empirical likelihood identifies the true model and estimates the nonzero coefficients as efficiently as if the sparsity of the true model was known in advance. The advantage of penalized empirical likelihood as a nonparametric likelihood approach is illustrated by testing hypotheses and constructing confidence regions. Numerical simulations confirm our theoretical findings. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
905
920
http://hdl.handle.net/10.1093/biomet/asq057
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Chenlei Leng
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1013-10132015-03-25RePEc:oup:biomet
article
Amendments and Corrections
4
2010
97
Biometrika
1013
1013
http://hdl.handle.net/10.1093/biomet/asq052
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:867-8802015-03-25RePEc:oup:biomet
article
A weighted estimating equation approach for inhomogeneous spatial point processes
We introduce a new estimation method for parametric intensity function models of inhomogeneous spatial point processes based on weighted estimating equations. The weights can incorporate information on both inhomogeneity and dependence of the process. Simulations show that significant efficiency gains can be achieved for non-Poisson processes, compared to the Poisson maximum likelihood estimator. An application to tropical forest data illustrates the use of the proposed method. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
867
880
http://hdl.handle.net/10.1093/biomet/asq043
application/pdf
Access to full text is restricted to subscribers.
Yongtao Guan
Ye Shen
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:825-8382015-03-25RePEc:oup:biomet
article
Noncrossing quantile regression curve estimation
Since quantile regression curves are estimated individually, the quantile curves can cross, leading to an invalid distribution for the response. A simple constrained version of quantile regression is proposed to avoid the crossing problem for both linear and nonparametric quantile curves. A simulation study and a reanalysis of tropical cyclone intensity data show the usefulness of the procedure. Asymptotic properties of the estimator are equivalent to the typical approach under standard conditions, and the proposed estimator reduces to the classical one if there is no crossing. The performance of the constrained estimator has shown significant improvement by adding smoothing and stability across the quantile levels. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
825
838
http://hdl.handle.net/10.1093/biomet/asq048
application/pdf
Access to full text is restricted to subscribers.
Howard D. Bondell
Brian J. Reich
Huixia Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:49-632015-03-25RePEc:oup:biomet
article
Bootstrap inference for mean reflection shape and size-and-shape with three-dimensional landmark data
Working within the framework of a multi-dimensional scaling approach to shape analysis, we develop bootstrap methods for inference about mean reflection shape and size-and-shape based on labelled landmark data. The approach is developed in general dimensions though we focus on the three-dimensional case. We consider two pivotal statistics which we use to construct bootstrap confidence regions for the mean reflection shape or size-and-shape, and present simulation results which show that these statistics perform well in a variety of examples. We also suggest regularized versions of the test statistics that are suitable for more challenging cases where sample size is not sufficiently large in relation to the number of landmarks and present numerical results confirming that regularization indeed leads to better performance. An algorithm for producing a graphical representation of the confidence region for the mean reflection shape is presented and applied in an example involving molecular dynamics simulation data. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
49
63
http://hdl.handle.net/10.1093/biomet/asq065
application/pdf
Access to full text is restricted to subscribers.
S. P. Preston
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:231-2362015-03-25RePEc:oup:biomet
article
A novel reversible jump algorithm for generalized linear models
We propose a novel methodology to construct proposal densities in reversible jump algorithms that obtain samples from parameter subspaces of competing generalized linear models with differing dimensions. The derived proposal densities are not restricted to moves between nested models and are applicable even to models that share no common parameters. We illustrate our methodology on competing logistic regression and log-linear graphical models, demonstrating how our suggested proposal densities, together with the resulting freedom to propose moves between any models, improve the performance of the reversible jump algorithm. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
231
236
http://hdl.handle.net/10.1093/biomet/asq071
application/pdf
Access to full text is restricted to subscribers.
M. Papathomas
P. Dellaportas
V. G. S. Vasdekis
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:921-9342015-03-25RePEc:oup:biomet
article
Estimation of controlled direct effects on a dichotomous outcome using logistic structural direct effect models
We consider the problem of assessing whether an exposure affects a dichotomous outcome other than by modifying a given mediator. The standard approach, logistic regression adjusting for both exposure and the mediator, is known to be biased in the presence of confounders for the mediator-outcome relationship. Because additional regression adjustment for such confounders is only justified when they are not affected by the exposure, inverse probability weighting has been advocated, but is not ideally tailored to mediators that are continuous or have strong measured predictors. We overcome this limitation by developing inference for a novel class of causal models that are closely related to Robins' logistic structural direct effect models, but do not inherit their difficulties of estimation. We study identification and efficient estimation under the assumption that all confounders for the exposure-outcome and mediator-outcome relationships have been measured, and find adequate performance in simulation studies. We discuss extensions to case-control studies and relevant implications for the generic problem of adjustment for time-varying confounding. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
921
934
http://hdl.handle.net/10.1093/biomet/asq053
application/pdf
Access to full text is restricted to subscribers.
Stijn Vansteelandt
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:391-4012015-03-25RePEc:oup:biomet
article
The union closure method for testing a fixed sequence of families of hypotheses
Statistical analyses often involve testing multiple hypotheses that are naturally grouped into a fixed sequence of families. An effective approach to control the familywise error rate is to prioritize the importance of prespecification in the testing order. A gatekeeping testing procedure examines the first family with no multiple adjustment and then examines the subsequent family depending on the decision made with respect to the previous one. In this paper, we describe the union closure method that can be used to design gatekeeping procedures. A bipolar disorder trial with three primary and two secondary outcomes is presented as an example. Power comparisons based on the bipolar disorder trial show that the proposed gatekeeping procedures under the union closure framework are more powerful than competing methods. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
391
401
http://hdl.handle.net/10.1093/biomet/asr015
application/pdf
Access to full text is restricted to subscribers.
Han-Joo Kim
A. Richard Entsuah
Justine Shults
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:433-4482015-03-25RePEc:oup:biomet
article
Maximum likelihood estimation of a generalized threshold stochastic regression model
There is hardly any literature on modelling nonlinear dynamic relations involving nonnormal time series data. This is a serious lacuna because nonnormal data are far more abundant than normal ones, for example, time series of counts and positive time series. While there are various forms of nonlinearities, the class of piecewise-linear models is particularly appealing for its relative ease of tractability and interpretation. We propose to study the generalized threshold model which specifies that the conditional probability distribution of the response variable belongs to an exponential family, and the conditional mean response is linked to some piecewise-linear stochastic regression function. We introduce a likelihood-based estimation scheme, and the consistency and limiting distribution of the maximum likelihood estimator are derived. We illustrate the proposed approach with an analysis of a hare abundance time series, which gives new insights on how phase-dependent predator-prey-climate interactions shaped the ten-year hare population cycle. A simulation study is conducted to examine the finite-sample performance of the proposed estimation method. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
433
448
http://hdl.handle.net/10.1093/biomet/asr008
application/pdf
Access to full text is restricted to subscribers.
Noelle I. Samia
Kung-Sik Chan
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:119-1322015-03-25RePEc:oup:biomet
article
Parametric fractional imputation for missing data analysis
Parametric fractional imputation is proposed as a general tool for missing data analysis. Using fractional weights, the observed likelihood can be approximated by the weighted mean of the imputed data likelihood. Computational efficiency can be achieved using the idea of importance sampling and calibration weighting. The proposed imputation method provides efficient parameter estimates for the model parameters specified in the imputation model and also provides reasonable estimates for parameters that are not part of the imputation model. Variance estimation is discussed and results from a limited simulation study are presented. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
119
132
http://hdl.handle.net/10.1093/biomet/asq073
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:107-1182015-03-25RePEc:oup:biomet
article
Horvitz--Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling
When dealing with very large datasets of functional data, survey sampling approaches are useful in order to obtain estimators of simple functional quantities, without being obliged to store all the data. We propose a Horvitz--Thompson estimator of the mean trajectory. In the context of a superpopulation framework, we prove, under mild regularity conditions, that we obtain uniformly consistent estimators of the mean function and of its variance function. With additional assumptions on the sampling design we state a functional central limit theorem and obtain asymptotic confidence bands. Stratified sampling is studied in detail, and we also obtain a functional version of the usual optimal allocation rule, considering a mean variance criterion. These techniques are illustrated by a test population of N=18 902 electricity meters for which we have individual electricity consumption measures every 30 minutes over one week. We show that stratification can both substantially improve the accuracy of the estimators and reduce the width of the global confidence bands compared with simple random sampling without replacement. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
107
118
http://hdl.handle.net/10.1093/biomet/asq070
application/pdf
Access to full text is restricted to subscribers.
Hervé Cardot
Etienne Josserand
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:839-8502015-03-25RePEc:oup:biomet
article
Censored quantile regression with partially functional effects
Quantile regression offers a flexible approach to analyzing survival data, allowing each covariate effect to vary with quantiles. In practice, constancy is often found to be adequate for some covariates. In this paper, we study censored quantile regression tailored to the partially functional effect setting with a mixture of varying and constant effects. Such a model can offer a simpler view regarding covariate-survival association and, moreover, can enable improvement in estimation efficiency. We propose profile estimating equations and present an iterative algorithm that can be readily and stably implemented. Asymptotic properties of the resultant estimators are established. A simple resampling-based inference procedure is developed and justified. Extensive simulation studies demonstrate efficiency gains of the proposed method over a naive two-stage procedure. The proposed method is illustrated via an application to a recent renal dialysis study. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
839
850
http://hdl.handle.net/10.1093/biomet/asq050
application/pdf
Access to full text is restricted to subscribers.
Jing Qian
Limin Peng
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:215-2242015-03-25RePEc:oup:biomet
article
Assessing the validity of weighted generalized estimating equations
The inverse probability weighted generalized estimating equations approach (Robins et al. 1994; Robins et al. 1995) effectively removes bias and provides valid statistical inference for regression parameter estimation in marginal models when longitudinal data contain missing values. The validity of the weighted generalized estimating equations regarding consistent estimation depends on whether the underlying missing data process is properly modelled. However, there is little work available to examine whether or not this condition holds. In this paper we propose a test constructed from two sets of estimating equations: one set is known to be unbiased, but the other set is not known. We utilize the quadratic inference function (Qu et al. 2000) method to assess their compatibility, which is equivalent to testing for the validity of the weighted generalized estimating equations approach. We conduct simulation studies to assess the performance of the proposed method. The test procedure is illustrated through a real data example. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
215
224
http://hdl.handle.net/10.1093/biomet/asq078
application/pdf
Access to full text is restricted to subscribers.
A. Qu
G. Y. Yi
P. X.-K. Song
P. Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:177-1862015-03-25RePEc:oup:biomet
article
Nonparametric estimation for length-biased and right-censored data
This paper considers survival data arising from length-biased sampling, where the survival times are left truncated by uniformly distributed random truncation times. We propose a nonparametric estimator that incorporates the information about the length-biased sampling scheme. The new estimator retains the simplicity of the truncation product-limit estimator with a closed-form expression, and has a small efficiency loss compared with the nonparametric maximum likelihood estimator, which requires an iterative algorithm. Moreover, the asymptotic variance of the proposed estimator has a closed form, and a variance estimator is easily obtained by plug-in methods. Numerical simulation studies with practical sample sizes are conducted to compare the performance of the proposed method with its competitors. A data analysis of the Canadian Study of Health and Aging is conducted to illustrate the methods and theory. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
177
186
http://hdl.handle.net/10.1093/biomet/asq069
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:163-1752015-03-25RePEc:oup:biomet
article
A unified framework for studying parameter identifiability and estimation in biased sampling designs
Based on the odds ratio representation of a joint density, we propose a unified framework to study parameter identifiability in biased sampling designs. It is shown that most of these designs encountered in practice can be reformulated within the proposed framework and, as a result, the question of parameter identifiability can be largely clarified. Estimation of the identifiable parameters is considered and traditional results on the equivalence of the prospective and retrospective likelihoods are extended. Information contained in data on certain identifiable parameters is often very limited. Such parameters can be poorly estimated by the likelihood approach with practically attainable sample sizes, which can substantially affect the estimates of parameters of primary interest. A partially penalized likelihood approach is proposed to address this. Simulation results suggest that the proposed approach has good performance. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
163
175
http://hdl.handle.net/10.1093/biomet/asq059
application/pdf
Access to full text is restricted to subscribers.
Hua Yun Chen
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:199-2142015-03-25RePEc:oup:biomet
article
The effect of correlation in false discovery rate estimation
The objective of this paper is to quantify the effect of correlation in false discovery rate analysis. Specifically, we derive approximations for the mean, variance, distribution and quantiles of the standard false discovery rate estimator for arbitrarily correlated data. This is achieved using a negative binomial model for the number of false discoveries, where the parameters are found empirically from the data. We show that correlation may increase the bias and variance of the estimator substantially with respect to the independent case, and that in some cases, such as an exchangeable correlation structure, the estimator fails to be consistent as the number of tests becomes large. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
199
214
http://hdl.handle.net/10.1093/biomet/asq075
application/pdf
Access to full text is restricted to subscribers.
Armin Schwartzman
Xihong Lin
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:251-2712015-03-25RePEc:oup:biomet
article
False discovery rates and copy number variation
Copy number changes, the gains and losses of chromosome segments, are a common type of genetic variation among healthy individuals as well as an important feature in tumour genomes. Microarray technology enables us to simultaneously measure, with moderate accuracy, copy number variation at more than a million chromosome locations and for hundreds of subjects. This leads to massive data sets and complicated inference problems concerning which locations are more likely to vary. In this paper we consider a relatively simple false discovery rate approach to copy number analysis. More careful parametric change-point methods can then be focused on promising regions of the genome. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
251
271
http://hdl.handle.net/10.1093/biomet/asr018
application/pdf
Access to full text is restricted to subscribers.
Bradley Efron
Nancy R. Zhang
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:473-4802015-03-25RePEc:oup:biomet
article
Empirical likelihood for small area estimation
Current methodologies in small area estimation are mostly either parametric or heavily dependent on the assumed linearity of the estimators of the small area means. We discuss an alternative empirical likelihood-based Bayesian approach, which neither requires a parametric likelihood nor assumes linearity of the estimators, and can handle both discrete and continuous data in a unified manner. Empirical likelihoods for both area- and unit-level models are introduced. We discuss the suitability of the proposed likelihoods in Bayesian inference and illustrate their performances on a real dataset and a simulation study. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
473
480
http://hdl.handle.net/10.1093/biomet/asr004
application/pdf
Access to full text is restricted to subscribers.
Sanjay Chaudhuri
Malay Ghosh
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:947-9602015-03-25RePEc:oup:biomet
article
Enhancing the sample average approximation method with U designs
Many computational problems in statistics can be cast as stochastic programs that are optimization problems whose objective functions are multi-dimensional integrals. The sample average approximation method is widely used for solving such a problem, which first constructs a sampling-based approximation to the objective function and then finds the solution to the approximated problem. Independent and identically distributed sampling is a prevailing choice for constructing such approximations. Recently it was found that the use of Latin hypercube designs can improve sample average approximations. In computer experiments, U designs are known to possess better space-filling properties than Latin hypercube designs. Inspired by this fact, we propose to use U designs to further enhance the accuracy of the sample average approximation method. Theoretical results are derived to show that sample average approximations with U designs can significantly outperform those with Latin hypercube designs. Numerical examples are provided to corroborate the developed theoretical results. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
947
960
http://hdl.handle.net/10.1093/biomet/asq046
application/pdf
Access to full text is restricted to subscribers.
Qi Tang
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:990-9962015-03-25RePEc:oup:biomet
article
On the equivalence of prospective and retrospective likelihood methods in case-control studies
We present new approaches to analyzing case-control studies using prospective likelihood methods. In the classical framework, we extend the equality of the profile likelihoods to the Barndorff-Nielsen modified profile likelihoods for prospective and retrospective models. This enables simple and accurate approximate conditional inference for stratified case-control studies of moderate stratum size. In the Bayesian framework, we provide sufficient conditions on priors for the prospective model parameters to yield a prospective marginal posterior density equal to its retrospective counterpart. Our results extend the prospective-retrospective equivalence in the Bayesian paradigm with a more general class of priors than has previously been investigated. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
990
996
http://hdl.handle.net/10.1093/biomet/asq054
application/pdf
Access to full text is restricted to subscribers.
Ana-Maria Staicu
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:371-3802015-03-25RePEc:oup:biomet
article
Sure independence screening and compressed random sensing
Compressed sensing is a very powerful and popular tool for sparse recovery of high dimensional signals. Random sensing matrices are often employed in compressed sensing. In this paper we introduce a new method named aggressive betting using sure independence screening for sparse noiseless signal recovery. The proposal exploits the randomness structure of random sensing matrices to greatly boost computation speed. When using sub-Gaussian sensing matrices, which include the Gaussian and Bernoulli sensing matrices as special cases, our proposal has the exact recovery property with overwhelming probability. We also consider sparse recovery with noise and explicitly reveal the impact of noise-to-signal ratio on the probability of sure screening. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
371
380
http://hdl.handle.net/10.1093/biomet/asr010
application/pdf
Access to full text is restricted to subscribers.
Lingzhou Xue
Hui Zou
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:325-3402015-03-25RePEc:oup:biomet
article
Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times
New methods and theory have recently been developed to nonparametrically estimate cumulative incidence functions for competing risks survival data subject to current status censoring. In particular, the limiting distribution of the nonparametric maximum likelihood estimator and a simplified naive estimator have been established under certain smoothness conditions. In this paper, we establish the large-sample behaviour of these estimators in two additional models, namely when the observation time distribution has discrete support and when the observation times are grouped. These asymptotic results are applied to the construction of confidence intervals in the three different models. The methods are illustrated on two datasets regarding the cumulative incidence of different types of menopause from a cross-sectional sample of women in the United States and subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
325
340
http://hdl.handle.net/10.1093/biomet/asq083
application/pdf
Access to full text is restricted to subscribers.
M. H. Maathuis
M. G. Hudgens
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:381-3902015-03-25RePEc:oup:biomet
article
Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control
Testing a low-dimensional null hypothesis against a high-dimensional alternative in a generalized linear model may lead to a test statistic that is a quadratic form in the residuals under the null model. Using asymptotic arguments, we show that the distribution of such a test statistic can be approximated by a ratio of quadratic forms in normal variables, for which algorithms are readily available. For generalized linear models, the asymptotic distribution shows good control of type I error for moderate to small samples, even when the number of covariates in the model far exceeds the sample size. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
381
390
http://hdl.handle.net/10.1093/biomet/asr016
application/pdf
Access to full text is restricted to subscribers.
Jelle J. Goeman
Hans C. van Houwelingen
Livio Finos
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:495-5012015-03-25RePEc:oup:biomet
article
An Akaike-type information criterion for model selection under inequality constraints
The Akaike information criterion for model selection presupposes that the parameter space is not subject to order restrictions or inequality constraints. Anraku (1999) proposed a modified version of this criterion, called the order-restricted information criterion, for model selection in the one-way analysis of variance model when the population means are monotonic. We propose a generalization of this to the case when the population means may be restricted by a mixture of linear equality and inequality constraints. If the model has no inequality constraints, then the generalized order-restricted information criterion coincides with the Akaike information criterion. Thus, the former extends the applicability of the latter to model selection in multi-way analysis of variance models when some models may have inequality constraints while others may not. Simulation shows that the information criterion proposed in this paper performs well in selecting the correct model. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
495
501
http://hdl.handle.net/10.1093/biomet/asr002
application/pdf
Access to full text is restricted to subscribers.
R. M. Kuiper
H. Hoijtink
M. J. Silvapulle
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:935-9462015-03-25RePEc:oup:biomet
article
Compound optimal allocation for individual and collective ethics in binary clinical trials
In recent years, several authors have investigated response-adaptive allocation rules for comparative clinical trials, in order to favour, at each stage of the trial, the treatment that appears to be best. In this paper, we define admissible allocations, namely treatment assignments that cannot be simultaneously improved upon with respect to both a specific design criterion, reflecting the inferential properties of the experiment, and the proportion of patients assigned to the best treatment or treatments; we survey existing designs from this viewpoint. We also suggest combining information and ethical considerations by taking a suitable weighted mean of two corresponding standardized criteria, with weights that depend on the actual treatment effects. This compound criterion leads to a locally optimal allocation that can be targeted by some response-adaptive randomization rule. The paper mainly deals with the case of two treatments, but the suggested methodology is shown to extend to more than two. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
935
946
http://hdl.handle.net/10.1093/biomet/asq055
application/pdf
Access to full text is restricted to subscribers.
Alessandro Baldi Antognini
Alessandra Giovagnoli
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1002-10052015-03-25RePEc:oup:biomet
article
Parameter redundancy with covariates
We show how to determine the parameter redundancy status of a model with covariates from that of the same model without covariates, thereby simplifying the calculation considerably. A matrix decomposition is necessary to ensure that the symbolic computation computer programmes return correct results. The paper is illustrated by mark-recovery and latent-class models, with associated Maple code. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
1002
1005
http://hdl.handle.net/10.1093/biomet/asq041
application/pdf
Access to full text is restricted to subscribers.
Diana J. Cole
Byron J. T. Morgan
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:985-9892015-03-25RePEc:oup:biomet
article
Some insights into continuum regression and its asymptotic properties
Continuum regression encompasses ordinary least squares regression, partial least squares regression and principal component regression under the same umbrella using a nonnegative parameter Gamma. However, there seems to be no literature discussing the asymptotic properties for arbitrary continuum regression parameter Gamma. This article establishes a relation between continuum regression and sufficient dimension reduction and studies the asymptotic properties of continuum regression for arbitrary Gamma under inverse regression models. Theoretical and simulation results show that the continuum seems unnecessary when the conditional distribution of the predictors given the response follows the multivariate normal distribution. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
985
989
http://hdl.handle.net/10.1093/biomet/asq024
application/pdf
Access to full text is restricted to subscribers.
Xin Chen
R. Dennis Cook
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:893-9042015-03-25RePEc:oup:biomet
article
Consistent selection of the number of clusters via crossvalidation
In cluster analysis, one of the major challenges is to estimate the number of clusters. Most existing approaches attempt to minimize some distance-based dissimilarity measure within clusters. This article proposes a novel selection criterion that is applicable to all kinds of clustering algorithms, including distance based or non-distance based algorithms. The key idea is to select the number of clusters that minimizes the algorithm's instability, which measures the robustness of any given clustering algorithm against the randomness in sampling. A novel estimation scheme for clustering instability is developed based on crossvalidation. The proposed selection criterion's effectiveness is demonstrated on a variety of numerical experiments, and its asymptotic selection consistency is established when the dataset is properly split. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
893
904
http://hdl.handle.net/10.1093/biomet/asq061
application/pdf
Access to full text is restricted to subscribers.
Junhui Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:291-3062015-03-25RePEc:oup:biomet
article
Sparse Bayesian infinite factor models
We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
291
306
http://hdl.handle.net/10.1093/biomet/asr013
application/pdf
Access to full text is restricted to subscribers.
A. Bhattacharya
D. B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:1-152015-03-25RePEc:oup:biomet
article
Joint estimation of multiple graphical models
Gaussian graphical models explore dependence relationships between random variables, through the estimation of the corresponding inverse covariance matrices. In this paper we develop an estimator for such models appropriate for data from several graphical models that share the same variables and some of the dependence structure. In this setting, estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category does not take advantage of the common structure. We propose a method that jointly estimates the graphical models corresponding to the different categories present in the data, aiming to preserve the common structure, while allowing for differences between the categories. This is achieved through a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. We establish the asymptotic consistency and sparsity of the proposed estimator in the high-dimensional case, and illustrate its performance on a number of simulated networks. An application to learning semantic connections between terms from webpages collected from computer science departments is included. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
1
15
http://hdl.handle.net/10.1093/biomet/asq060
application/pdf
Access to full text is restricted to subscribers.
Jian Guo
Elizaveta Levina
George Michailidis
Ji Zhu
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:341-3542015-03-25RePEc:oup:biomet
article
Time-dependent cross ratio estimation for bivariate failure times
In the analysis of bivariate correlated failure time data, it is important to measure the strength of association among the correlated failure times. One commonly used measure is the cross ratio. Motivated by Cox's partial likelihood idea, we propose a novel parametric cross ratio estimator that is a flexible continuous function of both components of the bivariate survival times. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is examined using simulation studies, and it is applied to the Australian twin data. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
341
354
http://hdl.handle.net/10.1093/biomet/asr005
application/pdf
Access to full text is restricted to subscribers.
Tianle Hu
Bin Nan
Xihong Lin
James M. Robins
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:187-1982015-03-25RePEc:oup:biomet
article
Variance estimation for generalized Cavalieri estimators
The precision of stereological estimators based on systematic sampling is of great practical importance. This paper presents methods of data-based variance estimation for generalized Cavalieri estimators where errors in sampling positions may occur. Variance estimators are derived under perturbed systematic sampling, systematic sampling with cumulative errors and systematic sampling with random dropouts. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
187
198
http://hdl.handle.net/10.1093/biomet/asq064
application/pdf
Access to full text is restricted to subscribers.
Johanna Ziegel
Eva B. Vedel Jensen
Karl-Anton Dorph-Petersen
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:459-4712015-03-25RePEc:oup:biomet
article
On balanced random imputation in surveys
Random imputation methods are often used in practice because they tend to preserve the distribution of the variable being imputed, which is an important property when the goal is to estimate population quantiles. However, this type of imputation method introduces additional variability, the imputation variance, due to the random selection of residuals. In this paper, we propose a class of random balanced imputation methods under which the imputation variance is eliminated while the distribution of the variable being imputed is preserved. The rationale behind balanced imputation is to select residuals at random so that appropriate constraints are satisfied. We describe an algorithm for selecting the random residuals that can be viewed as an adaptation of the cube algorithm proposed in the context of balanced sampling (Deville & Tillé, 2004). Results of a simulation study support our findings. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
459
471
http://hdl.handle.net/10.1093/biomet/asr011
application/pdf
Access to full text is restricted to subscribers.
G. Chauvet
J.-C. Deville
D. Haziza
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:881-8922015-03-25RePEc:oup:biomet
article
Bootstrap confidence intervals and hypothesis tests for extrema of parameters
The bootstrap provides effective and accurate methodology for a wide variety of statistical problems which might not otherwise enjoy practicable solutions. However, there still exist important problems where standard bootstrap estimators are not consistent, and where alternative approaches, for example the m-out-of-n bootstrap and asymptotic methods, also face significant challenges. One of these is the problem of constructing confidence intervals or hypothesis tests for extrema of parameters, for example for the maximum of p parameters where each has to be estimated from data. In the present paper we suggest approaches to solving this problem. We use the bootstrap to construct an accurate estimator of the joint distribution of centred parameter estimators, and we base the procedure, either a confidence interval or a hypothesis test, on that distribution estimator. Our methodology is designed so that it errs on the side of conservatism, modulo the small inaccuracy of the bootstrap step. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
881
892
http://hdl.handle.net/10.1093/biomet/asq045
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Hugh Miller
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:417-4312015-03-25RePEc:oup:biomet
article
Distribution estimators and confidence intervals for stereological volumes
Assessing the precision of volume estimates from systematic samples is a question of great practical importance, but statistically a challenging task due to the strong spatial dependence of the data and typically small sample sizes. The approach taken in this paper is more ambitious than earlier methodologies, the goal of which was estimation of the variance of a volume estimator v̂, rather than estimation of the distribution of v̂. We shall show that bootstrap methods yield consistent estimators of the distribution of v̂, and also suggest a variety of confidence intervals for the true volume. Our new methodology covers cases where serial sections are exactly periodic, as well as instances where the physical slicing procedure introduces errors in the placement of the sampling points. Measurement errors within sections are also taken into account. The performance of the method is illustrated by a simulation study with synthetic data, and also applied to real datasets. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
417
431
http://hdl.handle.net/10.1093/biomet/asr012
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Johanna Ziegel
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:307-3232015-03-25RePEc:oup:biomet
article
Bayesian influence analysis: a geometric approach
In this paper we develop a general framework of Bayesian influence analysis for assessing various perturbation schemes to the data, the prior and the sampling distribution for a class of statistical models. We introduce a perturbation model to characterize these various perturbation schemes. We develop a geometric framework, called the Bayesian perturbation manifold, and use its associated geometric quantities including the metric tensor and geodesic to characterize the intrinsic structure of the perturbation model. We develop intrinsic influence measures and local influence measures based on the Bayesian perturbation manifold to quantify the effect of various perturbations to statistical models. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of this local influence method in a formal Bayesian analysis. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
307
323
http://hdl.handle.net/10.1093/biomet/asr009
application/pdf
Access to full text is restricted to subscribers.
Hongtu Zhu
Joseph G. Ibrahim
Niansheng Tang
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:961-9682015-03-25RePEc:oup:biomet
article
Probability-based Latin hypercube designs for slid-rectangular regions
Existing space-filling designs are based on the assumption that the experimental region is rectangular, while in practice this assumption can be violated. Motivated by a data centre thermal management study, a class of probability-based Latin hypercube designs is proposed to accommodate a specific type of irregular region. A heuristic algorithm is proposed to search efficiently for optimal designs. Unbiased estimators are proposed, their variances are given and their performances are compared empirically. The proposed method is applied to obtain an optimal sensor placement plan to monitor and study the thermal distribution in a data centre. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
961
968
http://hdl.handle.net/10.1093/biomet/asq051
application/pdf
Access to full text is restricted to subscribers.
Ying Hung
Yasuo Amemiya
Chien-Fu Jeff Wu
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:133-1462015-03-25RePEc:oup:biomet
article
Partial envelopes for efficient estimation in multivariate linear regression
We introduce the partial envelope model, which leads to a parsimonious method for multivariate linear regression when some of the predictors are of special interest. It has the potential to achieve massive efficiency gains compared with the standard model in the estimation of the coefficients for the selected predictors. The partial envelope model is a variation on the envelope model proposed by Cook et al. (2010) but, as it focuses on part of the predictors, it has looser restrictions and can further improve the efficiency. We develop maximum likelihood estimation for the partial envelope model and discuss applications of the bootstrap. An example is provided to illustrate some of its operating characteristics. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
133
146
http://hdl.handle.net/10.1093/biomet/asq063
application/pdf
Access to full text is restricted to subscribers.
Zhihua Su
R. Dennis Cook
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:977-9842015-03-25RePEc:oup:biomet
article
On the Voronoi estimator for the intensity of an inhomogeneous planar Poisson process
The Voronoi estimator may be defined for any location as the inverse of the area of the corresponding Voronoi cell. We investigate the statistical properties of this estimator for the intensity of an inhomogeneous Poisson process, and demonstrate it is approximately unbiased with a gamma sampling distribution. We also introduce the centroidal Voronoi estimator, a simple extension based on spatial regularization of the point pattern. Simulations show the Voronoi estimator has remarkably low bias, while the centroidal Voronoi estimator has slightly more bias but is much less variable. The performance is compared to kernel estimators using two simulated datasets and a dataset consisting of earthquakes within the continental United States. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
977
984
http://hdl.handle.net/10.1093/biomet/asq047
application/pdf
Access to full text is restricted to subscribers.
C. D. Barr
F. P. Schoenberg
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:91-1062015-03-25RePEc:oup:biomet
article
On asymptotic normality and variance estimation for nondifferentiable survey estimators
Survey estimators of population quantities such as distribution functions and quantiles contain nondifferentiable functions of estimated quantities. The theoretical properties of such estimators are substantially more complicated to derive than those of differentiable estimators. In this article, we provide a unified framework for obtaining the asymptotic design-based properties of two common types of nondifferentiable estimators. Estimators of the first type have an explicit expression, while those of the second are defined only as the solution to estimating equations. We propose both analytical and replication-based design-consistent variance estimators for both cases, based on kernel regression. The practical behaviour of the variance estimators is demonstrated in a simulation experiment. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
91
106
http://hdl.handle.net/10.1093/biomet/asq077
application/pdf
Access to full text is restricted to subscribers.
Jianqiang C. Wang
J. D. Opsomer
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:35-482015-03-25RePEc:oup:biomet
article
Bayesian geostatistical modelling with informative sampling locations
We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modelling the outcomes conditionally on the locations as Gaussian with a Gaussian process spatial random effect and adjustment for the location intensity process. We prove posterior propriety under an improper prior on the parameter controlling the degree of informative sampling, demonstrating that the data are informative. In addition, we show that the density of the locations and mean function of the outcome process can be estimated consistently under mild assumptions. The methods show significant evidence of informative sampling when applied to ozone data over Eastern U.S.A. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
35
48
http://hdl.handle.net/10.1093/biomet/asq067
application/pdf
Access to full text is restricted to subscribers.
D. Pati
B. J. Reich
D. B. Dunson
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:969-9762015-03-25RePEc:oup:biomet
article
Varying coefficient transformation models with censored data
A maximum likelihood method with spline smoothing is proposed for linear transformation models with varying coefficients. The estimation and inference procedures are computationally easy. Under some regularity conditions, the estimators are proved to be consistent and asymptotically normal. A simulation study using the Stanford transplant data is presented to show that the proposed method performs well with a finite sample and is easy to use in practice. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
969
976
http://hdl.handle.net/10.1093/biomet/asq032
application/pdf
Access to full text is restricted to subscribers.
Kani Chen
Xingwei Tong
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:371-3852013-03-04RePEc:oup:biomet
article
Pairwise dependence diagnostics for clustered failure-time data
Frailty and copula models specify a parametric dependence structure for multivariate failure-time data. Estimation of some joint quantities can be highly sensitive to the assumed parametric form, and hence model fit is an important issue. This paper lays out a general diagnostic framework for evaluating and selecting frailty and copula models. The approach is based on the cumulative sum of residuals that are calculated in bivariate time. The residuals reflect the difference between the observed and expected bivariate association structures. The proposed model-checking process is interpretable with a limiting distribution which can be approximated using the bootstrap. Simulations and a data example illustrate the practical application of the method. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
371
385
http://hdl.handle.net/10.1093/biomet/asm024
application/pdf
Access to full text is restricted to subscribers.
David V. Glidden
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:763-7832013-03-04RePEc:oup:biomet
article
Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models
We consider estimation of the received treatment effect on a dichotomous outcome in randomised trials with non-compliance. We explore inference about the parameters of the structural mean models of Robins (1994, 1997) and Robins et al. (1999). We show that, in contrast to the additive and multiplicative structural mean models for continuous and count outcomes, unbiased estimating functions for a nonzero (structural) treatment effect parameter do not exist in the presence of many continuous and discrete baseline covariates, even when the randomisation probabilities are known. The best that can be hoped for are estimators, such as those proposed in this paper, that are guaranteed both to estimate consistently the (null) treatment effect when the null hypothesis of no treatment effect is true and to have small bias when the true treatment effect is close to but not equal to zero. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
763
783
http://hdl.handle.net/10.1093/biomet/91.4.763
text/html
Access to full text is restricted to subscribers.
James Robins
Andrea Rotnitzky
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:591-6022013-03-04RePEc:oup:biomet
article
Model selection for Gaussian concentration graphs
A multivariate Gaussian graphical Markov model for an undirected graph G, also called a covariance selection model or concentration graph model, is defined in terms of the Markov properties, i.e. conditional independences associated with G, which in turn are equivalent to specified zeros among the set of pairwise partial correlation coefficients. By means of Fisher's z-transformation and Šidák's correlation inequality, conservative simultaneous confidence intervals for the entire set of partial correlations can be obtained, leading to a simple method for model selection that controls the overall error rate for incorrect edge inclusion. The simultaneous p-values corresponding to the partial correlations are partitioned into three disjoint sets, a significant set S, an indeterminate set I and a nonsignificant set N. Our model selection method selects two graphs, a graph Ĝ_SI whose edges correspond to the set S∪I, and a more conservative graph Ĝ_S whose edges correspond to S only. Similar considerations apply to covariance graph models, which are defined in terms of marginal independence rather than conditional independence. The method is applied to some well-known examples and to simulated data. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
591
602
Mathias Drton
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:601-6192013-03-04RePEc:oup:biomet
article
Joint modelling of paired sparse functional data using principal components
We propose a modelling framework to study the relationship between two paired longitudinally observed variables. The data for each variable are viewed as smooth curves measured at discrete time-points plus random errors. While the curves for each variable are summarized using a few important principal components, the association of the two longitudinal variables is modelled through the association of the principal component scores. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed-effects model framework for model fitting, prediction and inference. The proposed method can be applied in the difficult case in which the measurement times are irregular and sparse and may differ widely across individuals. Use of functional principal components enhances model interpretation and improves statistical and numerical stability of the parameter estimates. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
601
619
http://hdl.handle.net/10.1093/biomet/asn035
application/pdf
Access to full text is restricted to subscribers.
Lan Zhou
Jianhua Z. Huang
Raymond J. Carroll
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:179-1952013-03-04RePEc:oup:biomet
article
A shrinkage estimator for spectral densities
We propose a shrinkage estimator for spectral densities based on a multilevel normal hierarchical model. The first level captures the sampling variability via a likelihood constructed using the asymptotic properties of the periodogram. At the second level, the spectral density is shrunk towards a parametric time series model. To avoid selecting a particular parametric model for the second level, a third level is added which induces an estimator that averages over a class of parsimonious time series models. The estimator derived from this model, the model averaged shrinkage estimator, is consistent, is shown to be highly competitive with other spectral density estimators via simulations, and is computationally inexpensive. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
179
195
http://hdl.handle.net/10.1093/biomet/93.1.179
text/html
Access to full text is restricted to subscribers.
Carsten H. Botts
Michael J. Daniels
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:619-6322013-03-04RePEc:oup:biomet
article
Semiparametric Box–Cox power transformation models for censored survival observations
The accelerated failure time model specifies that the logarithm of the failure time is linearly related to the covariate vector without assuming a parametric error distribution. In this paper, we consider the semiparametric Box–Cox transformation model, which includes the above regression model as a special case, to analyse possibly censored failure time observations. Inference procedures for the transformation and regression parameters are proposed via a resampling technique. Prediction of the survival function of future subjects with a specific covariate vector is also provided via pointwise and simultaneous interval estimates. All the proposals are illustrated with datasets from two clinical studies. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
619
632
http://hdl.handle.net/10.1093/biomet/92.3.619
text/html
Access to full text is restricted to subscribers.
Tianxi Cai
Lu Tian
L. J. Wei
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:919-9312013-03-04RePEc:oup:biomet
article
Small area estimation when auxiliary information is measured with error
Small area estimation methods typically combine direct estimates from a survey with predictions from a model in order to obtain estimates of population quantities with reduced mean squared error. When the auxiliary information used in the model is measured with error, using a small area estimator such as the Fay–Herriot estimator while ignoring measurement error may be worse than simply using the direct estimator. We propose a new small area estimator that accounts for sampling variability in the auxiliary information, and derive its properties, in particular showing that it is approximately unbiased. The estimator is applied to predict quantities measured in the U.S. National Health and Nutrition Examination Survey, with auxiliary information from the U.S. National Health Interview Survey. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
919
931
http://hdl.handle.net/10.1093/biomet/asn048
application/pdf
Access to full text is restricted to subscribers.
Lynn M. R. Ybarra
Sharon L. Lohr
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:976-9812013-03-04RePEc:oup:biomet
article
Conditional likelihood inference under complex ascertainment using data augmentation
In many applications, particularly in genetics, samples are drawn under complex ascertainment rules. For example, families may only be selected for study if two or more siblings have trait values exceeding some threshold. The correct likelihood for inference in such situations involves the probabilities of ascertainment, and these are frequently intractable. A consistent, but not fully efficient, method of analysis of such studies is proposed. The main idea is to augment the data with additional pseudo-observations simulated under the ascertainment scheme, and to analyse using a conditional likelihood for discrimination between true observations and pseudo-observations. Ascertainment probabilities cancel in this likelihood. The method is illustrated with a simple example involving left-truncated failure times. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
976
981
David Clayton
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:719-7332013-03-04RePEc:oup:biomet
article
Survival analysis with temporal covariate effects
We propose a natural generalization of the Cox regression model, in which the regression coefficients have direct interpretations as temporal covariate effects on the survival function. Under the conditionally independent censoring mechanism, we develop a smoothing-free estimation procedure with a set of martingale-based equations. Our estimator is shown to be uniformly consistent and to converge weakly to a Gaussian process. A simple resampling method is proposed for approximating the limiting distribution of the estimated coefficients. Second-stage inferences with time-varying coefficients are developed accordingly. Simulations and a real example illustrate the practical utility of the proposed method. Finally, we extend this proposal of temporal covariate effects to the general class of linear transformation models and also establish a connection with the additive hazards model. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
719
733
http://hdl.handle.net/10.1093/biomet/asm058
application/pdf
Access to full text is restricted to subscribers.
Limin Peng
Yijian Huang
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:627-6462013-03-04RePEc:oup:biomet
article
Simulation and inference for stochastic volatility models driven by Lévy processes
We study Ornstein-Uhlenbeck stochastic processes driven by Lévy processes, and extend them to more general non-Ornstein-Uhlenbeck models. In particular, we investigate the means of making the correlation structure in the volatility process more flexible. For one model, we implement a method for introducing quasi long-memory into the volatility model. We demonstrate that the models can be fitted to real share price returns data. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
627
646
http://hdl.handle.net/10.1093/biomet/asm048
application/pdf
Access to full text is restricted to subscribers.
Matthew P. S. Gander
David A. Stephens
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:627-6402013-03-04RePEc:oup:biomet
article
Efficient estimation of semiparametric transformation models for counting processes
A class of semiparametric transformation models is proposed to characterise the effects of possibly time-varying covariates on the intensity functions of counting processes. The class includes the proportional intensity model and linear transformation models as special cases. Nonparametric maximum likelihood estimators are developed for the regression parameters and cumulative intensity functions of these models based on censored data. The estimators are shown to be consistent and asymptotically normal. The limiting variances for the estimators of the regression parameters achieve the semi-parametric efficient bounds and can be consistently estimated. The limiting variances for the estimators of smooth functionals of the cumulative intensity function can also be consistently estimated. Simulation studies reveal that the proposed inference procedures perform well in practical settings. Two medical studies are provided. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
627
640
http://hdl.handle.net/10.1093/biomet/93.3.627
text/html
Access to full text is restricted to subscribers.
Donglin Zeng
D. Y. Lin
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:997-10012013-03-04RePEc:oup:biomet
article
On consistency of Kendall's tau under censoring
Necessary and sufficient conditions for consistency of a simple estimator of Kendall's tau under bivariate censoring are presented. The results are extended to data subject to bivariate left truncation as well as right censoring. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
997
1001
http://hdl.handle.net/10.1093/biomet/asn037
application/pdf
Access to full text is restricted to subscribers.
David Oakes
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:873-8922013-03-04RePEc:oup:biomet
article
A General Approach to the Predictability Issue in Survival Analysis with Applications
Very often in survival analysis one has to study martingale integrals where the integrand is not predictable and where the counting process theory of martingales is not directly applicable, as for example in nonparametric and semiparametric applications where the integrand is based on a pilot estimate. We call this the predictability issue in survival analysis. The problem has been resolved by approximations of the integrand by predictable functions which have been justified by ad hoc procedures. We present a general approach to the solution of this problem. The usefulness of the approach is shown in three applications. In particular, we argue that earlier ad hoc procedures do not work in higher-dimensional smoothing problems in survival analysis. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
873
892
http://hdl.handle.net/10.1093/biomet/asm062
application/pdf
Access to full text is restricted to subscribers.
Enno Mammen
Jens Perch Nielsen
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:599-6142013-03-04RePEc:oup:biomet
article
Testing parametric assumptions of trends of a nonstationary time series
The paper considers testing whether the mean trend of a nonstationary time series is of certain parametric forms. A central limit theorem for the integrated squared error is derived, and a hypothesis-testing procedure is proposed. The method is illustrated in a simulation study, and is applied to assess the mean pattern of lifetime-maximum wind speeds of global tropical cyclones from 1981 to 2006. We also revisit the trend pattern in the central England temperature series. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
599
614
http://hdl.handle.net/10.1093/biomet/asr017
application/pdf
Access to full text is restricted to subscribers.
Ting Zhang
Wei Biao Wu
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:809-8252013-03-04RePEc:oup:biomet
article
Generalized Spatial Dirichlet Process Models
Many models for the study of point-referenced data explicitly introduce spatial random effects to capture residual spatial association. These spatial effects are customarily modelled as a zero-mean stationary Gaussian process. The spatial Dirichlet process introduced by Gelfand et al. (2005) produces a random spatial process which is neither Gaussian nor stationary. Rather, it varies about a process that is assumed to be stationary and Gaussian. The spatial Dirichlet process arises as a probability-weighted collection of random surfaces. This can be limiting for modelling and inferential purposes since it insists that a process realization must be one of these surfaces. We introduce a random distribution for the spatial effects that allows different surface selection at different sites. Moreover, we can specify the model so that the marginal distribution of the effect at each site still comes from a Dirichlet process. The development is offered constructively, providing a multivariate extension of the stick-breaking representation of the weights. We then introduce mixing using this generalized spatial Dirichlet process. We illustrate with a simulated dataset of independent replications and note that we can embed the generalized process within a dynamic model specification to eliminate the independence assumption. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
809
825
http://hdl.handle.net/10.1093/biomet/asm071
application/pdf
Access to full text is restricted to subscribers.
Jason A. Duan
Michele Guindani
Alan E. Gelfand
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:415-4262013-03-04RePEc:oup:biomet
article
Aster models for life history analysis
We present a new class of statistical models, designed for life history analysis of plants and animals, that allow joint analysis of data on survival and reproduction over multiple years, allow for variables having different probability distributions, and correctly account for the dependence of variables on earlier variables. We illustrate their utility with an analysis of data taken from an experimental study of Echinacea angustifolia sampled from remnant prairie populations in western Minnesota. These models generalize both generalized linear models and survival analysis. The joint distribution is factorized as a product of conditional distributions, each an exponential family with the conditioning variable being the sample size of the conditional distribution. The model may be heterogeneous, each conditional distribution being from a different exponential family. We show that the joint distribution is from a flat exponential family and derive its canonical parameters, Fisher information and other properties. These models are implemented in an R package 'aster' available from the Comprehensive R Archive Network, CRAN. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
415
426
http://hdl.handle.net/10.1093/biomet/asm030
application/pdf
Access to full text is restricted to subscribers.
Charles J. Geyer
Stuart Wagenius
Ruth G. Shaw
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:461-4702013-03-04RePEc:oup:biomet
article
Efficient Robbins--Monro procedure for binary data
The Robbins--Monro procedure does not perform well in the estimation of extreme quantiles, because the procedure is implemented using asymptotic results, which are not suitable for binary data. Here we propose a modification of the Robbins--Monro procedure and derive the optimal procedure for binary data under some reasonable approximations. The improvement obtained by using the optimal procedure for the estimation of extreme quantiles is substantial. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
461
470
V. Roshan Joseph
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:175-1862013-03-04RePEc:oup:biomet
article
Reducing variability of crossvalidation for smoothing-parameter choice
One of the attractions of crossvalidation, as a tool for smoothing-parameter choice, is its applicability to a wide variety of estimator types and contexts. However, its detractors comment adversely on the relatively high variance of crossvalidatory smoothing parameters, noting that this compromises the performance of the estimators in which those parameters are used. We show that the variability can be reduced simply, significantly and reliably by employing bootstrap aggregation or bagging. We establish that in theory, when bagging is implemented using an adaptively chosen resample size, the variability of crossvalidation can be reduced by an order of magnitude. However, it is arguably more attractive to use a simpler approach, based for example on half-sample bagging, which can reduce variability by approximately 50%. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
175
186
http://hdl.handle.net/10.1093/biomet/asn068
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Andrew P. Robinson
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:891-9052013-03-04RePEc:oup:biomet
article
Model diagnostic tests for selecting informative correlation structure in correlated data
In the generalized method of moments approach to longitudinal data analysis, unbiased estimating functions can be constructed to incorporate both the marginal mean and the correlation structure of the data. Increasing the number of parameters in the correlation structure corresponds to increasing the number of estimating functions. Thus, building a correlation model is equivalent to selecting estimating functions. This paper proposes a chi-squared test to choose informative unbiased estimating functions. We show that this methodology is useful for identifying which source of correlation it is important to incorporate when there are multiple possible sources of correlation. This method can also be applied to determine the optimal working correlation for the generalized estimating equation approach. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
891
905
http://hdl.handle.net/10.1093/biomet/asn051
application/pdf
Access to full text is restricted to subscribers.
Annie Qu
J. Jack Lee
Bruce G. Lindsay
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:777-7902013-03-04RePEc:oup:biomet
article
Observation-driven models for Poisson counts
This paper is concerned with a general class of observation-driven models for time series of counts whose conditional distributions given past observations and explanatory variables follow a Poisson distribution. These models provide a flexible framework for modelling a wide range of dependence structures. Conditions for stationarity and ergodicity of these processes are established from which the large-sample properties of the maximum likelihood estimators can be derived. Simulations are provided to give additional insight into the finite-sample behaviour of the estimators. Finally an application to a regression model for daily counts of asthma presentations at a Sydney hospital is described. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
777
790
Richard A. Davis
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:705-7142013-03-04RePEc:oup:biomet
article
The geometry of biplot scaling
A simple geometry allows the main properties of matrix approximations used in biplot displays to be developed. It establishes orthogonal components of an analysis of variance, from which different contributions to approximations may be assessed. Particular attention is paid to approximations that share the same singular vectors, in which case the solution space is a convex cone. Two- and three-dimensional approximations are examined in detail and then the geometry is interpreted for different forms of the matrix being approximated. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
705
714
J. C. Gower
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:195-2092013-03-04RePEc:oup:biomet
article
Generalised likelihood ratio tests for spectral density
There are few techniques available for testing whether or not a family of parametric time series models fits a set of data reasonably well without serious restrictions on the forms of alternative models. In this paper, we consider generalised likelihood ratio tests of whether or not the spectral density function of a stationary time series admits certain parametric forms. We propose a bias correction method for the generalised likelihood ratio test of Fan et al. (2001). In particular, our methods can be applied to test whether or not a residual series is white noise. Sampling properties of the proposed tests are established. A bootstrap approach is proposed for estimating the null distribution of the test statistics. Simulation studies investigate the accuracy of the proposed bootstrap estimate and compare the power of the various ways of constructing the generalised likelihood ratio tests as well as some classic methods like the Cramer--von Mises and Ljung--Box tests. Our results favour the newly proposed bias reduction method using the local likelihood estimator. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
195
209
Jianqing Fan
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:459-4682013-03-04RePEc:oup:biomet
article
Optimal nested row-column designs with specified components
We consider nested row-column designs where each of the row and column component designs is specified. For the case that each of the component designs has second-order balance, we define such a nested row-column design to be special if it is generally balanced, with the smallest possible number of canonical treatment contrasts having the lower canonical efficiency factor in both components. We show that if any special row-column design exists then it is A-optimal over all nested row-column designs with the given components. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
459
468
http://hdl.handle.net/10.1093/biomet/asm039
application/pdf
Access to full text is restricted to subscribers.
R. A. Bailey
E. R. Williams
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:27-432013-03-04RePEc:oup:biomet
article
Bayesian information criteria and smoothing parameter selection in radial basis function networks
By extending Schwarz's (1978) basic idea we derive a Bayesian information criterion which enables us to evaluate models estimated by the maximum penalised likelihood method or the method of regularisation. The proposed criterion is applied to the choice of smoothing parameters and the number of basis functions in radial basis function network models. Monte Carlo experiments were conducted to examine the performance of the nonlinear modelling strategy of estimating the weight parameters by regularisation and then determining the adjusted parameters by the Bayesian information criterion. The simulation results show that our modelling procedure performs well in various situations. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
27
43
Sadanori Konishi
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:849-8622013-03-04RePEc:oup:biomet
article
A semiparametric changepoint model
A semiparametric changepoint model is considered and the empirical likelihood method is applied to detect the change from a distribution to a weighted distribution in a sequence of independent random variables. The maximum likelihood changepoint estimator is shown to be consistent. The empirical likelihood ratio test statistic is proved to have the same limit null distribution as that with parametric models. A data-based test for the validity of the models is also proposed. Simulation shows the sensitivity and robustness of the semiparametric approach. The methods are applied to some classical datasets such as the Nile River data and stock price data. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
849
862
http://hdl.handle.net/10.1093/biomet/91.4.849
text/html
Access to full text is restricted to subscribers.
Zhong Guan
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:469-4852013-03-04RePEc:oup:biomet
article
Resampling-based empirical prediction: an application to small area estimation
Best linear unbiased prediction is well known for its wide range of applications including small area estimation. While the theory is well established for mixed linear models and under normality of the error and mixing distributions, the literature is sparse for nonlinear mixed models under nonnormality of the error distribution or of the mixing distributions. We develop a resampling-based unified approach for predicting mixed effects under a generalized mixed model set-up. Second-order-accurate nonnegative estimators of mean squared prediction errors are also developed. Given the parametric model, the proposed methodology automatically produces estimators of the small area parameters and their mean squared prediction errors, without requiring explicit analytical expressions for the mean squared prediction errors. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
469
485
http://hdl.handle.net/10.1093/biomet/asm035
application/pdf
Access to full text is restricted to subscribers.
Soumendra N. Lahiri
Tapabrata Maiti
Myron Katzoff
Van Parsons
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:462-4692013-03-04RePEc:oup:biomet
article
On some models for multivariate binary variables parallel in complexity with the multivariate Gaussian distribution
It is shown that both the simple form of the Rasch model for binary data and a generalisation are essentially equivalent to special dichotomised Gaussian models. In these the underlying Gaussian structure is of single factor form; that is, the correlations between the binary variables arise via a single underlying variable, called in psychometrics a latent trait. The implications for scoring of the binary variables are discussed, in particular regarding the scoring system as in effect estimating the latent trait. In particular, the role of the simple sum score, in effect the total number of 'successes', is examined. Relations with the principal component analysis of binary data are outlined and some connections with the quadratic exponential binary model are sketched. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
462
469
D. R. Cox
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:119-1332013-03-04RePEc:oup:biomet
article
Multiscale generalised linear models for nonparametric function estimation
We present a method for extracting information about both the scale and trend of local components of an inhomogeneous function in a nonparametric generalised linear model. Our multiscale framework combines recursive partitions, which allow for the incorporation of scale in a natural manner, with systems of piecewise polynomials supported on the partition intervals, which serve to summarise the smooth trend within each interval. Our estimators are formulated as solutions of complexity-penalised likelihood optimisations, where the penalty seeks to limit the number of intervals used to model the data. The actual calculation of the estimators may be accomplished using standard software routines for generalised linear models, within the context of efficient, tree-based, polynomial-time algorithms. A risk analysis shows that these estimators achieve the same asymptotic rates in the nonparametric generalised linear model as the classical wavelet-based estimators in the Gaussian 'function plus noise' model, for suitably defined ranges of Besov spaces. Numerical simulations show that the method tends to perform at least as well as, and often better than, alternative wavelet-based methodologies in the context of finite samples, while applications to gamma-ray burst data in astronomy and packet loss data in computer network traffic analysis confirm its practical relevance. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
119
133
http://hdl.handle.net/10.1093/biomet/92.1.119
text/html
Access to full text is restricted to subscribers.
Eric D. Kolaczyk
Robert D. Nowak
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:119-1322013-03-04RePEc:oup:biomet
article
Tapered empirical likelihood for time series data in time and frequency domains
We investigate data tapering in two formulations of empirical likelihood for time series. One empirical likelihood is formed from tapered data blocks in the time domain and a second is based on the tapered periodogram in the frequency domain. Limiting distributions are provided for both empirical likelihood versions under tapering. Theoretical and simulation evidence indicates that a data taper improves the coverage accuracy of empirical likelihood confidence intervals for time series parameters, such as means and correlations. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
119
132
http://hdl.handle.net/10.1093/biomet/asn071
application/pdf
Access to full text is restricted to subscribers.
Daniel J. Nordman
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:147-1612013-03-04RePEc:oup:biomet
article
On least-squares regression with censored data
The semiparametric accelerated failure time model relates the logarithm of the failure time linearly to the covariates while leaving the error distribution unspecified. The present paper describes simple and reliable inference procedures based on the least-squares principle for this model with right-censored data. The proposed estimator of the vector-valued regression parameter is an iterative solution to the Buckley--James estimating equation with a preliminary consistent estimator as the starting value. The estimator is shown to be consistent and asymptotically normal. A novel resampling procedure is developed for the estimation of the limiting covariance matrix. Extensions to marginal models for multivariate failure time data are considered. The performance of the new inference procedures is assessed through simulation studies. Illustrations with medical studies are provided. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
147
161
http://hdl.handle.net/10.1093/biomet/93.1.147
text/html
Access to full text is restricted to subscribers.
Zhezhen Jin
D. Y. Lin
Zhiliang Ying
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:81-942013-03-04RePEc:oup:biomet
article
Statistical inference for infinite-dimensional parameters via asymptotically pivotal estimating functions
Suppose that a consistent estimator for an infinite-dimensional parameter can be readily obtained via a set of estimating functions which has a 'good' local linear approximation around the true value of the parameter. However, it may be difficult to estimate the variance function of this estimator well. We show that, if the set of estimating functions evaluated at the true parameter value is 'asymptotically pivotal', then the 'fiducial' distribution of the parameter can be used to approximate the distribution of this consistent estimator. We present three examples to illustrate that the corresponding inference for the parameter can be made via a simple simulation technique without involving complex, high-dimensional nonparametric density estimates. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
81
94
M. A. Goldwasser
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:843-8602013-03-04RePEc:oup:biomet
article
Building mixture trees from binary sequence data
We develop a new method for building a hierarchical tree from binary sequence data. It is based on an ancestral mixture model. The sieve parameter in the model plays the role of time in the evolutionary tree of the sequences. By varying the sieve parameter, one can create a hierarchical tree that estimates the population structure at each fixed backward point in time. Application to the clustering of the mitochondrial DNA sequences of Griffiths & Tavare (1994) shows that the approach performs well. Theoretical and computational properties of the ancestral mixture model are further developed. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
843
860
http://hdl.handle.net/10.1093/biomet/93.4.843
text/html
Access to full text is restricted to subscribers.
Shu-Chuan Chen
Bruce G. Lindsay
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:465-4762013-03-04RePEc:oup:biomet
article
Saddlepoint approximations for the Bingham and Fisher–Bingham normalising constants
The Fisher--Bingham distribution is obtained when a multivariate normal random vector is conditioned to have unit length. Its normalising constant can be expressed as an elementary function multiplied by the density, evaluated at 1, of a linear combination of independent noncentral $\chi^2_1$ random variables. Hence we may approximate the normalising constant by applying a saddlepoint approximation to this density. Three such approximations, implementation of each of which is straightforward, are investigated: the first-order saddlepoint density approximation, the second-order saddlepoint density approximation and a variant of the second-order approximation which has proved slightly more accurate than the other two. The numerical and theoretical results we present show that this approach provides highly accurate approximations in a broad spectrum of cases. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
465
476
http://hdl.handle.net/10.1093/biomet/92.2.465
text/html
Access to full text is restricted to subscribers.
A. Kume
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:893-9042013-03-04RePEc:oup:biomet
article
Uniform designs limit aliasing
When fitting a linear regression model to data, aliasing can adversely affect the estimates of the model coefficients and the decision of whether or not a term is significant. Optimal experimental designs give efficient estimators assuming that the true form of the model is known, while robust experimental designs guard against inaccurate estimates caused by model misspecification. Although it is rare for a single design to be both maximally efficient and robust, it is shown here that uniform designs limit the effects of aliasing to yield reasonable efficiency and robustness together. Aberration and resolution measure how well fractional factorial designs guard against the effects of aliasing. Here it is shown that the definitions of aberration and resolution may be generalised to other types of design using the discrepancy. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
893
904
Fred J. Hickernell
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:228-2342013-03-04RePEc:oup:biomet
article
A note on kernel polygons
Jones (1989) has pointed out that piecewise linear interpolated kernel density estimators on a sufficiently fine grid can be visually indistinguishable from the true density. A simple device, the kernel polygon, is proposed for eliminating the evaluation of the normalisation constant of the estimator while retaining its property of being a density function as well as providing practical advantages. The class of uniform and linear kernels of the kernel polygons is given. Finally, we present a simulation study and a real data example in which we compare bandwidth selectors for the kernel polygons. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
228
234
http://hdl.handle.net/10.1093/biomet/93.1.228
text/html
Access to full text is restricted to subscribers.
Chien-Tai Lin
Jyh-Shyang Wu
Chia-Hung Yen
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:709-7192013-03-04RePEc:oup:biomet
article
The Benjamini--Hochberg method with infinitely many contrasts in linear models
Benjamini and Hochberg's method for controlling the false discovery rate is applied to the problem of testing infinitely many contrasts in linear models. Exact, easily calculated critical values are derived, defining a new multiple comparisons method for testing contrasts in linear models. The method is adaptive, depending on the data through the F-statistic, like the Waller--Duncan Bayesian multiple comparisons method. Comparisons with Scheffé's method are given, and the method is extended to the simultaneous confidence intervals of Benjamini and Yekutieli. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
709
719
http://hdl.handle.net/10.1093/biomet/asn033
application/pdf
Access to full text is restricted to subscribers.
Peter H. Westfall
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:105-1182013-03-04RePEc:oup:biomet
article
Theory for penalised spline regression
Penalised spline regression is a popular new approach to smoothing, but its theoretical properties are not yet well understood. In this paper, mean squared error expressions and consistency results are derived by using a white-noise model representation for the estimator. The effect of the penalty on the bias and variance of the estimator is discussed, both for general splines and for the case of polynomial splines. The penalised spline regression estimator is shown to achieve the optimal nonparametric convergence rate established by Stone (1982). Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
105
118
http://hdl.handle.net/10.1093/biomet/92.1.105
text/html
Access to full text is restricted to subscribers.
Peter Hall
J. D. Opsomer
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:961-9772013-03-04RePEc:oup:biomet
article
Estimating the false discovery rate using the stochastic approximation algorithm
Testing of multiple hypotheses involves statistics that are strongly dependent in some applications, but most work on this subject is based on the assumption of independence. We propose a new method for estimating the false discovery rate of multiple hypothesis tests, in which the density of test scores is estimated parametrically by minimizing the Kullback--Leibler distance between the unknown density and its estimator using the stochastic approximation algorithm, and the false discovery rate is estimated using the ensemble averaging method. Our method is applicable under general dependence between test statistics. Numerical comparisons between our method and several competitors, conducted on simulated and real data examples, show that our method achieves more accurate control of the false discovery rate in almost all scenarios. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
961
977
http://hdl.handle.net/10.1093/biomet/asn036
application/pdf
Access to full text is restricted to subscribers.
Faming Liang
Jian Zhang
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:183-1962013-03-04RePEc:oup:biomet
article
On measuring the variability of small area estimators under a basic area level model
In this paper, based on a basic area level model, we obtain second-order accurate approximations to the mean squared error of model-based small area estimators, using the Fay & Herriot (1979) iterative method of estimating the model variance based on the weighted residual sum of squares. We also obtain mean squared error estimators unbiased to second order. Based on simulations, we compare the finite-sample performance of our mean squared error estimators with those based on method-of-moments, maximum likelihood and residual maximum likelihood estimators of the model variance. Our results suggest that the Fay--Herriot method performs better, in terms of relative bias of mean squared error estimators, than the other methods across different combinations of number of areas, pattern of sampling variances and distribution of small area effects. We also derive a noninformative prior on the model parameters for which the posterior variance of a small area mean is second-order unbiased for the mean squared error. The posterior variance based on such a prior possesses both Bayesian and frequentist interpretations. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
183
196
http://hdl.handle.net/10.1093/biomet/92.1.183
text/html
Access to full text is restricted to subscribers.
Gauri Sankar Datta
J. N. K. Rao
David Daniel Smith
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:217-2292013-03-04RePEc:oup:biomet
article
Variable selection for the single-index model
We consider variable selection in the single-index model. We prove that the popular leave-m-out crossvalidation method has different behaviour in the single-index model from that in linear regression models or nonparametric regression models. A new consistent variable selection method, called separated crossvalidation, is proposed. Further analysis suggests that the method has better finite-sample performance and is computationally easier than leave-m-out crossvalidation. Separated crossvalidation, applied to the Swiss banknotes data and the ozone concentration data, leads to single-index models with selected variables that have better prediction capability than models based on all the covariates. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
217
229
http://hdl.handle.net/10.1093/biomet/asm008
application/pdf
Access to full text is restricted to subscribers.
Efang Kong
Yingcun Xia
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:995-9992013-03-04RePEc:oup:biomet
article
Wild bootstrap for quantile regression
The existing theory of the wild bootstrap has focused on linear estimators. In this note, we broaden its validity by providing a class of weight distributions that is asymptotically valid for quantile regression estimators. As most weight distributions in the literature lead to biased variance estimates for nonlinear estimators of linear regression, we propose a modification of the wild bootstrap that admits a broader class of weight distributions for quantile regression. A simulation study on median regression is carried out to compare various bootstrap methods. With a simple finite-sample correction, the wild bootstrap is shown to account for general forms of heteroscedasticity in a regression model with fixed design points. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
995
999
http://hdl.handle.net/10.1093/biomet/asr052
application/pdf
Access to full text is restricted to subscribers.
Xingdong Feng
Xuming He
Jianhua Hu
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:484-4892013-03-04RePEc:oup:biomet
article
Hypothesis testing when a nuisance parameter is present only under the alternative: Linear model case
The results of Davies (1977, 1987) are extended to a linear model situation with unknown residual variance. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
484
489
Robert B. Davies
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:291-3032013-03-04RePEc:oup:biomet
article
Regression methods for gap time hazard functions of sequentially ordered multivariate failure time data
Sequentially ordered multivariate failure time data are often observed in biomedical studies and inter-event, or gap, times are often of interest. Generally, standard hazard regression methods cannot be applied to the gap times because of identifiability issues and induced dependent censoring. We propose estimating equations for fitting proportional hazards regression models to the gap times. Model parameters are shown to be consistent and asymptotically normal. Simulation studies reveal the appropriateness of the asymptotic approximations in finite samples. The proposed methods are applied to renal failure data to assess the association between demographic covariates and both time until wait-listing and time from wait-listing to kidney transplantation. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
291
303
Douglas E. Schaubel
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:443-4582013-03-04RePEc:oup:biomet
article
Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models
The problem of evaluating the goodness of the predictive distributions of hierarchical Bayesian and empirical Bayes models is investigated. A Bayesian predictive information criterion is proposed as an estimator of the posterior mean of the expected loglikelihood of the predictive distribution when the specified family of probability distributions does not contain the true distribution. The proposed criterion is developed by correcting the asymptotic bias of the posterior mean of the loglikelihood as an estimator of its expected loglikelihood. In the evaluation of hierarchical Bayesian models with random effects, regardless of our parametric focus, the proposed criterion considers the bias correction of the posterior mean of the marginal loglikelihood because it requires a consistent parameter estimator. The use of the bootstrap in model evaluation is also discussed. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
443
458
http://hdl.handle.net/10.1093/biomet/asm017
application/pdf
Access to full text is restricted to subscribers.
Tomohiro Ando
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:719-7232013-03-04RePEc:oup:biomet
article
Probabilistic model for two dependent circular variables
Motivated by problems in molecular biology and molecular physics, we propose a five-parameter torus analogue of the bivariate normal distribution for modelling the distribution of two circular random variables. The conditional distributions of the proposed distribution are von Mises. The marginal distributions are symmetric around their means and are either unimodal or bimodal. The type of shape depends on the configuration of parameters, and we derive the conditions that ensure a specific shape. The utility of the proposed distribution is illustrated by the modelling of angular variables in a short linear peptide. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
719
723
Harshinder Singh
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:53-712013-03-04RePEc:oup:biomet
article
Pattern-mixture models with proper time dependence
Recently, pattern-mixture modelling has become a popular tool for modelling incomplete longitudinal data. Such models are under-identified in the sense that, for any drop-out pattern, the data provide no direct information on the distribution of the unobserved outcomes, given the observed ones. One simple way of overcoming this problem, ordinary extrapolation of sufficiently simple pattern-specific models, often produces rather unlikely descriptions; several authors consider identifying restrictions instead. Molenberghs et al. (1998) have constructed identifying restrictions corresponding to missing at random. In this paper, the family of restrictions where drop-out does not depend on future, unobserved observations is identified. The ideas are illustrated using a clinical study of Alzheimer patients. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
53
71
M. G. Kenward
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:35-472013-03-04RePEc:oup:biomet
article
Population intervention models in causal inference
We propose a new causal parameter, which is a natural extension of existing approaches to causal inference such as marginal structural models. Modelling approaches are proposed for the difference between a treatment-specific counterfactual population distribution and the actual population distribution of an outcome in the target population of interest. Relevant parameters describe the effect of a hypothetical intervention on such a population and therefore we refer to these models as population intervention models. We focus on intervention models estimating the effect of an intervention in terms of a difference and ratio of means, called risk difference and relative risk if the outcome is binary. We provide a class of inverse-probability-of-treatment-weighted and doubly-robust estimators of the causal parameters in these models. The finite-sample performance of these new estimators is explored in a simulation study. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
35
47
http://hdl.handle.net/10.1093/biomet/asm097
application/pdf
Access to full text is restricted to subscribers.
Alan E. Hubbard
Mark J. van der Laan
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:958-9612013-03-04RePEc:oup:biomet
article
A note on a partial empirical likelihood
A partial profile empirical likelihood for a semiparametric mixture model (Zou et al., 2002) is shown to originate in a conditional likelihood involving additional nuisance parameters. The partial likelihood is the conditional likelihood with the nuisance parameters replaced by their estimators from the full likelihood. The conditional likelihood suggests alternative estimators. We demonstrate that the partial likelihood estimator is more efficient than an estimator for which the nuisance parameters are known. The practical implications of this counter-intuitive result are discussed. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
958
961
F. Zou
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:859-8742013-03-04RePEc:oup:biomet
article
Bayesian nonparametric inference on stochastic ordering
We consider Bayesian inference about collections of unknown distributions subject to a partial stochastic ordering. To address problems in testing of equalities between groups and estimation of group-specific distributions, we propose classes of restricted dependent Dirichlet process priors. These priors have full support in the space of stochastically ordered distributions, and can be used for collections of unknown mixture distributions to obtain a flexible class of mixture models. Theoretical properties are discussed, efficient methods are developed for posterior computation using Markov chain Monte Carlo simulation and the methods are illustrated using data from a study of DNA damage and repair. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
859
874
http://hdl.handle.net/10.1093/biomet/asn043
application/pdf
Access to full text is restricted to subscribers.
David B. Dunson
Shyamal D. Peddada
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:763-7752013-03-04RePEc:oup:biomet
article
Analysing panel count data with informative observation times
In this paper, we study panel count data with informative observation times. We assume nonparametric and semiparametric proportional rate models for the underlying event process, where the form of the baseline rate function is left unspecified and a subject-specific frailty variable inflates or deflates the rate function multiplicatively. The proposed models allow the event processes and observation times to be correlated through their connections with the unobserved frailty; moreover, the distributions of both the frailty variable and observation times are considered as nuisance parameters. The baseline rate function and the regression parameters are estimated by maximising a conditional likelihood function of observed event counts and solving estimation equations. Large-sample properties of the proposed estimators are studied. Numerical studies demonstrate that the proposed estimation procedures perform well for moderate sample sizes. An application to a bladder tumour study is presented. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
763
775
http://hdl.handle.net/10.1093/biomet/93.4.763
text/html
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Mei-Cheng Wang
Ying Zhang
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:979-9862013-03-04RePEc:oup:biomet
article
Identification of the age-period-cohort model and the extended chain-ladder model
We consider the identification problem that arises in the age-period-cohort models as well as in the extended chain-ladder model. We propose a canonical parameterization based on the accelerations of the trends in the three factors. This parameterization is exactly identified and eases interpretation, estimation and forecasting. The canonical parameterization is applied to a class of index sets which have trapezoidal shapes, including various Lexis diagrams and the insurance-reserving triangles. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
979
986
http://hdl.handle.net/10.1093/biomet/asn026
application/pdf
Access to full text is restricted to subscribers.
D. Kuang
B. Nielsen
J. P. Nielsen
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:982-9842013-03-04RePEc:oup:biomet
article
Conditional and marginal association for binary random variables
The relationship between marginal and conditional distributions of binary random variables is analysed via a log-linear model. Conditions for the Yule--Simpson effect are established and the implications for latent class analysis examined. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
982
984
D. R. Cox
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:919-9342013-03-04RePEc:oup:biomet
article
Quantifying the failure of bootstrap likelihood ratio tests
When testing geometrically irregular parametric hypotheses, the bootstrap is an intuitively appealing method to circumvent difficult distribution theory. It has been shown, however, that the usual bootstrap is inconsistent in estimating the asymptotic distributions involved in such problems. This paper is concerned with the asymptotic size of likelihood ratio tests when critical values are computed using the inconsistent bootstrap. We clarify how the asymptotic size of such a test can be obtained from the size of the corresponding bootstrap test in the relevant limiting normal experiment. For boundary problems, that is, hypotheses given by convex cones, we show the bootstrap test to always be anticonservative, and we compute the size numerically for different two-dimensional examples. The examples illustrate that the size can be below or above the nominal level, and reveal that the relationship between the size of the test and the geometry of the considered hypotheses is surprisingly subtle. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
919
934
http://hdl.handle.net/10.1093/biomet/asr033
application/pdf
Access to full text is restricted to subscribers.
Mathias Drton
Benjamin Williams
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:911-9262013-03-04RePEc:oup:biomet
article
A functional-based distribution diagnostic for a linear model with correlated outcomes
In this paper we present an easy-to-implement graphical distribution diagnostic for linear models with correlated errors. Houseman et al. (2004) constructed quantile--quantile plots for the marginal residuals of such models, suitably transformed. We extend the pointwise asymptotic theory to address the global stochastic behaviour of the corresponding empirical cumulative distribution function, and describe a simulation technique that serves as a computationally efficient parametric bootstrap for generating representatives of its stochastic limit. Thus, continuous functionals of the empirical cumulative distribution function may be used to form global tests of normality. Through the use of projection matrices, we generalise our methods to include tests that are directed at assessing the normality of particular components of the error. Thus, tests proposed by Lange & Ryan (1989) follow as a special case. Our method works well both for models having independent units of sampling and for those in which all observations are correlated. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
911
926
http://hdl.handle.net/10.1093/biomet/93.4.911
text/html
Access to full text is restricted to subscribers.
E. Andres Houseman
Brent A. Coull
Louise M. Ryan
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:799-8122013-03-04RePEc:oup:biomet
article
Covariance reducing models: An alternative to spectral modelling of covariance matrices
We introduce covariance reducing models for studying the sample covariance matrices of a random vector observed in different populations. The models are based on reducing the sample covariance matrices to an informational core that is sufficient to characterize the variance heterogeneity among the populations. They possess useful equivariance properties and provide a clear alternative to spectral models for covariance matrices. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
799
812
http://hdl.handle.net/10.1093/biomet/asn052
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Liliana Forzani
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:645-6592013-03-04RePEc:oup:biomet
article
On multiple regression models with nonstationary correlated errors
We consider the estimation of parameters of a multiple regression model with nonstationary errors. We assume the nonstationary errors satisfy a time-dependent autoregressive process and describe a method for estimating the parameters of the regressors and the time-dependent autoregressive parameters. The parameters are rescaled as in nonparametric regression to obtain the asymptotic sampling properties of the estimators. The method is illustrated with an example taken from global temperature anomalies. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
645
659
Suhasini Subba Rao
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:455-4632013-03-04RePEc:oup:biomet
article
A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations
We introduce a family of multivariate binary distributions with a certain conditional linear property. This family is particularly useful for efficient and easy simulation of correlated binary variables with a given marginal mean vector and correlation matrix. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
455
463
Bahjat F. Qaqish
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:345-3622013-03-04RePEc:oup:biomet
article
Stochastic multitype epidemics in a community of households: Estimation of threshold parameter R-sub-* and secure vaccination coverage
This paper is concerned with estimation of the threshold parameter R-sub-* for a stochastic model for the spread of a susceptible → infective → removed epidemic among a closed, finite population that contains several types of individual and is partitioned into households. It turns out that R-sub-* cannot be estimated consistently from final outcome data, so a Perron--Frobenius argument is used to obtain sharp lower and upper bounds for R-sub-*, which can be estimated consistently. Determining the allocation of vaccines that reduces the upper bound for R-sub-* to its threshold value of one, thus preventing the occurrence of a major outbreak, with minimum vaccine coverage is shown to be a linear programming problem. The estimates of R-sub-*, before and after vaccination, and of the secure vaccination coverage, i.e. the proportion of individuals that have to be vaccinated to reduce the upper bound for R-sub-* to 1 assuming an optimal vaccination scheme, are equipped with standard errors, thus yielding conservative confidence bounds for these key epidemiological parameters. The methodology is illustrated by application to data on influenza outbreaks in Tecumseh, Michigan. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
345
362
Frank G. Ball
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:1-182013-03-04RePEc:oup:biomet
article
Maxima of discretely sampled random fields, with an application to 'bubbles'
A smooth Gaussian random field with zero mean and unit variance is sampled on a discrete lattice, and we are interested in the exceedance probability or P-value of the maximum in a finite region. If the random field is smooth relative to the mesh size, then the P-value can be well approximated by results for the continuously sampled smooth random field (Adler, 1981; Worsley, 1995a; Taylor & Adler, 2003; Adler & Taylor, 2007). If the random field is not smooth, so that adjacent lattice values are nearly independent, then the usual Bonferroni bound is very accurate. The purpose of this paper is to bridge the gap between the two, and derive a simple, accurate upper bound for intermediate mesh sizes. The result uses a new improved Bonferroni-type bound based on discrete local maxima. We give an application to the 'bubbles' technique for detecting areas of the face used to discriminate fear from happiness. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
1
18
http://hdl.handle.net/10.1093/biomet/asm004
application/pdf
Access to full text is restricted to subscribers.
J. E. Taylor
K. J. Worsley
F. Gosselin
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:319-3262013-03-04RePEc:oup:biomet
article
Bayesian empirical likelihood
Research has shown that empirical likelihood tests have many of the same asymptotic properties as those derived from parametric likelihoods. This leads naturally to the possibility of using empirical likelihood as the basis for Bayesian inference. Different ways in which this goal might be accomplished are considered. The validity of the resultant posterior inferences is examined, as are frequentist properties of the Bayesian empirical likelihood intervals. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
319
326
Nicole A. Lazar
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:775-7892013-03-04RePEc:oup:biomet
article
Nonparametric estimation of the variogram and its spectrum
In the study of intrinsically stationary spatial processes, a new nonparametric variogram estimator is proposed through its spectral representation. The methodology is based on estimation of the variogram's spectrum by solving a regularized inverse problem through quadratic programming. The estimated variogram is guaranteed to be conditionally negative-definite. Simulation shows that our estimator is flexible and generally has smaller mean integrated squared error than the parametric estimator under model misspecification. Our methodology is applied to a spatial dataset of decadal temperature changes. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
775
789
http://hdl.handle.net/10.1093/biomet/asr056
application/pdf
Access to full text is restricted to subscribers.
Chunfeng Huang
Tailen Hsing
Noel Cressie
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:711-7202013-03-04RePEc:oup:biomet
article
Sudoku-based space-filling designs
Sudoku is played by millions of people across the globe. It has simple rules and is very addictive. The game board is a nine-by-nine grid of numbers from one to nine. Several entries within the grid are provided and the remaining entries must be filled in subject to no row, column, or three-by-three subsquare containing duplicate numbers. By exploiting these three types of uniformity, we propose an approach to constructing a new type of design, called a Sudoku-based space-filling design. Such a design can be divided into groups of subdesigns so that the complete design and each subdesign achieve maximum uniformity in univariate and bivariate margins. Examples are given illustrating the proposed construction method. Applications of such designs include computer experiments with qualitative and quantitative factors, linking parameters in engineering and crossvalidation. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
711
720
http://hdl.handle.net/10.1093/biomet/asr024
application/pdf
Access to full text is restricted to subscribers.
Xu Xu
Ben Haaland
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:807-8172013-03-04RePEc:oup:biomet
article
A new Bayesian method for nonparametric capture-recapture models in presence of heterogeneity
The intrinsic heterogeneity of individuals is a potential source of bias in estimation procedures for capture-recapture models. To account for this heterogeneity in the model a hierarchical structure has been proposed whereby the probabilities that each animal is caught on a single occasion are modelled as independent draws from a common unknown distribution F. However, there is general agreement that modelling F by a simple parametric curve may lead to unsatisfactory results. Here we propose an alternative Bayesian approach that relies on a different parameterisation which imposes no assumption on the shape of F but drives the problem back to a finite-dimensional setting. Our approach avoids some identifiability issues related to such a recapture model while allowing for a formal Bayesian default analysis. Results of analyses of computer simulations and of real data show that the method performs well. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
807
817
Luca Tardella
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:267-2832013-03-04RePEc:oup:biomet
article
A weighted multivariate sign test for cluster-correlated data
We consider the multivariate location problem with cluster-correlated data. A family of multivariate weighted sign tests is introduced for which observations from different clusters can receive different weights. Under weak assumptions, the test statistic is asymptotically distributed as a chi-squared random variable as the number of clusters goes to infinity. The asymptotic distribution of the test statistic is also given for a local alternative model under multivariate normality. Optimal weights maximizing Pitman asymptotic efficiency are provided. These weights depend on the cluster sizes and on the intracluster correlation. Several approaches for estimating these weights are presented. Using Pitman asymptotic efficiency, we show that appropriate weighting can increase substantially the efficiency compared to a test that gives the same weight to each cluster. A multivariate weighted t-test is also introduced. The finite-sample performance of the weighted sign test is explored through a simulation study which shows that the proposed approach is very competitive. A real data example illustrates the practical application of the methodology. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
267
283
http://hdl.handle.net/10.1093/biomet/asm026
application/pdf
Access to full text is restricted to subscribers.
Denis Larocque
Jaakko Nevalainen
Hannu Oja
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:759-7712013-03-04RePEc:oup:biomet
article
Extended Bayesian information criteria for model selection with large model spaces
The ordinary Bayesian information criterion is too liberal for model selection when the model space is large. In this paper, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayesian information criteria, which take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to infinity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayesian information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayesian information criteria are extremely useful for variable selection in problems with a moderate sample size but with a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
759
771
http://hdl.handle.net/10.1093/biomet/asn034
application/pdf
Access to full text is restricted to subscribers.
Jiahua Chen
Zehua Chen
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:87-992013-03-04RePEc:oup:biomet
article
The unobserved heterogeneity distribution in duration analysis
In a large class of hazard models with proportional unobserved heterogeneity, the distribution of the heterogeneity among survivors converges to a gamma distribution. This convergence is often rapid. We derive this result as a general result for exponential mixtures and explore its implications for the specification and empirical analysis of univariate and multivariate duration models. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
87
99
http://hdl.handle.net/10.1093/biomet/asm013
application/pdf
Access to full text is restricted to subscribers.
Jaap H. Abbring
Gerard J. Van Den Berg
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:701-7102013-03-04RePEc:oup:biomet
article
Generalized varying coefficient models with unknown link function
We propose a new estimation method for generalized varying coefficient models where the link function is specified up to some smoothness conditions. Consistency and asymptotic normality of the estimated varying coefficient functions are established. Simulation results and a real data application demonstrate the usefulness of the new method. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
701
710
http://hdl.handle.net/10.1093/biomet/asr031
application/pdf
Access to full text is restricted to subscribers.
C. N. Kuruwita
K. B. Kulasekera
C. M. Gallagher
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:996-10022013-03-04RePEc:oup:biomet
article
Identification of a competing risks model with unknown transformations of latent failure times
This paper is concerned with identification of a competing risks model with unknown transformations of latent failure times. The model includes, as special cases, competing risks versions of proportional hazards, mixed proportional hazards and accelerated failure time models. It is shown that covariate effects on latent failure times, cause-specific link functions and the joint survivor function of the disturbance terms can be identified without relying on modelling the dependence between latent failure times parametrically or using an exclusion restriction among covariates. As a result, the paper provides an identification result about the joint survivor function of the latent failure times conditional on covariates. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
996
1002
http://hdl.handle.net/10.1093/biomet/93.4.996
text/html
Access to full text is restricted to subscribers.
Sokbae Lee
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:807-8202013-03-04RePEc:oup:biomet
article
Sparse estimation of a covariance matrix
We suggest a method for estimating a covariance matrix on the basis of a sample of vectors drawn from a multivariate normal distribution. In particular, we penalize the likelihood with a lasso penalty on the entries of the covariance matrix. This penalty plays two important roles: it reduces the effective number of parameters, which is important even when the dimension of the vectors is smaller than the sample size since the number of parameters grows quadratically in the number of variables, and it produces an estimate which is sparse. In contrast to sparse inverse covariance estimation, our method's close relative, the sparsity attained here is in the covariance matrix itself rather than in the inverse matrix. Zeros in the covariance matrix correspond to marginal independencies; thus, our method performs model selection while providing a positive definite estimate of the covariance. The proposed penalized maximum likelihood problem is not convex, so we use a majorize-minimize approach in which we iteratively solve convex approximations to the original nonconvex problem. We discuss tuning parameter selection and demonstrate on a flow-cytometry dataset how our method produces an interpretable graphical display of the relationship between variables. We perform simulations that suggest that simple elementwise thresholding of the empirical covariance matrix is competitive with our method for identifying the sparsity structure. Additionally, we show how our method can be used to solve a previously studied special case in which a desired sparsity pattern is prespecified. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
807
820
http://hdl.handle.net/10.1093/biomet/asr054
application/pdf
Access to full text is restricted to subscribers.
Jacob Bien
Robert J. Tibshirani
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:579-5892013-03-04RePEc:oup:biomet
article
A diagnostic procedure based on local influence
Cook's (1986) normal curvature measure is useful for sensitivity analysis of model assumptions in statistical models. However, there is no rigorous approach based on the normal curvature for addressing two fundamental issues: to assess the extent of discrepancy between an assumed model and the underlying model from which the data are generated, and to identify suspicious data points for which the discrepancy is most evident. Our purpose is to establish a theoretically sound procedure for resolving these issues for case-weight perturbation under the framework of independent distributions. We show that the local influence measure, Cook's distance and likelihood distance are asymptotically equivalent. A diagnostic procedure, based on local influence, is proposed for evaluating model misspecification and for detecting influential points simultaneously. We analyse two real datasets. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
579
589
Hongtu Zhu
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:283-2982013-03-04RePEc:oup:biomet
article
A flexible additive multiplicative hazard model
We present a new additive-multiplicative hazard model which consists of two components. The first component contains additive covariate effects through an additive Aalen model while the second component contains multiplicative covariate effects through a Cox regression model. The Aalen model allows for time-varying covariate effects, while the Cox model allows only a common time-dependence through the baseline. Approximate maximum likelihood estimators are derived by solving the simultaneous score equations for the nonparametric and parametric components of the model. The suggested estimators are provided with large-sample properties and are shown to be efficient. The efficient estimators depend, however, on some estimated weights. We therefore also consider unweighted estimators and describe their large-sample properties. We finally extend the model to allow for time-varying covariate effects in the multiplicative part of the model as well. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
283
298
Torben Martinussen
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:447-4592013-03-04RePEc:oup:biomet
article
Assessing robustness of generalised estimating equations and quadratic inference functions
In the presence of data contamination or outliers, some empirical studies have indicated that the two methods of generalised estimating equations and quadratic inference functions appear to have rather different robustness behaviour. This paper presents a theoretical investigation from the perspective of the influence function to identify the causes for the difference. We show that quadratic inference functions lead to bounded influence functions and the corresponding M-estimator has a redescending property, but the generalised estimating equation approach does not. We also illustrate that, unlike generalised estimating equations, quadratic inference functions can still provide consistent estimators even if part of the data is contaminated. We conclude that the quadratic inference function is a preferable method to the generalised estimating equation as far as robustness is concerned. This conclusion is supported by simulations and real-data examples. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
447
459
Annie Qu
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:971-9742013-03-04RePEc:oup:biomet
article
A note on reducing the bias of the approximate Bayesian bootstrap imputation variance estimator
Rubin & Schenker (1986) proposed the approximate Bayesian bootstrap, a two-stage resampling procedure, as a method of creating multiple imputations when missing data are ignorable. Kim (2002) showed that the multiple imputation variance estimator is biased for moderate sample sizes when this method is used. To reduce the bias, Kim (2002) proposed modifying the number of samples drawn at the first stage of the Bayesian bootstrap procedure. In this note, we suggest an alternative method for reducing the bias via a simple correction factor applied to the standard multiple imputation variance estimate. The proposed correction is more easily implemented and more efficient than the procedure proposed by Kim (2002). Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
971
974
http://hdl.handle.net/10.1093/biomet/92.4.971
text/html
Access to full text is restricted to subscribers.
Michael Parzen
Stuart R. Lipsitz
Garrett M. Fitzmaurice
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:529-5412013-03-04RePEc:oup:biomet
article
Case-control current status data
In this paper, we show that the distribution function of survival times is identified, up to a one-parameter family of distribution functions, based on information from case-control current status data. With supplementary information on the population frequency of cases relative to controls, a simple weighted version of the nonparametric maximum likelihood estimator for prospective current status data provides a natural estimator for case-control samples. Following the parametric results of Scott & Wild (1997), we show that this estimator is, in fact, the nonparametric maximum likelihood estimator. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
529
541
Nicholas P. Jewell
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:909-9202013-03-04RePEc:oup:biomet
article
First-order intrinsic autoregressions and the de Wijs process
We discuss intrinsic autoregressions for a first-order neighbourhood on a two-dimensional rectangular lattice and give an exact formula for the variogram that extends known results to the asymmetric case. We obtain a corresponding asymptotic expansion that is more accurate and more general than previous ones and use this to derive the de Wijs variogram under appropriate averaging, a result that can be interpreted as a two-dimensional spatial analogue of Brownian motion obtained as the limit of a random walk in one dimension. This provides a bridge between geostatistics, where the de Wijs process was once the most popular formulation, and Markov random fields, and also explains why statistical analysis using intrinsic autoregressions is usually robust to changes of scale. We briefly describe corresponding calculations in the frequency domain, including limiting results for higher-order autoregressions. The paper closes with some practical considerations, including applications to irregularly-spaced data. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
909
920
http://hdl.handle.net/10.1093/biomet/92.4.909
text/html
Access to full text is restricted to subscribers.
Julian Besag
Debashis Mondal
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:243-2482013-03-04RePEc:oup:biomet
article
Plant-capture estimation of the size of a homogeneous population
We consider maximum likelihood estimation of the size of a target population to which has been added a known number of planted individuals. The standard equal-catchability model used in mark-recapture is assumed to be applicable to the augmented population. After proving the unimodality of the profile likelihood for the target population size, we obtain both the maximum likelihood estimator of this size and interval estimators based on its asymptotic distribution. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
243
248
http://hdl.handle.net/10.1093/biomet/asm012
application/pdf
Access to full text is restricted to subscribers.
I. B. J. Goudie
P. E. Jupp
J. Ashbridge
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:743-7502013-03-04RePEc:oup:biomet
article
Nonparametric confidence intervals for receiver operating characteristic curves
We study methods for constructing confidence intervals and confidence bands for estimators of receiver operating characteristics. Particular emphasis is placed on the way in which smoothing should be implemented, when estimating either the characteristic itself or its variance. We show that substantial undersmoothing is necessary if coverage properties are not to be impaired. A theoretical analysis of the problem suggests an empirical, plug-in rule for bandwidth choice, optimising the coverage accuracy of interval estimators. The performance of this approach is explored. Our preferred technique is based on asymptotic approximation, rather than a more sophisticated approach using the bootstrap, since the latter requires a multiplicity of smoothing parameters all of which must be chosen in nonstandard ways. It is shown that the asymptotic method can give very good performance. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
743
750
Peter Hall
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:199-2082013-03-04RePEc:oup:biomet
article
A nonparametric test for panel count data
Panel count data arise when a recurrent event is under investigation and each study subject is observed only at discrete time points. In this situation, observed data include only the numbers of occurrences of the event of interest between observation time points and no information is available on subjects between their observation time points. We propose a nonparametric test for comparing the point processes characterising the recurrent event when only panel count data are available. The asymptotic distribution of the test statistic is derived and a simulation study is conducted to evaluate its performance. The method is illustrated using data from a medical follow-up study. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
199
208
Jianguo Sun
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:519-5282013-03-04RePEc:oup:biomet
article
A note on composite likelihood inference and model selection
A composite likelihood consists of a combination of valid likelihood objects, usually related to small subsets of data. The merit of composite likelihood is to reduce the computational complexity so that it is possible to deal with large datasets and very complex models, even when the use of standard likelihood or Bayesian methods is not feasible. In this paper, we aim to suggest an integrated, general approach to inference and model selection using composite likelihood methods. In particular, we introduce an information criterion for model selection based on composite likelihood. We also describe applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful geyser dataset. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
519
528
http://hdl.handle.net/10.1093/biomet/92.3.519
text/html
Access to full text is restricted to subscribers.
Cristiano Varin
Paolo Vidoni
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:159-1822013-03-04RePEc:oup:biomet
article
Spectral models for covariance matrices
A new model for the simultaneous eigenstructure of multiple covariance matrices is proposed. The model is much more flexible than existing models and subsumes most of them as special cases. A Fisher scoring algorithm for computing maximum likelihood estimates of the parameters under normality is given. Asymptotic distributions of the estimators are derived under normality as well as under arbitrary distributions having finite fourth-order cumulants. Special attention is given to elliptically contoured distributions. Likelihood ratio tests are described and sufficient conditions are given under which the test statistics are asymptotically distributed as chi-squared random variables. Procedures are derived for evaluating Bartlett corrections under normality. Some conjectures made by Flury (1988) are verified; others are refuted. A small simulation study of the adequacy of the Bartlett correction is described and the new procedures are illustrated on two datasets. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
159
182
Robert J. Boik
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:355-3662013-03-04RePEc:oup:biomet
article
A serially correlated gamma frailty model for longitudinal count data
A Poisson-gamma model is introduced to account for between-subjects heterogeneity and within-subjects serial correlation occurring in longitudinal count data. The model extends the usual time-constant shared frailty approach to allow time-varying serially correlated gamma frailty whilst retaining standard marginal assumptions. A composite likelihood approach to estimation and testing for serial correlation is proposed. The work is motivated by a clinical trial on patient-controlled analgesia where the number of analgesic doses taken by hospital patients in successive time intervals following abdominal surgery is recorded. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
355
366
Robin Henderson
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:905-9192013-03-04RePEc:oup:biomet
article
Using Hierarchical Likelihood for Missing Data Problems
Most statistical solutions to the problem of statistical inference with missing data involve integration or expectation. This can be done in many ways: directly or indirectly, analytically or numerically, deterministically or stochastically. Missing-data problems can be formulated in terms of latent random variables, so that hierarchical likelihood methods of Lee & Nelder (1996) can be applied to missing-value problems to provide one solution to the problem of integration of the likelihood. The resulting methods effectively use a Laplace approximation to the marginal likelihood with an additional adjustment to the measures of precision to accommodate the estimation of the fixed effects parameters. We first consider missing at random cases where problems are simpler to handle because the integration does not need to involve the missing-value mechanism and then consider missing not at random cases. We also study tobit regression and refit the missing not at random selection model to the antidepressant trial data analyzed in Diggle & Kenward (1994). Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
905
919
http://hdl.handle.net/10.1093/biomet/asm063
application/pdf
Access to full text is restricted to subscribers.
Sung-Cheol Yun
Youngjo Lee
Michael G. Kenward
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:289-3022013-03-04RePEc:oup:biomet
article
Fully Bayesian spline smoothing and intrinsic autoregressive priors
There is a well-known Bayesian interpretation for function estimation by spline smoothing using a limit of proper normal priors. The limiting prior and the conditional and intrinsic autoregressive priors popular for spatial modelling have a common form, which we call partially informative normal. We derive necessary and sufficient conditions for the propriety of the posterior for this class of partially informative normal priors with noninformative priors on the variance components, a condition crucial for successful implementation of the Gibbs sampler. The results apply for fully Bayesian smoothing splines, thin-plate splines and L-splines, as well as models using intrinsic autoregressive priors. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
289
302
Paul L. Speckman
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:629-6412013-03-04RePEc:oup:biomet
article
A Bayesian justification of Cox's partial likelihood
In this paper, we establish both naive and formal Bayesian justifications of Cox's (1975) partial likelihood and its various modifications. We extend the original work of Kalbfleisch (1978), who showed that the partial likelihood is a limiting marginal posterior under noninformative priors for baseline hazards. We extend the result to scenarios with time-dependent covariates and time-varying regression parameters. We establish results for continuous time as well as grouped survival data. In addition, we present a Bayesian justification of a modified partial likelihood for handling ties. We also present tools for simplification of the Gibbs sampling algorithm for implementing partial likelihood based Bayesian inference in various practical applications. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
629
641
Debajyoti Sinha
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:63-742013-03-04RePEc:oup:biomet
article
Shared parameter models under random effects misspecification
A common objective in longitudinal studies is the investigation of the association structure between a longitudinal response process and the time to an event of interest. An attractive paradigm for the joint modelling of longitudinal and survival processes is the shared parameter framework, where a set of random effects is assumed to induce their interdependence. In this work, we propose an alternative parameterization for shared parameter models and investigate the effect of misspecifying the random effects distribution in the parameter estimates and their standard errors. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
63
74
http://hdl.handle.net/10.1093/biomet/asm087
application/pdf
Access to full text is restricted to subscribers.
Dimitris Rizopoulos
Geert Verbeke
Geert Molenberghs
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:481-4852013-03-04RePEc:oup:biomet
article
Models for recurring events with marginal proportional hazards
Semiparametric methods were proposed by Wei et al. (1989) to analyse recurring event-time data. They modelled the marginal distribution of each event time with a Cox proportional hazards model without imposing any constraint on the joint distribution of different event times. Therefore, it is unclear whether or not event times can simultaneously satisfy their respective marginal proportional hazards assumptions, while having continuous joint distribution. Often this leads to a difficulty of conducting simulation studies. In this note we construct parametric marginal proportional hazards models for recurring event times with proper joint density functions. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
481
485
http://hdl.handle.net/10.1093/biomet/93.2.481
text/html
Access to full text is restricted to subscribers.
Nader Ebrahimi
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:149-1582013-03-04RePEc:oup:biomet
article
Standard errors and covariance matrices for smoothed rank estimators
A 'pseudo-Bayesian' interpretation of standard errors yields a natural induced smoothing of statistical estimating functions. When applied to rank estimation, the lack of smoothness which prevents standard error estimation is remedied. Efficiency and robustness are preserved, while the smoothed estimation has excellent computational properties. In particular, convergence of the iterative equation for standard error is fast, and standard error calculation becomes asymptotically a one-step procedure. This property also extends to covariance matrix calculation for rank estimates in multi-parameter problems. Examples, and some simple explanations, are given. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
149
158
http://hdl.handle.net/10.1093/biomet/92.1.149
text/html
Access to full text is restricted to subscribers.
B. M. Brown
You-Gan Wang
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:95-1102013-03-04RePEc:oup:biomet
article
Estimating and interpolating a Markov chain from aggregate data
Given aggregated longitudinal data generated by a Markov chain, which may be nonhomogeneous, the problem considered is that of modelling, estimating and interpolating the logarithms of partial odds and hence the transition probabilities. By partial odds is meant the probability of a transition to another state divided by the probability of no transition. A result establishing asymptotic normality leads to vector weighted least squares estimation of parameterised partial odds using standard regression methods. It is shown how to obtain estimates of one-step transition probabilities from widely or irregularly spaced data. The methods are illustrated on an example concerning competing causes of death. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
95
110
B. A. Davis
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:197-2122013-03-04RePEc:oup:biomet
article
Adaptive two-stage test procedures to find the best treatment in clinical trials
A main objective in clinical trials is to find the best treatment in a given finite class of competing treatments and then to show superiority of this treatment against a control treatment. The traditional procedure estimates the best treatment in a first trial. Then in an independent second trial superiority of this treatment, estimated as best in the first trial, is to be shown against the control treatment by a size α test. In this paper we investigate these two trials of this traditional procedure as a two-stage test procedure. Additionally we introduce competing two-stage group-sequential test procedures. Then we derive formulae for the expected number of patients. These formulae depend on unknown parameters. When we have a prior for the unknown parameters we can determine the two-stage test procedure of size α and power β that is optimal, in that it needs a minimal number of observations. The results are illustrated by a numerical example, which indicates the superiority of the group-sequential procedures. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
197
212
http://hdl.handle.net/10.1093/biomet/92.1.197
text/html
Access to full text is restricted to subscribers.
Wolfgang Bischoff
Frank Miller
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:573-5862013-03-04RePEc:oup:biomet
article
A semiparametric regression cure model with current status data
This paper considers the analysis of current status data with a cured proportion in the population using a mixture model that combines a logistic regression formulation for the probability of cure with a semiparametric regression model for the time to occurrence of the event. The semiparametric regression model belongs to the flexible class of partly linear models that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estimator for the unknown smooth function is obtained and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies were carried out to investigate the performance of the proposed method and the model is fitted to a dataset from a study of calcification of the hydrogel intraocular lenses, a complication of cataract treatment. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
573
586
http://hdl.handle.net/10.1093/biomet/92.3.573
text/html
Access to full text is restricted to subscribers.
K. F. Lam
Hongqi Xue
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:979-9852013-03-04RePEc:oup:biomet
article
False discovery rate for scanning statistics
The false discovery rate is a criterion for controlling Type I error in simultaneous testing of multiple hypotheses. For scanning statistics, due to local dependence, clusters of neighbouring hypotheses are likely to be rejected together. In such situations, it is more intuitive and informative to group neighbouring rejections together and count them as a single discovery, with the false discovery rate defined as the proportion of clusters that are falsely declared among all declared clusters. Assuming that the number of false discoveries, under this broader definition of a discovery, is approximately Poisson and independent of the number of true discoveries, we examine approaches for estimating and controlling the false discovery rate, and provide examples from biological applications. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
979
985
http://hdl.handle.net/10.1093/biomet/asr057
application/pdf
Access to full text is restricted to subscribers.
D. O. Siegmund
N. R. Zhang
B. Yakir
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:285-2962013-03-04RePEc:oup:biomet
article
Marginal tests with sliced average variance estimation
We present a new computationally feasible test for the dimension of the central subspace in a regression problem based on sliced average variance estimation. We also provide a marginal coordinate test. Under the null hypothesis, both the test of dimension and the marginal coordinate test involve test statistics that asymptotically have chi-squared distributions given normally distributed predictors, and have a distribution that is a linear combination of chi-squared distributions in general. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
285
296
http://hdl.handle.net/10.1093/biomet/asm021
application/pdf
Access to full text is restricted to subscribers.
Yongwu Shao
R. Dennis Cook
Sanford Weisberg
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:399-4182013-03-04RePEc:oup:biomet
article
Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies
We consider the problem of maximum-likelihood estimation in case-control studies of gene-environment associations with disease when genetic and environmental exposures can be assumed to be independent in the underlying population. Traditional logistic regression analysis may not be efficient in this setting. We study the semiparametric maximum likelihood estimates of logistic regression parameters that exploit the gene-environment independence assumption and leave the distribution of the environmental exposures to be nonparametric. We use a profile-likelihood technique to derive a simple algorithm for obtaining the estimator and we study the asymptotic theory. The results are extended to situations where genetic and environmental factors are independent conditional on some other factors. Simulation studies investigate small-sample properties. The method is illustrated using data from a case-control study designed to investigate the interplay of BRCA1/2 mutations and oral contraceptive use in the aetiology of ovarian cancer. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
399
418
http://hdl.handle.net/10.1093/biomet/92.2.399
text/html
Access to full text is restricted to subscribers.
Nilanjan Chatterjee
Raymond J. Carroll
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:399-4102013-03-04RePEc:oup:biomet
article
Optimal testing of multiple hypotheses with common effect direction
We present a theoretical basis for testing related endpoints. Typically, it is known how to construct tests of the individual hypotheses, but not how to combine them into a multiple test procedure that controls the familywise error rate. Using the closure method, we emphasize the role of consonant procedures, from an interpretive as well as a theoretical viewpoint. Surprisingly, even if each intersection test has an optimality property, the overall procedure obtained by applying closure to these tests may be inadmissible. We introduce a new procedure, which is consonant and has a maximin property under the normal model. The results are then applied to PROactive, a clinical trial designed to investigate the effectiveness of a glucose-lowering drug on macrovascular outcomes among patients with type 2 diabetes. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
399
410
http://hdl.handle.net/10.1093/biomet/asp006
application/pdf
Access to full text is restricted to subscribers.
Richard M. Bittman
Joseph P. Romano
Carlos Vallarino
Michael Wolf
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:241-2472013-03-04RePEc:oup:biomet
article
A note on path-based variable selection in the penalized proportional hazards model
We propose an efficient and adaptive shrinkage method for variable selection in the Cox model. The method constructs a piecewise-linear regularization path connecting the maximum partial likelihood estimator and the origin. Then a model is selected along the path. We show that the constructed path is adaptive in the sense that, with a proper choice of regularization parameter, the fitted model works as well as if the true underlying submodel were given in advance. A modified algorithm of the least-angle-regression type efficiently computes the entire regularization path of the new estimator. Furthermore, we show that, with a proper choice of shrinkage parameter, the method is consistent in variable selection and efficient in estimation. Simulation shows that the new method tends to outperform the lasso and the smoothly-clipped-absolute-deviation estimators with moderate samples. We apply the methodology to data concerning nursing homes. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
241
247
http://hdl.handle.net/10.1093/biomet/asm083
application/pdf
Access to full text is restricted to subscribers.
Hui Zou
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:509-5112013-03-04RePEc:oup:biomet
article
Adjusting estimative prediction limits
This note presents a direct adjustment of the estimative prediction limit to reduce the coverage error from a target value to third-order accuracy. The adjustment is asymptotically equivalent to those of Barndorff-Nielsen & Cox (1994, 1996) and Vidoni (1998). It has a simpler form with a plug-in estimator of the coverage probability of the estimative limit at the target value. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
509
511
http://hdl.handle.net/10.1093/biomet/asm032
application/pdf
Access to full text is restricted to subscribers.
Masao Ueki
Kaoru Fueda
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:385-3972013-03-04RePEc:oup:biomet
article
Some nonregular designs from the Nordstrom–Robinson code and their statistical properties
The Nordstrom–Robinson code is a well-known nonlinear code in coding theory. This paper explores the statistical properties of this nonlinear code. Many nonregular designs with 32, 64, 128 and 256 runs and 7–16 factors are derived from it. It is shown that these nonregular designs are better than regular designs of the same size in terms of resolution, aberration and projectivity. Furthermore, many of these nonregular designs are shown to have generalised minimum aberration among all possible designs. Seven orthogonal arrays are shown to have unique word-length pattern and four of them are shown to be unique up to isomorphism. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
385
397
http://hdl.handle.net/10.1093/biomet/92.2.385
text/html
Access to full text is restricted to subscribers.
Hongquan Xu
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:553-5662013-03-04RePEc:oup:biomet
article
Bayesian analysis of covariance matrices and dynamic models for longitudinal data
Parsimonious modelling of the within-subject covariance structure while heeding its positive-definiteness is of great importance in the analysis of longitudinal data. Using the Cholesky decomposition and the ensuing unconstrained and statistically meaningful reparameterisation, we provide a convenient and intuitive framework for developing conditionally conjugate prior distributions for covariance matrices and show their connections with generalised inverse Wishart priors. Our priors offer many advantages with regard to elicitation, positive definiteness, computations using Gibbs sampling, shrinking covariances toward a particular structure with considerable flexibility, and modelling covariances using covariates. Bayesian estimation methods are developed and the results are compared using two simulation studies. These simulations suggest simpler and more suitable priors for the covariance structure of longitudinal data. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
553
566
Michael J. Daniels
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:293-3062013-03-04RePEc:oup:biomet
article
Generalized method of moments estimation for linear regression with clustered failure time data
We propose a generalized method of moments approach to the accelerated failure time model with correlated survival data. We study the semiparametric rank estimator using martingale-based moments. We circumvent direct estimation of correlation parameters by concatenating the moments and minimizing a quadratic objective function. We establish the consistency and asymptotic normality of the parameter estimators, and derive the limiting distribution of the objective function. We carry out simulation studies to examine the finite-sample properties of the method, and demonstrate its substantial efficiency gain over the conventional method. Finally, we illustrate the new proposal with an example from a diabetic retinopathy study. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
293
306
http://hdl.handle.net/10.1093/biomet/asp005
application/pdf
Access to full text is restricted to subscribers.
Hui Li
Guosheng Yin
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:921-9362013-03-04RePEc:oup:biomet
article
Towards reconciling two asymptotic frameworks in spatial statistics
Two asymptotic frameworks, increasing domain asymptotics and infill asymptotics, have been advanced for obtaining limiting distributions of maximum likelihood estimators of covariance parameters in Gaussian spatial models with or without a nugget effect. These limiting distributions are known to be different in some cases. It is therefore of interest to know, for a given finite sample, which framework is more appropriate. We consider the possibility of making this choice on the basis of how well the limiting distributions obtained under each framework approximate their finite-sample counterparts. We investigate the quality of these approximations both theoretically and empirically, showing that, for certain consistently estimable parameters of exponential covariograms, approximations corresponding to the two frameworks perform about equally well. For those parameters that cannot be estimated consistently, however, the infill asymptotic approximation is preferable. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
921
936
http://hdl.handle.net/10.1093/biomet/92.4.921
text/html
Access to full text is restricted to subscribers.
Hao Zhang
Dale L. Zimmerman
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:357-3662013-03-04RePEc:oup:biomet
article
Confidence bands for hazard rates under random censorship
We suggest a completely empirical approach to the construction of confidence bands for hazard functions, based on smoothing the Nelson-Aalen estimator. In particular, we introduce a local bandwidth-choice method. Our approach uses empirical information about both the survival rate and the censoring rate, and employs undersmoothing to alleviate difficulties caused by bias. We use both Edgeworth expansion and numerical simulation, the former to develop a basic formula and the latter to modify it for general use. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
357
366
http://hdl.handle.net/10.1093/biomet/93.2.357
text/html
Access to full text is restricted to subscribers.
Ming-Yen Cheng
Peter Hall
Dongsheng Tu
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:471-4902013-03-04RePEc:oup:biomet
article
Efficient importance sampling for events of moderate deviations with applications
We propose a method for finding the alternative distribution in importance sampling. The alternative distribution is optimal in the sense that the asymptotic variance is minimised for estimating tail probabilities of asymptotically normal statistics. Our contribution to importance sampling is three-fold. To begin with, we obtain an explicit expression for the mean of the optimal alternative distribution and the expression motivates a recursive approximation algorithm. Secondly, a new multi-dimensional exponential tilting formula is presented. Lastly, a conservative estimator of the variance is given to facilitate a quick comparison among different stratified sampling schemes in conjunction with importance sampling. Several numerical examples illustrating the efficacy of the proposed method are also included. These results indicate that the proposed method is considerably more efficient than the method based on large deviations theory and the efficiency gain is more significant in higher dimensions. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
471
490
Cheng-Der Fuh
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:393-4082013-03-04RePEc:oup:biomet
article
Nonparametric inference for stochastic linear hypotheses: Application to high-dimensional data
The Mann–Whitney–Wilcoxon rank sum test is limited to comparison of two groups with univariate responses. In this paper, we introduce a class of stochastic linear hypotheses that addresses these limitations within a nonparametric setting. We formulate hypotheses for simultaneous comparisons of several, multivariate response groups, without modelling the response distributions. Inference is developed based on U-statistics theory and an exchangeability assumption. The latter condition is required to identify testable hypotheses for high-dimensional response vectors, such as those arising in genomic and psychosocial research. The methodology is illustrated with two real-data applications. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
393
408
Jeanne Kowalski
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:283-3012013-03-04RePEc:oup:biomet
article
Additive hazards Markov regression models illustrated with bone marrow transplant data
When there are covariate effects to be considered, multi-state survival analysis is dominated either by parametric Markov regression models or by semiparametric Markov regression models using Cox's (1972) proportional hazards models for transition intensities between the states. The purpose of this research work is to study alternatives to Cox's model in a general finite-state Markov process setting. We shall look at two alternative models, Aalen's (1989) nonparametric additive hazards model and Lin & Ying's (1994) semiparametric additive hazards model. The former allows the effects of covariates to vary freely over time, while the latter assumes that the regression coefficients are constant over time. With the basic tools of the product integral and the functional delta-method, we present an estimator of the transition probability matrix and develop the large-sample theory for the estimator under each of these two models. Data on 1459 HLA identical sibling transplants for acute leukaemia from the International Bone Marrow Transplant Registry serve as illustration. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
283
301
http://hdl.handle.net/10.1093/biomet/92.2.283
text/html
Access to full text is restricted to subscribers.
Youyi Shu
John P. Klein
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:19-362013-03-04RePEc:oup:biomet
article
Efficient nonparametric estimation of causal effects in randomized trials with noncompliance
Causal approaches based on the potential outcome framework provide a useful tool for addressing noncompliance problems in randomized trials. We propose a new estimator of causal treatment effects in randomized clinical trials with noncompliance. We use the empirical likelihood approach to construct a profile random sieve likelihood and take into account the mixture structure in outcome distributions, so that our estimator is robust to parametric distribution assumptions and provides substantial finite-sample efficiency gains over the standard instrumental variable estimator. Our estimator is asymptotically equivalent to the standard instrumental variable estimator, and it can be applied to outcome variables with a continuous, ordinal or binary scale. We apply our method to data from a randomized trial of an intervention to improve the treatment of depression among depressed elderly patients in primary care practices. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
19
36
http://hdl.handle.net/10.1093/biomet/asn056
application/pdf
Access to full text is restricted to subscribers.
Jing Cheng
Dylan S. Small
Zhiqiang Tan
Thomas R. Ten Have
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:647-6662013-03-04RePEc:oup:biomet
article
The accelerated gap times model
This paper develops a new semiparametric model for the effect of covariates on the conditional intensity of a recurrent event counting process. The model is a transparent extension of the accelerated failure time model for univariate survival data. Estimation of the regression parameter is motivated by semiparametric efficiency considerations, extending the class of weighted log-rank estimating functions originally proposed in Prentice (1978) and subsequently studied in detail by Tsiatis (1990) and Ritov (1990). A novel rank-based one-step estimator for the regression parameter is proposed. An Aalen-type estimator for the baseline intensity function is obtained. Asymptotics are handled with empirical process methods, and finite sample properties are studied via simulation. Finally, the new model is applied to the bladder tumour data of Byar (1980). Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
647
666
http://hdl.handle.net/10.1093/biomet/92.3.647
text/html
Access to full text is restricted to subscribers.
Robert L. Strawderman
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:747-7622013-03-04RePEc:oup:biomet
article
Censored linear regression for case-cohort studies
Right-censored data from a classical case-cohort design and a stratified case-cohort design are considered. In the classical case-cohort design the subcohort is obtained as a simple random sample of the entire cohort, whereas in the stratified design this subcohort is selected by independent Bernoulli sampling with arbitrary selection probabilities. For each design and under a linear regression model, methods for estimating the regression parameters are proposed and analysed. These methods are derived by modifying the linear ranks tests and estimating equations that arise from full-cohort data using methods that are similar to the pseudolikelihood estimating equation that has been used in relative risk regression for these models. The estimators so obtained are shown to be consistent and asymptotically normal. Variance estimation and numerical illustrations are also provided. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
747
762
http://hdl.handle.net/10.1093/biomet/93.4.747
text/html
Access to full text is restricted to subscribers.
Bin Nan
Menggang Yu
John D. Kalbfleisch
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:153-1652013-03-04RePEc:oup:biomet
article
Modelling the effects of partially observed covariates on Poisson process intensity
We propose an estimating function for parameters in a model for Poisson process intensity when time- or space-varying covariates are observed for both the events of the process and at sample times or locations selected from a probability-based sampling design. We investigate the large-sample properties of the proposed estimator under increasing domain asymptotics, demonstrating that it is consistent and asymptotically normally distributed. We illustrate our approach using data from an ecological momentary assessment of smoking. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
153
165
http://hdl.handle.net/10.1093/biomet/asm009
application/pdf
Access to full text is restricted to subscribers.
Stephen L. Rathbun
Saul Shiffman
Chad J. Gwaltney
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:585-5962013-03-04RePEc:oup:biomet
article
Using logistic regression procedures for estimating receiver operating characteristic curves
Estimation of a receiver operating characteristic, ROC, curve is usually based either on a fully parametric model such as a normal model or on a fully nonparametric model. In this paper, we explore a semiparametric approach by assuming a density ratio model for disease and disease-free densities. This model has a natural connection with the logistic regression model. The proposed semiparametric approach is more robust than a fully parametric approach and is more efficient than a fully nonparametric approach. Two real examples demonstrate that the ROC curve estimated by our semiparametric method is much smoother than that estimated by the nonparametric method. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
585
596
Jing Qin
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:225-2292013-03-04RePEc:oup:biomet
article
Optimal main effect plans with non-orthogonal blocking
The current literature on fractional factorial plans in block designs centres around orthogonal blocking which may not, however, always be attainable because of practical restrictions on the block size. For general factorials, including asymmetric ones, sufficient conditions are indicated in this paper for a main effect plan to be universally optimal under possibly non-orthogonal blocking. A construction procedure is given using generalised Youden designs in conjunction with orthogonal arrays. We also illustrate how the procedure can be applied to obtain optimal main effect plans in the practically important situation where each factor has two or three levels and the block size is small. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
225
229
Rahul Mukerjee
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:953-9642013-03-04RePEc:oup:biomet
article
A Jackknife Variance Estimator for Unistage Stratified Samples with Unequal Probabilities
Existing jackknife variance estimators used with sample surveys can seriously overestimate the true variance under unistage stratified sampling without replacement with unequal probabilities. A novel jackknife variance estimator is proposed which is as numerically simple as existing jackknife variance estimators. Under certain regularity conditions, the proposed variance estimator is consistent under stratified sampling without replacement with unequal probabilities. The high entropy regularity condition necessary for consistency is shown to hold for the Rao–Sampford design. An empirical study of three unequal probability sampling designs supports our findings. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
953
964
http://hdl.handle.net/10.1093/biomet/asm072
application/pdf
Access to full text is restricted to subscribers.
Yves G. Berger
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:49-602013-03-04RePEc:oup:biomet
article
Optimal asymmetric one-sided group sequential tests
We extend the optimal symmetric group sequential tests of Eales & Jennison (1992) to the broader class of asymmetric designs. Two forms of asymmetry are considered, involving unequal type I and type II error rates and different emphases on expected sample sizes at the null and alternative hypotheses. We discuss the properties of our optimal designs and use them to assess the efficiency of the family of tests proposed by Pampallona & Tsiatis (1994) and two families of one-sided tests defined through error spending functions. We show that the error spending designs are highly efficient, while the easily implemented tests of Pampallona & Tsiatis are a little less efficient but still not far from optimal. Our results demonstrate that asymmetric designs can decrease the expected sample size under one hypothesis, but only at the expense of a significantly larger expected sample size under the other hypothesis. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
49
60
Stuart Barber
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:37-472013-03-04RePEc:oup:biomet
article
Graphical identifiability criteria for causal effects in studies with an unobserved treatment/response variable
We consider the problem of using data in studies with an unobserved treatment/response variable in order to evaluate average causal effects, when cause-effect relationships between variables can be described by a directed acyclic graph and the corresponding recursive factorization of a joint distribution. The paper proposes graphical criteria to test whether average causal effects are identifiable even if a treatment/response variable is unobserved. If the answer is affirmative, we provide further formulations for average causal effects from the observed data. The graphical criteria enable us to evaluate average causal effects when it is difficult to observe a treatment/response variable. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
37
47
http://hdl.handle.net/10.1093/biomet/asm005
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:559-5782013-03-04RePEc:oup:biomet
article
Fractional hot deck imputation
To compensate for item nonresponse, hot deck imputation procedures replace missing values with values that occur in the sample. Fractional hot deck imputation replaces each missing observation with a set of imputed values and assigns a weight to each imputed value. Under the model in which observations in an imputation cell are independently and identically distributed, fractional hot deck imputation is shown to be an effective imputation procedure. A consistent replication variance estimation procedure for estimators computed with fractional imputation is suggested. Simulations show that fractional imputation and the suggested variance estimator are superior to multiple imputation estimators in general, and much superior to multiple imputation for estimating the variance of a domain mean. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
559
578
Jae Kwang Kim
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:248-2482013-03-04RePEc:oup:biomet
article
A note on time-ordered classification
1
2009
96
Biometrika
248
248
http://hdl.handle.net/10.1093/biomet/asn065
application/pdf
Access to full text is restricted to subscribers.
H. He
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:605-6182013-03-04RePEc:oup:biomet
article
Semiparametric inference in observational duration-response studies, with duration possibly right-censored
Once treatment is found to be effective in clinical studies, attention often focuses on optimum or efficacious treatment delivery. In treatment duration-response studies, the optimum treatment delivery refers to the treatment length that optimises the mean response. In many studies, the treatment length is often left to the discretion of an attending investigator or physician but may be abruptly terminated because of treatment-terminating events. Thus, a recommended treatment length often delineates a 'treatment duration policy' which prescribes that treatment be given for a specified length of time or until a treatment-terminating event occurs, whichever comes first. Estimating a functional relationship between the response and a treatment duration policy, continuously in time, is the focus of this paper. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
605
618
http://hdl.handle.net/10.1093/biomet/92.3.605
text/html
Access to full text is restricted to subscribers.
Brent A. Johnson
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:613-6282013-03-04RePEc:oup:biomet
article
Large-sample properties of the periodogram estimator of seasonally persistent processes
Seasonally persistent models were first introduced by Andel (1986) and Gray et al. (1989) to extend autoregressive moving-average and fractionally differenced models and to encompass long-memory quasi-periodic behaviour. These models are, for certain ranges of parameters, stationary, and we prove here that the behaviour of the periodogram and other tapered estimators cannot be simply extended from the work of Kunsch (1986) and Hurvich & Beltrao (1993) on long memory induced by a pole at the origin. We demonstrate that potentially large both positive and negative bias can be found from the same value of the long-memory parameter, and that the new distribution can be easily written down in the case of Gaussian processes. We also consider using both the cosine taper and the sine taper. The extended least squares estimator is also considered in this context. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
613
628
Sofia C. Olhede
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:953-9662013-03-04RePEc:oup:biomet
article
Asymptotic distributions of principal components based on robust dispersions
Algebraically, principal components can be defined as the eigenvalues and eigenvectors of a covariance or correlation matrix, but they are statistically meaningful as successive projections of the multivariate data in the direction of maximal variability. An attractive alternative in robust principal component analysis is to replace the classical variability measure, i.e. variance, by a robust dispersion measure. This projection-pursuit approach was first proposed in Li & Chen (1985) as a method of constructing a robust scatter matrix. Recent unpublished work of C. Croux and A. Ruiz-Gazen provided the influence functions of the resulting principal components. The present paper focuses on the asymptotic distributions of robust principal components. In particular, we obtain the asymptotic normality of the principal components that maximise a robust dispersion measure. We also explain the need to use a dispersion functional with a continuous influence function. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
953
966
Hengjian Cui
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:75-922013-03-04RePEc:oup:biomet
article
Predicting future responses based on possibly mis-specified working models
Under a general regression setting, we propose an optimal unconditional prediction procedure for future responses. The resulting prediction intervals or regions have a desirable average coverage level over a set of covariate vectors of interest. When the working model is not correctly specified, the traditional conditional prediction method is generally invalid. On the other hand, one can empirically calibrate the above unconditional procedure and also obtain its crossvalidated counterpart. Various large and small sample properties of these unconditional methods are examined analytically and numerically. We find that the 𝒦-fold crossvalidated procedure performs exceptionally well even for cases with rather small sample sizes. The new proposals are illustrated with two real examples, one with a continuous response and the other with a binary outcome. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
75
92
http://hdl.handle.net/10.1093/biomet/asm078
application/pdf
Access to full text is restricted to subscribers.
Tianxi Cai
Lu Tian
Scott D. Solomon
L.J. Wei
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:705-7222013-03-04RePEc:oup:biomet
article
Empirical Bayes block shrinkage of wavelet coefficients via the noncentral χ² distribution
Empirical Bayes approaches to the shrinkage of empirical wavelet coefficients have generated considerable interest in recent years. Much of the work to date has focussed on shrinkage of individual wavelet coefficients in isolation. In this paper we propose an empirical Bayes approach to simultaneous shrinkage of wavelet coefficients in a block, based on the block sum of squares. Our approach exploits a useful identity satisfied by the noncentral χ² density and provides some tractable Bayesian block shrinkage procedures. Our numerical results indicate that the new procedures perform very well. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
705
722
http://hdl.handle.net/10.1093/biomet/93.3.705
text/html
Access to full text is restricted to subscribers.
Xue Wang
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:67-822013-03-04RePEc:oup:biomet
article
D-optimal design of split-split-plot experiments
In industrial experimentation, there is growing interest in studies that span more than one processing step. Convenience often dictates restrictions in randomization in passing from one processing step to another. When the study encompasses three processing steps, this leads to split-split-plot designs. We provide an algorithm for computing D-optimal split-split-plot designs and several illustrative examples. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
67
82
http://hdl.handle.net/10.1093/biomet/asn070
application/pdf
Access to full text is restricted to subscribers.
Bradley Jones
Peter Goos
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:351-3702013-03-04RePEc:oup:biomet
article
Conditional Akaike information for mixed-effects models
This paper focuses on the Akaike information criterion, AIC, for linear mixed-effects models in the analysis of clustered data. We make the distinction between questions regarding the population and questions regarding the particular clusters in the data. We show that the AIC in current use is not appropriate for the focus on clusters, and we propose instead the conditional Akaike information and its corresponding criterion, the conditional AIC, cAIC. The penalty term in cAIC is related to the effective degrees of freedom ρ for a linear mixed model proposed by Hodges & Sargent (2001); ρ reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects. The cAIC is defined for both maximum likelihood and residual maximum likelihood estimation. A pharmacokinetics data application is used to illuminate the distinction between the two inference settings, and to illustrate the use of the conditional AIC in model selection. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
351
370
http://hdl.handle.net/10.1093/biomet/92.2.351
text/html
Access to full text is restricted to subscribers.
Florin Vaida
Suzette Blanchard
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:127-1372013-03-04RePEc:oup:biomet