2015-02-28T19:12:05Z
http://oai.repec.openlib.org/oai.php
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:929-944
2013-01-01
RePEc:oup:biomet
article
Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction
Several two-stage multiple testing procedures have been proposed to detect gene-environment interaction in genome-wide association studies. In this article, we elucidate general conditions that are required for validity and power of these procedures, and we propose extensions of two-stage procedures using the case-only estimator of gene-treatment interaction in randomized clinical trials. We develop a unified estimating equation approach to proving asymptotic independence between a filtering statistic and an interaction test statistic in a range of situations, including marginal association and interaction in a generalized linear model with a canonical link. We assess the performance of various two-stage procedures in simulations and in genetic studies from Women's Health Initiative clinical trials. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
929
944
http://hdl.handle.net/10.1093/biomet/ass044
application/pdf
Access to full text is restricted to subscribers.
James Y. Dai
Charles Kooperberg
Michael Leblanc
Ross L. Prentice
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:799-811
2013-01-01
RePEc:oup:biomet
article
Choosing trajectory and data type when classifying functional data
In some problems involving functional data, it is desired to undertake prediction or classification before the full trajectory of a function is observed. In such cases, it is often preferable to suffer somewhat greater error in return for making a decision relatively early. The prediction and classification problems can be treated similarly, using mean squared prediction error, or classification error, respectively, as the means for quantifying performance, so in this paper we focus principally on classification. We introduce a method for determining when an early decision can reasonably be made, using only part of the trajectory, and we show how to use the method to choose among data types. Our approach is fully nonparametric, and no specific model is required. Properties of error-rate are studied as functions of time and data type. The effectiveness of the proposed method is illustrated in both theoretical and numerical terms. The classification referred to in this paper would be termed supervised classification in machine learning, to distinguish it from unsupervised classification, or clustering. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
799
811
http://hdl.handle.net/10.1093/biomet/ass011
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Tapabrata Maiti
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:995-1000
2013-01-01
RePEc:oup:biomet
article
Proportional mean residual life model for right-censored length-biased data
To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcome, however, such a sampling strategy may be length biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to induced informative censoring in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes & Dasu (Biometrika 77, 409--410, 1990) for analysis of censored length-biased survival data. Several nonstandard data structures, including censoring of onset time and cross-sectional data without follow-up, can also be handled by the proposed methodology. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
995
1000
http://hdl.handle.net/10.1093/biomet/ass049
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
Ying Qing Chen
Chong-Zhi Di
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:981-988
2013-01-01
RePEc:oup:biomet
article
Finite population estimators in stochastic search variable selection
Monte Carlo algorithms are commonly used to identify a set of models for Bayesian model selection or model averaging. Because empirical frequencies of models are often zero or one in high-dimensional problems, posterior probabilities calculated from the observed marginal likelihoods, renormalized over the sampled models, are often employed. Such estimates are the only recourse in several newer stochastic search algorithms. In this paper, we prove that renormalization of posterior probabilities over the set of sampled models generally leads to bias that may dominate mean squared error. Viewing the model space as a finite population, we propose a new estimator based on a ratio of Horvitz--Thompson estimators that incorporates observed marginal likelihoods, but is approximately unbiased. This is shown to lead to a reduction in mean squared error compared to the empirical or renormalized estimators, with little increase in computational cost. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
981
988
http://hdl.handle.net/10.1093/biomet/ass040
application/pdf
Access to full text is restricted to subscribers.
Merlise A. Clyde
Joyee Ghosh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:813-832
2013-01-01
RePEc:oup:biomet
article
Dispersion operators and resistant second-order functional data analysis
Inferences related to the second-order properties of functional data, as expressed by covariance structure, can become unreliable when the data are non-Gaussian or contain unusual observations. In the functional setting, it is often difficult to identify atypical observations, as their distinguishing characteristics can be manifold but subtle. In this paper, we introduce the notion of a dispersion operator, investigate its use in probing the second-order structure of functional data, and develop a test for comparing the second-order characteristics of two functional samples that is resistant to atypical observations and departures from normality. The proposed test is a regularized M-test based on a spectrally truncated version of the Hilbert--Schmidt norm of a score operator defined via the dispersion operator. We derive the asymptotic distribution of the test statistic, investigate the behaviour of the test in a simulation study and illustrate the method on a structural biology dataset. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
813
832
http://hdl.handle.net/10.1093/biomet/ass037
application/pdf
Access to full text is restricted to subscribers.
David Kraus
Victor M. Panaretos
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:945-958
2013-01-01
RePEc:oup:biomet
article
Penalized balanced sampling
Linear mixed models cover a wide range of statistical methods, which have found many uses in the estimation for complex surveys. The purpose of this work is to consider methods by which linear mixed models may be used at the design stage of a survey to incorporate available auxiliary information. This paper reviews the ideas of balanced sampling and the cube algorithm, and proposes an implementation of the latter by which penalized balanced samples can be selected. Such samples can reduce or eliminate the need for linear mixed model weight adjustments, a result demonstrated theoretically and via simulation. Horvitz--Thompson estimators for such samples will be highly efficient for any responses well approximated by a linear mixed model in the auxiliary information. In Monte Carlo experiments using nonparametric and temporal linear mixed models, the strategy of penalized balanced sampling with Horvitz--Thompson estimation dominates a variety of standard strategies. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
945
958
http://hdl.handle.net/10.1093/biomet/ass033
application/pdf
Access to full text is restricted to subscribers.
F. J. Breidt
G. Chauvet
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:915-928
2013-01-01
RePEc:oup:biomet
article
On the sparsity of signals in a random sample
This article proposes a method of moments technique for estimating the sparsity of signals in a random sample. This involves estimating the largest eigenvalue of a large Hermitian trigonometric matrix under mild conditions. As illustration, the method is applied to two well-known problems. The first focuses on the sparsity of a large covariance matrix and the second investigates the sparsity of a sequence of signals observed with stationary, weakly dependent noise. Simulation shows that the proposed estimators can have significantly smaller mean absolute errors than their main competitors. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
915
928
http://hdl.handle.net/10.1093/biomet/ass039
application/pdf
Access to full text is restricted to subscribers.
Binyan Jiang
Wei-Liem Loh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:879-898
2013-01-01
RePEc:oup:biomet
article
Scaled sparse linear regression
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise-level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for least-squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
879
898
http://hdl.handle.net/10.1093/biomet/ass043
application/pdf
Access to full text is restricted to subscribers.
Tingni Sun
Cun-Hui Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:833-849
2013-01-01
RePEc:oup:biomet
article
A geometric approach to projective shape and the cross ratio
Projective shape consists of the information about a configuration of points that is invariant under projective transformations. It is an important tool in machine vision to pick out features that are invariant to the choice of camera view. The simplest example is the cross ratio for a set of four collinear points. Recent work involving ideas from multivariate robustness enables us to introduce here a natural preshape on projective shape space. This makes it possible to adapt the Procrustes analysis that forms the basis of much methodology in the simpler setting of similarity shape space. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
833
849
http://hdl.handle.net/10.1093/biomet/ass055
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Kanti V. Mardia
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:775-786
2013-01-01
RePEc:oup:biomet
article
Classification based on a permanental process with cyclic approximation
We introduce a doubly stochastic marked point process model for supervised classification problems. Regardless of the number of classes or the dimension of the feature space, the model requires only 2--3 parameters for the covariance function. The classification criterion involves a permanental ratio for which an approximation using a polynomial-time cyclic expansion is proposed. The approximation is effective even if the feature region occupied by one class is a patchwork interlaced with regions occupied by other classes. An application to DNA microarray analysis indicates that the cyclic approximation is effective even for high-dimensional data. It can employ feature variables in an efficient way to reduce the prediction error significantly. This is critical when the true classification relies on nonreducible high-dimensional features. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
775
786
http://hdl.handle.net/10.1093/biomet/ass047
application/pdf
Access to full text is restricted to subscribers.
J. Yang
K. Miescke
P. McCullagh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:959-972
2013-01-01
RePEc:oup:biomet
article
Bootstrap confidence bands for sojourn distributions in multistate semi-Markov models with right censoring
Transient semi-Markov processes have traditionally been used to describe the transitions of a patient through the various states of a multistate survival model. A survival distribution in this context is a sojourn through the states until passage to a fatal absorbing state or certain endpoint states. Using complete sojourn data, this paper shows how such survival distributions and associated hazard functions can be estimated nonparametrically and also how nonparametric bootstrap pointwise confidence bands can be constructed for them when patients are subject to independent right censoring from each state during the sojourn. Limitations to the estimability of such survival distributions that result from random censoring with bounded support are clarified. The methods are applicable to any sort of sojourn through any finite state process of arbitrary complexity involving feedback into previously occupied states. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
959
972
http://hdl.handle.net/10.1093/biomet/ass036
application/pdf
Access to full text is restricted to subscribers.
Ronald W. Butler
Douglas A. Bronson
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:865-877
2013-01-01
RePEc:oup:biomet
article
A two-stage dimension-reduction method for transformed responses and its applications
Researchers in the biological sciences nowadays often encounter the curse of dimensionality. To tackle this, sufficient dimension reduction aims to estimate the central subspace, in which all the necessary information supplied by the covariates regarding the response of interest is contained. Subsequent statistical analysis can then be made in a lower-dimensional space while preserving relevant information. Many studies are concerned with the transformed response rather than the original one, but they may have different central subspaces. When estimating the central subspace of the transformed response, direct methods will be inefficient. In this article, we propose a more efficient two-stage estimator of the central subspace of a transformed response. This approach is extended to censored responses and is applied to combining multiple biomarkers. Simulation studies and data examples support the superiority of the procedure. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
865
877
http://hdl.handle.net/10.1093/biomet/ass042
application/pdf
Access to full text is restricted to subscribers.
Hung Hung
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:787-798
2013-01-01
RePEc:oup:biomet
article
Orthogonalization of vectors with minimal adjustment
Two transformations are proposed that give orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim is that each component should be close to the vector with which it is paired, orthogonality imposing a constraint. The transformations lead to a variety of new statistical methods, including a unified approach to the identification and diagnosis of collinearities, a method of setting prior weights for Bayesian model averaging, and a means of calculating an upper bound for a multivariate Chebychev inequality. One transformation has the property that duplicating a vector has no effect on the orthogonal components that correspond to nonduplicated vectors, and is determined using a new algorithm that also provides the decomposition of a positive-definite matrix in terms of a diagonal matrix and a correlation matrix. The algorithm is shown to converge to a global optimum. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
787
798
http://hdl.handle.net/10.1093/biomet/ass041
application/pdf
Access to full text is restricted to subscribers.
Paul H. Garthwaite
Frank Critchley
Karim Anaya-Izquierdo
Emmanuel Mubwandarikwa
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:973-980
2013-01-01
RePEc:oup:biomet
article
Statistical properties of an early stopping rule for resampling-based multiple testing
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
973
980
http://hdl.handle.net/10.1093/biomet/ass051
application/pdf
Access to full text is restricted to subscribers.
Hui Jiang
Julia Salzman
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:1001-1007
2013-01-01
RePEc:oup:biomet
article
An efficient empirical likelihood approach for estimating equations with missing data
We explore the use of estimating equations for efficient statistical inference in case of missing data. We propose a semiparametric efficient empirical likelihood approach, and show that the empirical likelihood ratio statistic and its profile counterpart asymptotically follow central chi-square distributions when evaluated at the true parameter. The theoretical properties and practical performance of our approach are demonstrated through numerical simulations and data analysis. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
1001
1007
http://hdl.handle.net/10.1093/biomet/ass045
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Yongsong Qin
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:763-774
2013-01-01
RePEc:oup:biomet
article
Testing one hypothesis twice in observational studies
In a matched observational study of treatment effects, a sensitivity analysis asks about the magnitude of the departure from random assignment that would need to be present to alter the conclusions of an analysis that assumes that matching for measured covariates removes all bias. The reported degree of sensitivity to unmeasured biases depends on both the process that generated the data and the chosen methods of analysis, so a poor choice of method may lead to an exaggerated report of sensitivity to bias. This suggests the possibility of performing more than one analysis with a correction for multiple inference, say testing one null hypothesis using two or three different tests. In theory and in an example, it is shown that, in large samples, the gains from testing twice will often be large, because testing twice has the larger of the two design sensitivities of the component tests, and the losses due to correcting for two tests will often be small, because two tests of one hypothesis will typically be highly correlated, so a correction for multiple testing that takes this into account will be small. An illustration uses data from the U.S. National Health and Nutrition Examination Survey concerning lead in the blood of cigarette smokers. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
763
774
http://hdl.handle.net/10.1093/biomet/ass032
application/pdf
Access to full text is restricted to subscribers.
P. R. Rosenbaum
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:989-994
2013-01-01
RePEc:oup:biomet
article
Compatible weighted proper scoring rules
Many proper scoring rules such as the Brier and log scoring rules implicitly reward a probability forecaster relative to a uniform baseline distribution. Recent work has motivated weighted proper scoring rules, which have an additional baseline parameter. To date two families of weighted proper scoring rules have been introduced, the weighted power and pseudospherical scoring families. These families are compatible with the log scoring rule: when the baseline maximizes the log scoring rule over some set of distributions, the baseline also maximizes the weighted power and pseudospherical scoring rules over the same set. We characterize all weighted proper scoring families and prove a general property: every proper scoring rule is compatible with some weighted scoring family, and every weighted scoring family is compatible with some proper scoring rule. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
989
994
http://hdl.handle.net/10.1093/biomet/ass046
application/pdf
Access to full text is restricted to subscribers.
P. G. M. Forbes
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:851-864
2013-01-01
RePEc:oup:biomet
article
Bidirectional discrimination with application to data visualization
Linear classifiers are very popular, but can have limitations when classes have distinct subpopulations. General nonlinear kernel classifiers are very flexible, but do not give clear interpretations and may not be efficient in high dimensions. We propose the bidirectional discrimination classification method, which generalizes linear classifiers to two or more hyperplanes. This new family of classification methods gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability, and much of the parsimony, of linear classifiers. They provide a new visualization tool for high-dimensional, low-sample-size data. Although the idea is generally applicable, we focus on the generalization of the support vector machine and distance-weighted discrimination methods. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. Our method leads to better classification performance in high-dimensional situations where subclusters are present in the data. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
851
864
http://hdl.handle.net/10.1093/biomet/ass029
application/pdf
Access to full text is restricted to subscribers.
Hanwen Huang
Yufeng Liu
J. S. Marron
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:899-914
2013-01-01
RePEc:oup:biomet
article
Simultaneous supervised clustering and feature selection over a graph
In this article, we propose a regression method for simultaneous supervised clustering and feature selection over a given undirected graph, where homogeneous groups or clusters are estimated as well as informative predictors, with each predictor corresponding to one node in the graph and a connecting path indicating a priori possible grouping among the corresponding predictors. The method seeks a parsimonious model with high predictive power through identifying and collapsing homogeneous groups of regression coefficients. To address computational challenges, we present an efficient algorithm integrating the augmented Lagrange multipliers, coordinate descent and difference convex methods. We prove that the proposed method not only identifies the true homogeneous groups and informative features consistently but also leads to accurate parameter estimation. A gene network dataset is analysed to demonstrate that the method can make a difference by exploring dependency structures among the genes. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
899
914
http://hdl.handle.net/10.1093/biomet/ass038
application/pdf
Access to full text is restricted to subscribers.
Xiaotong Shen
Hsin-Cheng Huang
Wei Pan
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:231-237
2010-03-05
RePEc:oup:biomet
article
Weighted least squares approximate restricted likelihood estimation for vector autoregressive processes
We derive a weighted least squares approximate restricted likelihood estimator for a k-dimensional pth-order autoregressive model with intercept. Exact likelihood optimization of this model is generally infeasible due to the parameter space, which is complicated and high-dimensional, involving pk^2 parameters. The weighted least squares estimator has significantly smaller bias and mean squared error than the ordinary least squares estimator for both stationary and nonstationary processes. Furthermore, at the unit root, the limiting distribution of the weighted least squares approximate restricted likelihood estimator is shown to be the zero-intercept Dickey--Fuller distribution, unlike the ordinary least squares with intercept estimator, which has a different distribution with significantly higher bias. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
231
237
http://hdl.handle.net/10.1093/biomet/asp071
application/pdf
Access to full text is restricted to subscribers.
Willa W. Chen
Rohit S. Deo
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:181-198
2010-03-05
RePEc:oup:biomet
article
On Bayesian testimation and its application to wavelet thresholding
We consider the problem of estimating the unknown response function in the Gaussian white noise model. We first utilize the recently developed Bayesian maximum a posteriori testimation procedure of Abramovich et al. (2007) for recovering an unknown high-dimensional Gaussian mean vector. The existing results for its upper error bounds over various sparse l_p-balls are extended to more general cases. We show that, for a properly chosen prior on the number of nonzero entries of the mean vector, the corresponding adaptive estimator is asymptotically minimax in a wide range of sparse and dense l_p-balls. The proposed procedure is then applied in a wavelet context to derive adaptive global and level-wise wavelet estimators of the unknown response function in the Gaussian white noise model. These estimators are then proven to be, respectively, asymptotically near-minimax and minimax in a wide range of Besov balls. These results are also extended to the estimation of derivatives of the response function. Simulated examples are conducted to illustrate the performance of the proposed level-wise wavelet estimator in finite sample situations, and to compare it with several existing counterparts. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
181
198
http://hdl.handle.net/10.1093/biomet/asp080
application/pdf
Access to full text is restricted to subscribers.
Felix Abramovich
Vadim Grinshtein
Athanasia Petsa
Theofanis Sapatinas
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:49-64
2010-03-05
RePEc:oup:biomet
article
Functional quadratic regression
We extend the common linear functional regression model to the case where the dependency of a scalar response on a functional predictor is of polynomial rather than linear nature. Focusing on the quadratic case, we demonstrate the usefulness of the polynomial functional regression model, which encompasses linear functional regression as a special case. Our approach works under mild conditions for the case of densely spaced observations and also can be extended to the important practical situation where the functional predictors are derived from sparse and irregular measurements, as is the case in many longitudinal studies. A key observation is the equivalence of the functional polynomial model with a regression model that is a polynomial of the same order in the functional principal component scores of the predictor processes. Theoretical analysis as well as practical implementations are based on this equivalence and on basis representations of predictor processes. We also obtain an explicit representation of the regression surface that defines quadratic functional regression and provide functional asymptotic results for an increasing number of model components as the number of subjects in the study increases. The improvements that can be gained by adopting quadratic as compared to linear functional regression are illustrated with a case study that includes absorption spectra as functional predictors. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
49
64
http://hdl.handle.net/10.1093/biomet/asp069
application/pdf
Access to full text is restricted to subscribers.
Fang Yao
Hans-Georg Müller
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:199-208
2010-03-05
RePEc:oup:biomet
article
Forecasting for quantile self-exciting threshold autoregressive time series models
Self-exciting threshold autoregressive time series models have been used extensively, and the conditional mean obtained from these models can be used to predict the future value of a random variable. In this paper we consider quantile forecasts of a time series based on the quantile self-exciting threshold autoregressive time series models proposed by Cai and Stander (2008) and present a new forecasting method for them. Simulation studies and application to real time series show that the method works very well. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
199
208
http://hdl.handle.net/10.1093/biomet/asp070
application/pdf
Access to full text is restricted to subscribers.
Yuzhi Cai
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:31-48
2010-03-05
RePEc:oup:biomet
article
Incorporating prior probabilities into high-dimensional classifiers
In standard parametric classifiers, or classifiers based on nonparametric methods but where there is an opportunity for estimating population densities, the prior probabilities of the respective populations play a key role. However, those probabilities are largely ignored in the construction of high-dimensional classifiers, partly because there are no likelihoods to be constructed or Bayes risks to be estimated. Nevertheless, including information about prior probabilities can reduce the overall error rate, particularly in cases where doing so is most important, i.e. when the classification problem is particularly challenging and error rates are not close to zero. In this paper we suggest a general approach to reducing error rate in this way, by using a method derived from Breiman's bagging idea. The potential improvements in performance are identified in theoretical and numerical work, the latter involving both applications to real data and simulations. The method is simple and explicit to apply, and does not involve choice of any tuning parameters. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
31
48
http://hdl.handle.net/10.1093/biomet/asp081
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Jing-Hao Xue
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:133-145
2010-03-05
RePEc:oup:biomet
article
A semiparametric random effects model for multivariate competing risks data
We propose a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model. The associations between the cause-specific failure times can be studied through dependence parameters of copula functions that are allowed to depend on cluster-level covariates. A cross-odds ratio-type measure is proposed to describe the associations between cause-specific failure times, and its relationship to the dependence parameters is explored. We develop a two-stage estimation procedure where the marginal models are estimated in the first stage and the dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The proposed procedures are applied to Danish twin data to model the cumulative incidence for the age of natural menopause and to investigate the association in the onset of natural menopause between monozygotic and dizygotic twins. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
133
145
http://hdl.handle.net/10.1093/biomet/asp082
application/pdf
Access to full text is restricted to subscribers.
Thomas H. Scheike
Yanqing Sun
Mei-Jie Zhang
Tina Kold Jensen
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:147-1582010-03-05RePEc:oup:biomet
article
Estimation of the retransformed conditional mean in health care cost studies
We propose a new approach for analyzing skewed and heteroscedastic health care cost data through regression of the conditional quantiles of the transformed cost. Using the appealing equivariance property of quantiles to monotone transformations, we propose a distribution-free estimator of the conditional mean cost on the original scale. The proposed method is extended to a two-part heteroscedastic model to account for zero costs commonly seen in health care cost studies. Simulation studies indicate that the proposed estimator has competitive and more robust performance than existing estimators in various heteroscedastic models. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
147
158
http://hdl.handle.net/10.1093/biomet/asp072
application/pdf
Access to full text is restricted to subscribers.
Huixia Judy Wang
Xiao-Hua Zhou
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:95-1082010-03-05RePEc:oup:biomet
article
On the use of stochastic ordering to test for trend with clustered binary data
We introduce the use of stochastic ordering for defining treatment-related trend in clustered exchangeable binary data for both when cluster sizes are fixed and when they vary randomly. In the latter case, there is a well-documented tendency for such data to be sparse, a problem we address by making an assumption of interpretability or, equivalently, marginal compatibility. Our procedures are based on a representation of the joint distribution of binary exchangeable random variables by a saturated model, and may hence be considered nonparametric. The definition of trend by stochastic ordering is proposed to ensure a flexibility that allows for various forms of monotone increases in response to the cluster as a whole to be included in the evaluation of the trend. We obtain maximum likelihood estimates of probability functions under stochastic ordering using mixture-likelihood-based algorithms. Since the data are sparse, we avoid the use of asymptotic results and obtain p-values of the likelihood ratio procedures by permutation resampling. We demonstrate how the proposed framework can be used in risk assessment, and provide comparisons with existing procedures. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
95
108
http://hdl.handle.net/10.1093/biomet/asp077
application/pdf
Access to full text is restricted to subscribers.
Aniko Szabo
E. Olusegun George
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:1-132010-03-05RePEc:oup:biomet
article
Systematic sampling with errors in sample locations
Systematic sampling of points in continuous space is widely used in microscopy and spatial surveys. Classical theory provides asymptotic expressions for the variance of estimators based on systematic sampling as the grid spacing decreases. However, the classical theory assumes that the sample grid is exactly periodic; real physical sampling procedures may introduce errors in the placement of the sample points. This paper studies the effect of errors in sample positioning on the variance of estimators in the case of one-dimensional systematic sampling. First we sketch a general approach to variance analysis using point process methods. We then analyze three different models for the error process, calculate exact expressions for the variances, and derive asymptotic variances. Errors in the placement of sample points can lead to substantial inflation of the variance, dampening of zitterbewegung, that is, fluctuation effects, and a slower order of convergence. This suggests that the current practice in some areas of microscopy may be based on over-optimistic predictions of estimator accuracy. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
1
13
http://hdl.handle.net/10.1093/biomet/asp067
application/pdf
Access to full text is restricted to subscribers.
Johanna Ziegel
Adrian Baddeley
Karl-Anton Dorph-Petersen
Eva B. Vedel Jensen
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:171-1802010-03-05RePEc:oup:biomet
article
On doubly robust estimation in a semiparametric odds ratio model
We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation-independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007). Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
171
180
http://hdl.handle.net/10.1093/biomet/asp062
application/pdf
Access to full text is restricted to subscribers.
Eric J. Tchetgen Tchetgen
James M. Robins
Andrea Rotnitzky
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:79-932010-03-05RePEc:oup:biomet
article
Generalized empirical likelihood methods for analyzing longitudinal data
Efficient estimation of parameters is a major objective in analyzing longitudinal data. We propose two generalized empirical likelihood-based methods that take into consideration within-subject correlations. A nonparametric version of the Wilks theorem for the limiting distributions of the empirical likelihood ratios is derived. It is shown that one of the proposed methods is locally efficient among a class of within-subject variance-covariance matrices. A simulation study is conducted to investigate the finite sample properties of the proposed methods and to compare them with the block empirical likelihood method of You et al. (2006) and the normal approximation with a correctly estimated variance-covariance matrix. The results suggest that the proposed methods are generally more efficient than existing methods that ignore the correlation structure, and are better in coverage compared to the normal approximation with correctly specified within-subject correlation. An application illustrating our methods and supporting the simulation study results is presented. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
79
93
http://hdl.handle.net/10.1093/biomet/asp073
application/pdf
Access to full text is restricted to subscribers.
Suojin Wang
Lianfen Qian
Raymond J. Carroll
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:123-1322010-03-05RePEc:oup:biomet
article
Sharp bounds on causal effects in case-control and cohort studies
Evaluating the causal effect of an exposure on a response from case-control and cohort studies is a major concern in epidemiological and medical research. Since causal effects are in general nonidentifiable from such studies, this paper derives bounds on two causal measures: the causal risk difference and the causal risk ratio. We use the potential response approach and a linear programming method to derive sharp bounds on the causal risk difference, and a novel fractional programming method to derive bounds on the causal risk ratio. In addition, in the presence of missing data, we consider three different missingness mechanisms and propose sharp bounds under these situations. The results provide new guidance on causal inference in case-control and cohort studies. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
123
132
http://hdl.handle.net/10.1093/biomet/asp076
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
Zhihong Cai
Zhi Geng
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:65-782010-03-05RePEc:oup:biomet
article
Marginal analyses of longitudinal data with an informative pattern of observations
We consider solutions to generalized estimating equations with singular working correlation matrices, of which the estimator of Diggle et al. (2007) is a special case. We give explicit conditions for consistent estimation when the pattern of observations may be informative. In such cases, simulations reveal reduced bias and reduced mean squared error compared with existing alternatives. A study of peritoneal dialysis is used to illustrate the methodology. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
65
78
http://hdl.handle.net/10.1093/biomet/asp068
application/pdf
Access to full text is restricted to subscribers.
D. M. Farewell
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:15-302010-03-05RePEc:oup:biomet
article
Cross-covariance functions for multivariate random fields based on latent dimensions
The problem of constructing valid parametric cross-covariance functions is challenging. We propose a simple methodology, based on latent dimensions and existing covariance models for univariate random fields, to develop flexible, interpretable and computationally feasible classes of cross-covariance functions in closed form. We focus on spatio-temporal cross-covariance functions that can be nonseparable, asymmetric and can have different covariance structures, for instance different smoothness parameters, in each component. We discuss estimation of these models and perform a small simulation study to demonstrate our approach. We illustrate our methodology on a trivariate spatio-temporal pollution dataset from California and demonstrate that our cross-covariance model performs better than competing models. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
15
30
http://hdl.handle.net/10.1093/biomet/asp078
application/pdf
Access to full text is restricted to subscribers.
Tatiyana V. Apanasovich
Marc G. Genton
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:254-2592010-03-05RePEc:oup:biomet
article
The maximal data piling direction for discrimination
We study a discriminant direction vector that generally exists only in high-dimension, low sample size settings. Projections of data onto this direction vector take on only two distinct values, one for each class. There exist infinitely many such directions in the subspace generated by the data; but the maximal data piling vector has the longest distance between the projections. This paper investigates mathematical properties and classification performance of this discrimination method. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
254
259
http://hdl.handle.net/10.1093/biomet/asp084
application/pdf
Access to full text is restricted to subscribers.
Jeongyoun Ahn
J. S. Marron
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:215-2222010-03-05RePEc:oup:biomet
article
Pseudo-score confidence intervals for parameters in discrete statistical models
We propose pseudo-score confidence intervals for parameters in models for discrete data. The confidence interval is obtained by inverting a test that uses a Pearson chi-squared statistic to compare fitted values for the working model with fitted values of the model when a parameter of interest takes various fixed values. For multinomial models, the pseudo-score method simplifies to the score method when the model is saturated and otherwise it is asymptotically equivalent to score and likelihood ratio test-based inferences. For cases in which ordinary score methods are impractical, such as when the likelihood function is not an explicit function of model parameters, the pseudo-score method is feasible. We illustrate the method for four such examples. Generalizations of the method are also presented for future research, including inference for complex sampling designs using a quasilikelihood Pearson statistic that compares fitted values for two models relative to the variance of the observations under the simpler model. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
215
222
http://hdl.handle.net/10.1093/biomet/asp074
application/pdf
Access to full text is restricted to subscribers.
Alan Agresti
Euijung Ryu
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:109-1212010-03-05RePEc:oup:biomet
article
Stochastic approximation with virtual observations for dose-finding on discrete levels
Phase I clinical studies are experiments in which a new drug is administered to humans to determine the maximum dose that causes toxicity with a target probability. Phase I dose-finding is often formulated as a quantile estimation problem. For studies with a biological endpoint, it is common to define toxicity by dichotomizing the continuous biomarker expression. In this article, we propose a novel variant of the Robbins--Monro stochastic approximation that utilizes the continuous measurements for quantile estimation. The Robbins--Monro method has seldom seen clinical applications, because it does not perform well for quantile estimation with binary data and it works with a continuum of doses that are generally not available in practice. To address these issues, we formulate the dose-finding problem as root-finding for the mean of a continuous variable, for which the stochastic approximation procedure is efficient. To accommodate the use of discrete doses, we introduce the idea of virtual observation that is defined on a continuous dosage range. Our proposed method inherits the convergence properties of the stochastic approximation algorithm and its computational simplicity. Simulations based on real trial data show that our proposed method improves accuracy compared with the continual re-assessment method and produces results robust to model misspecification. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
109
121
http://hdl.handle.net/10.1093/biomet/asp065
application/pdf
Access to full text is restricted to subscribers.
Ying Kuen Cheung
Mitchell S. V. Elkind
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:238-2452010-03-05RePEc:oup:biomet
article
Nonparametric Bayesian inference for the spectral density function of a random field
A powerful technique for inference concerning spatial dependence in a random field is to use spectral methods based on frequency domain analysis. Here we develop a nonparametric Bayesian approach to statistical inference for the spectral density of a random field. We construct a multi-dimensional Bernstein polynomial prior for the spectral density and devise a Markov chain Monte Carlo algorithm to simulate from the posterior of the spectral density. The posterior sampling enables us to obtain a smoothed estimate of the spectral density as well as credible bands at desired levels. Simulation shows that our proposed method is more robust than a parametric approach. For illustration, we analyse a soil data example. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
238
245
http://hdl.handle.net/10.1093/biomet/asp066
application/pdf
Access to full text is restricted to subscribers.
Yanbing Zheng
Jun Zhu
Anindya Roy
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:159-1702010-03-05RePEc:oup:biomet
article
Mean loglikelihood and higher-order approximations
Higher-order approximations to p-values can be obtained from the loglikelihood function and a reparameterization that can be viewed as a canonical parameter in an exponential family approximation to the model. This approach clarifies the connection between Skovgaard (1996) and Fraser et al. (1999a), and shows that the Skovgaard approximation can be obtained directly using the mean loglikelihood function. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
159
170
http://hdl.handle.net/10.1093/biomet/asq001
application/pdf
Access to full text is restricted to subscribers.
N. Reid
D. A. S. Fraser
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:209-2142010-03-05RePEc:oup:biomet
article
A note on the sensitivity to assumptions of a generalized linear mixed model
A simple case of Poisson regression is used to study the potential gain in efficiency from using a mixed model representation. Possible systematic errors arising from misspecification of the random terms in the model are examined. It is shown in particular that for a special but realistic problem, appreciable bias may arise from misspecification of a random component. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
209
214
http://hdl.handle.net/10.1093/biomet/asp083
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
M. Y. Wong
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:246-2532010-03-05RePEc:oup:biomet
article
The distribution-based p-value for the outlier sum in differential gene expression analysis
Outlier sums were proposed by Tibshirani & Hastie (2007) and Wu (2007) for detecting outlier genes where only a small subset of disease samples shows unusually high gene expression, but they did not develop their distributional properties and formal statistical inference. In this study, a new outlier sum for detection of outlier genes is proposed, its asymptotic distribution theory is developed, and the p-value based on this outlier sum is formulated. Its analytic form is derived on the basis of the large-sample theory. We compare the proposed method with existing outlier sum methods by power comparisons. Our method is applied to DNA microarray data from samples of primary breast tumors examined by Huang et al. (2003). The results show that the proposed method is more efficient in detecting outlier genes. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
246
253
http://hdl.handle.net/10.1093/biomet/asp075
application/pdf
Access to full text is restricted to subscribers.
Lin-An Chen
Dung-Tsa Chen
Wenyaw Chan
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:223-2302010-03-05RePEc:oup:biomet
article
Global and local spectral-based tests for periodicities
We investigate tests for periodicity based on a spectral analysis of a time series, differentiating between global and local spectral-based tests. Global tests use information across the entire frequency band, whereas local tests are based on a window around the test frequency. We show that many spectral-based tests can be expressed in terms of a regression-based F test, which allows for approximate size and power calculations. Since global tests are usually derived assuming white noise errors, we extend them to the correlated noise case. We demonstrate via a Monte Carlo study that although the global test may have better size and power, local tests are easier to use, and are comparable or better in terms of the power to detect periodicities, especially for spectra with a large dynamic range. We apply this methodology to a nonbehavioural test of hearing. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
223
230
http://hdl.handle.net/10.1093/biomet/asp079
application/pdf
Access to full text is restricted to subscribers.
L. Wei
P. F. Craigmile
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:199-2142011-07-12RePEc:oup:biomet
article
The effect of correlation in false discovery rate estimation
The objective of this paper is to quantify the effect of correlation in false discovery rate analysis. Specifically, we derive approximations for the mean, variance, distribution and quantiles of the standard false discovery rate estimator for arbitrarily correlated data. This is achieved using a negative binomial model for the number of false discoveries, where the parameters are found empirically from the data. We show that correlation may increase the bias and variance of the estimator substantially with respect to the independent case, and that in some cases, such as an exchangeable correlation structure, the estimator fails to be consistent as the number of tests becomes large. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
199
214
http://hdl.handle.net/10.1093/biomet/asq075
application/pdf
Access to full text is restricted to subscribers.
Armin Schwartzman
Xihong Lin
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:867-8802011-07-12RePEc:oup:biomet
article
A weighted estimating equation approach for inhomogeneous spatial point processes
We introduce a new estimation method for parametric intensity function models of inhomogeneous spatial point processes based on weighted estimating equations. The weights can incorporate information on both inhomogeneity and dependence of the process. Simulations show that significant efficiency gains can be achieved for non-Poisson processes, compared to the Poisson maximum likelihood estimator. An application to tropical forest data illustrates the use of the proposed method. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
867
880
http://hdl.handle.net/10.1093/biomet/asq043
application/pdf
Access to full text is restricted to subscribers.
Yongtao Guan
Ye Shen
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:133-1462011-07-12RePEc:oup:biomet
article
Partial envelopes for efficient estimation in multivariate linear regression
We introduce the partial envelope model, which leads to a parsimonious method for multivariate linear regression when some of the predictors are of special interest. It has the potential to achieve massive efficiency gains compared with the standard model in the estimation of the coefficients for the selected predictors. The partial envelope model is a variation on the envelope model proposed by Cook et al. (2010) but, as it focuses on part of the predictors, it has looser restrictions and can further improve the efficiency. We develop maximum likelihood estimation for the partial envelope model and discuss applications of the bootstrap. An example is provided to illustrate some of its operating characteristics. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
133
146
http://hdl.handle.net/10.1093/biomet/asq063
application/pdf
Access to full text is restricted to subscribers.
Zhihua Su
R. Dennis Cook
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:935-9462011-07-12RePEc:oup:biomet
article
Compound optimal allocation for individual and collective ethics in binary clinical trials
In recent years, several authors have investigated response-adaptive allocation rules for comparative clinical trials, in order to favour, at each stage of the trial, the treatment that appears to be best. In this paper, we define admissible allocations, namely treatment assignments that cannot be simultaneously improved upon with respect to both a specific design criterion, reflecting the inferential properties of the experiment, and the proportion of patients assigned to the best treatment or treatments; we survey existing designs from this viewpoint. We also suggest combining information and ethical considerations by taking a suitable weighted mean of two corresponding standardized criteria, with weights that depend on the actual treatment effects. This compound criterion leads to a locally optimal allocation that can be targeted by some response-adaptive randomization rule. The paper mainly deals with the case of two treatments, but the suggested methodology is shown to extend to more than two. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
935
946
http://hdl.handle.net/10.1093/biomet/asq055
application/pdf
Access to full text is restricted to subscribers.
Alessandro Baldi Antognini
Alessandra Giovagnoli
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:231-2362011-07-12RePEc:oup:biomet
article
A novel reversible jump algorithm for generalized linear models
We propose a novel methodology to construct proposal densities in reversible jump algorithms that obtain samples from parameter subspaces of competing generalized linear models with differing dimensions. The derived proposal densities are not restricted to moves between nested models and are applicable even to models that share no common parameters. We illustrate our methodology on competing logistic regression and log-linear graphical models, demonstrating how our suggested proposal densities, together with the resulting freedom to propose moves between any models, improve the performance of the reversible jump algorithm. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
231
236
http://hdl.handle.net/10.1093/biomet/asq071
application/pdf
Access to full text is restricted to subscribers.
M. Papathomas
P. Dellaportas
V. G. S. Vasdekis
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:905-9202011-07-12RePEc:oup:biomet
article
Penalized high-dimensional empirical likelihood
We propose penalized empirical likelihood for parameter estimation and variable selection for problems with diverging numbers of parameters. Our results are demonstrated for estimating the mean vector in multivariate analysis and regression coefficients in linear models. By using an appropriate penalty function, we show that penalized empirical likelihood has the oracle property. That is, with probability tending to 1, penalized empirical likelihood identifies the true model and estimates the nonzero coefficients as efficiently as if the sparsity of the true model was known in advance. The advantage of penalized empirical likelihood as a nonparametric likelihood approach is illustrated by testing hypotheses and constructing confidence regions. Numerical simulations confirm our theoretical findings. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
905
920
http://hdl.handle.net/10.1093/biomet/asq057
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Chenlei Leng
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:851-8652011-07-12RePEc:oup:biomet
article
Nonparametric Bayesian density estimation on manifolds with applications to planar shapes
Statistical analysis on landmark-based shape spaces has diverse applications in morphometrics, medical diagnostics, machine vision and other areas. These shape spaces are non-Euclidean quotient manifolds. To conduct nonparametric inferences, one may define notions of centre and spread on this manifold and work with their estimates. However, it is useful to consider full likelihood-based methods, which allow nonparametric estimation of the probability density. This article proposes a broad class of mixture models constructed using suitable kernels on a general compact metric space and then on the planar shape space in particular. Following a Bayesian approach with a nonparametric prior on the mixing distribution, conditions are obtained under which the Kullback--Leibler property holds, implying large support and weak posterior consistency. Gibbs sampling methods are developed for posterior computation, and the methods are applied to problems in density estimation and classification with shape-based predictors. Simulation studies show improved estimation performance relative to existing approaches. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
851
865
http://hdl.handle.net/10.1093/biomet/asq044
application/pdf
Access to full text is restricted to subscribers.
Abhishek Bhattacharya
David B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:251-2712011-07-12RePEc:oup:biomet
article
False discovery rates and copy number variation
Copy number changes, the gains and losses of chromosome segments, are a common type of genetic variation among healthy individuals as well as an important feature in tumour genomes. Microarray technology enables us to simultaneously measure, with moderate accuracy, copy number variation at more than a million chromosome locations and for hundreds of subjects. This leads to massive data sets and complicated inference problems concerning which locations are more likely to vary. In this paper we consider a relatively simple false discovery rate approach to copy number analysis. More careful parametric change-point methods can then be focused on promising regions of the genome. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
251
271
http://hdl.handle.net/10.1093/biomet/asr018
application/pdf
Access to full text is restricted to subscribers.
Bradley Efron
Nancy R. Zhang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:91-1062011-07-12RePEc:oup:biomet
article
On asymptotic normality and variance estimation for nondifferentiable survey estimators
Survey estimators of population quantities such as distribution functions and quantiles contain nondifferentiable functions of estimated quantities. The theoretical properties of such estimators are substantially more complicated to derive than those of differentiable estimators. In this article, we provide a unified framework for obtaining the asymptotic design-based properties of two common types of nondifferentiable estimators. Estimators of the first type have an explicit expression, while those of the second are defined only as the solution to estimating equations. We propose both analytical and replication-based design-consistent variance estimators for both cases, based on kernel regression. The practical behaviour of the variance estimators is demonstrated in a simulation experiment. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
91
106
http://hdl.handle.net/10.1093/biomet/asq077
application/pdf
Access to full text is restricted to subscribers.
Jianqiang C. Wang
J. D. Opsomer
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:1-152011-07-12RePEc:oup:biomet
article
Joint estimation of multiple graphical models
Gaussian graphical models explore dependence relationships between random variables, through the estimation of the corresponding inverse covariance matrices. In this paper we develop an estimator for such models appropriate for data from several graphical models that share the same variables and some of the dependence structure. In this setting, estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category does not take advantage of the common structure. We propose a method that jointly estimates the graphical models corresponding to the different categories present in the data, aiming to preserve the common structure, while allowing for differences between the categories. This is achieved through a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. We establish the asymptotic consistency and sparsity of the proposed estimator in the high-dimensional case, and illustrate its performance on a number of simulated networks. An application to learning semantic connections between terms from webpages collected from computer science departments is included. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
1
15
http://hdl.handle.net/10.1093/biomet/asq060
application/pdf
Access to full text is restricted to subscribers.
Jian Guo
Elizaveta Levina
George Michailidis
Ji Zhu
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:969-9762011-07-12RePEc:oup:biomet
article
Varying coefficient transformation models with censored data
A maximum likelihood method with spline smoothing is proposed for linear transformation models with varying coefficients. The estimation and inference procedures are computationally easy. Under some regularity conditions, the estimators are proved to be consistent and asymptotically normal. A simulation study using the Stanford transplant data is presented to show that the proposed method performs well with a finite sample and is easy to use in practice. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
969
976
http://hdl.handle.net/10.1093/biomet/asq032
application/pdf
Access to full text is restricted to subscribers.
Kani Chen
Xingwei Tong
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:985-9892011-07-12RePEc:oup:biomet
article
Some insights into continuum regression and its asymptotic properties
Continuum regression encompasses ordinary least squares regression, partial least squares regression and principal component regression under the same umbrella using a nonnegative parameter Gamma. However, there seems to be no literature discussing the asymptotic properties for arbitrary continuum regression parameter Gamma. This article establishes a relation between continuum regression and sufficient dimension reduction and studies the asymptotic properties of continuum regression for arbitrary Gamma under inverse regression models. Theoretical and simulation results show that the continuum seems unnecessary when the conditional distribution of the predictors given the response follows the multivariate normal distribution. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
985
989
http://hdl.handle.net/10.1093/biomet/asq024
application/pdf
Access to full text is restricted to subscribers.
Xin Chen
R. Dennis Cook
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:237-2422011-07-12RePEc:oup:biomet
article
Recapture models under equality constraints for the conditional capture probabilities
We introduce a general class of capture-recapture models in which capture probabilities depend on capture history. We discuss constrained versions of the saturated model based on equality constraints. Inference can be performed through a simple estimating equation. The approach is illustrated on a dataset concerning Great Copper butterflies in Willamette Valley of Oregon. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
237
242
http://hdl.handle.net/10.1093/biomet/asq068
application/pdf
Access to full text is restricted to subscribers.
A. Farcomeni
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:807-8242011-07-12RePEc:oup:biomet
article
Most-predictive design points for functional data predictors
We suggest a way of reducing the very high dimension of a functional predictor, X, to a low number of dimensions chosen so as to give the best predictive performance. Specifically, if X is observed on a fine grid of design points t<sub>1</sub>,…, t<sub>r</sub>, we propose a method for choosing a small subset of these, say t<sub>i<sub>1</sub></sub>,…, t<sub>i<sub>k</sub></sub>, to optimize the prediction of a response variable, Y. The values t<sub>i<sub>j</sub></sub> are referred to as the most predictive design points, or covariates, for a given value of k, and are computed using information contained in a set of independent observations (X<sub>i</sub>, Y<sub>i</sub>) of (X, Y). The algorithm is based on local linear regression, and calculations can be accelerated using linear regression to preselect the design points. Boosting can be employed to further improve the predictive performance. We illustrate the usefulness of our ideas through simulations and examples drawn from chemometrics, and we develop theoretical arguments showing that the methodology can be applied successfully in a range of settings. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
807
824
http://hdl.handle.net/10.1093/biomet/asq058
application/pdf
Access to full text is restricted to subscribers.
F. Ferraty
P. Hall
P. Vieu
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:325-3402011-07-12RePEc:oup:biomet
article
Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times
New methods and theory have recently been developed to nonparametrically estimate cumulative incidence functions for competing risks survival data subject to current status censoring. In particular, the limiting distribution of the nonparametric maximum likelihood estimator and a simplified naive estimator have been established under certain smoothness conditions. In this paper, we establish the large-sample behaviour of these estimators in two additional models, namely when the observation time distribution has discrete support and when the observation times are grouped. These asymptotic results are applied to the construction of confidence intervals in the three different models. The methods are illustrated on two datasets regarding the cumulative incidence of different types of menopause from a cross-sectional sample of women in the United States and subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
325
340
http://hdl.handle.net/10.1093/biomet/asq083
application/pdf
Access to full text is restricted to subscribers.
M. H. Maathuis
M. G. Hudgens
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:341-3542011-07-12RePEc:oup:biomet
article
Time-dependent cross ratio estimation for bivariate failure times
In the analysis of bivariate correlated failure time data, it is important to measure the strength of association among the correlated failure times. One commonly used measure is the cross ratio. Motivated by Cox's partial likelihood idea, we propose a novel parametric cross ratio estimator that is a flexible continuous function of both components of the bivariate survival times. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is examined using simulation studies, and it is applied to the Australian twin data. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
341
354
http://hdl.handle.net/10.1093/biomet/asr005
application/pdf
Access to full text is restricted to subscribers.
Tianle Hu
Bin Nan
Xihong Lin
James M. Robins
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:49-632011-07-12RePEc:oup:biomet
article
Bootstrap inference for mean reflection shape and size-and-shape with three-dimensional landmark data
Working within the framework of a multi-dimensional scaling approach to shape analysis, we develop bootstrap methods for inference about mean reflection shape and size-and-shape based on labelled landmark data. The approach is developed in general dimensions though we focus on the three-dimensional case. We consider two pivotal statistics which we use to construct bootstrap confidence regions for the mean reflection shape or size-and-shape, and present simulation results which show that these statistics perform well in a variety of examples. We also suggest regularized versions of the test statistics that are suitable for more challenging cases where sample size is not sufficiently large in relation to the number of landmarks and present numerical results confirming that regularization indeed leads to better performance. An algorithm for producing a graphical representation of the confidence region for the mean reflection shape is presented and applied in an example involving molecular dynamics simulation data. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
49
63
http://hdl.handle.net/10.1093/biomet/asq065
application/pdf
Access to full text is restricted to subscribers.
S. P. Preston
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:961-9682011-07-12RePEc:oup:biomet
article
Probability-based Latin hypercube designs for slid-rectangular regions
Existing space-filling designs are based on the assumption that the experimental region is rectangular, while in practice this assumption can be violated. Motivated by a data centre thermal management study, a class of probability-based Latin hypercube designs is proposed to accommodate a specific type of irregular region. A heuristic algorithm is proposed to search efficiently for optimal designs. Unbiased estimators are proposed, their variances are given and their performances are compared empirically. The proposed method is applied to obtain an optimal sensor placement plan to monitor and study the thermal distribution in a data centre. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
961
968
http://hdl.handle.net/10.1093/biomet/asq051
application/pdf
Access to full text is restricted to subscribers.
Ying Hung
Yasuo Amemiya
Chien-Fu Jeff Wu
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:81-902011-07-12RePEc:oup:biomet
article
A self-normalized confidence interval for the mean of a class of nonstationary processes
We construct an asymptotic confidence interval for the mean of a class of nonstationary processes with constant mean and time-varying variances. Due to the large number of unknown parameters, traditional approaches based on consistent estimation of the limiting variance of sample mean through moving block or non-overlapping block methods are not applicable. Under a block-wise asymptotically equal cumulative variance assumption, we propose a self-normalized confidence interval that is robust against the nonstationarity and dependence structure of the data. We also apply the same idea to construct an asymptotic confidence interval for the mean difference of nonstationary processes with piecewise constant means. The proposed methods are illustrated through simulations and an application to global temperature series. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
81
90
http://hdl.handle.net/10.1093/biomet/asq076
application/pdf
Access to full text is restricted to subscribers.
Zhibiao Zhao
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:489-4942011-07-12RePEc:oup:biomet
article
The dimple in Gneiting's spatial-temporal covariance model
Gneiting (2002) proposed a nonseparable covariance model for spatial-temporal data. In the present paper we show that in certain circumstances his model possesses a counterintuitive dimple. In some cases, the magnitude of the dimple can be nontrivial. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
489
494
http://hdl.handle.net/10.1093/biomet/asr006
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Mohsen Mohammadzadeh
Ali M. Mosammam
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:481-4882011-07-12RePEc:oup:biomet
article
On the likelihood function of Gaussian max-stable processes
We derive a closed-form expression for the likelihood function of a Gaussian max-stable process indexed by ℝ^d at p≤d+1 sites, d≥1. We demonstrate the gain in efficiency in the maximum composite likelihood estimators of the covariance matrix from p=2 to p=3 sites in ℝ^2 by means of a Monte Carlo simulation study. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
481
488
http://hdl.handle.net/10.1093/biomet/asr020
application/pdf
Access to full text is restricted to subscribers.
Marc G. Genton
Yanyuan Ma
Huiyan Sang
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:893-9042011-07-12RePEc:oup:biomet
article
Consistent selection of the number of clusters via crossvalidation
In cluster analysis, one of the major challenges is to estimate the number of clusters. Most existing approaches attempt to minimize some distance-based dissimilarity measure within clusters. This article proposes a novel selection criterion that is applicable to all kinds of clustering algorithms, including distance-based or non-distance-based algorithms. The key idea is to select the number of clusters that minimizes the algorithm's instability, which measures the robustness of any given clustering algorithm against the randomness in sampling. A novel estimation scheme for clustering instability is developed based on crossvalidation. The proposed selection criterion's effectiveness is demonstrated on a variety of numerical experiments, and its asymptotic selection consistency is established when the dataset is properly split. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
893
904
http://hdl.handle.net/10.1093/biomet/asq061
application/pdf
Access to full text is restricted to subscribers.
Junhui Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:177-1862011-07-12RePEc:oup:biomet
article
Nonparametric estimation for length-biased and right-censored data
This paper considers survival data arising from length-biased sampling, where the survival times are left truncated by uniformly distributed random truncation times. We propose a nonparametric estimator that incorporates the information about the length-biased sampling scheme. The new estimator retains the simplicity of the truncation product-limit estimator with a closed-form expression, and has a small efficiency loss compared with the nonparametric maximum likelihood estimator, which requires an iterative algorithm. Moreover, the asymptotic variance of the proposed estimator has a closed form, and a variance estimator is easily obtained by plug-in methods. Numerical simulation studies with practical sample sizes are conducted to compare the performance of the proposed method with its competitors. A data analysis of the Canadian Study of Health and Aging is conducted to illustrate the methods and theory. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
177
186
http://hdl.handle.net/10.1093/biomet/asq069
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:215-2242011-07-12RePEc:oup:biomet
article
Assessing the validity of weighted generalized estimating equations
The inverse probability weighted generalized estimating equations approach (Robins et al. 1994; Robins et al. 1995) effectively removes bias and provides valid statistical inference for regression parameter estimation in marginal models when longitudinal data contain missing values. The validity of the weighted generalized estimating equations regarding consistent estimation depends on whether the underlying missing data process is properly modelled. However, there is little work available to examine whether or not this condition holds. In this paper we propose a test constructed from two sets of estimating equations: one set is known to be unbiased, but the other set is not known. We utilize the quadratic inference function (Qu et al. 2000) method to assess their compatibility, which is equivalent to testing for the validity of the weighted generalized estimating equations approach. We conduct simulation studies to assess the performance of the proposed method. The test procedure is illustrated through a real data example. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
215
224
http://hdl.handle.net/10.1093/biomet/asq078
application/pdf
Access to full text is restricted to subscribers.
A. Qu
G. Y. Yi
P. X.-K. Song
P. Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:371-3802011-07-12RePEc:oup:biomet
article
Sure independence screening and compressed random sensing
Compressed sensing is a very powerful and popular tool for sparse recovery of high dimensional signals. Random sensing matrices are often employed in compressed sensing. In this paper we introduce a new method named aggressive betting using sure independence screening for sparse noiseless signal recovery. The proposal exploits the randomness structure of random sensing matrices to greatly boost computation speed. When using sub-Gaussian sensing matrices, which include the Gaussian and Bernoulli sensing matrices as special cases, our proposal has the exact recovery property with overwhelming probability. We also consider sparse recovery with noise and explicitly reveal the impact of noise-to-signal ratio on the probability of sure screening. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
371
380
http://hdl.handle.net/10.1093/biomet/asr010
application/pdf
Access to full text is restricted to subscribers.
Lingzhou Xue
Hui Zou
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1013-10132011-07-12RePEc:oup:biomet
article
Amendments and Corrections
4
2010
97
Biometrika
1013
1013
http://hdl.handle.net/10.1093/biomet/asq052
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:17-342011-07-12RePEc:oup:biomet
article
The multivariate beta process and an extension of the Polya tree model
We introduce a novel stochastic process that we term the multivariate beta process. The process is defined for modelling dependent random probabilities and has beta marginal distributions. We use this process to define a probability model for a family of unknown distributions indexed by covariates. The marginal model for each distribution is a Polya tree prior. An important feature of the proposed prior is the easy centring of the nonparametric model around any parametric regression model. We use the model to implement nonparametric inference for survival distributions. The nonparametric model that we introduce can be adopted to extend the support of prior distributions for parametric regression models. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
17
34
http://hdl.handle.net/10.1093/biomet/asq072
application/pdf
Access to full text is restricted to subscribers.
Lorenzo Trippa
Peter Müller
Wesley Johnson
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:147-1622011-07-12RePEc:oup:biomet
article
Estimation of covariate effects in generalized linear mixed models with informative cluster sizes
In standard regression analyses of clustered data, one typically assumes that the expected value of the response is independent of cluster size. However, this is often false. For example, in studies of surgical interventions, investigators have frequently found surgery volume and outcomes to be related to the skill level of the surgeons. This paper examines the effect of ignoring response-dependent, informative, cluster sizes on standard analytical methods such as mixed-effects models and conditional likelihood methods using analytic calculations, simulation studies and an example from a study of periodontal disease. We consider the case in which cluster sizes and responses share random effects which we assume to be independent of the covariates. Our focus is on maximum likelihood methods that ignore informative cluster sizes, and we show that they exhibit little bias in estimating covariate effects that are uncorrelated with the random effects associated with cluster sizes. However, estimation of covariate effects that are associated with the random effects can be biased. In particular, for models with random intercepts only, ignoring informative cluster sizes can yield biased estimators of the intercept but little bias in estimation of all covariate effects. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
147
162
http://hdl.handle.net/10.1093/biomet/asq066
application/pdf
Access to full text is restricted to subscribers.
John M. Neuhaus
Charles E. McCulloch
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:107-1182011-07-12RePEc:oup:biomet
article
Horvitz--Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling
When dealing with very large datasets of functional data, survey sampling approaches are useful in order to obtain estimators of simple functional quantities, without being obliged to store all the data. We propose a Horvitz--Thompson estimator of the mean trajectory. In the context of a superpopulation framework, we prove, under mild regularity conditions, that we obtain uniformly consistent estimators of the mean function and of its variance function. With additional assumptions on the sampling design we state a functional central limit theorem and obtain asymptotic confidence bands. Stratified sampling is studied in detail, and we also obtain a functional version of the usual optimal allocation rule, considering a mean variance criterion. These techniques are illustrated by a test population of N=18 902 electricity meters for which we have individual electricity consumption measures every 30 minutes over one week. We show that stratification can substantially improve both the accuracy of the estimators and reduce the width of the global confidence bands compared with simple random sampling without replacement. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
107
118
http://hdl.handle.net/10.1093/biomet/asq070
application/pdf
Access to full text is restricted to subscribers.
Hervé Cardot
Etienne Josserand
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:187-1982011-07-12RePEc:oup:biomet
article
Variance estimation for generalized Cavalieri estimators
The precision of stereological estimators based on systematic sampling is of great practical importance. This paper presents methods of data-based variance estimation for generalized Cavalieri estimators where errors in sampling positions may occur. Variance estimators are derived under perturbed systematic sampling, systematic sampling with cumulative errors and systematic sampling with random dropouts. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
187
198
http://hdl.handle.net/10.1093/biomet/asq064
application/pdf
Access to full text is restricted to subscribers.
Johanna Ziegel
Eva B. Vedel Jensen
Karl-Anton Dorph-Petersen
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:773-7892011-07-12RePEc:oup:biomet
article
On the behaviour of marginal and conditional AIC in linear mixed models
In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion, <sc>aic</sc>, have been used, based either on the marginal or on the conditional distribution. We show that the marginal <sc>aic</sc> is not an asymptotically unbiased estimator of the Akaike information, and favours smaller models without random effects. For the conditional <sc>aic</sc>, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that can lead to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional <sc>aic</sc>, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package (R Development Core Team, 2010) is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
773
789
http://hdl.handle.net/10.1093/biomet/asq042
application/pdf
Access to full text is restricted to subscribers.
Sonja Greven
Thomas Kneib
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:997-10012011-07-12RePEc:oup:biomet
article
A note on overadjustment in inverse probability weighted estimation
Standardized means, commonly used in observational studies in epidemiology to adjust for potential confounders, are equal to inverse probability weighted means with inverse weights equal to the empirical propensity scores. More refined standardization corresponds with empirical propensity scores computed under more flexible models. Unnecessary standardization induces efficiency loss. However, according to the theory of inverse probability weighted estimation, propensity scores estimated under more flexible models induce improvement in the precision of inverse probability weighted means. This apparent contradiction is clarified by explicitly stating the assumptions under which the improvement in precision is attained. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
997
1001
http://hdl.handle.net/10.1093/biomet/asq049
application/pdf
Access to full text is restricted to subscribers.
Andrea Rotnitzky
Lingling Li
Xiaochun Li
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1006-10122011-07-12RePEc:oup:biomet
article
Marginal log-linear parameterization of conditional independence models
Models defined by a set of conditional independence restrictions play an important role in statistical theory and applications, especially, but not only, in graphical modelling. In this paper we identify a subclass of these consisting of hierarchical marginal log-linear models, as defined by Bergsma & Rudas (2002a). Such models are smooth, which implies the applicability of standard asymptotic theory and simplifies interpretation. Furthermore, we give a marginal log-linear parameterization and a minimal specification of the models in the subclass, which implies the applicability of standard methods to compute maximum likelihood estimates and simplifies the calculation of the degrees of freedom of chi-squared statistics to test goodness-of-fit. The utility of the results is illustrated by applying them to block-recursive Markov models associated with chain graphs. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
1006
1012
http://hdl.handle.net/10.1093/biomet/asq037
application/pdf
Access to full text is restricted to subscribers.
Tamás Rudas
Wicher P. Bergsma
Renáta Németh
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:947-9602011-07-12RePEc:oup:biomet
article
Enhancing the sample average approximation method with U designs
Many computational problems in statistics can be cast as stochastic programs that are optimization problems whose objective functions are multi-dimensional integrals. The sample average approximation method is widely used for solving such a problem, which first constructs a sampling-based approximation to the objective function and then finds the solution to the approximated problem. Independent and identically distributed sampling is a prevailing choice for constructing such approximations. Recently it was found that the use of Latin hypercube designs can improve sample average approximations. In computer experiments, U designs are known to possess better space-filling properties than Latin hypercube designs. Inspired by this fact, we propose to use U designs to further enhance the accuracy of the sample average approximation method. Theoretical results are derived to show that sample average approximations with U designs can significantly outperform those with Latin hypercube designs. Numerical examples are provided to corroborate the developed theoretical results. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
947
960
http://hdl.handle.net/10.1093/biomet/asq046
application/pdf
Access to full text is restricted to subscribers.
Qi Tang
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:881-8922011-07-12RePEc:oup:biomet
article
Bootstrap confidence intervals and hypothesis tests for extrema of parameters
The bootstrap provides effective and accurate methodology for a wide variety of statistical problems which might not otherwise enjoy practicable solutions. However, there still exist important problems where standard bootstrap estimators are not consistent, and where alternative approaches, for example the m-out-of-n bootstrap and asymptotic methods, also face significant challenges. One of these is the problem of constructing confidence intervals or hypothesis tests for extrema of parameters, for example for the maximum of p parameters where each has to be estimated from data. In the present paper we suggest approaches to solving this problem. We use the bootstrap to construct an accurate estimator of the joint distribution of centred parameter estimators, and we base the procedure, either a confidence interval or a hypothesis test, on that distribution estimator. Our methodology is designed so that it errs on the side of conservatism, modulo the small inaccuracy of the bootstrap step. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
881
892
http://hdl.handle.net/10.1093/biomet/asq045
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Hugh Miller
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:977-9842011-07-12RePEc:oup:biomet
article
On the Voronoi estimator for the intensity of an inhomogeneous planar Poisson process
The Voronoi estimator may be defined for any location as the inverse of the area of the corresponding Voronoi cell. We investigate the statistical properties of this estimator for the intensity of an inhomogeneous Poisson process, and demonstrate it is approximately unbiased with a gamma sampling distribution. We also introduce the centroidal Voronoi estimator, a simple extension based on spatial regularization of the point pattern. Simulations show the Voronoi estimator has remarkably low bias, while the centroidal Voronoi estimator has slightly more bias but is much less variable. The performance is compared to kernel estimators using two simulated datasets and a dataset consisting of earthquakes within the continental United States. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
977
984
http://hdl.handle.net/10.1093/biomet/asq047
application/pdf
Access to full text is restricted to subscribers.
C. D. Barr
F. P. Schoenberg
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:459-4712011-07-12RePEc:oup:biomet
article
On balanced random imputation in surveys
Random imputation methods are often used in practice because they tend to preserve the distribution of the variable being imputed, which is an important property when the goal is to estimate population quantiles. However, this type of imputation method introduces additional variability, the imputation variance, due to the random selection of residuals. In this paper, we propose a class of random balanced imputation methods under which the imputation variance is eliminated while the distribution of the variable being imputed is preserved. The rationale behind balanced imputation is to select residuals at random so that appropriate constraints are satisfied. We describe an algorithm for selecting the random residuals that can be viewed as an adaptation of the cube algorithm proposed in the context of balanced sampling (Deville & Tillé, 2004). Results of a simulation study support our findings. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
459
471
http://hdl.handle.net/10.1093/biomet/asr011
application/pdf
Access to full text is restricted to subscribers.
G. Chauvet
J.-C. Deville
D. Haziza
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:381-3902011-07-12RePEc:oup:biomet
article
Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control
Testing a low-dimensional null hypothesis against a high-dimensional alternative in a generalized linear model may lead to a test statistic that is a quadratic form in the residuals under the null model. Using asymptotic arguments, we show that the distribution of such a test statistic can be approximated by a ratio of quadratic forms in normal variables, for which algorithms are readily available. For generalized linear models, the asymptotic distribution shows good control of type I error for moderate to small samples, even when the number of covariates in the model far exceeds the sample size. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
381
390
http://hdl.handle.net/10.1093/biomet/asr016
application/pdf
Access to full text is restricted to subscribers.
Jelle J. Goeman
Hans C. van Houwelingen
Livio Finos
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:355-3702011-07-12RePEc:oup:biomet
article
Efficient semiparametric regression for longitudinal data with nonparametric covariance estimation
For longitudinal data, when the within-subject covariance is misspecified, the semiparametric regression estimator may be inefficient. We propose a method that combines the efficient semiparametric estimator with nonparametric covariance estimation, and is robust against misspecification of covariance models. We show that kernel covariance estimation provides uniformly consistent estimators for the within-subject covariance matrices, and the semiparametric profile estimator with substituted nonparametric covariance is still semiparametrically efficient. The finite sample performance of the proposed estimator is illustrated by simulation. In an application to CD4 count data from an AIDS clinical trial, we extend the proposed method to a functional analysis of the covariance model. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
355
370
http://hdl.handle.net/10.1093/biomet/asq080
application/pdf
Access to full text is restricted to subscribers.
Yehua Li
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:921-9342011-07-12RePEc:oup:biomet
article
Estimation of controlled direct effects on a dichotomous outcome using logistic structural direct effect models
We consider the problem of assessing whether an exposure affects a dichotomous outcome other than by modifying a given mediator. The standard approach, logistic regression adjusting for both exposure and the mediator, is known to be biased in the presence of confounders for the mediator-outcome relationship. Because additional regression adjustment for such confounders is only justified when they are not affected by the exposure, inverse probability weighting has been advocated, but is not ideally tailored to mediators that are continuous or have strong measured predictors. We overcome this limitation by developing inference for a novel class of causal models that are closely related to Robins' logistic structural direct effect models, but do not inherit their difficulties of estimation. We study identification and efficient estimation under the assumption that all confounders for the exposure-outcome and mediator-outcome relationships have been measured, and find adequate performance in simulation studies. We discuss extensions to case-control studies and relevant implications for the generic problem of adjustment for time-varying confounding. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
921
934
http://hdl.handle.net/10.1093/biomet/asq053
application/pdf
Access to full text is restricted to subscribers.
Stijn Vansteelandt
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:990-996 2011-07-12 RePEc:oup:biomet
article
On the equivalence of prospective and retrospective likelihood methods in case-control studies
We present new approaches to analyzing case-control studies using prospective likelihood methods. In the classical framework, we extend the equality of the profile likelihoods to the Barndorff-Nielsen modified profile likelihoods for prospective and retrospective models. This enables simple and accurate approximate conditional inference for stratified case-control studies of moderate stratum size. In the Bayesian framework, we provide sufficient conditions on priors for the prospective model parameters to yield a prospective marginal posterior density equal to its retrospective counterpart. Our results extend the prospective-retrospective equivalence in the Bayesian paradigm with a more general class of priors than has previously been investigated. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
990
996
http://hdl.handle.net/10.1093/biomet/asq054
application/pdf
Access to full text is restricted to subscribers.
Ana-Maria Staicu
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1002-1005 2011-07-12 RePEc:oup:biomet
article
Parameter redundancy with covariates
We show how to determine the parameter redundancy status of a model with covariates from that of the same model without covariates, thereby simplifying the calculation considerably. A matrix decomposition is necessary to ensure that the symbolic computation computer programmes return correct results. The paper is illustrated by mark-recovery and latent-class models, with associated Maple code. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
1002
1005
http://hdl.handle.net/10.1093/biomet/asq041
application/pdf
Access to full text is restricted to subscribers.
Diana J. Cole
Byron J. T. Morgan
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:839-850 2011-07-12 RePEc:oup:biomet
article
Censored quantile regression with partially functional effects
Quantile regression offers a flexible approach to analyzing survival data, allowing each covariate effect to vary with quantiles. In practice, constancy is often found to be adequate for some covariates. In this paper, we study censored quantile regression tailored to the partially functional effect setting with a mixture of varying and constant effects. Such a model can offer a simpler view regarding covariate-survival association and, moreover, can enable improvement in estimation efficiency. We propose profile estimating equations and present an iterative algorithm that can be readily and stably implemented. Asymptotic properties of the resultant estimators are established. A simple resampling-based inference procedure is developed and justified. Extensive simulation studies demonstrate efficiency gains of the proposed method over a naive two-stage procedure. The proposed method is illustrated via an application to a recent renal dialysis study. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
839
850
http://hdl.handle.net/10.1093/biomet/asq050
application/pdf
Access to full text is restricted to subscribers.
Jing Qian
Limin Peng
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:119-132 2011-07-12 RePEc:oup:biomet
article
Parametric fractional imputation for missing data analysis
Parametric fractional imputation is proposed as a general tool for missing data analysis. Using fractional weights, the observed likelihood can be approximated by the weighted mean of the imputed data likelihood. Computational efficiency can be achieved using the idea of importance sampling and calibration weighting. The proposed imputation method provides efficient parameter estimates for the model parameters specified in the imputation model and also provides reasonable estimates for parameters that are not part of the imputation model. Variance estimation is discussed and results from a limited simulation study are presented. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
119
132
http://hdl.handle.net/10.1093/biomet/asq073
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:35-48 2011-07-12 RePEc:oup:biomet
article
Bayesian geostatistical modelling with informative sampling locations
We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modelling the outcomes conditionally on the locations as Gaussian with a Gaussian process spatial random effect and adjustment for the location intensity process. We prove posterior propriety under an improper prior on the parameter controlling the degree of informative sampling, demonstrating that the data are informative. In addition, we show that the density of the locations and mean function of the outcome process can be estimated consistently under mild assumptions. The methods show significant evidence of informative sampling when applied to ozone data over Eastern U.S.A. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
35
48
http://hdl.handle.net/10.1093/biomet/asq067
application/pdf
Access to full text is restricted to subscribers.
D. Pati
B. J. Reich
D. B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:243-250 2011-07-12 RePEc:oup:biomet
article
Testing a linear time series model against its threshold extension
This paper derives the asymptotic null distribution of a quasilikelihood ratio test statistic for an autoregressive moving average model against its threshold extension. The null hypothesis is that of no threshold, and the error term could be dependent. The asymptotic distribution is rather complicated, and all existing methods for approximating a distribution in the related literature fail to work. Hence, a novel bootstrap approximation based on stochastic permutation is proposed in this paper. Besides being robust to the assumptions on the error term, our method enjoys more flexibility and needs less computation when compared with methods currently used in the literature. Monte Carlo experiments give further support to the new approach, and an illustration is reported. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
243
250
http://hdl.handle.net/10.1093/biomet/asq074
application/pdf
Access to full text is restricted to subscribers.
Guodong Li
Wai Keung Li
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:403-416 2011-07-12 RePEc:oup:biomet
article
Maximum smoothed likelihood for multivariate mixtures
We introduce an algorithm for estimating the parameters in a finite mixture of completely unspecified multivariate components in at least three dimensions under the assumption of conditionally independent coordinate dimensions. We prove that this algorithm, based on a majorization-minimization idea, possesses a desirable descent property just as any EM algorithm does. We discuss the similarities between our algorithm and a related one, the so-called nonlinearly smoothed EM algorithm for the non-mixture setting. We also demonstrate via simulation studies that the new algorithm gives very similar results to another algorithm that has been shown empirically to be effective but that does not satisfy any descent property. We provide code for implementing the new algorithm in a publicly available R package. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
403
416
http://hdl.handle.net/10.1093/biomet/asq079
application/pdf
Access to full text is restricted to subscribers.
M. Levine
D. R. Hunter
D. Chauveau
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:163-175 2011-07-12 RePEc:oup:biomet
article
A unified framework for studying parameter identifiability and estimation in biased sampling designs
Based on the odds ratio representation of a joint density, we propose a unified framework to study parameter identifiability in biased sampling designs. It is shown that most of these designs encountered in practice can be reformulated within the proposed framework and, as a result, the question of parameter identifiability can be largely clarified. Estimation of the identifiable parameters is considered and traditional results on the equivalence of the prospective and retrospective likelihoods are extended. Information contained in data on certain identifiable parameters is often very limited. Such parameters can be poorly estimated by the likelihood approach with practically attainable sample sizes, which can substantially affect the estimates of parameters of primary interest. A partially penalized likelihood approach is proposed to address this. Simulation results suggest that the proposed approach has good performance. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
163
175
http://hdl.handle.net/10.1093/biomet/asq059
application/pdf
Access to full text is restricted to subscribers.
Hua Yun Chen
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:65-80 2011-07-12 RePEc:oup:biomet
article
Particle approximations of the score and observed information matrix in state space models with application to parameter estimation
Particle methods are popular computational tools for Bayesian inference in nonlinear non-Gaussian state space models. For this class of models, we present two particle algorithms to compute the score vector and observed information matrix recursively. The first algorithm is implemented with computational complexity O(N) and the second with complexity O(N<sup>2</sup>), where N is the number of particles. Although cheaper, the performance of the O(N) method degrades quickly, as it relies on the approximation of a sequence of probability distributions whose dimension increases linearly with time. In particular, even under strong mixing assumptions, the variance of the estimates computed with the O(N) method increases at least quadratically in time. The more expensive O(N<sup>2</sup>) method relies on a nonstandard particle implementation and does not suffer from this rapid degradation. It is shown how both methods can be used to perform batch and recursive parameter estimation. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
65
80
http://hdl.handle.net/10.1093/biomet/asq062
application/pdf
Access to full text is restricted to subscribers.
George Poyiadjis
Arnaud Doucet
Sumeetpal S. Singh
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:449-458 2011-07-12 RePEc:oup:biomet
article
Optimal design for additive partially nonlinear models
We develop optimal design theory for additive partially nonlinear regression models, showing that Bayesian and standardized maximin D-optimal designs can be found as the products of the corresponding optimal designs in one dimension. A sufficient condition under which analogous results hold for D<sub>s</sub>-optimality is derived to accommodate situations in which only a subset of the model parameters is of interest. To facilitate prediction of the response at unobserved locations, we prove similar results for Q-optimality in the class of all product designs. The usefulness of this approach is demonstrated through an application from the automotive industry, where optimal designs for least squares regression splines are determined and compared with designs commonly used in practice. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
449
458
http://hdl.handle.net/10.1093/biomet/asr001
application/pdf
Access to full text is restricted to subscribers.
S. Biedermann
H. Dette
D. C. Woods
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:391-401 2011-07-12 RePEc:oup:biomet
article
The union closure method for testing a fixed sequence of families of hypotheses
Statistical analyses often involve testing multiple hypotheses that are naturally grouped into a fixed sequence of families. An effective approach to controlling the familywise error rate is to prespecify the testing order according to the importance of the families. A gatekeeping testing procedure examines the first family with no multiple adjustment and then examines the subsequent family depending on the decision made with respect to the previous one. In this paper, we describe the union closure method that can be used to design gatekeeping procedures. A bipolar disorder trial with three primary and two secondary outcomes is presented as an example. Power comparisons based on the bipolar disorder trial show that the proposed gatekeeping procedures under the union closure framework are more powerful than competing methods. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
391
401
http://hdl.handle.net/10.1093/biomet/asr015
application/pdf
Access to full text is restricted to subscribers.
Han-Joo Kim
A. Richard Entsuah
Justine Shults
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:433-448 2011-07-12 RePEc:oup:biomet
article
Maximum likelihood estimation of a generalized threshold stochastic regression model
There is hardly any literature on modelling nonlinear dynamic relations involving nonnormal time series data. This is a serious lacuna because nonnormal data are far more abundant than normal ones, for example, time series of counts and positive time series. While there are various forms of nonlinearities, the class of piecewise-linear models is particularly appealing for its relative ease of tractability and interpretation. We propose to study the generalized threshold model which specifies that the conditional probability distribution of the response variable belongs to an exponential family, and the conditional mean response is linked to some piecewise-linear stochastic regression function. We introduce a likelihood-based estimation scheme, and the consistency and limiting distribution of the maximum likelihood estimator are derived. We illustrate the proposed approach with an analysis of a hare abundance time series, which gives new insights on how phase-dependent predator-prey-climate interactions shaped the ten-year hare population cycle. A simulation study is conducted to examine the finite-sample performance of the proposed estimation method. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
433
448
http://hdl.handle.net/10.1093/biomet/asr008
application/pdf
Access to full text is restricted to subscribers.
Noelle I. Samia
Kung-Sik Chan
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:417-431 2011-07-12 RePEc:oup:biomet
article
Distribution estimators and confidence intervals for stereological volumes
Assessing the precision of volume estimates from systematic samples is a question of great practical importance, but statistically a challenging task due to the strong spatial dependence of the data and typically small sample sizes. The approach taken in this paper is more ambitious than earlier methodologies, the goal of which was estimation of the variance of a volume estimator v̂, rather than estimation of the distribution of v̂. We shall show that bootstrap methods yield consistent estimators of the distribution of v̂, and also suggest a variety of confidence intervals for the true volume. Our new methodology covers cases where serial sections are exactly periodic, as well as instances where the physical slicing procedure introduces errors in the placement of the sampling points. Measurement errors within sections are also taken into account. The performance of the method is illustrated by a simulation study with synthetic data, and also applied to real datasets. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
417
431
http://hdl.handle.net/10.1093/biomet/asr012
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Johanna Ziegel
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:291-306 2011-07-12 RePEc:oup:biomet
article
Sparse Bayesian infinite factor models
We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
291
306
http://hdl.handle.net/10.1093/biomet/asr013
application/pdf
Access to full text is restricted to subscribers.
A. Bhattacharya
D. B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:273-290 2011-07-12 RePEc:oup:biomet
article
Sample size and power analysis for sparse signal recovery in genome-wide association studies
Genome-wide association studies have successfully identified hundreds of novel genetic variants associated with many complex human diseases. However, there is a lack of rigorous work on evaluating the statistical power for identifying these variants. In this paper, we consider sparse signal identification in genome-wide association studies and present two analytical frameworks for detailed analysis of the statistical power for detecting and identifying the disease-associated variants. We present an explicit sample size formula for achieving a given false non-discovery rate while controlling the false discovery rate based on an optimal procedure. Sparse genetic variant recovery is also considered and a boundary condition is established in terms of sparsity and signal strength for almost exact recovery of both disease-associated variants and nondisease-associated variants. A data-adaptive procedure is proposed to achieve this bound. The analytical results are illustrated with a genome-wide association study of neuroblastoma. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
273
290
http://hdl.handle.net/10.1093/biomet/asr003
application/pdf
Access to full text is restricted to subscribers.
Jichun Xie
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:307-323 2011-07-12 RePEc:oup:biomet
article
Bayesian influence analysis: a geometric approach
In this paper we develop a general framework of Bayesian influence analysis for assessing various perturbation schemes to the data, the prior and the sampling distribution for a class of statistical models. We introduce a perturbation model to characterize these various perturbation schemes. We develop a geometric framework, called the Bayesian perturbation manifold, and use its associated geometric quantities including the metric tensor and geodesic to characterize the intrinsic structure of the perturbation model. We develop intrinsic influence measures and local influence measures based on the Bayesian perturbation manifold to quantify the effect of various perturbations to statistical models. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of this local influence method in a formal Bayesian analysis. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
307
323
http://hdl.handle.net/10.1093/biomet/asr009
application/pdf
Access to full text is restricted to subscribers.
Hongtu Zhu
Joseph G. Ibrahim
Niansheng Tang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:225-230 2011-07-12 RePEc:oup:biomet
article
Data-driven selection of the spline dimension in penalized spline regression
A number of criteria exist to select the penalty in penalized spline regression, but the selection of the number of spline basis functions has received much less attention in the literature. We propose a likelihood-based criterion to select the number of basis functions in penalized spline regression. The criterion is easy to apply and we describe its theoretical and practical properties. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
225
230
http://hdl.handle.net/10.1093/biomet/asq081
application/pdf
Access to full text is restricted to subscribers.
Göran Kauermann
Jean D. Opsomer
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:825-838 2011-07-12 RePEc:oup:biomet
article
Noncrossing quantile regression curve estimation
Since quantile regression curves are estimated individually, the quantile curves can cross, leading to an invalid distribution for the response. A simple constrained version of quantile regression is proposed to avoid the crossing problem for both linear and nonparametric quantile curves. A simulation study and a reanalysis of tropical cyclone intensity data show the usefulness of the procedure. Asymptotic properties of the estimator are equivalent to those of the typical approach under standard conditions, and the proposed estimator reduces to the classical one if there is no crossing. The performance of the constrained estimator improves significantly when smoothing is added, gaining stability across the quantile levels. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
825
838
http://hdl.handle.net/10.1093/biomet/asq048
application/pdf
Access to full text is restricted to subscribers.
Howard D. Bondell
Brian J. Reich
Huixia Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:495-501 2011-07-12 RePEc:oup:biomet
article
An Akaike-type information criterion for model selection under inequality constraints
The Akaike information criterion for model selection presupposes that the parameter space is not subject to order restrictions or inequality constraints. Anraku (1999) proposed a modified version of this criterion, called the order-restricted information criterion, for model selection in the one-way analysis of variance model when the population means are monotonic. We propose a generalization of this to the case when the population means may be restricted by a mixture of linear equality and inequality constraints. If the model has no inequality constraints, then the generalized order-restricted information criterion coincides with the Akaike information criterion. Thus, the former extends the applicability of the latter to model selection in multi-way analysis of variance models when some models may have inequality constraints while others may not. Simulation shows that the information criterion proposed in this paper performs well in selecting the correct model. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
495
501
http://hdl.handle.net/10.1093/biomet/asr002
application/pdf
Access to full text is restricted to subscribers.
R. M. Kuiper
H. Hoijtink
M. J. Silvapulle
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:791-805 2011-07-12 RePEc:oup:biomet
article
Additive modelling of functional gradients
We consider the problem of estimating functional derivatives and gradients in the framework of a regression setting where one observes functional predictors and scalar responses. Derivatives are then defined as functional directional derivatives that indicate how changes in the predictor function in a specified functional direction are associated with corresponding changes in the scalar response. For a model-free approach, navigating the curse of dimensionality requires the imposition of suitable structural constraints. Accordingly, we develop functional derivative estimation within an additive regression framework. Here, the additive components of functional derivatives correspond to derivatives of nonparametric one-dimensional regression functions with the functional principal components of predictor processes as arguments. This approach requires nothing more than estimating derivatives of one-dimensional nonparametric regressions, and thus is computationally very straightforward to implement, while it also provides substantial flexibility, fast computation and consistent estimation. We illustrate the consistent estimation and interpretation of the resulting functional derivatives and functional gradient fields in a study of the dependence of lifetime fertility of flies on early life reproductive trajectories. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
791
805
http://hdl.handle.net/10.1093/biomet/asq056
application/pdf
Access to full text is restricted to subscribers.
Hans-Georg Müller
Fang Yao
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:473-480 2011-07-12 RePEc:oup:biomet
article
Empirical likelihood for small area estimation
Current methodologies in small area estimation are mostly either parametric or heavily dependent on the assumed linearity of the estimators of the small area means. We discuss an alternative empirical likelihood-based Bayesian approach, which neither requires a parametric likelihood nor assumes linearity of the estimators, and can handle both discrete and continuous data in a unified manner. Empirical likelihoods for both area- and unit-level models are introduced. We discuss the suitability of the proposed likelihoods in Bayesian inference and illustrate their performances on a real dataset and a simulation study. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
473
480
http://hdl.handle.net/10.1093/biomet/asr004
application/pdf
Access to full text is restricted to subscribers.
Sanjay Chaudhuri
Malay Ghosh
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:305-319 2010-09-29 RePEc:oup:biomet
article
Semiparametric dimension reduction estimation for mean response with missing data
Model misspecification can be a concern for high-dimensional data. Nonparametric regression obviates model specification but is impeded by the curse of dimensionality. This paper focuses on the estimation of the marginal mean response when there is missingness in the response and multiple covariates are available. We propose estimating the mean response through nonparametric functional estimation, where the dimension is reduced by a parametric working index. The proposed semiparametric estimator is robust to model misspecification: it is consistent for any working index if the missing mechanism of the response is known or correctly specified up to unknown parameters; even with misspecification in the missing mechanism, it is consistent so long as the working index can recover E(Y | X), the conditional mean response given the covariates. In addition, when the missing mechanism is correctly specified, the semiparametric estimator attains the optimal efficiency if E(Y | X) is recoverable through the working index. Robustness and efficiency of the proposed estimator are further investigated by simulations. We apply the proposed method to a clinical trial for HIV. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
305
319
http://hdl.handle.net/10.1093/biomet/asq005
application/pdf
Access to full text is restricted to subscribers.
Zonghui Hu
Dean A. Follmann
Jing Qin
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:631-645 2010-09-29 RePEc:oup:biomet
article
Detecting simultaneous changepoints in multiple sequences
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
631
645
http://hdl.handle.net/10.1093/biomet/asq025
application/pdf
Access to full text is restricted to subscribers.
Nancy R. Zhang
David O. Siegmund
Hanlee Ji
Jun Z. Li
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:551-566 2010-09-29 RePEc:oup:biomet
article
Penalized Bregman divergence for large-dimensional regression and classification
Regularization methods are characterized by loss functions measuring data fits and penalty terms constraining model parameters. The commonly used quadratic loss is not suitable for classification with binary responses, whereas the loglikelihood function is not readily applicable to models where the exact distribution of observations is unknown or not fully specified. We introduce the penalized Bregman divergence by replacing the negative loglikelihood in the conventional penalized likelihood with Bregman divergence, which encompasses many commonly used loss functions in the regression analysis, classification procedures and machine learning literature. We investigate new statistical properties of the resulting class of estimators with the number p<sub>n</sub> of parameters either diverging with the sample size n or even nearly comparable with n, and develop statistical inference tools. It is shown that the resulting penalized estimator, combined with appropriate penalties, achieves the same oracle property as the penalized likelihood estimator, but asymptotically does not rely on the complete specification of the underlying distribution. Furthermore, the choice of loss function in the penalized classifiers has an asymptotically relatively negligible impact on classification performance. We illustrate the proposed method for quasilikelihood regression and binary classification with simulation evaluation and real-data application. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
551
566
http://hdl.handle.net/10.1093/biomet/asq033
application/pdf
Access to full text is restricted to subscribers.
Chunming Zhang
Yuan Jiang
Yi Chai
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:621-630 2010-09-29 RePEc:oup:biomet
article
Accurate and robust tests for indirect inference
In this paper we propose accurate parameter and over-identification tests for indirect inference. Under the null hypothesis the new tests are asymptotically χ<sup>2</sup>-distributed with a relative error of order n<sup>-1</sup>. They exhibit better finite sample accuracy than classical tests for indirect inference, which have the same asymptotic distribution but an absolute error of order n<sup>-1/2</sup>. Robust versions of the tests are also provided. We illustrate their accuracy in nonlinear regression, Poisson regression with overdispersion and diffusion models. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
621
630
http://hdl.handle.net/10.1093/biomet/asq040
application/pdf
Access to full text is restricted to subscribers.
Veronika Czellar
Elvezio Ronchetti
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:405-4182010-09-29RePEc:oup:biomet
article
Interval estimation for drop-the-losers designs
In the first stage of a two-stage, drop-the-losers design, a candidate for the best treatment is selected. At the second stage, additional observations are collected to decide whether the candidate is actually better than the control. The design also allows the investigator to stop the trial for ethical reasons at the end of the first stage if there is already strong evidence of futility or superiority. Two types of tests have recently been developed, one based on the combined means and the other based on the combined p-values, but corresponding interval estimators are unavailable except in special cases. The problem is that, in most cases, the interval estimators depend on the mean configuration of all treatments in the first stage, which is unknown. In this paper, we prove a basic stochastic ordering lemma that enables us to bridge the gap between hypothesis testing and interval estimation. The proposed confidence intervals achieve the nominal confidence level in certain special cases. Simulations show that decisions based on our intervals are usually more powerful than those based on existing methods. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
405
418
http://hdl.handle.net/10.1093/biomet/asq003
application/pdf
Access to full text is restricted to subscribers.
Samuel S. Wu
Weizhen Wang
Mark C. K. Yang
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:513-5182010-09-29RePEc:oup:biomet
article
Optimal designs for the emax, log-linear and exponential models
We derive locally D- and ED<sub>p</sub>-optimal designs for the exponential, log-linear and three-parameter emax models. For each model the locally D- and ED<sub>p</sub>-optimal designs are supported at the same set of points, while the corresponding weights are different. This indicates that for a given model, D-optimal designs are efficient for estimating the smallest dose that achieves 100p% of the maximum effect in the observed dose range. Conversely, ED<sub>p</sub>-optimal designs also yield good D-efficiencies. We illustrate the results using several examples and demonstrate that locally D- and ED<sub>p</sub>-optimal designs for the emax, log-linear and exponential models are relatively robust with respect to misspecification of the model parameters. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
513
518
http://hdl.handle.net/10.1093/biomet/asq020
application/pdf
Access to full text is restricted to subscribers.
H. Dette
C. Kiss
M. Bevanda
F. Bretz
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:603-6202010-09-29RePEc:oup:biomet
article
On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems
This paper considers the asymptotic distribution of the likelihood ratio statistic T for testing a subset of the parameter of interest θ, based on the pseudolikelihood in which the nuisance parameter is replaced by a consistent estimator. We show that the asymptotic distribution of T under H<sub>0</sub> is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest, or the true value of the nuisance parameter, lies on the boundary of the parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of T may be mixtures of chi-squared variables. We conduct simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
603
620
http://hdl.handle.net/10.1093/biomet/asq031
application/pdf
Access to full text is restricted to subscribers.
Yong Chen
Kung-Yee Liang
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:765-7722010-09-29RePEc:oup:biomet
article
Strictly stationary solutions of autoregressive moving average equations
Necessary and sufficient conditions for the existence of a strictly stationary solution of the equations defining an autoregressive moving average process driven by an independent and identically distributed noise sequence are determined. No moment assumptions on the driving noise sequence are made. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
765
772
http://hdl.handle.net/10.1093/biomet/asq034
application/pdf
Access to full text is restricted to subscribers.
Peter J. Brockwell
Alexander Lindner
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:447-4642010-09-29RePEc:oup:biomet
article
A sequential smoothing algorithm with linear computational cost
In this paper we propose a new particle smoother that has a computational complexity of O(N), where N is the number of particles. This compares favourably with the O(N<sup>2</sup>) computational cost of most smoothers. The new method also overcomes some degeneracy problems in existing algorithms. Through simulation studies we show that substantial gains in efficiency are obtained for practical amounts of computational cost. It is shown both through these simulation studies, and by the analysis of an athletics dataset, that our new method also substantially outperforms the simple filter-smoother, the only other smoother with computational cost that is O(N). Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
447
464
http://hdl.handle.net/10.1093/biomet/asq013
application/pdf
Access to full text is restricted to subscribers.
Paul Fearnhead
David Wyncoll
Jonathan Tawn
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:741-7552010-09-29RePEc:oup:biomet
article
Properties of nested sampling
Nested sampling is a simulation method for approximating marginal likelihoods. We establish that nested sampling has an approximation error that vanishes at the standard Monte Carlo rate and that this error is asymptotically Gaussian. It is shown that the asymptotic variance of the nested sampling approximation typically grows linearly with the dimension of the parameter. We discuss the applicability and efficiency of nested sampling in realistic problems, and compare it with two current methods for computing marginal likelihood. Finally, we propose an extension that avoids resorting to Markov chain Monte Carlo simulation to obtain the simulated points. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
741
755
http://hdl.handle.net/10.1093/biomet/asq021
application/pdf
Access to full text is restricted to subscribers.
Nicolas Chopin
Christian P. Robert
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:567-5842010-09-29RePEc:oup:biomet
article
Shape curves and geodesic modelling
A family of shape curves is introduced that is useful for modelling the changes in shape in a series of geometrical objects. The relationship between the preshape sphere and the shape space is used to define a general family of curves based on horizontal geodesics on the preshape sphere. Methods for fitting geodesics and more general curves in the non-Euclidean shape space of point sets are discussed, based on minimizing sums of squares of Procrustes distances. Likelihood-based inference is considered. We illustrate the ideas by carrying out statistical analysis of two-dimensional landmarks on rats' skulls at various times in their development and three-dimensional landmarks on lumbar vertebrae from three primate species. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
567
584
http://hdl.handle.net/10.1093/biomet/asq027
application/pdf
Access to full text is restricted to subscribers.
Kim Kenobi
Ian L. Dryden
Huiling Le
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:321-3322010-09-29RePEc:oup:biomet
article
On the relative efficiency of using summary statistics versus individual-level data in meta-analysis
Meta-analysis is widely used to synthesize the results of multiple studies. Although meta-analysis is traditionally carried out by combining the summary statistics of relevant studies, advances in technologies and communications have made it increasingly feasible to access the original data on individual participants. In the present paper, we investigate the relative efficiency of analyzing original data versus combining summary statistics. We show that, for all commonly used parametric and semiparametric models, there is no asymptotic efficiency gain by analyzing original data if the parameter of main interest has a common value across studies, the nuisance parameters have distinct values among studies, and the summary statistics are based on maximum likelihood. We also assess the relative efficiency of the two methods when the parameter of main interest has different values among studies or when there are common nuisance parameters across studies. We conduct simulation studies to confirm the theoretical results and provide empirical comparisons from a genetic association study. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
321
332
http://hdl.handle.net/10.1093/biomet/asq006
application/pdf
Access to full text is restricted to subscribers.
D. Y. Lin
D. Zeng
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:333-3452010-09-29RePEc:oup:biomet
article
Evidence factors in observational studies
Some experiments involve more than one random assignment of treatments to units. An analogous situation arises in certain observational studies, although randomization is not used, so each assignment may be biased. If each assignment is suspect, it is natural to ask whether there are separate pieces of information, dependent upon different assumptions, and perhaps whether conclusions about treatment effects are not critically dependent upon one or another suspect assumption. The design of an observational study contains evidence factors if it permits several statistically independent tests of the same null hypothesis about treatment effects, where these tests rely on different assumptions about treatment assignments at several levels of assignment. Two designs and two empirical examples are considered, one example of each design. In the dose-control design, there are matched pairs of a treated subject and an untreated control, and doses of treatment vary between pairs for treated subjects; this yields two evidence factors. In the varied intensity design, there are matched sets with two treated subjects and one or more untreated controls, where the two treated subjects within the same matched set receive different doses of treatment, and in a technically different way, the design yields two evidence factors. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
333
345
http://hdl.handle.net/10.1093/biomet/asq019
application/pdf
Access to full text is restricted to subscribers.
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:497-5042010-09-29RePEc:oup:biomet
article
Objective Bayes and conditional inference in exponential families
Objective Bayes methodology is considered for conditional frequentist inference about a canonical parameter in a multi-parameter exponential family. A condition is derived under which posterior Bayes quantiles match the conditional frequentist coverage to a higher-order approximation in terms of the sample size. This condition is on the model, not on the prior, and it ensures that any first-order probability matching prior in the unconditional sense automatically yields higher-order conditional probability matching. Objective Bayes methods are compared to parametric bootstrap and analytic methods for higher-order conditional frequentist inference. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
497
504
http://hdl.handle.net/10.1093/biomet/asq002
application/pdf
Access to full text is restricted to subscribers.
Thomas J. Diciccio
G. Alastair Young
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:757-7642010-09-29RePEc:oup:biomet
article
Empirical likelihood methods for two-dimensional shape analysis
We consider empirical likelihood for the mean similarity shape of objects in two dimensions described by labelled landmarks. The restriction to two dimensions permits the representation of preshapes as complex unit vectors. We focus on the use of empirical likelihood techniques for the construction of confidence regions for the mean shape and for testing the hypothesis of a common mean shape across several populations. Theoretical properties and computational details are discussed and the results of a simulation study are presented. Our results show that bootstrap calibrated empirical likelihood performs well in practice in the planar shape setting. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
757
764
http://hdl.handle.net/10.1093/biomet/asq028
application/pdf
Access to full text is restricted to subscribers.
Getulio J. A. Amaral
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:683-6982010-09-29RePEc:oup:biomet
article
Analysis of cohort studies with multivariate and partially observed disease classification data
Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
683
698
http://hdl.handle.net/10.1093/biomet/asq036
application/pdf
Access to full text is restricted to subscribers.
Nilanjan Chatterjee
Samiran Sinha
W. Ryan Diver
Heather Spencer Feigelson
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:419-4332010-09-29RePEc:oup:biomet
article
Efficient scalable schemes for monitoring a large number of data streams
The sequential changepoint detection problem is studied in the context of global online monitoring of a large number of independent data streams. We are interested in detecting an occurring event as soon as possible, but we do not know when the event will occur, nor do we know which subset of data streams will be affected by the event. A family of scalable schemes is proposed based on the sum of the local cumulative sum, <sc>cusum</sc>, statistics from each individual data stream, and is shown to asymptotically minimize the detection delays for each and every possible combination of affected data streams, subject to the global false alarm constraint. The usefulness and limitations of our asymptotic optimality results are illustrated by numerical simulations and heuristic arguments. The Appendices contain a probabilistic result on the first epoch to simultaneous record values for multiple independent random walks. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
419
433
http://hdl.handle.net/10.1093/biomet/asq010
application/pdf
Access to full text is restricted to subscribers.
Y. Mei
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:585-6012010-09-29RePEc:oup:biomet
article
A class of grouped Brunk estimators and penalized spline estimators for monotone regression
We study a class of monotone univariate regression estimators. We use B-splines to approximate an underlying regression function and estimate spline coefficients based on grouped data. We investigate asymptotic properties of two monotone estimators: a grouped Brunk estimator and a penalized monotone estimator. These estimators are consistent at the boundary and their mean square errors achieve optimal convergence rates under suitable assumptions of the true regression function. Asymptotic distributions are developed and are shown to be independent of spline degrees and the number of knots. Simulation results and car data illustrate performance of the proposed estimators. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
585
601
http://hdl.handle.net/10.1093/biomet/asq029
application/pdf
Access to full text is restricted to subscribers.
Xiao Wang
Jinglai Shen
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:435-4462010-09-29RePEc:oup:biomet
article
Estimating linear dependence between nonstationary time series using the locally stationary wavelet model
Large volumes of neuroscience data comprise multiple, nonstationary electrophysiological or neuroimaging time series recorded from different brain regions. Accurately estimating the dependence between such neural time series is critical, since changes in the dependence structure are presumed to reflect functional interactions between neuronal populations. We propose a new dependence measure, derived from a bivariate locally stationary wavelet time series model. Since wavelets are localized in both time and scale, this approach leads to a natural, local and multi-scale estimate of nonstationary dependence. Our methodology is illustrated by application to a simulated example, and to electrophysiological data relating to interactions between the rat hippocampus and prefrontal cortex during working memory and decision making. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
435
446
http://hdl.handle.net/10.1093/biomet/asq007
application/pdf
Access to full text is restricted to subscribers.
J. Sanderson
P. Fryzlewicz
M. W. Jones
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:727-7402010-09-29RePEc:oup:biomet
article
Estimating species richness by a Poisson-compound gamma model
We propose a Poisson-compound gamma approach for species richness estimation. Based on the denseness and nesting properties of the gamma mixture, we fix the shape parameter of each gamma component at a unified value, and estimate the mixture using nonparametric maximum likelihood. A least-squares crossvalidation procedure is proposed for the choice of the common shape parameter. The performance of the resulting estimator of the species richness N is assessed using numerical studies and genomic data. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
727
740
http://hdl.handle.net/10.1093/biomet/asq026
application/pdf
Access to full text is restricted to subscribers.
Ji-Ping Wang
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:361-3742010-09-29RePEc:oup:biomet
article
Efficient estimation in multi-phase case-control studies
In this paper we discuss the analysis of multi-phase, or multi-stage, case-control studies and present an efficient semiparametric maximum-likelihood approach that unifies and extends earlier work, including the seminal case-control paper by Prentice & Pyke (1979), work by Breslow & Cain (1988), Scott & Wild (1991), Breslow & Holubkov (1997) and others. The theoretical derivations apply to arbitrary binary regression models but we present results for logistic regression and show that the approach can be implemented by including additional intercept terms in the logistic model and then making some simple corrections to the score and information equations used in a Newton-Raphson or Fisher-scoring maximization of the prospective loglikelihood. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
361
374
http://hdl.handle.net/10.1093/biomet/asq009
application/pdf
Access to full text is restricted to subscribers.
A. J. Lee
A. J. Scott
C. J. Wild
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:347-3602010-09-29RePEc:oup:biomet
article
A theory for testing hypotheses under covariate-adaptive randomization
The covariate-adaptive randomization method was proposed for clinical trials long ago but little theoretical work has been done for statistical inference associated with it. Practitioners often apply test procedures available for simple randomization, which is controversial since procedures valid under simple randomization may not be valid under other randomization schemes. In this paper, we provide some theoretical results for testing hypotheses after covariate-adaptive randomization. We show that one way to obtain a valid test procedure is to use a correct model between outcomes and covariates, including those used in randomization. We also show that the simple two sample t-test, without using any covariate, is conservative under covariate-adaptive biased coin randomization in terms of its Type I error, and that a valid bootstrap t-test can be constructed. The powers of several tests are examined theoretically and empirically. Our study provides guidance for applications and sheds light on further research in this area. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
347
360
http://hdl.handle.net/10.1093/biomet/asq014
application/pdf
Access to full text is restricted to subscribers.
Jun Shao
Xinxin Yu
Bob Zhong
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:505-5122010-09-29RePEc:oup:biomet
article
Copula inference under censoring
This paper discusses copula model selection procedures and goodness-of-fit tests under censoring. The proposed methodology is based on a comparison of nonparametric and model-based estimators of the probability integral transformation, K. New weighted estimators for K are introduced. The resulting tests are compared to an existing approach by simulation and illustrated with an example involving bleeding changes in a woman's reproductive history. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
505
512
http://hdl.handle.net/10.1093/biomet/asq011
application/pdf
Access to full text is restricted to subscribers.
M. L. Lakhal-Chaieb
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:647-6592010-09-29RePEc:oup:biomet
article
Sufficient cause interactions for categorical and ordinal exposures with three levels
Definitions are given for weak and strong sufficient cause interactions in settings in which the outcome is binary and in which there are two exposures of interest that are categorical or ordinal. Weak sufficient cause interactions concern cases in which a mechanism will operate under certain values of the two exposures but not when one or the other of the exposures takes some other value. Strong sufficient cause interactions concern cases in which a mechanism will operate under certain values of the two exposures but not when one or the other of the exposures takes any other value. Empirical conditions are derived for such interactions when exposures have two or three levels and are related to regression coefficients in linear and log-linear models. When the exposures are binary, the notions of a weak and a strong sufficient cause interaction coincide, but not when the exposures are categorical or ordinal. The results are applied to examples concerning gene-gene and gene-environment interactions. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
647
659
http://hdl.handle.net/10.1093/biomet/asq030
application/pdf
Access to full text is restricted to subscribers.
Tyler J. Vanderweele
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:539-5502010-09-29RePEc:oup:biomet
article
A new approach to Cholesky-based covariance regularization in high dimensions
In this paper we propose a new regression interpretation of the Cholesky factor of the covariance matrix, as opposed to the well-known regression interpretation of the Cholesky factor of the inverse covariance, which leads to a new class of regularized covariance estimators suitable for high-dimensional problems. Regularizing the Cholesky factor of the covariance via this regression interpretation always results in a positive definite estimator. In particular, one can obtain a positive definite banded estimator of the covariance matrix at the same computational cost as the popular banded estimator of Bickel & Levina (2008b), which is not guaranteed to be positive definite. We also establish theoretical connections between banding Cholesky factors of the covariance matrix and its inverse and constrained maximum likelihood estimation under the banding constraint, and compare the numerical performance of several methods in simulations and on a sonar data example. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
539
550
http://hdl.handle.net/10.1093/biomet/asq022
application/pdf
Access to full text is restricted to subscribers.
Adam J. Rothman
Elizaveta Levina
Ji Zhu
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:261-2782010-09-29RePEc:oup:biomet
article
Variable selection in high-dimensional linear models: partially faithful distributions and the <sc>pc</sc>-simple algorithm
We consider variable selection in high-dimensional linear models where the number of covariates greatly exceeds the sample size. We introduce the new concept of partial faithfulness and use it to infer associations between the covariates and the response. Under partial faithfulness, we develop a simplified version of the <sc>pc</sc> algorithm (Spirtes et al., 2000), which is computationally feasible even with thousands of covariates and provides consistent variable selection under conditions on the random design matrix that are of a different nature than coherence conditions for penalty-based approaches like the lasso. Simulations and application to real data show that our method is competitive compared to penalty-based approaches. We provide an efficient implementation of the algorithm in the R-package pcalg. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
261
278
http://hdl.handle.net/10.1093/biomet/asq008
application/pdf
Access to full text is restricted to subscribers.
P. Bühlmann
M. Kalisch
M. H. Maathuis
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:389-4042010-09-29RePEc:oup:biomet
article
Calibrating parametric subject-specific risk estimation
For modern evidence-based medicine, decisions on disease prevention or management strategies are often guided by a risk index system. For each individual, the system uses his/her baseline information to estimate the risk of experiencing a future disease-related clinical event. Such a risk scoring scheme is usually derived from an overly simplified parametric model. To validate a model-based procedure, one may perform a standard global evaluation via, for instance, a receiver operating characteristic analysis. In this article, we propose a method to calibrate the risk index system at a subject level. Specifically, we developed point and interval estimation procedures for t-year mortality rates conditional on the estimated parametric risk score. The proposals are illustrated with a dataset from a large clinical trial with post-myocardial infarction patients. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
389
404
http://hdl.handle.net/10.1093/biomet/asq012
application/pdf
Access to full text is restricted to subscribers.
T. Cai
L. Tian
Hajime Uno
Scott D. Solomon
L. J. Wei
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:699-7122010-09-29RePEc:oup:biomet
article
A semiparametric additive rate model for recurrent events with an informative terminal event
We propose a semiparametric additive rate model for modelling recurrent events in the presence of a terminal event. The dependence between recurrent events and terminal event is nonparametric. A general transformation model is used to model the terminal event. We construct an estimating equation for parameter estimation and derive the asymptotic distributions of the proposed estimators. Simulation studies demonstrate that the proposed inference procedure performs well in realistic settings. Application to a medical study is presented. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
699
712
http://hdl.handle.net/10.1093/biomet/asq039
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Jianwen Cai
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:375-3882010-09-29RePEc:oup:biomet
article
Risk-adjusted monitoring of time to event
Recently there has been interest in risk-adjusted cumulative sum charts, <sc>CUSUMs</sc>, to monitor the performance of, e.g., hospitals, taking into account the heterogeneity of patients. Even though many outcomes involve time, only conventional regression models are commonly used. In this article we investigate how time-to-event models may be used for monitoring purposes. We consider monitoring using <sc>CUSUMs</sc> based on the partial likelihood ratio between an out-of-control state and an in-control state. We consider both proportional and non-proportional alternatives, as well as a head start. Against proportional alternatives, we present an analytic method of computing the expected number of observed events before stopping or the probability of stopping before a given observed number of events. In a stationary set-up, the former is roughly proportional to the average run length in calendar time. Adding a head start changes the threshold only slightly if the expected number of events until hitting is used as a criterion. However, it changes the threshold substantially if a false alarm probability is used. In simulation studies, charts based on survival analysis perform better than simpler monitoring schemes. We present one example from retail finance and one medical application. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
375
388
http://hdl.handle.net/10.1093/biomet/asq004
application/pdf
Access to full text is restricted to subscribers.
A. Gandy
J. T. Kvaløy
A. Bottle
F. Zhou
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:465-4802010-09-29RePEc:oup:biomet
article
The horseshoe estimator for sparse signals
This paper proposes a new approach to sparsity, called the horseshoe estimator, which arises from a prior based on multivariate-normal scale mixtures. We describe the estimator's advantages over existing approaches, including its robustness, adaptivity to different sparsity patterns and analytical tractability. We prove two theorems: one that characterizes the horseshoe estimator's tail robustness and the other that demonstrates a super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. Finally, using both real and simulated data, we show that the horseshoe estimator corresponds quite closely to the answers obtained by Bayesian model averaging under a point-mass mixture prior. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
465
480
http://hdl.handle.net/10.1093/biomet/asq017
application/pdf
Access to full text is restricted to subscribers.
Carlos M. Carvalho
Nicholas G. Polson
James G. Scott
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:295-3042010-09-29RePEc:oup:biomet
article
Sufficient dimension reduction through discretization-expectation estimation
In the context of sufficient dimension reduction, the goal is to parsimoniously recover the central subspace of a regression model. Many inverse regression methods use slicing estimation to recover the central subspace. The efficacy of slicing estimation depends heavily upon the number of slices. However, the selection of the number of slices is an open and long-standing problem. In this paper, we propose a discretization-expectation estimation method, which avoids selecting the number of slices, while preserving the integrity of the central subspace. This generic method assures root-n consistency and asymptotic normality of slicing estimators for many inverse regression methods, and can be applied to regressions with multivariate responses. A BIC-type criterion for the dimension of the central subspace is proposed. Comprehensive simulations and an illustrative application show that our method compares favourably with existing estimators. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
295
304
http://hdl.handle.net/10.1093/biomet/asq018
application/pdf
Access to full text is restricted to subscribers.
Liping Zhu
Tao Wang
Lixing Zhu
Louis Ferré
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:661-6822010-09-29RePEc:oup:biomet
article
Bounded, efficient and doubly robust estimation with inverse weighting
Consider estimating the mean of an outcome in the presence of missing data or estimating population average treatment effects in causal inference. A doubly robust estimator remains consistent if an outcome regression model or a propensity score model is correctly specified. We build on a previous nonparametric likelihood approach and propose new doubly robust estimators, which have desirable properties in efficiency if the propensity score model is correctly specified, and in boundedness even if the inverse probability weights are highly variable. We compare the new and existing estimators in a simulation study and find that the robustified likelihood estimators yield overall the smallest mean squared errors. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
661
682
http://hdl.handle.net/10.1093/biomet/asq035
application/pdf
Access to full text is restricted to subscribers.
Zhiqiang Tan
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:519-5382010-09-29RePEc:oup:biomet
article
Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs
Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical and biological systems where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with the same structure may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimation of the adjacency matrix of directed acyclic graphs, when variables inherit a natural ordering. We study variable selection consistency of lasso and adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is only variable selection consistent under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
519
538
http://hdl.handle.net/10.1093/biomet/asq038
application/pdf
Access to full text is restricted to subscribers.
Ali Shojaie
George Michailidis
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:481-4962010-09-29RePEc:oup:biomet
article
Likelihood ratio statistics based on an integrated likelihood
An integrated likelihood depends only on the parameter of interest and the data, so it can be used as a standard likelihood function for likelihood-based inference. In this paper, the higher-order asymptotic properties of the signed integrated likelihood ratio statistic for a scalar parameter of interest are considered. These results are used to construct a modified integrated likelihood ratio statistic and to suggest a class of prior densities to use in forming the integrated likelihood. The properties of the integrated likelihood ratio statistic are compared to those of the standard likelihood ratio statistic. Several examples show that the integrated likelihood ratio statistic can be a useful alternative to the standard likelihood ratio statistic. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
481
496
http://hdl.handle.net/10.1093/biomet/asq015
application/pdf
Access to full text is restricted to subscribers.
T. A. Severini
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:279-2942010-09-29RePEc:oup:biomet
article
Dimension reduction for non-elliptically distributed predictors: second-order methods
Many classical dimension reduction methods, especially those based on inverse conditional moments, require the predictors to have elliptical distributions, or at least to satisfy a linearity condition. Such conditions, however, are too strong for some applications. Li and Dong (2009) introduced the notion of the central solution space and used it to modify first-order methods, such as sliced inverse regression, so that they no longer rely on these conditions. In this paper we generalize this idea to second-order methods, such as sliced average variance estimation and directional regression. In doing so we demonstrate that the central solution space is a versatile framework: we can use it to modify essentially all inverse conditional moment-based methods to relax the distributional assumption on the predictors. Simulation studies and an application show a substantial improvement of the modified methods over their classical counterparts. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
279
294
http://hdl.handle.net/10.1093/biomet/asq016
application/pdf
Access to full text is restricted to subscribers.
Yuexiao Dong
Bing Li
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:713-7262010-09-29RePEc:oup:biomet
article
Attributable fraction functions for censored event times
Attributable fractions are commonly used to measure the impact of risk factors on disease incidence in the population. These static measures can be extended to functions of time when the time to disease occurrence or event time is of interest. The present paper deals with nonparametric and semiparametric estimation of attributable fraction functions for cohort studies with potentially censored event time data. The semiparametric models include the familiar proportional hazards model and a broad class of transformation models. The proposed estimators are shown to be consistent, asymptotically normal and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. An application to a cardiovascular health study is provided. Connections to causal inference are discussed. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
713
726
http://hdl.handle.net/10.1093/biomet/asq023
application/pdf
Access to full text is restricted to subscribers.
Li Chen
D. Y. Lin
Donglin Zeng
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:71-842012-05-01RePEc:oup:biomet
article
Optimal fractions of two-level factorials under a baseline parameterization
Two-level fractional factorial designs are considered under a baseline parameterization. The criterion of minimum aberration is formulated in this context and optimal designs under this criterion are investigated. The underlying theory and the concept of isomorphism turn out to be significantly different from their counterparts under orthogonal parameterization, and this is reflected in the optimal designs obtained. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
71
84
http://hdl.handle.net/10.1093/biomet/asr071
application/pdf
Access to full text is restricted to subscribers.
Rahul Mukerjee
Boxin Tang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:15-282012-05-01RePEc:oup:biomet
article
Factor profiled sure independence screening
We propose a method of factor profiled sure independence screening for ultrahigh-dimensional variable selection. The objective of this method is to identify nonzero components consistently from a sparse coefficient vector. The new method assumes that the correlation structure of the high-dimensional data can be well represented by a set of low-dimensional latent factors, which can be estimated consistently by eigenvalue-eigenvector decomposition. The estimated latent factors should then be profiled out from both the response and the predictors. Such an operation, referred to as factor profiling, produces uncorrelated predictors. Therefore, sure independence screening can be applied subsequently and the resulting screening result is consistent for model selection, a major advantage that standard sure independence screening does not share. We refer to the new method as factor profiled sure independence screening. Numerical studies confirm its outstanding performance. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
15
28
http://hdl.handle.net/10.1093/biomet/asr074
application/pdf
Access to full text is restricted to subscribers.
H. Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:238-2442012-05-01RePEc:oup:biomet
article
On robust estimation via pseudo-additive information
We consider a robust parameter estimator minimizing an empirical approximation to the q-entropy and show its relationship to minimization of power divergences through a simple parameter transformation. The estimator balances robustness and efficiency through a tuning constant q and avoids kernel density smoothing. We derive an upper bound to the estimator mean squared error under a contaminated reference model and use it as a min-max criterion for selecting q. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
238
244
http://hdl.handle.net/10.1093/biomet/asr061
application/pdf
Access to full text is restricted to subscribers.
Davide Ferrari
Davide La Vecchia
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:230-2372012-05-01RePEc:oup:biomet
article
Estimating overdispersion when fitting a generalized linear model to sparse data
We consider the problem of fitting a generalized linear model to overdispersed data, focussing on a quasilikelihood approach in which the variance is assumed to be proportional to that specified by the model, and the constant of proportionality, φ, is used to obtain appropriate standard errors and model comparisons. It is common practice to base an estimate of φ on Pearson's lack-of-fit statistic, with or without Farrington's modification. We propose a new estimator that has a smaller variance, subject to a condition on the third moment of the response variable. We conjecture that this condition is likely to be achieved for the important special cases of count and binomial data. We illustrate the benefits of the new estimator using simulations for both count and binomial data. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
230
237
http://hdl.handle.net/10.1093/biomet/asr083
application/pdf
Access to full text is restricted to subscribers.
D. J. Fletcher
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:43-552012-05-01RePEc:oup:biomet
article
Modelling the distribution of the cluster maxima of exceedances of subasymptotic thresholds
A standard approach to model the extreme values of a stationary process is the peaks over threshold method, which consists of imposing a high threshold, identifying clusters of exceedances of this threshold and fitting the maximum value from each cluster using the generalized Pareto distribution. This approach is strongly justified by underlying asymptotic theory. We propose an alternative model for the distribution of the cluster maxima that accounts for the subasymptotic theory of extremes of a stationary process. This new distribution is a product of two terms, one for the marginal distribution of exceedances and the other for the dependence structure of the exceedance values within a cluster. We illustrate the improvement in fit, measured by the root mean square error of the estimated quantiles, offered by the new distribution over the peaks over thresholds analysis using simulated and hydrological data, and we suggest a diagnostic tool to help identify when the proposed model is likely to lead to an improved fit. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
43
55
http://hdl.handle.net/10.1093/biomet/asr078
application/pdf
Access to full text is restricted to subscribers.
Emma F. Eastoe
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:141-1502012-05-01RePEc:oup:biomet
article
A moving average Cholesky factor model in covariance modelling for longitudinal data
We propose new regression models for parameterizing covariance structures in longitudinal data analysis. Using a novel Cholesky factor, the entries in this decomposition have a moving average and log-innovation interpretation and are modelled as linear functions of covariates. We propose efficient maximum likelihood estimates for joint mean-covariance analysis based on this decomposition and derive the asymptotic distributions of the coefficient estimates. Furthermore, we study a local search algorithm, computationally more efficient than traditional all subset selection, based on BIC for model selection, and show its model selection consistency. Thus, a conjecture of Pan & MacKenzie (2003) is verified. We demonstrate the finite-sample performance of the method via analysis of data on CD4 trajectories and through simulations. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
141
150
http://hdl.handle.net/10.1093/biomet/asr068
application/pdf
Access to full text is restricted to subscribers.
Weiping Zhang
Chenlei Leng
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:151-1652012-05-01RePEc:oup:biomet
article
A functional generalized method of moments approach for longitudinal studies with missing responses and covariate measurement error
Covariate measurement error and missing responses are typical features in longitudinal data analysis. There has been extensive research on either covariate measurement error or missing responses, but relatively little work has been done to address both simultaneously. In this paper, we propose a simple method for the marginal analysis of longitudinal data with time-varying covariates, some of which are measured with error, while the response is subject to missingness. Our method has a number of appealing properties: assumptions on the model are minimal, with none needed about the distribution of the mismeasured covariate; implementation is straightforward and its applicability is broad. We provide both theoretical justification and numerical results. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
151
165
http://hdl.handle.net/10.1093/biomet/asr076
application/pdf
Access to full text is restricted to subscribers.
Grace Y. Yi
Yanyuan Ma
Raymond J. Carroll
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:245-2512012-05-01RePEc:oup:biomet
article
Optimality of group testing in the presence of misclassification
Several optimality properties of Dorfman's (1943) group testing procedure are derived for estimation of the prevalence of a rare disease whose status is classified with error. Exact ranges of disease prevalence are obtained for which group testing provides more efficient estimation when group size increases. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
245
251
http://hdl.handle.net/10.1093/biomet/asr064
application/pdf
Access to full text is restricted to subscribers.
Aiyi Liu
Chunling Liu
Zhiwei Zhang
Paul S. Albert
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:57-692012-05-01RePEc:oup:biomet
article
Conservative hypothesis tests and confidence intervals using importance sampling
Importance sampling is a common technique for Monte Carlo approximation, including that of p-values. Here it is shown that a simple correction of the usual importance sampling p-values provides valid p-values, meaning that a hypothesis test created by rejecting the null hypothesis when the p-value is at most α will also have a Type I error rate of at most α. This correction uses the importance weight of the original observation, which gives valuable diagnostic information under the null hypothesis. Using the corrected p-values can be crucial for multiple testing and also in problems where evaluating the accuracy of importance sampling approximations is difficult. Inverting the corrected p-values provides a useful way to create Monte Carlo confidence intervals that maintain the nominal significance level and use only a single Monte Carlo sample. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
57
69
http://hdl.handle.net/10.1093/biomet/asr079
application/pdf
Access to full text is restricted to subscribers.
Matthew T. Harrison
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:211-2222012-05-01RePEc:oup:biomet
article
A proportional likelihood ratio model
We propose a semiparametric proportional likelihood ratio model which is particularly suitable for modelling a nonlinear monotonic relationship between the outcome variable and a covariate. This model extends the generalized linear model by leaving the distribution unspecified, and has a strong connection with semiparametric models such as the selection bias model (Gilbert et al., 1999), the density ratio model (Qin, 1998; Fokianos & Kaimi, 2006), the single-index model (Ichimura, 1993) and the exponential tilt regression model (Rathouz & Gao, 2009). A maximum likelihood estimator is obtained for the new model and its asymptotic properties are derived. An example and simulation study illustrate the use of the model. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
211
222
http://hdl.handle.net/10.1093/biomet/asr060
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:85-1002012-05-01RePEc:oup:biomet
article
Combining data from two independent surveys: a model-assisted approach
Combining information from two or more independent surveys is a problem frequently encountered in survey sampling. We consider the case of two independent surveys, where a large sample from survey 1 collects only auxiliary information and a much smaller sample from survey 2 provides information on both the variables of interest and the auxiliary variables. We propose a model-assisted projection method of estimation based on a working model, but the reference distribution is design-based. We generate synthetic or proxy values of a variable of interest by first fitting the working model, relating the variable of interest to the auxiliary variables, to the data from survey 2 and then predicting the variable of interest associated with the auxiliary variables observed in survey 1. The projection estimator of a total is simply obtained from the survey 1 weights and associated synthetic values. We identify the conditions for the projection estimator to be asymptotically unbiased. Domain estimation using the projection method is also considered. Replication variance estimators are obtained by augmenting the synthetic data file for survey 1 with additional synthetic columns associated with the columns of replicate weights. Results from a simulation study are presented. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
85
100
http://hdl.handle.net/10.1093/biomet/asr063
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:127-1402012-05-01RePEc:oup:biomet
article
Bayesian analysis of multistate event history data: beta-Dirichlet process prior
Bayesian analysis of a finite state Markov process, which is popularly used to model multistate event history data, is considered. A new prior process, called a beta-Dirichlet process, is introduced for the cumulative intensity functions and is proved to be conjugate. In addition, the beta-Dirichlet prior is applied to a Bayesian semiparametric regression model. To illustrate the application of the proposed model, we analyse a dataset of credit histories. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
127
140
http://hdl.handle.net/10.1093/biomet/asr067
application/pdf
Access to full text is restricted to subscribers.
Yongdai Kim
Lancelot James
Rafael Weissbach
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:185-1972012-05-01RePEc:oup:biomet
article
Mean residual life models with time-dependent coefficients under right censoring
The mean residual life provides the remaining life expectancy of a subject who has survived to a certain time-point. When covariates are present, regression models are needed to study the association between the mean residual life function and potential regression covariates. In this paper, we propose a flexible class of semiparametric mean residual life models where some effects may be time-varying and some may be constant over time. In the presence of right censoring, we use the inverse probability of censoring weighting approach and develop inference procedures for estimating the model parameters. In addition, we provide graphical and numerical methods for model checking and tests for examining whether or not the covariate effects vary with time. Asymptotic and finite sample properties of the proposed estimators are established and the approach is applied to real life datasets collected from clinical trials. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
185
197
http://hdl.handle.net/10.1093/biomet/asr065
application/pdf
Access to full text is restricted to subscribers.
Liuquan Sun
Xinyuan Song
Zhigang Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:167-1842012-05-01RePEc:oup:biomet
article
Estimating treatment effects with treatment switching via semicompeting risks models: an application to a colorectal cancer study
Treatment switching is a frequent occurrence in clinical trials, where, during the course of the trial, patients who fail on the control treatment may change to the experimental treatment. Analysing the data without accounting for switching yields highly biased and inefficient estimates of the treatment effect. In this paper, we propose a novel class of semiparametric semicompeting risks transition survival models to accommodate treatment switches. Theoretical properties of the proposed model are examined and an efficient expectation-maximization algorithm is derived for obtaining the maximum likelihood estimates. Simulation studies are conducted to demonstrate the superiority of the model compared with the intent-to-treat analysis and other methods proposed in the literature. The proposed method is applied to data from a colorectal cancer clinical trial. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
167
184
http://hdl.handle.net/10.1093/biomet/asr062
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Qingxia Chen
Ming-Hui Chen
Joseph G. Ibrahim
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:115-1262012-05-01RePEc:oup:biomet
article
Directed acyclic graphs with edge-specific bounds
We give a definition of a bounded edge within the causal directed acyclic graph framework. A bounded edge generalizes the notion of a signed edge and is defined in terms of bounds on a ratio of survivor probabilities. We derive rules concerning the propagation of bounds. Bounds on causal effects in the presence of unmeasured confounding are also derived using bounds related to specific edges on a graph. We illustrate the theory developed by an example concerning estimating the effect of antihistamine treatment on asthma in the presence of unmeasured confounding. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
115
126
http://hdl.handle.net/10.1093/biomet/asr059
application/pdf
Access to full text is restricted to subscribers.
Tyler J. Vanderweele
Zhiqiang Tan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:29-422012-05-01RePEc:oup:biomet
article
A direct approach to sparse discriminant analysis in ultra-high dimensions
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among features and thus could produce misleading feature selection and inferior classification. We propose a new procedure for sparse discriminant analysis, motivated by the least squares formulation of linear discriminant analysis. To demonstrate our proposal, we study the numerical and theoretical properties of discriminant analysis constructed via lasso penalized least squares. Our theory shows that the method proposed can consistently identify the subset of discriminative features contributing to the Bayes rule and at the same time consistently estimate the Bayes classification direction, even when the dimension can grow faster than any polynomial order of the sample size. The theory allows for general dependence among features. Simulated and real data examples show that lassoed discriminant analysis compares favourably with other popular sparse discriminant proposals. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
29
42
http://hdl.handle.net/10.1093/biomet/asr066
application/pdf
Access to full text is restricted to subscribers.
Qing Mai
Hui Zou
Ming Yuan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:1-142012-05-01RePEc:oup:biomet
article
Studies in the history of probability and statistics, L: Karl Pearson and the Rule of Three
Karl Pearson's role in the transformation that took the 19th century statistics of Laplace and Gauss into the modern era of 20th century multivariate analysis is examined from a new point of view. By viewing Pearson's work in the context of a motto he adopted from Charles Darwin, a philosophical theme is identified in Pearson's statistical work, and his three major achievements are briefly described. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
1
14
http://hdl.handle.net/10.1093/biomet/asr046
application/pdf
Access to full text is restricted to subscribers.
Stephen M. Stigler
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:199-2102012-05-01RePEc:oup:biomet
article
A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling
This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
199
210
http://hdl.handle.net/10.1093/biomet/asr072
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
Dean A. Follmann
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:101-1132012-05-01RePEc:oup:biomet
article
Optimal allocation to maximize the power of two-sample tests for binary response
We study allocations that maximize the power of tests of equality of two treatments having binary outcomes. When a normal approximation applies, the asymptotic power is maximized by minimizing the variance, leading to a Neyman allocation that assigns observations in proportion to the standard deviations. This allocation, which in general requires knowledge of the parameters of the problem, is recommended in a large body of literature. Under contiguous alternatives the normal approximation indeed applies, and in this case the Neyman allocation reduces to a balanced design. However, when studying the power under a noncontiguous alternative, a large deviations approximation is needed, and the Neyman allocation is no longer asymptotically optimal. In the latter case, the optimal allocation depends on the parameters, but is rather close to a balanced design. Thus, a balanced design is a viable option for both contiguous and noncontiguous alternatives. Finite sample studies show that a balanced design is indeed generally quite close to being optimal for power maximization. This is good news as implementation of a balanced design does not require knowledge of the parameters. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
101
113
http://hdl.handle.net/10.1093/biomet/asr077
application/pdf
Access to full text is restricted to subscribers.
D. Azriel
M. Mandel
Y. Rinott
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:223-2292012-05-01RePEc:oup:biomet
article
Proportional likelihood ratio models for mean regression
The proportional likelihood ratio model introduced in Luo & Tsai (2012) is adapted to explicitly model the means of observations. This is useful for the estimation of and inference on treatment effects, particularly in designed experiments, and allows the data analyst greater control over model specification and parameter interpretation. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
223
229
http://hdl.handle.net/10.1093/biomet/asr075
application/pdf
Access to full text is restricted to subscribers.
Alan Huang
Paul J. Rathouz
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:975-9822009-12-01RePEc:oup:biomet
article
Maximum likelihood estimation using composite likelihoods for closed exponential families
In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same parameter values for any number of observations. Examples include log-linear models and multivariate normal models. In other cases the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate. An application is given to an example in directional data based on a bivariate von Mises distribution. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
975
982
http://hdl.handle.net/10.1093/biomet/asp056
application/pdf
Access to full text is restricted to subscribers.
Kanti V. Mardia
John T. Kent
Gareth Hughes
Charles C. Taylor
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:805-8202009-12-01RePEc:oup:biomet
article
Inference on population size in binomial detectability models
Many models for biological populations, including simple mark-recapture models and distance sampling models, involve a binomially distributed number, n, of observations x<sub>1</sub>, …, x<sub>n</sub> on members of a population of size N. Two popular estimators of (N, θ), where θ is a vector parameter, are the maximum likelihood estimator (N̂, θ̂) and the conditional maximum likelihood estimator (Ñ, θ̃) based on the conditional distribution of x<sub>1</sub>, …, x<sub>n</sub> given n. We derive the large-N asymptotic distributions of (N̂, θ̂) and (Ñ, θ̃), and give formulae for the biases of N̂ and Ñ. We show that the difference N̂ − Ñ is, remarkably, of order 1 and we give a simple formula for the leading part of this difference. Simulations indicate that in many cases this formula is very accurate and that confidence intervals based on the asymptotic distribution have excellent coverage. An extension to product-binomial models is given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
805
820
http://hdl.handle.net/10.1093/biomet/asp051
application/pdf
Access to full text is restricted to subscribers.
R. M. Fewster
P. E. Jupp
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:957-9702009-12-01RePEc:oup:biomet
article
Nested Latin hypercube designs
We propose an approach to constructing nested Latin hypercube designs. Such designs are useful for conducting multiple computer experiments with different levels of accuracy. A nested Latin hypercube design with two layers is defined to be a special Latin hypercube design that contains a smaller Latin hypercube design as a subset. Our method is easy to implement and can accommodate any number of factors. We also extend this method to construct nested Latin hypercube designs with more than two layers. Illustrative examples are given. Some statistical properties of the constructed designs are derived. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
957
970
http://hdl.handle.net/10.1093/biomet/asp045
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
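The article's nested construction is not reproduced here, but the basic object it builds on can be sketched. Below is a generic (non-nested) Latin hypercube sample in Python; the function name and interface are my own, not from the article.

```python
import numpy as np

def latin_hypercube(n, d, rng=None):
    """Basic Latin hypercube sample: n points in [0,1)^d, one per row,
    with exactly one point in each of the n equal-width bins per dimension."""
    rng = np.random.default_rng(rng)
    # one stratified draw per bin per column, then an independent shuffle
    # of each column to break the diagonal pairing across dimensions
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    for j in range(d):
        rng.shuffle(u[:, j])
    return u
```

A nested design with two layers would additionally require that a subset of the rows itself form a smaller Latin hypercube, which is the property the article's method guarantees.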
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:847-8602009-12-01RePEc:oup:biomet
article
Generalized fiducial inference for wavelet regression
We apply Fisher's fiducial idea to wavelet regression, first developing a general methodology for handling model selection problems within the fiducial framework. We propose fiducial-based methods for wavelet curve estimation and the construction of pointwise confidence intervals. We show that these confidence intervals have asymptotically correct coverage. Simulations demonstrate that they possess promising empirical properties. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
847
860
http://hdl.handle.net/10.1093/biomet/asp050
application/pdf
Access to full text is restricted to subscribers.
Jan Hannig
Thomas C. M. Lee
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:761-7802009-12-01RePEc:oup:biomet
article
Sinh-arcsinh distributions
We introduce the sinh-arcsinh transformation and hence, by applying it to a generating distribution with no parameters other than location and scale, usually the normal, a new family of sinh-arcsinh distributions. This four-parameter family has symmetric and skewed members and allows for tailweights that are both heavier and lighter than those of the generating distribution. The central place of the normal distribution in this family affords likelihood ratio tests of normality that are superior to the state-of-the-art in normality testing because of the range of alternatives against which they are very powerful. Likelihood ratio tests of symmetry are also available and are very successful. Three-parameter symmetric and asymmetric subfamilies of the full family are also of interest. Heavy-tailed symmetric sinh-arcsinh distributions behave like Johnson S<sub>U</sub> distributions, while their light-tailed counterparts behave like sinh-normal distributions, the sinh-arcsinh family allowing a seamless transition between the two, via the normal, controlled by a single parameter. The sinh-arcsinh family is very tractable and many properties are explored. Likelihood inference is pursued, including an attractive reparameterization. Illustrative examples are given. A multivariate version is considered. Options and extensions are discussed. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
761
780
http://hdl.handle.net/10.1093/biomet/asp053
application/pdf
Access to full text is restricted to subscribers.
M. C. Jones
Arthur Pewsey
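As background to the abstract above, the transformation itself is simple to state: applying X = sinh((arcsinh(Z) + ε)/δ) to Z ~ N(0, 1) yields a sinh-arcsinh variate, with ε controlling skewness and δ tailweight (δ < 1 heavier-tailed, δ > 1 lighter-tailed than normal). A minimal sampler, with names and interface my own:

```python
import numpy as np

def sinh_arcsinh_sample(n, mu=0.0, sigma=1.0, eps=0.0, delta=1.0, rng=None):
    """Draw n variates from the sinh-arcsinh family generated from the normal:
    X = mu + sigma * sinh((arcsinh(Z) + eps) / delta) with Z ~ N(0, 1).
    eps = 0, delta = 1 recovers the normal itself."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(n)
    return mu + sigma * np.sinh((np.arcsinh(z) + eps) / delta)
```

The special position of the normal at (ε, δ) = (0, 1) is what makes the likelihood ratio tests of normality mentioned in the abstract natural in this family.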
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:873-8862009-12-01RePEc:oup:biomet
article
Nonparametric estimation for right-censored length-biased data: a pseudo-partial likelihood approach
To estimate the lifetime distribution of right-censored length-biased data, we propose a pseudo-partial likelihood approach that allows us to derive two nonparametric estimators. With its closed-form estimators and explicit limiting variances, this approach retains the simplicity of conditional analysis, and has only a small efficiency loss compared with the unconditional analysis. Under some regularity conditions, we show that the two estimators are uniformly consistent and converge weakly to Gaussian processes. A simulation study demonstrates that the proposed estimators have satisfactory finite-sample performance. Application to an Alzheimer's disease study is reported. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
873
886
http://hdl.handle.net/10.1093/biomet/asp064
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:983-9902009-12-01RePEc:oup:biomet
article
Adaptive approximate Bayesian computation
Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappé et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
983
990
http://hdl.handle.net/10.1093/biomet/asp052
application/pdf
Access to full text is restricted to subscribers.
Mark A. Beaumont
Jean-Marie Cornuet
Jean-Michel Marin
Christian P. Robert
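The adaptive, sequential machinery discussed in the abstract is not reproduced here; as a baseline for comparison, plain ABC rejection can be sketched as follows (all names and the toy interface are mine):

```python
import numpy as np

def abc_rejection(obs_stat, prior_draw, simulate, stat, eps, n_keep, rng=None):
    """Plain ABC rejection: keep prior draws whose simulated summary
    statistic lies within eps of the observed summary statistic."""
    rng = np.random.default_rng(rng)
    kept = []
    while len(kept) < n_keep:
        theta = prior_draw(rng)
        if abs(stat(simulate(theta, rng)) - obs_stat) < eps:
            kept.append(theta)
    return np.array(kept)
```

The sequential variants replace the single prior-sampling pass with a series of intermediate distributions and shrinking tolerances; the importance-sampling correction discussed in the abstract is what keeps that scheme unbiased for the ABC posterior.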
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:781-7922009-12-01RePEc:oup:biomet
article
A new look at time series of counts
This paper proposes a simple new model for stationary time series of integer counts. Previous work has focused on thinning methods and classical time series autoregressive moving-average difference equations; in contrast, our methods use a renewal process to generate a correlated sequence of Bernoulli trials. By superpositioning independent copies of such processes, stationary series with binomial, Poisson, geometric or any other discrete marginal distribution can be readily constructed. The model class proposed is parsimonious, non-Markov and readily generates series with either short- or long-memory autocovariances. The model can be fitted with linear prediction techniques for stationary series. As an example, a stationary series with binomial marginal distributions is fitted to the number of rainy days in 210 consecutive weeks at Key West, Florida. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
781
792
http://hdl.handle.net/10.1093/biomet/asp057
application/pdf
Access to full text is restricted to subscribers.
Yunwei Cui
Robert Lund
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:917-9322009-12-01RePEc:oup:biomet
article
A unified approach to linearization variance estimation from survey data after imputation for item nonresponse
Variance estimation after imputation is an important practical problem in survey sampling. When deterministic imputation or stochastic imputation is used, we show that the variance of the imputed estimator can be consistently estimated by a unified 'linearize and reverse' approach. We provide some applications of the approach to regression imputation, fractional categorical imputation, multiple imputation and composite imputation. Results from a simulation study, under a factorial structure for the sampling, response and imputation mechanisms, show that the proposed linearization variance estimator performs well in terms of relative bias, assuming a missing at random response mechanism. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
917
932
http://hdl.handle.net/10.1093/biomet/asp041
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:945-9562009-12-01RePEc:oup:biomet
article
Sliced space-filling designs
We propose an approach to constructing a new type of design, a sliced space-filling design, intended for computer experiments with qualitative and quantitative factors. The approach starts with constructing a Latin hypercube design based on a special orthogonal array for the quantitative factors and then partitions the design into groups corresponding to different level combinations of the qualitative factors. The points in each group have good space-filling properties. Some illustrative examples are given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
945
956
http://hdl.handle.net/10.1093/biomet/asp044
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:991-9972009-12-01RePEc:oup:biomet
article
Semiparametric methods for evaluating risk prediction markers in case-control studies
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
991
997
http://hdl.handle.net/10.1093/biomet/asp040
application/pdf
Access to full text is restricted to subscribers.
Ying Huang
Margaret Sullivan Pepe
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1005-10112009-12-01RePEc:oup:biomet
article
A note on automatic variable selection using smooth-threshold estimating equations
This paper develops smooth-threshold estimating equations that can automatically eliminate irrelevant parameters by setting them as zero. The resulting estimator enjoys the oracle property in the sense of Fan & Li (2001), even in estimators for which the covariance assumption of Wang & Leng (2007) is violated, such as the Buckley-James estimator. Furthermore, the estimator can be obtained without solving a convex optimization problem. A <sc>bic</sc>-type criterion for tuning parameter selection is also proposed. It is shown that the criterion achieves consistent model selection. A numerical study confirms the performance of the method. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1005
1011
http://hdl.handle.net/10.1093/biomet/asp060
application/pdf
Access to full text is restricted to subscribers.
Masao Ueki
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:998-10042009-12-01RePEc:oup:biomet
article
A note on the variance of doubly-robust G-estimators
A recursive variance calculation is derived for doubly-robust G-estimators for dynamic treatment regimes in a multi-interval setting. Treatment decision parameters are not assumed to be shared across treatment intervals; this independence of parameters permits sequential estimation of the G-estimators' variance when G-estimation is performed in a sequential fashion. The recursive variance calculation is both natural and computationally feasible. This development can easily be adapted to other complex estimating procedures that require nuisance parameter estimation. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
998
1004
http://hdl.handle.net/10.1093/biomet/asp043
application/pdf
Access to full text is restricted to subscribers.
E. E. M. Moodie
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:861-8722009-12-01RePEc:oup:biomet
article
Nonparametric estimation of the probability of illness in the illness-death model under cross-sectional sampling
Cross-sectional sampling is an attractive design that saves resources but results in biased data. For proper inference, one should first discover the bias function and then weigh observations appropriately. We consider cross-sectioning of the illness-death model with the aim of estimating the probability of visiting the illness state before death. We develop simple consistent and asymptotically normal estimators under various assumptions on the model and data collection and, in particular, compare designs with and without a follow-up. These designs are common in surveillance of hospital acquired infections, but estimators currently in use do not properly correct the bias. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
861
872
http://hdl.handle.net/10.1093/biomet/asp046
application/pdf
Access to full text is restricted to subscribers.
M. Mandel
R. Fluss
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1024-10242009-12-01RePEc:oup:biomet
article
Generalized method of moments estimation for linear regression with clustered failure time data
4
2009
96
Biometrika
1024
1024
http://hdl.handle.net/10.1093/biomet/asp061
application/pdf
Access to full text is restricted to subscribers.
Hui Li
Guosheng Yin
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:933-9442009-12-01RePEc:oup:biomet
article
Some design properties of a rejective sampling procedure
Occasionally, a selected probability sample may appear undesirable with respect to the available auxiliary information. In such a situation, the practitioner might consider rejecting the sample and selecting a new set of sample elements. We consider a procedure in which the probability sample is rejected unless the sample mean of an auxiliary vector is within a specified distance of the population mean. It is proven that the large sample mean and variance of the regression estimator for the rejective sample are the same as those of the regression estimator for the original selection procedure. Likewise, the usual estimator of variance for the regression estimator is appropriate for the rejective sample. In a Monte Carlo experiment, the large sample properties hold for relatively small samples and the Monte Carlo results are in agreement with the theoretical orders of approximation. The efficiency effect of the described rejective sampling is o(n<sub>N</sub><sup>-1</sup>), where n<sub>N</sub> is the expected sample size, but the effect can be important for particular samples. For example, rejective sampling can be used to eliminate those samples that give negative weights for the regression estimator. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
933
944
http://hdl.handle.net/10.1093/biomet/asp042
application/pdf
Access to full text is restricted to subscribers.
Wayne A. Fuller
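The acceptance rule described in the abstract is straightforward to sketch: draw simple random samples and reject until the auxiliary sample mean is close enough to the population mean. The function below is an illustrative toy, not the article's procedure in full generality (names mine; the distance is a plain absolute difference on one auxiliary variable rather than a general metric on a vector).

```python
import numpy as np

def rejective_sample(x_aux, n, tol, rng=None, max_tries=10000):
    """Draw simple random samples without replacement, rejecting each until
    the sample mean of the auxiliary variable x_aux is within tol of the
    population mean; returns the accepted sample's indices."""
    rng = np.random.default_rng(rng)
    pop_mean = x_aux.mean()
    for _ in range(max_tries):
        idx = rng.choice(len(x_aux), size=n, replace=False)
        if abs(x_aux[idx].mean() - pop_mean) <= tol:
            return idx
    raise RuntimeError("no acceptable sample found within max_tries")
```

The article's result is that, to first order, the regression estimator and its usual variance estimator behave the same under this accepted-sample distribution as under the original design.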
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:793-8042009-12-01RePEc:oup:biomet
article
Bias reduction in exponential family nonlinear models
In Firth (1993, Biometrika) it was shown how the leading term in the asymptotic bias of the maximum likelihood estimator is removed by adjusting the score vector, and that in canonical-link generalized linear models the method is equivalent to maximizing a penalized likelihood that is easily implemented via iterative adjustment of the data. Here a more general family of bias-reducing adjustments is developed for a broad class of univariate and multivariate generalized nonlinear models. The resulting formulae for the adjusted score vector are computationally convenient, and in univariate models they directly suggest implementation through an iterative scheme of data adjustment. For generalized linear models a necessary and sufficient condition is given for the existence of a penalized likelihood interpretation of the method. An illustrative application to the Goodman row-column association model shows how the computational simplicity and statistical benefits of bias reduction extend beyond generalized linear models. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
793
804
http://hdl.handle.net/10.1093/biomet/asp055
application/pdf
Access to full text is restricted to subscribers.
Ioannis Kosmidis
David Firth
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:903-9152009-12-01RePEc:oup:biomet
article
Tests and confidence intervals for secondary endpoints in sequential clinical trials
In a sequential clinical trial whose stopping rule depends on the primary endpoint, inference on secondary endpoints is an important long-standing problem. Ignoring the possibility of early stopping based on the primary endpoint may result in substantial bias. To address this problem, a commonly used approach is to develop bias correction by estimating the bias in the case of bivariate normal outcomes and appealing to joint asymptotic normality of the statistics associated with the primary and secondary endpoints. We propose herein a new approach that uses resampling and a novel ordering scheme in the sample space of sequential statistics observed up to a stopping time. This approach is shown to provide accurate inference in complex clinical trials, including time-sequential trials with survival endpoints and covariates. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
903
915
http://hdl.handle.net/10.1093/biomet/asp063
application/pdf
Access to full text is restricted to subscribers.
Tze Leung Lai
Mei-Chiung Shih
Zheng Su
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1012-10182009-12-01RePEc:oup:biomet
article
A note on adaptive Bonferroni and Holm procedures under dependence
Hochberg & Benjamini (1990) first presented adaptive procedures for controlling familywise error rate. However, until now, it has not been proved that these procedures control the familywise error rate. We introduce a simplified version of Hochberg & Benjamini's adaptive Bonferroni and Holm procedures. Assuming a conditional dependence model, we prove that the former procedure controls the familywise error rate in finite samples while the latter controls it approximately. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1012
1018
http://hdl.handle.net/10.1093/biomet/asp048
application/pdf
Access to full text is restricted to subscribers.
Wenge Guo
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:887-9012009-12-01RePEc:oup:biomet
article
Marginal hazards model for case-cohort studies with multiple disease outcomes
Case-cohort study designs are widely used to reduce the cost of large cohort studies while achieving the same goals, especially when the disease rate is low. A key advantage of the case-cohort study design is its capacity to use the same subcohort for several diseases or for several subtypes of disease. In order to compare the effect of a risk factor on different types of diseases, times to different events need to be modelled simultaneously. Valid statistical methods that take the correlations among the outcomes from the same subject into account need to be developed. To this end, we consider marginal proportional hazards regression models for case-cohort studies with multiple disease outcomes. We also consider generalized case-cohort designs that do not require sampling all the cases, which is more realistic for multiple disease outcomes. We propose an estimating equation approach for parameter estimation with two different types of weights. Consistency and asymptotic normality of the proposed estimators are established. Large sample approximation works well in small samples in simulation studies. The proposed methods are applied to the Busselton Health Study. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
887
901
http://hdl.handle.net/10.1093/biomet/asp059
application/pdf
Access to full text is restricted to subscribers.
S. Kang
J. Cai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:971-9742009-12-01RePEc:oup:biomet
article
Construction of orthogonal Latin hypercube designs
Latin hypercube designs have found wide application. Such designs guarantee uniform samples for the marginal distribution of each input variable. We propose a method for constructing orthogonal Latin hypercube designs in which all the linear terms are orthogonal not only to each other, but also to the quadratic terms. This construction method is convenient and flexible, and the resulting designs can accommodate many more factors than can existing ones. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
971
974
http://hdl.handle.net/10.1093/biomet/asp058
application/pdf
Access to full text is restricted to subscribers.
Fasheng Sun
Min-Qian Liu
Dennis K. J. Lin
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:835-8452009-12-01RePEc:oup:biomet
article
Bayesian lasso regression
The lasso estimate for linear regression corresponds to a posterior mode when independent, double-exponential prior distributions are placed on the regression coefficients. This paper introduces new aspects of the broader Bayesian treatment of lasso regression. A direct characterization of the regression coefficients' posterior distribution is provided, and computation and inference under this characterization is shown to be straightforward. Emphasis is placed on point estimation using the posterior mean, which facilitates prediction of future observations via the posterior predictive distribution. It is shown that the standard lasso prediction method does not necessarily agree with model-based, Bayesian predictions. A new Gibbs sampler for Bayesian lasso regression is introduced. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
835
845
http://hdl.handle.net/10.1093/biomet/asp047
application/pdf
Access to full text is restricted to subscribers.
Chris Hans
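One concrete piece of the mode/prior connection stated above fits in a line: for an orthonormal design, the posterior mode under independent double-exponential priors is coordinatewise soft-thresholding of the least-squares coefficients. This is a textbook identity, not code from the article:

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink toward zero by lam and truncate.
    In an orthonormal design this is the lasso solution, i.e. the posterior
    mode under independent double-exponential priors on the coefficients."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
```

The article's point is that the posterior mean, which it advocates for prediction, generally differs from this mode and hence from the standard lasso fit.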
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:821-8342009-12-01RePEc:oup:biomet
article
Bayesian analysis of matrix normal graphical models
We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
821
834
http://hdl.handle.net/10.1093/biomet/asp049
application/pdf
Access to full text is restricted to subscribers.
Hao Wang
Mike West
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1019-10232009-12-01RePEc:oup:biomet
article
A note on a conjectured sharpness principle for probabilistic forecasting with calibration
This note proves a weak sharpness principle as conjectured by Gneiting et al. (2007) in connection with probabilistic forecasting subject to calibration constraints. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1019
1023
http://hdl.handle.net/10.1093/biomet/asp054
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:497-5122009-09-29RePEc:oup:biomet
article
Objective Bayesian model selection in Gaussian graphical models
This paper presents a default model-selection procedure for Gaussian graphical models that involves two new developments. First, we develop a default version of the hyper-inverse Wishart prior for restricted covariance matrices, called the hyper-inverse Wishart g-prior, and show how it corresponds to the implied fractional prior for selecting a graph using fractional Bayes factors. Second, we apply a class of priors that automatically handles the problem of multiple hypothesis testing. We demonstrate our methods on a variety of simulated examples, concluding with a real example analyzing covariation in mutual-fund returns. These studies reveal that the combined use of a multiplicity-correction prior on graphs and fractional Bayes factors for computing marginal likelihoods yields better performance than existing Bayesian methods. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
497
512
http://hdl.handle.net/10.1093/biomet/asp017
application/pdf
Access to full text is restricted to subscribers.
C. M. Carvalho
J. G. Scott
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:691-7092009-09-29RePEc:oup:biomet
article
Use of functionals in linearization and composite estimation with application to two-sample survey data
An important problem associated with two-sample surveys is the estimation of nonlinear functions of finite population totals such as ratios, correlation coefficients or measures of income inequality. Computation and estimation of the variance of such complex statistics are made more difficult by the existence of overlapping units. In one-sample surveys, the linearization method based on the influence function approach is a powerful tool for variance estimation. We introduce a two-sample linearization technique that can be viewed as a generalization of the one-sample influence function approach. Our technique is based on expressing the parameters of interest as multivariate functionals of finite and discrete measures and then using partial influence functions to compute the linearized variables. Under broad assumptions, the asymptotic variance of the substitution estimator, derived from Deville (1999), is shown to be the variance of a weighted sum of the linearized variables. The paper then focuses on a general class of composite substitution estimators, and from this class the optimal estimator for minimizing the asymptotic variance is obtained. The efficiency of the optimal composite estimator is demonstrated through an empirical study. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
691
709
http://hdl.handle.net/10.1093/biomet/asp039
application/pdf
Access to full text is restricted to subscribers.
C. Goga
J.-C. Deville
A. Ruiz-Gazen
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:645-6612009-09-29RePEc:oup:biomet
article
Markov models for accumulating mutations
We introduce and analyze a waiting time model for the accumulation of genetic changes. The continuous-time conjunctive Bayesian network is defined by a partially ordered set of mutations and by the rate of fixation of each mutation. The partial order encodes constraints on the order in which mutations can fixate in the population, shedding light on the mutational pathways underlying the evolutionary process. We study a censored version of the model and derive equations for an <sc>em</sc> algorithm to perform maximum likelihood estimation of the model parameters. We also show how to select the maximum likelihood partially ordered set. The model is applied to genetic data from cancer cells and from drug resistant human immunodeficiency viruses, indicating implications for diagnosis and treatment. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
645
661
http://hdl.handle.net/10.1093/biomet/asp023
application/pdf
Access to full text is restricted to subscribers.
N. Beerenwinkel
S. Sullivant
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:529-5442009-09-29RePEc:oup:biomet
article
Asymptotic properties of penalized spline estimators
We study the class of penalized spline estimators, which enjoy similarities to both regression splines, without penalty and with fewer knots than data points, and smoothing splines, with knots equal to the data points and a penalty controlling the roughness of the fit. Depending on the number of knots, sample size and penalty, we show that the theoretical properties of penalized regression spline estimators are either similar to those of regression splines or to those of smoothing splines, with a clear breakpoint distinguishing the cases. We prove that using fewer knots results in better asymptotic rates than when using a large number of knots. We obtain expressions for bias and variance and asymptotic rates for the number of knots and penalty parameter. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
529
544
http://hdl.handle.net/10.1093/biomet/asp035
application/pdf
Access to full text is restricted to subscribers.
Gerda Claeskens
Tatyana Krivobokova
Jean D. Opsomer
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:591-6002009-09-29RePEc:oup:biomet
article
Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models
A semiparametric transformation model comprises a parametric component for covariate effects and a nonparametric component for the baseline hazard/intensity. The Breslow-type estimator has been proposed for estimating the nonparametric component in some inefficient estimation procedures. We show that introducing weights into this estimator leads to nonparametric maximum likelihood estimation, with the weights depending on the martingale residuals. The weighted Breslow-type estimator suggests an iterative reweighting algorithm for nonparametric maximum likelihood estimation, which can be implemented by a weighted variant of the existing algorithms for inefficient estimation, and can be computationally more efficient than an <sc>em</sc>-type algorithm. The weighting idea is further extended to semiparametric transformation models with mismeasured covariates. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
591
600
http://hdl.handle.net/10.1093/biomet/asp032
application/pdf
Access to full text is restricted to subscribers.
Yi-Hau Chen
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:751-7602009-09-29RePEc:oup:biomet
article
A Student t-mixture autoregressive model with applications to heavy-tailed financial data
We introduce the class of Student t-mixture autoregressive models, which is promising for financial time series modelling. The model is able to capture serial correlations, time-varying means and volatilities, and the shape of the conditional distributions can vary over time from short-tailed to long-tailed, or from unimodal to multimodal. The use of t-distributed errors in each component of the model allows conditional leptokurtic distributions that account for the commonly observed excess unconditional kurtosis in financial data. Methods of parameter estimation and model selection are given. Finally, the proposed modelling procedure is illustrated through a real example. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
751
760
http://hdl.handle.net/10.1093/biomet/asp031
application/pdf
Access to full text is restricted to subscribers.
C. S. Wong
W. S. Chan
P. L. Kam
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:545-5582009-09-29RePEc:oup:biomet
article
Empirical Bayes estimation for additive hazards regression models
We develop a novel empirical Bayesian framework for the semiparametric additive hazards regression model. The integrated likelihood, obtained by integration over the unknown prior of the nonparametric baseline cumulative hazard, can be maximized using standard statistical software. Unlike the corresponding full Bayes method, our empirical Bayes estimators of regression parameters, survival curves and their corresponding standard errors have easily computed closed-form expressions and require no elicitation of hyperparameters of the prior. The method guarantees a monotone estimator of the survival function and accommodates time-varying regression coefficients and covariates. To facilitate frequentist-type inference based on large-sample approximation, we present the asymptotic properties of the semiparametric empirical Bayes estimates. We illustrate the implementation and advantages of our methodology with a reanalysis of a survival dataset and a simulation study. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
545
558
http://hdl.handle.net/10.1093/biomet/asp024
application/pdf
Access to full text is restricted to subscribers.
Debajyoti Sinha
M. Brent McHenry
Stuart R. Lipsitz
Malay Ghosh
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:723-7342009-09-29RePEc:oup:biomet
article
Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data
Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
723
734
http://hdl.handle.net/10.1093/biomet/asp033
application/pdf
Access to full text is restricted to subscribers.
Weihua Cao
Anastasios A. Tsiatis
Marie Davidian
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:559-5752009-09-29RePEc:oup:biomet
article
Improving point and interval estimators of monotone functions by rearrangement
Suppose that a target function is monotonic and an available original estimate of this target function is not monotonic. Rearrangements, univariate and multivariate, transform the original estimate to a monotonic estimate that always lies closer in common metrics to the target function. Furthermore, suppose an original confidence interval, which covers the target function with probability at least 1-α, is defined by upper and lower endpoint functions that are not monotonic. Then the rearranged confidence interval, defined by the rearranged upper and lower endpoint functions, is monotonic, shorter in length in common norms than the original interval, and covers the target function with probability at least 1-α. We illustrate the results with a growth chart example. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
559
575
http://hdl.handle.net/10.1093/biomet/asp030
application/pdf
Access to full text is restricted to subscribers.
V. Chernozhukov
I. Fernández-Val
A. Galichon
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:663-6762009-09-29RePEc:oup:biomet
article
Gaussian process emulation of dynamic computer codes
Computer codes are used in scientific research to study and predict the behaviour of complex systems. Their run times often make uncertainty and sensitivity analyses impractical because of the thousands of runs that are conventionally required, so efficient techniques have been developed based on a statistical representation of the code. The approach is less straightforward for dynamic codes, which represent time-evolving systems. We develop a novel iterative system to build a statistical model of dynamic computer codes, which is demonstrated on a rainfall-runoff simulator. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
663
676
http://hdl.handle.net/10.1093/biomet/asp028
application/pdf
Access to full text is restricted to subscribers.
S. Conti
J. P. Gosling
J. E. Oakley
A. O'Hagan
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:577-5902009-09-29RePEc:oup:biomet
article
Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data
This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton--Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
577
590
http://hdl.handle.net/10.1093/biomet/asp025
application/pdf
Access to full text is restricted to subscribers.
Lynn M. Johnson
Robert L. Strawderman
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:513-5272009-09-29RePEc:oup:biomet
article
Adaptive regularization using the entire solution surface
Several sparseness penalties have been suggested for delivery of good predictive performance in automatic variable selection within the framework of regularization. All assume that the true model is sparse. We propose a penalty, a convex combination of the L<sub>1</sub>- and L<sub>∞</sub>-norms, that adapts to a variety of situations including sparseness and nonsparseness, grouping and nongrouping. The proposed penalty performs grouping and adaptive regularization. In addition, we introduce a novel homotopy algorithm utilizing subgradients for developing regularization solution surfaces involving multiple regularizers. This permits efficient computation and adaptive tuning. In simulated and real examples, the proposed penalty compares well against popular alternatives. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
513
527
http://hdl.handle.net/10.1093/biomet/asp038
application/pdf
Access to full text is restricted to subscribers.
S. Wu
X. Shen
C. J. Geyer
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:677-6902009-09-29RePEc:oup:biomet
article
Optimal repeated measurement designs for a model with partial interactions
We consider crossover designs for a model with partial interactions. In this model, the carryover effect depends on whether the treatment is preceded by itself or not. When the aim of the experiment is to study the total effects corresponding to a single treatment, we obtain approximate optimal symmetric designs, within the competing class of circular designs, by generalizing the method introduced by Kushner (1997) and Kunert & Martin (2000). This generalization places the method proposed by Bailey & Druilhet (2004) into Kushner's context. The optimal designs obtained are not binary, as in Kunert & Martin (2000). We also propose efficient designs generated by only one sequence. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
677
690
http://hdl.handle.net/10.1093/biomet/asp034
application/pdf
Access to full text is restricted to subscribers.
P. Druilhet
W. Tinsson
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:617-6332009-09-29RePEc:oup:biomet
article
Pseudo-partial likelihood estimators for the Cox regression model with missing covariates
By embedding the missing covariate data into a left-truncated and right-censored survival model, we propose a new class of weighted estimating functions for the Cox regression model with missing covariates. The resulting estimators, called the pseudo-partial likelihood estimators, are shown to be consistent and asymptotically normal. A simulation study demonstrates that, compared with the popular inverse-probability weighted estimators, the new estimators perform better when the observation probability is small and improve efficiency of estimating the missing covariate effects. Application to a practical example is reported. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
617
633
http://hdl.handle.net/10.1093/biomet/asp027
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
Qiang Xu
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:711-7222009-09-29RePEc:oup:biomet
article
Effects of data dimension on empirical likelihood
We evaluate the effects of data dimension on the asymptotic normality of the empirical likelihood ratio for high-dimensional data under a general multivariate model. Data dimension and dependence among components of the multivariate random vector affect the empirical likelihood directly through the trace and the eigenvalues of the covariance matrix. The growth rates to infinity we obtain for the data dimension improve the rates of Hjort et al. (2008). Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
711
722
http://hdl.handle.net/10.1093/biomet/asp037
application/pdf
Access to full text is restricted to subscribers.
Song Xi Chen
Liang Peng
Ying-Li Qin
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:601-6152009-09-29RePEc:oup:biomet
article
Pseudo-partial likelihood for proportional hazards models with biased-sampling data
We obtain a pseudo-partial likelihood for proportional hazards models with biased-sampling data by embedding the biased-sampling data into left-truncated data. The log pseudo-partial likelihood of the biased-sampling data is the expectation of the log partial likelihood of the left-truncated data conditioned on the observed data. In addition, asymptotic properties of the estimator that maximize the pseudo-partial likelihood are derived. Applications to length-biased data, biased samples with right censoring and proportional hazards models with missing covariates are discussed. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
601
615
http://hdl.handle.net/10.1093/biomet/asp026
application/pdf
Access to full text is restricted to subscribers.
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:635-6442009-09-29RePEc:oup:biomet
article
Approximating the α-permanent
The standard matrix permanent is the solution to a number of combinatorial and graph-theoretic problems, and the α-weighted permanent is the density function for a class of Cox processes called boson processes. The exact computation of the ordinary permanent is known to be #P-complete, and the same appears to be the case for the α-permanent for most values of α. At present, the lack of a satisfactory algorithm for approximating the α-permanent is a formidable obstacle to the use of boson processes in applied work. This paper proposes an importance-sampling estimator using nonuniform random permutations generated in a cycle format. Empirical investigation reveals that the estimator works well for the sorts of matrices that arise in point-process applications, involving up to a few hundred points. We conclude with a numerical illustration of the Bayes estimate of the intensity function of a boson point process, which is a ratio of α-permanents. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
635
644
http://hdl.handle.net/10.1093/biomet/asp036
application/pdf
Access to full text is restricted to subscribers.
S. C. Kou
P. McCullagh
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:735-7492009-09-29RePEc:oup:biomet
article
A negative binomial model for time series of counts
We study generalized linear models for time series of counts, where serial dependence is introduced through a dependent latent process in the link function. Conditional on the covariates and the latent process, the observation is modelled by a negative binomial distribution. To estimate the regression coefficients, we maximize the pseudolikelihood that is based on a generalized linear model with the latent process suppressed. We show the consistency and asymptotic normality of the generalized linear model estimator when the latent process is a stationary strongly mixing process. We extend the asymptotic results to generalized linear models for time series, where the observation variable, conditional on covariates and a latent process, is assumed to have a distribution from a one-parameter exponential family. Thus, we unify in a common framework the results for Poisson log-linear regression models of Davis et al. (2000), negative binomial logit regression models and other similarly specified generalized linear models. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
735
749
http://hdl.handle.net/10.1093/biomet/asp029
application/pdf
Access to full text is restricted to subscribers.
Richard A. Davis
Rongning Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:383-3982013-03-04RePEc:oup:biomet
article
Nonparametric additive regression for repeatedly measured data
We develop an easily computed smooth backfitting algorithm for additive model fitting in repeated measures problems. Our methodology easily copes with various settings, such as when some covariates are the same over repeated response measurements. We allow for a working covariance matrix for the regression errors, showing that our method is most efficient when the correct covariance matrix is used. The component functions achieve the known asymptotic variance lower bound for the scalar argument case. Smooth backfitting also leads directly to design-independent biases in the local linear case. Simulations show our estimator has smaller variance than the usual kernel estimator. This is also illustrated by an example from nutritional epidemiology. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
383
398
http://hdl.handle.net/10.1093/biomet/asp015
application/pdf
Access to full text is restricted to subscribers.
Raymond J. Carroll
Arnab Maity
Enno Mammen
Kyusang Yu
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:579-5892013-03-04RePEc:oup:biomet
article
A diagnostic procedure based on local influence
Cook's (1986) normal curvature measure is useful for sensitivity analysis of model assumptions in statistical models. However, there is no rigorous approach based on the normal curvature for addressing two fundamental issues: to assess the extent of discrepancy between an assumed model and the underlying model from which the data are generated, and to identify suspicious data points for which the discrepancy is most evident. Our purpose is to establish a theoretically sound procedure for resolving these issues for case-weight perturbation under the framework of independent distributions. We show that the local influence measure, Cook's distance and likelihood distance are asymptotically equivalent. A diagnostic procedure, based on local influence, is proposed for evaluating model misspecification and for detecting influential points simultaneously. We analyse two real datasets. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
579
589
Hongtu Zhu
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:403-4142013-03-04RePEc:oup:biomet
article
Nonparametric estimation of age-at-onset distributions from censored kin-cohort data
We present a nonparametric estimator of genotype-specific age-at-onset distributions from kin-cohort data. Standard error calculations are derived and the methodology is illustrated through an analysis of the influence of mutations of the Parkin gene on Parkinson's disease. Semiparametric efficiency considerations are briefly discussed. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
403
414
http://hdl.handle.net/10.1093/biomet/asm027
application/pdf
Access to full text is restricted to subscribers.
Yuanjia Wang
Lorraine N. Clark
Karen Marder
Daniel Rabinowitz
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:489-4902013-03-04RePEc:oup:biomet
article
A counterexample to a claim about stochastic simulations
Engen & Lillegård (1997) presented a general method for doing Monte Carlo simulations conditioned on a sufficient statistic. The basic idea was to adjust the parameter values in the corresponding unconditional simulation so that the actual value of the sufficient statistic is obtained, and the claim was that if this adjustment is unique then the modified simulation is from the conditional distribution. Unfortunately the claim is not correct, as shown by a counterexample. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
489
490
Bo Henry Lindqvist
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:147-1612013-03-04RePEc:oup:biomet
article
On least-squares regression with censored data
The semiparametric accelerated failure time model relates the logarithm of the failure time linearly to the covariates while leaving the error distribution unspecified. The present paper describes simple and reliable inference procedures based on the least-squares principle for this model with right-censored data. The proposed estimator of the vector-valued regression parameter is an iterative solution to the Buckley--James estimating equation with a preliminary consistent estimator as the starting value. The estimator is shown to be consistent and asymptotically normal. A novel resampling procedure is developed for the estimation of the limiting covariance matrix. Extensions to marginal models for multivariate failure time data are considered. The performance of the new inference procedures is assessed through simulation studies. Illustrations with medical studies are provided. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
147
161
http://hdl.handle.net/10.1093/biomet/93.1.147
text/html
Access to full text is restricted to subscribers.
Zhezhen Jin
D. Y. Lin
Zhiliang Ying
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:503-5182013-03-04RePEc:oup:biomet
article
Sample size formulae for two-stage randomized trials with survival outcomes
Two-stage randomized trials are growing in importance in developing adaptive treatment strategies, i.e. treatment policies or dynamic treatment regimes. Usually, the first stage involves randomization to one of several initial treatments. The second stage of treatment begins when an early nonresponse criterion or response criterion is met. In the second stage, nonresponding subjects are re-randomized among second-stage treatments. Sample size calculations for planning these two-stage randomized trials with failure time outcomes are challenging because the variances of common test statistics depend in a complex manner on the joint distribution of time to the early nonresponse criterion or response criterion and the primary failure time outcome. We produce simple, albeit conservative, sample size formulae by using upper bounds on the variances. The resulting formulae only require the working assumptions needed to size a standard single-stage randomized trial and, in common settings, are only mildly conservative. These sample size formulae are based on either a weighted Kaplan--Meier estimator of survival probabilities at a fixed time-point or a weighted version of the log-rank test. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
503
518
http://hdl.handle.net/10.1093/biomet/asr019
application/pdf
Access to full text is restricted to subscribers.
Zhiguo Li
Susan A. Murphy
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:617-6342013-03-04RePEc:oup:biomet
article
Estimation of the failure time distribution in the presence of informative censoring
We present a method for estimating the survival curve of a continuous failure time random variable from right-censored data. Our method allows adjustment for informative censoring due to measured prognostic factors for time-to-event and censoring while simultaneously quantifying the sensitivity of the inference to residual dependence between failure and censoring due to unmeasured factors. We present the results of a simulation study and illustrate our approach using data from the AIDS Clinical Trial Group 175 study. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
617
634
Daniel O. Scharfstein
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:197-2122013-03-04RePEc:oup:biomet
article
Adaptive two-stage test procedures to find the best treatment in clinical trials
A main objective in clinical trials is to find the best treatment in a given finite class of competing treatments and then to show superiority of this treatment against a control treatment. The traditional procedure estimates the best treatment in a first trial. Then in an independent second trial superiority of this treatment, estimated as best in the first trial, is to be shown against the control treatment by a size α test. In this paper we investigate these two trials of this traditional procedure as a two-stage test procedure. Additionally we introduce competing two-stage group-sequential test procedures. Then we derive formulae for the expected number of patients. These formulae depend on unknown parameters. When we have a prior for the unknown parameters we can determine the two-stage test procedure of size α and power β that is optimal, in that it needs a minimal number of observations. The results are illustrated by a numerical example, which indicates the superiority of the group-sequential procedures. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
197
212
http://hdl.handle.net/10.1093/biomet/92.1.197
text/html
Access to full text is restricted to subscribers.
Wolfgang Bischoff
Frank Miller
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:967-9782013-03-04RePEc:oup:biomet
article
Blocking, efficiency and weighted optimality
Optimal blocking is explored for experiments, such as those incorporating one or more controls, where not all treatment comparisons are of equal interest. Weighted optimality functions are employed in gaining both analytic and enumerative results; a catalogue of smaller optimal designs is provided. It is shown how design selection based on functions of variances, and on functions of efficiency factors, are both subsumed by the weighted approach. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
967
978
http://hdl.handle.net/10.1093/biomet/asr042
application/pdf
Access to full text is restricted to subscribers.
Xiaowei Wang
J. P. Morgan
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:847-8582013-03-04RePEc:oup:biomet
article
Estimating equations for spatially correlated data in multi-dimensional space
We use the quasilikelihood concept to propose an estimating equation for spatial data with correlation across the study region in a multi-dimensional space. With appropriate mixing conditions, we develop a central limit theorem for a random field under various L<sub>p</sub> metrics. The consistency and asymptotic normality of quasilikelihood estimators can then be derived. We also conduct simulations to evaluate the performance of the proposed estimating equation, and a dataset from East Lansing Woods is used to illustrate the method. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
847
858
http://hdl.handle.net/10.1093/biomet/asn046
application/pdf
Access to full text is restricted to subscribers.
Pei-Sheng Lin
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:289-3022013-03-04RePEc:oup:biomet
article
Optimal blocking of two-level factorial designs
Blocking of two-level factorial designs is considered for block sizes 2 and 4 using the method of fractional partial confounding. A-, D- and E-optimal designs are obtained for block size 2 within the class of orthogonal designs for which main effects and two-factor interactions are all orthogonal to each other before allowing for blocking. A-, D- and E-optimal designs are obtained for block size 4 within the class of orthogonal designs with main effects orthogonal to blocks. The designs obtained also have other favourable properties including orthogonal estimation of effects and orthogonality to superblocks. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
289
302
http://hdl.handle.net/10.1093/biomet/93.2.289
text/html
Access to full text is restricted to subscribers.
Neil A. Butler
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:791-8072013-03-04RePEc:oup:biomet
article
Posterior propriety and computation for the Cox regression model with applications to missing covariates
In this paper, we carry out an in-depth theoretical investigation of Bayesian inference for the Cox regression model. We establish necessary and sufficient conditions for posterior propriety of the regression coefficient, β, in Cox's partial likelihood, which can be obtained as the limiting marginal posterior distribution of β through the specification of a gamma process prior for the cumulative baseline hazard and a uniform improper prior for β. We also examine necessary and sufficient conditions for posterior propriety of the regression coefficients, β, using full likelihood Bayesian approaches in which a gamma process prior is specified for the cumulative baseline hazard. We examine characterisation of posterior propriety under completely observed data settings as well as for settings involving missing covariates. Latent variables are introduced to facilitate a straightforward Gibbs sampling scheme in the Bayesian computation. A real dataset is presented to illustrate the proposed methodology. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
791
807
http://hdl.handle.net/10.1093/biomet/93.4.791
text/html
Access to full text is restricted to subscribers.
Ming-Hui Chen
Joseph G. Ibrahim
Qi-Man Shao
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:437-4492013-03-04RePEc:oup:biomet
article
Nonparametric variance estimation in the analysis of microarray data: a measurement error approach
We investigate the effects of measurement error on the estimation of nonparametric variance functions. We show that either ignoring measurement error or direct application of the simulation extrapolation, SIMEX, method leads to inconsistent estimators. Nevertheless, the direct SIMEX method can reduce bias relative to a naive estimator. We further propose a permutation SIMEX method that leads to consistent estimators in theory. The performance of both the SIMEX methods depends on approximations to the exact extrapolants. Simulations show that both the SIMEX methods perform better than ignoring measurement error. The methodology is illustrated using microarray data from colon cancer patients. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
437
449
http://hdl.handle.net/10.1093/biomet/asn017
application/pdf
Access to full text is restricted to subscribers.
Raymond J. Carroll
Yuedong Wang
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:691-7012013-03-04RePEc:oup:biomet
article
Diagnostic checking for time series models with conditional heteroscedasticity estimated by the least absolute deviation approach
The recent paper by Peng & Yao (2003) gave an interesting extension of least absolute deviation estimation to generalised autoregressive conditional heteroscedasticity, GARCH, time series models. The asymptotic distributions of absolute residual autocorrelations and squared residual autocorrelations from the GARCH model estimated by the least absolute deviation method are derived in this paper. These results lead to two useful diagnostic tools which can be used to check whether or not a GARCH model fitted by using the least absolute deviation method is adequate. Some simulation experiments give further support to the asymptotic theory and a real data example is also reported. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
691
701
http://hdl.handle.net/10.1093/biomet/92.3.691
text/html
Access to full text is restricted to subscribers.
Guodong Li
Wai Keung Li
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:569-5842013-03-04RePEc:oup:biomet
article
Dimension reduction in regression without matrix inversion
Regressions in which the fixed number of predictors p exceeds the number of independent observational units n occur in a variety of scientific fields. Sufficient dimension reduction provides a promising approach to such problems, by restricting attention to d < n linear combinations of the original p predictors. However, standard methods of sufficient dimension reduction require inversion of the sample predictor covariance matrix. We propose a method for estimating the central subspace that eliminates the need for such inversion and is applicable regardless of the (n, p) relationship. Simulations show that our method compares favourably with standard large sample techniques when the latter are applicable. We illustrate our method with a genomics application. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
569
584
http://hdl.handle.net/10.1093/biomet/asm038
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Bing Li
Francesca Chiaromonte
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:187-1992013-03-04RePEc:oup:biomet
article
Dealing with limited overlap in estimation of average treatment effects
Estimation of average treatment effects under unconfounded or ignorable treatment assignment is often hampered by lack of overlap in the covariate distributions between treatment groups. This lack of overlap can lead to imprecise estimates, and can make commonly used estimators sensitive to the choice of specification. In such cases researchers have often used ad hoc methods for trimming the sample. We develop a systematic approach to addressing lack of overlap. We characterize optimal subsamples for which the average treatment effect can be estimated most precisely. Under some conditions, the optimal selection rules depend solely on the propensity score. For a wide range of distributions, a good approximation to the optimal rule is provided by the simple rule of thumb to discard all units with estimated propensity scores outside the range [0.1,0.9]. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
187
199
http://hdl.handle.net/10.1093/biomet/asn055
application/pdf
Access to full text is restricted to subscribers.
Richard K. Crump
V. Joseph Hotz
Guido W. Imbens
Oscar A. Mitnik
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:337-3502013-03-04RePEc:oup:biomet
article
On the identification of path analysis models with one hidden variable
We study criteria for identifiability of path analysis models with one hidden variable. We first derive sufficient criteria for identification of models in which marginalisation is carried out over the hidden variable. The sufficient criteria are based on the structure of the directed acyclic graph associated with the path analysis model and can be derived from the graph. We treat further the identification of models when the hidden variable is conditioned on and establish connections with the extended skew-normal distribution. Finally it is shown that the derived conditions extend the existing graphical criteria for identification. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
337
350
http://hdl.handle.net/10.1093/biomet/92.2.337
text/html
Access to full text is restricted to subscribers.
Elena Stanghellini
Nanny Wermuth
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:991-9942013-03-04RePEc:oup:biomet
article
A note on 'Testing the number of components in a normal mixture'
In a recent paper, Lo et al. (2001) propose a test for the likelihood ratio statistic based on the Kullback--Leibler information criterion when testing the null hypothesis that a random sample is drawn from a mixture of k<sub>0</sub> normal components against the alternative hypothesis of a mixture with k<sub>1</sub> normal components, where k<sub>0</sub> is less than k<sub>1</sub>. However, this result requires conditions that are generally not met when the null hypothesis holds. Consequently, the result is not proven and simulations suggest that it may not be correct. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
991
994
Neal O. Jeffries
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:149-1672013-03-04RePEc:oup:biomet
article
Probability estimation for large-margin classifiers
Large-margin classifiers have proven effective in delivering high predictive accuracy, particularly those that focus on the decision boundary and bypass the requirement of estimating the class probability given the input for discrimination. As a result, these classifiers may not directly yield an estimated class probability, which is of interest itself. To overcome this difficulty, this article proposes a novel method for estimating the class probability through sequential classifications, by using features of interval estimation of large-margin classifiers. The method uses sequential classifications to bracket the class probability to yield an estimate up to the desired level of accuracy. The method is implemented for support vector machines and ψ-learning, in addition to an estimated Kullback--Leibler loss for tuning. A solution path of the method is derived for support vector machines to reduce further its computational cost. Theoretical and numerical analyses indicate that the method is highly competitive against alternatives, especially when the dimension of the input greatly exceeds the sample size. Finally, an application to leukaemia data is described. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
149
167
http://hdl.handle.net/10.1093/biomet/asm077
application/pdf
Access to full text is restricted to subscribers.
Junhui Wang
Xiaotong Shen
Yufeng Liu
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:411-4262013-03-04RePEc:oup:biomet
article
Non-finite Fisher information and homogeneity: an EM approach
Even simple examples of finite mixture models can fail to fulfil the regularity conditions that are routinely assumed in standard parametric inference problems. Many methods have been investigated for testing for homogeneity in finite mixture models, for example, but all rely on regularity conditions including the finiteness of the Fisher information and the space of the mixing parameter being a compact subset of some Euclidean space. Very simple examples where such assumptions fail include mixtures of two geometric distributions and two exponential distributions, and, more generally, mixture models in scale distribution families. To overcome these difficulties, we propose and study an EM-test statistic, which has a simple limiting distribution for the examples in this paper. Simulations show that the EM-test has accurate Type I errors and is more efficient than existing methods when they are applicable. A real example is included. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
411
426
http://hdl.handle.net/10.1093/biomet/asp011
application/pdf
Access to full text is restricted to subscribers.
P. Li
J. Chen
P. Marriott
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:987-9942013-03-04RePEc:oup:biomet
article
Likelihood analysis of the binary instrumental variable model
Instrumental variables are widely used for the identification of the causal effect of one random variable on another under unobserved confounding. The distribution of the observable variables for a discrete instrumental variable model satisfies certain inequalities but no conditional independence relations. Such models are usually tested by checking whether the relative frequency estimators of the parameters satisfy the constraints. This ignores sampling uncertainty in the data. Using the observable constraints for the instrumental variable model, a likelihood analysis is conducted. A significance test for its validity is developed, and a bootstrap algorithm for computing confidence intervals for the causal effect is proposed. Applications are given to illustrate the advantage of the suggested approach. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
987
994
http://hdl.handle.net/10.1093/biomet/asr040
application/pdf
Access to full text is restricted to subscribers.
R. R. Ramsahai
S. L. Lauritzen
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:461-4702013-03-04RePEc:oup:biomet
article
Efficient Robbins--Monro procedure for binary data
The Robbins--Monro procedure does not perform well in the estimation of extreme quantiles, because the procedure is implemented using asymptotic results, which are not suitable for binary data. Here we propose a modification of the Robbins--Monro procedure and derive the optimal procedure for binary data under some reasonable approximations. The improvement obtained by using the optimal procedure for the estimation of extreme quantiles is substantial. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
461
470
V. Roshan Joseph
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:813-8292013-03-04RePEc:oup:biomet
article
Testing the covariance structure of multivariate random fields
An increasing wealth of multivariate spatial and spatio-temporal data is becoming available. For such data, an important part of model building is an assessment of the properties of the underlying covariance function describing variable, spatial and temporal correlations. In this paper, we propose a methodology to evaluate the appropriateness of several types of common assumptions on multivariate covariance functions in the spatio-temporal context. The methodology is based on the asymptotic joint normality of the sample space-time cross-covariance estimators. Specifically, we address the assumptions of symmetry, separability and linear models of coregionalization. We conduct simulation experiments to evaluate the sizes and powers of our tests and illustrate our methodology on a trivariate spatio-temporal dataset of pollutants over California. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
813
829
http://hdl.handle.net/10.1093/biomet/asn053
application/pdf
Access to full text is restricted to subscribers.
Bo Li
Marc G. Genton
Michael Sherman
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:655-6682013-03-04RePEc:oup:biomet
article
Spherical regression
Methods are introduced for regressing points on the surface of one sphere on points on another. Complex variables and stereographic projection are used to deal with theoretical problems of directional statistics much as they have been used historically to deal with problems in non-Euclidean geometry. The complex plane harbours the group of Möbius transformations, and stereographic projection is used as a bridge to map these Möbius transforms to regression link functions on the surface of a unit sphere. A special form for these links is introduced which employs the complex plane and stereographic projection to effect angular scale changes on the sphere. The family of special forms is closed under orthogonal transformations of the dependent variable and Möbius transformations of the independent variable, and incorporates independence and proper and improper rotations as special cases. Parameter estimation and inference are exemplified using the von Mises--Fisher spherical distribution and vectorcardiogram data. All statistical results and calculations have been formulated in the real domain. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
655
668
T. D. Downs
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:211-2182013-03-04RePEc:oup:biomet
article
Contiguity of the Whittle measure for a Gaussian time series
For a stationary time series, Whittle constructed a likelihood for the spectral density based on the approximate independence of the discrete Fourier transforms of the data at certain frequencies. Whittle's likelihood has been widely used in the literature for constructing estimators. In this paper, we show that, for a Gaussian time series, the Whittle measure is mutually contiguous with the actual distribution of the data. As a consequence, most asymptotic properties of estimators and test statistics derived under the Whittle measure can be carried over to the actual distribution. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
211
218
Nidhan Choudhuri
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:647-6662013-03-04RePEc:oup:biomet
article
The accelerated gap times model
This paper develops a new semiparametric model for the effect of covariates on the conditional intensity of a recurrent event counting process. The model is a transparent extension of the accelerated failure time model for univariate survival data. Estimation of the regression parameter is motivated by semiparametric efficiency considerations, extending the class of weighted log-rank estimating functions originally proposed in Prentice (1978) and subsequently studied in detail by Tsiatis (1990) and Ritov (1990). A novel rank-based one-step estimator for the regression parameter is proposed. An Aalen-type estimator for the baseline intensity function is obtained. Asymptotics are handled with empirical process methods, and finite sample properties are studied via simulation. Finally, the new model is applied to the bladder tumour data of Byar (1980). Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
647
666
http://hdl.handle.net/10.1093/biomet/92.3.647
text/html
Access to full text is restricted to subscribers.
Robert L. Strawderman
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:735-7462013-03-04RePEc:oup:biomet
article
Conditionally specified continuous distributions
A distribution is conditionally specified when its model constraints are expressed conditionally. For example, Besag's (1974) spatial model was specified conditioned on the neighbouring states, and pseudolikelihood is intended to approximate the likelihood using conditional likelihoods. There are three issues of interest: existence, uniqueness and computation of a joint distribution. In the literature, most results and proofs are for discrete probabilities; here we exclusively study distributions with continuous state space. We examine all three issues using the dependence functions derived from decomposition of the conditional densities. We show that certain dependence functions of the joint density are shared with its conditional densities. Therefore, two conditional densities involving the same set of variables are compatible if their overlapping dependence functions are identical. We prove that the joint density is unique when the set of dependence functions is both compatible and complete. In addition, a joint density, apart from a constant, can be computed from the dependence functions in closed form. Since all of the results are expressed in terms of dependence functions, we consider our approach to be dependence-based, whereas methods in the literature are generally density-based. Applications of the dependence-based formulation are discussed. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
735
746
http://hdl.handle.net/10.1093/biomet/asn029
application/pdf
Access to full text is restricted to subscribers.
Yuchung J. Wang
Edward H. Ip
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:977-9842013-03-04RePEc:oup:biomet
article
Miscellanea: Kernel-type density estimation on the unit interval
We consider kernel-type methods for the estimation of a density on [0, 1] which eschew explicit boundary correction. We propose using kernels that are symmetric in their two arguments; these kernels are conditional densities of bivariate copulas. We give asymptotic theory for the version of the new estimator using Gaussian copula kernels and report on simulation comparisons of it with the beta-kernel density estimator of Chen (1999). We also provide automatic bandwidth selection in the form of 'rule-of-thumb' bandwidths for both estimators. As well as its competitive integrated squared error performance, advantages of the new approach include its greater range of possible values at 0 and 1, the fact that it is a bona fide density and that the individual kernels and resulting estimator are comprehensible in terms of a single simple picture. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
977
984
http://hdl.handle.net/10.1093/biomet/asm068
application/pdf
Access to full text is restricted to subscribers.
M.C. Jones
D.A. Henderson
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:633-6462013-03-04RePEc:oup:biomet
article
Bayesian adaptive designs for clinical trials
A Bayesian adaptive design is proposed for a comparative two-armed clinical trial using decision-theoretic approaches. A loss function is specified, based on the cost for each patient and the costs of making incorrect decisions at the end of a trial. At each interim analysis, the decision to terminate or to continue the trial is based on the expected loss function while concurrently incorporating efficacy, futility and cost. The maximum number of interim analyses is determined adaptively by the observed data. We derive explicit connections between the loss function and the frequentist error rates, so that the desired frequentist properties can be maintained for regulatory settings. The operating characteristics of the design can be evaluated on frequentist grounds. Extensive simulations are carried out to compare the proposed design with existing ones. The design is general enough to accommodate both continuous and discrete types of data. We illustrate the methods with an animal study evaluating a medical treatment for cardiac arrest. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
633
646
http://hdl.handle.net/10.1093/biomet/92.3.633
text/html
Access to full text is restricted to subscribers.
Yi Cheng
Yu Shen
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:1-162013-03-04RePEc:oup:biomet
article
Studentization and deriving accurate p-values
We have a statistic for assessing an observed data point relative to a statistical model but find that its distribution function depends on the parameter. To obtain the corresponding p-value, we require the minimally modified statistic that is ancillary; this process is called Studentization. We use recent likelihood theory to develop a maximal third-order ancillary; this gives immediately a candidate Studentized statistic. We show that the corresponding p-value is higher-order uniform on (0, 1), is equivalent to a repeated bootstrap version of the initial statistic and agrees with a special Bayesian modification of the original statistic. More importantly, the modified statistic and p-value are available by Markov chain Monte Carlo simulations and, in some cases, by higher-order approximation methods. Examples, including the Behrens--Fisher problem, are given to indicate the ease and flexibility of the approach. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
1
16
http://hdl.handle.net/10.1093/biomet/asm093
application/pdf
Access to full text is restricted to subscribers.
D.A.S. Fraser
Judith Rousseau
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:679-6902013-03-04RePEc:oup:biomet
article
Orthogonal bases approach for comparing nonnormal continuous distributions
We present an orthonormal bases approach for detecting general differences among continuous distributions. An unknown density function is represented by a finite vector of its estimated Fourier coefficients with respect to a suitable orthonormal basis. For a wide class of orthonormal bases, we establish asymptotic normality of the vector of estimated Fourier coefficients and propose an unbiased and consistent estimator of its asymptotic covariance matrix. Fourier coefficients are modelled as functions of fixed and possibly random effects. This approach allows simultaneous detection of distributional differences attributable to various factors in clustered and correlated data with sufficiently large numbers of observations per cluster with the same fixed and random effects realisations. This work was motivated by multi-level clustered non-Gaussian datasets from genetic studies. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
679
690
http://hdl.handle.net/10.1093/biomet/92.3.679
text/html
Access to full text is restricted to subscribers.
Inna Chervoneva
Boris Iglewicz
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:401-4092013-03-04RePEc:oup:biomet
article
A type of restricted maximum likelihood estimator of variance components in generalised linear mixed models
The maximum likelihood estimator of the variance components in a linear model can be biased downwards. Restricted maximum likelihood (REML) corrects this problem by using the likelihood of a set of residual contrasts and is generally considered superior. However, this original restricted maximum likelihood definition does not directly extend beyond linear models. We propose a REML-type estimator for generalised linear mixed models by correcting the bias in the profile score function of the variance components. The proposed estimator has the same consistency properties as the maximum likelihood estimator if the number of parameters in the mean and variance components models remains fixed. However, the estimator of the variance components has a smaller finite sample bias. A simulation study with a logistic mixed model shows that the proposed estimator is effective in correcting the downward bias in the maximum likelihood estimator. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
401
409
J. G. Liao
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:943-9542013-03-04RePEc:oup:biomet
article
Statistical inference based on non-smooth estimating functions
When the estimating function for a vector of parameters is not smooth, it is often rather difficult, if not impossible, to obtain a consistent estimator by solving the corresponding estimating equation using standard numerical techniques. In this paper, we propose a simple inference procedure via the importance sampling technique, which provides a consistent root of the estimating equation and also an approximation to its distribution without solving any equations or involving nonparametric function estimates. The new proposal is illustrated and evaluated via two extensive examples with real and simulated datasets. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
943
954
http://hdl.handle.net/10.1093/biomet/91.4.943
text/html
Access to full text is restricted to subscribers.
L. Tian
J. Liu
Y. Zhao
L. J. Wei
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:907-9172013-03-04RePEc:oup:biomet
article
On the asymptotics of marginal regression splines with longitudinal data
There have been studies on how the asymptotic efficiency of a nonparametric function estimator depends on the handling of the within-cluster correlation when nonparametric regression models are used on longitudinal or cluster data. In particular, methods based on smoothing splines and local polynomial kernels exhibit different behaviour. We show that the generalized estimation equations based on weighted least squares regression splines for the nonparametric function have an interesting property: the asymptotic bias of the estimator does not depend on the working correlation matrix, but the asymptotic variance, and therefore the mean squared error, is minimized when the true correlation structure is specified. This property of the asymptotic bias distinguishes regression splines from smoothing splines. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
907
917
http://hdl.handle.net/10.1093/biomet/asn041
application/pdf
Access to full text is restricted to subscribers.
Zhongyi Zhu
Wing K. Fung
Xuming He
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:289-3022013-03-04RePEc:oup:biomet
article
Fully Bayesian spline smoothing and intrinsic autoregressive priors
There is a well-known Bayesian interpretation for function estimation by spline smoothing using a limit of proper normal priors. The limiting prior and the conditional and intrinsic autoregressive priors popular for spatial modelling have a common form, which we call partially informative normal. We derive necessary and sufficient conditions for the propriety of the posterior for this class of partially informative normal priors with noninformative priors on the variance components, a condition crucial for successful implementation of the Gibbs sampler. The results apply for fully Bayesian smoothing splines, thin-plate splines and L-splines, as well as models using intrinsic autoregressive priors. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
289
302
Paul L. Speckman
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:409-4232013-03-04RePEc:oup:biomet
article
Principal Hessian Directions for regression with measurement error
We consider a nonlinear regression problem in which the predictors are measured with error. We assume that the response is related to unknown linear combinations of a p-dimensional predictor vector through an unknown link function. Instead of observing the predictors, we observe a surrogate vector with the property that its expectation is linearly related to the predictor vector with constant variance. We use an important linear transformation of the surrogates. Based on the transformed variables, we develop the modified Principal Hessian Directions method for estimating the subspace of the effective dimension-reduction space. We derive the asymptotic variances of the modified Principal Hessian Directions estimators. Several examples are reported and comparisons are made with the sliced inverse regression method of Carroll & Li (1992). Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
409
423
Heng-Hui Lue
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:773-7782013-03-04RePEc:oup:biomet
article
A note on conditional AIC for linear mixed-effects models
The conventional model selection criterion, the Akaike information criterion, AIC, has been applied to choose candidate models in mixed-effects models by consideration of the marginal likelihood. Vaida & Blanchard (2005) demonstrated that such a marginal AIC and its small-sample correction are inappropriate when the research focus is on clusters. Correspondingly, these authors suggested the use of a conditional AIC. Their conditional AIC is derived under the assumption that the variance-covariance matrix, or scaled variance-covariance matrix, of the random effects is known. This note provides a general conditional AIC without these strong assumptions. Simulation studies show that the proposed method is promising. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
773
778
http://hdl.handle.net/10.1093/biomet/asn023
application/pdf
Access to full text is restricted to subscribers.
Hua Liang
Hulin Wu
Guohua Zou
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:221-2272013-03-04RePEc:oup:biomet
article
A note on time-reversibility of multivariate linear processes
We derive some readily verifiable necessary and sufficient conditions for a multivariate non-Gaussian linear process to be time-reversible, under two sets of conditions on the contemporaneous dependence structure of the innovations. One set of conditions concerns the case of independent-component innovations, in which case a multivariate non-Gaussian linear process is time-reversible if and only if the coefficients consist of essentially asymmetric columns with column-specific origins of symmetry or symmetric pairs of columns with pair-specific origins of symmetry. On the other hand, for dependent-component innovations plus other regularity conditions, a multivariate non-Gaussian linear process is time-reversible if and only if the coefficients are essentially symmetric about some origin. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
221
227
http://hdl.handle.net/10.1093/biomet/93.1.221
text/html
Access to full text is restricted to subscribers.
Kung-Sik Chan
Lop-Hing Ho
Howell Tong
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:913-9222013-03-04RePEc:oup:biomet
article
Testing the proportional odds model under random censoring
In practical applications, it is not uncommon for the hazard functions of two groups to converge with time. One approach that allows for converging hazard functions is the proportional odds model. We develop a procedure for testing the proportional odds assumption when the available data consist of two independent random samples of randomly right-censored lifetimes. Asymptotic normality of the test statistic is proved and the procedure is applied to two well-known datasets. The effective significance level and power of the proposed test are assessed through a simulation study. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
913
922
Jean-Yves Dauxois
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:279-2942013-03-04RePEc:oup:biomet
article
On weighted Hochberg procedures
We consider different ways of constructing weighted Hochberg-type step-up multiple test procedures including closed procedures based on weighted Simes tests and their conservative step-up short-cuts, and step-up counterparts of two weighted Holm procedures. It is shown that the step-up counterparts have some serious pitfalls such as lack of familywise error rate control and lack of monotonicity in rejection decisions in terms of p-values. Therefore an exact closed procedure appears to be the best alternative, its only drawback being lack of simple stepwise structure. A conservative step-up short-cut to the closed procedure may be used instead, but with accompanying loss of power. Simulations are used to study the familywise error rate and power properties of the competing procedures for independent and correlated p-values. Although many of the results of this paper are negative, they are useful in highlighting the need for caution when procedures with similar pitfalls may be used. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
279
294
http://hdl.handle.net/10.1093/biomet/asn018
application/pdf
Access to full text is restricted to subscribers.
Ajit C. Tamhane
Lingyun Liu
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:743-7502013-03-04RePEc:oup:biomet
article
Nonparametric confidence intervals for receiver operating characteristic curves
We study methods for constructing confidence intervals and confidence bands for estimators of receiver operating characteristics. Particular emphasis is placed on the way in which smoothing should be implemented, when estimating either the characteristic itself or its variance. We show that substantial undersmoothing is necessary if coverage properties are not to be impaired. A theoretical analysis of the problem suggests an empirical, plug-in rule for bandwidth choice, optimising the coverage accuracy of interval estimators. The performance of this approach is explored. Our preferred technique is based on asymptotic approximation, rather than a more sophisticated approach using the bootstrap, since the latter requires a multiplicity of smoothing parameters all of which must be chosen in nonstandard ways. It is shown that the asymptotic method can give very good performance. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
743
750
Peter Hall
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:197-2062013-03-04RePEc:oup:biomet
article
Range of correlation matrices for dependent Bernoulli random variables
We say that a pair (p, R) is compatible if there exists a multivariate binary distribution with mean vector p and correlation matrix R. In this paper we study necessary and sufficient conditions for compatibility for structured and unstructured correlation matrices. We give examples of correlation matrices that are incompatible with any p. Using our results we show that the parametric binary models of Emrich & Piedmonte (1991) and Qaqish (2003) allow a good range of correlations between the binary variables. We also obtain necessary and sufficient conditions for a matrix of odds ratios to be compatible with a given p. Our findings support the popular belief that the odds ratios are less constrained and more flexible than the correlations. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
197
206
http://hdl.handle.net/10.1093/biomet/93.1.197
text/html
Access to full text is restricted to subscribers.
N. Rao Chaganty
Harry Joe
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:989-9952013-03-04RePEc:oup:biomet
article
Studies in the history of probability and statistics XLIX: On the Matérn correlation family
Handcock & Stein (1993) introduced the Matérn family of spatial correlations into statistics as a flexible parametric class, with one parameter determining the smoothness of the paths of the underlying spatial field. We document the varied history of this family, which includes contributions by eminent physical scientists and statisticians. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
989
995
http://hdl.handle.net/10.1093/biomet/93.4.989
text/html
Access to full text is restricted to subscribers.
Peter Guttorp
Tilmann Gneiting
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:385-3972013-03-04RePEc:oup:biomet
article
Some nonregular designs from the Nordstrom--Robinson code and their statistical properties
The Nordstrom--Robinson code is a well-known nonlinear code in coding theory. This paper explores the statistical properties of this nonlinear code. Many nonregular designs with 32, 64, 128 and 256 runs and 7--16 factors are derived from it. It is shown that these nonregular designs are better than regular designs of the same size in terms of resolution, aberration and projectivity. Furthermore, many of these nonregular designs are shown to have generalised minimum aberration among all possible designs. Seven orthogonal arrays are shown to have unique word-length pattern and four of them are shown to be unique up to isomorphism. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
385
397
http://hdl.handle.net/10.1093/biomet/92.2.385
text/html
Access to full text is restricted to subscribers.
Hongquan Xu
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:845-8582013-03-04RePEc:oup:biomet
article
Adjusted profile estimating function
In settings where the full probability model is not specified, consider a general estimating function g(θ, λ; y) that involves not only the parameters of interest, θ, but also some nuisance parameters, λ. We consider methods for reducing the effects on g of fitting the nuisance parameters. We propose a Cox--Reid-type adjustment to the profile estimating function, g(θ, λ̂_θ; y), that reduces its bias by two orders. Typically, only the first two moments of the response variable are needed to form the adjustment. Important applications of this method include the estimation of pairwise association and main effects in stratified, clustered data and estimation of the main effects in a matched-pair study. A brief simulation study shows that the proposed method considerably reduces the impact of the nuisance parameters. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
845
858
Molin Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:807-8202013-03-04RePEc:oup:biomet
article
Sparse estimation of a covariance matrix
We suggest a method for estimating a covariance matrix on the basis of a sample of vectors drawn from a multivariate normal distribution. In particular, we penalize the likelihood with a lasso penalty on the entries of the covariance matrix. This penalty plays two important roles: it reduces the effective number of parameters, which is important even when the dimension of the vectors is smaller than the sample size since the number of parameters grows quadratically in the number of variables, and it produces an estimate which is sparse. In contrast to sparse inverse covariance estimation, our method's close relative, the sparsity attained here is in the covariance matrix itself rather than in the inverse matrix. Zeros in the covariance matrix correspond to marginal independencies; thus, our method performs model selection while providing a positive definite estimate of the covariance. The proposed penalized maximum likelihood problem is not convex, so we use a majorize-minimize approach in which we iteratively solve convex approximations to the original nonconvex problem. We discuss tuning parameter selection and demonstrate on a flow-cytometry dataset how our method produces an interpretable graphical display of the relationship between variables. We perform simulations that suggest that simple elementwise thresholding of the empirical covariance matrix is competitive with our method for identifying the sparsity structure. Additionally, we show how our method can be used to solve a previously studied special case in which a desired sparsity pattern is prespecified. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
807
820
http://hdl.handle.net/10.1093/biomet/asr054
application/pdf
Access to full text is restricted to subscribers.
Jacob Bien
Robert J. Tibshirani
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:379-3922013-03-04RePEc:oup:biomet
article
Discriminant analysis through a semiparametric model
We consider a semiparametric generalisation of normal-theory discriminant analysis. The semiparametric model assumes that, after unspecified univariate monotone transformations, the class distributions are multivariate normal. We introduce an estimation procedure based on the distribution quantiles, in which the parameters of the semiparametric model are estimated directly without estimating the nonparametric transformations. The procedure is computationally fast and the estimation accuracy is shown to have the usual parametric rate. The relationship between the method and more general nonparametric discriminant analysis is discussed. The semiparametric specification of the class densities is a submodel of the nonparametric log density functional analysis of variance model in which the main effects are completely nonparametric but the interaction terms are specified semiparametrically. Simulations and real examples are used to illustrate the procedure. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
379
392
Y. Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:801-8182013-03-04RePEc:oup:biomet
article
Additive hazards model with multivariate failure time data
Marginal additive hazards models are considered for multivariate survival data in which individuals may experience events of several types and there may also be correlation between individuals. Estimators are proposed for the parameters of such models and for the baseline hazard functions. The estimators of the regression coefficients are shown asymptotically to follow a multivariate normal distribution with a sandwich-type covariance matrix that can be consistently estimated. The estimated baseline and subject-specific cumulative hazard processes are shown to converge weakly to a zero-mean Gaussian random field. The weak convergence properties for the corresponding survival processes are established. A resampling technique is proposed for constructing simultaneous confidence bands for the survival curve of a specific subject. The methodology is extended to a multivariate version of a class of partly parametric additive hazards models. Simulation studies are conducted to assess finite sample properties, and the method is illustrated with an application to the development of coronary heart disease and cerebrovascular accidents in the Framingham Heart Study. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
801
818
http://hdl.handle.net/10.1093/biomet/91.4.801
text/html
Access to full text is restricted to subscribers.
Guosheng Yin
Jianwen Cai
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:239-2442013-03-04RePEc:oup:biomet
article
On modelling mean-covariance structures in longitudinal studies
We exploit a reparameterisation of the marginal covariance matrix arising in longitudinal studies (Pourahmadi, 1999, 2000) to model, jointly, the mean and covariance structures in terms of three polynomial functions of time. By reanalysing Kenward's (1987) cattle data, we compare model selection procedures based on regressogram estimation with those based on a global search of the model space. Using a BIC-based model selection criterion to identify the optimum degree triple of the three polynomials, we show that the use of a saturated mean model is not optimal and explain why regressogram-based model estimation may be misleading. We also suggest a new computational method for finding the global optimum based on a criterion involving three pairwise saturated profile likelihoods. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
239
244
Jianxin Pan
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:251-2702013-03-04RePEc:oup:biomet
article
Marginal likelihood, conditional likelihood and empirical likelihood: Connections and applications
Marginal likelihood and conditional likelihood are often used for eliminating nuisance parameters. For a parametric model, it is well known that the full likelihood can be decomposed into the product of a conditional likelihood and a marginal likelihood. This property is less transparent in a nonparametric or semiparametric likelihood setting. In this paper we show that this nice parametric likelihood property can be carried over to the empirical likelihood world. We discuss applications in case-control studies, genetic linkage analysis, genetic quantitative trait analysis, tuberculosis infection data and unordered-paired data, all of which can be treated as semiparametric finite mixture models. We consider the estimation problem in detail in the simplest case of unordered-paired data, where we can only observe the minimum and maximum values of two random variables; the identities of the minimum and maximum values are lost. The profile empirical likelihood approach is used for maximum semiparametric likelihood estimation. We present some large-sample results along with a simulation study. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
251
270
http://hdl.handle.net/10.1093/biomet/92.2.251
text/html
Access to full text is restricted to subscribers.
Jing Qin
Biao Zhang
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:491-4962013-03-04RePEc:oup:biomet
article
Nonparametric detection of correlated errors
In regression problems it is hard to detect correlated errors since the errors are not observed. In this paper, a nonparametric method is proposed for the detection of correlated errors when the design points are equally spaced. It turns out that the first-order sample autocovariance of the residuals from the kernel regression estimates provides essential information about correlated errors, and a bootstrap based on it is quite effective in exploiting this information. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
491
496
Tae Yoon Kim
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:659-6682013-03-04RePEc:oup:biomet
article
Semiparametric analysis of transformation models with censored data
A unified estimation procedure is proposed for the analysis of censored data using linear transformation models, which include the proportional hazards model and the proportional odds model as special cases. This procedure is easily implemented numerically and its validity does not rely on the assumption of independence between the covariates and the censoring variable. The estimator is the same as the Cox partial likelihood estimator in the case of the proportional hazards model. Moreover, the asymptotic variance of the proposed estimator has a closed form and its variance estimator is easily obtained by plug-in rules. The method is illustrated by simulation and is applied to the Veterans' Administration lung cancer data. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
659
668
Kani Chen
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:801-8202013-03-04RePEc:oup:biomet
article
Nonparametric maximum likelihood estimation of the structural mean of a sample of curves
A random sample of curves can usually be thought of as noisy realisations of a compound stochastic process X(t) = Z{W(t)}, where Z(t) produces random amplitude variation and W(t) produces random dynamic or phase variation. In most applications it is more important to estimate the so-called structural mean μ(t) = E{Z(t)} than the cross-sectional mean E{X(t)}, but this estimation problem is difficult because the process Z(t) is not directly observable. In this paper we propose a nonparametric maximum likelihood estimator of μ(t). This estimator is shown to be √n-consistent and asymptotically normal under the assumed model and robust to model misspecification. Simulations and a real-data example show that the proposed estimator is competitive with landmark registration, often considered the benchmark, and has the advantage of avoiding time-consuming and often infeasible individual landmark identification. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
801
820
http://hdl.handle.net/10.1093/biomet/92.4.801
text/html
Access to full text is restricted to subscribers.
Daniel Gervini
Theo Gasser
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:777-7902013-03-04RePEc:oup:biomet
article
Nonparametric k-sample tests with panel count data
We study the nonparametric k-sample test problem with panel count data. The asymptotic normality of a smooth functional of the nonparametric maximum pseudo-likelihood estimator (Wellner & Zhang, 2000) is established under some mild conditions. We construct a class of easy-to-implement nonparametric tests for comparing mean functions of k populations based on this asymptotic normality. We conduct various simulations to validate and compare the tests. The simulations show that the tests perform quite well and generally have good power to detect differences among the mean functions. The method is illustrated with a real-life example. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
777
790
http://hdl.handle.net/10.1093/biomet/93.4.777
text/html
Access to full text is restricted to subscribers.
Ying Zhang
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:723-7332013-03-04RePEc:oup:biomet
article
Empirical-type likelihoods allowing posterior credible sets with frequentist validity: Higher-order asymptotics
With reference to a general class of empirical-type likelihoods, we develop higher-order asymptotics for the frequentist coverage of Bayesian credible sets based on posterior quantiles and highest posterior density. These asymptotics, in turn, characterise members of the class that allow approximate frequentist validity of such sets. It is seen that the usual empirical likelihood does not enjoy this property up to the order of approximation considered here. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
723
733
http://hdl.handle.net/10.1093/biomet/93.3.723
text/html
Access to full text is restricted to subscribers.
Kai-Tai Fang
Rahul Mukerjee
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:613-6282013-03-04RePEc:oup:biomet
article
Large-sample properties of the periodogram estimator of seasonally persistent processes
Seasonally persistent models were first introduced by Andel (1986) and Gray et al. (1989) to extend autoregressive moving-average and fractionally differenced models and to encompass long-memory quasi-periodic behaviour. These models are, for certain ranges of parameters, stationary, and we prove here that the behaviour of the periodogram and other tapered estimators cannot be simply extended from the work of Kunsch (1986) and Hurvich & Beltrao (1993) on long memory induced by a pole at the origin. We demonstrate that potentially large bias, both positive and negative, can arise from the same value of the long-memory parameter, and that the new distribution can be written down easily in the case of Gaussian processes. We also consider using both the cosine taper and the sine taper. The extended least squares estimator is also considered in this context. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
613
628
Sofia C. Olhede
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:183-1962013-03-04RePEc:oup:biomet
article
Models and inference for uncertainty in extremal dependence
Conventionally, modelling of multivariate extremes has been based on the class of multivariate extreme value distributions. More recently, other classes have been developed, allowing for the possibility that, whilst dependence is observed at finite levels, the limit distribution is independent. A number of articles have shown this development to be important for accurate estimation of the extremal properties, both of theoretical processes and observed datasets. It has also been shown that, so far as dependence is concerned, the choice between modelling with either asymptotically dependent or asymptotically independent distributions can be far more influential than model choice within either of these two classes. In this paper we explore the issue of modelling across both classes, examining in particular the effect of uncertainty caused by lack of knowledge about the status of asymptotic dependence. This is achieved by new multivariate models whose parameter spaces are such that asymptotic dependence occurs on a boundary. Standard techniques in Bayesian inference, implemented through Markov chain Monte Carlo, enable inferences to be drawn that assign posterior probability mass to the boundary region. The techniques are illustrated on a set of oceanographic data for which previous analyses have shown that it is difficult to resolve the question of asymptotic dependence status, which is however important in model extrapolation. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
183
196
Stuart Coles
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:975-9862013-03-04RePEc:oup:biomet
article
Multivariate distributions with support above the diagonal
A general family of distributions for the empirical modelling of ordered multivariate data is proposed. The family is based on, but greatly extends, the joint distribution of order statistics from an independent and identically distributed univariate sample. General properties, including marginal and conditional distributions, bivariate dependence, limiting distributions and links to the Dirichlet distribution are described. Univariate and bivariate special cases of the multivariate distributions, the latter including an equivalent rotated version, are considered. Two particular tractable special cases are stressed. The models are successfully and usefully fitted, by maximum likelihood, to meteorological data. The models are also applicable to data in which one variable is unconstrained and the others are all nonnegative. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
975
986
http://hdl.handle.net/10.1093/biomet/91.4.975
text/html
Access to full text is restricted to subscribers.
M. C. Jones
P. V. Larsen
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:899-9122013-03-04RePEc:oup:biomet
article
Martingale difference residuals as a diagnostic tool for the Cox model
The proportional hazards model makes two major assumptions: the hazard ratio is constant over time, and the relationship between the hazard and continuous covariates is log-linear. Methods exist for checking and relaxing each of these assumptions, but in both cases the methods rely on the other assumption being true. Problems can occur if neither of the assumptions is appropriate, or even if only one of the assumptions is appropriate but it is not known which. We propose a new kind of residual for checking the two assumptions simultaneously. The smoothed residuals provide a flexible estimate of the hazard ratio, which may deviate from the standard proportional hazards model by having a time-dependent hazard ratio, transformed covariates or both. The methods are illustrated using data from the Medical Research Council's myeloma trials. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
899
912
Peter D. Sasieni
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:729-7372013-03-04RePEc:oup:biomet
article
A note on pseudolikelihood constructed from marginal densities
For likelihood-based inference involving distributions in which high-dimensional dependencies are present it may be useful to use approximate likelihoods based, for example, on the univariate or bivariate marginal distributions. The asymptotic properties of formal maximum likelihood estimators in such cases are outlined. In particular, applications in which only a single q × 1 vector of observations is observed are examined. Conditions under which consistent estimators of parameters result from the approximate likelihood using only pairwise joint distributions are studied. Some examples are analysed in detail. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
729
737
D. R. Cox
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:299-3142013-03-04RePEc:oup:biomet
article
Modelling multivariate failure time associations in the presence of a competing risk
There has been much research on analysing multivariate failure times, but little that has accommodated failures that arise in the presence of a competing failure process. This paper studies the problem of describing associations among times to such failures. It proposes a modified conditional hazard ratio measure of association that is tailored to competing risks data, develops frailty models and a nonparametric method for describing the proposed measure, and contrasts estimation by proposed methods with the 'standard' of treating competing risks as independently censoring failure times due to targeted causes. The methods are investigated on simulated and real data. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
299
314
Karen Bandeen-Roche
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:785-8062013-03-04RePEc:oup:biomet
article
Bayesian model discrimination for multiple strata capture-recapture data
Extending the work of Dupuis (1995), we motivate a range of biologically plausible models for multiple-site capture-recapture and show how the original Gibbs sampling algorithm of Dupuis can be extended to obtain posterior model probabilities using reversible jump Markov chain Monte Carlo. This model selection procedure improves upon previous analyses in two distinct ways. First, Bayesian model averaging provides a robust parameter estimation technique which properly incorporates model uncertainty in the resulting intervals. Secondly, by discriminating among perhaps millions of competing models, we are able to discern fine structure within the data and thereby answer questions of primary biological importance. We demonstrate how reversible jump Markov chain Monte Carlo methods provide the only viable method for exploring model spaces of this size. We examine the lizard data discussed in Dupuis (1995) and show that most of the posterior mass is placed upon models not previously considered for these data. We discuss model discrimination and model averaging and focus upon the increased scientific understanding of the data obtained via the Bayesian model comparison procedure. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
785
806
R. King
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:371-3822013-03-04RePEc:oup:biomet
article
Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve
Recent scientific and technological innovations have produced an abundance of potential markers that are being investigated for their use in disease screening and diagnosis. In evaluating these markers, it is often necessary to account for covariates associated with the marker of interest. Covariates may include subject characteristics, expertise of the test operator, test procedures or aspects of specimen handling. In this paper, we propose the covariate-adjusted receiver operating characteristic curve, a measure of covariate-adjusted classification accuracy. Nonparametric and semiparametric estimators are proposed, asymptotic distribution theory is provided and finite sample performance is investigated. For illustration we characterize the age-adjusted discriminatory accuracy of prostate-specific antigen as a biomarker for prostate cancer. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
371
382
http://hdl.handle.net/10.1093/biomet/asp002
application/pdf
Access to full text is restricted to subscribers.
Holly Janes
Margaret S. Pepe
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:875-8892013-03-04RePEc:oup:biomet
article
Pairwise curve synchronization for functional data
Data collected by scientists are increasingly in the form of trajectories or curves. Often these can be viewed as realizations of a composite process driven by both amplitude and time variation. We consider the situation in which functional variation is dominated by time variation, and develop a curve-synchronization method that uses every trajectory in the sample as a reference to obtain pairwise warping functions in the first step. These initial pairwise warping functions are then used to create improved estimators of the underlying individual warping functions in the second step. A truncated averaging process is used to obtain robust estimation of individual warping functions. The method compares well with other available time-synchronization approaches and is illustrated with Berkeley growth data and gene expression data for multiple sclerosis. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
875
889
http://hdl.handle.net/10.1093/biomet/asn047
application/pdf
Access to full text is restricted to subscribers.
Rong Tang
Hans-Georg Müller
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:967-9752013-03-04RePEc:oup:biomet
article
Least absolute deviations estimation for ARCH and GARCH models
Hall & Yao (2003) showed that, for ARCH/GARCH (autoregressive conditional heteroscedastic/generalised autoregressive conditional heteroscedastic) models with heavy-tailed errors, the conventional maximum quasilikelihood estimator suffers from complex limit distributions and slow convergence rates. In this paper three types of least absolute deviations estimator are examined, and the one based on a logarithmic transformation turns out to be particularly appealing. We show that this estimator is asymptotically normal and unbiased. Furthermore, it enjoys the standard convergence rate of n^{1/2} regardless of whether the errors are heavy-tailed or not. Simulation lends further support to our theoretical results. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
967
975
Liang Peng
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:303-3172013-03-04RePEc:oup:biomet
article
Bayesian methods for partial stochastic orderings
We discuss two methods of making nonparametric Bayesian inference on probability measures subject to a partial stochastic ordering. The first method involves a nonparametric prior for a measure on partially ordered latent observations, and the second involves rejection sampling. Computational approaches are discussed for each method, and interpretations of prior and posterior information are discussed. An application is presented in which inference is made on the number of independently segregating quantitative trait loci present in an animal population. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
303
317
Peter D. Hoff
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:851-8602013-03-04RePEc:oup:biomet
article
A practical affine equivariant multivariate median
A robust affine equivariant estimator of location for multivariate data is proposed which becomes the univariate median for data of dimension one. The estimator is robust in the sense that it has a bounded influence function, a positive breakdown value and has high efficiency compared to the sample mean for heavy-tailed distributions. Perhaps its greatest strength is that, unlike other affine equivariant multivariate medians, it is easily computed for data in any practical dimension. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
851
860
Thomas P. Hettmansperger
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:335-3492013-03-04RePEc:oup:biomet
article
Multi-parameter automodels and their applications
Motivated by the modelling of non-Gaussian data or positively correlated data on a lattice, extensions of Besag's automodels to exponential families with multi-dimensional parameters have been proposed recently. We provide a multiple-parameter analogue of Besag's one-dimensional result that gives the necessary form of the exponential families for the Markov random field's conditional distributions. We propose estimation of parameters by maximum pseudolikelihood and give a proof of the consistency of the estimators for the multi-parameter automodel. The methodology is illustrated with examples, in particular the building of a cooperative system with beta conditional distributions. We also indicate future applications of these models to the analysis of mixed-state spatial data. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
335
349
http://hdl.handle.net/10.1093/biomet/asn016
application/pdf
Access to full text is restricted to subscribers.
Cécile Hardouin
Jian-Feng Yao
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:251-2672013-03-04RePEc:oup:biomet
article
Decomposability and selection of graphical models for multivariate time series
We derive conditions for decomposition and collapsibility of graphical interaction models for multivariate time series. These properties enable us to perform stepwise model selection under certain restrictions. For illustration, we apply the results to a multivariate time series describing the haemodynamic system as monitored in intensive care. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
251
267
Roland Fried
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:317-3352013-03-04RePEc:oup:biomet
article
A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models
A centred Gaussian model that is Markov with respect to an undirected graph G is characterised by the parameter set of its precision matrices, which is the cone M^+(G) of positive definite matrices with entries corresponding to the missing edges of G constrained to be equal to zero. In a Bayesian framework, the conjugate family for the precision parameter is the distribution with Wishart density with respect to the Lebesgue measure restricted to M^+(G). We call this distribution the G-Wishart. When G is nondecomposable, the normalising constant of the G-Wishart cannot be computed in closed form. In this paper, we give a simple Monte Carlo method for computing this normalising constant. The main feature of our method is that the sampling distribution is exact and consists of a product of independent univariate standard normal and chi-squared distributions that can be read off the graph G. Computing this normalising constant is necessary for obtaining the posterior distribution of G or the marginal likelihood of the corresponding graphical Gaussian model. Our method also gives a way of sampling from the posterior distribution of the precision matrix. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
317
335
http://hdl.handle.net/10.1093/biomet/92.2.317
text/html
Access to full text is restricted to subscribers.
Aliye Atay-Kayis
Helène Massam
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:691-7032013-03-04RePEc:oup:biomet
article
Adaptive Lasso for Cox's proportional hazards model
We investigate the variable selection problem for Cox's proportional hazards model, and propose a unified model selection and estimation procedure with desired theoretical properties and computational convenience. The new method is based on a penalized log partial likelihood with the adaptively weighted L1 penalty on regression coefficients, providing what we call the adaptive Lasso estimator. The method incorporates different penalties for different coefficients: unimportant variables receive larger penalties than important ones, so that important variables tend to be retained in the selection process, whereas unimportant variables are more likely to be dropped. Theoretical properties, such as consistency and rate of convergence of the estimator, are studied. We also show that, with proper choice of regularization parameters, the proposed estimator has the oracle properties. The convex optimization nature of the method leads to an efficient algorithm. Both simulated and real examples show that the method performs competitively. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
691
703
http://hdl.handle.net/10.1093/biomet/asm037
application/pdf
Access to full text is restricted to subscribers.
Hao Helen Zhang
Wenbin Lu
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:573-5862013-03-04RePEc:oup:biomet
article
Differential effects and generic biases in observational studies
There are two treatments, each of which may be applied or withheld, yielding a 2 x 2 factorial arrangement with three degrees of freedom between groups. The differential effect of the two treatments is the effect of applying one treatment in lieu of the other. In randomised experiments, the differential effect is of no more or less interest than other treatment contrasts. Differential effects play a special role in certain observational studies in which treatments are not assigned to subjects at random, where differing outcomes may reflect biased assignments rather than effects caused by the treatments. Differential effects are immune to certain types of unobserved bias, called generic biases, which are associated with both treatments in a similar way. This is explored using several examples and models. Differential effects are not immune to differential biases, whose possible consequences are examined by sensitivity analysis. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
573
586
http://hdl.handle.net/10.1093/biomet/93.3.573
text/html
Access to full text is restricted to subscribers.
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:385-3972013-03-04RePEc:oup:biomet
article
Fitting binary regression models with case-augmented samples
In a case-augmented study, measurements on a random sample from a population are augmented by information from an independent sample of cases, that is units with some characteristic of interest. We show that inferences about the effect of the covariates on the probability of being a case can be made by fitting a modified prospective likelihood. We also show that this procedure is fully efficient. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
385
397
http://hdl.handle.net/10.1093/biomet/93.2.385
text/html
Access to full text is restricted to subscribers.
A. J. Lee
A. J. Scott
C. J. Wild
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:149-1622013-03-04RePEc:oup:biomet
article
Bayesian nonparametric functional data analysis through density estimation
In many modern experimental settings, observations are obtained in the form of functions and interest focuses on inferences about a collection of such functions. We propose a hierarchical model that allows us simultaneously to estimate multiple curves nonparametrically by using dependent Dirichlet process mixtures of Gaussian distributions to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of conductivity and temperature at depth data in the north Atlantic. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
149
162
http://hdl.handle.net/10.1093/biomet/asn054
application/pdf
Access to full text is restricted to subscribers.
Abel Rodríguez
David B. Dunson
Alan E. Gelfand
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:859-8792013-03-04RePEc:oup:biomet
article
A hybrid estimator in nonlinear and generalised linear mixed effects models
A hybrid method that combines Laplace's approximation and Monte Carlo simulations to evaluate integrals in the likelihood function is proposed for estimation of the parameters in nonlinear mixed effects models that assume a normal parametric family for the random effects. Simulations show that these parametric estimates of fixed effects are close to the nonparametric estimates even though the mixing distribution is far from the assumed normal parametric family. An asymptotic theory of this hybrid method for parametric estimation without requiring the true mixing distribution to belong to the assumed parametric family is developed to explain these results. This hybrid method and its asymptotic theory are also extended to generalised linear mixed effects models. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
859
879
Tze Leung Lai
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:229-2332013-03-04RePEc:oup:biomet
article
An examination of the effect of heterogeneity on the estimation of population size using capture-recapture data
Part of the folklore of capture-recapture experiments is that ignoring heterogeneity of capture probabilities results in a downward bias. This has been based on experience and simulation studies, but is often interpreted as being due to individuals with lower capture probabilities. Here estimating equation arguments are used to show that the effect on Horvitz-Thompson-type estimators of ignoring heterogeneity in capture-recapture experiments is to introduce a downward bias. The arguments are extended to continuous-time experiments, and an influence function is constructed to determine the effect of a small number of individuals with heterogeneous capture probabilities in an otherwise homogeneous population; this influence function is shown to be negative. The downward bias holds even if the small number of heterogeneous individuals have capture probabilities larger than those of the homogeneous majority, and this is confirmed by simulations. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
229
233
http://hdl.handle.net/10.1093/biomet/92.1.229
text/html
Access to full text is restricted to subscribers.
Wen-Han Hwang
Richard Huggins
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:809-8252013-03-04RePEc:oup:biomet
article
Generalized Spatial Dirichlet Process Models
Many models for the study of point-referenced data explicitly introduce spatial random effects to capture residual spatial association. These spatial effects are customarily modelled as a zero-mean stationary Gaussian process. The spatial Dirichlet process introduced by Gelfand et al. (2005) produces a random spatial process which is neither Gaussian nor stationary. Rather, it varies about a process that is assumed to be stationary and Gaussian. The spatial Dirichlet process arises as a probability-weighted collection of random surfaces. This can be limiting for modelling and inferential purposes since it insists that a process realization must be one of these surfaces. We introduce a random distribution for the spatial effects that allows different surface selection at different sites. Moreover, we can specify the model so that the marginal distribution of the effect at each site still comes from a Dirichlet process. The development is offered constructively, providing a multivariate extension of the stick-breaking representation of the weights. We then introduce mixing using this generalized spatial Dirichlet process. We illustrate with a simulated dataset of independent replications and note that we can embed the generalized process within a dynamic model specification to eliminate the independence assumption. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
809
825
http://hdl.handle.net/10.1093/biomet/asm071
application/pdf
Access to full text is restricted to subscribers.
Jason A. Duan
Michele Guindani
Alan E. Gelfand
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:83-932013-03-04RePEc:oup:biomet
article
Optimal two-level regular fractional factorial block and split-plot designs
We propose a general and unified approach to the selection of regular fractional factorial designs, which can be applied to experiments that are unblocked, blocked or have a split-plot structure. Our criterion is derived as a good surrogate for the model-robustness criterion of information capacity. In the case of random block effects, it takes the ratio of intra- and interblock variances into account. In most of the cases we have examined, there exist designs that are optimal for all values of that ratio. Examples of optimal designs that depend on the ratio are provided. We also demonstrate that our criterion can further discriminate designs that cannot be distinguished by the existing minimum-aberration criteria. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
83
93
http://hdl.handle.net/10.1093/biomet/asn066
application/pdf
Access to full text is restricted to subscribers.
Ching-Shui Cheng
Pi-Wen Tsai
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:387-4022013-03-04RePEc:oup:biomet
article
Estimating a treatment effect with repeated measurements accounting for varying effectiveness duration
To assess treatment efficacy in clinical trials, certain clinical outcomes are repeatedly measured over time for the same subject. The difference in their means may characterize a treatment effect. Since treatment effectiveness lag and saturation times may exist, erosion of treatment effect often occurs during the observation period. Instead of using models based on ad hoc parametric or purely nonparametric time-varying coefficients, we model the treatment effectiveness durations, which are the time intervals between the lag and saturation times. Then we use some mean response models to include such treatment effectiveness durations. Our methodology is demonstrated by simulations and analysis of a landmark HIV/AIDS clinical trial of short-course nevirapine against mother-to-child HIV vertical transmission during labour and delivery. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
387
402
http://hdl.handle.net/10.1093/biomet/asm019
application/pdf
Access to full text is restricted to subscribers.
Y. Q. Chen
J. Yang
S. Cheng
J. B. Jackson
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:381-3972013-03-04RePEc:oup:biomet
article
Simultaneous confidence bands in spectral density estimation
We propose a method for the construction of simultaneous confidence bands for a smoothed version of the spectral density of a Gaussian process based on nonparametric kernel estimators obtained by smoothing the periodogram. A studentized statistic is used to determine the width of the band at each frequency and a frequency-domain bootstrap approach is employed to estimate the distribution of the supremum of this statistic over all frequencies. We prove by means of strong approximations that the bootstrap estimates consistently the distribution of the supremum deviation of interest and, consequently, that the proposed confidence bands achieve asymptotically the desired simultaneous coverage probability. The behaviour of our method in finite-sample situations is investigated by simulations and a real-life data example demonstrates its applicability in time series analysis. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
381
397
http://hdl.handle.net/10.1093/biomet/asn005
application/pdf
Access to full text is restricted to subscribers.
Michael H. Neumann
Efstathios Paparoditis
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:787-8072013-03-04RePEc:oup:biomet
article
Population-Based Reversible Jump Markov Chain Monte Carlo
We present an extension of population-based Markov chain Monte Carlo to the transdimensional case. A major challenge is that of simulating from high- and transdimensional target measures. In such cases, Markov chain Monte Carlo methods may not adequately traverse the support of the target; the simulation results will be unreliable. We develop population methods to deal with such problems, and give a result proving the uniform ergodicity of these population algorithms, under mild assumptions. This result is used to demonstrate the superiority, in terms of convergence rate, of a population transition kernel over a reversible jump sampler for a Bayesian variable selection problem. We also give an example of a population algorithm for a Bayesian multivariate mixture model with an unknown number of components. This is applied to gene expression data of 1000 data points in six dimensions and it is demonstrated that our algorithm outperforms some competing Markov chain samplers. In this example, we show how to combine the methods of parallel chains (Geyer, 1991), tempering (Geyer & Thompson, 1995), snooker algorithms (Gilks et al., 1994), constrained sampling and delayed rejection (Green & Mira, 2001). Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
787
807
http://hdl.handle.net/10.1093/biomet/asm069
application/pdf
Access to full text is restricted to subscribers.
Ajay Jasra
David A. Stephens
Christopher C. Holmes
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:953-9662013-03-04RePEc:oup:biomet
article
Inverse probability weighting for clustered nonresponse
Correlated nonresponse within clusters arises in certain survey settings. It is often represented by a random effects model and assumed to be cluster-specific nonignorable, in the sense that survey and nonresponse outcomes are conditionally independent given cluster-level random effects. Two basic forms of inverse probability weights are considered: response propensity weights based on a marginal model, and weights based on predicted random effects. It is shown that both approaches can lead to biased estimation under cluster-specific nonignorable nonresponse, when the cluster sample sizes are small. We propose a new form of weighted estimator based upon conditional logistic regression, which can avoid this bias. An associated estimator of variance and an extension to observational studies with clustered treatment assignment are also described. Properties of the alternative estimators are illustrated in a small simulation study. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
953
966
http://hdl.handle.net/10.1093/biomet/asr058
application/pdf
Access to full text is restricted to subscribers.
C. J. Skinner
D'Arrigo
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:363-3822013-03-04RePEc:oup:biomet
article
Estimating vaccine efficacy from small outbreaks
Let C_V and C_0 denote the number of cases among vaccinated and unvaccinated individuals, respectively, and let v be the proportion of individuals vaccinated. The quantity ê = 1 - (1 - v)C_V/(vC_0) = 1 - (relative attack rate) is the most used estimator of the effectiveness of a vaccine to protect against infection. For a wide class of vaccine responses, a family of transmission models and three types of community settings, this paper investigates what ê actually estimates. It does so under the assumption that the community is large and the vaccination coverage is adequate to prevent major outbreaks of the infectious disease, so that only data on minor outbreaks are available. For a community of homogeneous individuals who mix uniformly, it is found that ê estimates a quantity with the interpretation of 1 - (mean susceptibility, per contact, of vaccinees relative to unvaccinated individuals). We provide a standard error for ê in this setting. For a community with some heterogeneity, ê can be a very misleading estimator of the effectiveness of the vaccine. When individuals have inherent differences, ê estimates a quantity that depends also on the inherent susceptibilities of different types of individual and on the vaccination coverage for different types. For a community of households, ê estimates a quantity that depends on the rate of transmission within households and on the reduction in infectivity induced by the vaccine. In communities that are structured, into households or age-groups, it is possible that ê estimates a value that is negative even when the vaccine reduces both susceptibility and infectivity. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
363
382
Niels G. Becker
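Becker's abstract is built around the relative-attack-rate estimator ê = 1 - (1 - v)C_V/(vC_0). A minimal sketch of that arithmetic, assuming only the case counts and coverage named in the abstract (the function name `vaccine_effectiveness` is our own, not from the paper):

```python
# Hypothetical illustration (not part of the RePEc record): the standard
# vaccine-effectiveness estimator described in Becker's abstract,
#   e_hat = 1 - (1 - v) * C_V / (v * C_0),
# where C_V and C_0 are case counts among vaccinated and unvaccinated
# individuals and v is the vaccination coverage.

def vaccine_effectiveness(cases_vaccinated: int, cases_unvaccinated: int,
                          coverage: float) -> float:
    """One minus the relative attack rate."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    attack_rate_ratio = ((1 - coverage) * cases_vaccinated
                         / (coverage * cases_unvaccinated))
    return 1 - attack_rate_ratio

# Example: 10 cases among the 80% vaccinated, 40 among the 20% unvaccinated.
print(vaccine_effectiveness(10, 40, 0.8))  # 0.9375
```

As the abstract warns, this point estimate is only interpretable under homogeneous uniform mixing; in structured communities it can be badly misleading.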
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:157-1692013-03-04RePEc:oup:biomet
article
Random effects Cox models: A Poisson modelling approach
We propose a Poisson modelling approach to nested random effects Cox proportional hazards models. An important feature of this approach is that the principal results depend only on the first and second moments of the unobserved random effects. The orthodox best linear unbiased predictor approach to random effects Poisson modelling techniques enables us to justify appropriate consistency and optimality. The explicit expressions for the random effects given by our approach facilitate incorporation of a relatively large number of random effects. The use of the proposed methods is illustrated through the reanalysis of data from a large-scale cohort study of particulate air pollution and mortality previously reported by Pope et al. (1995). Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
157
169
Renjun Ma
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:319-3262013-03-04RePEc:oup:biomet
article
Bayesian empirical likelihood
Research has shown that empirical likelihood tests have many of the same asymptotic properties as those derived from parametric likelihoods. This leads naturally to the possibility of using empirical likelihood as the basis for Bayesian inference. Different ways in which this goal might be accomplished are considered. The validity of the resultant posterior inferences is examined, as are frequentist properties of the Bayesian empirical likelihood intervals. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
319
326
Nicole A. Lazar
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:583-5982013-03-04RePEc:oup:biomet
article
Functional mixed effects spectral analysis
In many experiments, time series data can be collected from multiple units and multiple time series segments can be collected from the same unit. This article introduces a mixed effects Cramér spectral representation which can be used to model the effects of design covariates on the second-order power spectrum while accounting for potential correlations among the time series segments collected from the same unit. The transfer function is composed of a deterministic component to account for the population-average effects and a random component to account for the unit-specific deviations. The resulting log-spectrum has a functional mixed effects representation where both the fixed effects and random effects are functions in the frequency domain. It is shown that, when the replicate-specific spectra are smooth, the log-periodograms converge to a functional mixed effects model. A data-driven iterative estimation procedure is offered for the periodic smoothing spline estimation of the fixed effects, penalized estimation of the functional covariance of the random effects, and unit-specific random effects prediction via the best linear unbiased predictor. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
583
598
http://hdl.handle.net/10.1093/biomet/asr032
application/pdf
Access to full text is restricted to subscribers.
Robert T. Krafty
Martica Hall
Wensheng Guo
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:419-4342013-03-04RePEc:oup:biomet
article
Hierarchical models for assessing variability among functions
In many applications of functional data analysis, summarising functional variation based on fits, without taking account of the estimation process, runs the risk of attributing the estimation variation to the functional variation, thereby overstating the latter. For example, the first eigenvalue of a sample covariance matrix computed from estimated functions may be biased upwards. We display a set of estimated neuronal Poisson-process intensity functions where this bias is substantial, and we discuss two methods for accounting for estimation variation. One method uses a random-coefficient model, which requires all functions to be fitted with the same basis functions. An alternative method removes the same-basis restriction by means of a hierarchical Gaussian process model. In a small simulation study the hierarchical Gaussian process model outperformed the random-coefficient model and greatly reduced the bias in the estimated first eigenvalue that would result from ignoring estimation variability. For the neuronal data the hierarchical Gaussian process estimate of the first eigenvalue was much smaller than the naive estimate that ignored variability due to function estimation. The neuronal setting also illustrates the benefit of incorporating alignment parameters into the hierarchical scheme. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
419
434
http://hdl.handle.net/10.1093/biomet/92.2.419
text/html
Access to full text is restricted to subscribers.
Sam Behseta
Robert E. Kass
Garrick L. Wallstrom
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:497-5052013-03-04RePEc:oup:biomet
article
Posterior probability intervals in Bayesian wavelet estimation
We use saddlepoint approximation to derive credible intervals for Bayesian wavelet regression estimates. Simulations show that the resulting intervals perform better than the best existing method. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
497
505
C. Semadeni
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:831-8402013-03-04RePEc:oup:biomet
article
Estimation in a simple random effects model with nonnormal distributions
A simple structural model is considered involving the addition of two random variables representing between- and within-group variation. Methods for estimating the cumulants of the two components of variation are proposed, based on homogeneous polynomials in the data. Emphasis is placed on situations in which the number of observations per group is quite small. In some cases an essentially unique estimator is available, whereas in others there is a family of possible consistent estimators. The choice of the polynomial is considered. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
831
840
D. R. Cox
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:285-2962013-03-04RePEc:oup:biomet
article
Marginal tests with sliced average variance estimation
We present a new computationally feasible test for the dimension of the central subspace in a regression problem based on sliced average variance estimation. We also provide a marginal coordinate test. Under the null hypothesis, both the test of dimension and the marginal coordinate test involve test statistics that asymptotically have chi-squared distributions given normally distributed predictors, and have a distribution that is a linear combination of chi-squared distributions in general. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
285
296
http://hdl.handle.net/10.1093/biomet/asm021
application/pdf
Access to full text is restricted to subscribers.
Yongwu Shao
R. Dennis Cook
Sanford Weisberg
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:719-7232013-03-04RePEc:oup:biomet
article
Probabilistic model for two dependent circular variables
Motivated by problems in molecular biology and molecular physics, we propose a five-parameter torus analogue of the bivariate normal distribution for modelling the distribution of two circular random variables. The conditional distributions of the proposed distribution are von Mises. The marginal distributions are symmetric around their means and are either unimodal or bimodal. The type of shape depends on the configuration of parameters, and we derive the conditions that ensure a specific shape. The utility of the proposed distribution is illustrated by the modelling of angular variables in a short linear peptide. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
719
723
Harshinder Singh
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:741-7462013-03-04RePEc:oup:biomet
article
Robust variance estimation for rate ratio parameter estimates from individually matched case-control data
The asymptotic variance and robust variance estimators of rate ratios estimated using conditional logistic regression from individually matched case-control data are derived when the presumed proportional hazards model is misspecified. The robust variance estimators are easily computed using Schoenfeld residuals generated from standard partial likelihood estimation software for failure time data. Simulation studies indicate that the robust variance estimators perform well for typical sample sizes and that the 'rare disease' version should be adequate for all practical purposes. It was also found that model misspecification must be quite extreme before the model-based, i.e. inverse information, variance is significantly biased, and that the robust variance estimators are somewhat more variable than the model-based estimator. We conclude that the model-based variance estimator can be used when model misspecification is not severe. The robust estimator should be used when the presumed model clearly fits the data poorly. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
741
746
Anny Hui Xiang
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:197-2102013-03-04RePEc:oup:biomet
article
Spectral methods for nonstationary spatial processes
We propose a nonstationary periodogram and various parametric approaches for estimating the spectral density of a nonstationary spatial process. We also study the asymptotic properties of the proposed estimators via shrinking asymptotics, assuming the distance between neighbouring observations tends to zero as the size of the observation region grows without bound. With this type of asymptotic model we can uniquely determine the spectral density, avoiding the aliasing problem. We also present a new class of nonstationary processes, based on a convolution of local stationary processes. This model has the advantage that the model is simultaneously defined everywhere, unlike 'moving window' approaches, but it retains the attractive property that, locally in small regions, it behaves like a stationary spatial process. Applications include the spatial analysis and modelling of air pollution data provided by the US Environmental Protection Agency. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
197
210
Montserrat Fuentes
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:539-5522013-03-04RePEc:oup:biomet
article
A sequential particle filter method for static models
Particle filter methods are complex inference procedures, which combine importance sampling and Monte Carlo schemes in order to explore consistently a sequence of multiple distributions of interest. We show that such methods can also offer an efficient estimation tool in 'static' set-ups, in which case π(θ | y_1, …, y_N) is the only posterior distribution of interest, but the preliminary exploration of partial posteriors π(θ | y_1, …, y_n) (n < N) makes it possible to save computing time. A complete algorithm is proposed for independent or Markov models. Our method is shown to challenge other common estimation procedures in terms of robustness and execution time, especially when the sample size is large. Two classes of examples, mixture models and discrete generalised linear models, are discussed and illustrated by numerical results. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
539
552
Nicolas Chopin
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:567-5762013-03-04RePEc:oup:biomet
article
On the geometry of measurement error models
The problem of undertaking inference in the classical linear model when the covariates have been measured with error is investigated from a geometric point of view. Under the assumption that the measurement error is small, relative to the total variation in the data, a new model is proposed which has good inferential properties. An inference technique which exploits the geometric structure is shown to be computationally simple, efficient and robust to measurement error. The method proposed is illustrated by simulation studies. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
567
576
Paul Marriott
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:271-2822013-03-04RePEc:oup:biomet
article
Empirical-likelihood-based semiparametric inference for the treatment effect in the two-sample problem with censoring
To compare two samples of censored data, we propose a unified method of semiparametric inference for the parameter of interest when the model for one sample is parametric and that for the other is nonparametric. The parameter of interest may represent, for example, a comparison of means, or survival probabilities. The confidence interval derived from the semiparametric inference, which is based on the empirical likelihood principle, improves its counterpart constructed from the common estimating equation. The empirical likelihood ratio is shown to be asymptotically chi-squared. Simulation experiments illustrate that the method based on the empirical likelihood substantially outperforms the method based on the estimating equation. A real dataset is analysed. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
271
282
http://hdl.handle.net/10.1093/biomet/92.2.271
text/html
Access to full text is restricted to subscribers.
Yong Zhou
Hua Liang
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:715-7272013-03-04RePEc:oup:biomet
article
A superiority-equivalence approach to one-sided tests on multiple endpoints in clinical trials
This paper considers the problem of comparing a new treatment with a control based on multiple endpoints. The hypotheses are formulated with the goal of showing that the treatment is equivalent, i.e. not inferior, on all endpoints and superior on at least one endpoint compared to the control, where thresholds for equivalence and superiority are specified for each endpoint. Roy's (1953) union-intersection and Berger's (1982) intersection-union principles are employed to derive the basic test. It is shown that the critical constants required for the union-intersection test of superiority can be sharpened by a careful analysis of its type I error rate. The composite UI-IU test is illustrated by an example and compared in a simulation study to alternative tests proposed by Bloch et al. (2001) and Perlman & Wu (2004). The Bloch et al. test does not control the type I error rate because of its nonmonotone nature, and is hence not recommended. The UI-IU and the Perlman & Wu tests both control the type I error rate, but the latter test generally has a slightly higher power. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
715
727
Ajit C. Tamhane
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:737-7462013-03-04RePEc:oup:biomet
article
The Stein–James estimator for short- and long-memory Gaussian processes
We investigate the mean squared error of the Stein–James estimator for the mean when the observations are generated from a Gaussian vector stationary process with dimension greater than two. First, assuming that the process is short-memory, we evaluate the mean squared error and compare it with that for the sample mean. Then a sufficient condition for the Stein–James estimator to improve upon the sample mean is given in terms of the spectral density matrix around the origin. We repeat the analysis for Gaussian vector long-memory processes. Numerical examples clearly illuminate the Stein–James phenomenon for dependent samples. The results have the potential to improve the usual trend estimator in time series regression models. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
737
746
http://hdl.handle.net/10.1093/biomet/92.3.737
text/html
Access to full text is restricted to subscribers.
Masanobu Taniguchi
Junichi Hirukawa
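Taniguchi & Hirukawa study the Stein–James estimator under dependence; as background, here is the classical James–Stein shrinkage formula for an independent p-dimensional normal mean with known unit variance, which their dependent-data setting generalizes. This is a sketch of the textbook estimator only, not the paper's version:

```python
import numpy as np

# Background sketch (not from the RePEc record): the classical James–Stein
# estimator shrinks the observed mean vector x toward the origin by the
# factor 1 - (p - 2)/||x||^2, improving on x in mean squared error when p > 2.

def james_stein(x: np.ndarray) -> np.ndarray:
    """James–Stein shrinkage of an observed p-vector toward the origin."""
    p = x.size
    if p <= 2:
        raise ValueError("James-Stein requires dimension p > 2")
    shrinkage = 1 - (p - 2) / np.dot(x, x)
    return shrinkage * x
```

The dimension check reflects the abstract's "dimension greater than two" condition: for p ≤ 2 no estimator of this form improves uniformly on the sample mean.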
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:367-3782013-03-04RePEc:oup:biomet
article
On the inefficiency of the adaptive design for monitoring clinical trials
Adaptive designs, which allow the sample size to be modified based on sequentially computed observed treatment differences, have been advocated recently for monitoring clinical trials. Although such methods have a great deal of appeal on the surface, we show that such methods are inefficient and that one can improve uniformly on such adaptive designs using standard group-sequential tests based on the sequentially computed likelihood ratio test statistic. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
367
378
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:351-3702013-03-04RePEc:oup:biomet
article
Conditional Akaike information for mixed-effects models
This paper focuses on the Akaike information criterion, AIC, for linear mixed-effects models in the analysis of clustered data. We make the distinction between questions regarding the population and questions regarding the particular clusters in the data. We show that the AIC in current use is not appropriate for the focus on clusters, and we propose instead the conditional Akaike information and its corresponding criterion, the conditional AIC, cAIC. The penalty term in cAIC is related to the effective degrees of freedom ρ for a linear mixed model proposed by Hodges & Sargent (2001); ρ reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects. The cAIC is defined for both maximum likelihood and residual maximum likelihood estimation. A pharmacokinetics data application is used to illuminate the distinction between the two inference settings, and to illustrate the use of the conditional AIC in model selection. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
351
370
http://hdl.handle.net/10.1093/biomet/92.2.351
text/html
Access to full text is restricted to subscribers.
Florin Vaida
Suzette Blanchard
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:389-3992013-03-04RePEc:oup:biomet
article
Analysing longitudinal count data with overdispersion
In many biomedical studies, longitudinal count data comprise repeated responses and a set of multidimensional covariates for a large number of individuals. When the response variable in such models is subject to overdispersion, the overdispersion parameter influences the marginal variance. In such cases, the overdispersion parameter plays a significant role in efficient estimation of the regression parameters. This raises the need for joint estimation of the regression parameters and the overdispersion parameter, the longitudinal correlations being nuisance parameters. In this paper, we develop a generalised estimating equations approach based on a general autocorrelation structure for the repeated overdispersed data. The asymptotic properties of the estimators of the main parameters are discussed, and the estimation methodology is illustrated by analysing data on epileptic seizure counts. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
389
399
Vandna Jowaheer
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:663-6842013-03-04RePEc:oup:biomet
article
Efficient restricted estimators for conditional mean models with missing data
Consider a conditional mean model with missing data on the response or explanatory variables due to two-phase sampling or nonresponse. Robins et al. (1994) introduced a class of augmented inverse-probability-weighted estimators, depending on a vector of functions of explanatory variables and a vector of functions of coarsened data. Tsiatis (2006) studied two classes of restricted estimators, class 1 with both vectors restricted to finite-dimensional linear subspaces and class 2 with the first vector of functions restricted to a finite-dimensional linear subspace. We introduce a third class of restricted estimators, class 3, with the second vector of functions restricted to a finite-dimensional subspace. We derive a new estimator, which is asymptotically optimal in class 1, by the methods of nonparametric and empirical likelihood. We propose a hybrid strategy to obtain estimators that are asymptotically optimal in class 1 and locally optimal in class 2 or class 3. The advantages of the hybrid, likelihood estimator based on classes 1 and 3 are shown in a simulation study and a real-data example. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
663
684
http://hdl.handle.net/10.1093/biomet/asr007
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:601-6112013-03-04RePEc:oup:biomet
article
On recovering a population covariance matrix in the presence of selection bias
This paper considers the problem of using observational data in the presence of selection bias to identify causal effects in the framework of linear structural equation models. We propose a criterion for testing whether or not observed statistical dependencies among variables are generated by conditioning on a common response variable. When the answer is affirmative, we further provide formulations for recovering the covariance matrix of the whole population from that of the selected population. The results of this paper provide guidance for reliable causal inference, based on the recovered covariance matrix obtained from the statistical information with selection bias. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
601
611
http://hdl.handle.net/10.1093/biomet/93.3.601
text/html
Access to full text is restricted to subscribers.
Manabu Kuroki
Zhihong Cai
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:861-8722013-03-04RePEc:oup:biomet
article
Aalen Additive Hazards Change-Point Model
We study a test comparing the full Aalen additive hazards model and the change-point model, and suggest how to estimate the parameters of the change-point model. We also study a test for no change-point effect. Both tests are provided with large sample properties and a resampling method is applied to obtain p-values. The finite-sample properties of the proposed inference procedures and estimators are assessed through a simulation study. The methods are further applied to a dataset concerning myocardial infarction. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
861
872
http://hdl.handle.net/10.1093/biomet/asm054
application/pdf
Access to full text is restricted to subscribers.
Torben Martinussen
Thomas H. Scheike
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:257-2632013-03-04RePEc:oup:biomet
article
Asymptotic inference for a nonstationary double AR(1) model
We investigate the nonstationary double AR(1) model y_t = φy_{t-1} + η_t√(ω + αy_{t-1}²), where ω > 0, α > 0, the η_t are independent standard normal random variables and E log|φ + η_t√α| ⩾ 0. We show that the maximum likelihood estimator of (φ, α) is consistent and asymptotically normal. Combination of this result with that of Ling (2004) for the stationary case gives the asymptotic normality of the maximum likelihood estimator of φ for any φ in the real line, with a root-n rate of convergence. This is in contrast to the results for the classical AR(1) model, corresponding to α = 0. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
257
263
http://hdl.handle.net/10.1093/biomet/asm084
application/pdf
Access to full text is restricted to subscribers.
Shiqing Ling
Dong Li
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:760-7662013-03-04RePEc:oup:biomet
article
The high-dimension, low-sample-size geometric representation holds under mild conditions
High-dimension, low-sample-size datasets have different geometrical properties from those of traditional low-dimensional data. In their asymptotic study regarding increasing dimensionality with a fixed sample size, Hall et al. (2005) showed that each data vector is approximately located on the vertices of a regular simplex in a high-dimensional space. A perhaps unappealing aspect of their result is the underlying assumption which requires the variables, viewed as a time series, to be almost independent. We establish an equivalent geometric representation under much milder conditions using asymptotic properties of sample covariance matrices. We discuss implications of the results, such as the use of principal component analysis in a high-dimensional space, extension to the case of nonindependent samples and also the binary classification problem. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
760
766
http://hdl.handle.net/10.1093/biomet/asm050
application/pdf
Access to full text is restricted to subscribers.
Jeongyoun Ahn
J. S. Marron
Keith M. Muller
Yueh-Yun Chi
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:241-2472013-03-04RePEc:oup:biomet
article
A note on path-based variable selection in the penalized proportional hazards model
We propose an efficient and adaptive shrinkage method for variable selection in the Cox model. The method constructs a piecewise-linear regularization path connecting the maximum partial likelihood estimator and the origin. Then a model is selected along the path. We show that the constructed path is adaptive in the sense that, with a proper choice of regularization parameter, the fitted model works as well as if the true underlying submodel were given in advance. A modified algorithm of the least-angle-regression type efficiently computes the entire regularization path of the new estimator. Furthermore, we show that, with a proper choice of shrinkage parameter, the method is consistent in variable selection and efficient in estimation. Simulation shows that the new method tends to outperform the lasso and the smoothly-clipped-absolute-deviation estimators with moderate samples. We apply the methodology to data concerning nursing homes. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
241
247
http://hdl.handle.net/10.1093/biomet/asm083
application/pdf
Access to full text is restricted to subscribers.
Hui Zou
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:329-3422013-03-04RePEc:oup:biomet
article
On the accelerated failure time model for current status and interval censored data
This paper introduces a novel approach to making inference about the regression parameters in the accelerated failure time model for current status and interval censored data. The estimator is constructed by inverting a Wald-type test for testing a null proportional hazards model. A numerically efficient Markov chain Monte Carlo based resampling method is proposed for obtaining simultaneously the point estimator and a consistent estimator of its variance-covariance matrix. We illustrate our approach with interval censored datasets from two clinical studies. Extensive numerical studies are conducted to evaluate the finite-sample performance of the new estimators. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
329
342
http://hdl.handle.net/10.1093/biomet/93.2.329
text/html
Access to full text is restricted to subscribers.
Lu Tian
Tianxi Cai
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:893-9042013-03-04RePEc:oup:biomet
article
The Role of Pseudo Data for Robust Smoothing with Application to Wavelet Regression
We propose a robust curve and surface estimator based on M-type estimators and penalty-based smoothing. This approach also includes an application to wavelet regression. The concept of pseudo data, a transformation of the robust additive model to the one with bounded errors, is used to derive some theoretical properties and also motivate a computational algorithm. The resulting algorithm, termed the es-algorithm, is computationally fast and provides a simple way of choosing the amount of smoothing. Moreover, it is easily described, straightforwardly implemented and can be extended to other wavelet regression settings such as irregularly spaced data and image denoising. Results from a simulation study and real data examples demonstrate the promising empirical properties of the proposed approach. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
893
904
http://hdl.handle.net/10.1093/biomet/asm064
application/pdf
Access to full text is restricted to subscribers.
Hee-Seok Oh
Douglas W. Nychka
Thomas C. M. Lee
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:246-2482013-03-04RePEc:oup:biomet
article
A multi-move sampler for estimating non-Gaussian time series models: Comments on Shephard & Pitt (1997)
This note points out a problem in the multi-move sampler as proposed by Shephard & Pitt (1997) and provides an alternative correct formulation. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
246
248
Toshiaki Watanabe
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:135-1482013-03-04RePEc:oup:biomet
article
Discrete-transform approach to deconvolution problems
If Fourier series are used as the basis for inference in deconvolution problems, the effects of the errors factorise out in a way that is easily exploited empirically. This property is the consequence of elementary addition formulae for sine and cosine functions, and is not readily available when one is using methods based on other orthogonal series or on continuous Fourier transforms. It allows relatively simple estimators to be constructed, founded on the addition of finite series rather than on integration. The performance of these methods can be particularly effective when edge effects are involved, since cosine series estimators are quite resistant to boundary problems. In this context we point to the advantages of trigonometric-series methods for density deconvolution; they have better mean squared error performance when edge effects are involved, they are particularly easy to code, and they admit a simple approach to empirical choice of smoothing parameter, in which a version of thresholding, familiar in wavelet-based inference, is used in place of conventional smoothing. Applications to other deconvolution problems are briefly discussed. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
135
148
http://hdl.handle.net/10.1093/biomet/92.1.135
text/html
Access to full text is restricted to subscribers.
Peter Hall
Peihua Qiu
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:667-6782013-03-04RePEc:oup:biomet
article
Nonparametric inference in multivariate mixtures
We consider mixture models in which the components of data vectors from any given subpopulation are statistically independent, or independent in blocks. We argue that if, under this condition of independence, we take a nonparametric view of the problem and allow the number of subpopulations to be quite general, the distributions and mixing proportions can often be estimated root-n consistently. Indeed, we show that, if the data are k-variate and there are p subpopulations, then for each p ⩾ 2 there is a minimal value of k, k_p say, such that the mixture problem is always nonparametrically identifiable, and all distributions and mixture proportions are nonparametrically identifiable when k ⩾ k_p. We treat the case p = 2 in detail, and there we show how to construct explicit distribution, density and mixture-proportion estimators, converging at conventional rates. Other values of p can be addressed using a similar approach, although the methodology becomes rapidly more complex as p increases. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
667
678
http://hdl.handle.net/10.1093/biomet/92.3.667
text/html
Access to full text is restricted to subscribers.
Peter Hall
Amnon Neeman
Reza Pakyari
Ryan Elmore
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1011-10172013-03-04RePEc:oup:biomet
article
Multivariate logistic models
The multivariate logistic transform is a reparameterisation of cell probabilities in terms of marginal logistic contrasts. It is known that an arbitrary set of logistic contrasts may not correspond to a valid joint distribution. In this paper we present an efficient algorithm for detecting whether or not the inverse transform exists, and for computing it if it does. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1011
1017
http://hdl.handle.net/10.1093/biomet/93.4.1011
text/html
Access to full text is restricted to subscribers.
Bahjat F. Qaqish
Anastasia Ivanova
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:835-8482013-03-04RePEc:oup:biomet
article
Locally efficient semiparametric estimators for functional measurement error models
A class of semiparametric estimators are proposed in the general setting of functional measurement error models. The estimators follow from estimating equations that are based on the semiparametric efficient score derived under a possibly incorrect distributional assumption for the unobserved 'measured with error' covariates. It is shown that such estimators are consistent and asymptotically normal even with misspecification and are efficient if computed under the truth. The methods are demonstrated with a simulation study of a quadratic logistic regression model with measurement error. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
835
848
http://hdl.handle.net/10.1093/biomet/91.4.835
text/html
Access to full text is restricted to subscribers.
Anastasios A. Tsiatis
Yanyuan Ma
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:99-1122013-03-04RePEc:oup:biomet
article
Robust and efficient estimation under data grouping
The minimum Hellinger distance estimator is known to have desirable properties in terms of robustness and efficiency. We propose an approximate minimum Hellinger distance estimator by adapting the approach to grouped data from a continuous distribution. It is easier to compute the approximate version for either the continuous data or the grouped data. Given certain conditions on the model distribution and reasonable grouping rules, the approximate minimum Hellinger distance estimator is shown to be consistent and asymptotically normal. Furthermore, it is robust and can be asymptotically as efficient as the maximum likelihood estimator. The merit of the estimator is demonstrated through simulation studies and real data examples. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
99
112
http://hdl.handle.net/10.1093/biomet/93.1.99
text/html
Access to full text is restricted to subscribers.
Nan Lin
Xuming He
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:481-4882013-03-04RePEc:oup:biomet
article
The prognostic analogue of the propensity score
The propensity score collapses the covariates of an observational study into a single measure summarizing their joint association with treatment conditions; prognostic scores summarize covariates' association with potential responses. As with propensity scores, stratification on prognostic scores brings to uncontrolled studies a concrete and desirable form of balance, a balance that is more familiar as an objective of experimental control. Like propensity scores, prognostic scores can reduce the dimension of the covariate, yet causal inferences conditional on them are as valid as are inferences conditional only on the unreduced covariate. As a method of adjustment unto itself, prognostic scoring has limitations not shared with propensity scoring, but it holds promise as a complement to the propensity score, particularly in certain designs for which unassisted propensity adjustment is difficult or infeasible. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
481
488
http://hdl.handle.net/10.1093/biomet/asn004
application/pdf
Access to full text is restricted to subscribers.
Ben B. Hansen
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:573-5852013-03-04RePEc:oup:biomet
article
Influence functions and robust Bayes and empirical Bayes small area estimation
We introduce new robust small area estimation procedures based on area-level models. We first find influence functions corresponding to each individual area-level observation by measuring the divergence between the posterior density functions of regression coefficients with and without that observation. Next, based on these influence functions, properly standardized, we propose some new robust Bayes and empirical Bayes small area estimators. The mean squared errors and estimated mean squared errors of these estimators are also found. A small simulation study compares the performance of the robust and the regular empirical Bayes estimators. When the model variance is larger than the sample variance, the proposed robust empirical Bayes estimators are superior. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
573
585
http://hdl.handle.net/10.1093/biomet/asn030
application/pdf
Access to full text is restricted to subscribers.
Malay Ghosh
Tapabrata Maiti
Ananya Roy
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:613-6272013-03-04RePEc:oup:biomet
article
Bayesian inference for Markov processes with diffusion and discrete components
Data arising in certain radio-tracking experiments consist of both a continuous spatial component and a discrete component related to behaviour. This leads naturally to stochastic models with a state space which is a product of continuous and discrete components. We consider a class of such models in continuous time, which can be thought of as diffusions in random environments. They are related to switching diffusion or hidden Markov models, but observations are made on both components at discrete time points, so that neither component is completely 'hidden'. We describe and illustrate an approach to fully Bayesian inference for these general models. The algorithm used is a hybrid Markov chain Monte Carlo method. The diffusion parameters, the environment parameters and the sample path of the environment process itself are updated separately, in sequence, and the individual steps are a mixture of Gibbs and random walk Metropolis–Hastings types. Some implementation and model checking issues are discussed, and an example using data arising from a radio-tracking experiment is described. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
613
627
P. G. Blackwell
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:799-8122013-03-04RePEc:oup:biomet
article
Covariance reducing models: An alternative to spectral modelling of covariance matrices
We introduce covariance reducing models for studying the sample covariance matrices of a random vector observed in different populations. The models are based on reducing the sample covariance matrices to an informational core that is sufficient to characterize the variance heterogeneity among the populations. They possess useful equivariance properties and provide a clear alternative to spectral models for covariance matrices. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
799
812
http://hdl.handle.net/10.1093/biomet/asn052
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Liliana Forzani
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:99-1122013-03-04RePEc:oup:biomet
article
Likelihood inference in nearest-neighbour classification models
Traditionally the neighbourhood size k in the k-nearest-neighbour algorithm is either fixed at the first nearest neighbour or is selected on the basis of a crossvalidation study. In this paper we present an alternative approach that develops the k-nearest-neighbour algorithm using likelihood-based inference. Our method takes the form of a generalised linear regression on a set of k-nearest-neighbour autocovariates. By defining the k-nearest-neighbour algorithm in this way we are able to extend the method to accommodate the original predictor variables as possible linear effects as well as allowing for the inclusion of multiple nearest-neighbour terms. The choice of the final model proceeds via a stepwise regression procedure. It is shown that our method incorporates a conventional generalised linear model and a conventional k-nearest-neighbour algorithm as special cases. Empirical results suggest that the method outperforms the standard k-nearest-neighbour method in terms of misclassification rate on a wide variety of datasets. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
99
112
Christopher C. Holmes
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:491-5122013-03-04RePEc:oup:biomet
article
Expected-posterior prior distributions for model selection
We consider the problem of comparing parametric models using a Bayesian approach. A new method of developing prior distributions for the model parameters is presented, called the expected-posterior prior approach. The idea is to define the priors for all models from a common underlying predictive distribution, in such a way that the resulting priors are amenable to modern Markov chain Monte Carlo computational techniques. The approach has subjective Bayesian and default Bayesian implementations, and overcomes the most significant impediment to Bayesian model selection, that of ensuring that prior distributions for the various models are appropriately compatible. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
491
512
Jose M. Perez
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:75-892013-03-04RePEc:oup:biomet
article
Covariate-adjusted regression
We introduce covariate-adjusted regression for situations where both predictors and response in a regression model are not directly observable, but are contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. We demonstrate how the regression coefficients can be estimated by establishing a connection to varying-coefficient regression. The proposed covariate-adjustment method is illustrated with an analysis of the regression of plasma fibrinogen concentration as response on serum transferrin level as predictor for 69 haemodialysis patients. In this example, both response and predictor are thought to be influenced in a multiplicative fashion by body mass index. A bootstrap hypothesis test enables us to test the significance of the regression parameters. We establish consistency and convergence rates of the parameter estimators for this new covariate-adjusted regression model. Simulation studies demonstrate the efficacy of the proposed method. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
75
89
http://hdl.handle.net/10.1093/biomet/92.1.75
text/html
Access to full text is restricted to subscribers.
Damla Şenturk
Hans-Georg Muller
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:305-3192013-03-04RePEc:oup:biomet
article
Weighted estimating equations for semiparametric transformation models with censored data from a case-cohort design
In a case-cohort design introduced by Prentice (1986), covariates are assembled only for a subcohort randomly selected from the entire cohort, and any additional cases outside the subcohort. Semiparametric transformation models are considered here for failure time data from the case-cohort design. Weighted estimating equations are proposed for estimation of the regression parameters. The estimation procedure of survival probability at given covariate levels is also provided. Asymptotic properties are derived for the estimators using finite population sampling theory, U-statistics theory and martingale convergence results. The finite-sample properties of the proposed estimators, as well as the efficiency relative to the full cohort estimators, are assessed via simulation studies. A case-cohort dataset from the Atherosclerosis Risk in Communities study is used to illustrate the estimating procedure. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
305
319
Lan Kong
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:939-9522013-03-04RePEc:oup:biomet
article
A Hybrid Pairwise Likelihood Method
A modification to the pairwise likelihood method is proposed, which aims to improve the estimation of the marginal distribution parameters. This is achieved by replacing the pairwise likelihood score equations, for estimating such parameters, by the optimal linear combinations of the marginal score functions. A further advantage of the proposed estimator of marginal parameters, over pairwise likelihood, is that it is robust to misspecification of the bivariate distributions as long as the univariate marginal distributions are correctly specified. While alternating logistic regression can be seen as a special case of the proposed method, it is shown that an existing generalization of alternating logistic regression applicable to ordinal data is not the same as and is inferior to the proposed method because it replaces certain conditional densities by pseudodensities that assume working independence. The fitting of the multivariate negative binomial distribution is another scenario involving intractable likelihood that calls for the use of pairwise likelihood methods, and the superiority of the modified method is demonstrated in a simulation study. Two examples, based on the analyses of salamander mating and patient-controlled analgesia data, demonstrate the usefulness of the proposed method. The possibility of combining optimally the pairwise, rather than marginal, scores is also considered and its difficulty and potential are discussed. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
939
952
http://hdl.handle.net/10.1093/biomet/asm051
application/pdf
Access to full text is restricted to subscribers.
Anthony Y. C. Kuk
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:809-8302013-03-04RePEc:oup:biomet
article
Efficient estimation of covariance selection models
A Bayesian method is proposed for estimating an inverse covariance matrix from Gaussian data. The method is based on a prior that allows the off-diagonal elements of the inverse covariance matrix to be zero, and in many applications results in a parsimonious parameterisation of the covariance matrix. No assumption is made about the structure of the corresponding graphical model, so the method applies to both nondecomposable and decomposable graphs. All the parameters are estimated by model averaging using an efficient Metropolis–Hastings sampling scheme. A simulation study demonstrates that the method produces statistically efficient estimators of the covariance matrix, when the inverse covariance matrix is sparse. The methodology is illustrated by applying it to three examples that are high-dimensional relative to the sample size. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
809
830
Frederick Wong
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:613-6252013-03-04RePEc:oup:biomet
article
On optimal crossover designs when carryover effects are proportional to direct effects
There are a number of different models for crossover designs which take account of carryover effects. Since it seems plausible that a treatment with a large direct effect should generally have a larger carryover effect, Kempton et al. (2001) considered a model where the carryover effects are proportional to the direct effects. The advantage of this model lies in the fact that there are fewer parameters to be estimated. Its problem lies in the nonlinearity of the estimators. Kempton et al. (2001) considered the least squares estimator. They point out that this estimator is asymptotically equivalent to the estimator in a linear model which assumes the true parameters to be known. For this estimator they determine optimal designs numerically for some cases. The present paper generalises some of their results. Our results are derived with the help of a generalisation of the methods used in Kunert & Martin (2000). Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
613
625
http://hdl.handle.net/10.1093/biomet/93.3.613
text/html
Access to full text is restricted to subscribers.
R. A. Bailey
J. Kunert
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:745-7542013-03-04RePEc:oup:biomet
article
Empirical supremum rejection sampling
Rejection sampling thins out samples from a candidate density from which it is easy to simulate, to obtain samples from a more awkward target density. A prerequisite is knowledge of the finite supremum of the ratio of the target and candidate densities. This severely restricts application of the method because it can be difficult to calculate the supremum. We use theoretical argument and numerical work to show that a practically perfect sample may be obtained by replacing the exact supremum with the maximum obtained from simulated candidates. We also provide diagnostics for failure of the method caused by a bad choice of candidate distribution. The implication is that essentially no theoretical work is required to apply rejection sampling in many practical cases. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
745
754
Brian S. Caffo
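The Caffo record above describes a concrete, simple procedure: run ordinary rejection sampling, but replace the exact supremum of the target-to-candidate density ratio with the maximum of that ratio over a pilot sample of simulated candidates. A minimal sketch of that idea, using a hypothetical Beta(2, 2) target and a Uniform(0, 1) candidate chosen purely for illustration (not taken from the paper):

```python
import random

def target_density(x):
    # Hypothetical target for illustration: Beta(2, 2) density on (0, 1).
    return 6.0 * x * (1.0 - x)

def empirical_sup_rejection_sampler(f, n_pilot, n_proposals, rng):
    """Rejection sampling from density f on (0, 1) with a Uniform(0, 1)
    candidate (g = 1), replacing the exact supremum of f/g by the maximum
    of f over a pilot sample of simulated candidates."""
    # Pilot stage: empirical supremum of f(x)/g(x) = f(x) over candidate draws.
    m_hat = max(f(rng.random()) for _ in range(n_pilot))
    # Standard accept/reject step using the estimated envelope m_hat.
    samples = []
    for _ in range(n_proposals):
        x = rng.random()
        if rng.random() * m_hat <= f(x):
            samples.append(x)
    return m_hat, samples

rng = random.Random(0)
m_hat, draws = empirical_sup_rejection_sampler(target_density, 5000, 10000, rng)
```

Here the true supremum is 1.5 (attained at x = 0.5), so the pilot estimate m_hat falls just below it; the paper's point is that, under conditions on the candidate choice, this estimated envelope yields a practically perfect sample.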
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:221-2282013-03-04RePEc:oup:biomet
article
A note on semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome
Outcome-dependent sampling designs have been shown to be a cost-effective way to enhance study efficiency. We show that the outcome-dependent sampling design with a continuous outcome can be viewed as an extension of the two-stage case-control designs to the continuous-outcome case. We further show that the two-stage outcome-dependent sampling has a natural link with the missing-data and biased-sampling frameworks. Through the use of semiparametric inference and missing-data techniques, we show that a certain semiparametric maximum-likelihood estimator is computationally convenient and achieves the semiparametric efficient information bound. We demonstrate this both theoretically and through simulation. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
221
228
http://hdl.handle.net/10.1093/biomet/asn073
application/pdf
Access to full text is restricted to subscribers.
Rui Song
Haibo Zhou
Michael R. Kosorok
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:997-10012013-03-04RePEc:oup:biomet
article
On consistency of Kendall's tau under censoring
Necessary and sufficient conditions for consistency of a simple estimator of Kendall's tau under bivariate censoring are presented. The results are extended to data subject to bivariate left truncation as well as right censoring. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
997
1001
http://hdl.handle.net/10.1093/biomet/asn037
application/pdf
Access to full text is restricted to subscribers.
David Oakes
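The Oakes record above concerns the censored-data analogue of the concordance-based Kendall's tau. As orientation, a minimal sketch of the uncensored building block only — tau as (concordant − discordant) over all pairs; the paper's subject, which pairs remain usable and when the estimator stays consistent under censoring or truncation, is not implemented here:

```python
from itertools import combinations

def kendall_tau(pairs):
    """Classical uncensored Kendall's tau:
    (concordant - discordant) / (number of pairs).
    A pair is concordant when both coordinates order the same way."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(pairs) * (len(pairs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Perfectly concordant data: every pair orders the same way, so tau = 1.
tau = kendall_tau([(1, 2), (2, 3), (3, 5), (4, 8)])
```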
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:279-2882013-03-04RePEc:oup:biomet
article
A construction method for orthogonal Latin hypercube designs
The Latin hypercube design is a popular choice of experimental design when computer simulation is used to study a physical process. These designs guarantee uniform samples for the marginal distribution of each single input. A number of methods have been proposed for extending the uniform sampling to higher dimensions. We show how to construct Latin hypercube designs in which all main effects are orthogonal. Our method can also be used to construct Latin hypercube designs with low correlation of first-order and second-order terms. Our method generates orthogonal Latin hypercube designs that can include many more factors than those proposed by Ye (1998). Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
279
288
http://hdl.handle.net/10.1093/biomet/93.2.279
text/html
Access to full text is restricted to subscribers.
David M. Steinberg
Dennis K. J. Lin
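The Steinberg & Lin record above builds on the basic Latin hypercube design, in which each factor's n values fall in n distinct strata. A minimal sketch of that plain random construction, for context only — it does not implement the paper's orthogonal construction:

```python
import random

def latin_hypercube(n, d, rng):
    """Basic random Latin hypercube design: n runs in d factors, where
    each factor's n values occupy the n distinct strata [i/n, (i+1)/n).
    Columns are independent random permutations, so main effects are
    not orthogonal in general (that is the cited paper's contribution)."""
    columns = []
    for _ in range(d):
        levels = list(range(n))
        rng.shuffle(levels)  # one random permutation of strata per factor
        columns.append([(lvl + rng.random()) / n for lvl in levels])
    # Transpose the column list into an n-by-d list of runs.
    return [[columns[j][i] for j in range(d)] for i in range(n)]

rng = random.Random(1)
X = latin_hypercube(8, 3, rng)
```

Each column of X hits every stratum exactly once, which is the marginal-uniformity guarantee the abstract refers to.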
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:291-3032013-03-04RePEc:oup:biomet
article
Regression methods for gap time hazard functions of sequentially ordered multivariate failure time data
Sequentially ordered multivariate failure time data are often observed in biomedical studies and inter-event, or gap, times are often of interest. Generally, standard hazard regression methods cannot be applied to the gap times because of identifiability issues and induced dependent censoring. We propose estimating equations for fitting proportional hazards regression models to the gap times. Model parameters are shown to be consistent and asymptotically normal. Simulation studies reveal the appropriateness of the asymptotic approximations in finite samples. The proposed methods are applied to renal failure data to assess the association between demographic covariates and both time until wait-listing and time from wait-listing to kidney transplantation. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
291
303
Douglas E. Schaubel
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:893-9122013-03-04RePEc:oup:biomet
article
Efficient balanced sampling: The cube method
A balanced sampling design is defined by the property that the Horvitz–Thompson estimators of the population totals of a set of auxiliary variables equal the known totals of these variables. Therefore the variances of estimators of totals of all the variables of interest are reduced, depending on the correlations of these variables with the controlled variables. In this paper, we develop a general method, called the cube method, for selecting approximately balanced samples with equal or unequal inclusion probabilities and any number of auxiliary variables. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
893
912
http://hdl.handle.net/10.1093/biomet/91.4.893
text/html
Access to full text is restricted to subscribers.
Jean-Claude Deville
Yves Tille
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:673-6892013-03-04RePEc:oup:biomet
article
Optimal adaptive randomized designs for clinical trials
Optimal decision-analytic designs are deterministic. Such designs are appropriately criticized in the context of clinical trials because they are subject to assignment bias. On the other hand, balanced randomized designs may assign an excessive number of patients to a treatment arm that is performing relatively poorly. We propose a compromise between these two extremes, one that achieves some of the good characteristics of both. We introduce a constrained optimal adaptive design for a fully sequential randomized clinical trial with k arms and n patients. An r-design is one for which, at each allocation, each arm has probability at least r of being chosen, 0 ⩽ r ⩽ 1/k. An optimal design among all r-designs is called r-optimal. An r_1-design is also an r_2-design if r_1 ⩾ r_2. A design without constraint is the special case r = 0 and a balanced randomized design is the special case r = 1/k. The optimization criterion is to maximize the expected overall utility in a Bayesian decision-analytic approach, where utility is the sum over the utilities for individual patients over a 'patient horizon' N. We prove analytically that there exists an r-optimal design such that each patient is assigned to a particular one of the arms with probability 1 − (k − 1)r, and to the remaining arms with probability r. We also show that the balanced design is asymptotically r-optimal for any given r, 0 ⩽ r < 1/k, as N/n → ∞. This implies that every r-optimal design is asymptotically optimal without constraint. Numerical computations using backward induction for k = 2 arms show that, in general, this asymptotic optimality feature for r-optimal designs can be accomplished with moderate trial size n if the patient horizon N is large relative to n. We also show that, in a trial with an r-optimal design, r < 1/2, fewer patients are assigned to an inferior arm than when following a balanced design, even for r-optimal designs having the same statistical power as a balanced design. We discuss extensions to various clinical trial settings. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
673
689
http://hdl.handle.net/10.1093/biomet/asm049
application/pdf
Access to full text is restricted to subscribers.
Yi Cheng
Donald A. Berry
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:775-7892013-03-04RePEc:oup:biomet
article
Nonparametric estimation of the variogram and its spectrum
In the study of intrinsically stationary spatial processes, a new nonparametric variogram estimator is proposed through its spectral representation. The methodology is based on estimation of the variogram's spectrum by solving a regularized inverse problem through quadratic programming. The estimated variogram is guaranteed to be conditionally negative-definite. Simulation shows that our estimator is flexible and generally has smaller mean integrated squared error than the parametric estimator under model misspecification. Our methodology is applied to a spatial dataset of decadal temperature changes. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
775
789
http://hdl.handle.net/10.1093/biomet/asr056
application/pdf
Access to full text is restricted to subscribers.
Chunfeng Huang
Tailen Hsing
Noel Cressie
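The classical starting point for this abstract, the method-of-moments (Matheron) variogram estimator, can be sketched concretely. The following is a minimal illustrative sketch, not the regularized spectral estimator proposed in the article; the function name and binning scheme are assumptions made for the example.

```python
import math
from collections import defaultdict

def empirical_variogram(locs, vals, bin_width=1.0):
    """Matheron method-of-moments estimate of the semivariogram:
    gamma(h) is approximated by half the average of (Z(s_i) - Z(s_j))^2
    over all pairs whose separation distance falls in the bin around h."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(locs)):
        for j in range(i + 1, len(locs)):
            b = int(math.dist(locs[i], locs[j]) // bin_width)
            sums[b] += (vals[i] - vals[j]) ** 2
            counts[b] += 1
    # map each bin midpoint to its semivariogram estimate
    return {(b + 0.5) * bin_width: sums[b] / (2 * counts[b]) for b in sorted(sums)}
```

Unlike the estimator in the article, this raw estimate is not guaranteed to be conditionally negative-definite, which is precisely the motivation for the regularized spectral approach.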
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:711-7202013-03-04RePEc:oup:biomet
article
Sudoku-based space-filling designs
Sudoku is played by millions of people across the globe. It has simple rules and is very addictive. The game board is a nine-by-nine grid of numbers from one to nine. Several entries within the grid are provided and the remaining entries must be filled in subject to no row, column, or three-by-three subsquare containing duplicate numbers. By exploiting these three types of uniformity, we propose an approach to constructing a new type of design, called a Sudoku-based space-filling design. Such a design can be divided into groups of subdesigns so that the complete design and each subdesign achieve maximum uniformity in univariate and bivariate margins. Examples are given illustrating the proposed construction method. Applications of such designs include computer experiments with qualitative and quantitative factors, linking parameters in engineering, and cross-validation. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
711
720
http://hdl.handle.net/10.1093/biomet/asr024
application/pdf
Access to full text is restricted to subscribers.
Xu Xu
Ben Haaland
Peter Z. G. Qian
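The three types of uniformity the construction exploits are easy to state in code. Below is a minimal sketch that checks the Sudoku constraints on a completed grid; it illustrates only the combinatorial structure, not the article's design construction, and the function name is an illustrative assumption.

```python
def is_valid_sudoku(grid):
    """Check the three Sudoku uniformity constraints on a completed
    9x9 grid: no duplicates in any row, column, or 3x3 subsquare."""
    n, b = 9, 3
    units = [grid[r] for r in range(n)]                              # rows
    units += [[grid[r][c] for r in range(n)] for c in range(n)]      # columns
    units += [[grid[br + i][bc + j] for i in range(b) for j in range(b)]
              for br in range(0, n, b) for bc in range(0, n, b)]     # subsquares
    return all(sorted(u) == list(range(1, n + 1)) for u in units)
```

A completed grid satisfying all three constraints can be generated by the standard shift pattern `grid[r][c] = (3*(r % 3) + r//3 + c) % 9 + 1`.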
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:1-172013-03-04RePEc:oup:biomet
article
Modelling pairwise dependence of maxima in space
We model pairwise dependence of temporal maxima, such as annual maxima of precipitation, that have been recorded in space, either on a regular grid or at irregularly spaced locations. The construction of our estimators stems from the variogram concept. The asymptotic properties of our pairwise dependence estimators are established through properties of empirical processes. The performance of our approach is illustrated by simulations and by the treatment of a real dataset. In addition to bringing new results about the asymptotic behaviour of copula estimators, the latter being linked to first-order variograms, one main advantage of our approach is to propose a simple connection between extreme value theory and geostatistics. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
1
17
http://hdl.handle.net/10.1093/biomet/asp001
application/pdf
Access to full text is restricted to subscribers.
Philippe Naveau
Armelle Guillou
Daniel Cooley
Jean Diebolt
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:327-3392013-03-04RePEc:oup:biomet
article
Likelihood for component parameters
For a statistical model with data, likelihood for the scalar or vector full parameter θ, of dimension p say, is typically well defined and easily computed. In this paper, we investigate likelihood for a component parameter ψ(θ) of dimension d &lt; p and make use of the recent likelihood theory that has been successful in producing highly accurate third-order p-values for scalar parameters of continuous models. The theory leads under moderate regularity to a definitive third-order determination of likelihood for a component parameter ψ(θ) of dimension d, where 1 ≤ d ≤ p. We use the simple location model on the plane with standard normal errors to motivate the development. The example exhibits most of the key characteristics of the general case and the recent theory then extends the determination of likelihood to the general context. For the scalar interest parameter case with d = 1, the usual determinations are typically of second-order accuracy; the example indicates how the new determination achieves third-order accuracy. The implementation is straightforward and uses ingredients familiar from other determinations, such as the full maximum likelihood value θ̂, the constrained value θ̃<sub>ψ</sub> given ψ(θ) = ψ, and the observed information j<sub>λλ</sub>(θ̂<sub>ψ</sub>) for a complementing nuisance parameter λ(θ). It does, however, require a special version of the nuisance information j<sub>λλ</sub>(θ̂<sub>ψ</sub>), a version calibrated relative to a symmetric choice of the exponential-type reparameterisation φ(θ) underlying the recent theory, but this is easily computed. Various examples are given and the motivating example is discussed in detail. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
327
339
D. A. S. Fraser
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:537-5542013-03-04RePEc:oup:biomet
article
Efficient Bayesian inference for Gaussian copula regression models
A Gaussian copula regression model gives a tractable way of handling a multivariate regression when some of the marginal distributions are non-Gaussian. Our paper presents a general Bayesian approach for estimating a Gaussian copula model that can handle any combination of discrete and continuous marginals, and generalises Gaussian graphical models to the Gaussian copula framework. Posterior inference is carried out using a novel and efficient simulation method. The methods in the paper are applied to simulated and real data. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
537
554
http://hdl.handle.net/10.1093/biomet/93.3.537
text/html
Access to full text is restricted to subscribers.
Michael Pitt
David Chan
Robert Kohn
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:243-2472013-03-04RePEc:oup:biomet
article
Construction of orthogonal and nearly orthogonal Latin hypercubes
We propose a method for constructing orthogonal or nearly orthogonal Latin hypercubes. The method yields a large Latin hypercube by coupling an orthogonal array of index unity with a small Latin hypercube. It is shown that the large Latin hypercube inherits the exact or near orthogonality of the small Latin hypercube. Thus, effort in searching for large Latin hypercubes that are exactly or nearly orthogonal can be focussed on finding small Latin hypercubes with the same property. We obtain a useful collection of orthogonal or nearly orthogonal Latin hypercubes with large factor-to-run ratios; the results are often much more economical than those of existing methods. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
243
247
http://hdl.handle.net/10.1093/biomet/asn064
application/pdf
Access to full text is restricted to subscribers.
C. Devon Lin
Rahul Mukerjee
Boxin Tang
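The two objects in this abstract, a Latin hypercube and its degree of (near) orthogonality, can be made concrete. This is an illustrative sketch of a random Latin hypercube and the maximum absolute pairwise column correlation used to quantify near orthogonality; it is not the article's orthogonal-array coupling construction, and the function names are assumptions.

```python
import random

def latin_hypercube(n, k, rng=None):
    """n-run, k-factor Latin hypercube: each column is a random
    permutation of the n equally spaced levels 0, 1, ..., n-1."""
    rng = rng or random.Random(0)
    cols = []
    for _ in range(k):
        col = list(range(n))
        rng.shuffle(col)
        cols.append(col)
    # transpose column lists into n rows of k factor levels
    return [[cols[j][i] for j in range(k)] for i in range(n)]

def max_abs_correlation(design):
    """Largest absolute pairwise column correlation; exact
    orthogonality corresponds to a value of zero."""
    n, k = len(design), len(design[0])
    mean = (n - 1) / 2
    ss = sum((v - mean) ** 2 for v in range(n))  # identical for every column
    worst = 0.0
    for a in range(k):
        for b in range(a + 1, k):
            cov = sum((row[a] - mean) * (row[b] - mean) for row in design)
            worst = max(worst, abs(cov) / ss)
    return worst
```

A random Latin hypercube typically has small but nonzero column correlations; the article's point is that near-zero correlation can be guaranteed by construction rather than by search.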
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:199-2162013-03-04RePEc:oup:biomet
article
Estimation of a covariance matrix with zeros
We consider estimation of the covariance matrix of a multivariate random vector under the constraint that certain covariances are zero. We first present an algorithm, which we call iterative conditional fitting, for computing the maximum likelihood estimate of the constrained covariance matrix, under the assumption of multivariate normality. In contrast to previous approaches, this algorithm has guaranteed convergence properties. Dropping the assumption of multivariate normality, we show how to estimate the covariance matrix in an empirical likelihood approach. These approaches are then compared via simulation and on an example of gene expression. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
199
216
http://hdl.handle.net/10.1093/biomet/asm007
application/pdf
Access to full text is restricted to subscribers.
Sanjay Chaudhuri
Mathias Drton
Thomas S. Richardson
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:679-7012013-03-04RePEc:oup:biomet
article
Principal component models for correlation matrices
Distributional theory regarding principal components is less well developed for correlation matrices than it is for covariance matrices. The intent of this paper is to reduce this disparity. Methods are proposed that enable investigators to fit and to make inferences about flexible principal components models for correlation matrices. The models allow arbitrary eigenvalue multiplicities and allow the distinct eigenvalues to be modelled parametrically or nonparametrically. Local parameterisations and implicit functions are used to construct full-rank unconstrained parameterisations. First-order asymptotic distributions are obtained directly from the theory of estimating functions. Second-order accurate distributions for making inferences under normality are obtained directly from likelihood theory. Simulation studies show that the Bartlett correction is effective in controlling the size of the tests and that first-order approximations to nonnull distributions are reasonably accurate. The methods are illustrated on a dataset. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
679
701
Robert J. Boik
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:767-7672013-03-04RePEc:oup:biomet
article
'Nonparametric inference in multivariate mixtures', Biometrika (2005), 92, pp. 667–678
The left-hand side of equation (2·8), on p. 671, should read {π<sub>1</sub> (1 − π<sub>1</sub>)}-super-−1/2 (2π<sub>1</sub> − 1) rather than {(1 − π<sub>1</sub>)/π<sub>1</sub>}-super-1/2 (2π<sub>1</sub> − 1). Reflecting this change, the left-hand side of equation (3·1) on the same page should be altered to {π̂<sub>1</sub> (1 − π̂<sub>1</sub>)}-super-−1/2 (2π̂<sub>1</sub> − 1), and the formula at the foot of p. 677 should be modified to {π<sub>1</sub> (1 − π<sub>1</sub>)}-super-−1/2 (2π<sub>1</sub> − 1) + O<sub>p</sub>(n-super-−1/2). No other formula is affected, and the left-hand side of (2·8) is still increasing in π<sub>1</sub>. The numerical results, discussed in §4, are influenced in minor ways. In the simulation study, absolute bias is reduced, and variance is either slightly increased or slightly decreased. In the real-data example, using the nonparametric approach to analysis, mean squared error is further reduced, from 0·0011 to 0·0004.
We are grateful to Hiro Kasahara and Katsumi Shimotsu for pointing out the error. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
767
767
http://hdl.handle.net/10.1093/biomet/asm042
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Amnon Neeman
Reza Pakyari
Ryan Elmore
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:249-2502013-03-04RePEc:oup:biomet
article
'Statistical assessment of bilateral symmetry of shapes'
1
2005
92
March
Biometrika
249
250
http://hdl.handle.net/10.1093/biomet/92.1.249-a
text/html
Access to full text is restricted to subscribers.
K. V. Mardia
F. L. Bookstein
I. J. Moreton
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:791-8082013-03-04RePEc:oup:biomet
article
Exponential functionals and means of neutral-to-the-right priors
The mean of a random distribution chosen from a neutral-to-the-right prior can be represented as the exponential functional of an increasing additive process. This fact is exploited in order to give sufficient conditions for the existence of the mean of a neutral-to-the-right prior and for the absolute continuity of its probability distribution. Moreover, expressions for its moments, of any order, are provided. For illustrative purposes we consider a generalisation of the neutral-to-the-right prior based on the gamma process and the beta-Stacy process. Finally, by resorting to the maximum entropy algorithm, we obtain an approximation to the probability density function of the mean of a neutral-to-the-right prior. The arguments are easily extended to examine means of posterior quantities. The numerical results obtained are compared to those yielded by the application of some well-established simulation algorithms. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
791
808
Ilenia Epifani
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:953-9642013-03-04RePEc:oup:biomet
article
A Jackknife Variance Estimator for Unistage Stratified Samples with Unequal Probabilities
Existing jackknife variance estimators used with sample surveys can seriously overestimate the true variance under unistage stratified sampling without replacement with unequal probabilities. A novel jackknife variance estimator is proposed which is as numerically simple as existing jackknife variance estimators. Under certain regularity conditions, the proposed variance estimator is consistent under stratified sampling without replacement with unequal probabilities. The high entropy regularity condition necessary for consistency is shown to hold for the Rao--Sampford design. An empirical study of three unequal probability sampling designs supports our findings. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
953
964
http://hdl.handle.net/10.1093/biomet/asm072
application/pdf
Access to full text is restricted to subscribers.
Yves G. Berger
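The benchmark of numerical simplicity invoked in this abstract is the generic delete-one jackknife. The sketch below is the textbook i.i.d. version, shown only to fix ideas; it is not the stratified unequal-probability estimator proposed in the article, and the function name is an illustrative assumption.

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife estimate of the variance of estimator(data):
    recompute the estimator n times, each time leaving one unit out,
    and scale the spread of the leave-one-out values by (n - 1)/n."""
    n = len(data)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]  # leave-one-out values
    loo_mean = sum(loo) / n
    return (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
```

For the sample mean, this reproduces the usual variance estimate s²/n exactly, a standard identity that makes the sketch easy to check.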
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:215-2202013-03-04RePEc:oup:biomet
article
On Bartlett correction of empirical likelihood in the presence of nuisance parameters
Lazar & Mykland (1999) showed that an empirical likelihood defined by two estimating equations with a nuisance parameter need not be Bartlett-correctable. This paper shows that Bartlett correction of empirical likelihood in the presence of a nuisance parameter depends critically on the way the nuisance parameter is removed when formulating the likelihood for the parameter of interest. We establish in the broad framework of estimating functions that the empirical likelihood is still Bartlett-correctable if the nuisance parameter is profiled out given the value of the parameter of interest. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
215
220
http://hdl.handle.net/10.1093/biomet/93.1.215
text/html
Access to full text is restricted to subscribers.
Song Xi Chen
Hengjian Cui
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:603-6122013-03-04RePEc:oup:biomet
article
A modified likelihood ratio statistic for some nonregular models
Higher-order approximations to the distribution of the likelihood ratio statistic are considered for a class of nonregular models in which the maximum likelihood estimator of the parameter of interest is asymptotically distributed according to an exponential, rather than a normal, distribution. Asymptotic behaviour of this type often arises when the boundary of the support of the distributions under consideration depends on θ. A modified likelihood ratio statistic is proposed that follows its asymptotic distribution to a high degree of approximation, and this statistic is illustrated on several examples. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
603
612
Thomas A. Severini
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:529-5422013-03-04RePEc:oup:biomet
article
Integrated likelihood functions for non-Bayesian inference
Consider a model with parameter θ = (ψ, λ), where ψ is the parameter of interest, and let L(ψ, λ) denote the likelihood function. One approach to likelihood inference for ψ is to use an integrated likelihood function, in which λ is eliminated from L(ψ, λ) by integrating with respect to a density function π(λ|ψ). The goal of this paper is to consider the problem of selecting π(λ|ψ) so that the resulting integrated likelihood function is useful for non-Bayesian likelihood inference. The desirable properties of an integrated likelihood function are analyzed and these suggest that π(λ|ψ) should be chosen by finding a nuisance parameter ϕ that is unrelated to ψ and then taking the prior density for ϕ to be independent of ψ. Such an unrelated parameter is constructed and the resulting integrated likelihood is shown to be closely related to the modified profile likelihood. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
529
542
http://hdl.handle.net/10.1093/biomet/asm040
application/pdf
Access to full text is restricted to subscribers.
Thomas A. Severini
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:937-9512013-03-04RePEc:oup:biomet
article
Optimal calibration estimators in survey sampling
We show that the model-calibration estimator for the finite population mean, which was proposed by Wu & Sitter (2001) through an intuitive argument, is optimal among a class of calibration estimators. We also present optimal calibration estimators for the finite population distribution function, the population variance, the variance of a linear estimator and other quadratic finite population functions under a unified framework. The proposed calibration estimators are optimal under the true model but remain design consistent even if the working model is misspecified. A limited simulation study shows that the improvement of these optimal estimators over the conventional ones can be substantial. The question of when and how auxiliary information can be used for both the estimation of the population mean using a generalised regression estimator and the estimation of its variance through calibration is addressed clearly under the proposed general methodology. Some fundamental issues in using auxiliary information from survey data are also addressed in the context of optimal estimation. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
937
951
Changbao Wu
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:231-2422013-03-04RePEc:oup:biomet
article
Optimal sufficient dimension reduction for the conditional mean in multivariate regression
The aim of this article is to develop optimal sufficient dimension reduction methodology for the conditional mean in multivariate regression. The context is roughly the same as that of a related method by Cook & Setodji (2003), but the new method has several advantages. It is asymptotically optimal in the sense described herein and its test statistic for dimension always has a chi-squared distribution asymptotically under the null hypothesis. Additionally, the optimal method allows tests of predictor effects. A comparison of the two methods is provided. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
231
242
http://hdl.handle.net/10.1093/biomet/asm003
application/pdf
Access to full text is restricted to subscribers.
Jae Keun Yoo
R. Dennis Cook
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:485-4912013-03-04RePEc:oup:biomet
article
Generalised minimum aberration construction results for symmetrical orthogonal arrays
Generalised minimum aberration is a recently established design criterion for the whole class of orthogonal arrays and fractional factorial designs. The criterion is, as its name suggests, a generalisation of minimum aberration for regular designs and of minimum G<sub>2</sub>-aberration for two-level designs. The aim of the criterion is to find designs which minimise in a certain sense the aliasing between main effects and interactions. In this paper, theoretical results are developed for finding symmetrical orthogonal arrays with generalised minimum aberration for more than two factor levels. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
485
491
http://hdl.handle.net/10.1093/biomet/92.2.485
text/html
Access to full text is restricted to subscribers.
Neil A. Butler
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:365-3792013-03-04RePEc:oup:biomet
article
Modelling multiple time series via common factors
We propose a new method for estimating common factors of multiple time series. One distinctive feature of the new approach is that it is applicable to some nonstationary time series. The unobservable, nonstationary factors are identified by expanding the white noise space step by step, thereby solving a high-dimensional optimization problem by several low-dimensional sub-problems. Asymptotic properties of the estimation are investigated. The proposed methodology is illustrated with both simulated and real datasets. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
365
379
http://hdl.handle.net/10.1093/biomet/asn009
application/pdf
Access to full text is restricted to subscribers.
Jiazhu Pan
Qiwei Yao
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:487-4952013-03-04RePEc:oup:biomet
article
Testing goodness-of-fit in logistic case-control studies
We present a goodness-of-fit test for the logistic regression model under case-control sampling. The test statistic is constructed via a discrepancy between two competing kernel density estimators of the underlying conditional distributions given case-control status. The proposed goodness-of-fit test is shown to compare very favourably with previously proposed tests for case-control sampling in terms of power. The test statistic can be easily computed as a quadratic form in the residuals from a prospective logistic regression maximum likelihood fit. In addition, the proposed test is affine invariant and has an alternative representation in terms of empirical characteristic functions. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
487
495
http://hdl.handle.net/10.1093/biomet/asm033
application/pdf
Access to full text is restricted to subscribers.
Howard D. Bondell
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:359-3742013-03-04RePEc:oup:biomet
article
Permutation tests for equality of distributions in high-dimensional settings
Motivated by applications in high-dimensional settings, we suggest a test of the hypothesis H<sub>0</sub> that two sampled distributions are identical. It is assumed that two independent datasets are drawn from the respective populations, which may be very general. In particular, the distributions may be multivariate or infinite-dimensional, in the latter case representing, for example, the distributions of random functions from one Euclidean space to another. Our test uses a measure of distance between data. This measure should be symmetric but need not satisfy the triangle inequality, so it is not essential that it be a metric. The test is based on ranking the pooled dataset, with respect to the distance and relative to any fixed data value, and repeating this operation for each fixed datum. A permutation argument enables a critical point to be chosen such that the test has concisely known significance level, conditional on the set of all pairwise distances. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
359
374
Peter Hall
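The permutation argument in this abstract follows the generic two-sample scheme: condition on the pooled data, relabel repeatedly, and compare the observed statistic with its permutation distribution. The sketch below uses a simple mean-difference statistic purely for illustration; the article's distance- and rank-based statistic slots into the same scheme, and all names here are assumptions.

```python
import random

def permutation_p_value(x, y, stat, n_perm=199, rng=None):
    """Two-sample permutation test: the share of relabellings of the
    pooled data whose statistic is at least the observed value (the
    observed arrangement counts once, so the p-value is never zero)."""
    rng = rng or random.Random(0)
    pooled = list(x) + list(y)
    obs = stat(x, y)
    hits = 1
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= obs:
            hits += 1
    return hits / (n_perm + 1)

def abs_mean_diff(x, y):
    """Illustrative statistic; any symmetric distance-based statistic works."""
    return abs(sum(x) / len(x) - sum(y) / len(y))
```

Because the reference distribution is generated by relabelling the observed pool, the significance level is known conditionally on the data, which is the property the abstract emphasises.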
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:137-1462013-03-04RePEc:oup:biomet
article
Orthogonal arrays robust to nonnegligible two-factor interactions
Regular fractional factorial designs with clear two-factor interactions provide a useful class of designs that are robust to nonnegligible two-factor interactions. In this paper, the concept of clear two-factor interactions is generalised to orthogonal arrays. The new concept leads to a much wider class of designs robust to nonnegligible two-factor interactions. We study the existence and construction of such designs. The designs we construct have a structure that renders them particularly attractive in the robust parameter design setting. We also discuss an interesting connection between designs with clear two-factor interactions and mixed orthogonal arrays. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
137
146
http://hdl.handle.net/10.1093/biomet/93.1.137
text/html
Access to full text is restricted to subscribers.
Boxin Tang
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:859-8742013-03-04RePEc:oup:biomet
article
Bayesian nonparametric inference on stochastic ordering
We consider Bayesian inference about collections of unknown distributions subject to a partial stochastic ordering. To address problems in testing of equalities between groups and estimation of group-specific distributions, we propose classes of restricted dependent Dirichlet process priors. These priors have full support in the space of stochastically ordered distributions, and can be used for collections of unknown mixture distributions to obtain a flexible class of mixture models. Theoretical properties are discussed, efficient methods are developed for posterior computation using Markov chain Monte Carlo simulation and the methods are illustrated using data from a study of DNA damage and repair. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
859
874
http://hdl.handle.net/10.1093/biomet/asn043
application/pdf
Access to full text is restricted to subscribers.
David B. Dunson
Shyamal D. Peddada
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:367-3832013-03-04RePEc:oup:biomet
article
Nonparametric estimation with left-truncated semicompeting risks data
Nonparametric estimators for competing risks data can be applied to semicompeting risks data, a type of multi-state data where a terminating event may censor a nonterminating event, after forcing the data into the competing risks format. Complications may arise with left truncation of the terminating event, where the competing risks analysis naively truncates the nonterminating event using the left-truncation time for the terminating event, which may lead to large efficiency losses. We propose nonparametric estimators which use all semicompeting risks information and do not require artificial truncation. The uniform consistency and weak convergence of the estimators are established and variance estimators are provided. Simulation studies and an analysis of a diabetes registry demonstrate large efficiency gains over the naive estimators. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
367
383
http://hdl.handle.net/10.1093/biomet/93.2.367
text/html
Access to full text is restricted to subscribers.
L. Peng
J. P. Fine
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:201-2112013-03-04RePEc:oup:biomet
article
On fuzzy familywise error rate and false discovery rate procedures for discrete distributions
Fuzzy multiple comparisons procedures are introduced as a solution to the problem of multiple comparisons for discrete test statistics. The critical function of the randomized p-values is proposed as a measure of evidence against the null hypotheses. The classical concept of randomized tests is extended to multiple comparisons. This approach makes all theory of multiple comparisons developed for continuously distributed statistics automatically applicable to the discrete case. Examples of familywise error rate and false discovery rate procedures are discussed and an application to linkage disequilibrium testing is given. Software for implementing the procedures is available. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
201
211
http://hdl.handle.net/10.1093/biomet/asn061
application/pdf
Access to full text is restricted to subscribers.
Elena Kulinskaya
Alex Lewin
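The core device in this abstract, the randomized p-value for a discrete test statistic, is short enough to state in code: it is exactly uniform under the null, which is what lets multiple-comparisons theory developed for continuous statistics carry over. The sketch and the binomial null below are illustrative assumptions, not the article's implementation.

```python
import math
import random

def randomized_p_value(t_obs, pmf, rng=None):
    """Randomized p-value for a discrete statistic T:
    P(T > t_obs) + U * P(T = t_obs), with U ~ Uniform(0, 1).
    Under the null this is exactly uniform on (0, 1), unlike the
    conservative discrete p-value P(T >= t_obs)."""
    rng = rng or random.Random(0)
    p_greater = sum(p for t, p in pmf.items() if t > t_obs)
    return p_greater + rng.random() * pmf.get(t_obs, 0.0)

# illustrative null distribution: T ~ Binomial(4, 1/2)
pmf = {t: math.comb(4, t) / 16 for t in range(5)}
```

The randomized p-value always lies between P(T &gt; t_obs) and P(T ≥ t_obs), interpolating between the two deterministic discrete p-values.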
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:861-8752013-03-04RePEc:oup:biomet
article
Influence functions and outlier detection under the common principal components model: A robust approach
The common principal components model for several groups of multivariate observations assumes equal principal axes but different variances along these axes among the groups. Influence functions for plug-in and projection-pursuit estimates under a common principal component model are obtained. Asymptotic variances are derived from them. Outlier detection is possible using partial influence functions. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
861
875
Graciela Boente
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:371-3842013-03-04RePEc:oup:biomet
article
Direction estimation in single-index regressions
We propose a general dimension-reduction method that combines the ideas of likelihood, correlation, inverse regression and information theory. We do not require that the dependence be confined to particular conditional moments, nor do we place restrictions on the predictors or on the regression that are necessary for methods like ordinary least squares and sliced-inverse regression. Although we focus on single-index regressions, the underlying idea is applicable more generally. Illustrative examples are presented. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
371
384
http://hdl.handle.net/10.1093/biomet/92.2.371
text/html
Access to full text is restricted to subscribers.
Xiangrong Yin
R. Dennis Cook
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:486-4892013-03-04RePEc:oup:biomet
article
Understanding nonparametric estimation for clustered data
In this note we give an alternative formulation of the nonparametric estimators of Wang (2003) with the identity link. This results in a closed form of the estimator that has computational advantages and gives insight into the rationale behind the estimator. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
486
489
http://hdl.handle.net/10.1093/biomet/93.2.486
text/html
Access to full text is restricted to subscribers.
Richard Huggins
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:601-6192013-03-04RePEc:oup:biomet
article
Joint modelling of paired sparse functional data using principal components
We propose a modelling framework to study the relationship between two paired longitudinally observed variables. The data for each variable are viewed as smooth curves measured at discrete time-points plus random errors. While the curves for each variable are summarized using a few important principal components, the association of the two longitudinal variables is modelled through the association of the principal component scores. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed-effects model framework for model fitting, prediction and inference. The proposed method can be applied in the difficult case in which the measurement times are irregular and sparse and may differ widely across individuals. Use of functional principal components enhances model interpretation and improves statistical and numerical stability of the parameter estimates. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
601
619
http://hdl.handle.net/10.1093/biomet/asn035
application/pdf
Access to full text is restricted to subscribers.
Lan Zhou
Jianhua Z. Huang
Raymond J. Carroll
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:230-2372013-03-04RePEc:oup:biomet
article
Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys
Design weights in surveys are often adjusted to accommodate auxiliary information and to meet pre-specified range restrictions, typically via some ad hoc algorithmic adjustment to a generalised regression estimator. In this paper, we present a simple solution to this problem using empirical likelihood methods or generalised regression. We first develop algorithms for computing empirical likelihood estimators and model-calibrated empirical likelihood estimators. The first algorithm solves the computational problem of the empirical likelihood method in general, both in survey and non-survey settings, and theoretically guarantees its convergence. The second exploits properties of the model-calibration method and is particularly simple. The algorithms are adapted for handling benchmark constraints and pre-specified range restrictions on the weight adjustments. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
230
237
J. Chen
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:435-4502013-03-04RePEc:oup:biomet
article
Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes
Most methods for analysing cluster-correlated biological data implicitly assume the ignorability of cluster sizes. When this assumption fails, the resulting inferences may be asymptotically invalid. Hoffman et al. (2001) proposed a simple but computationally intensive method, based on a large number of within-cluster resamples and associated separate estimating equations, that leads to asymptotically valid inferences whether the cluster sizes are ignorable or not. We study a simple method, based on a single inverse cluster size-weighted estimating equation, that avoids resampling and yet leads to asymptotically valid inferences. Simulation results are presented to assess the performance of the proposed method. We also propose Wald tests for ignorability of cluster sizes. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
435
450
http://hdl.handle.net/10.1093/biomet/92.2.435
text/html
Access to full text is restricted to subscribers.
E. Benhin
J. N. K. Rao
A. J. Scott
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:425-4382013-03-04RePEc:oup:biomet
article
Effects of the reference set on frequentist inferences
We employ second-order likelihood asymptotics to investigate how ideal frequentist inferences depend on the probability model for the data through more than the likelihood function, referring to this as the effect of the reference set. There are two aspects of higher-order corrections to first-order likelihood methods, namely (i) that involving effects of fitting nuisance parameters and leading to the modified profile likelihood, and (ii) another part pertaining to limitation in adjusted information. Generally, each of these involves a first-order adjustment depending on the reference set. However, we show that, for some important settings, likelihood-irrelevant model specifications have a second-order effect on both of these adjustments; this result includes specification of the censoring model for survival data. On the other hand, for sequential experiments the likelihood-irrelevant specification of the stopping rule has a second-order effect on adjustment (i) but a first-order effect on adjustment (ii). These matters raise the issue of what are 'ideal' frequentist inferences, since consideration of 'exact' frequentist inferences will not suffice. We indicate that to second order ideal frequentist inferences may be based on the distribution of the ordinary likelihood ratio statistic, without commonly considered adjustments thereto. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
425
438
http://hdl.handle.net/10.1093/biomet/93.2.425
text/html
Access to full text is restricted to subscribers.
Donald A. Pierce
Ruggero Bellio
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:957-9642013-03-04RePEc:oup:biomet
article
The optimal confidence region for a random parameter
Suppose that, under a two-level hierarchical model, the distribution of the vector of random parameters is known or can be estimated well. The data are generated via a fixed, but unobservable, realisation of the vector. We derive the smallest confidence region for a specific component of this random vector under a joint Bayesian/frequentist paradigm. On average this optimal region can be much smaller than the corresponding Bayesian highest posterior density region. The new estimation procedure is especially appealing when one deals with data generated under a highly parallel structure. The new proposal is illustrated with a dataset from a multi-centre clinical study and also with one from a typical microarray experiment. The performance of our procedure is examined via simulation studies. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
957
964
http://hdl.handle.net/10.1093/biomet/92.4.957
text/html
Access to full text is restricted to subscribers.
Hajime Uno
Lu Tian
L. J. Wei
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:457-4612013-03-04RePEc:oup:biomet
article
The sampling properties of conditional independence graphs for structural vector autoregressions
Structural vector autoregressions allow contemporaneous series dependence and assume errors with no contemporaneous correlation. Models of this form, that also have a recursive structure, can be described by a directed acyclic graph. An important tool for identification of these models is the conditional independence graph constructed from the contemporaneous and lagged values of the process. We determine the large-sample properties of statistics used to test for the presence of links in this graph. A simple example illustrates how these results may be applied. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
457
461
Marco Reale
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:37-502013-03-04RePEc:oup:biomet
article
Partial and latent ignorability in missing-data problems
When an assumption of missing at random is untenable, it becomes necessary to model missing-data indicators, which carry information about the parameters of the complete-data population. Within a given application, however, researchers may believe that some aspects of missingness are ignorable but others are not. We argue that there are two different ways to formalize the notion that only part of the missingness is ignorable. These approaches correspond to assumptions that we call partially missing at random and latently missing at random. We explain these concepts and apply them in a latent-class analysis of survey questions with item nonresponse. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
37
50
http://hdl.handle.net/10.1093/biomet/asn069
application/pdf
Access to full text is restricted to subscribers.
Ofer Harel
Joseph L. Schafer
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:383-3922013-03-04RePEc:oup:biomet
article
Multimodality of the likelihood in the bivariate seemingly unrelated regressions model
We analyse the simplest two-equation seemingly unrelated regressions model and demonstrate that its likelihood may have up to five stationary points, and thus there may be up to three local modes. Consequently the estimates obtained via iterative estimation methods may depend on starting values. We further show that the probability of multimodality vanishes asymptotically. Monte Carlo simulations suggest that multimodality rarely occurs if the seemingly unrelated regressions model is true, but can become more frequent if the model is misspecified. The existence of multimodality in the likelihood for seemingly unrelated regressions models contradicts several claims in the literature. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
383
392
Mathias Drton
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:277-2912013-03-04RePEc:oup:biomet
article
Gamma frailty transformation models for multivariate survival times
We propose a class of transformation models for multivariate failure times. This class of transformation models generalizes the usual gamma frailty model and yields a marginally linear transformation model for each failure time. Nonparametric maximum likelihood estimation is used for inference. The maximum likelihood estimators for the regression coefficients are shown to be consistent and asymptotically normal, and their asymptotic variances attain the semiparametric efficiency bound. Simulation studies show that the proposed estimation procedure provides asymptotically efficient estimates and yields good inferential properties for small sample sizes. The method is illustrated using data from a cardiovascular study. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
277
291
http://hdl.handle.net/10.1093/biomet/asp008
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Qingxia Chen
Joseph G. Ibrahim
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:929-9412013-03-04RePEc:oup:biomet
article
A paradox concerning nuisance parameters and projected estimating functions
This paper is concerned with a paradox associated with parameter estimation in the presence of nuisance parameters. In a statistical model with unknown nuisance parameters, the efficiency of an estimator of a parameter usually increases when the nuisance parameters are known. However, the opposite phenomenon can sometimes occur. In this paper, we elucidate the occurrence of this paradox by examining estimating functions. In particular, we focus on the projected estimating function, which is defined by the projection of the score function on to a given estimating function. A sufficient condition for the paradox to occur is the orthogonality of the two components of the projected estimating functions corresponding to parameters of interest and nuisance parameters. In addition, a numerical assessment is conducted in the context of a simple model to investigate the improvement of the asymptotic efficiency of estimators. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
929
941
http://hdl.handle.net/10.1093/biomet/91.4.929
text/html
Access to full text is restricted to subscribers.
Masayuki Henmi
Shinto Eguchi
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:399-4102013-03-04RePEc:oup:biomet
article
Optimal testing of multiple hypotheses with common effect direction
We present a theoretical basis for testing related endpoints. Typically, it is known how to construct tests of the individual hypotheses, but not how to combine them into a multiple test procedure that controls the familywise error rate. Using the closure method, we emphasize the role of consonant procedures, from an interpretive as well as a theoretical viewpoint. Surprisingly, even if each intersection test has an optimality property, the overall procedure obtained by applying closure to these tests may be inadmissible. We introduce a new procedure, which is consonant and has a maximin property under the normal model. The results are then applied to PROactive, a clinical trial designed to investigate the effectiveness of a glucose-lowering drug on macrovascular outcomes among patients with type 2 diabetes. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
399
410
http://hdl.handle.net/10.1093/biomet/asp006
application/pdf
Access to full text is restricted to subscribers.
Richard M. Bittman
Joseph P. Romano
Carlos Vallarino
Michael Wolf
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:777-7902013-03-04RePEc:oup:biomet
article
Observation-driven models for Poisson counts
This paper is concerned with a general class of observation-driven models for time series of counts whose conditional distributions given past observations and explanatory variables follow a Poisson distribution. These models provide a flexible framework for modelling a wide range of dependence structures. Conditions for stationarity and ergodicity of these processes are established from which the large-sample properties of the maximum likelihood estimators can be derived. Simulations are provided to give additional insight into the finite-sample behaviour of the estimators. Finally an application to a regression model for daily counts of asthma presentations at a Sydney hospital is described. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
777
790
Richard A. Davis
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:917-9312013-03-04RePEc:oup:biomet
article
Additive hazards models with latent treatment effectiveness lag time
In many clinical trials for evaluating treatment efficacy, it is believed that there may exist latent treatment effectiveness lag times, after which a medical treatment procedure or chemical compound would be in full effect. In this paper, semiparametric regression models are proposed and studied for estimating the treatment effect accounting for such latent lag times. The new models take advantage of the invariant property of the additive hazards model in marginalising over an additive latent variable; parameters in the models are thus easily estimated and interpreted, while the flexibility of not having to specify the baseline hazard function is preserved. Monte Carlo simulation studies demonstrate the appropriateness of the proposed semiparametric estimation procedure. The methodology is applied to data collected in a randomised clinical trial, which evaluates the efficacy of biodegradable carmustine polymers for treatment of recurrent brain tumours. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
917
931
Y. Q. Chen
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:411-4212013-03-04RePEc:oup:biomet
article
Goodness-of-fit test for complete spatial randomness against mixtures of regular and clustered spatial point processes
A goodness-of-fit test statistic for spatial point processes is proposed and shown to have an asymptotic chi-squared distribution if the underlying point process is Poisson. Simulations demonstrate that the test, when testing for complete spatial randomness, is more sensitive to mixtures of regular and clustered point processes than the tests using the nearest neighbour distance distribution, the second- or third-order characteristics. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
411
421
P. Grabarnik
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:721-7342013-03-04RePEc:oup:biomet
article
Semiparametric model-based inference in the presence of missing responses
We consider a semiparametric model that parameterizes the conditional density of the response, given covariates, but allows the marginal distribution of the covariates to be completely arbitrary. Responses may be missing. A likelihood-based imputation estimator and a semi-empirical-likelihood-based estimator for the parameter vector describing the conditional density are defined and proved to be asymptotically normal. Semi-empirical loglikelihood functions for the parameter vector and the response mean are derived. It is shown that the two semi-empirical loglikelihood functions are distributed asymptotically as weighted χ² and scaled χ², respectively. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
721
734
http://hdl.handle.net/10.1093/biomet/asn032
application/pdf
Access to full text is restricted to subscribers.
Qihua Wang
Pengjie Dai
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:763-7752013-03-04RePEc:oup:biomet
article
Analysing panel count data with informative observation times
In this paper, we study panel count data with informative observation times. We assume nonparametric and semiparametric proportional rate models for the underlying event process, where the form of the baseline rate function is left unspecified and a subject-specific frailty variable inflates or deflates the rate function multiplicatively. The proposed models allow the event processes and observation times to be correlated through their connections with the unobserved frailty; moreover, the distributions of both the frailty variable and observation times are considered as nuisance parameters. The baseline rate function and the regression parameters are estimated by maximising a conditional likelihood function of observed event counts and solving estimation equations. Large-sample properties of the proposed estimators are studied. Numerical studies demonstrate that the proposed estimation procedures perform well for moderate sample sizes. An application to a bladder tumour study is presented. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
763
775
http://hdl.handle.net/10.1093/biomet/93.4.763
text/html
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Mei-Cheng Wang
Ying Zhang
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:375-3882013-03-04RePEc:oup:biomet
article
Local multiple imputation
Dealing with missing data via parametric multiple imputation methods usually implies stating several strong assumptions both about the distribution of the data and about underlying regression relationships. If such parametric assumptions do not hold, the multiply imputed data are not appropriate and might produce inconsistent estimators and thus misleading results. In this paper, a fully nonparametric and a semiparametric imputation method are studied, both based on local resampling principles. It is shown that the final estimator, based on these local imputations, is consistent under fewer or no parametric assumptions. Asymptotic expressions for bias, variance and mean squared error are derived, showing the theoretical impact of the different smoothing parameters. Simulations illustrate the usefulness and applicability of the method. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
375
388
Marc Aerts
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:23-402013-03-04RePEc:oup:biomet
article
Reference priors for discrete graphical models
The combination of graphical models and reference analysis represents a powerful tool for Bayesian inference in highly multivariate settings. It is typically difficult to derive reference priors in complex problems. In this paper we present a suitable mixed parameterisation for a discrete decomposable graphical model and derive the corresponding reference prior. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
23
40
http://hdl.handle.net/10.1093/biomet/93.1.23
text/html
Access to full text is restricted to subscribers.
Guido Consonni
Valentina Leucari
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:303-3162013-03-04RePEc:oup:biomet
article
Variable selection for multivariate failure time data
In this paper, we propose a penalised pseudo-partial likelihood method for variable selection with multivariate failure time data with a growing number of regression coefficients. Under certain regularity conditions, we show the consistency and asymptotic normality of the penalised likelihood estimators. We further demonstrate that, for certain penalty functions with proper choices of regularisation parameters, the resulting estimator can correctly identify the true model, as if it were known in advance. Based on a simple approximation of the penalty function, the proposed method can be easily carried out with the Newton–Raphson algorithm. We conduct extensive Monte Carlo simulation studies to assess the finite sample performance of the proposed procedures. We illustrate the proposed method by analysing a dataset from the Framingham Heart Study. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
303
316
http://hdl.handle.net/10.1093/biomet/92.2.303
text/html
Access to full text is restricted to subscribers.
Jianwen Cai
Jianqing Fan
Runze Li
Haibo Zhou
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:451-4672013-03-04RePEc:oup:biomet
article
Model diagnosis for parametric regression in high-dimensional spaces
We study tools for checking the validity of a parametric regression model. When the dimension of the regressors is large, many of the existing tests face the curse of dimensionality or require some ordering of the data. Our tests are based on the residual empirical process marked by proper functions of the regressors. They are able to detect local alternatives converging to the null at parametric rates. Parametric and nonparametric alternatives are considered. In the latter case, through a proper principal component decomposition, we are able to derive smooth directional tests which are asymptotically distribution-free under the null model. The new tests take into account precisely the 'geometry of the model'. A simulation study is carried through and an application to a real dataset is illustrated. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
451
467
http://hdl.handle.net/10.1093/biomet/asm095
application/pdf
Access to full text is restricted to subscribers.
W. Stute
W. L. Xu
L. X. Zhu
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:209-2222013-03-04RePEc:oup:biomet
article
Bürmann expansion and test for additivity
We propose a Lagrange multiplier test for additivity based on the Bürmann expansion of a conditional mean function. The asymptotic null distribution of the test is shown to be χ², under some regularity conditions. In contrast, the Lagrange multiplier test proposed by Chen et al. (1995) is based on the Volterra expansion of the conditional mean function. We discuss some desirable advantages of the Bürmann expansion over the Volterra expansion for nonlinear time series modelling. We also report an empirical study which shows that, in terms of empirical power, the Lagrange multiplier test motivated by the Bürmann expansion outperforms the test of Chen et al. (1995) for the cases for which the Lagrange multiplier test is designed. For other cases for which neither of the tests is specifically designed, the empirical powers of the two tests are comparable. Finally, we illustrate the use of the Lagrange multiplier test with a blowfly experimental system. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
209
222
K. S. Chan
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:283-2982013-03-04RePEc:oup:biomet
article
A flexible additive multiplicative hazard model
We present a new additive-multiplicative hazard model which consists of two components. The first component contains additive covariate effects through an additive Aalen model while the second component contains multiplicative covariate effects through a Cox regression model. The Aalen model allows for time-varying covariate effects, while the Cox model allows only a common time-dependence through the baseline. Approximate maximum likelihood estimators are derived by solving the simultaneous score equations for the nonparametric and parametric components of the model. The suggested estimators are provided with large-sample properties and are shown to be efficient. The efficient estimators depend, however, on some estimated weights. We therefore also consider unweighted estimators and describe their large-sample properties. We finally extend the model to allow for time-varying covariate effects in the multiplicative part of the model as well. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
283
298
Torben Martinussen
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:985-9912013-03-04RePEc:oup:biomet
article
Importance Sampling Via the Estimated Sampler
Monte Carlo importance sampling for evaluating numerical integration is discussed. We consider a parametric family of sampling distributions and propose the use of the sampling distribution estimated by maximum likelihood. The proposed method of importance sampling using the estimated sampling distribution is shown to improve the asymptotic variance of the ordinary method using the true sampling distribution. The argument is closely related to the discussion of the paradox in Henmi & Eguchi (2004). We focus on a condition under which the estimated integration value obtained by the proposed method has asymptotic zero variance. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
985
991
http://hdl.handle.net/10.1093/biomet/asm076
application/pdf
Access to full text is restricted to subscribers.
Masayuki Henmi
Ryo Yoshida
Shinto Eguchi
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:705-7222013-03-04RePEc:oup:biomet
article
Empirical Bayes block shrinkage of wavelet coefficients via the noncentral χ² distribution
Empirical Bayes approaches to the shrinkage of empirical wavelet coefficients have generated considerable interest in recent years. Much of the work to date has focussed on shrinkage of individual wavelet coefficients in isolation. In this paper we propose an empirical Bayes approach to simultaneous shrinkage of wavelet coefficients in a block, based on the block sum of squares. Our approach exploits a useful identity satisfied by the noncentral χ² density and provides some tractable Bayesian block shrinkage procedures. Our numerical results indicate that the new procedures perform very well. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
705
722
http://hdl.handle.net/10.1093/biomet/93.3.705
text/html
Access to full text is restricted to subscribers.
Xue Wang
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:873-8922013-03-04RePEc:oup:biomet
article
A General Approach to the Predictability Issue in Survival Analysis with Applications
Very often in survival analysis one has to study martingale integrals where the integrand is not predictable and where the counting process theory of martingales is not directly applicable, as for example in nonparametric and semiparametric applications where the integrand is based on a pilot estimate. We call this the predictability issue in survival analysis. The problem has been resolved by approximations of the integrand by predictable functions which have been justified by ad hoc procedures. We present a general approach to the solution of this problem. The usefulness of the approach is shown in three applications. In particular, we argue that earlier ad hoc procedures do not work in higher-dimensional smoothing problems in survival analysis. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
873
892
http://hdl.handle.net/10.1093/biomet/asm062
application/pdf
Access to full text is restricted to subscribers.
Enno Mammen
Jens Perch Nielsen
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:195-2092013-03-04RePEc:oup:biomet
article
Generalised likelihood ratio tests for spectral density
There are few techniques available for testing whether or not a family of parametric time series models fits a set of data reasonably well without serious restrictions on the forms of alternative models. In this paper, we consider generalised likelihood ratio tests of whether or not the spectral density function of a stationary time series admits certain parametric forms. We propose a bias correction method for the generalised likelihood ratio test of Fan et al. (2001). In particular, our methods can be applied to test whether or not a residual series is white noise. Sampling properties of the proposed tests are established. A bootstrap approach is proposed for estimating the null distribution of the test statistics. Simulation studies investigate the accuracy of the proposed bootstrap estimate and compare the power of the various ways of constructing the generalised likelihood ratio tests as well as some classic methods like the Cramér–von Mises and Ljung–Box tests. Our results favour the newly proposed bias reduction method using the local likelihood estimator. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
195
209
Jianqing Fan
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:641-6542013-03-04RePEc:oup:biomet
article
Confidence intervals in group sequential trials with random group sizes and applications to survival analysis
A new ordering scheme for defining quantiles of the multivariate distribution of a stopping time and a stopped stochastic process is introduced. This ordering scheme is used in conjunction with resampling methods to construct confidence intervals for a population mean following a group sequential test with random group sizes, and for the regression parameter of a proportional hazards model following a time-sequential clinical trial with censored survival data. It is shown that this approach resolves the long-standing difficulties in inference due to two different time scales in time-sequential trials, and that the confidence intervals thus constructed have coverage probabilities close to the nominal values and provide marked improvements over those based on alternative ordering schemes and normal approximations. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
641
654
http://hdl.handle.net/10.1093/biomet/93.3.641
text/html
Access to full text is restricted to subscribers.
Tze Leung Lai
Wenzhi Li
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:732-7402013-03-04RePEc:oup:biomet
article
Rank-based regression with repeated measurements data
A rank-based regression method is proposed for repeated measurements data. It is a generalisation of the classical Wilcoxon–Mann–Whitney rank statistic for independent observations. The method is valid under a weak condition on the error terms that can accommodate certain heteroscedasticity and within-subject dependency. The asymptotic normality of the proposed estimator is proved using empirical process theory. A variance estimator, shown to be consistent, is also constructed. The proposed method is illustrated using data from a clinical trial on treating labour pain. Robustness and efficiency of the estimator are demonstrated in simulation studies. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
732
740
Sin-Ho Jung
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:769-7862013-03-04RePEc:oup:biomet
article
Bayesian Nonparametric Estimation of the Probability of Discovering New Species
We consider the problem of evaluating the probability of discovering a certain number of new species in a new sample of population units, conditional on the number of species recorded in a basic sample. We use a Bayesian nonparametric approach. The different species proportions are assumed to be random and the observations from the population exchangeable. We provide a Bayesian estimator, under quadratic loss, for the probability of discovering new species which can be compared with well-known frequentist estimators. The results we obtain are illustrated through a numerical example and an application to a genomic dataset concerning the discovery of new genes by sequencing additional single-read sequences of cDNA fragments. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
769
786
http://hdl.handle.net/10.1093/biomet/asm061
application/pdf
Access to full text is restricted to subscribers.
Antonio Lijoi
Ramsés H. Mena
Igor Prünster
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:167-1832013-03-04RePEc:oup:biomet
article
Inference for clustered data using the independence loglikelihood
We use the properties of independence estimating equations to adjust the 'independence' loglikelihood function in the presence of clustering. The proposed adjustment relies on the robust sandwich estimator of the parameter covariance matrix, which is easily calculated. The methodology competes favourably with established techniques based on independence estimating equations; we provide some insight as to why this is so. The adjustment is applied to examples relating to the modelling of wind speed in Europe and annual maximum temperatures in the U.K. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
167
183
http://hdl.handle.net/10.1093/biomet/asm015
application/pdf
Access to full text is restricted to subscribers.
Richard E. Chandler
Steven Bate
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:961-9772013-03-04RePEc:oup:biomet
article
Estimating the false discovery rate using the stochastic approximation algorithm
Testing of multiple hypotheses involves statistics that are strongly dependent in some applications, but most work on this subject is based on the assumption of independence. We propose a new method for estimating the false discovery rate of multiple hypothesis tests, in which the density of test scores is estimated parametrically by minimizing the Kullback--Leibler distance between the unknown density and its estimator using the stochastic approximation algorithm, and the false discovery rate is estimated using the ensemble averaging method. Our method is applicable under general dependence between test statistics. Numerical comparisons between our method and several competitors, conducted on simulated and real data examples, show that our method achieves more accurate control of the false discovery rate in almost all scenarios. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
961
977
http://hdl.handle.net/10.1093/biomet/asn036
application/pdf
Access to full text is restricted to subscribers.
Faming Liang
Jian Zhang
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:891-8982013-03-04RePEc:oup:biomet
article
Minimum aberration construction results for nonregular two-level fractional factorial designs
Nonregular two-level fractional factorial designs are designs which cannot be specified in terms of a set of defining contrasts. The aliasing properties of nonregular designs can be compared by using a generalisation of the minimum aberration criterion called minimum G2-aberration. Until now, the only nontrivial designs that are known to have minimum G2-aberration are designs for n runs and m >= n - 5 factors. In this paper, a number of construction results are presented which allow minimum G2-aberration designs to be found for many of the cases with n = 16, 24, 32, 48, 64 and 96 runs and m >= n/2 - 2 factors. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
891
898
Neil A. Butler
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:717-7232013-03-04RePEc:oup:biomet
article
Estimating subject-specific survival functions under the accelerated failure time model
We use the semiparametric accelerated failure time model to predict the survival function and its related quantities for future subjects with a given set of covariates. We derive the large-sample distribution for the subject-specific cumulative hazard function estimate. We then propose a simple resampling technique for constructing pointwise confidence intervals and simultaneous bands for the corresponding survival function and its quantile function over a properly selected time interval. The new proposals are illustrated with the Mayo primary biliary cirrhosis data. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
717
723
Yuhyun Park
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:303-3132013-03-04RePEc:oup:biomet
article
Linear life expectancy regression with censored data
In the statistical literature, life expectancy is usually characterised by the mean residual life function. Regression models are thus needed to study the association between the mean residual life functions and their covariates. In this paper, we consider a linear mean residual life model and develop inference procedures in the presence of potential censoring. The new model and inference procedures are applied to the Stanford heart transplant data. Semiparametric efficiency calculations and information bounds are also considered. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
303
313
http://hdl.handle.net/10.1093/biomet/93.2.303
text/html
Access to full text is restricted to subscribers.
Y. Q. Chen
S. Cheng
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:237-2422013-03-04RePEc:oup:biomet
article
A note on cause-specific residual life
In medical research, investigators often wish to characterize the distributions of remaining lifetimes. While nonparametric analyses of residual life distributions have been widely studied with independently right-censored data, residual life analysis has not been examined in the competing risks setting, with multiple, potentially dependent, failure types. We define the cause-specific residual life distribution as the residual cumulative incidence function conditionally on survival to a given time. Because of the improper form of the cause-specific distribution, the mean cause-specific residual lifetime does not exist, theoretically. We develop nonparametric inferences for the cause-specific residual life function and its corresponding quantiles, which may exist. Theoretical justification, including uniform consistency and weak convergence, is established. Simulation studies and a breast cancer data analysis demonstrate the practical utility of the methods. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
237
242
http://hdl.handle.net/10.1093/biomet/asn063
application/pdf
Access to full text is restricted to subscribers.
J.-H. Jeong
J. P. Fine
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:111-1282013-03-04RePEc:oup:biomet
article
Varying-coefficient models and basis function approximations for the analysis of repeated measurements
A global smoothing procedure is developed using basis function approximations for estimating the parameters of a varying-coefficient model with repeated measurements. Inference procedures based on a resampling subject bootstrap are proposed to construct confidence regions and to perform hypothesis testing. Conditional biases and variances of our estimators and their asymptotic consistency are developed explicitly. Finite sample properties of our procedures are investigated through a simulation study. Application of the proposed approach is demonstrated through an example in epidemiology. In contrast to the existing methods, this approach applies whether or not the covariates are time-invariant and does not require binning of the data when observations are sparse at distinct observation times. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
111
128
Jianhua Z. Huang
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:87-992013-03-04RePEc:oup:biomet
article
The unobserved heterogeneity distribution in duration analysis
In a large class of hazard models with proportional unobserved heterogeneity, the distribution of the heterogeneity among survivors converges to a gamma distribution. This convergence is often rapid. We derive this result as a general result for exponential mixtures and explore its implications for the specification and empirical analysis of univariate and multivariate duration models. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
87
99
http://hdl.handle.net/10.1093/biomet/asm013
application/pdf
Access to full text is restricted to subscribers.
Jaap H. Abbring
Gerard J. Van Den Berg
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:667-6782013-03-04RePEc:oup:biomet
article
Additive partial linear models with measurement errors
We consider statistical inference for additive partial linear models when the linear covariate is measured with error. We propose attenuation-to-correction and simulation-extrapolation (SIMEX) estimators of the parameter of interest. It is shown that the first resulting estimator is asymptotically normal and requires no undersmoothing. This is an advantage of our estimator over existing backfitting-based estimators for semiparametric additive models, which require undersmoothing of the nonparametric component in order for the estimator of the parametric component to be root-n consistent. This feature stems from a decrease in the bias of the resulting estimator, which is appropriately derived using a profile procedure. A similar characteristic in semiparametric partially linear models was obtained by Wang et al. (2005). We also discuss the asymptotics of the proposed SIMEX approach. Finite-sample performance of the proposed estimators is assessed by simulation experiments. The proposed methods are applied to a dataset from a semen study. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
667
678
http://hdl.handle.net/10.1093/biomet/asn024
application/pdf
Access to full text is restricted to subscribers.
Hua Liang
Sally W. Thurston
David Ruppert
Tatiyana Apanasovich
Russ Hauser
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:213-2202013-03-04RePEc:oup:biomet
article
Fast block variance estimation procedures for inhomogeneous spatial point processes
We introduce two new variance estimation procedures that use non-overlapping and overlapping blocks, respectively. The non-overlapping blocks estimator can be viewed as the limit of the thinned block bootstrap estimator recently proposed in Guan & Loh (2007), by letting the number of thinned processes and bootstrap samples therein both increase to infinity. The non-overlapping blocks estimator can be obtained quickly since it does not require any thinning or bootstrap steps, and it is more stable. The overlapping blocks estimator further improves the performance of the non-overlapping blocks estimator with a modest increase in computation time. A simulation study demonstrates the superiority of the proposed estimators over the thinned block bootstrap estimator. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
213
220
http://hdl.handle.net/10.1093/biomet/asn072
application/pdf
Access to full text is restricted to subscribers.
Yongtao Guan
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:849-8622013-03-04RePEc:oup:biomet
article
A semiparametric changepoint model
A semiparametric changepoint model is considered and the empirical likelihood method is applied to detect the change from a distribution to a weighted distribution in a sequence of independent random variables. The maximum likelihood changepoint estimator is shown to be consistent. The empirical likelihood ratio test statistic is proved to have the same limit null distribution as that with parametric models. A data-based test for the validity of the models is also proposed. Simulation shows the sensitivity and robustness of the semiparametric approach. The methods are applied to some classical datasets such as the Nile River data and stock price data. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
849
862
http://hdl.handle.net/10.1093/biomet/91.4.849
text/html
Access to full text is restricted to subscribers.
Zhong Guan
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:95-1062013-03-04RePEc:oup:biomet
article
Bayesian-inspired minimum aberration two- and four-level designs
Motivated by a Bayesian framework, we propose a new minimum aberration-type criterion for designing experiments with two- and four-level factors. The Bayesian approach helps in overcoming the ad hoc nature of effect ordering in the existing minimum aberration-type criteria. The approach is also capable of distinguishing between qualitative and quantitative factors. Numerous examples are given to demonstrate its advantages. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
95
106
http://hdl.handle.net/10.1093/biomet/asn062
application/pdf
Access to full text is restricted to subscribers.
V. Roshan Joseph
Mingyao Ai
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:633-6452013-03-04RePEc:oup:biomet
article
A construction principle for multivariate extreme value distributions
We present a construction principle for the spectral density of a multivariate extreme value distribution. It generalizes the pairwise beta model introduced in the literature recently and may be used to obtain new parametric models from lower dimensional spectral densities. We illustrate the flexibility of this new class of models and apply it to a wind speed dataset. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
633
645
http://hdl.handle.net/10.1093/biomet/asr034
application/pdf
Access to full text is restricted to subscribers.
F. Ballani
M. Schlather
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:489-5072013-03-04RePEc:oup:biomet
article
Diagnostic measures for empirical likelihood of general estimating equations
We develop diagnostic measures for assessing the influence of individual observations when using empirical likelihood with general estimating equations, and we use these measures to construct goodness-of-fit statistics for testing possible misspecification in the estimating equations. Our diagnostics include case-deletion measures, local influence measures and pseudo-residuals. Our goodness-of-fit statistics include the sum of local influence measures and the processes of pseudo-residuals. Simulation studies are conducted to evaluate our methods, and real datasets are analyzed to illustrate the use of our diagnostic measures and goodness-of-fit statistics. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
489
507
http://hdl.handle.net/10.1093/biomet/asm094
application/pdf
Access to full text is restricted to subscribers.
Hongtu Zhu
Joseph G. Ibrahim
Niansheng Tang
Heping Zhang
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:455-4632013-03-04RePEc:oup:biomet
article
A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations
We introduce a family of multivariate binary distributions with a certain conditional linear property. This family is particularly useful for efficient and easy simulation of correlated binary variables with a given marginal mean vector and correlation matrix. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
455
463
Bahjat F. Qaqish
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:721-7312013-03-04RePEc:oup:biomet
article
Nested orthogonal array-based Latin hypercube designs
We propose two methods for constructing a new type of design, called a nested orthogonal array-based Latin hypercube design, intended for multi-fidelity computer experiments. Such designs are two nested space-filling designs in which the large design achieves stratification in both bivariate and univariate margins and the small design achieves stratification in univariate margins. These designs have better space-filling properties than nested Latin hypercube designs in which the large design possesses uniformity in univariate margins only. The first method expands an ordinary Latin hypercube design to a larger design that achieves uniformity in any one- or two-dimensional projection. The second method uses an orthogonal array with strength two to simultaneously construct a pair of nested orthogonal array-based Latin hypercube designs. Examples are given to illustrate the proposed methods. Sampling properties of the proposed designs are derived. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
721
731
http://hdl.handle.net/10.1093/biomet/asr028
application/pdf
Access to full text is restricted to subscribers.
Xu He
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:738-7422013-03-04RePEc:oup:biomet
article
Measurement exchangeability and normal one-factor models
The one-factor model restricts the covariance structure of the observed variables on the basis of assumptions about their relationship with an unobserved variable. It is hard to justify these assumptions on substantive or empirical grounds. In this paper, alternative measurement models are proposed that are based on exchangeability of variables after admissible scale transformations. They provide an alternative interpretation of the model and do not involve unobserved variables. They also yield a new one-factor model for sum scales. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
738
742
Henk Kelderman
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:487-4932013-03-04RePEc:oup:biomet
article
Some results on D-optimal designs for nonlinear models with applications
Sufficient conditions are established for the locally D-optimal design for a nonlinear model to have a minimal number of support points. The conditions are applied to obtain locally D-optimal designs for a one-compartment pharmacokinetic model and a Poisson regression model. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
487
493
http://hdl.handle.net/10.1093/biomet/asp004
application/pdf
Access to full text is restricted to subscribers.
Gang Li
Dibyen Majumdar
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:952-9572013-03-04RePEc:oup:biomet
article
On an exact probability matching property of right-invariant priors
The paper considers priors which are right invariant with respect to the Haar measure. It is shown that the posterior coverage probabilities of certain invariant Bayesian predictive regions exactly match the corresponding frequentist probabilities. Several examples are given to illustrate the main result. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
952
957
Thomas A. Severini
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:15-252013-03-04RePEc:oup:biomet
article
Equivalence of prospective and retrospective models in the Bayesian analysis of case-control studies
The natural likelihood to use for a case-control study is a 'retrospective' likelihood, i.e. a likelihood based on the probability of exposure given disease status. Prentice & Pyke (1979) showed that, when a logistic regression form is assumed for the probability of disease given exposure, the maximum likelihood estimators and asymptotic covariance matrix of the log odds ratios obtained from the retrospective likelihood are the same as those obtained from the 'prospective' likelihood, i.e. that based on the probability of disease given exposure. We prove a similar result for the posterior distribution of the log odds ratios in a Bayesian analysis. This means that the Bayesian analysis of case-control studies may be done using a relatively simple model, the logistic regression model, which treats data as though generated prospectively and which does not involve nuisance parameters for the exposure distribution. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
15
25
Shaun R. Seaman
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:551-5662013-03-04RePEc:oup:biomet
article
Generalised structured models
We present a general class of nonlinear regression and time series models that we call generalised structured models. The class is a natural generalisation of generalised additive models, and it includes generalised interaction models, structured volatility models, GARCH (generalised autoregressive conditional heteroscedasticity) models and varying coefficient models. We discuss estimation principles including smoothing splines and a generalisation of the projection approach of Mammen et al. (1999). We finish the paper with some theoretical considerations about the asymptotic performance of the estimator for the general class of generalised structured models. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
551
566
Enno Mammen
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:65-802013-03-04RePEc:oup:biomet
article
Quasi-variances
In statistical models of dependence, the effect of a categorical variable is typically described by contrasts among parameters. For reporting such effects, quasi-variances provide an economical and intuitive method which permits approximate inference on any contrast by subsequent readers. Applications include generalised linear models, generalised additive models and hazard models. The present paper exposes the generality of quasi-variances, emphasises the need to control relative errors of approximation, gives simple methods for obtaining quasi-variances and bounds on the approximation error involved, and explores the domain of accuracy of the method. Conditions are identified under which the quasi-variance approximation is exact, and numerical work indicates high accuracy in a variety of settings. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
65
80
David Firth
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:95-1122013-03-04RePEc:oup:biomet
article
Small-area estimation based on natural exponential family quadratic variance function models and survey weights
We propose pseudo empirical best linear unbiased estimators of small-area means based on natural exponential family quadratic variance function models when the basic data consist of survey-weighted estimators of these means, area-specific covariates and certain summary measures involving the weights. We also provide explicit approximate mean squared errors of these estimators in the spirit of Prasad & Rao (1990), and these estimators can be readily evaluated. A simulation study is undertaken to evaluate the performance of the proposed inferential procedure. We also estimate the proportion of poor children in the 5--17 years age-group for the different counties in one of the states in the United States. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
95
112
Malay Ghosh
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:283-3012013-03-04RePEc:oup:biomet
article
Additive hazards Markov regression models illustrated with bone marrow transplant data
When there are covariate effects to be considered, multi-state survival analysis is dominated either by parametric Markov regression models or by semiparametric Markov regression models using Cox's (1972) proportional hazards models for transition intensities between the states. The purpose of this research work is to study alternatives to Cox's model in a general finite-state Markov process setting. We shall look at two alternative models, Aalen's (1989) nonparametric additive hazards model and Lin & Ying's (1994) semiparametric additive hazards model. The former allows the effects of covariates to vary freely over time, while the latter assumes that the regression coefficients are constant over time. With the basic tools of the product integral and the functional delta-method, we present an estimator of the transition probability matrix and develop the large-sample theory for the estimator under each of these two models. Data on 1459 HLA identical sibling transplants for acute leukaemia from the International Bone Marrow Transplant Registry serve as illustration. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
283
301
http://hdl.handle.net/10.1093/biomet/92.2.283
text/html
Access to full text is restricted to subscribers.
Youyi Shu
John P. Klein
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:751-7572013-03-04RePEc:oup:biomet
article
Efficient recursions for general factorisable models
Let n S-valued categorical variables be jointly distributed according to a distribution known only up to an unknown normalising constant. For an unnormalised joint likelihood expressible as a product of factors, we give an algebraic recursion which can be used for computing the normalising constant and other summations. A saving in computation is achieved when each factor contains a lagged subset of the components combining in the joint distribution, with maximum computational efficiency as the subsets attain their minimum size. If each subset contains at most r+1 of the n components in the joint distribution, we term this a lag-r model, whose normalising constant can be computed using a forward recursion in O(S^(r+1)) computations, as opposed to O(S^n) for the direct computation. We show how a lag-r model represents a Markov random field and allows a neighbourhood structure to be related to the unnormalised joint likelihood. We illustrate the method by showing how the normalising constant of the Ising or autologistic model can be computed. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
751
757
R. Reeves
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:163-1772013-03-04RePEc:oup:biomet
article
Semiparametric efficient estimation of survival distributions in two-stage randomisation designs in clinical trials with censored data
Two-stage randomisation designs are useful in the evaluation of combination therapies where patients are initially randomised to an induction therapy and then, depending upon their response and consent, are randomised to a maintenance therapy. In this paper we derive the best regular asymptotically linear estimator for the survival distribution and related quantities of treatment regimes. We propose an estimator which is easily computable and is more efficient than existing estimators. Large-sample properties of the proposed estimator are derived and comparisons with other estimators are made using simulation. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
163
177
http://hdl.handle.net/10.1093/biomet/93.1.163
text/html
Access to full text is restricted to subscribers.
Abdus S. Wahed
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:249-2492013-03-04RePEc:oup:biomet
article
'Shape, Procrustes tangent projections and bilateral symmetry'
1
2005
92
March
Biometrika
249
249
http://hdl.handle.net/10.1093/biomet/92.1.249
text/html
Access to full text is restricted to subscribers.
J. T. Kent
K. V. Mardia
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:73-842013-03-04RePEc:oup:biomet
article
Confidence regions when the Fisher information is zero
We examine the asymptotic behaviour of confidence regions in identifiable one-dimensional parametric models with smooth likelihood function and information equal to zero at a critical point of the parameter space. Confidence regions are based on inversion of the likelihood ratio test statistic and of some common forms of the score and Wald test statistics. For fixed parameter values other than the critical point, all these statistics have limiting chi-squared distributions with one degree of freedom, but for most of them the convergence is not uniform near the critical point. When it is not, confidence regions based on inverting the tests, using the chi-squared approximation, do not asymptotically have the nominal level. The exception to this lack of locally uniform convergence occurs with the score test standardised by expected, rather than observed, information. For the regions based on the score test standardised by observed information and on the likelihood ratio test, conservative procedures that do not rely on the chi-squared approximation can be developed, but they are much too conservative near the critical parameter value. The regions based on the Wald tests have asymptotic level less than ½, regardless of the procedure used. Our results suggest that no procedure based solely on the likelihood function will be satisfactory. Whether or not this is the case is an open problem. A simulation study illustrates the results of this paper. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
73
84
Matteo Bottai
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:179-1952013-03-04RePEc:oup:biomet
article
A shrinkage estimator for spectral densities
We propose a shrinkage estimator for spectral densities based on a multilevel normal hierarchical model. The first level captures the sampling variability via a likelihood constructed using the asymptotic properties of the periodogram. At the second level, the spectral density is shrunk towards a parametric time series model. To avoid selecting a particular parametric model for the second level, a third level is added which induces an estimator that averages over a class of parsimonious time series models. The estimator derived from this model, the model averaged shrinkage estimator, is consistent, is shown to be highly competitive with other spectral density estimators via simulations, and is computationally inexpensive. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
179
195
http://hdl.handle.net/10.1093/biomet/93.1.179
text/html
Access to full text is restricted to subscribers.
Carsten H. Botts
Michael J. Daniels
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:599-6142013-03-04RePEc:oup:biomet
article
Testing parametric assumptions of trends of a nonstationary time series
The paper considers testing whether the mean trend of a nonstationary time series is of certain parametric forms. A central limit theorem for the integrated squared error is derived, and a hypothesis-testing procedure is proposed. The method is illustrated in a simulation study, and is applied to assess the mean pattern of lifetime-maximum wind speeds of global tropical cyclones from 1981 to 2006. We also revisit the trend pattern in the central England temperature series. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
599
614
http://hdl.handle.net/10.1093/biomet/asr017
application/pdf
Access to full text is restricted to subscribers.
Ting Zhang
Wei Biao Wu
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:331-3432013-03-04RePEc:oup:biomet
article
On semiparametric transformation cure models
A general class of semiparametric transformation cure models is studied for the analysis of survival data with long-term survivors. It combines a logistic regression for the probability of event occurrence with the class of transformation models for the time of occurrence. Included as special cases are the proportional hazards cure model (Farewell, 1982; Kuk & Chen, 1992; Sy & Taylor, 2000; Peng & Dear, 2000) and the proportional odds cure model. Generalised estimating equations are proposed for parameter estimation. It is shown that the resulting estimators are asymptotically normal, with variance-covariance matrix that has a closed form and can be consistently estimated by the usual plug-in method. Simulation studies show that the proposed approach is appropriate for practical use. An application to data from a breast cancer study is given to illustrate the methodology. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
331
343
Wenbin Lu
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:647-6622013-03-04RePEc:oup:biomet
article
Marginal methods for correlated binary data with misclassified responses
Misclassification is a longstanding concern in medical research. Although there has been much research concerning error-prone covariates, relatively little work has been directed to problems with response variables subject to error. In this paper we focus on misclassification in clustered or longitudinal outcomes. We propose marginal analysis methods to handle binary responses which are subject to misclassification. The proposed methods have several appealing features, including simultaneous inference for both marginal mean and association parameters, and they can handle misclassified responses for a number of practical scenarios, such as the case with a validation subsample or replicates. Furthermore, the proposed methods are robust to model misspecification in the sense that no full distributional assumptions are required. Numerical studies demonstrate satisfactory performance of the proposed methods under a variety of settings. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
647
662
http://hdl.handle.net/10.1093/biomet/asr035
application/pdf
Access to full text is restricted to subscribers.
Zhijian Chen
Grace Y. Yi
Changbao Wu
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:533-5492013-03-04RePEc:oup:biomet
article
Modified profile likelihoods in models with stratum nuisance parameters
It is well known, at least through many examples, that when there are many nuisance parameters modified profile likelihoods often perform much better than the profile likelihood. Ordinary asymptotics almost totally fail to deal with this issue. For this reason, we study asymptotic properties of the profile and modified profile likelihoods in models for stratified data in a two-index asymptotics setting. This means that both the sample size of the strata, m, and the dimension of the nuisance parameter, q, may increase to infinity. It is shown that in this asymptotic setting modified profile likelihoods give improvements, with respect to the profile likelihood, in terms of consistency of estimators and of asymptotic distributional properties. In particular, the modified profile likelihood based statistics have the usual asymptotic distribution, provided that 1/m = o(q^(-1/3)), while the analogous condition for the profile likelihood is 1/m = o(q^(-1)). Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
533
549
N. Sartori
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-10262013-03-04RePEc:oup:biomet
article
Amendments and Corrections
Arising from an omitted term in a calculation in the Appendix, variance formulae in the paper should be adjusted. In particular, the constants in the numerators of equations (2·4) and (2·15) should be 6 rather than 18. Variances are, however, still higher than in the case of least-squares estimators. The changes are implied by the following corrections to the Appendix. On p. 423, 2cδΔ′_cos(ω^(k)) should be included within braces on lines 11 and 17, and 2cδΔ′_sin(ω^(k)) should be added within braces on lines 12 and 18, leading to the extra term 2cm^(-3/2){Δ′_sin(ω^(k))Γ_sin^(k) + Δ′_cos(ω^(k))Γ_cos^(k)} on line 21. We are grateful to Barry Quinn for drawing our attention to this error. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1026
http://hdl.handle.net/10.1093/biomet/93.4.1025-b
text/html
Access to full text is restricted to subscribers.
Peter Hall
Ming Li
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:597-6112013-03-04RePEc:oup:biomet
article
Inference about a secondary process following a sequential trial
We consider the following sequential testing problem. A group-sequential or fully-sequential test is carried out for a primary parameter, using a score process or an effective score process to eliminate nuisance parameters. After stopping, the possibility of additional parameters is considered, and appropriate tests and estimators are desired that recognise the sequential stopping rule. We formulate an asymptotic multi-dimensional Gaussian process form of such problems, and then construct tests and confidence procedures. Optimality conditions are given, and an example is summarised. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
597
611
W. J. Hall
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:37-472013-03-04RePEc:oup:biomet
article
Graphical identifiability criteria for causal effects in studies with an unobserved treatment/response variable
We consider the problem of using data in studies with an unobserved treatment/response variable in order to evaluate average causal effects, when cause-effect relationships between variables can be described by a directed acyclic graph and the corresponding recursive factorization of a joint distribution. The paper proposes graphical criteria to test whether average causal effects are identifiable even if a treatment/response variable is unobserved. If the answer is affirmative, we provide further formulations for average causal effects from the observed data. The graphical criteria enable us to evaluate average causal effects when it is difficult to observe a treatment/response variable. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
37
47
http://hdl.handle.net/10.1093/biomet/asm005
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:1002-10052013-03-04RePEc:oup:biomet
article
On an internal method for deriving a summary measure
Some preliminary comments are made about the reasons for combining component observations into composite or derived variables. A method for forming derived variables sensitive to specified changes in the underlying multivariate distribution is described and illustrated by an issue in a study of animal pathology. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
1002
1005
http://hdl.handle.net/10.1093/biomet/asn040
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:471-4772013-03-04RePEc:oup:biomet
article
Estimating ordered binomial proportions with the use of group testing
This paper considers group testing when the probability of response is increasing across the levels of an observed covariate. We illustrate how previously known results in order-restricted inference can be extended to situations wherein data are collected according to a group-testing protocol, and we derive maximum likelihood estimators for proportions under the increasing order restriction and group-testing model. Finally, we show how the use of group testing can dramatically reduce the bias and mean squared error of isotonic regression estimators obtained from one-at-a-time testing. These proposed methods are illustrated using data from an observational HIV study conducted in Houston, Texas. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
471
477
Joshua M. Tebbs
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:821-8302013-03-04RePEc:oup:biomet
article
Forward adaptive banding for estimating large covariance matrices
We propose a simple forward adaptive banding method for estimating large covariance matrices using the modified Cholesky decomposition. This approach requires the fitting of a prespecified set of models due to the adaptive banding structure and can be efficiently implemented. Aside from its computational attractiveness, we propose a novel Bayes information criterion that gives consistent model selection for estimating high dimensional covariance matrices. The method compares favourably to its competitors in simulation studies. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
821
830
http://hdl.handle.net/10.1093/biomet/asr045
application/pdf
Access to full text is restricted to subscribers.
Chenlei Leng
Bo Li
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:213-2272013-03-04RePEc:oup:biomet
article
Exploiting occurrence times in likelihood inference for componentwise maxima
Multivariate extreme value distributions arise as the limiting distributions of normalised componentwise maxima. They are often used to model multivariate data that can be regarded as the componentwise maxima of some unobserved underlying multivariate process. In many applications we have extra information. We often know the locations of the maxima within the underlying process. If the process is temporal this knowledge is frequently available through the dates on which the maxima are recorded. We show how to incorporate this extra information into maximum likelihood procedures. Asymptotic and small-sample efficiency results are presented for the dependence parameter in the logistic parametric sub-class of bivariate extreme value distributions. We conclude with an application to sea levels. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
213
227
http://hdl.handle.net/10.1093/biomet/92.1.213
text/html
Access to full text is restricted to subscribers.
Alec Stephenson
Jonathan Tawn
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:555-5712013-03-04RePEc:oup:biomet
article
Using calibration weighting to adjust for nonresponse under a plausible model
When we estimate the population total for a survey variable or variables, calibration forces the weighted estimates of certain covariates to match known or alternatively estimated population totals called benchmarks. Calibration can be used to correct for sample-survey nonresponse, or for coverage error resulting from frame undercoverage or unit duplication. The quasi-randomization theory supporting its use in nonresponse adjustment treats response as an additional phase of random sampling. The functional form of a quasi-random response model is assumed to be known, its parameter values estimated implicitly through the creation of calibration weights. Unfortunately, calibration depends upon known benchmark totals while the covariates in a plausible model for survey response may not be the benchmark covariates. Moreover, it may be prudent to keep the number of covariates in a response model small. We use calibration to adjust for nonresponse when the benchmark model and covariates may differ, provided the number of the former is at least as great as that of the latter. We discuss the estimation of a total for a vector of survey variables that do not include the benchmark covariates, but that may include some of the model covariates. We show how to measure both the additional asymptotic variance due to the nonresponse in a calibration-weighted estimator and the full asymptotic variance of the estimator itself. All variances are determined with respect to the randomization mechanism used to select the sample, the response model generating the subset of sample respondents, or both. Data from the U.S. National Agricultural Statistical Service's 2002 Census of Agriculture and simulations are used to illustrate alternative adjustments for nonresponse. The paper concludes with some remarks about adjustment for coverage error. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
555
571
http://hdl.handle.net/10.1093/biomet/asn022
application/pdf
Access to full text is restricted to subscribers.
Ted Chang
Phillip S. Kott
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:229-2362013-03-04RePEc:oup:biomet
article
A note on profile likelihood for exponential tilt mixture models
Suppose that independent observations are drawn from multiple distributions, each of which is a mixture of two component distributions such that their log density ratio satisfies a linear model with a slope parameter and an intercept parameter. Inference for such models has been studied using empirical likelihood, and mixed results have been obtained. The profile empirical likelihood of the slope and intercept has an irregularity at the null hypothesis that the two component distributions are equal. We derive a profile empirical likelihood and maximum likelihood estimator of the slope alone, and obtain the usual asymptotic properties for the estimator and the likelihood ratio statistic regardless of the null. Furthermore, we show that the joint maximum likelihood estimator of the slope and intercept is consistent and asymptotically normal regardless of the null. At the null, the joint maximum likelihood estimator falls along a straight line through the origin with perfect correlation asymptotically to the first order. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
229
236
http://hdl.handle.net/10.1093/biomet/asn059
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:724-7272013-03-04RePEc:oup:biomet
article
Identifiability and censored data
It is well known that, without the assumption of independence between two nonnegative random variables X and Y, the survival function of X is not identifiable on the basis of the joint distribution function of Z = min(X, Y) and δ = I(Z = Y). In this paper, we provide a simple condition in the form of conditional distribution of Y given X. We show that our condition is equivalent to the constant-sum condition proposed by Williams & Lagakos (1977). As a result the survival function of X can be identified from the joint distribution of Z and δ, and the Kaplan--Meier estimator with Greenwood's formula for its variance remains valid. Examples which satisfy the condition are given. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
724
727
Nader Ebrahimi
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:559-5712013-03-04RePEc:oup:biomet
article
Locally-efficient robust estimation of haplotype-disease association in family-based studies
Modelling human genetic variation is critical to understanding the genetic basis of complex disease. The Human Genome Project has discovered millions of binary DNA sequence variants, called single nucleotide polymorphisms, and millions more may exist. As coding for proteins takes place along chromosomes, organisation of polymorphisms along each chromosome, the haplotype phase structure, may prove to be most important in discovering genetic variants associated with disease. As haplotype phase is often uncertain, procedures that model the distribution of parental haplotypes can, if this distribution is misspecified, lead to substantial bias in parameter estimates even when complete genotype information is available. Using a geometric approach to estimation in the presence of nuisance parameters, we address this problem and develop locally-efficient estimators of the effect of haplotypes on disease that are robust to incorrect estimates of haplotype frequencies. The methods are demonstrated with a simulation study of a case-parent design. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
559
571
http://hdl.handle.net/10.1093/biomet/92.3.559
text/html
Access to full text is restricted to subscribers.
Andrew S. Allen
Glen A. Satten
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:679-6942013-03-04RePEc:oup:biomet
article
Improving the efficiency of the log-rank test using auxiliary covariates
Under the assumption of proportional hazards, the log-rank test is optimal for testing the null hypothesis that β = 0, where β denotes the logarithm of the hazard ratio. However, if there are additional covariates that correlate with survival times, making use of their information will increase the efficiency of the log-rank test. We apply the theory of semiparametrics to characterize a class of regular and asymptotically linear estimators for β when auxiliary covariates are incorporated into the model, and derive estimators that are more efficient. The Wald tests induced by these estimators are shown to be more powerful than the log-rank test. Simulation studies are used to illustrate the gains in efficiency. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
679
694
http://hdl.handle.net/10.1093/biomet/asn003
application/pdf
Access to full text is restricted to subscribers.
Xiaomin Lu
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:19-362013-03-04RePEc:oup:biomet
article
Efficient nonparametric estimation of causal effects in randomized trials with noncompliance
Causal approaches based on the potential outcome framework provide a useful tool for addressing noncompliance problems in randomized trials. We propose a new estimator of causal treatment effects in randomized clinical trials with noncompliance. We use the empirical likelihood approach to construct a profile random sieve likelihood and take into account the mixture structure in outcome distributions, so that our estimator is robust to parametric distribution assumptions and provides substantial finite-sample efficiency gains over the standard instrumental variable estimator. Our estimator is asymptotically equivalent to the standard instrumental variable estimator, and it can be applied to outcome variables with a continuous, ordinal or binary scale. We apply our method to data from a randomized trial of an intervention to improve the treatment of depression among depressed elderly patients in primary care practices. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
19
36
http://hdl.handle.net/10.1093/biomet/asn056
application/pdf
Access to full text is restricted to subscribers.
Jing Cheng
Dylan S. Small
Zhiqiang Tan
Thomas R. Ten Have
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:921-9372013-03-04RePEc:oup:biomet
article
Empirical Likelihood Semiparametric Regression Analysis for Longitudinal Data
A semiparametric regression model for longitudinal data is considered. The empirical likelihood method is used to estimate the regression coefficients and the baseline function, and to construct confidence regions and intervals. It is proved that the maximum empirical likelihood estimator of the regression coefficients achieves asymptotic efficiency and the estimator of the baseline function attains asymptotic normality when a bias correction is made. Two calibrated empirical likelihood approaches to inference for the baseline function are developed. We propose a groupwise empirical likelihood procedure to handle the inter-series dependence for the longitudinal semiparametric regression model, and employ bias correction to construct the empirical likelihood ratio functions for the parameters of interest. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, the empirical likelihood does not require consistent estimators for the asymptotic variance and bias. A simulation compares the empirical likelihood and normal-based methods in terms of coverage accuracies and average areas/lengths of confidence regions/intervals. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
921
937
http://hdl.handle.net/10.1093/biomet/asm066
application/pdf
Access to full text is restricted to subscribers.
Liugen Xue
Lixing Zhu
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:927-9412013-03-04RePEc:oup:biomet
article
Modelling of covariance structures in generalised estimating equations for longitudinal data
When used for modelling longitudinal data, generalised estimating equations specify a working structure for the within-subject covariance matrices, aiming to produce efficient parameter estimators. However, misspecification of the working covariance structure may lead to a large loss of efficiency of the estimators of the mean parameters. In this paper we propose an approach for joint modelling of the mean and covariance structures of longitudinal data within the framework of generalised estimating equations. The resulting estimators for the mean and covariance parameters are shown to be consistent and asymptotically normally distributed. Real data analysis and simulation studies show that the proposed approach yields efficient estimators for both the mean and covariance parameters. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
927
941
http://hdl.handle.net/10.1093/biomet/93.4.927
text/html
Access to full text is restricted to subscribers.
Huajun Ye
Jianxin Pan
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:105-1182013-03-04RePEc:oup:biomet
article
Theory for penalised spline regression
Penalised spline regression is a popular new approach to smoothing, but its theoretical properties are not yet well understood. In this paper, mean squared error expressions and consistency results are derived by using a white-noise model representation for the estimator. The effect of the penalty on the bias and variance of the estimator is discussed, both for general splines and for the case of polynomial splines. The penalised spline regression estimator is shown to achieve the optimal nonparametric convergence rate established by Stone (1982). Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
105
118
http://hdl.handle.net/10.1093/biomet/92.1.105
text/html
Access to full text is restricted to subscribers.
Peter Hall
J. D. Opsomer
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:1-132013-03-04RePEc:oup:biomet
article
Nonparametric estimation in nonlinear mixed effects models
A nonparametric approach is developed herein to estimate parameters in nonlinear mixed effects models. Asymptotic properties of the nonparametric maximum likelihood estimators and associated computational algorithms are provided. Empirical Bayes estimators of functionals of the random effects are also developed. Applications to population pharmacokinetics are given. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
1
13
Tze Leung Lai
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:519-5282013-03-04RePEc:oup:biomet
article
A note on composite likelihood inference and model selection
A composite likelihood consists of a combination of valid likelihood objects, usually related to small subsets of data. The merit of composite likelihood is to reduce the computational complexity so that it is possible to deal with large datasets and very complex models, even when the use of standard likelihood or Bayesian methods is not feasible. In this paper, we aim to suggest an integrated, general approach to inference and model selection using composite likelihood methods. In particular, we introduce an information criterion for model selection based on composite likelihood. We also describe applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful geyser dataset. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
519
528
http://hdl.handle.net/10.1093/biomet/92.3.519
text/html
Access to full text is restricted to subscribers.
Cristiano Varin
Paolo Vidoni
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:553-5662013-03-04RePEc:oup:biomet
article
Bayesian analysis of covariance matrices and dynamic models for longitudinal data
Parsimonious modelling of the within-subject covariance structure while heeding its positive-definiteness is of great importance in the analysis of longitudinal data. Using the Cholesky decomposition and the ensuing unconstrained and statistically meaningful reparameterisation, we provide a convenient and intuitive framework for developing conditionally conjugate prior distributions for covariance matrices and show their connections with generalised inverse Wishart priors. Our priors offer many advantages with regard to elicitation, positive definiteness, computations using Gibbs sampling, shrinking covariances toward a particular structure with considerable flexibility, and modelling covariances using covariates. Bayesian estimation methods are developed and the results are compared using two simulation studies. These simulations suggest simpler and more suitable priors for the covariance structure of longitudinal data. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
553
566
Michael J. Daniels
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:135-1522013-03-04RePEc:oup:biomet
article
Extending conventional priors for testing general hypotheses in linear models
We consider that observations come from a general normal linear model and that it is desirable to test a simplifying null hypothesis about the parameters. We approach this problem from an objective Bayesian, model-selection perspective. Crucial ingredients for this approach are 'proper objective priors' to be used for deriving the Bayes factors. Jeffreys-Zellner-Siow priors have good properties for testing null hypotheses defined by specific values of the parameters in full-rank linear models. We extend these priors to deal with general hypotheses in general linear models, not necessarily of full rank. The resulting priors, which we call 'conventional priors', are expressed as a generalization of recently introduced 'partially informative distributions'. The corresponding Bayes factors are fully automatic, easily computed and very reasonable. The methodology is illustrated for the change-point problem and the equality of treatments effects problem. We compare the conventional priors derived for these problems with other objective Bayesian proposals like the intrinsic priors. It is concluded that both priors behave similarly although interesting subtle differences arise. We adapt the conventional priors to deal with nonnested model selection as well as multiple-model comparison. Finally, we briefly address a generalization of conventional priors to nonnormal scenarios. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
135
152
http://hdl.handle.net/10.1093/biomet/asm014
application/pdf
Access to full text is restricted to subscribers.
M.J. Bayarri
Gonzalo García-Donato
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:435-4442013-03-04RePEc:oup:biomet
article
Closed-form likelihoods for Arnason--Schwarz models
We provide a general framework for the computationally efficient analysis, both Bayesian and classical, of integrated multi-site recovery/recapture models in the presence of individual-level covariates by extending the basic Arnason--Schwarz models and deriving closed-form likelihood expressions, together with corresponding sufficient statistics. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
435
444
R. King
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:75-922013-03-04RePEc:oup:biomet
article
Predicting future responses based on possibly mis-specified working models
Under a general regression setting, we propose an optimal unconditional prediction procedure for future responses. The resulting prediction intervals or regions have a desirable average coverage level over a set of covariate vectors of interest. When the working model is not correctly specified, the traditional conditional prediction method is generally invalid. On the other hand, one can empirically calibrate the above unconditional procedure and also obtain its cross-validated counterpart. Various large and small sample properties of these unconditional methods are examined analytically and numerically. We find that the K-fold cross-validated procedure performs exceptionally well even for cases with rather small sample sizes. The new proposals are illustrated with two real examples, one with a continuous response and the other with a binary outcome. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
75
92
http://hdl.handle.net/10.1093/biomet/asm078
application/pdf
Access to full text is restricted to subscribers.
Tianxi Cai
Lu Tian
Scott D. Solomon
L.J. Wei
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:41-522013-03-04RePEc:oup:biomet
article
Efficient Bayes factor estimation from the reversible jump output
We propose a class of estimators of the Bayes factor which is based on an extension of the bridge sampling identity of Meng & Wong (1996) and makes use of the output of the reversible jump algorithm of Green (1995). Within this class we give the optimal estimator and also a suboptimal one which may be simply computed on the basis of the acceptance probabilities used within the reversible jump algorithm for jumping between models. The proposed estimators are very easily computed and lead to a substantial gain of efficiency in estimating the Bayes factor over the standard estimator based on the reversible jump output. This is illustrated through a series of Monte Carlo simulations involving a linear and a logistic regression model. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
41
52
http://hdl.handle.net/10.1093/biomet/93.1.41
text/html
Access to full text is restricted to subscribers.
Francesco Bartolucci
Luisa Scaccia
Antonietta Mira
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:63-742013-03-04RePEc:oup:biomet
article
Shared parameter models under random effects misspecification
A common objective in longitudinal studies is the investigation of the association structure between a longitudinal response process and the time to an event of interest. An attractive paradigm for the joint modelling of longitudinal and survival processes is the shared parameter framework, where a set of random effects is assumed to induce their interdependence. In this work, we propose an alternative parameterization for shared parameter models and investigate the effect of misspecifying the random effects distribution on the parameter estimates and their standard errors. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
63
74
http://hdl.handle.net/10.1093/biomet/asm087
application/pdf
Access to full text is restricted to subscribers.
Dimitris Rizopoulos
Geert Verbeke
Geert Molenberghs
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:787-7992013-03-04RePEc:oup:biomet
article
Symmetric diagnostics for the analysis of the residuals in regression models
Typical alternative hypotheses in the analysis of residuals of a standard regression model are considered, and for each one a Bayesian diagnostic based on a symmetric form of the Kullback--Leibler divergence is determined. The results include an explicit expression for the diagnostic when the alternative hypothesis is that the errors are generated by an unknown distribution function with a Dirichlet process prior. This expression is immediately interpretable, exactly computable and endowed with important asymptotic connections. A linear approximation of the diagnostic reveals close links with the class of Lagrange multiplier test statistics. When the alternative hypothesis is that the errors are generated by an autoregressive process the linear approximation is proportional to the Box--Pierce statistic or to the Ljung--Box statistic, according to the characteristics of the prior, if the observations have zero mean; it depends on the Durbin--Watson statistic if the errors are first-order autoregressive, and it is related to the Cliff--Ord statistic if they are generated by a first-order spatial autoregression. The sensitivity to the prior of the diagnostic and of its linear approximation is also discussed. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
787
799
http://hdl.handle.net/10.1093/biomet/92.4.787
text/html
Access to full text is restricted to subscribers.
Cinzia Carota
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:17-332013-03-04RePEc:oup:biomet
article
Distortion of effects caused by indirect confounding
Undetected confounding may severely distort the effect of an explanatory variable on a response variable, as defined by a stepwise data-generating process. The best known type of distortion, which we call direct confounding, arises from an unobserved explanatory variable common to a response and its main explanatory variable of interest. It is relevant mainly for observational studies, since it is avoided by successful randomization. By contrast, indirect confounding, which we identify in this paper, is an issue also for intervention studies. For general stepwise-generating processes, we provide matrix and graphical criteria to decide which types of distortion may be present, when they are absent and how they are avoided. We then turn to linear systems without other types of distortion, but with indirect confounding. For such systems, the magnitude of distortion in a least-squares regression coefficient is derived and shown to be estimable, so that it becomes possible to recover the effect of the generating process from the distorted coefficient. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
17
33
http://hdl.handle.net/10.1093/biomet/asm092
application/pdf
Access to full text is restricted to subscribers.
Nanny Wermuth
D. R. Cox
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:133-1482013-03-04RePEc:oup:biomet
article
Model checking in regression via dimension reduction
Lack-of-fit checking for parametric and semiparametric models is essential in reducing misspecification. The efficiency of most existing model-checking methods drops rapidly as the dimension of the covariates increases. We propose to check a model by projecting the fitted residuals along a direction that adapts to the systematic departure of the residuals from the desired pattern. Consistency of the method is proved for parametric and semiparametric regression models. A bootstrap implementation is also discussed. Simulation comparisons with several existing methods are made, suggesting that the proposed methods are more efficient than the existing methods when the dimension increases. Air pollution data from Chicago are used to illustrate the procedure. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
133
148
http://hdl.handle.net/10.1093/biomet/asn074
application/pdf
Access to full text is restricted to subscribers.
Yingcun Xia
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:49-602013-03-04RePEc:oup:biomet
article
Fuzzy p-values in latent variable problems
We consider the problem of testing a statistical hypothesis where the scientifically meaningful test statistic is a function of latent variables. In particular, we consider detection of genetic linkage, where the latent variables are patterns of inheritance at specific genome locations. Introduced by Geyer & Meeden (2005), fuzzy p-values are random variables, described by their probability distributions, that are interpreted as p-values. For latent variable problems, we introduce the notion of a fuzzy p-value as having the conditional distribution of the latent p-value given the observed data, where the latent p-value is the random variable that would be the p-value if the latent variables were observed. The fuzzy p-value provides an exact test using two sets of simulations of the latent variables under the null hypothesis, one unconditional and the other conditional on the observed data. It provides not only an expression of the strength of the evidence against the null hypothesis but also an expression of the uncertainty in that expression owing to lack of knowledge of the latent variables. We illustrate these features with an example of simulated data mimicking a real example of the detection of genetic linkage. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
49
60
http://hdl.handle.net/10.1093/biomet/asm001
application/pdf
Access to full text is restricted to subscribers.
Elizabeth A. Thompson
Charles J. Geyer
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:248-2522013-03-04RePEc:oup:biomet
article
Testing hypotheses in order
In certain circumstances, one wishes to test one hypothesis only if certain other hypotheses have been rejected. This ordering of hypotheses simplifies the task of controlling the probability of rejecting any true hypothesis. In an example from an observational study, a treated group is shown to be further from both of two control groups than the two control groups are from each other. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
248
252
http://hdl.handle.net/10.1093/biomet/asm085
application/pdf
Access to full text is restricted to subscribers.
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:709-7192013-03-04RePEc:oup:biomet
article
The Benjamini--Hochberg method with infinitely many contrasts in linear models
Benjamini and Hochberg's method for controlling the false discovery rate is applied to the problem of testing infinitely many contrasts in linear models. Exact, easily calculated critical values are derived, defining a new multiple comparisons method for testing contrasts in linear models. The method is adaptive, depending on the data through the F-statistic, like the Waller--Duncan Bayesian multiple comparisons method. Comparisons with Scheffé's method are given, and the method is extended to the simultaneous confidence intervals of Benjamini and Yekutieli. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
709
719
http://hdl.handle.net/10.1093/biomet/asn033
application/pdf
Access to full text is restricted to subscribers.
Peter H. Westfall
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:53-642013-03-04RePEc:oup:biomet
article
Latent-model robustness in structural measurement error models
We present methods for diagnosing the effects of model misspecification of the true-predictor distribution in structural measurement error models. We first formulate latent-model robustness theoretically. Then we provide practical techniques for examining the adequacy of an assumed latent predictor model. The methods are illustrated via analytical examples, application to simulated data and with data from a study of coronary heart disease. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
53
64
http://hdl.handle.net/10.1093/biomet/93.1.53
text/html
Access to full text is restricted to subscribers.
Xianzheng Huang
Leonard A. Stefanski
Marie Davidian
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:861-8752013-03-04RePEc:oup:biomet
article
Covariate selection for the nonparametric estimation of an average treatment effect
Observational studies in which the effect of a nonrandomized treatment on an outcome of interest is estimated are common in domains such as labour economics and epidemiology. Such studies often rely on an assumption of unconfounded treatment when controlling for a given set of observed pre-treatment covariates. The choice of covariates to control in order to guarantee unconfoundedness should primarily be based on subject matter theories, although the latter typically give only partial guidance. It is tempting to include many covariates in the controlling set to try to make the assumption of an unconfounded treatment realistic. Including unnecessary covariates is suboptimal when the effect of a binary treatment is estimated nonparametrically. For instance, when using an n^{1/2}-consistent estimator, a loss of efficiency may result from using covariates that are irrelevant for the unconfoundedness assumption. Moreover, bias may dominate the variance when many covariates are used. Embracing the Neyman--Rubin model typically used in conjunction with nonparametric estimators of treatment effects, we characterize subsets from the original reservoir of covariates that are minimal in the sense that the treatment ceases to be unconfounded given any proper subset of these minimal sets. These subsets of covariates are shown to be identified under mild assumptions. These results lead us to propose data-driven algorithms for the selection of minimal sets of covariates. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
861
875
http://hdl.handle.net/10.1093/biomet/asr041
application/pdf
Access to full text is restricted to subscribers.
Xavier De Luna
Ingeborg Waernbaum
Thomas S. Richardson
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:491-5152013-03-04RePEc:oup:biomet
article
Uniform consistency in causal inference
There is a long tradition of representing causal relationships by directed acyclic graphs (Wright, 1934). Spirtes (1994), Spirtes et al. (1993) and Pearl & Verma (1991) describe procedures for inferring the presence or absence of causal arrows in the graph even if there might be unobserved confounding variables and/or an unknown time order, procedures that, under weak conditions, are asymptotically consistent in sample size for certain combinations of directed acyclic graphs and probability distributions. These results are surprising since they seem to contradict the standard statistical wisdom that consistent estimators of causal effects do not exist for nonrandomised studies if there are potentially unobserved confounding variables. We resolve the apparent incompatibility of these views by closely examining the asymptotic properties of these causal inference procedures. We show that the asymptotically consistent procedures are 'pointwise consistent', but 'uniformly consistent' tests do not exist. Thus, no finite sample size can ever be guaranteed to approximate the asymptotic results. We also show the nonexistence of valid, consistent confidence intervals for causal effects and the nonexistence of uniformly consistent point estimators. Our results make no assumption about the form of the tests or estimators. In particular, the tests could be classical independence tests, they could be Bayes tests or they could be tests based on scoring methods such as BIC or AIC. The implications of our results for observational studies are controversial and are discussed briefly in the last section of the paper. The results hinge on the following fact: it is possible to find, for each sample size n, distributions P and Q such that P and Q are empirically indistinguishable and yet P and Q correspond to different causal effects. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
491
515
James M. Robins
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:939-9462013-03-04RePEc:oup:biomet
article
Forward search added-variable t-tests and the effect of masked outliers on model selection
Monitoring the t-tests for individual regression coefficients in 'forward' search fails to identify the importance of observations to the significance of the individual regressors. This failure is due to the ordering of the data by the search. We introduce an added-variable test which has the desired properties since the projection leading to residuals destroys the effect of the ordering. An example illustrates the effect of several masked outliers on model selection. Comments are given on the related test for response transformations. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
939
946
Anthony C. Atkinson
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:831-8442013-03-04RePEc:oup:biomet
article
Nonparametric estimation of large covariance matrices of longitudinal data
Estimation of an unstructured covariance matrix is difficult because of its positive-definiteness constraint. This obstacle is removed by regressing each variable on its predecessors, so that estimation of a covariance matrix is shown to be equivalent to that of estimating a sequence of varying-coefficient and varying-order regression models. Our framework is similar to the use of increasing-order autoregressive models in approximating the covariance matrix or the spectrum of a stationary time series. As an illustration, we adopt Fan & Zhang's (2000) two-step estimation of functional linear models and propose nonparametric estimators of covariance matrices which are guaranteed to be positive definite. For parsimony a suitable order for the sequence of (auto)regression models is found using penalised likelihood criteria like AIC and BIC. Some asymptotic results for the local polynomial estimators of components of a covariance matrix are established. Two longitudinal datasets are analysed to illustrate the methodology. A simulation study reveals the advantage of the nonparametric covariance estimator over the sample covariance matrix for large covariance matrices. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
831
844
Wei Biao Wu
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:961-9722013-03-04RePEc:oup:biomet
article
Isotonic logistic discrimination
We propose an isotonic logistic discrimination procedure which generalises linear logistic discrimination by allowing linear boundaries to be more flexibly shaped as monotone functions of the discriminant variables. Under each of three familiar sampling schemes for obtaining a training dataset, namely prospective, mixture and retrospective, we provide the corresponding likelihood-based inference. An application to a cancer study is given. In addition, we consider theoretical comparisons of our method with two recent algorithmic monotone discrimination procedures. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
961
972
http://hdl.handle.net/10.1093/biomet/93.4.961
text/html
Access to full text is restricted to subscribers.
Sungyoung Auh
Allan R. Sampson
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:911-9262013-03-04RePEc:oup:biomet
article
A functional-based distribution diagnostic for a linear model with correlated outcomes
In this paper we present an easy-to-implement graphical distribution diagnostic for linear models with correlated errors. Houseman et al. (2004) constructed quantile--quantile plots for the marginal residuals of such models, suitably transformed. We extend the pointwise asymptotic theory to address the global stochastic behaviour of the corresponding empirical cumulative distribution function, and describe a simulation technique that serves as a computationally efficient parametric bootstrap for generating representatives of its stochastic limit. Thus, continuous functionals of the empirical cumulative distribution function may be used to form global tests of normality. Through the use of projection matrices, we generalise our methods to include tests that are directed at assessing the normality of particular components of the error. Thus, tests proposed by Lange & Ryan (1989) follow as a special case. Our method works well both for models having independent units of sampling and for those in which all observations are correlated. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
911
926
http://hdl.handle.net/10.1093/biomet/93.4.911
text/html
Access to full text is restricted to subscribers.
E. Andres Houseman
Brent A. Coull
Louise M. Ryan
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:249-2652013-03-04RePEc:oup:biomet
article
An asymptotic theory for model selection inference in general semiparametric problems
Hjort & Claeskens (2003) developed an asymptotic theory for model selection, model averaging and subsequent inference using likelihood methods in parametric models, along with associated confidence statements. In this article, we consider a semiparametric version of this problem, wherein the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model. We show that all the results of Hjort & Claeskens hold in the semiparametric context, if the Fisher information matrix for parametric models is replaced by the semiparametric information bound for semiparametric models, and if maximum likelihood estimators for parametric models are replaced by semiparametric efficient profile estimators. Our methods of proof employ Le Cam's contiguity lemmas, leading to transparent results. The results also describe the behaviour of semiparametric model estimators when the parametric component is misspecified, and also have implications for pointwise-consistent model selectors. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
249
265
http://hdl.handle.net/10.1093/biomet/asm034
application/pdf
Access to full text is restricted to subscribers.
Gerda Claeskens
Raymond J. Carroll
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:943-9592013-03-04RePEc:oup:biomet
article
Multi-level modelling under informative sampling
We consider a model-dependent approach for multi-level modelling that accounts for informative probability sampling of first- and lower-level population units. The proposed approach consists of first extracting the hierarchical model holding for the sample data given the selected sample, as a function of the corresponding population model and the first- and lower-level sample selection probabilities, and then fitting the resulting sample model using Bayesian methods. An important implication of the use of the model holding for the sample is that the sample selection probabilities feature in the analysis as additional data that possibly strengthen the estimators. A simulation experiment is carried out in order to study the performance of this approach and compare it to the use of 'design-based' methods. The simulation study indicates that both approaches perform in general equally well in terms of point estimation, but the model-dependent approach yields confidence/credibility intervals with better coverage properties. Another simulation study assesses the impact of misspecification of the models assumed for the sample selection probabilities. The use of maximum likelihood estimation is also considered. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
943
959
http://hdl.handle.net/10.1093/biomet/93.4.943
text/html
Access to full text is restricted to subscribers.
Danny Pfeffermann
Fernando Antonio Da Silva Moura
Pedro Luis Do Nascimento Silva
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:65-742013-03-04RePEc:oup:biomet
article
Using intraslice covariances for improved estimation of the central subspace in regression
Popular methods for estimating the central subspace in regression require slicing a continuous response. However, slicing can result in loss of information and in some cases that loss can be substantial. We use intraslice covariances to construct improved inference methods for the central subspace. These methods are optimal within a class of quadratic inference functions and permit chi-squared tests of conditional independence hypotheses involving the predictors. Our experience gained through simulation is that the new method is never worse than existing methods, and can be substantially better. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
65
74
http://hdl.handle.net/10.1093/biomet/93.1.65
text/html
Access to full text is restricted to subscribers.
R. Dennis Cook
Liqiang Ni
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:937-9502013-03-04RePEc:oup:biomet
article
Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation
A traditional approach to statistical inference is to identify the true or best model first with little or no consideration of the specific goal of inference in the model identification stage. Can the pursuit of the true model also lead to optimal regression estimation? In model selection, it is well known that BIC is consistent in selecting the true model, and AIC is minimax-rate optimal for estimating the regression function. A recent promising direction is adaptive model selection, in which, in contrast to AIC and BIC, the penalty term is data-dependent. Some theoretical and empirical results have been obtained in support of adaptive model selection, but it is still not clear if it can really share the strengths of AIC and BIC. Model combining or averaging has attracted increasing attention as a means to overcome the model selection uncertainty. Can Bayesian model averaging be optimal for estimating the regression function in a minimax sense? We show that the answers to these questions are basically in the negative: for any model selection criterion to be consistent, it must behave suboptimally for estimating the regression function in terms of minimax rate of convergence; and Bayesian model averaging cannot be minimax-rate optimal for regression estimation. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
937
950
http://hdl.handle.net/10.1093/biomet/92.4.937
text/html
Access to full text is restricted to subscribers.
Yuhong Yang
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:1001-10062013-03-04RePEc:oup:biomet
article
Empirical likelihood and quantile regression in longitudinal data analysis
We propose a novel quantile regression approach for longitudinal data analysis which naturally incorporates auxiliary information from the conditional mean model to account for within-subject correlations. The efficiency gain is quantified theoretically and demonstrated empirically via simulation studies and the analysis of a real dataset. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
1001
1006
http://hdl.handle.net/10.1093/biomet/asr050
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Chenlei Leng
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:747-7582013-03-04RePEc:oup:biomet
article
Conditional properties of unconditional parametric bootstrap procedures for inference in exponential families
Higher-order inference about a scalar parameter in the presence of nuisance parameters can be achieved by bootstrapping, in circumstances where the parameter of interest is a component of the canonical parameter in a full exponential family. The optimal test, which is approximated, is a conditional one based on conditioning on the sufficient statistic for the nuisance parameter. A bootstrap procedure that ignores the conditioning is shown to have desirable conditional properties in providing third-order relative accuracy in approximation of p-values associated with the optimal test, in both continuous and discrete models. The bootstrap approach is equivalent to third-order analytical approaches, and is demonstrated in a number of examples to give very accurate approximations even for very small sample sizes. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
747
758
http://hdl.handle.net/10.1093/biomet/asn011
application/pdf
Access to full text is restricted to subscribers.
Thomas J. DiCiccio
G. Alastair Young
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:831-8432013-03-04RePEc:oup:biomet
article
The Aalen additive gamma frailty hazards model
In this paper, we consider clustered right-censored time-to-event data. Such data can be analysed either using a marginal model if one is interested in population effects or using so-called frailty models if one is interested in covariate effects on the individual level and in estimation of correlation. The Cox frailty model has been studied extensively in the last decade or so and estimation techniques and large sample results are now available. It is, however, difficult to deal with time-changing covariate effects when using the Cox model. An appealing alternative model is the Aalen additive hazards model, in which it is easy to work with time dynamics. In this paper, we describe an innovative approach to estimation in the Aalen additive gamma frailty hazards model. We give the large sample properties of the estimators and investigate their small sample properties by Monte Carlo simulation. A real example is provided for illustration. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
831
843
http://hdl.handle.net/10.1093/biomet/asr049
application/pdf
Access to full text is restricted to subscribers.
Torben Martinussen
Thomas H. Scheike
David M. Zucker
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:45-632013-03-04RePEc:oup:biomet
article
Bayesian criterion based model assessment for categorical data
We propose a general Bayesian criterion for model assessment for categorical data called the weighted L measure, which is constructed from the posterior predictive distribution of the data. The measure is based on weighting the observations according to the sampling variance of their future response vector. The weight component in the weighted L measure plays the role of a penalty term in the criterion, in which a greater weight assigned to covariate values implies a greater penalty term on the dimension of the model. A detailed justification is provided for such a weighting procedure and several theoretical properties of the weighted L measure are presented for a wide variety of discrete data models. For these models, we examine properties of the weighted L measure, and show that it can perform better than the unweighted L measure in a variety of settings. In addition, we show that the weighted quadratic loss L measure is more attractive than the unweighted L measure and the deviance loss L measure for categorical data. Moreover, a calibration for the weighted L measure is motivated and proposed, which allows us to compare formally the L measure values of competing models. A detailed simulation study is presented to examine the performance of the weighted L measure, and it is compared to other established model-selection methods. Finally, the method is applied to a real dataset using a bivariate ordinal response model. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
45
63
Ming-Hui Chen
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:451-4562013-03-04RePEc:oup:biomet
article
Nonparametric state estimation of diffusion processes
The paper presents a method for estimating nonparametrically the states of one-dimensional diffusion processes. Once certain nuisance parameters have been estimated from the time series, states of a diffusion process can be estimated by the Kalman filter algorithm, so that the method is also useful for filtering and smoothing the states of the process. Numerical comparison of the method with the case of fitting a linear model to data shows that the method is clearly superior in terms of prediction errors. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
451
456
Isao Shoji
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:1-222013-03-04RePEc:oup:biomet
article
A class of logistic-type discriminant functions
In two-group discriminant analysis, the Neyman--Pearson Lemma establishes that the ROC, receiver operating characteristic, curve for an arbitrary linear function is everywhere below the ROC curve for the true likelihood ratio. The weighted area between these two curves can be used as a risk function for finding good discriminant functions. The weight function corresponds to the objective of the analysis, for example to minimise the expected cost of misclassification, or to maximise the area under the ROC. The resulting discriminant functions can be estimated by iteratively reweighted logistic regression. We investigate some asymptotic properties in the 'near-logistic' setting, where we assume the covariates have been chosen such that a linear function gives a reasonable, but not necessarily exact, approximation to the true log likelihood ratio. Some examples are discussed, including a study of medical diagnosis in breast cytology. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
1
22
Shinto Eguchi
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:427-4432013-03-04RePEc:oup:biomet
article
Double block bootstrap confidence intervals for dependent data
The block bootstrap confidence interval for dependent data can outperform the conventional normal approximation only with nontrivial studentization which, in the case of complicated statistics, calls for specialist treatment and often results in unstable endpoints. We propose two double block bootstrap approaches for improving the accuracy of the block bootstrap confidence interval under very general conditions. The first approach calibrates the nominal coverage level and the second calculates studentizing factors directly from a block bootstrap series without the need for nontrivial analytical treatment. We prove that the two approaches reduce the coverage error of the block bootstrap interval by an order of magnitude with simple tuning of block lengths at the two block bootstrapping levels. Empirical properties of the procedures are investigated by simulations and application to an econometric time series. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
427
443
http://hdl.handle.net/10.1093/biomet/asp018
application/pdf
Access to full text is restricted to subscribers.
Stephen M. S. Lee
P. Y. Lai
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:539-5532013-03-04RePEc:oup:biomet
article
A new approach to weighting and inference in sample surveys
The validity of design-based inference is not dependent on any model assumption. However, it is well known that estimators derived through design-based theory may be inefficient for the estimation of population totals when the design weights are weakly related to the variables of interest and have widely dispersed values. We propose estimators that have the potential to improve the efficiency of any estimator derived under the design-based theory. Our main focus is limited to the improvement of the Horvitz--Thompson estimator, but we also discuss the extension to calibration estimators. The new estimators are obtained by smoothing design or calibration weights using an appropriate model. Our approach to inference requires the modelling of only one variable, the weight, and it leads to a single set of smoothed weights in multipurpose surveys. This is to be contrasted with other model-based approaches, such as the prediction approach, in which it is necessary to postulate and validate a model for each variable of interest leading potentially to variable-specific sets of weights. Our proposed approach is first justified theoretically and then evaluated through a simulation study. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
539
553
http://hdl.handle.net/10.1093/biomet/asn028
application/pdf
Access to full text is restricted to subscribers.
Jean-François Beaumont
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:529-5412013-03-04RePEc:oup:biomet
article
Case-control current status data
In this paper, we show that the distribution function of survival times is identified, up to a one-parameter family of distribution functions, based on information from case-control current status data. With supplementary information on the population frequency of cases relative to controls, a simple weighted version of the nonparametric maximum likelihood estimator for prospective current status data provides a natural estimator for case-control samples. Following the parametric results of Scott & Wild (1997), we show that this estimator is, in fact, the nonparametric maximum likelihood estimator. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
529
541
Nicholas P. Jewell
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:755-7592013-03-04RePEc:oup:biomet
article
On a generalization of a result of W. G. Cochran
A relationship due to W.G. Cochran showing the effect on least squares regression coefficients of marginalizing over or conditioning on an explanatory variable is generalized to quantile regression coefficients. The condition under which conditioning does not induce interaction or effect reversal is shown. Examples are given. The discussion is simplest when all variables are continuous; the extension to discrete variables is outlined. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
755
759
http://hdl.handle.net/10.1093/biomet/asm046
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:741-7472013-03-04RePEc:oup:biomet
article
Construction of φ_p-optimal exact designs with minimum experimental run size for a linear log contrast model in mixture experiments
We propose a new method with minimum experimental run size using the properties of Hadamard matrices, through which some φ_p-optimal exact designs, including A-, D- and E-optimal designs, are constructed for a linear log contrast model in mixture experiments. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/asr014
application/pdf
Access to full text is restricted to subscribers.
Baisuo Jin
Mong-Na Lo Huang
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:765-7752013-03-04RePEc:oup:biomet
article
Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function
Random effects logistic regression models are often used to model clustered binary response data. Regression parameters in these models have a conditional, subject-specific interpretation in that they quantify regression effects for each cluster. Very often, the logistic functional shape conditional on the random effects does not carry over to the marginal scale. Thus, parameters in these models usually do not have an explicit marginal, population-averaged interpretation. We study a bridge distribution function for the random effect in the random intercept logistic regression model. Under this distributional assumption, the marginal functional shape is still of logistic form, and thus regression parameters have an explicit marginal interpretation. The main advantage of this approach is that likelihood inference can be obtained for either marginal or conditional regression inference within a single model framework. The generality of the results and some properties of the bridge distribution functions are discussed. An example is used for illustration. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
765
775
Zengri Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:615-6312013-03-04RePEc:oup:biomet
article
Aggregation-cokriging for highly multivariate spatial data
Best linear unbiased prediction of spatially correlated multivariate random processes, often called cokriging in geostatistics, requires the solution of a large linear system based on the covariance and cross-covariance matrix of the observations. For many problems of practical interest, it is impossible to solve the linear system with direct methods. We propose an efficient linear unbiased predictor based on a linear aggregation of the covariables. The primary variable together with this single meta-covariable is used to perform cokriging. We discuss the optimality of the approach under different covariance structures, and use it to create reanalysis type high-resolution historical temperature fields. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
615
631
http://hdl.handle.net/10.1093/biomet/asr029
application/pdf
Access to full text is restricted to subscribers.
Reinhard Furrer
Marc G. Genton
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:153-1652013-03-04RePEc:oup:biomet
article
Modelling the effects of partially observed covariates on Poisson process intensity
We propose an estimating function for parameters in a model for Poisson process intensity when time- or space-varying covariates are observed for both the events of the process and at sample times or locations selected from a probability-based sampling design. We investigate the large-sample properties of the proposed estimator under increasing domain asymptotics, demonstrating that it is consistent and asymptotically normally distributed. We illustrate our approach using data from an ecological momentary assessment of smoking. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
153
165
http://hdl.handle.net/10.1093/biomet/asm009
application/pdf
Access to full text is restricted to subscribers.
Stephen L. Rathbun
Saul Shiffman
Chad J. Gwaltney
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:687-7042013-03-04RePEc:oup:biomet
article
A Haar--Fisz technique for locally stationary volatility estimation
We consider a locally stationary model for financial log-returns whereby the returns are independent and the volatility is a piecewise-constant function with jumps of an unknown number and locations, defined on a compact interval to enable a meaningful estimation theory. We demonstrate that the model explains well the common characteristics of log-returns. We propose a new wavelet thresholding algorithm for volatility estimation in this model, in which Haar wavelets are combined with the variance-stabilising Fisz transform. The resulting volatility estimator is mean-square consistent with a near-parametric rate, does not require any pre-estimates, is rapidly computable and is easily implemented. We also discuss important variations on the choice of estimation parameters. We show that our approach both gives a very good fit to selected currency exchange datasets, and achieves accurate long- and short-term volatility forecasts in comparison to the GARCH(1, 1) and moving window techniques. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
687
704
http://hdl.handle.net/10.1093/biomet/93.3.687
text/html
Access to full text is restricted to subscribers.
Piotr Fryzlewicz
Theofanis Sapatinas
Suhasini Subba Rao
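The Haar–Fisz mechanism described above — Haar detail coefficients divided by their smooth coefficients to stabilise variance, thresholded, then inverted — can be sketched in a few lines. This is a toy illustration on simulated piecewise-constant volatility with an arbitrary hard threshold; the paper's actual estimator and threshold choices are more refined:

```python
import numpy as np

def haar_fisz_forward(x):
    """Haar smooth/detail pairs with each detail divided by its smooth
    coefficient (the variance-stabilising Fisz step).
    Requires len(x) to be a power of two."""
    sm = np.asarray(x, dtype=float)
    ratios = []
    while len(sm) > 1:
        s = (sm[0::2] + sm[1::2]) / 2.0
        d = (sm[0::2] - sm[1::2]) / 2.0
        ratios.append(np.where(s != 0, d / s, 0.0))
        sm = s
    return sm, ratios

def haar_fisz_inverse(sm, ratios):
    """Invert the decomposition; exact when the ratios are untouched."""
    rec = sm
    for f in reversed(ratios):
        out = np.empty(2 * len(rec))
        out[0::2] = rec * (1.0 + f)
        out[1::2] = rec * (1.0 - f)
        rec = out
    return rec

rng = np.random.default_rng(1)
# toy model: independent returns with piecewise-constant volatility
sigma = np.repeat([0.5, 2.0], 64)
r2 = (rng.normal(size=128) * sigma) ** 2   # squared log-returns

sm, ratios = haar_fisz_forward(r2)
# hard-threshold the stabilised details, then invert -> volatility estimate
ratios_thr = [np.where(np.abs(f) > 0.5, f, 0.0) for f in ratios]
vol_est = haar_fisz_inverse(sm, ratios_thr)
exact = haar_fisz_inverse(sm, ratios)      # untouched ratios reconstruct r2
```

With nonnegative input, every ratio satisfies |f| <= 1, so the thresholded reconstruction stays nonnegative, as a volatility estimate must.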
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:255-2682013-03-04RePEc:oup:biomet
article
M-quantile models for small area estimation
Small area estimation techniques typically rely on regression models that use both covariates and random effects to explain variation between the areas. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier-robust inference. We describe a new approach to small area estimation that is based on modelling quantile-like parameters of the conditional distribution of the target variable given the covariates. This avoids the problems associated with specification of random effects, allowing inter-area differences to be characterised by area-specific M-quantile coefficients. The proposed approach is easily made robust against outlying data values and can be adapted for estimation of a wide range of area-specific parameters, including quantiles of the distribution of the target variable in the different small areas. The differences between M-quantile and random effects models are discussed and the alternative approaches to small area estimation are compared using both simulated and real data. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
255
268
http://hdl.handle.net/10.1093/biomet/93.2.255
text/html
Access to full text is restricted to subscribers.
Ray Chambers
Nikos Tzavidis
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:321-3302013-03-04RePEc:oup:biomet
article
Analysis of longitudinal data in case-control studies
Case-control studies for longitudinal data are considered. Among repeated binary measurements of disease status in each subject, the exposure levels of risk factors for all diseased cases are identified and the exposure levels for only a small fraction of disease-free cases, to be regarded as controls, are identified. Case-control studies for longitudinal data bring about economies in cost and time when the disease is rare and when assessing the exposure level of risk factors is difficult. We propose a way of using an ordinary logistic model to analyse case-control longitudinal data. We prove that the proposed estimator is consistent and asymptotically normally distributed provided that the choice of control observations is independent of the covariates for those subjects. We also discuss the validity of the generalised estimating equation method for case-control longitudinal data. Simulation results are provided, and a real example is presented. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
321
330
Eunsik Park
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:169-1862013-03-04RePEc:oup:biomet
article
Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models
Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorized into marginal and conditional methods. The former integrate out analytically the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional methods impute the Dirichlet process and update it as a component of the Gibbs sampler. Since this requires imputation of an infinite-dimensional process, implementation of the conditional method has relied on finite approximations. In this paper, we show how to avoid such approximations by designing two novel Markov chain Monte Carlo algorithms which sample from the exact posterior distribution of quantities of interest. The approximations are avoided by the new technique of retrospective sampling. We also show how the algorithms can obtain samples from functionals of the Dirichlet process. The marginal and the conditional methods are compared and a careful simulation study is included, which involves a non-conjugate model, different datasets and prior specifications. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
169
186
http://hdl.handle.net/10.1093/biomet/asm086
application/pdf
Access to full text is restricted to subscribers.
Omiros Papaspiliopoulos
Gareth O. Roberts
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:249-2622013-03-04RePEc:oup:biomet
article
Nonparametric Bayes local partition models for random effects
This paper focuses on the problem of choosing a prior for an unknown random effects distribution within a Bayesian hierarchical model. The goal is to obtain a sparse representation by allowing a combination of global and local borrowing of information. A local partition process prior is proposed, which induces dependent local clustering. Subjects can be clustered together for a subset of their parameters, and one learns about similarities between subjects increasingly as parameters are added. Some basic properties are described, including simple two-parameter expressions for marginal and conditional clustering probabilities. A slice sampler is developed which bypasses the need to approximate the countably infinite random measure in performing posterior computation. The methods are illustrated using simulation examples, and an application to hormone trajectory data. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
249
262
http://hdl.handle.net/10.1093/biomet/asp021
application/pdf
Access to full text is restricted to subscribers.
David B. Dunson
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:427-4412013-03-04RePEc:oup:biomet
article
Uncertainty in prior elicitations: a nonparametric approach
A key task in the elicitation of expert knowledge is to construct a distribution from the finite, and usually small, number of statements that have been elicited from the expert. These statements typically specify some quantiles or moments of the distribution. Such statements are not enough to identify the expert's probability distribution uniquely, and the usual approach is to fit some member of a convenient parametric family. There are two clear deficiencies in this solution. First, the expert's beliefs are forced to fit the parametric family. Secondly, no account is then taken of the many other possible distributions that might have fitted the elicited statements equally well. We present a nonparametric approach which tackles both of these deficiencies. We also consider the issue of the imprecision in the elicited probability judgements. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
427
441
http://hdl.handle.net/10.1093/biomet/asm031
application/pdf
Access to full text is restricted to subscribers.
Jeremy E. Oakley
Anthony O'Hagan
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:985-9902013-03-04RePEc:oup:biomet
article
A note on methods of restoring consistency to the bootstrap
We consider the property of consistency and its relevance for determining the performance of the bootstrap. We analyse various parametric bootstrap approximations to the distributions of the Hodges and Stein estimators, whose behaviour is typical of that of super-efficient estimators employed in wavelet regression, kernel density estimation and nonparametric curve fitting. Our results reveal not only some of the difficulties in selecting good modifications to the intuitive bootstrap, but also that inconsistent bootstrap approximations may perform better than consistent versions even in large samples. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
985
990
Richard Samworth
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:1-182013-03-04RePEc:oup:biomet
article
Maxima of discretely sampled random fields, with an application to 'bubbles'
A smooth Gaussian random field with zero mean and unit variance is sampled on a discrete lattice, and we are interested in the exceedance probability or P-value of the maximum in a finite region. If the random field is smooth relative to the mesh size, then the P-value can be well approximated by results for the continuously sampled smooth random field (Adler, 1981; Worsley, 1995a; Taylor & Adler, 2003; Adler & Taylor, 2007). If the random field is not smooth, so that adjacent lattice values are nearly independent, then the usual Bonferroni bound is very accurate. The purpose of this paper is to bridge the gap between the two, and derive a simple, accurate upper bound for intermediate mesh sizes. The result uses a new improved Bonferroni-type bound based on discrete local maxima. We give an application to the 'bubbles' technique for detecting areas of the face used to discriminate fear from happiness. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
1
18
http://hdl.handle.net/10.1093/biomet/asm004
application/pdf
Access to full text is restricted to subscribers.
J. E. Taylor
K. J. Worsley
F. Gosselin
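The gap the abstract above addresses — the Bonferroni bound is accurate for rough lattice fields but conservative for smooth ones — is easy to see numerically. A sketch using a strongly correlated AR(1) sequence as a stand-in for a smoothly sampled field (illustrative only; the paper's improved bound based on discrete local maxima is not implemented here):

```python
import math
import numpy as np

def gauss_sf(t):
    # standard normal upper-tail probability P(Z > t)
    return 0.5 * math.erfc(t / math.sqrt(2.0))

rng = np.random.default_rng(2)
n, t, rho = 50, 2.5, 0.9   # lattice size, threshold, lag-one correlation

# stationary AR(1) with unit marginal variance: "smooth" relative to the mesh
reps = 20_000
z = np.empty((reps, n))
z[:, 0] = rng.normal(size=reps)
for i in range(1, n):
    z[:, i] = rho * z[:, i - 1] + math.sqrt(1 - rho**2) * rng.normal(size=reps)

p_mc = (z.max(axis=1) > t).mean()      # Monte Carlo exceedance P-value
p_bonf = min(1.0, n * gauss_sf(t))     # Bonferroni upper bound
print(p_mc, p_bonf)
```

The Monte Carlo P-value sits well below the Bonferroni bound at this correlation level; an intermediate bound of the kind the paper derives is designed to close exactly this gap.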
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:399-4142013-03-04RePEc:oup:biomet
article
Least absolute deviation estimation for fractionally integrated autoregressive moving average time series models with conditional heteroscedasticity
We consider a unified least absolute deviation estimator for stationary and nonstationary fractionally integrated autoregressive moving average models with conditional heteroscedasticity. Its asymptotic normality is established when the second moments of errors and innovations are finite. Several other alternative estimators are also discussed and are shown to be less efficient and less robust than the proposed approach. A diagnostic tool, consisting of two portmanteau tests, is designed to check whether or not the estimated models are adequate. The simulation experiments give further support to our model and the results for the absolute returns of the Dow Jones Industrial Average Index daily closing price demonstrate their usefulness in modelling time series exhibiting the features of long memory, conditional heteroscedasticity and heavy tails. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
399
414
http://hdl.handle.net/10.1093/biomet/asn014
application/pdf
Access to full text is restricted to subscribers.
Guodong Li
Wai Keung Li
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:585-5962013-03-04RePEc:oup:biomet
article
Using logistic regression procedures for estimating receiver operating characteristic curves
Estimation of a receiver operating characteristic, ROC, curve is usually based either on a fully parametric model such as a normal model or on a fully nonparametric model. In this paper, we explore a semiparametric approach by assuming a density ratio model for disease and disease-free densities. This model has a natural connection with the logistic regression model. The proposed semiparametric approach is more robust than a fully parametric approach and is more efficient than a fully nonparametric approach. Two real examples demonstrate that the ROC curve estimated by our semiparametric method is much smoother than that estimated by the nonparametric method. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
585
596
Jing Qin
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:603-6162013-03-04RePEc:oup:biomet
article
A simple and efficient simulation smoother for state space time series analysis
A simulation smoother in state space time series analysis is a procedure for drawing samples from the conditional distribution of state or disturbance vectors given the observations. We present a new technique for this which is both simple and computationally efficient. The treatment includes models with diffuse initial conditions and regression effects. Computational comparisons are made with the previous standard method. Two applications are provided to illustrate the use of the simulation smoother for Gibbs sampling for Bayesian inference and importance sampling for classical inference. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
603
616
J. Durbin
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:669-6822013-03-04RePEc:oup:biomet
article
A Poisson model for the coverage problem with a genomic application
Suppose a population has infinitely many individuals and is partitioned into an unknown number N of disjoint classes. The sample coverage of a random sample from the population is the total proportion of the classes observed in the sample. This paper uses a nonparametric Poisson mixture model to give new understanding and results for inference on the sample coverage. The Poisson mixture model provides a simplified framework for inferring any general abundance-K coverage, the sum of the proportions of those classes that contribute exactly k individuals in the sample for some k in K, with K being a set of nonnegative integers. A new moment-based derivation of the well-known Turing estimators is presented. As an application, a gene-categorisation problem in genomic research is addressed. Since Turing's approach is a moment-based method, maximum likelihood estimation and minimum distance estimation are indicated as alternatives for the coverage problem. Finally, it will be shown that any Turing estimator is asymptotically fully efficient. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
669
682
Chang Xuan Mao
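The Turing estimators mentioned above have a one-line form: with f_j the number of classes observed exactly j times in a sample of size n, the sample coverage is estimated by 1 - f1/n, and the abundance-K coverage by the Good–Turing sums (k+1) * f_{k+1} / n. A small sketch (function and variable names are ours, for illustration):

```python
from collections import Counter

def turing_coverage(sample):
    """Turing's estimator of the sample coverage: 1 - f1/n,
    where f1 counts the classes observed exactly once."""
    n = len(sample)
    f1 = sum(1 for c in Counter(sample).values() if c == 1)
    return 1.0 - f1 / n

def abundance_coverage(sample, K):
    """Turing-type estimate of the abundance-K coverage:
    the sum over k in K of (k+1) * f_{k+1} / n."""
    n = len(sample)
    freq = Counter(Counter(sample).values())   # f_j, indexed by j
    return sum((k + 1) * freq.get(k + 1, 0) for k in K) / n

sample = ["a", "a", "a", "b", "b", "c", "d", "d", "e", "f"]
print(turing_coverage(sample))                   # f1 = 3, n = 10 -> 0.7
print(abundance_coverage(sample, range(1, 11)))  # K = {1,...,10} -> same 0.7
```

The two agree on this example because summing (k+1) * f_{k+1} over all k >= 1 gives n - f1, so the sample coverage is itself an abundance-K coverage with K the positive integers.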
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:585-6012013-03-04RePEc:oup:biomet
article
Implications of influence function analysis for sliced inverse regression and sliced average variance estimation
Sliced inverse regression, sliced inverse regression II and sliced average variance estimation are three related dimension-reduction methods that require relatively mild model assumptions. As an approximation for the relative influence of single observations from large samples, the influence function is used to compare the sensitivity of the three methods to particular observational types. The analysis carried out here helps to explain why there is a lack of agreement concerning the preferability of these dimension-reduction procedures in general. An efficient sample version of the influence function is also developed and evaluated. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
585
601
http://hdl.handle.net/10.1093/biomet/asm055
application/pdf
Access to full text is restricted to subscribers.
Luke A. Prendergast
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:543-5572013-03-04RePEc:oup:biomet
article
Smooth quantile ratio estimation
We propose a novel approach to estimating the mean difference between two highly skewed distributions. The method, which we call smooth quantile ratio estimation, smooths, over percentiles, the ratio of the quantiles of the two distributions. The method defines a large class of estimators, including the sample mean difference, the maximum likelihood estimator under log-normal samples and the L-estimator. We derive asymptotic properties such as consistency and asymptotic normality, and also provide a closed-form expression for the asymptotic variance. In a simulation study, we show that smooth quantile ratio estimation has lower mean squared error than several competitors, including the sample mean difference and the log-normal parametric estimator in several realistic situations. We apply the method to the 1987 National Medicare Expenditure Survey to estimate the difference in medical expenditures between persons suffering from the smoking attributable diseases, lung cancer and chronic obstructive pulmonary disease, and persons without these diseases. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
543
557
http://hdl.handle.net/10.1093/biomet/92.3.543
text/html
Access to full text is restricted to subscribers.
Francesca Dominici
Leslie Cope
Daniel Q. Naiman
Scott L. Zeger
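The idea in the abstract above — smooth the log-ratio of the two quantile functions over percentiles, then read off a mean difference — can be illustrated compactly. This is a toy version with a low-order polynomial smoother and arbitrary grid and degree choices, not the paper's estimator class or its asymptotics:

```python
import numpy as np

def sqre(y1, y2, deg=2, grid=99):
    """Toy smooth quantile ratio estimate of E[Y1] - E[Y2]: smooth
    log(q1(p)/q2(p)) over percentiles p with a polynomial, then average
    the implied quantile differences. Assumes positive-valued data."""
    p = (np.arange(grid) + 0.5) / grid
    q1 = np.quantile(y1, p)
    q2 = np.quantile(y2, p)
    coef = np.polyfit(p, np.log(q1 / q2), deg)  # smooth the log quantile ratio
    ratio = np.exp(np.polyval(coef, p))
    return np.mean(q2 * ratio - q2)             # estimated mean difference

rng = np.random.default_rng(4)
y1 = np.exp(rng.normal(1.0, 1.0, 5000))   # highly skewed log-normal samples
y2 = np.exp(rng.normal(0.5, 1.0, 5000))
est = sqre(y1, y2)
print(est, y1.mean() - y2.mean())         # truth: exp(1.5) - exp(1.0) ~ 1.76
```

Because the two samples here share a log-scale spread, the true log quantile ratio is constant, which is the favourable case in which smoothing over percentiles buys the most over the raw sample mean difference.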
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:23-372013-03-04RePEc:oup:biomet
article
The analysis of retrospective family studies
Case-control samples allow straightforward calculation of estimates of the association between covariates and disease status by fitting a prospective logistic regression model. In genetic studies of disease, investigators often gather additional information on response and covariate variables from family members of cases and controls. The objective is to model the responses of all the family members in terms of the covariate data. Whittemore (1995) has discussed maximum likelihood methods for fitting a special class of logistic models to family data collected according to a particular design. In the present paper, we show that we can obtain efficient semiparametric maximum likelihood estimates for an arbitrary multivariate binary regression model by fitting a modified prospective model for a wide class of retrospective designs. However, in contrast to the situation with simple case-control studies, the prospective model will differ from the original model even when the model is logistic. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
23
37
J. Neuhaus
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:982-9842013-03-04RePEc:oup:biomet
article
Conditional and marginal association for binary random variables
The relationship between marginal and conditional distributions of binary random variables is analysed via a log-linear model. Conditions for the Yule--Simpson effect are established and the implications for latent class analysis examined. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
982
984
D. R. Cox
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:827-8412013-03-04RePEc:oup:biomet
article
Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling
We consider parameter-driven models of time series of counts, where the observations are assumed to arise from a Poisson distribution with a mean changing over time according to a latent process. Estimation of these models is carried out within a Bayesian framework using data augmentation and Markov chain Monte Carlo methods. We suggest a new auxiliary mixture sampler, which possesses a Gibbsian transition kernel, where we draw from full conditional distributions belonging to standard distribution families only. Emphasis lies on application to state space modelling of time series of counts, but we show that auxiliary mixture sampling may be applied to a wider range of parameter-driven models, including random-effects models and panel data models based on the Poisson distribution. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
827
841
http://hdl.handle.net/10.1093/biomet/93.4.827
text/html
Access to full text is restricted to subscribers.
Sylvia Frühwirth-Schnatter
Helga Wagner
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:671-6862013-03-04RePEc:oup:biomet
article
Models for interval censoring and simulation-based inference for lifetime distributions
Interval-censored lifetime data arise when individuals in a study are inspected intermittently so that a lifetime is observed to lie between two successive times. In settings where only these two times are available, methods exist for nonparametric or parametric estimation of lifetime distributions. However, there has been virtually no discussion of how inspection processes may be estimated or identified. Such estimates are needed if one is to generate interval-censored data by simulation. This paper identifies which aspects of an independent inspection process are estimable from interval-censored data, and shows how to obtain nonparametric estimates. The results allow interval-censored data from any specified distribution to be generated, and give new simulation procedures for estimation or testing. A new omnibus goodness-of-fit test is introduced. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
671
686
http://hdl.handle.net/10.1093/biomet/93.3.671
text/html
Access to full text is restricted to subscribers.
J. F. Lawless
Denise Babineau
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:819-8292013-03-04RePEc:oup:biomet
article
Estimation of nonstationary spatial covariance structure
We introduce a method for estimating nonstationary spatial covariance structure from space-time data and apply the method to an analysis of Sydney wind patterns. Our method constructs a process honouring a given spatial covariance matrix at observing stations and uses one or more stationary processes to describe conditional behaviour given observing site values. The stationary processes give a localised description of the spatial covariance structure. The method is computationally attractive, and can be extended to the assessment of covariance for multivariate processes. The technique is illustrated for data describing the east-west component of Sydney winds. For this example, our own methods are contrasted with a geometrically appealing though computationally intensive technique which describes spatial correlation via an isotropic process and a deformation of the geographical space. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
819
829
David J. Nott
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:627-6462013-03-04RePEc:oup:biomet
article
Simulation and inference for stochastic volatility models driven by Lévy processes
We study Ornstein-Uhlenbeck stochastic processes driven by Lévy processes, and extend them to more general non-Ornstein-Uhlenbeck models. In particular, we investigate the means of making the correlation structure in the volatility process more flexible. For one model, we implement a method for introducing quasi long-memory into the volatility model. We demonstrate that the models can be fitted to real share price returns data. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
627
646
http://hdl.handle.net/10.1093/biomet/asm048
application/pdf
Access to full text is restricted to subscribers.
Matthew P. S. Gander
David A. Stephens
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:877-8912013-03-04RePEc:oup:biomet
article
Generalised incomplete Trojan designs
Generalised incomplete (m x n)/k Trojan designs for m replicates of nk treatments based on sets of k cyclic generators are discussed. Normal equations for plots-within-columns, plots-within-blocks and blocks-within-columns treatment effects are developed. The nk treatments are divided into k subsets each of size n and the conditional plots-within-blocks and blocks-within-columns information matrix for each subset is defined. Efficient conditional treatment estimates and efficient generators for the various strata are discussed. Balanced (m x n)/k incomplete Trojan designs based on Youden generators are constructed and designs based on multiples of a single generator are discussed. Some ideas for constructing efficient general (m x n)/2 designs are outlined and some advantages of generalised incomplete Trojan designs are discussed. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
877
891
R. N. Edmondson
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:119-1332013-03-04RePEc:oup:biomet
article
Multiscale generalised linear models for nonparametric function estimation
We present a method for extracting information about both the scale and trend of local components of an inhomogeneous function in a nonparametric generalised linear model. Our multiscale framework combines recursive partitions, which allow for the incorporation of scale in a natural manner, with systems of piecewise polynomials supported on the partition intervals, which serve to summarise the smooth trend within each interval. Our estimators are formulated as solutions of complexity-penalised likelihood optimisations, where the penalty seeks to limit the number of intervals used to model the data. The actual calculation of the estimators may be accomplished using standard software routines for generalised linear models, within the context of efficient, tree-based, polynomial-time algorithms. A risk analysis shows that these estimators achieve the same asymptotic rates in the nonparametric generalised linear model as the classical wavelet-based estimators in the Gaussian 'function plus noise' model, for suitably defined ranges of Besov spaces. Numerical simulations show that the method tends to perform at least as well as, and often better than, alternative wavelet-based methodologies in the context of finite samples, while applications to gamma-ray burst data in astronomy and packet loss data in computer network traffic analysis confirm its practical relevance. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
119
133
http://hdl.handle.net/10.1093/biomet/92.1.119
text/html
Access to full text is restricted to subscribers.
Eric D. Kolaczyk
Robert D. Nowak
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:51-652013-03-04RePEc:oup:biomet
article
Orthogonal and nearly orthogonal designs for computer experiments
We introduce a method for constructing a rich class of designs that are suitable for use in computer experiments. The designs include Latin hypercube designs and two-level fractional factorial designs as special cases and fill the vast vacuum between these two familiar classes of designs. The basic construction method is simple, building a series of larger designs based on a given small design. If the base design is orthogonal, the resulting designs are orthogonal; likewise, if the base design is nearly orthogonal, the resulting designs are nearly orthogonal. We present two generalizations of our basic construction method. The first generalization improves the projection properties of the basic method; the second generalization gives rise to designs that have smaller correlations. Sample constructions are presented and properties of these designs are discussed. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
51
65
http://hdl.handle.net/10.1093/biomet/asn057
application/pdf
Access to full text is restricted to subscribers.
Derek Bingham
Randy R. Sitter
Boxin Tang
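"Orthogonal" in the abstract above means zero column correlations and "nearly orthogonal" means small ones. A quick check of both notions (a generic illustration using a full two-level factorial and a random Latin hypercube; the paper's actual construction, which expands a small base design into larger ones, is not reproduced here):

```python
import numpy as np

def max_abs_correlation(design):
    """Largest absolute off-diagonal entry of the column correlation matrix."""
    c = np.corrcoef(design, rowvar=False)
    off = c[~np.eye(c.shape[0], dtype=bool)]
    return np.abs(off).max()

# A two-level full factorial in +-1 coding is orthogonal:
# every pair of columns has correlation exactly zero.
base = np.array([[i >> 2 & 1, i >> 1 & 1, i & 1] for i in range(8)]) * 2 - 1
print(max_abs_correlation(base))   # 0.0

# A random Latin hypercube design (each column a permutation of the levels)
# is typically only nearly orthogonal: small but nonzero correlations.
rng = np.random.default_rng(3)
lhd = np.column_stack([rng.permutation(16) for _ in range(3)])
print(max_abs_correlation(lhd))
```

Designs built from an orthogonal base inherit zero correlations, while a nearly orthogonal base yields nearly orthogonal expansions, which is the property the construction in the paper preserves.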
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:996-10022013-03-04RePEc:oup:biomet
article
Identification of a competing risks model with unknown transformations of latent failure times
This paper is concerned with identification of a competing risks model with unknown transformations of latent failure times. The model includes, as special cases, competing risks versions of proportional hazards, mixed proportional hazards and accelerated failure time models. It is shown that covariate effects on latent failure times, cause-specific link functions and the joint survivor function of the disturbance terms can be identified without relying on a parametric model for the dependence between latent failure times or on an exclusion restriction among covariates. As a result, the paper provides an identification result about the joint survivor function of the latent failure times conditional on covariates. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
996
1002
http://hdl.handle.net/10.1093/biomet/93.4.996
text/html
Access to full text is restricted to subscribers.
Sokbae Lee
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:238-2442013-03-04RePEc:oup:biomet
article
Multiple imputation methods for testing treatment differences in survival distributions with missing cause of failure
We propose a method for comparing survival distributions when cause-of-failure information is missing for some individuals. We use multiple imputation to impute missing causes of failure, where the probability that a missing cause is that of interest may depend on auxiliary covariates, and combine log-rank statistics computed from several 'completed