2014-08-28T07:04:05Z
http://oai.repec.openlib.org/oai.php
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:71-84
2012-05-01
RePEc:oup:biomet
article
Optimal fractions of two-level factorials under a baseline parameterization
Two-level fractional factorial designs are considered under a baseline parameterization. The criterion of minimum aberration is formulated in this context and optimal designs under this criterion are investigated. The underlying theory and the concept of isomorphism turn out to be significantly different from their counterparts under orthogonal parameterization, and this is reflected in the optimal designs obtained. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
71
84
http://hdl.handle.net/10.1093/biomet/asr071
application/pdf
Access to full text is restricted to subscribers.
Rahul Mukerjee
Boxin Tang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:15-28
2012-05-01
RePEc:oup:biomet
article
Factor profiled sure independence screening
We propose a method of factor profiled sure independence screening for ultrahigh-dimensional variable selection. The objective of this method is to identify nonzero components consistently from a sparse coefficient vector. The new method assumes that the correlation structure of the high-dimensional data can be well represented by a set of low-dimensional latent factors, which can be estimated consistently by eigenvalue-eigenvector decomposition. The estimated latent factors should then be profiled out from both the response and the predictors. Such an operation, referred to as factor profiling, produces uncorrelated predictors. Therefore, sure independence screening can be applied subsequently and the resulting screening result is consistent for model selection, a major advantage that standard sure independence screening does not share. We refer to the new method as factor profiled sure independence screening. Numerical studies confirm its outstanding performance. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
15
28
http://hdl.handle.net/10.1093/biomet/asr074
application/pdf
Access to full text is restricted to subscribers.
H. Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:238-244
2012-05-01
RePEc:oup:biomet
article
On robust estimation via pseudo-additive information
We consider a robust parameter estimator minimizing an empirical approximation to the q-entropy and show its relationship to minimization of power divergences through a simple parameter transformation. The estimator balances robustness and efficiency through a tuning constant q and avoids kernel density smoothing. We derive an upper bound to the estimator mean squared error under a contaminated reference model and use it as a min-max criterion for selecting q. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
238
244
http://hdl.handle.net/10.1093/biomet/asr061
application/pdf
Access to full text is restricted to subscribers.
Davide Ferrari
Davide La Vecchia
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:230-237
2012-05-01
RePEc:oup:biomet
article
Estimating overdispersion when fitting a generalized linear model to sparse data
We consider the problem of fitting a generalized linear model to overdispersed data, focussing on a quasilikelihood approach in which the variance is assumed to be proportional to that specified by the model, and the constant of proportionality, φ, is used to obtain appropriate standard errors and model comparisons. It is common practice to base an estimate of φ on Pearson's lack-of-fit statistic, with or without Farrington's modification. We propose a new estimator that has a smaller variance, subject to a condition on the third moment of the response variable. We conjecture that this condition is likely to be achieved for the important special cases of count and binomial data. We illustrate the benefits of the new estimator using simulations for both count and binomial data. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
230
237
http://hdl.handle.net/10.1093/biomet/asr083
application/pdf
Access to full text is restricted to subscribers.
D. J. Fletcher
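Note on the record above: the abstract refers to the common practice of estimating φ from Pearson's lack-of-fit statistic. A minimal sketch of that baseline estimate follows, assuming fitted means are already available and a Poisson-type variance function V(μ) = μ by default. The function name and arguments are illustrative, and this is not the new estimator proposed in the paper.

```python
def pearson_dispersion(y, mu, n_params, variance=lambda m: m):
    """Pearson-based dispersion estimate: X^2 / (n - p).

    y        : observed responses
    mu       : fitted means from the generalized linear model
    n_params : number of estimated regression parameters (p)
    variance : model variance function V(mu); Poisson (V = mu) by default
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    dof = len(y) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    # Pearson's lack-of-fit statistic: sum of squared Pearson residuals
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / dof


if __name__ == "__main__":
    # Toy counts and fitted means, purely illustrative
    print(pearson_dispersion([0, 2, 4, 6], [1.0, 2.0, 3.0, 6.0], n_params=1))
```

For binomial data the same function could be called with a different `variance` argument, for example `lambda m: m * (1 - m / size)` for a hypothetical known `size`.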
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:43-55
2012-05-01
RePEc:oup:biomet
article
Modelling the distribution of the cluster maxima of exceedances of subasymptotic thresholds
A standard approach to model the extreme values of a stationary process is the peaks over threshold method, which consists of imposing a high threshold, identifying clusters of exceedances of this threshold and fitting the maximum value from each cluster using the generalized Pareto distribution. This approach is strongly justified by underlying asymptotic theory. We propose an alternative model for the distribution of the cluster maxima that accounts for the subasymptotic theory of extremes of a stationary process. This new distribution is a product of two terms, one for the marginal distribution of exceedances and the other for the dependence structure of the exceedance values within a cluster. We illustrate the improvement in fit, measured by the root mean square error of the estimated quantiles, offered by the new distribution over the peaks over thresholds analysis using simulated and hydrological data, and we suggest a diagnostic tool to help identify when the proposed model is likely to lead to an improved fit. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
43
55
http://hdl.handle.net/10.1093/biomet/asr078
application/pdf
Access to full text is restricted to subscribers.
Emma F. Eastoe
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:141-150
2012-05-01
RePEc:oup:biomet
article
A moving average Cholesky factor model in covariance modelling for longitudinal data
We propose new regression models for parameterizing covariance structures in longitudinal data analysis. Using a novel Cholesky factor, the entries in this decomposition have a moving average and log-innovation interpretation and are modelled as linear functions of covariates. We propose efficient maximum likelihood estimates for joint mean-covariance analysis based on this decomposition and derive the asymptotic distributions of the coefficient estimates. Furthermore, we study a local search algorithm, computationally more efficient than traditional all subset selection, based on BIC for model selection, and show its model selection consistency. Thus, a conjecture of Pan & MacKenzie (2003) is verified. We demonstrate the finite-sample performance of the method via analysis of data on CD4 trajectories and through simulations. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
141
150
http://hdl.handle.net/10.1093/biomet/asr068
application/pdf
Access to full text is restricted to subscribers.
Weiping Zhang
Chenlei Leng
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:151-165
2012-05-01
RePEc:oup:biomet
article
A functional generalized method of moments approach for longitudinal studies with missing responses and covariate measurement error
Covariate measurement error and missing responses are typical features in longitudinal data analysis. There has been extensive research on either covariate measurement error or missing responses, but relatively little work has been done to address both simultaneously. In this paper, we propose a simple method for the marginal analysis of longitudinal data with time-varying covariates, some of which are measured with error, while the response is subject to missingness. Our method has a number of appealing properties: assumptions on the model are minimal, with none needed about the distribution of the mismeasured covariate; implementation is straightforward and its applicability is broad. We provide both theoretical justification and numerical results. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
151
165
http://hdl.handle.net/10.1093/biomet/asr076
application/pdf
Access to full text is restricted to subscribers.
Grace Y. Yi
Yanyuan Ma
Raymond J. Carroll
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:245-251
2012-05-01
RePEc:oup:biomet
article
Optimality of group testing in the presence of misclassification
Several optimality properties of Dorfman's (1943) group testing procedure are derived for estimation of the prevalence of a rare disease whose status is classified with error. Exact ranges of disease prevalence are obtained for which group testing provides more efficient estimation when group size increases. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
245
251
http://hdl.handle.net/10.1093/biomet/asr064
application/pdf
Access to full text is restricted to subscribers.
Aiyi Liu
Chunling Liu
Zhiwei Zhang
Paul S. Albert
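Note on the record above: the abstract concerns prevalence estimation from group tests whose results are classified with error. The sketch below uses the standard textbook relation between prevalence and the probability that a pool tests positive, under the assumptions (not taken from the paper) that sensitivity `se` and specificity `sp` are known, are the same for every pool, and do not depend on pool size `k`. Function names are illustrative.

```python
def pool_positive_prob(p, k, se, sp):
    """Probability that a pool of k independent samples tests positive.

    p  : individual disease prevalence
    se : test sensitivity, sp : test specificity (assumed known)
    """
    q = (1.0 - p) ** k  # probability that the pool is truly disease-free
    return se * (1.0 - q) + (1.0 - sp) * q


def prevalence_from_pool_rate(pi, k, se, sp):
    """Invert the relation above: recover p from a pool-positive rate pi."""
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("test must be informative: se + sp > 1")
    q = (se - pi) / denom
    q = min(1.0, max(0.0, q))  # clip to [0, 1] for noisy empirical rates
    return 1.0 - q ** (1.0 / k)


if __name__ == "__main__":
    pi = pool_positive_prob(0.02, k=5, se=0.95, sp=0.98)
    print(pi, prevalence_from_pool_rate(pi, k=5, se=0.95, sp=0.98))
```

Plugging an observed proportion of positive pools in for `pi` gives a simple plug-in estimate of prevalence; the efficiency comparisons across group sizes described in the abstract are not reproduced here.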
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:57-69
2012-05-01
RePEc:oup:biomet
article
Conservative hypothesis tests and confidence intervals using importance sampling
Importance sampling is a common technique for Monte Carlo approximation, including that of p-values. Here it is shown that a simple correction of the usual importance sampling p-values provides valid p-values, meaning that a hypothesis test created by rejecting the null hypothesis when the p-value is at most α will also have a Type I error rate of at most α. This correction uses the importance weight of the original observation, which gives valuable diagnostic information under the null hypothesis. Using the corrected p-values can be crucial for multiple testing and also in problems where evaluating the accuracy of importance sampling approximations is difficult. Inverting the corrected p-values provides a useful way to create Monte Carlo confidence intervals that maintain the nominal significance level and use only a single Monte Carlo sample. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
57
69
http://hdl.handle.net/10.1093/biomet/asr079
application/pdf
Access to full text is restricted to subscribers.
Matthew T. Harrison
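Note on the record above: the abstract describes correcting the usual importance sampling p-value with the importance weight of the original observation. The sketch below shows one plausible form of such a correction, in which the observed point is treated as an extra draw; the exact formula is an assumption here and should be checked against the paper. Weights are assumed to be unnormalized ratios of the null density to the proposal density, and all names are illustrative.

```python
def is_p_values(obs_stat, obs_weight, sample_stats, sample_weights):
    """Return (usual, corrected) importance sampling p-values.

    obs_stat       : test statistic of the observed data
    obs_weight     : importance weight evaluated at the observed data
    sample_stats   : test statistics of the n proposal draws
    sample_weights : importance weights of the n proposal draws
    """
    if len(sample_stats) != len(sample_weights) or not sample_stats:
        raise ValueError("need matching, non-empty samples and weights")
    n = len(sample_stats)
    # Weighted count of draws at least as extreme as the observation
    tail = sum(w for t, w in zip(sample_stats, sample_weights) if t >= obs_stat)
    usual = tail / n
    # Assumed correction: add the observation's own weight, divide by n + 1
    corrected = min(1.0, (obs_weight + tail) / (n + 1))
    return usual, corrected


if __name__ == "__main__":
    print(is_p_values(2.0, 0.3, [1.0, 3.0, 0.5, 2.5], [1.0, 0.2, 1.5, 0.4]))
```

With all weights equal to 1 the assumed corrected form reduces to the familiar Monte Carlo p-value (1 + count) / (n + 1).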
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:211-222
2012-05-01
RePEc:oup:biomet
article
A proportional likelihood ratio model
We propose a semiparametric proportional likelihood ratio model which is particularly suitable for modelling a nonlinear monotonic relationship between the outcome variable and a covariate. This model extends the generalized linear model by leaving the distribution unspecified, and has a strong connection with semiparametric models such as the selection bias model (Gilbert et al., 1999), the density ratio model (Qin, 1998; Fokianos & Kaimi, 2006), the single-index model (Ichimura, 1993) and the exponential tilt regression model (Rathouz & Gao, 2009). A maximum likelihood estimator is obtained for the new model and its asymptotic properties are derived. An example and simulation study illustrate the use of the model. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
211
222
http://hdl.handle.net/10.1093/biomet/asr060
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:85-100
2012-05-01
RePEc:oup:biomet
article
Combining data from two independent surveys: a model-assisted approach
Combining information from two or more independent surveys is a problem frequently encountered in survey sampling. We consider the case of two independent surveys, where a large sample from survey 1 collects only auxiliary information and a much smaller sample from survey 2 provides information on both the variables of interest and the auxiliary variables. We propose a model-assisted projection method of estimation based on a working model, but the reference distribution is design-based. We generate synthetic or proxy values of a variable of interest by first fitting the working model, relating the variable of interest to the auxiliary variables, to the data from survey 2 and then predicting the variable of interest associated with the auxiliary variables observed in survey 1. The projection estimator of a total is simply obtained from the survey 1 weights and associated synthetic values. We identify the conditions for the projection estimator to be asymptotically unbiased. Domain estimation using the projection method is also considered. Replication variance estimators are obtained by augmenting the synthetic data file for survey 1 with additional synthetic columns associated with the columns of replicate weights. Results from a simulation study are presented. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
85
100
http://hdl.handle.net/10.1093/biomet/asr063
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:127-140
2012-05-01
RePEc:oup:biomet
article
Bayesian analysis of multistate event history data: beta-Dirichlet process prior
Bayesian analysis of a finite state Markov process, which is popularly used to model multistate event history data, is considered. A new prior process, called a beta-Dirichlet process, is introduced for the cumulative intensity functions and is proved to be conjugate. In addition, the beta-Dirichlet prior is applied to a Bayesian semiparametric regression model. To illustrate the application of the proposed model, we analyse a dataset of credit histories. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
127
140
http://hdl.handle.net/10.1093/biomet/asr067
application/pdf
Access to full text is restricted to subscribers.
Yongdai Kim
Lancelot James
Rafael Weissbach
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:185-197
2012-05-01
RePEc:oup:biomet
article
Mean residual life models with time-dependent coefficients under right censoring
The mean residual life provides the remaining life expectancy of a subject who has survived to a certain time-point. When covariates are present, regression models are needed to study the association between the mean residual life function and potential regression covariates. In this paper, we propose a flexible class of semiparametric mean residual life models where some effects may be time-varying and some may be constant over time. In the presence of right censoring, we use the inverse probability of censoring weighting approach and develop inference procedures for estimating the model parameters. In addition, we provide graphical and numerical methods for model checking and tests for examining whether or not the covariate effects vary with time. Asymptotic and finite sample properties of the proposed estimators are established and the approach is applied to real life datasets collected from clinical trials. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
185
197
http://hdl.handle.net/10.1093/biomet/asr065
application/pdf
Access to full text is restricted to subscribers.
Liuquan Sun
Xinyuan Song
Zhigang Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:167-184
2012-05-01
RePEc:oup:biomet
article
Estimating treatment effects with treatment switching via semicompeting risks models: an application to a colorectal cancer study
Treatment switching is a frequent occurrence in clinical trials, where, during the course of the trial, patients who fail on the control treatment may change to the experimental treatment. Analysing the data without accounting for switching yields highly biased and inefficient estimates of the treatment effect. In this paper, we propose a novel class of semiparametric semicompeting risks transition survival models to accommodate treatment switches. Theoretical properties of the proposed model are examined and an efficient expectation-maximization algorithm is derived for obtaining the maximum likelihood estimates. Simulation studies are conducted to demonstrate the superiority of the model compared with the intent-to-treat analysis and other methods proposed in the literature. The proposed method is applied to data from a colorectal cancer clinical trial. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
167
184
http://hdl.handle.net/10.1093/biomet/asr062
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Qingxia Chen
Ming-Hui Chen
Joseph G. Ibrahim
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:115-126
2012-05-01
RePEc:oup:biomet
article
Directed acyclic graphs with edge-specific bounds
We give a definition of a bounded edge within the causal directed acyclic graph framework. A bounded edge generalizes the notion of a signed edge and is defined in terms of bounds on a ratio of survivor probabilities. We derive rules concerning the propagation of bounds. Bounds on causal effects in the presence of unmeasured confounding are also derived using bounds related to specific edges on a graph. We illustrate the theory developed by an example concerning estimating the effect of antihistamine treatment on asthma in the presence of unmeasured confounding. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
115
126
http://hdl.handle.net/10.1093/biomet/asr059
application/pdf
Access to full text is restricted to subscribers.
Tyler J. Vanderweele
Zhiqiang Tan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:29-42
2012-05-01
RePEc:oup:biomet
article
A direct approach to sparse discriminant analysis in ultra-high dimensions
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among features and thus could produce misleading feature selection and inferior classification. We propose a new procedure for sparse discriminant analysis, motivated by the least squares formulation of linear discriminant analysis. To demonstrate our proposal, we study the numerical and theoretical properties of discriminant analysis constructed via lasso penalized least squares. Our theory shows that the method proposed can consistently identify the subset of discriminative features contributing to the Bayes rule and at the same time consistently estimate the Bayes classification direction, even when the dimension can grow faster than any polynomial order of the sample size. The theory allows for general dependence among features. Simulated and real data examples show that lassoed discriminant analysis compares favourably with other popular sparse discriminant proposals. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
29
42
http://hdl.handle.net/10.1093/biomet/asr066
application/pdf
Access to full text is restricted to subscribers.
Qing Mai
Hui Zou
Ming Yuan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:1-14
2012-05-01
RePEc:oup:biomet
article
Studies in the history of probability and statistics, L: Karl Pearson and the Rule of Three
Karl Pearson's role in the transformation that took the 19th century statistics of Laplace and Gauss into the modern era of 20th century multivariate analysis is examined from a new point of view. By viewing Pearson's work in the context of a motto he adopted from Charles Darwin, a philosophical theme is identified in Pearson's statistical work, and his three major achievements are briefly described. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
1
14
http://hdl.handle.net/10.1093/biomet/asr046
application/pdf
Access to full text is restricted to subscribers.
Stephen M. Stigler
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:199-210
2012-05-01
RePEc:oup:biomet
article
A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling
This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
199
210
http://hdl.handle.net/10.1093/biomet/asr072
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
Dean A. Follmann
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:101-113
2012-05-01
RePEc:oup:biomet
article
Optimal allocation to maximize the power of two-sample tests for binary response
We study allocations that maximize the power of tests of equality of two treatments having binary outcomes. When a normal approximation applies, the asymptotic power is maximized by minimizing the variance, leading to a Neyman allocation that assigns observations in proportion to the standard deviations. This allocation, which in general requires knowledge of the parameters of the problem, is recommended in a large body of literature. Under contiguous alternatives the normal approximation indeed applies, and in this case the Neyman allocation reduces to a balanced design. However, when studying the power under a noncontiguous alternative, a large deviations approximation is needed, and the Neyman allocation is no longer asymptotically optimal. In the latter case, the optimal allocation depends on the parameters, but is rather close to a balanced design. Thus, a balanced design is a viable option for both contiguous and noncontiguous alternatives. Finite sample studies show that a balanced design is indeed generally quite close to being optimal for power maximization. This is good news as implementation of a balanced design does not require knowledge of the parameters. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
101
113
http://hdl.handle.net/10.1093/biomet/asr077
application/pdf
Access to full text is restricted to subscribers.
D. Azriel
M. Mandel
Y. Rinott
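Note on the record above: the abstract states that the Neyman allocation assigns observations in proportion to the standard deviations. For two binary-outcome arms with success probabilities p1 and p2, a minimal sketch of that allocation is given below; the function name is illustrative, and the large-deviations optimal allocation studied in the paper is not implemented.

```python
import math


def neyman_fraction(p1, p2):
    """Fraction of observations the Neyman allocation assigns to arm 1.

    Each arm's share is proportional to its Bernoulli standard deviation
    sqrt(p * (1 - p)), which minimizes the variance of the estimated
    difference in success probabilities for a fixed total sample size.
    """
    s1 = math.sqrt(p1 * (1.0 - p1))
    s2 = math.sqrt(p2 * (1.0 - p2))
    if s1 + s2 == 0.0:
        raise ValueError("both arms are degenerate (p equal to 0 or 1)")
    return s1 / (s1 + s2)


if __name__ == "__main__":
    print(neyman_fraction(0.3, 0.3))  # equal probabilities: balanced design
    print(neyman_fraction(0.5, 0.1))  # more observations go to the noisier arm
```

When p1 and p2 are close, as under contiguous alternatives, the two standard deviations nearly coincide and the fraction approaches one half, which matches the abstract's remark that the Neyman allocation reduces to a balanced design in that case.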
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:223-229
2012-05-01
RePEc:oup:biomet
article
Proportional likelihood ratio models for mean regression
The proportional likelihood ratio model introduced in Luo & Tsai (2012) is adapted to explicitly model the means of observations. This is useful for the estimation of and inference on treatment effects, particularly in designed experiments and allows the data analyst greater control over model specification and parameter interpretation. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
223
229
http://hdl.handle.net/10.1093/biomet/asr075
application/pdf
Access to full text is restricted to subscribers.
Alan Huang
Paul J. Rathouz
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-1025a
2010-11-09
RePEc:oup:biomet
article
Amendments and Corrections
The paper included comparison of a 12-factor, 16-run design to randomly generated Latin hypercube designs and U-designs, with respect to the properties of their alias matrices. An error in a computer program led to incorrect computation of the properties of the alias matrix of the orthogonal design. A corrected version of Table 2 is provided here. The orthogonal Latin hypercube design still has better properties than the best of 100 random designs, but the differences are less striking than those in our original table. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1025
http://hdl.handle.net/10.1093/biomet/93.4.1025-a
text/html
Access to full text is restricted to subscribers.
David M. Steinberg
Dennis K. J. Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:177-193
2010-11-09
RePEc:oup:biomet
article
Equivalent kernels of smoothing splines in nonparametric regression for clustered/longitudinal data
For independent data, it is well known that kernel methods and spline methods are essentially asymptotically equivalent (Silverman, 1984). However, recent work of Welsh et al. (2002) shows that the same is not true for clustered/longitudinal data. Splines and conventional kernels are different in localness and ability to account for the within-cluster correlation. We show that a smoothing spline estimator is asymptotically equivalent to a recently proposed seemingly unrelated kernel estimator of Wang (2003) for any working covariance matrix. We show that both estimators can be obtained iteratively by applying conventional kernel or spline smoothing to pseudo-observations. This result allows us to study the asymptotic properties of the smoothing spline estimator by deriving its asymptotic bias and variance. We show that smoothing splines are consistent for an arbitrary working covariance and have the smallest variance when assuming the true covariance. We further show that both the seemingly unrelated kernel estimator and the smoothing spline estimator are nonlocal unless working independence is assumed but have asymptotically negligible bias. Their finite sample performance is compared through simulations. Our results justify the use of efficient, non-local estimators such as smoothing splines for clustered/longitudinal data. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
177
193
Xihong Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:219-225
2010-11-09
RePEc:oup:biomet
article
Estimating genetic association parameters from family data
We consider the problem of estimating a parameter theta, reflecting association between a disease and genotypes of a genetic polymorphism, using nuclear family data. In many applications, some parental genotypes are missing, and the distribution of these genotypes is unknown. Since misspecification of this distribution can bias estimators for theta, we consider estimating functions that are unbiased, regardless of how the distribution is specified. We call the resulting estimators parental-genotype-robust. Rabinowitz (2002) has proposed a constrained optimisation method for obtaining locally optimal unbiased tests of the null hypothesis of no association. We use a similar method to derive estimating functions that yield parental-genotype-robust estimators with minimum variance in the class of all such estimators. We extend the estimating functions to obtain parental-genotype-robust estimators when theta is a vector of unknown parameters, and show that the estimating functions enjoy a certain optimality property. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
219
225
Alice S. Whittemore
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-1025
2010-11-09
RePEc:oup:biomet
article
Amendments and Corrections
It has been brought to our attention that the implicit expression (6) for the estimator with general warping function had been derived earlier by B. Ronn, in an unpublished technical report of the Royal Veterinary and Agricultural University, Frederiksberg. However, the actual implementation and computation of the estimators in our paper are very different from those in the technical report. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1025
http://hdl.handle.net/10.1093/biomet/93.4.1025
text/html
Access to full text is restricted to subscribers.
D. Gervini
T. Gasser
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:240-245
2010-11-09
RePEc:oup:biomet
article
Revisiting simple linear regression with autocorrelated errors
This paper studies properties of ordinary and generalised least squares estimators in a simple linear regression with stationary autocorrelated errors. Explicit expressions for the variances of the regression parameter estimators are derived for some common time series autocorrelation structures, including a first-order autoregression and general moving averages. Applications of the results include confidence intervals and an example where the variance of the trend slope estimator does not increase with increasing autocorrelation. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
240
245
Jaechoul Lee
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:505-505
2010-11-09
RePEc:oup:biomet
article
"A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families"
2
2005
92
June
Biometrika
505
505
http://hdl.handle.net/10.1093/biomet/92.2.505
text/html
Access to full text is restricted to subscribers.
Albert W. Marshall
Ingram Olkin
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:505-505a
2010-11-09
RePEc:oup:biomet
article
"Equivalence of prospective and retrospective models in the Bayesian analysis of case-control studies"
2
2005
92
June
Biometrika
505
505
http://hdl.handle.net/10.1093/biomet/92.2.505-a
text/html
Access to full text is restricted to subscribers.
Shaun R. Seaman
Sylvia Richardson
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:231-237
2010-03-05
RePEc:oup:biomet
article
Weighted least squares approximate restricted likelihood estimation for vector autoregressive processes
We derive a weighted least squares approximate restricted likelihood estimator for a k-dimensional pth-order autoregressive model with intercept. Exact likelihood optimization of this model is generally infeasible due to the parameter space, which is complicated and high-dimensional, involving pk^2 parameters. The weighted least squares estimator has significantly lower bias and mean squared error than the ordinary least squares estimator for both stationary and nonstationary processes. Furthermore, at the unit root, the limiting distribution of the weighted least squares approximate restricted likelihood estimator is shown to be the zero-intercept Dickey-Fuller distribution, unlike the ordinary least squares with intercept estimator, which has a different distribution with significantly higher bias. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
231
237
http://hdl.handle.net/10.1093/biomet/asp071
application/pdf
Access to full text is restricted to subscribers.
Willa W. Chen
Rohit S. Deo
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:181-198
2010-03-05
RePEc:oup:biomet
article
On Bayesian testimation and its application to wavelet thresholding
We consider the problem of estimating the unknown response function in the Gaussian white noise model. We first utilize the recently developed Bayesian maximum a posteriori testimation procedure of Abramovich et al. (2007) for recovering an unknown high-dimensional Gaussian mean vector. The existing results for its upper error bounds over various sparse l_p-balls are extended to more general cases. We show that, for a properly chosen prior on the number of nonzero entries of the mean vector, the corresponding adaptive estimator is asymptotically minimax in a wide range of sparse and dense l_p-balls. The proposed procedure is then applied in a wavelet context to derive adaptive global and level-wise wavelet estimators of the unknown response function in the Gaussian white noise model. These estimators are then proven to be, respectively, asymptotically near-minimax and minimax in a wide range of Besov balls. These results are also extended to the estimation of derivatives of the response function. Simulated examples are conducted to illustrate the performance of the proposed level-wise wavelet estimator in finite sample situations, and to compare it with several existing counterparts. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
181
198
http://hdl.handle.net/10.1093/biomet/asp080
application/pdf
Access to full text is restricted to subscribers.
Felix Abramovich
Vadim Grinshtein
Athanasia Petsa
Theofanis Sapatinas
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:49-64
2010-03-05
RePEc:oup:biomet
article
Functional quadratic regression
We extend the common linear functional regression model to the case where the dependency of a scalar response on a functional predictor is of polynomial rather than linear nature. Focusing on the quadratic case, we demonstrate the usefulness of the polynomial functional regression model, which encompasses linear functional regression as a special case. Our approach works under mild conditions for the case of densely spaced observations and also can be extended to the important practical situation where the functional predictors are derived from sparse and irregular measurements, as is the case in many longitudinal studies. A key observation is the equivalence of the functional polynomial model with a regression model that is a polynomial of the same order in the functional principal component scores of the predictor processes. Theoretical analysis as well as practical implementations are based on this equivalence and on basis representations of predictor processes. We also obtain an explicit representation of the regression surface that defines quadratic functional regression and provide functional asymptotic results for an increasing number of model components as the number of subjects in the study increases. The improvements that can be gained by adopting quadratic as compared to linear functional regression are illustrated with a case study that includes absorption spectra as functional predictors. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
49
64
http://hdl.handle.net/10.1093/biomet/asp069
application/pdf
Access to full text is restricted to subscribers.
Fang Yao
Hans-Georg Müller
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:199-2082010-03-05RePEc:oup:biomet
article
Forecasting for quantile self-exciting threshold autoregressive time series models
Self-exciting threshold autoregressive time series models have been used extensively, and the conditional mean obtained from these models can be used to predict the future value of a random variable. In this paper we consider quantile forecasts of a time series based on the quantile self-exciting threshold autoregressive time series models proposed by Cai and Stander (2008) and present a new forecasting method for them. Simulation studies and application to real time series show that the method works very well. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
199
208
http://hdl.handle.net/10.1093/biomet/asp070
application/pdf
Access to full text is restricted to subscribers.
Yuzhi Cai
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:31-482010-03-05RePEc:oup:biomet
article
Incorporating prior probabilities into high-dimensional classifiers
In standard parametric classifiers, or classifiers based on nonparametric methods but where there is an opportunity for estimating population densities, the prior probabilities of the respective populations play a key role. However, those probabilities are largely ignored in the construction of high-dimensional classifiers, partly because there are no likelihoods to be constructed or Bayes risks to be estimated. Nevertheless, including information about prior probabilities can reduce the overall error rate, particularly in cases where doing so is most important, i.e. when the classification problem is particularly challenging and error rates are not close to zero. In this paper we suggest a general approach to reducing error rate in this way, by using a method derived from Breiman's bagging idea. The potential improvements in performance are identified in theoretical and numerical work, the latter involving both applications to real data and simulations. The method is simple and explicit to apply, and does not involve choice of any tuning parameters. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
31
48
http://hdl.handle.net/10.1093/biomet/asp081
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Jing-Hao Xue
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:133-1452010-03-05RePEc:oup:biomet
article
A semiparametric random effects model for multivariate competing risks data
We propose a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model. The associations between the cause-specific failure times can be studied through dependence parameters of copula functions that are allowed to depend on cluster-level covariates. A cross-odds ratio-type measure is proposed to describe the associations between cause-specific failure times, and its relationship to the dependence parameters is explored. We develop a two-stage estimation procedure where the marginal models are estimated in the first stage and the dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The proposed procedures are applied to Danish twin data to model the cumulative incidence for the age of natural menopause and to investigate the association in the onset of natural menopause between monozygotic and dizygotic twins. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
133
145
http://hdl.handle.net/10.1093/biomet/asp082
application/pdf
Access to full text is restricted to subscribers.
Thomas H. Scheike
Yanqing Sun
Mei-Jie Zhang
Tina Kold Jensen
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:147-1582010-03-05RePEc:oup:biomet
article
Estimation of the retransformed conditional mean in health care cost studies
We propose a new approach for analyzing skewed and heteroscedastic health care cost data through regression of the conditional quantiles of the transformed cost. Using the appealing equivariance property of quantiles to monotone transformations, we propose a distribution-free estimator of the conditional mean cost on the original scale. The proposed method is extended to a two-part heteroscedastic model to account for zero costs commonly seen in health care cost studies. Simulation studies indicate that the proposed estimator has competitive and more robust performance than existing estimators in various heteroscedastic models. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
147
158
http://hdl.handle.net/10.1093/biomet/asp072
application/pdf
Access to full text is restricted to subscribers.
Huixia Judy Wang
Xiao-Hua Zhou
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:95-1082010-03-05RePEc:oup:biomet
article
On the use of stochastic ordering to test for trend with clustered binary data
We introduce the use of stochastic ordering for defining treatment-related trend in clustered exchangeable binary data, both when cluster sizes are fixed and when they vary randomly. In the latter case, there is a well-documented tendency for such data to be sparse, a problem we address by making an assumption of interpretability or, equivalently, marginal compatibility. Our procedures are based on a representation of the joint distribution of binary exchangeable random variables by a saturated model, and may hence be considered nonparametric. The definition of trend by stochastic ordering is proposed to ensure a flexibility that allows for various forms of monotone increases in response to the cluster as a whole to be included in the evaluation of the trend. We obtain maximum likelihood estimates of probability functions under stochastic ordering using mixture-likelihood-based algorithms. Since the data are sparse, we avoid the use of asymptotic results and obtain p-values of the likelihood ratio procedures by permutation resampling. We demonstrate how the proposed framework can be used in risk assessment, and provide comparisons with existing procedures. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
95
108
http://hdl.handle.net/10.1093/biomet/asp077
application/pdf
Access to full text is restricted to subscribers.
Aniko Szabo
E. Olusegun George
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:1-132010-03-05RePEc:oup:biomet
article
Systematic sampling with errors in sample locations
Systematic sampling of points in continuous space is widely used in microscopy and spatial surveys. Classical theory provides asymptotic expressions for the variance of estimators based on systematic sampling as the grid spacing decreases. However, the classical theory assumes that the sample grid is exactly periodic; real physical sampling procedures may introduce errors in the placement of the sample points. This paper studies the effect of errors in sample positioning on the variance of estimators in the case of one-dimensional systematic sampling. First we sketch a general approach to variance analysis using point process methods. We then analyze three different models for the error process, calculate exact expressions for the variances, and derive asymptotic variances. Errors in the placement of sample points can lead to substantial inflation of the variance, dampening of zitterbewegung, that is fluctuation effects, and a slower order of convergence. This suggests that the current practice in some areas of microscopy may be based on over-optimistic predictions of estimator accuracy. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
1
13
http://hdl.handle.net/10.1093/biomet/asp067
application/pdf
Access to full text is restricted to subscribers.
Johanna Ziegel
Adrian Baddeley
Karl-Anton Dorph-Petersen
Eva B. Vedel Jensen
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:171-1802010-03-05RePEc:oup:biomet
article
On doubly robust estimation in a semiparametric odds ratio model
We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance functions models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007). Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
171
180
http://hdl.handle.net/10.1093/biomet/asp062
application/pdf
Access to full text is restricted to subscribers.
Eric J. Tchetgen Tchetgen
James M. Robins
Andrea Rotnitzky
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:79-932010-03-05RePEc:oup:biomet
article
Generalized empirical likelihood methods for analyzing longitudinal data
Efficient estimation of parameters is a major objective in analyzing longitudinal data. We propose two generalized empirical likelihood-based methods that take into consideration within-subject correlations. A nonparametric version of the Wilks theorem for the limiting distributions of the empirical likelihood ratios is derived. It is shown that one of the proposed methods is locally efficient among a class of within-subject variance-covariance matrices. A simulation study is conducted to investigate the finite sample properties of the proposed methods and to compare them with the block empirical likelihood method by You et al. (2006) and the normal approximation with a correctly estimated variance-covariance. The results suggest that the proposed methods are generally more efficient than existing methods that ignore the correlation structure, and are better in coverage compared to the normal approximation with correctly specified within-subject correlation. An application illustrating our methods and supporting the simulation study results is presented. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
79
93
http://hdl.handle.net/10.1093/biomet/asp073
application/pdf
Access to full text is restricted to subscribers.
Suojin Wang
Lianfen Qian
Raymond J. Carroll
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:123-1322010-03-05RePEc:oup:biomet
article
Sharp bounds on causal effects in case-control and cohort studies
Evaluating the causal effect of an exposure on a response from case-control and cohort studies is a major concern in epidemiological and medical research. Since causal effects are in general nonidentifiable from such studies, this paper derives bounds on two causal measures: the causal risk difference and the causal risk ratio. We use the potential response approach and a linear programming method to derive sharp bounds on the causal risk difference, and a novel fractional programming method to derive bounds on the causal risk ratio. In addition, in the presence of missing data, we consider three different missingness mechanisms and propose sharp bounds under these situations. The results provide new guidance on causal inference in case-control and cohort studies. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
123
132
http://hdl.handle.net/10.1093/biomet/asp076
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
Zhihong Cai
Zhi Geng
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:65-782010-03-05RePEc:oup:biomet
article
Marginal analyses of longitudinal data with an informative pattern of observations
We consider solutions to generalized estimating equations with singular working correlation matrices, of which the estimator of Diggle et al. (2007) is a special case. We give explicit conditions for consistent estimation when the pattern of observations may be informative. In such cases, simulations reveal reduced bias and reduced mean squared error compared with existing alternatives. A study of peritoneal dialysis is used to illustrate the methodology. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
65
78
http://hdl.handle.net/10.1093/biomet/asp068
application/pdf
Access to full text is restricted to subscribers.
D. M. Farewell
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:15-302010-03-05RePEc:oup:biomet
article
Cross-covariance functions for multivariate random fields based on latent dimensions
The problem of constructing valid parametric cross-covariance functions is challenging. We propose a simple methodology, based on latent dimensions and existing covariance models for univariate random fields, to develop flexible, interpretable and computationally feasible classes of cross-covariance functions in closed form. We focus on spatio-temporal cross-covariance functions that can be nonseparable, asymmetric and can have different covariance structures, for instance different smoothness parameters, in each component. We discuss estimation of these models and perform a small simulation study to demonstrate our approach. We illustrate our methodology on a trivariate spatio-temporal pollution dataset from California and demonstrate that our cross-covariance performs better than other competing models. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
15
30
http://hdl.handle.net/10.1093/biomet/asp078
application/pdf
Access to full text is restricted to subscribers.
Tatiyana V. Apanasovich
Marc G. Genton
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:254-2592010-03-05RePEc:oup:biomet
article
The maximal data piling direction for discrimination
We study a discriminant direction vector that generally exists only in high-dimension, low sample size settings. Projections of data onto this direction vector take on only two distinct values, one for each class. There exist infinitely many such directions in the subspace generated by the data; but the maximal data piling vector has the longest distance between the projections. This paper investigates mathematical properties and classification performance of this discrimination method. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
254
259
http://hdl.handle.net/10.1093/biomet/asp084
application/pdf
Access to full text is restricted to subscribers.
Jeongyoun Ahn
J. S. Marron
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:215-2222010-03-05RePEc:oup:biomet
article
Pseudo-score confidence intervals for parameters in discrete statistical models
We propose pseudo-score confidence intervals for parameters in models for discrete data. The confidence interval is obtained by inverting a test that uses a Pearson chi-squared statistic to compare fitted values for the working model with fitted values of the model when a parameter of interest takes various fixed values. For multinomial models, the pseudo-score method simplifies to the score method when the model is saturated and otherwise it is asymptotically equivalent to score and likelihood ratio test-based inferences. For cases in which ordinary score methods are impractical, such as when the likelihood function is not an explicit function of model parameters, the pseudo-score method is feasible. We illustrate the method for four such examples. Generalizations of the method are also presented for future research, including inference for complex sampling designs using a quasilikelihood Pearson statistic that compares fitted values for two models relative to the variance of the observations under the simpler model. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
215
222
http://hdl.handle.net/10.1093/biomet/asp074
application/pdf
Access to full text is restricted to subscribers.
Alan Agresti
Euijung Ryu
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:109-1212010-03-05RePEc:oup:biomet
article
Stochastic approximation with virtual observations for dose-finding on discrete levels
Phase I clinical studies are experiments in which a new drug is administered to humans to determine the maximum dose that causes toxicity with a target probability. Phase I dose-finding is often formulated as a quantile estimation problem. For studies with a biological endpoint, it is common to define toxicity by dichotomizing the continuous biomarker expression. In this article, we propose a novel variant of the Robbins--Monro stochastic approximation that utilizes the continuous measurements for quantile estimation. The Robbins--Monro method has seldom seen clinical applications, because it does not perform well for quantile estimation with binary data and it works with a continuum of doses that are generally not available in practice. To address these issues, we formulate the dose-finding problem as root-finding for the mean of a continuous variable, for which the stochastic approximation procedure is efficient. To accommodate the use of discrete doses, we introduce the idea of virtual observation that is defined on a continuous dosage range. Our proposed method inherits the convergence properties of the stochastic approximation algorithm and its computational simplicity. Simulations based on real trial data show that our proposed method improves accuracy compared with the continual re-assessment method and produces results robust to model misspecification. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
109
121
http://hdl.handle.net/10.1093/biomet/asp065
application/pdf
Access to full text is restricted to subscribers.
Ying Kuen Cheung
Mitchell S. V. Elkind
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:238-2452010-03-05RePEc:oup:biomet
article
Nonparametric Bayesian inference for the spectral density function of a random field
A powerful technique for inference concerning spatial dependence in a random field is to use spectral methods based on frequency domain analysis. Here we develop a nonparametric Bayesian approach to statistical inference for the spectral density of a random field. We construct a multi-dimensional Bernstein polynomial prior for the spectral density and devise a Markov chain Monte Carlo algorithm to simulate from the posterior of the spectral density. The posterior sampling enables us to obtain a smoothed estimate of the spectral density as well as credible bands at desired levels. Simulation shows that our proposed method is more robust than a parametric approach. For illustration, we analyse a soil data example. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
238
245
http://hdl.handle.net/10.1093/biomet/asp066
application/pdf
Access to full text is restricted to subscribers.
Yanbing Zheng
Jun Zhu
Anindya Roy
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:159-1702010-03-05RePEc:oup:biomet
article
Mean loglikelihood and higher-order approximations
Higher-order approximations to p-values can be obtained from the loglikelihood function and a reparameterization that can be viewed as a canonical parameter in an exponential family approximation to the model. This approach clarifies the connection between Skovgaard (1996) and Fraser et al. (1999a), and shows that the Skovgaard approximation can be obtained directly using the mean loglikelihood function. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
159
170
http://hdl.handle.net/10.1093/biomet/asq001
application/pdf
Access to full text is restricted to subscribers.
N. Reid
D. A. S. Fraser
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:209-2142010-03-05RePEc:oup:biomet
article
A note on the sensitivity to assumptions of a generalized linear mixed model
A simple case of Poisson regression is used to study the potential gain in efficiency from using a mixed model representation. Possible systematic errors arising from misspecification of the random terms in the model are examined. It is shown in particular that for a special but realistic problem, appreciable bias may arise from misspecification of a random component. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
209
214
http://hdl.handle.net/10.1093/biomet/asp083
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
M. Y. Wong
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:246-2532010-03-05RePEc:oup:biomet
article
The distribution-based p-value for the outlier sum in differential gene expression analysis
Outlier sums were proposed by Tibshirani & Hastie (2007) and Wu (2007) for detecting outlier genes where only a small subset of disease samples shows unusually high gene expression, but they did not develop their distributional properties and formal statistical inference. In this study, a new outlier sum for detection of outlier genes is proposed, its asymptotic distribution theory is developed, and the p-value based on this outlier sum is formulated. Its analytic form is derived on the basis of the large-sample theory. We compare the proposed method with existing outlier sum methods by power comparisons. Our method is applied to DNA microarray data from samples of primary breast tumors examined by Huang et al. (2003). The results show that the proposed method is more efficient in detecting outlier genes. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
246
253
http://hdl.handle.net/10.1093/biomet/asp075
application/pdf
Access to full text is restricted to subscribers.
Lin-An Chen
Dung-Tsa Chen
Wenyaw Chan
oai:RePEc:oup:biomet:v:97:y:2010:i:1:p:223-2302010-03-05RePEc:oup:biomet
article
Global and local spectral-based tests for periodicities
We investigate tests for periodicity based on a spectral analysis of a time series, differentiating between global and local spectral-based tests. Global tests use information across the entire frequency band, whereas local tests are based on a window around the test frequency. We show that many spectral-based tests can be expressed in terms of a regression-based F test, which allows for approximate size and power calculations. Since global tests are usually derived assuming white noise errors, we extend to the correlated noise case. We demonstrate via a Monte Carlo study that although the global test may have better size and power, local tests are easier to use, and are comparable or better in terms of the power to detect periodicities, especially for spectra with a large dynamic range. We apply this methodology to a nonbehavioural test of hearing. Copyright 2010, Oxford University Press.
1
2010
97
Biometrika
223
230
http://hdl.handle.net/10.1093/biomet/asp079
application/pdf
Access to full text is restricted to subscribers.
L. Wei
P. F. Craigmile
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:929-9442013-01-01RePEc:oup:biomet
article
Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction
Several two-stage multiple testing procedures have been proposed to detect gene-environment interaction in genome-wide association studies. In this article, we elucidate general conditions that are required for validity and power of these procedures, and we propose extensions of two-stage procedures using the case-only estimator of gene-treatment interaction in randomized clinical trials. We develop a unified estimating equation approach to proving asymptotic independence between a filtering statistic and an interaction test statistic in a range of situations, including marginal association and interaction in a generalized linear model with a canonical link. We assess the performance of various two-stage procedures in simulations and in genetic studies from Women's Health Initiative clinical trials. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
929
944
http://hdl.handle.net/10.1093/biomet/ass044
application/pdf
Access to full text is restricted to subscribers.
James Y. Dai
Charles Kooperberg
Michael Leblanc
Ross L. Prentice
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:799-8112013-01-01RePEc:oup:biomet
article
Choosing trajectory and data type when classifying functional data
In some problems involving functional data, it is desired to undertake prediction or classification before the full trajectory of a function is observed. In such cases, it is often preferable to suffer somewhat greater error in return for making a decision relatively early. The prediction and classification problems can be treated similarly, using mean squared prediction error, or classification error, respectively, as the means for quantifying performance, so in this paper we focus principally on classification. We introduce a method for determining when an early decision can reasonably be made, using only part of the trajectory, and we show how to use the method to choose among data types. Our approach is fully nonparametric, and no specific model is required. Properties of error-rate are studied as functions of time and data type. The effectiveness of the proposed method is illustrated in both theoretical and numerical terms. The classification referred to in this paper would be termed supervised classification in machine learning, to distinguish it from unsupervised classification, or clustering. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
799
811
http://hdl.handle.net/10.1093/biomet/ass011
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Tapabrata Maiti
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:995-10002013-01-01RePEc:oup:biomet
article
Proportional mean residual life model for right-censored length-biased data
To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcome, however, such a sampling strategy may be length biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to induced informative censoring in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes & Dasu (Biometrika 77, 409--10, 1990) for analysis of censored length-biased survival data. Several nonstandard data structures, including censoring of onset time and cross-sectional data without follow-up, can also be handled by the proposed methodology. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
995
1000
http://hdl.handle.net/10.1093/biomet/ass049
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
Ying Qing Chen
Chong-Zhi Di
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:981-9882013-01-01RePEc:oup:biomet
article
Finite population estimators in stochastic search variable selection
Monte Carlo algorithms are commonly used to identify a set of models for Bayesian model selection or model averaging. Because empirical frequencies of models are often zero or one in high-dimensional problems, posterior probabilities calculated from the observed marginal likelihoods, renormalized over the sampled models, are often employed. Such estimates are the only recourse in several newer stochastic search algorithms. In this paper, we prove that renormalization of posterior probabilities over the set of sampled models generally leads to bias that may dominate mean squared error. Viewing the model space as a finite population, we propose a new estimator based on a ratio of Horvitz--Thompson estimators that incorporates observed marginal likelihoods, but is approximately unbiased. This is shown to lead to a reduction in mean squared error compared to the empirical or renormalized estimators, with little increase in computational cost. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
981
988
http://hdl.handle.net/10.1093/biomet/ass040
application/pdf
Access to full text is restricted to subscribers.
Merlise A. Clyde
Joyee Ghosh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:813-8322013-01-01RePEc:oup:biomet
article
Dispersion operators and resistant second-order functional data analysis
Inferences related to the second-order properties of functional data, as expressed by covariance structure, can become unreliable when the data are non-Gaussian or contain unusual observations. In the functional setting, it is often difficult to identify atypical observations, as their distinguishing characteristics can be manifold but subtle. In this paper, we introduce the notion of a dispersion operator, investigate its use in probing the second-order structure of functional data, and develop a test for comparing the second-order characteristics of two functional samples that is resistant to atypical observations and departures from normality. The proposed test is a regularized M-test based on a spectrally truncated version of the Hilbert--Schmidt norm of a score operator defined via the dispersion operator. We derive the asymptotic distribution of the test statistic, investigate the behaviour of the test in a simulation study and illustrate the method on a structural biology dataset. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
813
832
http://hdl.handle.net/10.1093/biomet/ass037
application/pdf
Access to full text is restricted to subscribers.
David Kraus
Victor M. Panaretos
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:945-9582013-01-01RePEc:oup:biomet
article
Penalized balanced sampling
Linear mixed models cover a wide range of statistical methods, which have found many uses in the estimation for complex surveys. The purpose of this work is to consider methods by which linear mixed models may be used at the design stage of a survey to incorporate available auxiliary information. This paper reviews the ideas of balanced sampling and the cube algorithm, and proposes an implementation of the latter by which penalized balanced samples can be selected. Such samples can reduce or eliminate the need for linear mixed model weight adjustments, a result demonstrated theoretically and via simulation. Horvitz--Thompson estimators for such samples will be highly efficient for any responses well approximated by a linear mixed model in the auxiliary information. In Monte Carlo experiments using nonparametric and temporal linear mixed models, the strategy of penalized balanced sampling with Horvitz--Thompson estimation dominates a variety of standard strategies. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
945
958
http://hdl.handle.net/10.1093/biomet/ass033
application/pdf
Access to full text is restricted to subscribers.
F. J. Breidt
G. Chauvet
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:915-9282013-01-01RePEc:oup:biomet
article
On the sparsity of signals in a random sample
This article proposes a method of moments technique for estimating the sparsity of signals in a random sample. This involves estimating the largest eigenvalue of a large Hermitian trigonometric matrix under mild conditions. As illustration, the method is applied to two well-known problems. The first focuses on the sparsity of a large covariance matrix and the second investigates the sparsity of a sequence of signals observed with stationary, weakly dependent noise. Simulation shows that the proposed estimators can have significantly smaller mean absolute errors than their main competitors. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
915
928
http://hdl.handle.net/10.1093/biomet/ass039
application/pdf
Access to full text is restricted to subscribers.
Binyan Jiang
Wei-Liem Loh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:879-8982013-01-01RePEc:oup:biomet
article
Scaled sparse linear regression
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise-level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for least-squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
879
898
http://hdl.handle.net/10.1093/biomet/ass043
application/pdf
Access to full text is restricted to subscribers.
Tingni Sun
Cun-Hui Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:833-8492013-01-01RePEc:oup:biomet
article
A geometric approach to projective shape and the cross ratio
Projective shape consists of the information about a configuration of points that is invariant under projective transformations. It is an important tool in machine vision to pick out features that are invariant to the choice of camera view. The simplest example is the cross ratio for a set of four collinear points. Recent work involving ideas from multivariate robustness enables us to introduce here a natural preshape on projective shape space. This makes it possible to adapt the Procrustes analysis that forms the basis of much methodology in the simpler setting of similarity shape space. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
833
849
http://hdl.handle.net/10.1093/biomet/ass055
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Kanti V. Mardia
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:775-7862013-01-01RePEc:oup:biomet
article
Classification based on a permanental process with cyclic approximation
We introduce a doubly stochastic marked point process model for supervised classification problems. Regardless of the number of classes or the dimension of the feature space, the model requires only 2-3 parameters for the covariance function. The classification criterion involves a permanental ratio for which an approximation using a polynomial-time cyclic expansion is proposed. The approximation is effective even if the feature region occupied by one class is a patchwork interlaced with regions occupied by other classes. An application to DNA microarray analysis indicates that the cyclic approximation is effective even for high-dimensional data. It can employ feature variables in an efficient way to reduce the prediction error significantly. This is critical when the true classification relies on nonreducible high-dimensional features. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
775
786
http://hdl.handle.net/10.1093/biomet/ass047
application/pdf
Access to full text is restricted to subscribers.
J. Yang
K. Miescke
P. McCullagh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:959-9722013-01-01RePEc:oup:biomet
article
Bootstrap confidence bands for sojourn distributions in multistate semi-Markov models with right censoring
Transient semi-Markov processes have traditionally been used to describe the transitions of a patient through the various states of a multistate survival model. A survival distribution in this context is a sojourn through the states until passage to a fatal absorbing state or certain endpoint states. Using complete sojourn data, this paper shows how such survival distributions and associated hazard functions can be estimated nonparametrically and also how nonparametric bootstrap pointwise confidence bands can be constructed for them when patients are subject to independent right censoring from each state during the sojourn. Limitations to the estimability of such survival distributions that result from random censoring with bounded support are clarified. The methods are applicable to any sort of sojourn through any finite state process of arbitrary complexity involving feedback into previously occupied states. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
959
972
http://hdl.handle.net/10.1093/biomet/ass036
application/pdf
Access to full text is restricted to subscribers.
Ronald W. Butler
Douglas A. Bronson
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:865-8772013-01-01RePEc:oup:biomet
article
A two-stage dimension-reduction method for transformed responses and its applications
Researchers in the biological sciences nowadays often encounter the curse of dimensionality. To tackle this, sufficient dimension reduction aims to estimate the central subspace, in which all the necessary information supplied by the covariates regarding the response of interest is contained. Subsequent statistical analysis can then be made in a lower-dimensional space while preserving relevant information. Many studies are concerned with the transformed response rather than the original one, but they may have different central subspaces. When estimating the central subspace of the transformed response, direct methods will be inefficient. In this article, we propose a more efficient two-stage estimator of the central subspace of a transformed response. This approach is extended to censored responses and is applied to combining multiple biomarkers. Simulation studies and data examples support the superiority of the procedure. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
865
877
http://hdl.handle.net/10.1093/biomet/ass042
application/pdf
Access to full text is restricted to subscribers.
Hung Hung
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:787-7982013-01-01RePEc:oup:biomet
article
Orthogonalization of vectors with minimal adjustment
Two transformations are proposed that give orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim is that each component should be close to the vector with which it is paired, orthogonality imposing a constraint. The transformations lead to a variety of new statistical methods, including a unified approach to the identification and diagnosis of collinearities, a method of setting prior weights for Bayesian model averaging, and a means of calculating an upper bound for a multivariate Chebychev inequality. One transformation has the property that duplicating a vector has no effect on the orthogonal components that correspond to nonduplicated vectors, and is determined using a new algorithm that also provides the decomposition of a positive-definite matrix in terms of a diagonal matrix and a correlation matrix. The algorithm is shown to converge to a global optimum. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
787
798
http://hdl.handle.net/10.1093/biomet/ass041
application/pdf
Access to full text is restricted to subscribers.
Paul H. Garthwaite
Frank Critchley
Karim Anaya-Izquierdo
Emmanuel Mubwandarikwa
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:973-9802013-01-01RePEc:oup:biomet
article
Statistical properties of an early stopping rule for resampling-based multiple testing
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
973
980
http://hdl.handle.net/10.1093/biomet/ass051
application/pdf
Access to full text is restricted to subscribers.
Hui Jiang
Julia Salzman
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:1001-10072013-01-01RePEc:oup:biomet
article
An efficient empirical likelihood approach for estimating equations with missing data
We explore the use of estimating equations for efficient statistical inference in case of missing data. We propose a semiparametric efficient empirical likelihood approach, and show that the empirical likelihood ratio statistic and its profile counterpart asymptotically follow central chi-square distributions when evaluated at the true parameter. The theoretical properties and practical performance of our approach are demonstrated through numerical simulations and data analysis. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
1001
1007
http://hdl.handle.net/10.1093/biomet/ass045
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Yongsong Qin
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:763-7742013-01-01RePEc:oup:biomet
article
Testing one hypothesis twice in observational studies
In a matched observational study of treatment effects, a sensitivity analysis asks about the magnitude of the departure from random assignment that would need to be present to alter the conclusions of an analysis that assumes that matching for measured covariates removes all bias. The reported degree of sensitivity to unmeasured biases depends on both the process that generated the data and the chosen methods of analysis, so a poor choice of method may lead to an exaggerated report of sensitivity to bias. This suggests the possibility of performing more than one analysis with a correction for multiple inference, say testing one null hypothesis using two or three different tests. In theory and in an example, it is shown that, in large samples, the gains from testing twice will often be large, because testing twice has the larger of the two design sensitivities of the component tests, and the losses due to correcting for two tests will often be small, because two tests of one hypothesis will typically be highly correlated, so a correction for multiple testing that takes this into account will be small. An illustration uses data from the U.S. National Health and Nutrition Examination Survey concerning lead in the blood of cigarette smokers. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
763
774
http://hdl.handle.net/10.1093/biomet/ass032
application/pdf
Access to full text is restricted to subscribers.
P. R. Rosenbaum
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:989-9942013-01-01RePEc:oup:biomet
article
Compatible weighted proper scoring rules
Many proper scoring rules such as the Brier and log scoring rules implicitly reward a probability forecaster relative to a uniform baseline distribution. Recent work has motivated weighted proper scoring rules, which have an additional baseline parameter. To date two families of weighted proper scoring rules have been introduced, the weighted power and pseudospherical scoring families. These families are compatible with the log scoring rule: when the baseline maximizes the log scoring rule over some set of distributions, the baseline also maximizes the weighted power and pseudospherical scoring rules over the same set. We characterize all weighted proper scoring families and prove a general property: every proper scoring rule is compatible with some weighted scoring family, and every weighted scoring family is compatible with some proper scoring rule. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
989
994
http://hdl.handle.net/10.1093/biomet/ass046
application/pdf
Access to full text is restricted to subscribers.
P. G. M. Forbes
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:851-8642013-01-01RePEc:oup:biomet
article
Bidirectional discrimination with application to data visualization
Linear classifiers are very popular, but can have limitations when classes have distinct subpopulations. General nonlinear kernel classifiers are very flexible, but do not give clear interpretations and may not be efficient in high dimensions. We propose the bidirectional discrimination classification method, which generalizes linear classifiers to two or more hyperplanes. This new family of classification methods gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability, and much of the parsimony, of linear classifiers. They provide a new visualization tool for high-dimensional, low-sample-size data. Although the idea is generally applicable, we focus on the generalization of the support vector machine and distance-weighted discrimination methods. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. Our method leads to better classification performance in high-dimensional situations where subclusters are present in the data. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
851
864
http://hdl.handle.net/10.1093/biomet/ass029
application/pdf
Access to full text is restricted to subscribers.
Hanwen Huang
Yufeng Liu
J. S. Marron
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:899-9142013-01-01RePEc:oup:biomet
article
Simultaneous supervised clustering and feature selection over a graph
In this article, we propose a regression method for simultaneous supervised clustering and feature selection over a given undirected graph, where homogeneous groups or clusters are estimated as well as informative predictors, with each predictor corresponding to one node in the graph and a connecting path indicating a priori possible grouping among the corresponding predictors. The method seeks a parsimonious model with high predictive power through identifying and collapsing homogeneous groups of regression coefficients. To address computational challenges, we present an efficient algorithm integrating the augmented Lagrange multipliers, coordinate descent and difference convex methods. We prove that the proposed method not only identifies the true homogeneous groups and informative features consistently but also leads to accurate parameter estimation. A gene network dataset is analysed to demonstrate that the method can make a difference by exploring dependency structures among the genes. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
899
914
http://hdl.handle.net/10.1093/biomet/ass038
application/pdf
Access to full text is restricted to subscribers.
Xiaotong Shen
Hsin-Cheng Huang
Wei Pan
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:305-3192010-09-29RePEc:oup:biomet
article
Semiparametric dimension reduction estimation for mean response with missing data
Model misspecification can be a concern for high-dimensional data. Nonparametric regression obviates model specification but is impeded by the curse of dimensionality. This paper focuses on the estimation of the marginal mean response when there is missingness in the response and multiple covariates are available. We propose estimating the mean response through nonparametric functional estimation, where the dimension is reduced by a parametric working index. The proposed semiparametric estimator is robust to model misspecification: it is consistent for any working index if the missing mechanism of the response is known or correctly specified up to unknown parameters; even with misspecification in the missing mechanism, it is consistent so long as the working index can recover E(Y | X), the conditional mean response given the covariates. In addition, when the missing mechanism is correctly specified, the semiparametric estimator attains the optimal efficiency if E(Y | X) is recoverable through the working index. Robustness and efficiency of the proposed estimator are further investigated by simulations. We apply the proposed method to a clinical trial for HIV. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
305
319
http://hdl.handle.net/10.1093/biomet/asq005
application/pdf
Access to full text is restricted to subscribers.
Zonghui Hu
Dean A. Follmann
Jing Qin
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:631-6452010-09-29RePEc:oup:biomet
article
Detecting simultaneous changepoints in multiple sequences
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
631
645
http://hdl.handle.net/10.1093/biomet/asq025
application/pdf
Access to full text is restricted to subscribers.
Nancy R. Zhang
David O. Siegmund
Hanlee Ji
Jun Z. Li
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:551-5662010-09-29RePEc:oup:biomet
article
Penalized Bregman divergence for large-dimensional regression and classification
Regularization methods are characterized by loss functions measuring data fits and penalty terms constraining model parameters. The commonly used quadratic loss is not suitable for classification with binary responses, whereas the loglikelihood function is not readily applicable to models where the exact distribution of observations is unknown or not fully specified. We introduce the penalized Bregman divergence by replacing the negative loglikelihood in the conventional penalized likelihood with Bregman divergence, which encompasses many commonly used loss functions in the regression analysis, classification procedures and machine learning literature. We investigate new statistical properties of the resulting class of estimators with the number p_n of parameters either diverging with the sample size n or even nearly comparable with n, and develop statistical inference tools. It is shown that the resulting penalized estimator, combined with appropriate penalties, achieves the same oracle property as the penalized likelihood estimator, but asymptotically does not rely on the complete specification of the underlying distribution. Furthermore, the choice of loss function in the penalized classifiers has an asymptotically relatively negligible impact on classification performance. We illustrate the proposed method for quasilikelihood regression and binary classification with simulation evaluation and real-data application. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
551
566
http://hdl.handle.net/10.1093/biomet/asq033
application/pdf
Access to full text is restricted to subscribers.
Chunming Zhang
Yuan Jiang
Yi Chai
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:621-6302010-09-29RePEc:oup:biomet
article
Accurate and robust tests for indirect inference
In this paper we propose accurate parameter and over-identification tests for indirect inference. Under the null hypothesis the new tests are asymptotically χ^2-distributed with a relative error of order n^(-1). They exhibit better finite sample accuracy than classical tests for indirect inference, which have the same asymptotic distribution but an absolute error of order n^(-1/2). Robust versions of the tests are also provided. We illustrate their accuracy in nonlinear regression, Poisson regression with overdispersion and diffusion models. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
621
630
http://hdl.handle.net/10.1093/biomet/asq040
application/pdf
Access to full text is restricted to subscribers.
Veronika Czellar
Elvezio Ronchetti
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:405-4182010-09-29RePEc:oup:biomet
article
Interval estimation for drop-the-losers designs
In the first stage of a two-stage, drop-the-losers design, a candidate for the best treatment is selected. At the second stage, additional observations are collected to decide whether the candidate is actually better than the control. The design also allows the investigator to stop the trial for ethical reasons at the end of the first stage if there is already strong evidence of futility or superiority. Two types of tests have recently been developed, one based on the combined means and the other based on the combined p-values, but corresponding interval estimators are unavailable except in special cases. The problem is that, in most cases, the interval estimators depend on the mean configuration of all treatments in the first stage, which is unknown. In this paper, we prove a basic stochastic ordering lemma that enables us to bridge the gap between hypothesis testing and interval estimation. The proposed confidence intervals achieve the nominal confidence level in certain special cases. Simulations show that decisions based on our intervals are usually more powerful than those based on existing methods. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
405
418
http://hdl.handle.net/10.1093/biomet/asq003
application/pdf
Access to full text is restricted to subscribers.
Samuel S. Wu
Weizhen Wang
Mark C. K. Yang
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:513-5182010-09-29RePEc:oup:biomet
article
Optimal designs for the emax, log-linear and exponential models
We derive locally D- and ED_p-optimal designs for the exponential, log-linear and three-parameter emax models. For each model the locally D- and ED_p-optimal designs are supported at the same set of points, while the corresponding weights are different. This indicates that for a given model, D-optimal designs are efficient for estimating the smallest dose that achieves 100p% of the maximum effect in the observed dose range. Conversely, ED_p-optimal designs also yield good D-efficiencies. We illustrate the results using several examples and demonstrate that locally D- and ED_p-optimal designs for the emax, log-linear and exponential models are relatively robust with respect to misspecification of the model parameters. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
513
518
http://hdl.handle.net/10.1093/biomet/asq020
application/pdf
Access to full text is restricted to subscribers.
H. Dette
C. Kiss
M. Bevanda
F. Bretz
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:603-6202010-09-29RePEc:oup:biomet
article
On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems
This paper considers the asymptotic distribution of the likelihood ratio statistic T for testing a subset of the parameter of interest θ, based on the pseudolikelihood in which the nuisance parameter is replaced by a consistent estimator. We show that the asymptotic distribution of T under H_0 is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest or the true value of the nuisance parameter lies on the boundary of the parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of T may be mixtures of chi-squared variables. We conducted simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
603
620
http://hdl.handle.net/10.1093/biomet/asq031
application/pdf
Access to full text is restricted to subscribers.
Yong Chen
Kung-Yee Liang
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:765-7722010-09-29RePEc:oup:biomet
article
Strictly stationary solutions of autoregressive moving average equations
Necessary and sufficient conditions for the existence of a strictly stationary solution of the equations defining an autoregressive moving average process driven by an independent and identically distributed noise sequence are determined. No moment assumptions on the driving noise sequence are made. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
765
772
http://hdl.handle.net/10.1093/biomet/asq034
application/pdf
Access to full text is restricted to subscribers.
Peter J. Brockwell
Alexander Lindner
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:447-4642010-09-29RePEc:oup:biomet
article
A sequential smoothing algorithm with linear computational cost
In this paper we propose a new particle smoother that has a computational complexity of O(N), where N is the number of particles. This compares favourably with the O(N^2) computational cost of most smoothers. The new method also overcomes some degeneracy problems in existing algorithms. Through simulation studies we show that substantial gains in efficiency are obtained for practical amounts of computational cost. It is shown both through these simulation studies, and by the analysis of an athletics dataset, that our new method also substantially outperforms the simple filter-smoother, the only other smoother with computational cost that is O(N). Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
447
464
http://hdl.handle.net/10.1093/biomet/asq013
application/pdf
Access to full text is restricted to subscribers.
Paul Fearnhead
David Wyncoll
Jonathan Tawn
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:741-7552010-09-29RePEc:oup:biomet
article
Properties of nested sampling
Nested sampling is a simulation method for approximating marginal likelihoods. We establish that nested sampling has an approximation error that vanishes at the standard Monte Carlo rate and that this error is asymptotically Gaussian. It is shown that the asymptotic variance of the nested sampling approximation typically grows linearly with the dimension of the parameter. We discuss the applicability and efficiency of nested sampling in realistic problems, and compare it with two current methods for computing marginal likelihood. Finally, we propose an extension that avoids resorting to Markov chain Monte Carlo simulation to obtain the simulated points. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
741
755
http://hdl.handle.net/10.1093/biomet/asq021
application/pdf
Access to full text is restricted to subscribers.
Nicolas Chopin
Christian P. Robert
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:567-5842010-09-29RePEc:oup:biomet
article
Shape curves and geodesic modelling
A family of shape curves is introduced that is useful for modelling the changes in shape in a series of geometrical objects. The relationship between the preshape sphere and the shape space is used to define a general family of curves based on horizontal geodesics on the preshape sphere. Methods for fitting geodesics and more general curves in the non-Euclidean shape space of point sets are discussed, based on minimizing sums of squares of Procrustes distances. Likelihood-based inference is considered. We illustrate the ideas by carrying out statistical analysis of two-dimensional landmarks on rats' skulls at various times in their development and three-dimensional landmarks on lumbar vertebrae from three primate species. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
567
584
http://hdl.handle.net/10.1093/biomet/asq027
application/pdf
Access to full text is restricted to subscribers.
Kim Kenobi
Ian L. Dryden
Huiling Le
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:321-3322010-09-29RePEc:oup:biomet
article
On the relative efficiency of using summary statistics versus individual-level data in meta-analysis
Meta-analysis is widely used to synthesize the results of multiple studies. Although meta-analysis is traditionally carried out by combining the summary statistics of relevant studies, advances in technologies and communications have made it increasingly feasible to access the original data on individual participants. In the present paper, we investigate the relative efficiency of analyzing original data versus combining summary statistics. We show that, for all commonly used parametric and semiparametric models, there is no asymptotic efficiency gain by analyzing original data if the parameter of main interest has a common value across studies, the nuisance parameters have distinct values among studies, and the summary statistics are based on maximum likelihood. We also assess the relative efficiency of the two methods when the parameter of main interest has different values among studies or when there are common nuisance parameters across studies. We conduct simulation studies to confirm the theoretical results and provide empirical comparisons from a genetic association study. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
321
332
http://hdl.handle.net/10.1093/biomet/asq006
application/pdf
Access to full text is restricted to subscribers.
D. Y. Lin
D. Zeng
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:333-3452010-09-29RePEc:oup:biomet
article
Evidence factors in observational studies
Some experiments involve more than one random assignment of treatments to units. An analogous situation arises in certain observational studies, although randomization is not used, so each assignment may be biased. If each assignment is suspect, it is natural to ask whether there are separate pieces of information, dependent upon different assumptions, and perhaps whether conclusions about treatment effects are not critically dependent upon one or another suspect assumption. The design of an observational study contains evidence factors if it permits several statistically independent tests of the same null hypothesis about treatment effects, where these tests rely on different assumptions about treatment assignments at several levels of assignment. Two designs and two empirical examples are considered, one example of each design. In the dose-control design, there are matched pairs of a treated subject and an untreated control, and doses of treatment vary between pairs for treated subjects; this yields two evidence factors. In the varied intensity design, there are matched sets with two treated subjects and one or more untreated controls, where the two treated subjects within the same matched set receive different doses of treatment, and in a technically different way, the design yields two evidence factors. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
333
345
http://hdl.handle.net/10.1093/biomet/asq019
application/pdf
Access to full text is restricted to subscribers.
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:497-5042010-09-29RePEc:oup:biomet
article
Objective Bayes and conditional inference in exponential families
Objective Bayes methodology is considered for conditional frequentist inference about a canonical parameter in a multi-parameter exponential family. A condition is derived under which posterior Bayes quantiles match the conditional frequentist coverage to a higher-order approximation in terms of the sample size. This condition is on the model, not on the prior, and it ensures that any first-order probability matching prior in the unconditional sense automatically yields higher-order conditional probability matching. Objective Bayes methods are compared to parametric bootstrap and analytic methods for higher-order conditional frequentist inference. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
497
504
http://hdl.handle.net/10.1093/biomet/asq002
application/pdf
Access to full text is restricted to subscribers.
Thomas J. Diciccio
G. Alastair Young
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:757-7642010-09-29RePEc:oup:biomet
article
Empirical likelihood methods for two-dimensional shape analysis
We consider empirical likelihood for the mean similarity shape of objects in two dimensions described by labelled landmarks. The restriction to two dimensions permits the representation of preshapes as complex unit vectors. We focus on the use of empirical likelihood techniques for the construction of confidence regions for the mean shape and for testing the hypothesis of a common mean shape across several populations. Theoretical properties and computational details are discussed and the results of a simulation study are presented. Our results show that bootstrap calibrated empirical likelihood performs well in practice in the planar shape setting. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
757
764
http://hdl.handle.net/10.1093/biomet/asq028
application/pdf
Access to full text is restricted to subscribers.
Getulio J. A. Amaral
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:683-6982010-09-29RePEc:oup:biomet
article
Analysis of cohort studies with multivariate and partially observed disease classification data
Complex diseases like cancers can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semiparametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating equation approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of the estimating equation method under a general missing-at-random assumption and propose a novel influence-function-based sandwich variance estimator. The methods are illustrated using simulation studies and a real data application involving the Cancer Prevention Study II nutrition cohort. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
683
698
http://hdl.handle.net/10.1093/biomet/asq036
application/pdf
Access to full text is restricted to subscribers.
Nilanjan Chatterjee
Samiran Sinha
W. Ryan Diver
Heather Spencer Feigelson
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:419-4332010-09-29RePEc:oup:biomet
article
Efficient scalable schemes for monitoring a large number of data streams
The sequential changepoint detection problem is studied in the context of global online monitoring of a large number of independent data streams. We are interested in detecting an occurring event as soon as possible, but we do not know when the event will occur, nor do we know which subset of data streams will be affected by the event. A family of scalable schemes is proposed based on the sum of the local cumulative sum, cusum, statistics from each individual data stream, and is shown to asymptotically minimize the detection delays for each and every possible combination of affected data streams, subject to the global false alarm constraint. The usefulness and limitations of our asymptotic optimality results are illustrated by numerical simulations and heuristic arguments. The Appendices contain a probabilistic result on the first epoch to simultaneous record values for multiple independent random walks. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
419
433
http://hdl.handle.net/10.1093/biomet/asq010
application/pdf
Access to full text is restricted to subscribers.
Y. Mei
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:585-6012010-09-29RePEc:oup:biomet
article
A class of grouped Brunk estimators and penalized spline estimators for monotone regression
We study a class of monotone univariate regression estimators. We use B-splines to approximate an underlying regression function and estimate spline coefficients based on grouped data. We investigate asymptotic properties of two monotone estimators: a grouped Brunk estimator and a penalized monotone estimator. These estimators are consistent at the boundary and their mean square errors achieve optimal convergence rates under suitable assumptions of the true regression function. Asymptotic distributions are developed and are shown to be independent of spline degrees and the number of knots. Simulation results and car data illustrate performance of the proposed estimators. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
585
601
http://hdl.handle.net/10.1093/biomet/asq029
application/pdf
Access to full text is restricted to subscribers.
Xiao Wang
Jinglai Shen
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:435-4462010-09-29RePEc:oup:biomet
article
Estimating linear dependence between nonstationary time series using the locally stationary wavelet model
Large volumes of neuroscience data comprise multiple, nonstationary electrophysiological or neuroimaging time series recorded from different brain regions. Accurately estimating the dependence between such neural time series is critical, since changes in the dependence structure are presumed to reflect functional interactions between neuronal populations. We propose a new dependence measure, derived from a bivariate locally stationary wavelet time series model. Since wavelets are localized in both time and scale, this approach leads to a natural, local and multi-scale estimate of nonstationary dependence. Our methodology is illustrated by application to a simulated example, and to electrophysiological data relating to interactions between the rat hippocampus and prefrontal cortex during working memory and decision making. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
435
446
http://hdl.handle.net/10.1093/biomet/asq007
application/pdf
Access to full text is restricted to subscribers.
J. Sanderson
P. Fryzlewicz
M. W. Jones
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:727-7402010-09-29RePEc:oup:biomet
article
Estimating species richness by a Poisson-compound gamma model
We propose a Poisson-compound gamma approach for species richness estimation. Based on the denseness and nesting properties of the gamma mixture, we fix the shape parameter of each gamma component at a unified value, and estimate the mixture using nonparametric maximum likelihood. A least-squares crossvalidation procedure is proposed for the choice of the common shape parameter. The performance of the resulting estimator of N is assessed using numerical studies and genomic data. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
727
740
http://hdl.handle.net/10.1093/biomet/asq026
application/pdf
Access to full text is restricted to subscribers.
Ji-Ping Wang
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:361-3742010-09-29RePEc:oup:biomet
article
Efficient estimation in multi-phase case-control studies
In this paper we discuss the analysis of multi-phase, or multi-stage, case-control studies and present an efficient semiparametric maximum-likelihood approach that unifies and extends earlier work, including the seminal case-control paper by Prentice & Pyke (1979), work by Breslow & Cain (1988), Scott & Wild (1991), Breslow & Holubkov (1997) and others. The theoretical derivations apply to arbitrary binary regression models but we present results for logistic regression and show that the approach can be implemented by including additional intercept terms in the logistic model and then making some simple corrections to the score and information equations used in a Newton-Raphson or Fisher-scoring maximization of the prospective loglikelihood. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
361
374
http://hdl.handle.net/10.1093/biomet/asq009
application/pdf
Access to full text is restricted to subscribers.
A. J. Lee
A. J. Scott
C. J. Wild
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:347-3602010-09-29RePEc:oup:biomet
article
A theory for testing hypotheses under covariate-adaptive randomization
The covariate-adaptive randomization method was proposed for clinical trials long ago but little theoretical work has been done for statistical inference associated with it. Practitioners often apply test procedures available for simple randomization, which is controversial since procedures valid under simple randomization may not be valid under other randomization schemes. In this paper, we provide some theoretical results for testing hypotheses after covariate-adaptive randomization. We show that one way to obtain a valid test procedure is to use a correct model between outcomes and covariates, including those used in randomization. We also show that the simple two sample t-test, without using any covariate, is conservative under covariate-adaptive biased coin randomization in terms of its Type I error, and that a valid bootstrap t-test can be constructed. The powers of several tests are examined theoretically and empirically. Our study provides guidance for applications and sheds light on further research in this area. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
347
360
http://hdl.handle.net/10.1093/biomet/asq014
application/pdf
Access to full text is restricted to subscribers.
Jun Shao
Xinxin Yu
Bob Zhong
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:505-5122010-09-29RePEc:oup:biomet
article
Copula inference under censoring
This paper discusses copula model selection procedures and goodness-of-fit tests under censoring. The proposed methodology is based on a comparison of nonparametric and model-based estimators of the probability integral transformation, K. New weighted estimators for K are introduced. The resulting tests are compared to an existing approach by simulation and illustrated with an example involving bleeding changes in a woman's reproductive history. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
505
512
http://hdl.handle.net/10.1093/biomet/asq011
application/pdf
Access to full text is restricted to subscribers.
M. L. Lakhal-Chaieb
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:647-6592010-09-29RePEc:oup:biomet
article
Sufficient cause interactions for categorical and ordinal exposures with three levels
Definitions are given for weak and strong sufficient cause interactions in settings in which the outcome is binary and in which there are two exposures of interest that are categorical or ordinal. Weak sufficient cause interactions concern cases in which a mechanism will operate under certain values of the two exposures but not when one or the other of the exposures takes some other value. Strong sufficient cause interactions concern cases in which a mechanism will operate under certain values of the two exposures but not when one or the other of the exposures takes any other value. Empirical conditions are derived for such interactions when exposures have two or three levels and are related to regression coefficients in linear and log-linear models. When the exposures are binary, the notions of a weak and a strong sufficient cause interaction coincide, but not when the exposures are categorical or ordinal. The results are applied to examples concerning gene-gene and gene-environment interactions. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
647
659
http://hdl.handle.net/10.1093/biomet/asq030
application/pdf
Access to full text is restricted to subscribers.
Tyler J. VanderWeele
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:539-5502010-09-29RePEc:oup:biomet
article
A new approach to Cholesky-based covariance regularization in high dimensions
In this paper we propose a new regression interpretation of the Cholesky factor of the covariance matrix, as opposed to the well-known regression interpretation of the Cholesky factor of the inverse covariance, which leads to a new class of regularized covariance estimators suitable for high-dimensional problems. Regularizing the Cholesky factor of the covariance via this regression interpretation always results in a positive definite estimator. In particular, one can obtain a positive definite banded estimator of the covariance matrix at the same computational cost as the popular banded estimator of Bickel & Levina (2008b), which is not guaranteed to be positive definite. We also establish theoretical connections between banding Cholesky factors of the covariance matrix and its inverse and constrained maximum likelihood estimation under the banding constraint, and compare the numerical performance of several methods in simulations and on a sonar data example. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
539
550
http://hdl.handle.net/10.1093/biomet/asq022
application/pdf
Access to full text is restricted to subscribers.
Adam J. Rothman
Elizaveta Levina
Ji Zhu
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:261-2782010-09-29RePEc:oup:biomet
article
Variable selection in high-dimensional linear models: partially faithful distributions and the pc-simple algorithm
We consider variable selection in high-dimensional linear models where the number of covariates greatly exceeds the sample size. We introduce the new concept of partial faithfulness and use it to infer associations between the covariates and the response. Under partial faithfulness, we develop a simplified version of the pc algorithm (Spirtes et al., 2000), which is computationally feasible even with thousands of covariates and provides consistent variable selection under conditions on the random design matrix that are of a different nature than coherence conditions for penalty-based approaches like the lasso. Simulations and application to real data show that our method is competitive compared to penalty-based approaches. We provide an efficient implementation of the algorithm in the R-package pcalg. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
261
278
http://hdl.handle.net/10.1093/biomet/asq008
application/pdf
Access to full text is restricted to subscribers.
P. Bühlmann
M. Kalisch
M. H. Maathuis
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:389-4042010-09-29RePEc:oup:biomet
article
Calibrating parametric subject-specific risk estimation
For modern evidence-based medicine, decisions on disease prevention or management strategies are often guided by a risk index system. For each individual, the system uses his/her baseline information to estimate the risk of experiencing a future disease-related clinical event. Such a risk scoring scheme is usually derived from an overly simplified parametric model. To validate a model-based procedure, one may perform a standard global evaluation via, for instance, a receiver operating characteristic analysis. In this article, we propose a method to calibrate the risk index system at a subject level. Specifically, we developed point and interval estimation procedures for t-year mortality rates conditional on the estimated parametric risk score. The proposals are illustrated with a dataset from a large clinical trial with post-myocardial infarction patients. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
389
404
http://hdl.handle.net/10.1093/biomet/asq012
application/pdf
Access to full text is restricted to subscribers.
T. Cai
L. Tian
Hajime Uno
Scott D. Solomon
L. J. Wei
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:699-7122010-09-29RePEc:oup:biomet
article
A semiparametric additive rate model for recurrent events with an informative terminal event
We propose a semiparametric additive rate model for modelling recurrent events in the presence of a terminal event. The dependence between recurrent events and terminal event is nonparametric. A general transformation model is used to model the terminal event. We construct an estimating equation for parameter estimation and derive the asymptotic distributions of the proposed estimators. Simulation studies demonstrate that the proposed inference procedure performs well in realistic settings. Application to a medical study is presented. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
699
712
http://hdl.handle.net/10.1093/biomet/asq039
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Jianwen Cai
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:375-3882010-09-29RePEc:oup:biomet
article
Risk-adjusted monitoring of time to event
Recently there has been interest in risk-adjusted cumulative sum charts, CUSUMs, to monitor the performance of e.g. hospitals, taking into account the heterogeneity of patients. Even though many outcomes involve time, only conventional regression models are commonly used. In this article we investigate how time to event models may be used for monitoring purposes. We consider monitoring using CUSUMs based on the partial likelihood ratio between an out-of-control state and an in-control state. We consider both proportional and non-proportional alternatives, as well as a head start. Against proportional alternatives, we present an analytic method of computing the expected number of observed events before stopping or the probability of stopping before a given observed number of events. In a stationary set-up, the former is roughly proportional to the average run length in calendar time. Adding a head start changes the threshold only slightly if the expected number of events until hitting is used as a criterion. However, it changes the threshold substantially if a false alarm probability is used. In simulation studies, charts based on survival analysis perform better than simpler monitoring schemes. We present one example from retail finance and one medical application. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
375
388
http://hdl.handle.net/10.1093/biomet/asq004
application/pdf
Access to full text is restricted to subscribers.
A. Gandy
J. T. Kvaløy
A. Bottle
F. Zhou
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:465-4802010-09-29RePEc:oup:biomet
article
The horseshoe estimator for sparse signals
This paper proposes a new approach to sparsity, called the horseshoe estimator, which arises from a prior based on multivariate-normal scale mixtures. We describe the estimator's advantages over existing approaches, including its robustness, adaptivity to different sparsity patterns and analytical tractability. We prove two theorems: one that characterizes the horseshoe estimator's tail robustness and the other that demonstrates a super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. Finally, using both real and simulated data, we show that the horseshoe estimator corresponds quite closely to the answers obtained by Bayesian model averaging under a point-mass mixture prior. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
465
480
http://hdl.handle.net/10.1093/biomet/asq017
application/pdf
Access to full text is restricted to subscribers.
Carlos M. Carvalho
Nicholas G. Polson
James G. Scott
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:295-3042010-09-29RePEc:oup:biomet
article
Sufficient dimension reduction through discretization-expectation estimation
In the context of sufficient dimension reduction, the goal is to parsimoniously recover the central subspace of a regression model. Many inverse regression methods use slicing estimation to recover the central subspace. The efficacy of slicing estimation depends heavily upon the number of slices. However, the selection of the number of slices is an open and long-standing problem. In this paper, we propose a discretization-expectation estimation method, which avoids selecting the number of slices, while preserving the integrity of the central subspace. This generic method assures root-n consistency and asymptotic normality of slicing estimators for many inverse regression methods, and can be applied to regressions with multivariate responses. A BIC-type criterion for the dimension of the central subspace is proposed. Comprehensive simulations and an illustrative application show that our method compares favourably with existing estimators. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
295
304
http://hdl.handle.net/10.1093/biomet/asq018
application/pdf
Access to full text is restricted to subscribers.
Liping Zhu
Tao Wang
Lixing Zhu
Louis Ferré
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:661-6822010-09-29RePEc:oup:biomet
article
Bounded, efficient and doubly robust estimation with inverse weighting
Consider estimating the mean of an outcome in the presence of missing data or estimating population average treatment effects in causal inference. A doubly robust estimator remains consistent if an outcome regression model or a propensity score model is correctly specified. We build on a previous nonparametric likelihood approach and propose new doubly robust estimators, which have desirable properties in efficiency if the propensity score model is correctly specified, and in boundedness even if the inverse probability weights are highly variable. We compare the new and existing estimators in a simulation study and find that the robustified likelihood estimators yield overall the smallest mean squared errors. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
661
682
http://hdl.handle.net/10.1093/biomet/asq035
application/pdf
Access to full text is restricted to subscribers.
Zhiqiang Tan
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:519-5382010-09-29RePEc:oup:biomet
article
Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs
Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical and biological systems where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with the same structure may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimation of the adjacency matrix of directed acyclic graphs, when variables inherit a natural ordering. We study variable selection consistency of lasso and adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is only variable selection consistent under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
519
538
http://hdl.handle.net/10.1093/biomet/asq038
application/pdf
Access to full text is restricted to subscribers.
Ali Shojaie
George Michailidis
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:481-4962010-09-29RePEc:oup:biomet
article
Likelihood ratio statistics based on an integrated likelihood
An integrated likelihood depends only on the parameter of interest and the data, so it can be used as a standard likelihood function for likelihood-based inference. In this paper, the higher-order asymptotic properties of the signed integrated likelihood ratio statistic for a scalar parameter of interest are considered. These results are used to construct a modified integrated likelihood ratio statistic and to suggest a class of prior densities to use in forming the integrated likelihood. The properties of the integrated likelihood ratio statistic are compared to those of the standard likelihood ratio statistic. Several examples show that the integrated likelihood ratio statistic can be a useful alternative to the standard likelihood ratio statistic. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
481
496
http://hdl.handle.net/10.1093/biomet/asq015
application/pdf
Access to full text is restricted to subscribers.
T. A. Severini
oai:RePEc:oup:biomet:v:97:y:2010:i:2:p:279-2942010-09-29RePEc:oup:biomet
article
Dimension reduction for non-elliptically distributed predictors: second-order methods
Many classical dimension reduction methods, especially those based on inverse conditional moments, require the predictors to have elliptical distributions, or at least to satisfy a linearity condition. Such conditions, however, are too strong for some applications. Li and Dong (2009) introduced the notion of the central solution space and used it to modify first-order methods, such as sliced inverse regression, so that they no longer rely on these conditions. In this paper we generalize this idea to second-order methods, such as sliced average variance estimation and directional regression. In doing so we demonstrate that the central solution space is a versatile framework: we can use it to modify essentially all inverse conditional moment-based methods to relax the distributional assumption on the predictors. Simulation studies and an application show a substantial improvement of the modified methods over their classical counterparts. Copyright 2010, Oxford University Press.
2
2010
97
Biometrika
279
294
http://hdl.handle.net/10.1093/biomet/asq016
application/pdf
Access to full text is restricted to subscribers.
Yuexiao Dong
Bing Li
oai:RePEc:oup:biomet:v:97:y:2010:i:3:p:713-7262010-09-29RePEc:oup:biomet
article
Attributable fraction functions for censored event times
Attributable fractions are commonly used to measure the impact of risk factors on disease incidence in the population. These static measures can be extended to functions of time when the time to disease occurrence or event time is of interest. The present paper deals with nonparametric and semiparametric estimation of attributable fraction functions for cohort studies with potentially censored event time data. The semiparametric models include the familiar proportional hazards model and a broad class of transformation models. The proposed estimators are shown to be consistent, asymptotically normal and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. A cardiovascular health study is provided. Connections to causal inference are discussed. Copyright 2010, Oxford University Press.
3
2010
97
Biometrika
713
726
http://hdl.handle.net/10.1093/biomet/asq023
application/pdf
Access to full text is restricted to subscribers.
Li Chen
D. Y. Lin
Donglin Zeng
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:975-9822009-12-01RePEc:oup:biomet
article
Maximum likelihood estimation using composite likelihoods for closed exponential families
In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same parameter values for any number of observations. Examples include log-linear models and multivariate normal models. In other cases the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate. An application is given to an example in directional data based on a bivariate von Mises distribution. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
975
982
http://hdl.handle.net/10.1093/biomet/asp056
application/pdf
Access to full text is restricted to subscribers.
Kanti V. Mardia
John T. Kent
Gareth Hughes
Charles C. Taylor
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:805-8202009-12-01RePEc:oup:biomet
article
Inference on population size in binomial detectability models
Many models for biological populations, including simple mark-recapture models and distance sampling models, involve a binomially distributed number, n, of observations x_1, …, x_n on members of a population of size N. Two popular estimators of (N, θ), where θ is a vector parameter, are the maximum likelihood estimator and the conditional maximum likelihood estimator based on the conditional distribution of x_1, …, x_n given n. We derive the large-N asymptotic distributions of both estimators, and give formulae for the biases of the two resulting estimators of N. We show that the difference between these two estimators of N is, remarkably, of order 1 and we give a simple formula for the leading part of this difference. Simulations indicate that in many cases this formula is very accurate and that confidence intervals based on the asymptotic distribution have excellent coverage. An extension to product-binomial models is given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
805
820
http://hdl.handle.net/10.1093/biomet/asp051
application/pdf
Access to full text is restricted to subscribers.
R. M. Fewster
P. E. Jupp
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:957-9702009-12-01RePEc:oup:biomet
article
Nested Latin hypercube designs
We propose an approach to constructing nested Latin hypercube designs. Such designs are useful for conducting multiple computer experiments with different levels of accuracy. A nested Latin hypercube design with two layers is defined to be a special Latin hypercube design that contains a smaller Latin hypercube design as a subset. Our method is easy to implement and can accommodate any number of factors. We also extend this method to construct nested Latin hypercube designs with more than two layers. Illustrative examples are given. Some statistical properties of the constructed designs are derived. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
957
970
http://hdl.handle.net/10.1093/biomet/asp045
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:847-8602009-12-01RePEc:oup:biomet
article
Generalized fiducial inference for wavelet regression
We apply Fisher's fiducial idea to wavelet regression, first developing a general methodology for handling model selection problems within the fiducial framework. We propose fiducial-based methods for wavelet curve estimation and the construction of pointwise confidence intervals. We show that these confidence intervals have asymptotically correct coverage. Simulations demonstrate that they possess promising empirical properties. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
847
860
http://hdl.handle.net/10.1093/biomet/asp050
application/pdf
Access to full text is restricted to subscribers.
Jan Hannig
Thomas C. M. Lee
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:761-7802009-12-01RePEc:oup:biomet
article
Sinh-arcsinh distributions
We introduce the sinh-arcsinh transformation and hence, by applying it to a generating distribution with no parameters other than location and scale, usually the normal, a new family of sinh-arcsinh distributions. This four-parameter family has symmetric and skewed members and allows for tailweights that are both heavier and lighter than those of the generating distribution. The central place of the normal distribution in this family affords likelihood ratio tests of normality that are superior to the state-of-the-art in normality testing because of the range of alternatives against which they are very powerful. Likelihood ratio tests of symmetry are also available and are very successful. Three-parameter symmetric and asymmetric subfamilies of the full family are also of interest. Heavy-tailed symmetric sinh-arcsinh distributions behave like Johnson S_U distributions, while their light-tailed counterparts behave like sinh-normal distributions, the sinh-arcsinh family allowing a seamless transition between the two, via the normal, controlled by a single parameter. The sinh-arcsinh family is very tractable and many properties are explored. Likelihood inference is pursued, including an attractive reparameterization. Illustrative examples are given. A multivariate version is considered. Options and extensions are discussed. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
761
780
http://hdl.handle.net/10.1093/biomet/asp053
application/pdf
Access to full text is restricted to subscribers.
M. C. Jones
Arthur Pewsey
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:873-8862009-12-01RePEc:oup:biomet
article
Nonparametric estimation for right-censored length-biased data: a pseudo-partial likelihood approach
To estimate the lifetime distribution of right-censored length-biased data, we propose a pseudo-partial likelihood approach that allows us to derive two nonparametric estimators. With its closed-form estimators and explicit limiting variances, this approach retains the simplicity of conditional analysis, and has only a small efficiency loss compared with the unconditional analysis. Under some regularity conditions, we show that the two estimators are uniformly consistent and converge weakly to Gaussian processes. A simulation study demonstrates that the proposed estimators have satisfactory finite-sample performance. Application to an Alzheimer's disease study is reported. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
873
886
http://hdl.handle.net/10.1093/biomet/asp064
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:983-9902009-12-01RePEc:oup:biomet
article
Adaptive approximate Bayesian computation
Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappé et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
983
990
http://hdl.handle.net/10.1093/biomet/asp052
application/pdf
Access to full text is restricted to subscribers.
Mark A. Beaumont
Jean-Marie Cornuet
Jean-Michel Marin
Christian P. Robert
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:781-7922009-12-01RePEc:oup:biomet
article
A new look at time series of counts
This paper proposes a simple new model for stationary time series of integer counts. Previous work has focused on thinning methods and classical time series autoregressive moving-average difference equations; in contrast, our methods use a renewal process to generate a correlated sequence of Bernoulli trials. By superpositioning independent copies of such processes, stationary series with binomial, Poisson, geometric or any other discrete marginal distribution can be readily constructed. The model class proposed is parsimonious, non-Markov and readily generates series with either short- or long-memory autocovariances. The model can be fitted with linear prediction techniques for stationary series. As an example, a stationary series with binomial marginal distributions is fitted to the number of rainy days in 210 consecutive weeks at Key West, Florida. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
781
792
http://hdl.handle.net/10.1093/biomet/asp057
application/pdf
Access to full text is restricted to subscribers.
Yunwei Cui
Robert Lund
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:917-9322009-12-01RePEc:oup:biomet
article
A unified approach to linearization variance estimation from survey data after imputation for item nonresponse
Variance estimation after imputation is an important practical problem in survey sampling. When deterministic imputation or stochastic imputation is used, we show that the variance of the imputed estimator can be consistently estimated by a unifying linearize and reverse approach. We provide some applications of the approach to regression imputation, fractional categorical imputation, multiple imputation and composite imputation. Results from a simulation study, under a factorial structure for the sampling, response and imputation mechanisms, show that the proposed linearization variance estimator performs well in terms of relative bias, assuming a missing at random response mechanism. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
917
932
http://hdl.handle.net/10.1093/biomet/asp041
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:945-9562009-12-01RePEc:oup:biomet
article
Sliced space-filling designs
We propose an approach to constructing a new type of design, a sliced space-filling design, intended for computer experiments with qualitative and quantitative factors. The approach starts with constructing a Latin hypercube design based on a special orthogonal array for the quantitative factors and then partitions the design into groups corresponding to different level combinations of the qualitative factors. The points in each group have good space-filling properties. Some illustrative examples are given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
945
956
http://hdl.handle.net/10.1093/biomet/asp044
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:991-9972009-12-01RePEc:oup:biomet
article
Semiparametric methods for evaluating risk prediction markers in case-control studies
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
991
997
http://hdl.handle.net/10.1093/biomet/asp040
application/pdf
Access to full text is restricted to subscribers.
Ying Huang
Margaret Sullivan Pepe
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1005-10112009-12-01RePEc:oup:biomet
article
A note on automatic variable selection using smooth-threshold estimating equations
This paper develops smooth-threshold estimating equations that can automatically eliminate irrelevant parameters by setting them as zero. The resulting estimator enjoys the oracle property in the sense of Fan & Li (2001), even in estimators for which the covariance assumption of Wang & Leng (2007) is violated, such as the Buckley--James estimator. Furthermore, the estimator can be obtained without solving a convex optimization problem. A BIC-type criterion for tuning parameter selection is also proposed. It is shown that the criterion achieves consistent model selection. A numerical study confirms the performance of the method. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1005
1011
http://hdl.handle.net/10.1093/biomet/asp060
application/pdf
Access to full text is restricted to subscribers.
Masao Ueki
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:998-10042009-12-01RePEc:oup:biomet
article
A note on the variance of doubly-robust G-estimators
A recursive variance calculation is derived for doubly-robust G-estimators for dynamic treatment regimes in a multi-interval setting. Treatment decision parameters are not assumed to be shared across treatment intervals; this independence of parameters permits sequential estimation of the G-estimators' variance when G-estimation is performed in a sequential fashion. The recursive variance calculation is both natural and computationally feasible. This development can easily be adapted to other complex estimating procedures that require nuisance parameter estimation. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
998
1004
http://hdl.handle.net/10.1093/biomet/asp043
application/pdf
Access to full text is restricted to subscribers.
E. E. M. Moodie
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:861-8722009-12-01RePEc:oup:biomet
article
Nonparametric estimation of the probability of illness in the illness-death model under cross-sectional sampling
Cross-sectional sampling is an attractive design that saves resources but results in biased data. For proper inference, one should first discover the bias function and then weigh observations appropriately. We consider cross-sectioning of the illness-death model with the aim of estimating the probability of visiting the illness state before death. We develop simple consistent and asymptotically normal estimators under various assumptions on the model and data collection and, in particular, compare designs with and without a follow-up. These designs are common in surveillance of hospital acquired infections, but estimators currently in use do not properly correct the bias. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
861
872
http://hdl.handle.net/10.1093/biomet/asp046
application/pdf
Access to full text is restricted to subscribers.
M. Mandel
R. Fluss
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1024-10242009-12-01RePEc:oup:biomet
article
'Generalized method of moments estimation for linear regression with clustered failure time data'
4
2009
96
Biometrika
1024
1024
http://hdl.handle.net/10.1093/biomet/asp061
application/pdf
Access to full text is restricted to subscribers.
Hui Li
Guosheng Yin
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:933-9442009-12-01RePEc:oup:biomet
article
Some design properties of a rejective sampling procedure
Occasionally, a selected probability sample may appear undesirable with respect to the available auxiliary information. In such a situation, the practitioner might consider rejecting the sample and selecting a new set of sample elements. We consider a procedure in which the probability sample is rejected unless the sample mean of an auxiliary vector is within a specified distance of the population mean. It is proven that the large sample mean and variance of the regression estimator for the rejective sample are the same as those of the regression estimator for the original selection procedure. Likewise, the usual estimator of variance for the regression estimator is appropriate for the rejective sample. In a Monte Carlo experiment, the large sample properties hold for relatively small samples and the Monte Carlo results are in agreement with the theoretical orders of approximation. The efficiency effect of the described rejective sampling is o(n_N^{-1}), where n_N is the expected sample size, but the effect can be important for particular samples. For example, rejective sampling can be used to eliminate those samples that give negative weights for the regression estimator. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
933
944
http://hdl.handle.net/10.1093/biomet/asp042
application/pdf
Access to full text is restricted to subscribers.
Wayne A. Fuller
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:793-8042009-12-01RePEc:oup:biomet
article
Bias reduction in exponential family nonlinear models
In Firth (1993, Biometrika) it was shown how the leading term in the asymptotic bias of the maximum likelihood estimator is removed by adjusting the score vector, and that in canonical-link generalized linear models the method is equivalent to maximizing a penalized likelihood that is easily implemented via iterative adjustment of the data. Here a more general family of bias-reducing adjustments is developed for a broad class of univariate and multivariate generalized nonlinear models. The resulting formulae for the adjusted score vector are computationally convenient, and in univariate models they directly suggest implementation through an iterative scheme of data adjustment. For generalized linear models a necessary and sufficient condition is given for the existence of a penalized likelihood interpretation of the method. An illustrative application to the Goodman row-column association model shows how the computational simplicity and statistical benefits of bias reduction extend beyond generalized linear models. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
793
804
http://hdl.handle.net/10.1093/biomet/asp055
application/pdf
Access to full text is restricted to subscribers.
Ioannis Kosmidis
David Firth
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:903-9152009-12-01RePEc:oup:biomet
article
Tests and confidence intervals for secondary endpoints in sequential clinical trials
In a sequential clinical trial whose stopping rule depends on the primary endpoint, inference on secondary endpoints is an important long-standing problem. Ignoring the possibility of early stopping based on the primary endpoint may result in substantial bias. To address this problem, a commonly used approach is to develop bias correction by estimating the bias in the case of bivariate normal outcomes and appealing to joint asymptotic normality of the statistics associated with the primary and secondary endpoints. We propose herein a new approach that uses resampling and a novel ordering scheme in the sample space of sequential statistics observed up to a stopping time. This approach is shown to provide accurate inference in complex clinical trials, including time-sequential trials with survival endpoints and covariates. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
903
915
http://hdl.handle.net/10.1093/biomet/asp063
application/pdf
Access to full text is restricted to subscribers.
Tze Leung Lai
Mei-Chiung Shih
Zheng Su
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1012-10182009-12-01RePEc:oup:biomet
article
A note on adaptive Bonferroni and Holm procedures under dependence
Hochberg & Benjamini (1990) first presented adaptive procedures for controlling familywise error rate. However, until now, it has not been proved that these procedures control the familywise error rate. We introduce a simplified version of Hochberg & Benjamini's adaptive Bonferroni and Holm procedures. Assuming a conditional dependence model, we prove that the former procedure controls the familywise error rate in finite samples while the latter controls it approximately. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1012
1018
http://hdl.handle.net/10.1093/biomet/asp048
application/pdf
Access to full text is restricted to subscribers.
Wenge Guo
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:887-9012009-12-01RePEc:oup:biomet
article
Marginal hazards model for case-cohort studies with multiple disease outcomes
Case-cohort study designs are widely used to reduce the cost of large cohort studies while achieving the same goals, especially when the disease rate is low. A key advantage of the case-cohort study design is its capacity to use the same subcohort for several diseases or for several subtypes of disease. In order to compare the effect of a risk factor on different types of diseases, times to different events need to be modelled simultaneously. Valid statistical methods that take the correlations among the outcomes from the same subject into account need to be developed. To this end, we consider marginal proportional hazards regression models for case-cohort studies with multiple disease outcomes. We also consider generalized case-cohort designs that do not require sampling all the cases, which is more realistic for multiple disease outcomes. We propose an estimating equation approach for parameter estimation with two different types of weights. Consistency and asymptotic normality of the proposed estimators are established. Large sample approximation works well in small samples in simulation studies. The proposed methods are applied to the Busselton Health Study. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
887
901
http://hdl.handle.net/10.1093/biomet/asp059
application/pdf
Access to full text is restricted to subscribers.
S. Kang
J. Cai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:971-9742009-12-01RePEc:oup:biomet
article
Construction of orthogonal Latin hypercube designs
Latin hypercube designs have found wide application. Such designs guarantee uniform samples for the marginal distribution of each input variable. We propose a method for constructing orthogonal Latin hypercube designs in which all the linear terms are orthogonal not only to each other, but also to the quadratic terms. This construction method is convenient and flexible, and the resulting designs can accommodate many more factors than can existing ones. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
971
974
http://hdl.handle.net/10.1093/biomet/asp058
application/pdf
Access to full text is restricted to subscribers.
Fasheng Sun
Min-Qian Liu
Dennis K. J. Lin
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:835-8452009-12-01RePEc:oup:biomet
article
Bayesian lasso regression
The lasso estimate for linear regression corresponds to a posterior mode when independent, double-exponential prior distributions are placed on the regression coefficients. This paper introduces new aspects of the broader Bayesian treatment of lasso regression. A direct characterization of the regression coefficients' posterior distribution is provided, and computation and inference under this characterization is shown to be straightforward. Emphasis is placed on point estimation using the posterior mean, which facilitates prediction of future observations via the posterior predictive distribution. It is shown that the standard lasso prediction method does not necessarily agree with model-based, Bayesian predictions. A new Gibbs sampler for Bayesian lasso regression is introduced. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
835
845
http://hdl.handle.net/10.1093/biomet/asp047
application/pdf
Access to full text is restricted to subscribers.
Chris Hans
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:821-8342009-12-01RePEc:oup:biomet
article
Bayesian analysis of matrix normal graphical models
We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
821
834
http://hdl.handle.net/10.1093/biomet/asp049
application/pdf
Access to full text is restricted to subscribers.
Hao Wang
Mike West
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1019-10232009-12-01RePEc:oup:biomet
article
A note on a conjectured sharpness principle for probabilistic forecasting with calibration
This note proves a weak sharpness principle as conjectured by Gneiting et al. (2007) in connection with probabilistic forecasting subject to calibration constraints. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1019
1023
http://hdl.handle.net/10.1093/biomet/asp054
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:383-3982013-03-04RePEc:oup:biomet
article
Nonparametric additive regression for repeatedly measured data
We develop an easily computed smooth backfitting algorithm for additive model fitting in repeated measures problems. Our methodology easily copes with various settings, such as when some covariates are the same over repeated response measurements. We allow for a working covariance matrix for the regression errors, showing that our method is most efficient when the correct covariance matrix is used. The component functions achieve the known asymptotic variance lower bound for the scalar argument case. Smooth backfitting also leads directly to design-independent biases in the local linear case. Simulations show our estimator has smaller variance than the usual kernel estimator. This is also illustrated by an example from nutritional epidemiology. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
383
398
http://hdl.handle.net/10.1093/biomet/asp015
application/pdf
Access to full text is restricted to subscribers.
Raymond J. Carroll
Arnab Maity
Enno Mammen
Kyusang Yu
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:579-5892013-03-04RePEc:oup:biomet
article
A diagnostic procedure based on local influence
Cook's (1986) normal curvature measure is useful for sensitivity analysis of model assumptions in statistical models. However, there is no rigorous approach based on the normal curvature for addressing two fundamental issues: to assess the extent of discrepancy between an assumed model and the underlying model from which the data are generated, and to identify suspicious data points for which the discrepancy is most evident. Our purpose is to establish a theoretically sound procedure for resolving these issues for case-weight perturbation under the framework of independent distributions. We show that the local influence measure, Cook's distance and likelihood distance are asymptotically equivalent. A diagnostic procedure, based on local influence, is proposed for evaluating model misspecification and for detecting influential points simultaneously. We analyse two real datasets. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
579
589
Hongtu Zhu
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:403-1142013-03-04RePEc:oup:biomet
article
Nonparametric estimation of age-at-onset distributions from censored kin-cohort data
We present a nonparametric estimator of genotype-specific age-at-onset distributions from kin-cohort data. Standard error calculations are derived and the methodology is illustrated through an analysis of the influence of mutations of the Parkin gene on Parkinson's disease. Semiparametric efficiency considerations are briefly discussed. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
403
114
http://hdl.handle.net/10.1093/biomet/asm027
application/pdf
Access to full text is restricted to subscribers.
Yuanjia Wang
Lorraine N. Clark
Karen Marder
Daniel Rabinowitz
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:489-4902013-03-04RePEc:oup:biomet
article
A counterexample to a claim about stochastic simulations
Engen & Lillegård (1997) presented a general method for doing Monte Carlo simulations conditioned on a sufficient statistic. The basic idea was to adjust the parameter values in the corresponding unconditional simulation so that the actual value of the sufficient statistic is obtained, and the claim was that if this adjustment is unique then the modified simulation is from the conditional distribution. Unfortunately the claim is not correct, as shown by a counterexample. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
489
490
Bo Henry Lindqvist
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:147-1612013-03-04RePEc:oup:biomet
article
On least-squares regression with censored data
The semiparametric accelerated failure time model relates the logarithm of the failure time linearly to the covariates while leaving the error distribution unspecified. The present paper describes simple and reliable inference procedures based on the least-squares principle for this model with right-censored data. The proposed estimator of the vector-valued regression parameter is an iterative solution to the Buckley--James estimating equation with a preliminary consistent estimator as the starting value. The estimator is shown to be consistent and asymptotically normal. A novel resampling procedure is developed for the estimation of the limiting covariance matrix. Extensions to marginal models for multivariate failure time data are considered. The performance of the new inference procedures is assessed through simulation studies. Illustrations with medical studies are provided. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
147
161
http://hdl.handle.net/10.1093/biomet/93.1.147
text/html
Access to full text is restricted to subscribers.
Zhezhen Jin
D. Y. Lin
Zhiliang Ying
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:503-5182013-03-04RePEc:oup:biomet
article
Sample size formulae for two-stage randomized trials with survival outcomes
Two-stage randomized trials are growing in importance in developing adaptive treatment strategies, i.e. treatment policies or dynamic treatment regimes. Usually, the first stage involves randomization to one of the several initial treatments. The second stage of treatment begins when an early nonresponse criterion or response criterion is met. In the second stage, nonresponding subjects are re-randomized among second-stage treatments. Sample size calculations for planning these two-stage randomized trials with failure time outcomes are challenging because the variances of common test statistics depend in a complex manner on the joint distribution of time to the early nonresponse criterion or response criterion and the primary failure time outcome. We produce simple, albeit conservative, sample size formulae by using upper bounds on the variances. The resulting formulae only require the working assumptions needed to size a standard single-stage randomized trial and, in common settings, are only mildly conservative. These sample size formulae are based on either a weighted Kaplan--Meier estimator of survival probabilities at a fixed time-point or a weighted version of the log-rank test. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
503
518
http://hdl.handle.net/10.1093/biomet/asr019
application/pdf
Access to full text is restricted to subscribers.
Zhiguo Li
Susan A. Murphy
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:617-6342013-03-04RePEc:oup:biomet
article
Estimation of the failure time distribution in the presence of informative censoring
We present a method for estimating the survival curve of a continuous failure time random variable from right-censored data. Our method allows adjustment for informative censoring due to measured prognostic factors for time-to-event and censoring while simultaneously quantifying the sensitivity of the inference to residual dependence between failure and censoring due to unmeasured factors. We present the results of a simulation study and illustrate our approach using data from the AIDS Clinical Trial Group 175 study. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
617
634
Daniel O. Scharfstein
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:197-2122013-03-04RePEc:oup:biomet
article
Adaptive two-stage test procedures to find the best treatment in clinical trials
A main objective in clinical trials is to find the best treatment in a given finite class of competing treatments and then to show superiority of this treatment against a control treatment. The traditional procedure estimates the best treatment in a first trial. Then in an independent second trial superiority of this treatment, estimated as best in the first trial, is to be shown against the control treatment by a size α test. In this paper we investigate these two trials of this traditional procedure as a two-stage test procedure. Additionally we introduce competing two-stage group-sequential test procedures. Then we derive formulae for the expected number of patients. These formulae depend on unknown parameters. When we have a prior for the unknown parameters we can determine the two-stage test procedure of size α and power β that is optimal, in that it needs a minimal number of observations. The results are illustrated by a numerical example, which indicates the superiority of the group-sequential procedures. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
197
212
http://hdl.handle.net/10.1093/biomet/92.1.197
text/html
Access to full text is restricted to subscribers.
Wolfgang Bischoff
Frank Miller
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:967-9782013-03-04RePEc:oup:biomet
article
Blocking, efficiency and weighted optimality
Optimal blocking is explored for experiments, such as those incorporating one or more controls, where not all treatment comparisons are of equal interest. Weighted optimality functions are employed in gaining both analytic and enumerative results; a catalogue of smaller optimal designs is provided. It is shown how design selection based on functions of variances, and on functions of efficiency factors, are both subsumed by the weighted approach. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
967
978
http://hdl.handle.net/10.1093/biomet/asr042
application/pdf
Access to full text is restricted to subscribers.
Xiaowei Wang
J. P. Morgan
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:847-8582013-03-04RePEc:oup:biomet
article
Estimating equations for spatially correlated data in multi-dimensional space
We use the quasilikelihood concept to propose an estimating equation for spatial data with correlation across the study region in a multi-dimensional space. With appropriate mixing conditions, we develop a central limit theorem for a random field under various L_p metrics. The consistency and asymptotic normality of quasilikelihood estimators can then be derived. We also conduct simulations to evaluate the performance of the proposed estimating equation, and a dataset from East Lansing Woods is used to illustrate the method. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
847
858
http://hdl.handle.net/10.1093/biomet/asn046
application/pdf
Access to full text is restricted to subscribers.
Pei-Sheng Lin
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:289-3022013-03-04RePEc:oup:biomet
article
Optimal blocking of two-level factorial designs
Blocking of two-level factorial designs is considered for block sizes 2 and 4 using the method of fractional partial confounding. A-, D- and E-optimal designs are obtained for block size 2 within the class of orthogonal designs for which main effects and two-factor interactions are all orthogonal to each other before allowing for blocking. A-, D- and E-optimal designs are obtained for block size 4 within the class of orthogonal designs with main effects orthogonal to blocks. The designs obtained also have other favourable properties including orthogonal estimation of effects and orthogonality to superblocks. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
289
302
http://hdl.handle.net/10.1093/biomet/93.2.289
text/html
Access to full text is restricted to subscribers.
Neil A. Butler
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:791-8072013-03-04RePEc:oup:biomet
article
Posterior propriety and computation for the Cox regression model with applications to missing covariates
In this paper, we carry out an in-depth theoretical investigation of Bayesian inference for the Cox regression model. We establish necessary and sufficient conditions for posterior propriety of the regression coefficient, β, in Cox's partial likelihood, which can be obtained as the limiting marginal posterior distribution of β through the specification of a gamma process prior for the cumulative baseline hazard and a uniform improper prior for β. We also examine necessary and sufficient conditions for posterior propriety of the regression coefficients, β, using full likelihood Bayesian approaches in which a gamma process prior is specified for the cumulative baseline hazard. We examine characterisation of posterior propriety under completely observed data settings as well as for settings involving missing covariates. Latent variables are introduced to facilitate a straightforward Gibbs sampling scheme in the Bayesian computation. A real dataset is presented to illustrate the proposed methodology. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
791
807
http://hdl.handle.net/10.1093/biomet/93.4.791
text/html
Access to full text is restricted to subscribers.
Ming-Hui Chen
Joseph G. Ibrahim
Qi-Man Shao
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:437-4492013-03-04RePEc:oup:biomet
article
Nonparametric variance estimation in the analysis of microarray data: a measurement error approach
We investigate the effects of measurement error on the estimation of nonparametric variance functions. We show that either ignoring measurement error or direct application of the simulation extrapolation, SIMEX, method leads to inconsistent estimators. Nevertheless, the direct SIMEX method can reduce bias relative to a naive estimator. We further propose a permutation SIMEX method that leads to consistent estimators in theory. The performance of both the SIMEX methods depends on approximations to the exact extrapolants. Simulations show that both the SIMEX methods perform better than ignoring measurement error. The methodology is illustrated using microarray data from colon cancer patients. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
437
449
http://hdl.handle.net/10.1093/biomet/asn017
application/pdf
Access to full text is restricted to subscribers.
Raymond J. Carroll
Yuedong Wang
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:691-7012013-03-04RePEc:oup:biomet
article
Diagnostic checking for time series models with conditional heteroscedasticity estimated by the least absolute deviation approach
The recent paper by Peng & Yao (2003) gave an interesting extension of least absolute deviation estimation to generalised autoregressive conditional heteroscedasticity, GARCH, time series models. The asymptotic distributions of absolute residual autocorrelations and squared residual autocorrelations from the GARCH model estimated by the least absolute deviation method are derived in this paper. These results lead to two useful diagnostic tools which can be used to check whether or not a GARCH model fitted by using the least absolute deviation method is adequate. Some simulation experiments give further support to the asymptotic theory and a real data example is also reported. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
691
701
http://hdl.handle.net/10.1093/biomet/92.3.691
text/html
Access to full text is restricted to subscribers.
Guodong Li
Wai Keung Li
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:569-5842013-03-04RePEc:oup:biomet
article
Dimension reduction in regression without matrix inversion
Regressions in which the fixed number of predictors p exceeds the number of independent observational units n occur in a variety of scientific fields. Sufficient dimension reduction provides a promising approach to such problems, by restricting attention to d < n linear combinations of the original p predictors. However, standard methods of sufficient dimension reduction require inversion of the sample predictor covariance matrix. We propose a method for estimating the central subspace that eliminates the need for such inversion and is applicable regardless of the (n, p) relationship. Simulations show that our method compares favourably with standard large sample techniques when the latter are applicable. We illustrate our method with a genomics application. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
569
584
http://hdl.handle.net/10.1093/biomet/asm038
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Bing Li
Francesca Chiaromonte
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:187-1992013-03-04RePEc:oup:biomet
article
Dealing with limited overlap in estimation of average treatment effects
Estimation of average treatment effects under unconfounded or ignorable treatment assignment is often hampered by lack of overlap in the covariate distributions between treatment groups. This lack of overlap can lead to imprecise estimates, and can make commonly used estimators sensitive to the choice of specification. In such cases researchers have often used ad hoc methods for trimming the sample. We develop a systematic approach to addressing lack of overlap. We characterize optimal subsamples for which the average treatment effect can be estimated most precisely. Under some conditions, the optimal selection rules depend solely on the propensity score. For a wide range of distributions, a good approximation to the optimal rule is provided by the simple rule of thumb to discard all units with estimated propensity scores outside the range [0.1,0.9]. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
187
199
http://hdl.handle.net/10.1093/biomet/asn055
application/pdf
Access to full text is restricted to subscribers.
Richard K. Crump
V. Joseph Hotz
Guido W. Imbens
Oscar A. Mitnik
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:337-3502013-03-04RePEc:oup:biomet
article
On the identification of path analysis models with one hidden variable
We study criteria for identifiability of path analysis models with one hidden variable. We first derive sufficient criteria for identification of models in which marginalisation is carried out over the hidden variable. The sufficient criteria are based on the structure of the directed acyclic graph associated with the path analysis model and can be derived from the graph. We treat further the identification of models when the hidden variable is conditioned on and establish connections with the extended skew-normal distribution. Finally it is shown that the derived conditions extend the existing graphical criteria for identification. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
337
350
http://hdl.handle.net/10.1093/biomet/92.2.337
text/html
Access to full text is restricted to subscribers.
Elena Stanghellini
Nanny Wermuth
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:991-9942013-03-04RePEc:oup:biomet
article
A note on 'Testing the number of components in a normal mixture'
In a recent paper, Lo et al. (2001) propose a test for the likelihood ratio statistic based on the Kullback--Leibler information criterion when testing the null hypothesis that a random sample is drawn from a mixture of k_0 normal components against the alternative hypothesis of a mixture with k_1 normal components with k_0 < k_1. However, this result requires conditions that are generally not met when the null hypothesis holds. Consequently, the result is not proven and simulations suggest that it may not be correct. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
991
994
Neal O. Jeffries
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:149-1672013-03-04RePEc:oup:biomet
article
Probability estimation for large-margin classifiers
Large margin classifiers have proven to be effective in delivering high predictive accuracy, particularly those focusing on the decision boundaries and bypassing the requirement of estimating the class probability given input for discrimination. As a result, these classifiers may not directly yield an estimated class probability, which is of interest itself. To overcome this difficulty, this article proposes a novel method for estimating the class probability through sequential classifications, by using features of interval estimation of large-margin classifiers. The method uses sequential classifications to bracket the class probability to yield an estimate up to the desired level of accuracy. The method is implemented for support vector machines and ψ-learning, in addition to an estimated Kullback--Leibler loss for tuning. A solution path of the method is derived for support vector machines to reduce further its computational cost. Theoretical and numerical analyses indicate that the method is highly competitive against alternatives, especially when the dimension of the input greatly exceeds the sample size. Finally, an application to leukaemia data is described. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
149
167
http://hdl.handle.net/10.1093/biomet/asm077
application/pdf
Access to full text is restricted to subscribers.
Junhui Wang
Xiaotong Shen
Yufeng Liu
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:411-4262013-03-04RePEc:oup:biomet
article
Non-finite Fisher information and homogeneity: an EM approach
Even simple examples of finite mixture models can fail to fulfil the regularity conditions that are routinely assumed in standard parametric inference problems. Many methods have been investigated for testing for homogeneity in finite mixture models, for example, but all rely on regularity conditions including the finiteness of the Fisher information and the space of the mixing parameter being a compact subset of some Euclidean space. Very simple examples where such assumptions fail include mixtures of two geometric distributions and two exponential distributions, and, more generally, mixture models in scale distribution families. To overcome these difficulties, we propose and study an EM-test statistic, which has a simple limiting distribution for examples in this paper. Simulations show that the EM-test has accurate Type I errors and is more efficient than existing methods when they are applicable. A real example is included. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
411
426
http://hdl.handle.net/10.1093/biomet/asp011
application/pdf
Access to full text is restricted to subscribers.
P. Li
J. Chen
P. Marriott
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:987-9942013-03-04RePEc:oup:biomet
article
Likelihood analysis of the binary instrumental variable model
Instrumental variables are widely used for the identification of the causal effect of one random variable on another under unobserved confounding. The distribution of the observable variables for a discrete instrumental variable model satisfies certain inequalities but no conditional independence relations. Such models are usually tested by checking whether the relative frequency estimators of the parameters satisfy the constraints. This ignores sampling uncertainty in the data. Using the observable constraints for the instrumental variable model, a likelihood analysis is conducted. A significance test for its validity is developed, and a bootstrap algorithm for computing confidence intervals for the causal effect is proposed. Applications are given to illustrate the advantage of the suggested approach. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
987
994
http://hdl.handle.net/10.1093/biomet/asr040
application/pdf
Access to full text is restricted to subscribers.
R. R. Ramsahai
S. L. Lauritzen
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:461-4702013-03-04RePEc:oup:biomet
article
Efficient Robbins--Monro procedure for binary data
The Robbins--Monro procedure does not perform well in the estimation of extreme quantiles, because the procedure is implemented using asymptotic results, which are not suitable for binary data. Here we propose a modification of the Robbins--Monro procedure and derive the optimal procedure for binary data under some reasonable approximations. The improvement obtained by using the optimal procedure for the estimation of extreme quantiles is substantial. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
461
470
V. Roshan Joseph
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:813-8292013-03-04RePEc:oup:biomet
article
Testing the covariance structure of multivariate random fields
There is an increasing wealth of multivariate spatial and multivariate spatio-temporal data appearing. For such data, an important part of model building is an assessment of the properties of the underlying covariance function describing variable, spatial and temporal correlations. In this paper, we propose a methodology to evaluate the appropriateness of several types of common assumptions on multivariate covariance functions in the spatio-temporal context. The methodology is based on the asymptotic joint normality of the sample space-time cross-covariance estimators. Specifically, we address the assumptions of symmetry, separability and linear models of coregionalization. We conduct simulation experiments to evaluate the sizes and powers of our tests and illustrate our methodology on a trivariate spatio-temporal dataset of pollutants over California. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
813
829
http://hdl.handle.net/10.1093/biomet/asn053
application/pdf
Access to full text is restricted to subscribers.
Bo Li
Marc G. Genton
Michael Sherman
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:655-6682013-03-04RePEc:oup:biomet
article
Spherical regression
Methods are introduced for regressing points on the surface of one sphere on points on another. Complex variables and stereographic projection are used to deal with theoretical problems of directional statistics much as they have been used historically to deal with problems in non-Euclidean geometry. The complex plane harbours the group of Möbius transformations, and stereographic projection is used as a bridge to map these Möbius transforms to regression link functions on the surface of a unit sphere. A special form for these links is introduced which employs the complex plane and stereographic projection to effect angular scale changes on the sphere. The family of special forms is closed under orthogonal transformations of the dependent variable and Möbius transformations of the independent variable, and incorporates independence and proper and improper rotations as special cases. Parameter estimation and inference are exemplified using the von Mises--Fisher spherical distribution and vectorcardiogram data. All statistical results and calculations have been formulated in the real domain. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
655
668
T. D. Downs
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:211-2182013-03-04RePEc:oup:biomet
article
Contiguity of the Whittle measure for a Gaussian time series
For a stationary time series, Whittle constructed a likelihood for the spectral density based on the approximate independence of the discrete Fourier transforms of the data at certain frequencies. Whittle's likelihood has been widely used in the literature for constructing estimators. In this paper, we show that, for a Gaussian time series, the Whittle measure is mutually contiguous with the actual distribution of the data. As a consequence, most asymptotic properties of estimators and test statistics derived under the Whittle measure can be carried over to the actual distribution. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
211
218
Nidhan Choudhuri
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:647-6662013-03-04RePEc:oup:biomet
article
The accelerated gap times model
This paper develops a new semiparametric model for the effect of covariates on the conditional intensity of a recurrent event counting process. The model is a transparent extension of the accelerated failure time model for univariate survival data. Estimation of the regression parameter is motivated by semiparametric efficiency considerations, extending the class of weighted log-rank estimating functions originally proposed in Prentice (1978) and subsequently studied in detail by Tsiatis (1990) and Ritov (1990). A novel rank-based one-step estimator for the regression parameter is proposed. An Aalen-type estimator for the baseline intensity function is obtained. Asymptotics are handled with empirical process methods, and finite sample properties are studied via simulation. Finally, the new model is applied to the bladder tumour data of Byar (1980). Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
647
666
http://hdl.handle.net/10.1093/biomet/92.3.647
text/html
Access to full text is restricted to subscribers.
Robert L. Strawderman
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:735-7462013-03-04RePEc:oup:biomet
article
Conditionally specified continuous distributions
A distribution is conditionally specified when its model constraints are expressed conditionally. For example, Besag's (1974) spatial model was specified conditioned on the neighbouring states, and pseudolikelihood is intended to approximate the likelihood using conditional likelihoods. There are three issues of interest: existence, uniqueness and computation of a joint distribution. In the literature, most results and proofs are for discrete probabilities; here we exclusively study distributions with continuous state space. We examine all three issues using the dependence functions derived from decomposition of the conditional densities. We show that certain dependence functions of the joint density are shared with its conditional densities. Therefore, two conditional densities involving the same set of variables are compatible if their overlapping dependence functions are identical. We prove that the joint density is unique when the set of dependence functions is both compatible and complete. In addition, a joint density, apart from a constant, can be computed from the dependence functions in closed form. Since all of the results are expressed in terms of dependence functions, we consider our approach to be dependence-based, whereas methods in the literature are generally density-based. Applications of the dependence-based formulation are discussed. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
735
746
http://hdl.handle.net/10.1093/biomet/asn029
application/pdf
Access to full text is restricted to subscribers.
Yuchung J. Wang
Edward H. Ip
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:977-9842013-03-04RePEc:oup:biomet
article
Miscellanea Kernel-Type Density Estimation on the Unit Interval
We consider kernel-type methods for the estimation of a density on [0,1] which eschew explicit boundary correction. We propose using kernels that are symmetric in their two arguments; these kernels are conditional densities of bivariate copulas. We give asymptotic theory for the version of the new estimator using Gaussian copula kernels and report on simulation comparisons of it with the beta-kernel density estimator of Chen ([1]). We also provide automatic bandwidth selection in the form of 'rule-of-thumb' bandwidths for both estimators. As well as its competitive integrated squared error performance, advantages of the new approach include its greater range of possible values at 0 and 1, the fact that it is a bona fide density and that the individual kernels and resulting estimator are comprehensible in terms of a single simple picture. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
977
984
http://hdl.handle.net/10.1093/biomet/asm068
application/pdf
Access to full text is restricted to subscribers.
M.C. Jones
D.A. Henderson
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:633-6462013-03-04RePEc:oup:biomet
article
Bayesian adaptive designs for clinical trials
A Bayesian adaptive design is proposed for a comparative two-armed clinical trial using decision-theoretic approaches. A loss function is specified, based on the cost for each patient and the costs of making incorrect decisions at the end of a trial. At each interim analysis, the decision to terminate or to continue the trial is based on the expected loss function while concurrently incorporating efficacy, futility and cost. The maximum number of interim analyses is determined adaptively by the observed data. We derive explicit connections between the loss function and the frequentist error rates, so that the desired frequentist properties can be maintained for regulatory settings. The operating characteristics of the design can be evaluated on frequentist grounds. Extensive simulations are carried out to compare the proposed design with existing ones. The design is general enough to accommodate both continuous and discrete types of data. We illustrate the methods with an animal study evaluating a medical treatment for cardiac arrest. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
633
646
http://hdl.handle.net/10.1093/biomet/92.3.633
text/html
Access to full text is restricted to subscribers.
Yi Cheng
Yu Shen
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:1-162013-03-04RePEc:oup:biomet
article
Studentization and deriving accurate p-values
We have a statistic for assessing an observed data point relative to a statistical model but find that its distribution function depends on the parameter. To obtain the corresponding p-value, we require the minimally modified statistic that is ancillary; this process is called Studentization. We use recent likelihood theory to develop a maximal third-order ancillary; this gives immediately a candidate Studentized statistic. We show that the corresponding p-value is higher-order uniform(0, 1), is equivalent to a repeated bootstrap version of the initial statistic and agrees with a special Bayesian modification of the original statistic. More importantly, the modified statistic and p-value are available by Markov chain Monte Carlo simulations and, in some cases, by higher-order approximation methods. Examples, including the Behrens--Fisher problem, are given to indicate the ease and flexibility of the approach. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
1
16
http://hdl.handle.net/10.1093/biomet/asm093
application/pdf
Access to full text is restricted to subscribers.
D.A.S. Fraser
Judith Rousseau
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:679-6902013-03-04RePEc:oup:biomet
article
Orthogonal bases approach for comparing nonnormal continuous distributions
We present an orthonormal bases approach for detecting general differences among continuous distributions. An unknown density function is represented by a finite vector of its estimated Fourier coefficients with respect to a suitable orthonormal basis. For a wide class of orthonormal bases, we establish asymptotic normality of the vector of estimated Fourier coefficients and propose an unbiased and consistent estimator of its asymptotic covariance matrix. Fourier coefficients are modelled as functions of fixed and possibly random effects. This approach allows simultaneous detection of distributional differences attributable to various factors in clustered and correlated data with sufficiently large numbers of observations per cluster with the same fixed and random effects realisations. This work was motivated by multi-level clustered non-Gaussian datasets from genetic studies. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
679
690
http://hdl.handle.net/10.1093/biomet/92.3.679
text/html
Access to full text is restricted to subscribers.
Inna Chervoneva
Boris Iglewicz
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:401-4092013-03-04RePEc:oup:biomet
article
A type of restricted maximum likelihood estimator of variance components in generalised linear mixed models
The maximum likelihood estimator of the variance components in a linear model can be biased downwards. Restricted maximum likelihood (REML) corrects this problem by using the likelihood of a set of residual contrasts and is generally considered superior. However, this original restricted maximum likelihood definition does not directly extend beyond linear models. We propose a REML-type estimator for generalised linear mixed models by correcting the bias in the profile score function of the variance components. The proposed estimator has the same consistency properties as the maximum likelihood estimator if the number of parameters in the mean and variance components models remains fixed. However, the estimator of the variance components has a smaller finite sample bias. A simulation study with a logistic mixed model shows that the proposed estimator is effective in correcting the downward bias in the maximum likelihood estimator. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
401
409
J. G. Liao
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:943-9542013-03-04RePEc:oup:biomet
article
Statistical inference based on non-smooth estimating functions
When the estimating function for a vector of parameters is not smooth, it is often rather difficult, if not impossible, to obtain a consistent estimator by solving the corresponding estimating equation using standard numerical techniques. In this paper, we propose a simple inference procedure via the importance sampling technique, which provides a consistent root of the estimating equation and also an approximation to its distribution without solving any equations or involving nonparametric function estimates. The new proposal is illustrated and evaluated via two extensive examples with real and simulated datasets. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
943
954
http://hdl.handle.net/10.1093/biomet/91.4.943
text/html
Access to full text is restricted to subscribers.
L. Tian
J. Liu
Y. Zhao
L. J. Wei
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:907-9172013-03-04RePEc:oup:biomet
article
On the asymptotics of marginal regression splines with longitudinal data
There have been studies on how the asymptotic efficiency of a nonparametric function estimator depends on the handling of the within-cluster correlation when nonparametric regression models are used on longitudinal or cluster data. In particular, methods based on smoothing splines and local polynomial kernels exhibit different behaviour. We show that the generalized estimation equations based on weighted least squares regression splines for the nonparametric function have an interesting property: the asymptotic bias of the estimator does not depend on the working correlation matrix, but the asymptotic variance, and therefore the mean squared error, is minimized when the true correlation structure is specified. This property of the asymptotic bias distinguishes regression splines from smoothing splines. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
907
917
http://hdl.handle.net/10.1093/biomet/asn041
application/pdf
Access to full text is restricted to subscribers.
Zhongyi Zhu
Wing K. Fung
Xuming He
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:289-3022013-03-04RePEc:oup:biomet
article
Fully Bayesian spline smoothing and intrinsic autoregressive priors
There is a well-known Bayesian interpretation for function estimation by spline smoothing using a limit of proper normal priors. The limiting prior and the conditional and intrinsic autoregressive priors popular for spatial modelling have a common form, which we call partially informative normal. We derive necessary and sufficient conditions for the propriety of the posterior for this class of partially informative normal priors with noninformative priors on the variance components, a condition crucial for successful implementation of the Gibbs sampler. The results apply for fully Bayesian smoothing splines, thin-plate splines and L-splines, as well as models using intrinsic autoregressive priors. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
289
302
Paul L. Speckman
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:409-4232013-03-04RePEc:oup:biomet
article
Principal Hessian Directions for regression with measurement error
We consider a nonlinear regression problem with predictors with measurement error. We assume that the response is related to unknown linear combinations of a p-dimensional predictor vector through an unknown link function. Instead of observing the predictors, we observe a surrogate vector with the property that its expectation is linearly related to the predictor vector with constant variance. We use an important linear transformation of the surrogates. Based on the transformed variables, we develop the modified Principal Hessian Directions method for estimating the subspace of the effective dimension-reduction space. We derive the asymptotic variances of the modified Principal Hessian Directions estimators. Several examples are reported and comparisons are made with the sliced inverse regression method of Carroll & Li (1992). Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
409
423
Heng-Hui Lue
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:773-7782013-03-04RePEc:oup:biomet
article
A note on conditional AIC for linear mixed-effects models
The conventional model selection criterion, the Akaike information criterion, AIC, has been applied to choose candidate models in mixed-effects models by the consideration of marginal likelihood. Vaida & Blanchard (2005) demonstrated that such a marginal AIC and its small sample correction are inappropriate when the research focus is on clusters. Correspondingly, these authors suggested the use of conditional AIC. Their conditional AIC is derived under the assumption that the variance-covariance matrix or scaled variance-covariance matrix of random effects is known. This note provides a general conditional AIC but without these strong assumptions. Simulation studies show that the proposed method is promising. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
773
778
http://hdl.handle.net/10.1093/biomet/asn023
application/pdf
Access to full text is restricted to subscribers.
Hua Liang
Hulin Wu
Guohua Zou
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:221-2272013-03-04RePEc:oup:biomet
article
A note on time-reversibility of multivariate linear processes
We derive some readily verifiable necessary and sufficient conditions for a multivariate non-Gaussian linear process to be time-reversible, under two sets of conditions on the contemporaneous dependence structure of the innovations. One set of conditions concerns the case of independent-component innovations, in which case a multivariate non-Gaussian linear process is time-reversible if and only if the coefficients consist of essentially asymmetric columns with column-specific origins of symmetry or symmetric pairs of columns with pair-specific origins of symmetry. On the other hand, for dependent-component innovations plus other regularity conditions, a multivariate non-Gaussian linear process is time-reversible if and only if the coefficients are essentially symmetric about some origin. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
221
227
http://hdl.handle.net/10.1093/biomet/93.1.221
text/html
Access to full text is restricted to subscribers.
Kung-Sik Chan
Lop-Hing Ho
Howell Tong
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:913-9222013-03-04RePEc:oup:biomet
article
Testing the proportional odds model under random censoring
In practical applications, it is not uncommon for the hazard functions of two groups to converge with time. One approach that allows for converging hazard functions is the proportional odds model. We develop a procedure for testing the proportional odds assumption when the available data consist of two independent random samples of randomly right-censored lifetimes. Asymptotic normality of the test statistic is proved and the procedure is applied to two well-known datasets. The effective significance level and power of the proposed test are assessed through a simulation study. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
913
922
Jean-Yves Dauxois
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:279-2942013-03-04RePEc:oup:biomet
article
On weighted Hochberg procedures
We consider different ways of constructing weighted Hochberg-type step-up multiple test procedures including closed procedures based on weighted Simes tests and their conservative step-up short-cuts, and step-up counterparts of two weighted Holm procedures. It is shown that the step-up counterparts have some serious pitfalls such as lack of familywise error rate control and lack of monotonicity in rejection decisions in terms of p-values. Therefore an exact closed procedure appears to be the best alternative, its only drawback being lack of simple stepwise structure. A conservative step-up short-cut to the closed procedure may be used instead, but with accompanying loss of power. Simulations are used to study the familywise error rate and power properties of the competing procedures for independent and correlated p-values. Although many of the results of this paper are negative, they are useful in highlighting the need for caution when procedures with similar pitfalls may be used. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
279
294
http://hdl.handle.net/10.1093/biomet/asn018
application/pdf
Access to full text is restricted to subscribers.
Ajit C. Tamhane
Lingyun Liu
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:743-7502013-03-04RePEc:oup:biomet
article
Nonparametric confidence intervals for receiver operating characteristic curves
We study methods for constructing confidence intervals and confidence bands for estimators of receiver operating characteristics. Particular emphasis is placed on the way in which smoothing should be implemented, when estimating either the characteristic itself or its variance. We show that substantial undersmoothing is necessary if coverage properties are not to be impaired. A theoretical analysis of the problem suggests an empirical, plug-in rule for bandwidth choice, optimising the coverage accuracy of interval estimators. The performance of this approach is explored. Our preferred technique is based on asymptotic approximation, rather than a more sophisticated approach using the bootstrap, since the latter requires a multiplicity of smoothing parameters all of which must be chosen in nonstandard ways. It is shown that the asymptotic method can give very good performance. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
743
750
Peter Hall
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:197-2062013-03-04RePEc:oup:biomet
article
Range of correlation matrices for dependent Bernoulli random variables
We say that a pair (p, R) is compatible if there exists a multivariate binary distribution with mean vector p and correlation matrix R. In this paper we study necessary and sufficient conditions for compatibility for structured and unstructured correlation matrices. We give examples of correlation matrices that are incompatible with any p. Using our results we show that the parametric binary models of Emrich & Piedmonte (1991) and Qaqish (2003) allow a good range of correlations between the binary variables. We also obtain necessary and sufficient conditions for a matrix of odds ratios to be compatible with a given p. Our findings support the popular belief that the odds ratios are less constrained and more flexible than the correlations. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
197
206
http://hdl.handle.net/10.1093/biomet/93.1.197
text/html
Access to full text is restricted to subscribers.
N. Rao Chaganty
Harry Joe
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:989-9952013-03-04RePEc:oup:biomet
article
Studies in the history of probability and statistics XLIX On the Matern correlation family
Handcock & Stein (1993) introduced the Matern family of spatial correlations into statistics as a flexible parametric class with one parameter determining the smoothness of the paths of the underlying spatial field. We document the varied history of this family, which includes contributions by eminent physical scientists and statisticians. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
989
995
http://hdl.handle.net/10.1093/biomet/93.4.989
text/html
Access to full text is restricted to subscribers.
Peter Guttorp
Tilmann Gneiting
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:385-3972013-03-04RePEc:oup:biomet
article
Some nonregular designs from the Nordstrom–Robinson code and their statistical properties
The Nordstrom–Robinson code is a well-known nonlinear code in coding theory. This paper explores the statistical properties of this nonlinear code. Many nonregular designs with 32, 64, 128 and 256 runs and 7–16 factors are derived from it. It is shown that these nonregular designs are better than regular designs of the same size in terms of resolution, aberration and projectivity. Furthermore, many of these nonregular designs are shown to have generalised minimum aberration among all possible designs. Seven orthogonal arrays are shown to have unique word-length pattern and four of them are shown to be unique up to isomorphism. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
385
397
http://hdl.handle.net/10.1093/biomet/92.2.385
text/html
Access to full text is restricted to subscribers.
Hongquan Xu
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:845-8582013-03-04RePEc:oup:biomet
article
Adjusted profile estimating function
In settings where the full probability model is not specified, consider a general estimating function g(θ, λ; y) that involves not only the parameters of interest, θ, but also some nuisance parameters, λ. We consider methods for reducing the effects on g of fitting nuisance parameters. We propose a Cox–Reid-type adjustment to the profile estimating function, g(θ, λ̂_θ; y), that reduces its bias by two orders. Typically, only the first two moments of the response variable are needed to form the adjustment. Important applications of this method include the estimation of the pairwise association and main effects in stratified, clustered data and estimation of the main effects in a matched pair study. A brief simulation study shows that the proposed method considerably reduces the impact of the nuisance parameters. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
845
858
Molin Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:807-8202013-03-04RePEc:oup:biomet
article
Sparse estimation of a covariance matrix
We suggest a method for estimating a covariance matrix on the basis of a sample of vectors drawn from a multivariate normal distribution. In particular, we penalize the likelihood with a lasso penalty on the entries of the covariance matrix. This penalty plays two important roles: it reduces the effective number of parameters, which is important even when the dimension of the vectors is smaller than the sample size since the number of parameters grows quadratically in the number of variables, and it produces an estimate which is sparse. In contrast to sparse inverse covariance estimation, our method's close relative, the sparsity attained here is in the covariance matrix itself rather than in the inverse matrix. Zeros in the covariance matrix correspond to marginal independencies; thus, our method performs model selection while providing a positive definite estimate of the covariance. The proposed penalized maximum likelihood problem is not convex, so we use a majorize-minimize approach in which we iteratively solve convex approximations to the original nonconvex problem. We discuss tuning parameter selection and demonstrate on a flow-cytometry dataset how our method produces an interpretable graphical display of the relationship between variables. We perform simulations that suggest that simple elementwise thresholding of the empirical covariance matrix is competitive with our method for identifying the sparsity structure. Additionally, we show how our method can be used to solve a previously studied special case in which a desired sparsity pattern is prespecified. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
807
820
http://hdl.handle.net/10.1093/biomet/asr054
application/pdf
Access to full text is restricted to subscribers.
Jacob Bien
Robert J. Tibshirani
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:379-3922013-03-04RePEc:oup:biomet
article
Discriminant analysis through a semiparametric model
We consider a semiparametric generalisation of normal-theory discriminant analysis. The semiparametric model assumes that, after unspecified univariate monotone transformations, the class distributions are multivariate normal. We introduce an estimation procedure based on the distribution quantiles, in which the parameters of the semiparametric model are estimated directly without estimating the nonparametric transformations. The procedure is computationally fast and the estimation accuracy is shown to have the usual parametric rate. The relationship between the method and more general nonparametric discriminant analysis is discussed. The semiparametric specification of the class densities is a submodel of the nonparametric log density functional analysis of variance model in which the main effects are completely nonparametric but the interaction terms are specified semiparametrically. Simulations and real examples are used to illustrate the procedure. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
379
392
Y. Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:801-8182013-03-04RePEc:oup:biomet
article
Additive hazards model with multivariate failure time data
Marginal additive hazards models are considered for multivariate survival data in which individuals may experience events of several types and there may also be correlation between individuals. Estimators are proposed for the parameters of such models and for the baseline hazard functions. The estimators of the regression coefficients are shown asymptotically to follow a multivariate normal distribution with a sandwich-type covariance matrix that can be consistently estimated. The estimated baseline and subject-specific cumulative hazard processes are shown to converge weakly to a zero-mean Gaussian random field. The weak convergence properties for the corresponding survival processes are established. A resampling technique is proposed for constructing simultaneous confidence bands for the survival curve of a specific subject. The methodology is extended to a multivariate version of a class of partly parametric additive hazards models. Simulation studies are conducted to assess finite sample properties, and the method is illustrated with an application to development of coronary heart diseases and cardiovascular accidents in the Framingham Heart Study. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
801
818
http://hdl.handle.net/10.1093/biomet/91.4.801
text/html
Access to full text is restricted to subscribers.
Guosheng Yin
Jianwen Cai
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:239-2442013-03-04RePEc:oup:biomet
article
On modelling mean-covariance structures in longitudinal studies
We exploit a reparameterisation of the marginal covariance matrix arising in longitudinal studies (Pourahmadi, 1999, 2000) to model, jointly, the mean and covariance structures in terms of three polynomial functions of time. By reanalysing Kenward's (1987) cattle data, we compare model selection procedures based on regressogram estimation with those based on a global search of the model space. Using a BIC-based model selection criterion to identify the optimum degree triple of the three polynomials, we show that the use of a saturated mean model is not optimal and explain why regressogram-based model estimation may be misleading. We also suggest a new computational method for finding the global optimum based on a criterion involving three pairwise saturated profile likelihoods. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
239
244
Jianxin Pan
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:251-2702013-03-04RePEc:oup:biomet
article
Marginal likelihood, conditional likelihood and empirical likelihood: Connections and applications
Marginal likelihood and conditional likelihood are often used for eliminating nuisance parameters. For a parametric model, it is well known that the full likelihood can be decomposed into the product of a conditional likelihood and a marginal likelihood. This property is less transparent in a nonparametric or semiparametric likelihood setting. In this paper we show that this nice parametric likelihood property can be carried over to the empirical likelihood world. We discuss applications in case-control studies, genetical linkage analysis, genetical quantitative traits analysis, tuberculosis infection data and unordered-paired data, all of which can be treated as semiparametric finite mixture models. We consider the estimation problem in detail in the simplest case of unordered-paired data where we can only observe the minimum and maximum values of two random variables; the identities of the minimum and maximum values are lost. The profile empirical likelihood approach is used for maximum semiparametric likelihood estimation. We present some large-sample results along with a simulation study. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
251
270
http://hdl.handle.net/10.1093/biomet/92.2.251
text/html
Access to full text is restricted to subscribers.
Jing Qin
Biao Zhang
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:491-4962013-03-04RePEc:oup:biomet
article
Nonparametric detection of correlated errors
In regression problems it is hard to detect correlated errors since the errors are not observed. In this paper, a nonparametric method is proposed for the detection of correlated errors when the design points are equally spaced. It turns out that the first-order sample autocovariance of the residuals from the kernel regression estimates provides essential information about correlated errors and its bootstrap is quite effective in implementing such information. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
491
496
Tae Yoon Kim
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:659-6682013-03-04RePEc:oup:biomet
article
Semiparametric analysis of transformation models with censored data
A unified estimation procedure is proposed for the analysis of censored data using linear transformation models, which include the proportional hazards model and the proportional odds model as special cases. This procedure is easily implemented numerically and its validity does not rely on the assumption of independence between the covariates and the censoring variable. The estimator is the same as the Cox partial likelihood estimator in the case of the proportional hazards model. Moreover, the asymptotic variance of the proposed estimator has a closed form and its variance estimator is easily obtained by plug-in rules. The method is illustrated by simulation and is applied to the Veterans' Administration lung cancer data. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
659
668
Kani Chen
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:801-8202013-03-04RePEc:oup:biomet
article
Nonparametric maximum likelihood estimation of the structural mean of a sample of curves
A random sample of curves can usually be thought of as noisy realisations of a compound stochastic process X(t) = Z{W(t)}, where Z(t) produces random amplitude variation and W(t) produces random dynamic or phase variation. In most applications it is more important to estimate the so-called structural mean μ(t) = E{Z(t)} than the cross-sectional mean E{X(t)}, but this estimation problem is difficult because the process Z(t) is not directly observable. In this paper we propose a nonparametric maximum likelihood estimator of μ(t). This estimator is shown to be √n-consistent and asymptotically normal under the assumed model and robust to model misspecification. Simulations and a real-data example show that the proposed estimator is competitive with landmark registration, often considered the benchmark, and has the advantage of avoiding time-consuming and often infeasible individual landmark identification. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
801
820
http://hdl.handle.net/10.1093/biomet/92.4.801
text/html
Access to full text is restricted to subscribers.
Daniel Gervini
Theo Gasser
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:777-7902013-03-04RePEc:oup:biomet
article
Nonparametric k-sample tests with panel count data
We study the nonparametric k-sample test problem with panel count data. The asymptotic normality of a smooth functional of the nonparametric maximum pseudo-likelihood estimator (Wellner & Zhang, 2000) is established under some mild conditions. We construct a class of easy-to-implement nonparametric tests for comparing mean functions of k populations based on this asymptotic normality. We conduct various simulations to validate and compare the tests. The simulations show that the tests perform quite well and generally have good power to detect differences among the mean functions. The method is illustrated with a real-life example. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
777
790
http://hdl.handle.net/10.1093/biomet/93.4.777
text/html
Access to full text is restricted to subscribers.
Ying Zhang
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:723-7332013-03-04RePEc:oup:biomet
article
Empirical-type likelihoods allowing posterior credible sets with frequentist validity: Higher-order asymptotics
With reference to a general class of empirical-type likelihoods, we develop higher-order asymptotics for the frequentist coverage of Bayesian credible sets based on posterior quantiles and highest posterior density. These asymptotics, in turn, characterise members of the class that allow approximate frequentist validity of such sets. It is seen that the usual empirical likelihood does not enjoy this property up to the order of approximation considered here. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
723
733
http://hdl.handle.net/10.1093/biomet/93.3.723
text/html
Access to full text is restricted to subscribers.
Kai-Tai Fang
Rahul Mukerjee
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:613-6282013-03-04RePEc:oup:biomet
article
Large-sample properties of the periodogram estimator of seasonally persistent processes
Seasonally persistent models were first introduced by Andel (1986) and Gray et al. (1989) to extend autoregressive moving-average and fractionally differenced models and to encompass long-memory quasi-periodic behaviour. These models are, for certain ranges of parameters, stationary, and we prove here that the behaviour of the periodogram and other tapered estimators cannot be simply extended from the work of Kunsch (1986) and Hurvich & Beltrao (1993) on long memory induced by a pole at the origin. We demonstrate that potentially large bias, both positive and negative, can arise from the same value of the long-memory parameter, and that the new distribution can be easily written down in the case of Gaussian processes. We also consider using both the cosine taper and the sine taper. The extended least squares estimator is also considered in this context. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
613
628
Sofia C. Olhede
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:183-1962013-03-04RePEc:oup:biomet
article
Models and inference for uncertainty in extremal dependence
Conventionally, modelling of multivariate extremes has been based on the class of multivariate extreme value distributions. More recently, other classes have been developed, allowing for the possibility that, whilst dependence is observed at finite levels, the limit distribution is independent. A number of articles have shown this development to be important for accurate estimation of the extremal properties, both of theoretical processes and observed datasets. It has also been shown that, so far as dependence is concerned, the choice between modelling with either asymptotically dependent or asymptotically independent distributions can be far more influential than model choice within either of these two classes. In this paper we explore the issue of modelling across both classes, examining in particular the effect of uncertainty caused by lack of knowledge about the status of asymptotic dependence. This is achieved by new multivariate models whose parameter spaces are such that asymptotic dependence occurs on a boundary. Standard techniques in Bayesian inference, implemented through Markov chain Monte Carlo, enable inferences to be drawn that assign posterior probability mass to the boundary region. The techniques are illustrated on a set of oceanographic data for which previous analyses have shown that it is difficult to resolve the question of asymptotic dependence status, which is however important in model extrapolation. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
183
196
Stuart Coles
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:975-9862013-03-04RePEc:oup:biomet
article
Multivariate distributions with support above the diagonal
A general family of distributions for the empirical modelling of ordered multivariate data is proposed. The family is based on, but greatly extends, the joint distribution of order statistics from an independent and identically distributed univariate sample. General properties, including marginal and conditional distributions, bivariate dependence, limiting distributions and links to the Dirichlet distribution are described. Univariate and bivariate special cases of the multivariate distributions, the latter including an equivalent rotated version, are considered. Two particular tractable special cases are stressed. The models are successfully and usefully fitted, by maximum likelihood, to meteorological data. The models are also applicable to data in which one variable is unconstrained and the others are all nonnegative. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
975
986
http://hdl.handle.net/10.1093/biomet/91.4.975
text/html
Access to full text is restricted to subscribers.
M. C. Jones
P. V. Larsen
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:899-9122013-03-04RePEc:oup:biomet
article
Martingale difference residuals as a diagnostic tool for the Cox model
The proportional hazards model makes two major assumptions: the hazard ratio is constant over time, and the relationship between the hazard and continuous covariates is log-linear. Methods exist for checking and relaxing each of these assumptions, but in both cases the methods rely on the other assumption being true. Problems can occur if neither of the assumptions is appropriate, or even if only one of the assumptions is appropriate but it is not known which. We propose a new kind of residual for checking the two assumptions simultaneously. The smoothed residuals provide a flexible estimate of the hazard ratio, which may deviate from the standard proportional hazards model by having a time-dependent hazard ratio, transformed covariates or both. The methods are illustrated using data from the Medical Research Council's myeloma trials. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
899
912
Peter D. Sasieni
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:729-7372013-03-04RePEc:oup:biomet
article
A note on pseudolikelihood constructed from marginal densities
For likelihood-based inference involving distributions in which high-dimensional dependencies are present it may be useful to use approximate likelihoods based, for example, on the univariate or bivariate marginal distributions. The asymptotic properties of formal maximum likelihood estimators in such cases are outlined. In particular, applications in which only a single q × 1 vector of observations is observed are examined. Conditions under which consistent estimators of parameters result from the approximate likelihood using only pairwise joint distributions are studied. Some examples are analysed in detail. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
729
737
D. R. Cox
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:299-3142013-03-04RePEc:oup:biomet
article
Modelling multivariate failure time associations in the presence of a competing risk
There has been much research on analysing multivariate failure times, but little that has accommodated failures that arise in the presence of a competing failure process. This paper studies the problem of describing associations among times to such failures. It proposes a modified conditional hazard ratio measure of association that is tailored to competing risks data, develops frailty models and a nonparametric method for describing the proposed measure, and contrasts estimation by proposed methods with the 'standard' of treating competing risks as independently censoring failure times due to targeted causes. The methods are investigated on simulated and real data. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
299
314
Karen Bandeen-Roche
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:785-8062013-03-04RePEc:oup:biomet
article
Bayesian model discrimination for multiple strata capture-recapture data
Extending the work of Dupuis (1995), we motivate a range of biologically plausible models for multiple-site capture-recapture and show how the original Gibbs sampling algorithm of Dupuis can be extended to obtain posterior model probabilities using reversible jump Markov chain Monte Carlo. This model selection procedure improves upon previous analyses in two distinct ways. First, Bayesian model averaging provides a robust parameter estimation technique which properly incorporates model uncertainty in the resulting intervals. Secondly, by discriminating among perhaps millions of competing models, we are able to discern fine structure within the data and thereby answer questions of primary biological importance. We demonstrate how reversible jump Markov chain Monte Carlo methods provide the only viable method for exploring model spaces of this size. We examine the lizard data discussed in Dupuis (1995) and show that most of the posterior mass is placed upon models not previously considered for these data. We discuss model discrimination and model averaging and focus upon the increased scientific understanding of the data obtained via the Bayesian model comparison procedure. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
785
806
R. King
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:371-3822013-03-04RePEc:oup:biomet
article
Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve
Recent scientific and technological innovations have produced an abundance of potential markers that are being investigated for their use in disease screening and diagnosis. In evaluating these markers, it is often necessary to account for covariates associated with the marker of interest. Covariates may include subject characteristics, expertise of the test operator, test procedures or aspects of specimen handling. In this paper, we propose the covariate-adjusted receiver operating characteristic curve, a measure of covariate-adjusted classification accuracy. Nonparametric and semiparametric estimators are proposed, asymptotic distribution theory is provided and finite sample performance is investigated. For illustration we characterize the age-adjusted discriminatory accuracy of prostate-specific antigen as a biomarker for prostate cancer. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
371
382
http://hdl.handle.net/10.1093/biomet/asp002
application/pdf
Access to full text is restricted to subscribers.
Holly Janes
Margaret S. Pepe
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:875-8892013-03-04RePEc:oup:biomet
article
Pairwise curve synchronization for functional data
Data collected by scientists are increasingly in the form of trajectories or curves. Often these can be viewed as realizations of a composite process driven by both amplitude and time variation. We consider the situation in which functional variation is dominated by time variation, and develop a curve-synchronization method that uses every trajectory in the sample as a reference to obtain pairwise warping functions in the first step. These initial pairwise warping functions are then used to create improved estimators of the underlying individual warping functions in the second step. A truncated averaging process is used to obtain robust estimation of individual warping functions. The method compares well with other available time-synchronization approaches and is illustrated with Berkeley growth data and gene expression data for multiple sclerosis. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
875
889
http://hdl.handle.net/10.1093/biomet/asn047
application/pdf
Access to full text is restricted to subscribers.
Rong Tang
Hans-Georg Müller
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:967-9752013-03-04RePEc:oup:biomet
article
Least absolute deviations estimation for ARCH and GARCH models
Hall & Yao (2003) showed that, for ARCH/GARCH, i.e. autoregressive conditional heteroscedastic/generalised autoregressive conditional heteroscedastic, models with heavy-tailed errors, the conventional maximum quasilikelihood estimator suffers from complex limit distributions and slow convergence rates. In this paper three types of absolute deviations estimator have been examined, and the one based on logarithmic transformation turns out to be particularly appealing. We have shown that this estimator is asymptotically normal and unbiased. Furthermore it enjoys the standard convergence rate of n^{1/2} regardless of whether the errors are heavy-tailed or not. Simulation lends further support to our theoretical results. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
967
975
Liang Peng
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:303-3172013-03-04RePEc:oup:biomet
article
Bayesian methods for partial stochastic orderings
We discuss two methods of making nonparametric Bayesian inference on probability measures subject to a partial stochastic ordering. The first method involves a nonparametric prior for a measure on partially ordered latent observations, and the second involves rejection sampling. Computational approaches are discussed for each method, and interpretations of prior and posterior information are discussed. An application is presented in which inference is made on the number of independently segregating quantitative trait loci present in an animal population. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
303
317
Peter D. Hoff
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:851-8602013-03-04RePEc:oup:biomet
article
A practical affine equivariant multivariate median
A robust affine equivariant estimator of location for multivariate data is proposed which becomes the univariate median for data of dimension one. The estimator is robust in the sense that it has a bounded influence function and a positive breakdown value, and it has high efficiency compared to the sample mean for heavy-tailed distributions. Perhaps its greatest strength is that, unlike other affine equivariant multivariate medians, it is easily computed for data in any practical dimension. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
851
860
Thomas P. Hettmansperger
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:335-3492013-03-04RePEc:oup:biomet
article
Multi-parameter automodels and their applications
Motivated by the modelling of non-Gaussian data or positively correlated data on a lattice, extensions of Besag's automodels to exponential families with multi-dimensional parameters have been proposed recently. We provide a multiple-parameter analogue of Besag's one-dimensional result that gives the necessary form of the exponential families for the Markov random field's conditional distributions. We propose estimation of parameters by maximum pseudolikelihood and give a proof of the consistency of the estimators for the multi-parameter automodel. The methodology is illustrated with examples, in particular the building of a cooperative system with beta conditional distributions. We also indicate future applications of these models to the analysis of mixed-state spatial data. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
335
349
http://hdl.handle.net/10.1093/biomet/asn016
application/pdf
Access to full text is restricted to subscribers.
Cécile Hardouin
Jian-Feng Yao
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:251-2672013-03-04RePEc:oup:biomet
article
Decomposability and selection of graphical models for multivariate time series
We derive conditions for decomposition and collapsibility of graphical interaction models for multivariate time series. These properties enable us to perform stepwise model selection under certain restrictions. For illustration, we apply the results to a multivariate time series describing the haemodynamic system as monitored in intensive care. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
251
267
Roland Fried
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:317-3352013-03-04RePEc:oup:biomet
article
A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models
A centred Gaussian model that is Markov with respect to an undirected graph G is characterised by the parameter set of its precision matrices which is the cone M^+(G) of positive definite matrices with entries corresponding to the missing edges of G constrained to be equal to zero. In a Bayesian framework, the conjugate family for the precision parameter is the distribution with Wishart density with respect to the Lebesgue measure restricted to M^+(G). We call this distribution the G-Wishart. When G is nondecomposable, the normalising constant of the G-Wishart cannot be computed in closed form. In this paper, we give a simple Monte Carlo method for computing this normalising constant. The main feature of our method is that the sampling distribution is exact and consists of a product of independent univariate standard normal and chi-squared distributions that can be read off the graph G. Computing this normalising constant is necessary for obtaining the posterior distribution of G or the marginal likelihood of the corresponding graphical Gaussian model. Our method also gives a way of sampling from the posterior distribution of the precision matrix. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
317
335
http://hdl.handle.net/10.1093/biomet/92.2.317
text/html
Access to full text is restricted to subscribers.
Aliye Atay-Kayis
Hélène Massam
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:691-7032013-03-04RePEc:oup:biomet
article
Adaptive Lasso for Cox's proportional hazards model
We investigate the variable selection problem for Cox's proportional hazards model, and propose a unified model selection and estimation procedure with desired theoretical properties and computational convenience. The new method is based on a penalized log partial likelihood with the adaptively weighted L1 penalty on regression coefficients, providing what we call the adaptive Lasso estimator. The method incorporates different penalties for different coefficients: unimportant variables receive larger penalties than important ones, so that important variables tend to be retained in the selection process, whereas unimportant variables are more likely to be dropped. Theoretical properties, such as consistency and rate of convergence of the estimator, are studied. We also show that, with proper choice of regularization parameters, the proposed estimator has the oracle properties. The convex optimization nature of the method leads to an efficient algorithm. Both simulated and real examples show that the method performs competitively. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
691
703
http://hdl.handle.net/10.1093/biomet/asm037
application/pdf
Access to full text is restricted to subscribers.
Hao Helen Zhang
Wenbin Lu
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:573-5862013-03-04RePEc:oup:biomet
article
Differential effects and generic biases in observational studies
There are two treatments, each of which may be applied or withheld, yielding a 2 x 2 factorial arrangement with three degrees of freedom between groups. The differential effect of the two treatments is the effect of applying one treatment in lieu of the other. In randomised experiments, the differential effect is of no more or less interest than other treatment contrasts. Differential effects play a special role in certain observational studies in which treatments are not assigned to subjects at random, where differing outcomes may reflect biased assignments rather than effects caused by the treatments. Differential effects are immune to certain types of unobserved bias, called generic biases, which are associated with both treatments in a similar way. This is explored using several examples and models. Differential effects are not immune to differential biases, whose possible consequences are examined by sensitivity analysis. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
573
586
http://hdl.handle.net/10.1093/biomet/93.3.573
text/html
Access to full text is restricted to subscribers.
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:385-3972013-03-04RePEc:oup:biomet
article
Fitting binary regression models with case-augmented samples
In a case-augmented study, measurements on a random sample from a population are augmented by information from an independent sample of cases, that is units with some characteristic of interest. We show that inferences about the effect of the covariates on the probability of being a case can be made by fitting a modified prospective likelihood. We also show that this procedure is fully efficient. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
385
397
http://hdl.handle.net/10.1093/biomet/93.2.385
text/html
Access to full text is restricted to subscribers.
A. J. Lee
A. J. Scott
C. J. Wild
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:149-1622013-03-04RePEc:oup:biomet
article
Bayesian nonparametric functional data analysis through density estimation
In many modern experimental settings, observations are obtained in the form of functions and interest focuses on inferences about a collection of such functions. We propose a hierarchical model that allows us simultaneously to estimate multiple curves nonparametrically by using dependent Dirichlet process mixtures of Gaussian distributions to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of conductivity and temperature at depth data in the north Atlantic. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
149
162
http://hdl.handle.net/10.1093/biomet/asn054
application/pdf
Access to full text is restricted to subscribers.
Abel Rodríguez
David B. Dunson
Alan E. Gelfand
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:859-8792013-03-04RePEc:oup:biomet
article
A hybrid estimator in nonlinear and generalised linear mixed effects models
A hybrid method that combines Laplace's approximation and Monte Carlo simulations to evaluate integrals in the likelihood function is proposed for estimation of the parameters in nonlinear mixed effects models that assume a normal parametric family for the random effects. Simulations show that these parametric estimates of fixed effects are close to the nonparametric estimates even though the mixing distribution is far from the assumed normal parametric family. An asymptotic theory of this hybrid method for parametric estimation without requiring the true mixing distribution to belong to the assumed parametric family is developed to explain these results. This hybrid method and its asymptotic theory are also extended to generalised linear mixed effects models. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
859
879
Tze Leung Lai
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:229-2332013-03-04RePEc:oup:biomet
article
An examination of the effect of heterogeneity on the estimation of population size using capture-recapture data
Part of the folklore of capture-recapture experiments is that ignoring heterogeneity of capture probabilities results in a downward bias. This has been based on experience and simulation studies but is often interpreted as being due to individuals with lower capture probabilities. Here estimating equation arguments are used to show that the effect on Horvitz–Thompson-type estimators of ignoring heterogeneity in capture-recapture experiments is to introduce a downward bias. The arguments are extended to continuous-time experiments and to an influence function constructed to determine the effect of a small number of individuals with heterogeneous capture probabilities in an otherwise homogeneous population and the influence function is shown to be negative. The downward bias holds even if the small number of heterogeneous individuals have capture probabilities larger than the homogeneous majority, and this is confirmed by simulations. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
229
233
http://hdl.handle.net/10.1093/biomet/92.1.229
text/html
Access to full text is restricted to subscribers.
Wen-Han Hwang
Richard Huggins
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:809-8252013-03-04RePEc:oup:biomet
article
Generalized Spatial Dirichlet Process Models
Many models for the study of point-referenced data explicitly introduce spatial random effects to capture residual spatial association. These spatial effects are customarily modelled as a zero-mean stationary Gaussian process. The spatial Dirichlet process introduced by Gelfand et al. (2005) produces a random spatial process which is neither Gaussian nor stationary. Rather, it varies about a process that is assumed to be stationary and Gaussian. The spatial Dirichlet process arises as a probability-weighted collection of random surfaces. This can be limiting for modelling and inferential purposes since it insists that a process realization must be one of these surfaces. We introduce a random distribution for the spatial effects that allows different surface selection at different sites. Moreover, we can specify the model so that the marginal distribution of the effect at each site still comes from a Dirichlet process. The development is offered constructively, providing a multivariate extension of the stick-breaking representation of the weights. We then introduce mixing using this generalized spatial Dirichlet process. We illustrate with a simulated dataset of independent replications and note that we can embed the generalized process within a dynamic model specification to eliminate the independence assumption. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
809
825
http://hdl.handle.net/10.1093/biomet/asm071
application/pdf
Access to full text is restricted to subscribers.
Jason A. Duan
Michele Guindani
Alan E. Gelfand
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:83-932013-03-04RePEc:oup:biomet
article
Optimal two-level regular fractional factorial block and split-plot designs
We propose a general and unified approach to the selection of regular fractional factorial designs, which can be applied to experiments that are unblocked, blocked or have a split-plot structure. Our criterion is derived as a good surrogate for the model-robustness criterion of information capacity. In the case of random block effects, it takes the ratio of intra- and interblock variances into account. In most of the cases we have examined, there exist designs that are optimal for all values of that ratio. Examples of optimal designs that depend on the ratio are provided. We also demonstrate that our criterion can further discriminate designs that cannot be distinguished by the existing minimum-aberration criteria. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
83
93
http://hdl.handle.net/10.1093/biomet/asn066
application/pdf
Access to full text is restricted to subscribers.
Ching-Shui Cheng
Pi-Wen Tsai
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:387-4022013-03-04RePEc:oup:biomet
article
Estimating a treatment effect with repeated measurements accounting for varying effectiveness duration
To assess treatment efficacy in clinical trials, certain clinical outcomes are repeatedly measured over time for the same subject. The difference in their means may characterize a treatment effect. Since treatment effectiveness lag and saturation times may exist, erosion of treatment effect often occurs during the observation period. Instead of using models based on ad hoc parametric or purely nonparametric time-varying coefficients, we model the treatment effectiveness durations, which are the time intervals between the lag and saturation times. Then we use some mean response models to include such treatment effectiveness durations. Our methodology is demonstrated by simulations and analysis of a landmark HIV/AIDS clinical trial of short-course nevirapine against mother-to-child HIV vertical transmission during labour and delivery. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
387
402
http://hdl.handle.net/10.1093/biomet/asm019
application/pdf
Access to full text is restricted to subscribers.
Y. Q. Chen
J. Yang
S. Cheng
J. B. Jackson
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:381-3972013-03-04RePEc:oup:biomet
article
Simultaneous confidence bands in spectral density estimation
We propose a method for the construction of simultaneous confidence bands for a smoothed version of the spectral density of a Gaussian process based on nonparametric kernel estimators obtained by smoothing the periodogram. A studentized statistic is used to determine the width of the band at each frequency and a frequency-domain bootstrap approach is employed to estimate the distribution of the supremum of this statistic over all frequencies. We prove by means of strong approximations that the bootstrap estimates consistently the distribution of the supremum deviation of interest and, consequently, that the proposed confidence bands achieve asymptotically the desired simultaneous coverage probability. The behaviour of our method in finite-sample situations is investigated by simulations and a real-life data example demonstrates its applicability in time series analysis. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
381
397
http://hdl.handle.net/10.1093/biomet/asn005
application/pdf
Access to full text is restricted to subscribers.
Michael H. Neumann
Efstathios Paparoditis
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:787-8072013-03-04RePEc:oup:biomet
article
Population-Based Reversible Jump Markov Chain Monte Carlo
We present an extension of population-based Markov chain Monte Carlo to the transdimensional case. A major challenge is that of simulating from high- and transdimensional target measures. In such cases, Markov chain Monte Carlo methods may not adequately traverse the support of the target; the simulation results will be unreliable. We develop population methods to deal with such problems, and give a result proving the uniform ergodicity of these population algorithms, under mild assumptions. This result is used to demonstrate the superiority, in terms of convergence rate, of a population transition kernel over a reversible jump sampler for a Bayesian variable selection problem. We also give an example of a population algorithm for a Bayesian multivariate mixture model with an unknown number of components. This is applied to gene expression data of 1000 data points in six dimensions and it is demonstrated that our algorithm outperforms some competing Markov chain samplers. In this example, we show how to combine the methods of parallel chains (Geyer, 1991), tempering (Geyer & Thompson, 1995), snooker algorithms (Gilks et al., 1994), constrained sampling and delayed rejection (Green & Mira, 2001). Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
787
807
http://hdl.handle.net/10.1093/biomet/asm069
application/pdf
Access to full text is restricted to subscribers.
Ajay Jasra
David A. Stephens
Christopher C. Holmes
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:953-9662013-03-04RePEc:oup:biomet
article
Inverse probability weighting for clustered nonresponse
Correlated nonresponse within clusters arises in certain survey settings. It is often represented by a random effects model and assumed to be cluster-specific nonignorable, in the sense that survey and nonresponse outcomes are conditionally independent given cluster-level random effects. Two basic forms of inverse probability weights are considered: response propensity weights based on a marginal model, and weights based on predicted random effects. It is shown that both approaches can lead to biased estimation under cluster-specific nonignorable nonresponse, when the cluster sample sizes are small. We propose a new form of weighted estimator based upon conditional logistic regression, which can avoid this bias. An associated estimator of variance and an extension to observational studies with clustered treatment assignment are also described. Properties of the alternative estimators are illustrated in a small simulation study. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
953
966
http://hdl.handle.net/10.1093/biomet/asr058
application/pdf
Access to full text is restricted to subscribers.
C. J. Skinner
D'arrigo
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:363-3822013-03-04RePEc:oup:biomet
article
Estimating vaccine efficacy from small outbreaks
Let C-sub-V and C-sub-0 denote the number of cases among vaccinated and unvaccinated individuals, respectively, and let υ be the proportion of individuals vaccinated. The quantity � = 1 − (1 − υ)C-sub-V/(υC-sub-0) = 1 − (relative attack rate) is the most used estimator of the effectiveness of a vaccine to protect against infection. For a wide class of vaccine responses, a family of transmission models and three types of community settings, this paper investigates what � actually estimates. It does so under the assumption that the community is large and the vaccination coverage is adequate to prevent major outbreaks of the infectious disease, so that only data on minor outbreaks are available. For a community of homogeneous individuals who mix uniformly, it is found that � estimates a quantity with the interpretation of 1 − (mean susceptibility, per contact, of vaccinees relative to unvaccinated individuals). We provide a standard error for � in this setting. For a community with some heterogeneity � can be a very misleading estimator of the effectiveness of the vaccine. When individuals have inherent differences, � estimates a quantity that depends also on the inherent susceptibilities of different types of individual and on the vaccination coverage for different types. For a community of households, � estimates a quantity that depends on the rate of transmission within households and on the reduction in infectivity induced by the vaccine. In communities that are structured, into households or age-groups, it is possible that � estimates a value that is negative even when the vaccine reduces both susceptibility and infectivity. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
363
382
Niels G. Becker
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:157-1692013-03-04RePEc:oup:biomet
article
Random effects Cox models: A Poisson modelling approach
We propose a Poisson modelling approach to nested random effects Cox proportional hazards models. An important feature of this approach is that the principal results depend only on the first and second moments of the unobserved random effects. The orthodox best linear unbiased predictor approach to random effects Poisson modelling techniques enables us to justify appropriate consistency and optimality. The explicit expressions for the random effects given by our approach facilitate incorporation of a relatively large number of random effects. The use of the proposed methods is illustrated through the reanalysis of data from a large-scale cohort study of particulate air pollution and mortality previously reported by Pope et al. (1995). Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
157
169
Renjun Ma
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:319-3262013-03-04RePEc:oup:biomet
article
Bayesian empirical likelihood
Research has shown that empirical likelihood tests have many of the same asymptotic properties as those derived from parametric likelihoods. This leads naturally to the possibility of using empirical likelihood as the basis for Bayesian inference. Different ways in which this goal might be accomplished are considered. The validity of the resultant posterior inferences is examined, as are frequentist properties of the Bayesian empirical likelihood intervals. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
319
326
Nicole A. Lazar
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:583-5982013-03-04RePEc:oup:biomet
article
Functional mixed effects spectral analysis
In many experiments, time series data can be collected from multiple units and multiple time series segments can be collected from the same unit. This article introduces a mixed effects Cramér spectral representation which can be used to model the effects of design covariates on the second-order power spectrum while accounting for potential correlations among the time series segments collected from the same unit. The transfer function is composed of a deterministic component to account for the population-average effects and a random component to account for the unit-specific deviations. The resulting log-spectrum has a functional mixed effects representation where both the fixed effects and random effects are functions in the frequency domain. It is shown that, when the replicate-specific spectra are smooth, the log-periodograms converge to a functional mixed effects model. A data-driven iterative estimation procedure is offered for the periodic smoothing spline estimation of the fixed effects, penalized estimation of the functional covariance of the random effects, and unit-specific random effects prediction via the best linear unbiased predictor. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
583
598
http://hdl.handle.net/10.1093/biomet/asr032
application/pdf
Access to full text is restricted to subscribers.
Robert T. Krafty
Martica Hall
Wensheng Guo
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:419-4342013-03-04RePEc:oup:biomet
article
Hierarchical models for assessing variability among functions
In many applications of functional data analysis, summarising functional variation based on fits, without taking account of the estimation process, runs the risk of attributing the estimation variation to the functional variation, thereby overstating the latter. For example, the first eigenvalue of a sample covariance matrix computed from estimated functions may be biased upwards. We display a set of estimated neuronal Poisson-process intensity functions where this bias is substantial, and we discuss two methods for accounting for estimation variation. One method uses a random-coefficient model, which requires all functions to be fitted with the same basis functions. An alternative method removes the same-basis restriction by means of a hierarchical Gaussian process model. In a small simulation study the hierarchical Gaussian process model outperformed the random-coefficient model and greatly reduced the bias in the estimated first eigenvalue that would result from ignoring estimation variability. For the neuronal data the hierarchical Gaussian process estimate of the first eigenvalue was much smaller than the naive estimate that ignored variability due to function estimation. The neuronal setting also illustrates the benefit of incorporating alignment parameters into the hierarchical scheme. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
419
434
http://hdl.handle.net/10.1093/biomet/92.2.419
text/html
Access to full text is restricted to subscribers.
Sam Behseta
Robert E. Kass
Garrick L. Wallstrom
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:497-5052013-03-04RePEc:oup:biomet
article
Posterior probability intervals in Bayesian wavelet estimation
We use saddlepoint approximation to derive credible intervals for Bayesian wavelet regression estimates. Simulations show that the resulting intervals perform better than the best existing method. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
497
505
C. Semadeni
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:831-8402013-03-04RePEc:oup:biomet
article
Estimation in a simple random effects model with nonnormal distributions
A simple structural model is considered involving the addition of two random variables representing between- and within-group variation. Methods for estimating the cumulants of the two components of variation are proposed, based on homogeneous polynomials in the data. Emphasis is placed on situations in which the number of observations per group is quite small. In some cases an essentially unique estimator is available, whereas in others there is a family of possible consistent estimators. The choice of the polynomial is considered. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
831
840
D. R. Cox
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:285-2962013-03-04RePEc:oup:biomet
article
Marginal tests with sliced average variance estimation
We present a new computationally feasible test for the dimension of the central subspace in a regression problem based on sliced average variance estimation. We also provide a marginal coordinate test. Under the null hypothesis, both the test of dimension and the marginal coordinate test involve test statistics that asymptotically have chi-squared distributions given normally distributed predictors, and have a distribution that is a linear combination of chi-squared distributions in general. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
285
296
http://hdl.handle.net/10.1093/biomet/asm021
application/pdf
Access to full text is restricted to subscribers.
Yongwu Shao
R. Dennis Cook
Sanford Weisberg
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:719-7232013-03-04RePEc:oup:biomet
article
Probabilistic model for two dependent circular variables
Motivated by problems in molecular biology and molecular physics, we propose a five-parameter torus analogue of the bivariate normal distribution for modelling the distribution of two circular random variables. The conditional distributions of the proposed distribution are von Mises. The marginal distributions are symmetric around their means and are either unimodal or bimodal. The type of shape depends on the configuration of parameters, and we derive the conditions that ensure a specific shape. The utility of the proposed distribution is illustrated by the modelling of angular variables in a short linear peptide. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
719
723
Harshinder Singh
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:741-7462013-03-04RePEc:oup:biomet
article
Robust variance estimation for rate ratio parameter estimates from individually matched case-control data
The asymptotic variance and robust variance estimators of rate ratios estimated using conditional logistic regression from individually-matched case-control data are derived when the presumed proportional hazards model is misspecified. The robust variance estimators are easily computed using Schoenfeld residuals generated from standard partial likelihood estimation software for failure time data. Simulation studies indicate that the robust variance estimators perform well for typical sizes and that the 'rare disease' version should be adequate for all practical purposes. It was also found that model misspecification must be quite extreme before the model-based, i.e. inverse information, variance is significantly biased and that the robust variance estimators are somewhat more variable than the model-based. We conclude that the model-based variance estimator can be used when model misspecification is not severe. The robust estimator should be used when the presumed model clearly fits the data poorly. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
741
746
Anny Hui Xiang
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:197-2102013-03-04RePEc:oup:biomet
article
Spectral methods for nonstationary spatial processes
We propose a nonstationary periodogram and various parametric approaches for estimating the spectral density of a nonstationary spatial process. We also study the asymptotic properties of the proposed estimators via shrinking asymptotics, assuming the distance between neighbouring observations tends to zero as the size of the observation region grows without bound. With this type of asymptotic model we can uniquely determine the spectral density, avoiding the aliasing problem. We also present a new class of nonstationary processes, based on a convolution of local stationary processes. This model has the advantage that the model is simultaneously defined everywhere, unlike 'moving window' approaches, but it retains the attractive property that, locally in small regions, it behaves like a stationary spatial process. Applications include the spatial analysis and modelling of air pollution data provided by the US Environmental Protection Agency. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
197
210
Montserrat Fuentes
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:539-5522013-03-04RePEc:oup:biomet
article
A sequential particle filter method for static models
Particle filter methods are complex inference procedures, which combine importance sampling and Monte Carlo schemes in order to explore consistently a sequence of multiple distributions of interest. We show that such methods can also offer an efficient estimation tool in 'static' set-ups, in which case π(θ | y-sub-1, …, y-sub-N) (n
3
2002
89
August
Biometrika
539
552
Nicolas Chopin
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:567-5762013-03-04RePEc:oup:biomet
article
On the geometry of measurement error models
The problem of undertaking inference in the classical linear model when the covariates have been measured with error is investigated from a geometric point of view. Under the assumption that the measurement error is small, relative to the total variation in the data, a new model is proposed which has good inferential properties. An inference technique which exploits the geometric structure is shown to be computationally simple, efficient and robust to measurement error. The method proposed is illustrated by simulation studies. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
567
576
Paul Marriott
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:271-2822013-03-04RePEc:oup:biomet
article
Empirical-likelihood-based semiparametric inference for the treatment effect in the two-sample problem with censoring
To compare two samples of censored data, we propose a unified method of semiparametric inference for the parameter of interest when the model for one sample is parametric and that for the other is nonparametric. The parameter of interest may represent, for example, a comparison of means, or survival probabilities. The confidence interval derived from the semiparametric inference, which is based on the empirical likelihood principle, improves its counterpart constructed from the common estimating equation. The empirical likelihood ratio is shown to be asymptotically chi-squared. Simulation experiments illustrate that the method based on the empirical likelihood substantially outperforms the method based on the estimating equation. A real dataset is analysed. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
271
282
http://hdl.handle.net/10.1093/biomet/92.2.271
text/html
Access to full text is restricted to subscribers.
Yong Zhou
Hua Liang
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:715-7272013-03-04RePEc:oup:biomet
article
A superiority-equivalence approach to one-sided tests on multiple endpoints in clinical trials
This paper considers the problem of comparing a new treatment with a control based on multiple endpoints. The hypotheses are formulated with the goal of showing that the treatment is equivalent, i.e. not inferior, on all endpoints and superior on at least one endpoint compared to the control, where thresholds for equivalence and superiority are specified for each endpoint. Roy's (1953) union-intersection and Berger's (1982) intersection-union principles are employed to derive the basic test. It is shown that the critical constants required for the union-intersection test of superiority can be sharpened by a careful analysis of its type I error rate. The composite UI-IU test is illustrated by an example and compared in a simulation study to alternative tests proposed by Bloch et al. (2001) and Perlman & Wu (2004). The Bloch et al. test does not control the type I error rate because of its nonmonotone nature, and is hence not recommended. The UI-IU and the Perlman & Wu tests both control the type I error rate, but the latter test generally has a slightly higher power. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
715
727
Ajit C. Tamhane
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:737-7462013-03-04RePEc:oup:biomet
article
The Stein–James estimator for short- and long-memory Gaussian processes
We investigate the mean squared error of the Stein–James estimator for the mean when the observations are generated from a Gaussian vector stationary process with dimension greater than two. First, assuming that the process is short-memory, we evaluate the mean squared error, and compare it with that for the sample mean. Then a sufficient condition for the Stein–James estimator to improve upon the sample mean is given in terms of the spectral density matrix around the origin. We repeat the analysis for Gaussian vector long-memory processes. Numerical examples clearly illuminate the Stein–James phenomenon for dependent samples. The results have the potential to improve the usual trend estimator in time series regression models. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
737
746
http://hdl.handle.net/10.1093/biomet/92.3.737
text/html
Access to full text is restricted to subscribers.
Masanobu Taniguchi
Junichi Hirukawa
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:367-3782013-03-04RePEc:oup:biomet
article
On the inefficiency of the adaptive design for monitoring clinical trials
Adaptive designs, which allow the sample size to be modified based on sequentially computed observed treatment differences, have been advocated recently for monitoring clinical trials. Although such methods have a great deal of appeal on the surface, we show that such methods are inefficient and that one can improve uniformly on such adaptive designs using standard group-sequential tests based on the sequentially computed likelihood ratio test statistic. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
367
378
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:351-3702013-03-04RePEc:oup:biomet
article
Conditional Akaike information for mixed-effects models
This paper focuses on the Akaike information criterion, AIC, for linear mixed-effects models in the analysis of clustered data. We make the distinction between questions regarding the population and questions regarding the particular clusters in the data. We show that the AIC in current use is not appropriate for the focus on clusters, and we propose instead the conditional Akaike information and its corresponding criterion, the conditional AIC, cAIC. The penalty term in cAIC is related to the effective degrees of freedom ρ for a linear mixed model proposed by Hodges & Sargent (2001); ρ reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects. The cAIC is defined for both maximum likelihood and residual maximum likelihood estimation. A pharmacokinetics data application is used to illuminate the distinction between the two inference settings, and to illustrate the use of the conditional AIC in model selection. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
351
370
http://hdl.handle.net/10.1093/biomet/92.2.351
text/html
Access to full text is restricted to subscribers.
Florin Vaida
Suzette Blanchard
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:389-3992013-03-04RePEc:oup:biomet
article
Analysing longitudinal count data with overdispersion
In many biomedical studies, longitudinal count data comprise repeated responses and a set of multidimensional covariates for a large number of individuals. When the response variable in such models is subject to overdispersion, the overdispersion parameter influences the marginal variance. In such cases, the overdispersion parameter plays a significant role in efficient estimation of the regression parameters. This raises the need for joint estimation of the regression parameters and the overdispersion parameter, the longitudinal correlations being nuisance parameters. In this paper, we develop a generalised estimating equations approach based on a general autocorrelation structure for the repeated overdispersed data. The asymptotic properties of the estimators of the main parameters are discussed, and the estimation methodology is illustrated by analysing data on epileptic seizure counts. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
389
399
Vandna Jowaheer
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:663-6842013-03-04RePEc:oup:biomet
article
Efficient restricted estimators for conditional mean models with missing data
Consider a conditional mean model with missing data on the response or explanatory variables due to two-phase sampling or nonresponse. Robins et al. (1994) introduced a class of augmented inverse-probability-weighted estimators, depending on a vector of functions of explanatory variables and a vector of functions of coarsened data. Tsiatis (2006) studied two classes of restricted estimators, class 1 with both vectors restricted to finite-dimensional linear subspaces and class 2 with the first vector of functions restricted to a finite-dimensional linear subspace. We introduce a third class of restricted estimators, class 3, with the second vector of functions restricted to a finite-dimensional subspace. We derive a new estimator, which is asymptotically optimal in class 1, by the methods of nonparametric and empirical likelihood. We propose a hybrid strategy to obtain estimators that are asymptotically optimal in class 1 and locally optimal in class 2 or class 3. The advantages of the hybrid, likelihood estimator based on classes 1 and 3 are shown in a simulation study and a real-data example. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
663
684
http://hdl.handle.net/10.1093/biomet/asr007
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:601-6112013-03-04RePEc:oup:biomet
article
On recovering a population covariance matrix in the presence of selection bias
This paper considers the problem of using observational data in the presence of selection bias to identify causal effects in the framework of linear structural equation models. We propose a criterion for testing whether or not observed statistical dependencies among variables are generated by conditioning on a common response variable. When the answer is affirmative, we further provide formulations for recovering the covariance matrix of the whole population from that of the selected population. The results of this paper provide guidance for reliable causal inference, based on the recovered covariance matrix obtained from the statistical information with selection bias. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
601
611
http://hdl.handle.net/10.1093/biomet/93.3.601
text/html
Access to full text is restricted to subscribers.
Manabu Kuroki
Zhihong Cai
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:861-8722013-03-04RePEc:oup:biomet
article
Aalen Additive Hazards Change-Point Model
We study a test comparing the full Aalen additive hazards model and the change-point model, and suggest how to estimate the parameters of the change-point model. We also study a test for no change-point effect. Both tests are provided with large sample properties and a resampling method is applied to obtain p-values. The finite-sample properties of the proposed inference procedures and estimators are assessed through a simulation study. The methods are further applied to a dataset concerning myocardial infarction. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
861
872
http://hdl.handle.net/10.1093/biomet/asm054
application/pdf
Access to full text is restricted to subscribers.
Torben Martinussen
Thomas H. Scheike
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:257-2632013-03-04RePEc:oup:biomet
article
Asymptotic inference for a nonstationary double AR(1) model
We investigate the nonstationary double AR(1) model, y_t = φ y_{t−1} + η_t √(ω + α y_{t−1}²), where ω > 0, α > 0, the η_t are independent standard normal random variables and E log |φ + η_t √α| ⩾ 0. We show that the maximum likelihood estimator of (φ, α) is consistent and asymptotically normal. Combination of this result with that in Ling ([11]) for the stationary case gives the asymptotic normality of the maximum likelihood estimator of φ for any φ in the real line, with a root-n rate of convergence. This is in contrast to the results for the classical AR(1) model, corresponding to α = 0. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
257
263
http://hdl.handle.net/10.1093/biomet/asm084
application/pdf
Access to full text is restricted to subscribers.
Shiqing Ling
Dong Li
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:760-7662013-03-04RePEc:oup:biomet
article
The high-dimension, low-sample-size geometric representation holds under mild conditions
High-dimension, low-sample-size datasets have different geometrical properties from those of traditional low-dimensional data. In their asymptotic study regarding increasing dimensionality with a fixed sample size, Hall et al. (2005) showed that each data vector is approximately located on the vertices of a regular simplex in a high-dimensional space. A perhaps unappealing aspect of their result is the underlying assumption which requires the variables, viewed as a time series, to be almost independent. We establish an equivalent geometric representation under much milder conditions using asymptotic properties of sample covariance matrices. We discuss implications of the results, such as the use of principal component analysis in a high-dimensional space, extension to the case of nonindependent samples and also the binary classification problem. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
760
766
http://hdl.handle.net/10.1093/biomet/asm050
application/pdf
Access to full text is restricted to subscribers.
Jeongyoun Ahn
J. S. Marron
Keith M. Muller
Yueh-Yun Chi
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:241-2472013-03-04RePEc:oup:biomet
article
A note on path-based variable selection in the penalized proportional hazards model
We propose an efficient and adaptive shrinkage method for variable selection in the Cox model. The method constructs a piecewise-linear regularization path connecting the maximum partial likelihood estimator and the origin. Then a model is selected along the path. We show that the constructed path is adaptive in the sense that, with a proper choice of regularization parameter, the fitted model works as well as if the true underlying submodel were given in advance. A modified algorithm of the least-angle-regression type efficiently computes the entire regularization path of the new estimator. Furthermore, we show that, with a proper choice of shrinkage parameter, the method is consistent in variable selection and efficient in estimation. Simulation shows that the new method tends to outperform the lasso and the smoothly-clipped-absolute-deviation estimators with moderate samples. We apply the methodology to data concerning nursing homes. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
241
247
http://hdl.handle.net/10.1093/biomet/asm083
application/pdf
Access to full text is restricted to subscribers.
Hui Zou
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:329-3422013-03-04RePEc:oup:biomet
article
On the accelerated failure time model for current status and interval censored data
This paper introduces a novel approach to making inference about the regression parameters in the accelerated failure time model for current status and interval censored data. The estimator is constructed by inverting a Wald-type test for testing a null proportional hazards model. A numerically efficient Markov chain Monte Carlo based resampling method is proposed for obtaining simultaneously the point estimator and a consistent estimator of its variance-covariance matrix. We illustrate our approach with interval censored datasets from two clinical studies. Extensive numerical studies are conducted to evaluate the finite-sample performance of the new estimators. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
329
342
http://hdl.handle.net/10.1093/biomet/93.2.329
text/html
Access to full text is restricted to subscribers.
Lu Tian
Tianxi Cai
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:893-9042013-03-04RePEc:oup:biomet
article
The Role of Pseudo Data for Robust Smoothing with Application to Wavelet Regression
We propose a robust curve and surface estimator based on M-type estimators and penalty-based smoothing. This approach also includes an application to wavelet regression. The concept of pseudo data, a transformation of the robust additive model to the one with bounded errors, is used to derive some theoretical properties and also motivate a computational algorithm. The resulting algorithm, termed the es-algorithm, is computationally fast and provides a simple way of choosing the amount of smoothing. Moreover, it is easily described, straightforwardly implemented and can be extended to other wavelet regression settings such as irregularly spaced data and image denoising. Results from a simulation study and real data examples demonstrate the promising empirical properties of the proposed approach. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
893
904
http://hdl.handle.net/10.1093/biomet/asm064
application/pdf
Access to full text is restricted to subscribers.
Hee-Seok Oh
Douglas W. Nychka
Thomas C. M. Lee
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:246-2482013-03-04RePEc:oup:biomet
article
A multi-move sampler for estimating non-Gaussian time series models: Comments on Shephard & Pitt (1997)
This note points out a problem in the multi-move sampler as proposed by Shephard & Pitt (1997) and provides an alternative correct formulation. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
246
248
Toshiaki Watanabe
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:135-1482013-03-04RePEc:oup:biomet
article
Discrete-transform approach to deconvolution problems
If Fourier series are used as the basis for inference in deconvolution problems, the effects of the errors factorise out in a way that is easily exploited empirically. This property is the consequence of elementary addition formulae for sine and cosine functions, and is not readily available when one is using methods based on other orthogonal series or on continuous Fourier transforms. It allows relatively simple estimators to be constructed, founded on the addition of finite series rather than on integration. The performance of these methods can be particularly effective when edge effects are involved, since cosine series estimators are quite resistant to boundary problems. In this context we point to the advantages of trigonometric-series methods for density deconvolution; they have better mean squared error performance when edge effects are involved, they are particularly easy to code, and they admit a simple approach to empirical choice of smoothing parameter, in which a version of thresholding, familiar in wavelet-based inference, is used in place of conventional smoothing. Applications to other deconvolution problems are briefly discussed. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
135
148
http://hdl.handle.net/10.1093/biomet/92.1.135
text/html
Access to full text is restricted to subscribers.
Peter Hall
Peihua Qiu
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:667-6782013-03-04RePEc:oup:biomet
article
Nonparametric inference in multivariate mixtures
We consider mixture models in which the components of data vectors from any given subpopulation are statistically independent, or independent in blocks. We argue that if, under this condition of independence, we take a nonparametric view of the problem and allow the number of subpopulations to be quite general, the distributions and mixing proportions can often be estimated root-n consistently. Indeed, we show that, if the data are k-variate and there are p subpopulations, then for each p ⩾ 2 there is a minimal value of k, k_p say, such that the mixture problem is always nonparametrically identifiable, and all distributions and mixture proportions are nonparametrically identifiable when k ⩾ k_p. We treat the case p = 2 in detail, and there we show how to construct explicit distribution, density and mixture-proportion estimators, converging at conventional rates. Other values of p can be addressed using a similar approach, although the methodology becomes rapidly more complex as p increases. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
667
678
http://hdl.handle.net/10.1093/biomet/92.3.667
text/html
Access to full text is restricted to subscribers.
Peter Hall
Amnon Neeman
Reza Pakyari
Ryan Elmore
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1011-10172013-03-04RePEc:oup:biomet
article
Multivariate logistic models
The multivariate logistic transform is a reparameterisation of cell probabilities in terms of marginal logistic contrasts. It is known that an arbitrary set of logistic contrasts may not correspond to a valid joint distribution. In this paper we present an efficient algorithm for detecting whether or not the inverse transform exists, and for computing it if it does. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1011
1017
http://hdl.handle.net/10.1093/biomet/93.4.1011
text/html
Access to full text is restricted to subscribers.
Bahjat F. Qaqish
Anastasia Ivanova
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:835-8482013-03-04RePEc:oup:biomet
article
Locally efficient semiparametric estimators for functional measurement error models
A class of semiparametric estimators is proposed in the general setting of functional measurement error models. The estimators follow from estimating equations that are based on the semiparametric efficient score derived under a possibly incorrect distributional assumption for the unobserved 'measured with error' covariates. It is shown that such estimators are consistent and asymptotically normal even with misspecification and are efficient if computed under the truth. The methods are demonstrated with a simulation study of a quadratic logistic regression model with measurement error. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
835
848
http://hdl.handle.net/10.1093/biomet/91.4.835
text/html
Access to full text is restricted to subscribers.
Anastasios A. Tsiatis
Yanyuan Ma
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:99-1122013-03-04RePEc:oup:biomet
article
Robust and efficient estimation under data grouping
The minimum Hellinger distance estimator is known to have desirable properties in terms of robustness and efficiency. We propose an approximate minimum Hellinger distance estimator by adapting the approach to grouped data from a continuous distribution. It is easier to compute the approximate version for either the continuous data or the grouped data. Given certain conditions on the model distribution and reasonable grouping rules, the approximate minimum Hellinger distance estimator is shown to be consistent and asymptotically normal. Furthermore, it is robust and can be asymptotically as efficient as the maximum likelihood estimator. The merit of the estimator is demonstrated through simulation studies and real data examples. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
99
112
http://hdl.handle.net/10.1093/biomet/93.1.99
text/html
Access to full text is restricted to subscribers.
Nan Lin
Xuming He
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:481-4882013-03-04RePEc:oup:biomet
article
The prognostic analogue of the propensity score
The propensity score collapses the covariates of an observational study into a single measure summarizing their joint association with treatment conditions; prognostic scores summarize covariates' association with potential responses. As with propensity scores, stratification on prognostic scores brings to uncontrolled studies a concrete and desirable form of balance, a balance that is more familiar as an objective of experimental control. Like propensity scores, prognostic scores can reduce the dimension of the covariate, yet causal inferences conditional on them are as valid as are inferences conditional only on the unreduced covariate. As a method of adjustment unto itself, prognostic scoring has limitations not shared with propensity scoring, but it holds promise as a complement to the propensity score, particularly in certain designs for which unassisted propensity adjustment is difficult or infeasible. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
481
488
http://hdl.handle.net/10.1093/biomet/asn004
application/pdf
Access to full text is restricted to subscribers.
Ben B. Hansen
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:573-5852013-03-04RePEc:oup:biomet
article
Influence functions and robust Bayes and empirical Bayes small area estimation
We introduce new robust small area estimation procedures based on area-level models. We first find influence functions corresponding to each individual area-level observation by measuring the divergence between the posterior density functions of regression coefficients with and without that observation. Next, based on these influence functions, properly standardized, we propose some new robust Bayes and empirical Bayes small area estimators. The mean squared errors and estimated mean squared errors of these estimators are also found. A small simulation study compares the performance of the robust and the regular empirical Bayes estimators. When the model variance is larger than the sample variance, the proposed robust empirical Bayes estimators are superior. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
573
585
http://hdl.handle.net/10.1093/biomet/asn030
application/pdf
Access to full text is restricted to subscribers.
Malay Ghosh
Tapabrata Maiti
Ananya Roy
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:613-6272013-03-04RePEc:oup:biomet
article
Bayesian inference for Markov processes with diffusion and discrete components
Data arising in certain radio-tracking experiments consist of both a continuous spatial component and a discrete component related to behaviour. This leads naturally to stochastic models with a state space which is a product of continuous and discrete components. We consider a class of such models in continuous time, which can be thought of as diffusions in random environments. They are related to switching diffusion or hidden Markov models, but observations are made on both components at discrete time points, so that neither component is completely 'hidden'. We describe and illustrate an approach to fully Bayesian inference for these general models. The algorithm used is a hybrid Markov chain Monte Carlo method. The diffusion parameters, the environment parameters and the sample path of the environment process itself are updated separately, in sequence, and the individual steps are a mixture of Gibbs and random walk Metropolis–Hastings types. Some implementation and model checking issues are discussed, and an example using data arising from a radio-tracking experiment is described. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
613
627
P. G. Blackwell
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:799-8122013-03-04RePEc:oup:biomet
article
Covariance reducing models: An alternative to spectral modelling of covariance matrices
We introduce covariance reducing models for studying the sample covariance matrices of a random vector observed in different populations. The models are based on reducing the sample covariance matrices to an informational core that is sufficient to characterize the variance heterogeneity among the populations. They possess useful equivariance properties and provide a clear alternative to spectral models for covariance matrices. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
799
812
http://hdl.handle.net/10.1093/biomet/asn052
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Liliana Forzani
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:99-1122013-03-04RePEc:oup:biomet
article
Likelihood inference in nearest-neighbour classification models
Traditionally the neighbourhood size k in the k-nearest-neighbour algorithm is either fixed at the first nearest neighbour or is selected on the basis of a crossvalidation study. In this paper we present an alternative approach that develops the k-nearest-neighbour algorithm using likelihood-based inference. Our method takes the form of a generalised linear regression on a set of k-nearest-neighbour autocovariates. By defining the k-nearest-neighbour algorithm in this way we are able to extend the method to accommodate the original predictor variables as possible linear effects as well as allowing for the inclusion of multiple nearest-neighbour terms. The choice of the final model proceeds via a stepwise regression procedure. It is shown that our method incorporates a conventional generalised linear model and a conventional k-nearest-neighbour algorithm as special cases. Empirical results suggest that the method out-performs the standard k-nearest-neighbour method in terms of misclassification rate on a wide variety of datasets. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
99
112
Christopher C. Holmes
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:491-5122013-03-04RePEc:oup:biomet
article
Expected-posterior prior distributions for model selection
We consider the problem of comparing parametric models using a Bayesian approach. A new method of developing prior distributions for the model parameters is presented, called the expected-posterior prior approach. The idea is to define the priors for all models from a common underlying predictive distribution, in such a way that the resulting priors are amenable to modern Markov chain Monte Carlo computational techniques. The approach has subjective Bayesian and default Bayesian implementations, and overcomes the most significant impediment to Bayesian model selection, that of ensuring that prior distributions for the various models are appropriately compatible. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
491
512
Jose M. Perez
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:75-892013-03-04RePEc:oup:biomet
article
Covariate-adjusted regression
We introduce covariate-adjusted regression for situations where both predictors and response in a regression model are not directly observable, but are contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. We demonstrate how the regression coefficients can be estimated by establishing a connection to varying-coefficient regression. The proposed covariate-adjustment method is illustrated with an analysis of the regression of plasma fibrinogen concentration as response on serum transferrin level as predictor for 69 haemodialysis patients. In this example, both response and predictor are thought to be influenced in a multiplicative fashion by body mass index. A bootstrap hypothesis test enables us to test the significance of the regression parameters. We establish consistency and convergence rates of the parameter estimators for this new covariate-adjusted regression model. Simulation studies demonstrate the efficacy of the proposed method. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
75
89
http://hdl.handle.net/10.1093/biomet/92.1.75
text/html
Access to full text is restricted to subscribers.
Damla Şenturk
Hans-Georg Muller
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:305-3192013-03-04RePEc:oup:biomet
article
Weighted estimating equations for semiparametric transformation models with censored data from a case-cohort design
In a case-cohort design introduced by Prentice (1986), covariates are assembled only for a subcohort randomly selected from the entire cohort, and any additional cases outside the subcohort. Semiparametric transformation models are considered here for failure time data from the case-cohort design. Weighted estimating equations are proposed for estimation of the regression parameters. The estimation procedure of survival probability at given covariate levels is also provided. Asymptotic properties are derived for the estimators using finite population sampling theory, U-statistics theory and martingale convergence results. The finite-sample properties of the proposed estimators, as well as the efficiency relative to the full cohort estimators, are assessed via simulation studies. A case-cohort dataset from the Atherosclerosis Risk in Communities study is used to illustrate the estimating procedure. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
305
319
Lan Kong
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:939-9522013-03-04RePEc:oup:biomet
article
A Hybrid Pairwise Likelihood Method
A modification to the pairwise likelihood method is proposed, which aims to improve the estimation of the marginal distribution parameters. This is achieved by replacing the pairwise likelihood score equations, for estimating such parameters, by the optimal linear combinations of the marginal score functions. A further advantage of the proposed estimator of marginal parameters, over pairwise likelihood, is that it is robust to misspecification of the bivariate distributions as long as the univariate marginal distributions are correctly specified. While alternating logistic regression can be seen as a special case of the proposed method, it is shown that an existing generalization of alternating logistic regression applicable to ordinal data is not the same as and is inferior to the proposed method because it replaces certain conditional densities by pseudodensities that assume working independence. The fitting of the multivariate negative binomial distribution is another scenario involving intractable likelihood that calls for the use of pairwise likelihood methods, and the superiority of the modified method is demonstrated in a simulation study. Two examples, based on the analyses of salamander mating and patient-controlled analgesia data, demonstrate the usefulness of the proposed method. The possibility of combining optimally the pairwise, rather than marginal, scores is also considered and its difficulty and potential are discussed. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
939
952
http://hdl.handle.net/10.1093/biomet/asm051
application/pdf
Access to full text is restricted to subscribers.
Anthony Y. C. Kuk
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:809-8302013-03-04RePEc:oup:biomet
article
Efficient estimation of covariance selection models
A Bayesian method is proposed for estimating an inverse covariance matrix from Gaussian data. The method is based on a prior that allows the off-diagonal elements of the inverse covariance matrix to be zero, and in many applications results in a parsimonious parameterisation of the covariance matrix. No assumption is made about the structure of the corresponding graphical model, so the method applies to both nondecomposable and decomposable graphs. All the parameters are estimated by model averaging using an efficient Metropolis–Hastings sampling scheme. A simulation study demonstrates that the method produces statistically efficient estimators of the covariance matrix, when the inverse covariance matrix is sparse. The methodology is illustrated by applying it to three examples that are high-dimensional relative to the sample size. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
809
830
Frederick Wong
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:613-6252013-03-04RePEc:oup:biomet
article
On optimal crossover designs when carryover effects are proportional to direct effects
There are a number of different models for crossover designs which take account of carryover effects. Since it seems plausible that a treatment with a large direct effect should generally have a larger carryover effect, Kempton et al. (2001) considered a model where the carryover effects are proportional to the direct effects. The advantage of this model lies in the fact that there are fewer parameters to be estimated. Its problem lies in the nonlinearity of the estimators. Kempton et al. (2001) considered the least squares estimator. They point out that this estimator is asymptotically equivalent to the estimator in a linear model which assumes the true parameters to be known. For this estimator they determine optimal designs numerically for some cases. The present paper generalises some of their results. Our results are derived with the help of a generalisation of the methods used in Kunert & Martin (2000). Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
613
625
http://hdl.handle.net/10.1093/biomet/93.3.613
text/html
Access to full text is restricted to subscribers.
R. A. Bailey
J. Kunert
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:745-7542013-03-04RePEc:oup:biomet
article
Empirical supremum rejection sampling
Rejection sampling thins out samples from a candidate density from which it is easy to simulate, to obtain samples from a more awkward target density. A prerequisite is knowledge of the finite supremum of the ratio of the target and candidate densities. This severely restricts application of the method because it can be difficult to calculate the supremum. We use theoretical argument and numerical work to show that a practically perfect sample may be obtained by replacing the exact supremum with the maximum obtained from simulated candidates. We also provide diagnostics for failure of the method caused by a bad choice of candidate distribution. The implication is that essentially no theoretical work is required to apply rejection sampling in many practical cases. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
745
754
Brian S. Caffo
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:221-2282013-03-04RePEc:oup:biomet
article
A note on semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome
Outcome-dependent sampling designs have been shown to be a cost-effective way to enhance study efficiency. We show that the outcome-dependent sampling design with a continuous outcome can be viewed as an extension of the two-stage case-control designs to the continuous-outcome case. We further show that the two-stage outcome-dependent sampling has a natural link with the missing-data and biased-sampling frameworks. Through the use of semiparametric inference and missing-data techniques, we show that a certain semiparametric maximum-likelihood estimator is computationally convenient and achieves the semiparametric efficient information bound. We demonstrate this both theoretically and through simulation. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
221
228
http://hdl.handle.net/10.1093/biomet/asn073
application/pdf
Access to full text is restricted to subscribers.
Rui Song
Haibo Zhou
Michael R. Kosorok
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:997-10012013-03-04RePEc:oup:biomet
article
On consistency of Kendall's tau under censoring
Necessary and sufficient conditions for consistency of a simple estimator of Kendall's tau under bivariate censoring are presented. The results are extended to data subject to bivariate left truncation as well as right censoring. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
997
1001
http://hdl.handle.net/10.1093/biomet/asn037
application/pdf
Access to full text is restricted to subscribers.
David Oakes
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:279-2882013-03-04RePEc:oup:biomet
article
A construction method for orthogonal Latin hypercube designs
The Latin hypercube design is a popular choice of experimental design when computer simulation is used to study a physical process. These designs guarantee uniform samples for the marginal distribution of each single input. A number of methods have been proposed for extending the uniform sampling to higher dimensions. We show how to construct Latin hypercube designs in which all main effects are orthogonal. Our method can also be used to construct Latin hypercube designs with low correlation of first-order and second-order terms. Our method generates orthogonal Latin hypercube designs that can include many more factors than those proposed by Ye (1998). Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
279
288
http://hdl.handle.net/10.1093/biomet/93.2.279
text/html
Access to full text is restricted to subscribers.
David M. Steinberg
Dennis K. J. Lin
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:291-3032013-03-04RePEc:oup:biomet
article
Regression methods for gap time hazard functions of sequentially ordered multivariate failure time data
Sequentially ordered multivariate failure time data are often observed in biomedical studies and inter-event, or gap, times are often of interest. Generally, standard hazard regression methods cannot be applied to the gap times because of identifiability issues and induced dependent censoring. We propose estimating equations for fitting proportional hazards regression models to the gap times. Model parameters are shown to be consistent and asymptotically normal. Simulation studies reveal the appropriateness of the asymptotic approximations in finite samples. The proposed methods are applied to renal failure data to assess the association between demographic covariates and both time until wait-listing and time from wait-listing to kidney transplantation. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
291
303
Douglas E. Schaubel
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:893-9122013-03-04RePEc:oup:biomet
article
Efficient balanced sampling: The cube method
A balanced sampling design is defined by the property that the Horvitz--Thompson estimators of the population totals of a set of auxiliary variables equal the known totals of these variables. Therefore the variances of estimators of totals of all the variables of interest are reduced, depending on the correlations of these variables with the controlled variables. In this paper, we develop a general method, called the cube method, for selecting approximately balanced samples with equal or unequal inclusion probabilities and any number of auxiliary variables. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
893
912
http://hdl.handle.net/10.1093/biomet/91.4.893
text/html
Access to full text is restricted to subscribers.
Jean-Claude Deville
Yves Tille
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:673-6892013-03-04RePEc:oup:biomet
article
Optimal adaptive randomized designs for clinical trials
Optimal decision-analytic designs are deterministic. Such designs are appropriately criticized in the context of clinical trials because they are subject to assignment bias. On the other hand, balanced randomized designs may assign an excessive number of patients to a treatment arm that is performing relatively poorly. We propose a compromise between these two extremes, one that achieves some of the good characteristics of both. We introduce a constrained optimal adaptive design for a fully sequential randomized clinical trial with k arms and n patients. An r-design is one for which, at each allocation, each arm has probability at least r of being chosen, 0 ⩽ r ⩽ 1/k. An optimal design among all r-designs is called r-optimal. An r_1-design is also an r_2-design if r_1 ⩾ r_2. A design without constraint is the special case r = 0 and a balanced randomized design is the special case r = 1/k. The optimization criterion is to maximize the expected overall utility in a Bayesian decision-analytic approach, where utility is the sum over the utilities for individual patients over a 'patient horizon' N. We prove analytically that there exists an r-optimal design such that each patient is assigned to a particular one of the arms with probability 1 − (k − 1)r, and to the remaining arms with probability r. We also show that the balanced design is asymptotically r-optimal for any given r, 0 ⩽ r < 1/k, as N/n → ∞. This implies that every r-optimal design is asymptotically optimal without constraint. Numerical computations using backward induction for k = 2 arms show that, in general, this asymptotic optimality feature for r-optimal designs can be accomplished with moderate trial size n if the patient horizon N is large relative to n. We also show that, in a trial with an r-optimal design, r < 1/2, fewer patients are assigned to an inferior arm than when following a balanced design, even for r-optimal designs having the same statistical power as a balanced design. We discuss extensions to various clinical trial settings. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
673
689
http://hdl.handle.net/10.1093/biomet/asm049
application/pdf
Access to full text is restricted to subscribers.
Yi Cheng
Donald A. Berry
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:775-7892013-03-04RePEc:oup:biomet
article
Nonparametric estimation of the variogram and its spectrum
In the study of intrinsically stationary spatial processes, a new nonparametric variogram estimator is proposed through its spectral representation. The methodology is based on estimation of the variogram's spectrum by solving a regularized inverse problem through quadratic programming. The estimated variogram is guaranteed to be conditionally negative-definite. Simulation shows that our estimator is flexible and generally has smaller mean integrated squared error than the parametric estimator under model misspecification. Our methodology is applied to a spatial dataset of decadal temperature changes. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
775
789
http://hdl.handle.net/10.1093/biomet/asr056
application/pdf
Access to full text is restricted to subscribers.
Chunfeng Huang
Tailen Hsing
Noel Cressie
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:711-7202013-03-04RePEc:oup:biomet
article
Sudoku-based space-filling designs
Sudoku is played by millions of people across the globe. It has simple rules and is very addictive. The game board is a nine-by-nine grid of numbers from one to nine. Several entries within the grid are provided and the remaining entries must be filled in subject to no row, column, or three-by-three subsquare containing duplicate numbers. By exploiting these three types of uniformity, we propose an approach to constructing a new type of design, called a Sudoku-based space-filling design. Such a design can be divided into groups of subdesigns so that the complete design and each subdesign achieve maximum uniformity in univariate and bivariate margins. Examples are given illustrating the proposed construction method. Applications of such designs include computer experiments with qualitative and quantitative factors, linking parameters in engineering and crossvalidation. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
711
720
http://hdl.handle.net/10.1093/biomet/asr024
application/pdf
Access to full text is restricted to subscribers.
Xu Xu
Ben Haaland
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:1-172013-03-04RePEc:oup:biomet
article
Modelling pairwise dependence of maxima in space
We model pairwise dependence of temporal maxima, such as annual maxima of precipitation, that have been recorded in space, either on a regular grid or at irregularly spaced locations. The construction of our estimators stems from the variogram concept. The asymptotic properties of our pairwise dependence estimators are established through properties of empirical processes. The performance of our approach is illustrated by simulations and by the treatment of a real dataset. In addition to bringing new results about the asymptotic behaviour of copula estimators, the latter being linked to first-order variograms, one main advantage of our approach is to propose a simple connection between extreme value theory and geostatistics. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
1
17
http://hdl.handle.net/10.1093/biomet/asp001
application/pdf
Access to full text is restricted to subscribers.
Philippe Naveau
Armelle Guillou
Daniel Cooley
Jean Diebolt
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:327-3392013-03-04RePEc:oup:biomet
article
Likelihood for component parameters
For a statistical model with data, likelihood for the scalar or vector full parameter θ, of dimension p say, is typically well defined and easily computed. In this paper, we investigate likelihood for a component parameter ψ(θ) of dimension d
2
2003
90
June
Biometrika
327
339
D. A. S. Fraser
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:537-5542013-03-04RePEc:oup:biomet
article
Efficient Bayesian inference for Gaussian copula regression models
A Gaussian copula regression model gives a tractable way of handling a multivariate regression when some of the marginal distributions are non-Gaussian. Our paper presents a general Bayesian approach for estimating a Gaussian copula model that can handle any combination of discrete and continuous marginals, and generalises Gaussian graphical models to the Gaussian copula framework. Posterior inference is carried out using a novel and efficient simulation method. The methods in the paper are applied to simulated and real data. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
537
554
http://hdl.handle.net/10.1093/biomet/93.3.537
text/html
Access to full text is restricted to subscribers.
Michael Pitt
David Chan
Robert Kohn
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:243-2472013-03-04RePEc:oup:biomet
article
Construction of orthogonal and nearly orthogonal Latin hypercubes
We propose a method for constructing orthogonal or nearly orthogonal Latin hypercubes. The method yields a large Latin hypercube by coupling an orthogonal array of index unity with a small Latin hypercube. It is shown that the large Latin hypercube inherits the exact or near orthogonality of the small Latin hypercube. Thus, the effort of searching for large Latin hypercubes that are exactly or nearly orthogonal can be focussed on finding small Latin hypercubes with the same property. We obtain a useful collection of orthogonal or nearly orthogonal Latin hypercubes, which have a large factor-to-run ratio and are often much more economical than those given by existing methods. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
243
247
http://hdl.handle.net/10.1093/biomet/asn064
application/pdf
Access to full text is restricted to subscribers.
C. Devon Lin
Rahul Mukerjee
Boxin Tang
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:199-2162013-03-04RePEc:oup:biomet
article
Estimation of a covariance matrix with zeros
We consider estimation of the covariance matrix of a multivariate random vector under the constraint that certain covariances are zero. We first present an algorithm, which we call iterative conditional fitting, for computing the maximum likelihood estimate of the constrained covariance matrix, under the assumption of multivariate normality. In contrast to previous approaches, this algorithm has guaranteed convergence properties. Dropping the assumption of multivariate normality, we show how to estimate the covariance matrix in an empirical likelihood approach. These approaches are then compared via simulation and on an example of gene expression. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
199
216
http://hdl.handle.net/10.1093/biomet/asm007
application/pdf
Access to full text is restricted to subscribers.
Sanjay Chaudhuri
Mathias Drton
Thomas S. Richardson
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:679-7012013-03-04RePEc:oup:biomet
article
Principal component models for correlation matrices
Distributional theory regarding principal components is less well developed for correlation matrices than it is for covariance matrices. The intent of this paper is to reduce this disparity. Methods are proposed that enable investigators to fit and to make inferences about flexible principal components models for correlation matrices. The models allow arbitrary eigenvalue multiplicities and allow the distinct eigenvalues to be modelled parametrically or nonparametrically. Local parameterisations and implicit functions are used to construct full-rank unconstrained parameterisations. First-order asymptotic distributions are obtained directly from the theory of estimating functions. Second-order accurate distributions for making inferences under normality are obtained directly from likelihood theory. Simulation studies show that the Bartlett correction is effective in controlling the size of the tests and that first-order approximations to nonnull distributions are reasonably accurate. The methods are illustrated on a dataset. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
679
701
Robert J. Boik
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:767-7672013-03-04RePEc:oup:biomet
article
'Nonparametric inference in multivariate mixtures' Biometrika (2005), 92, pp. 667–678
The left-hand side of equation (2·8), on p. 671, should read $\{\pi_1(1-\pi_1)\}^{-1/2}(2\pi_1-1)$ rather than $\{(1-\pi_1)/\pi_1\}^{1/2}(2\pi_1-1)$. Reflecting this change, the left-hand side of equation (3·1) on the same page should be altered to $\{\hat\pi_1(1-\hat\pi_1)\}^{-1/2}(2\hat\pi_1-1)$, and the formula at the foot of p. 677 should be modified to $\{\pi_1(1-\pi_1)\}^{-1/2}(2\pi_1-1) + O_p(n^{-1/2})$. No other formula is affected, and the left-hand side of (2·8) is still increasing in $\pi_1$. The numerical results, discussed in §4, are influenced in minor ways. In the simulation study, absolute bias is reduced, and variance is either slightly increased or slightly decreased. In the real-data example, using the nonparametric approach to analysis, mean squared error is further reduced, from 0·0011 to 0·0004. We are grateful to Hiro Kasahara and Katsumi Shimotsu for pointing out the error. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
767
767
http://hdl.handle.net/10.1093/biomet/asm042
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Amnon Neeman
Reza Pakyari
Ryan Elmore
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:249-2502013-03-04RePEc:oup:biomet
article
'Statistical assessment of bilateral symmetry of shapes'
1
2005
92
March
Biometrika
249
250
http://hdl.handle.net/10.1093/biomet/92.1.249-a
text/html
Access to full text is restricted to subscribers.
K. V. Mardia
F. L. Bookstein
I. J. Moreton
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:791-8082013-03-04RePEc:oup:biomet
article
Exponential functionals and means of neutral-to-the-right priors
The mean of a random distribution chosen from a neutral-to-the-right prior can be represented as the exponential functional of an increasing additive process. This fact is exploited in order to give sufficient conditions for the existence of the mean of a neutral-to-the-right prior and for the absolute continuity of its probability distribution. Moreover, expressions for its moments, of any order, are provided. For illustrative purposes we consider a generalisation of the neutral-to-the-right prior based on the gamma process and the beta-Stacy process. Finally, by resorting to the maximum entropy algorithm, we obtain an approximation to the probability density function of the mean of a neutral-to-the-right prior. The arguments are easily extended to examine means of posterior quantities. The numerical results obtained are compared to those yielded by the application of some well-established simulation algorithms. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
791
808
Ilenia Epifani
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:953-9642013-03-04RePEc:oup:biomet
article
A Jackknife Variance Estimator for Unistage Stratified Samples with Unequal Probabilities
Existing jackknife variance estimators used with sample surveys can seriously overestimate the true variance under unistage stratified sampling without replacement with unequal probabilities. A novel jackknife variance estimator is proposed which is as numerically simple as existing jackknife variance estimators. Under certain regularity conditions, the proposed variance estimator is consistent under stratified sampling without replacement with unequal probabilities. The high entropy regularity condition necessary for consistency is shown to hold for the Rao--Sampford design. An empirical study of three unequal probability sampling designs supports our findings. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
953
964
http://hdl.handle.net/10.1093/biomet/asm072
application/pdf
Access to full text is restricted to subscribers.
Yves G. Berger
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:215-2202013-03-04RePEc:oup:biomet
article
On Bartlett correction of empirical likelihood in the presence of nuisance parameters
Lazar & Mykland (1999) showed that an empirical likelihood defined by two estimating equations with a nuisance parameter need not be Bartlett-correctable. This paper shows that Bartlett correction of empirical likelihood in the presence of a nuisance parameter depends critically on the way the nuisance parameter is removed when formulating the likelihood for the parameter of interest. We establish in the broad framework of estimating functions that the empirical likelihood is still Bartlett-correctable if the nuisance parameter is profiled out given the value of the parameter of interest. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
215
220
http://hdl.handle.net/10.1093/biomet/93.1.215
text/html
Access to full text is restricted to subscribers.
Song Xi Chen
Hengjian Cui
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:603-6122013-03-04RePEc:oup:biomet
article
A modified likelihood ratio statistic for some nonregular models
Higher-order approximations to the distribution of the likelihood ratio statistic are considered for a class of nonregular models in which the maximum likelihood estimator of the parameter of interest is asymptotically distributed according to an exponential, rather than a normal, distribution. Asymptotic behaviour of this type often arises when the boundary of the support of the distributions under consideration depends on θ. A modified likelihood ratio statistic is proposed that follows its asymptotic distribution to a high degree of approximation, and this statistic is illustrated on several examples. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
603
612
Thomas A. Severini
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:529-5422013-03-04RePEc:oup:biomet
article
Integrated likelihood functions for non-Bayesian inference
Consider a model with parameter θ = (ψ, λ), where ψ is the parameter of interest, and let L(ψ, λ) denote the likelihood function. One approach to likelihood inference for ψ is to use an integrated likelihood function, in which λ is eliminated from L(ψ, λ) by integrating with respect to a density function π(λ|ψ). The goal of this paper is to consider the problem of selecting π(λ|ψ) so that the resulting integrated likelihood function is useful for non-Bayesian likelihood inference. The desirable properties of an integrated likelihood function are analyzed and these suggest that π(λ|ψ) should be chosen by finding a nuisance parameter ϕ that is unrelated to ψ and then taking the prior density for ϕ to be independent of ψ. Such an unrelated parameter is constructed and the resulting integrated likelihood is shown to be closely related to the modified profile likelihood. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
529
542
http://hdl.handle.net/10.1093/biomet/asm040
application/pdf
Access to full text is restricted to subscribers.
Thomas A. Severini
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:937-9512013-03-04RePEc:oup:biomet
article
Optimal calibration estimators in survey sampling
We show that the model-calibration estimator for the finite population mean, which was proposed by Wu & Sitter (2001) through an intuitive argument, is optimal among a class of calibration estimators. We also present optimal calibration estimators for the finite population distribution function, the population variance, the variance of a linear estimator and other quadratic finite population functions under a unified framework. The proposed calibration estimators are optimal under the true model but remain design consistent even if the working model is misspecified. A limited simulation study shows that the improvement of these optimal estimators over the conventional ones can be substantial. The question of when and how auxiliary information can be used for both the estimation of the population mean using a generalised regression estimator and the estimation of its variance through calibration is addressed clearly under the proposed general methodology. Some fundamental issues in using auxiliary information from survey data are also addressed in the context of optimal estimation. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
937
951
Changbao Wu
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:231-2422013-03-04RePEc:oup:biomet
article
Optimal sufficient dimension reduction for the conditional mean in multivariate regression
The aim of this article is to develop optimal sufficient dimension reduction methodology for the conditional mean in multivariate regression. The context is roughly the same as that of a related method by Cook & Setodji (2003), but the new method has several advantages. It is asymptotically optimal in the sense described herein and its test statistic for dimension always has a chi-squared distribution asymptotically under the null hypothesis. Additionally, the optimal method allows tests of predictor effects. A comparison of the two methods is provided. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
231
242
http://hdl.handle.net/10.1093/biomet/asm003
application/pdf
Access to full text is restricted to subscribers.
Jae Keun Yoo
R. Dennis Cook
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:485-4912013-03-04RePEc:oup:biomet
article
Generalised minimum aberration construction results for symmetrical orthogonal arrays
Generalised minimum aberration is a recently established design criterion for the whole class of orthogonal arrays and fractional factorial designs. The criterion is, as its name suggests, a generalisation of minimum aberration for regular designs and of minimum G_2-aberration for two-level designs. The aim of the criterion is to find designs which minimise in a certain sense the aliasing between main effects and interactions. In this paper, theoretical results are developed for finding symmetrical orthogonal arrays with generalised minimum aberration for more than two factor levels. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
485
491
http://hdl.handle.net/10.1093/biomet/92.2.485
text/html
Access to full text is restricted to subscribers.
Neil A. Butler
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:365-3792013-03-04RePEc:oup:biomet
article
Modelling multiple time series via common factors
We propose a new method for estimating common factors of multiple time series. One distinctive feature of the new approach is that it is applicable to some nonstationary time series. The unobservable, nonstationary factors are identified by expanding the white noise space step by step, thereby solving a high-dimensional optimization problem by several low-dimensional sub-problems. Asymptotic properties of the estimation are investigated. The proposed methodology is illustrated with both simulated and real datasets. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
365
379
http://hdl.handle.net/10.1093/biomet/asn009
application/pdf
Access to full text is restricted to subscribers.
Jiazhu Pan
Qiwei Yao
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:487-4952013-03-04RePEc:oup:biomet
article
Testing goodness-of-fit in logistic case-control studies
We present a goodness-of-fit test for the logistic regression model under case-control sampling. The test statistic is constructed via a discrepancy between two competing kernel density estimators of the underlying conditional distributions given case-control status. The proposed goodness-of-fit test is shown to compare very favourably with previously proposed tests for case-control sampling in terms of power. The test statistic can be easily computed as a quadratic form in the residuals from a prospective logistic regression maximum likelihood fit. In addition, the proposed test is affine invariant and has an alternative representation in terms of empirical characteristic functions. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
487
495
http://hdl.handle.net/10.1093/biomet/asm033
application/pdf
Access to full text is restricted to subscribers.
Howard D. Bondell
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:359-3742013-03-04RePEc:oup:biomet
article
Permutation tests for equality of distributions in high-dimensional settings
Motivated by applications in high-dimensional settings, we suggest a test of the hypothesis H_0 that two sampled distributions are identical. It is assumed that two independent datasets are drawn from the respective populations, which may be very general. In particular, the distributions may be multivariate or infinite-dimensional, in the latter case representing, for example, the distributions of random functions from one Euclidean space to another. Our test uses a measure of distance between data. This measure should be symmetric but need not satisfy the triangle inequality, so it is not essential that it be a metric. The test is based on ranking the pooled dataset, with respect to the distance and relative to any fixed data value, and repeating this operation for each fixed datum. A permutation argument enables a critical point to be chosen such that the test has concisely known significance level, conditional on the set of all pairwise distances. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
359
374
Peter Hall
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:137-1462013-03-04RePEc:oup:biomet
article
Orthogonal arrays robust to nonnegligible two-factor interactions
Regular fractional factorial designs with clear two-factor interactions provide a useful class of designs that are robust to nonnegligible two-factor interactions. In this paper, the concept of clear two-factor interactions is generalised to orthogonal arrays. The new concept leads to a much wider class of designs robust to nonnegligible two-factor interactions. We study the existence and construction of such designs. The designs we construct have a structure that renders them particularly attractive in the robust parameter design setting. We also discuss an interesting connection between designs with clear two-factor interactions and mixed orthogonal arrays. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
137
146
http://hdl.handle.net/10.1093/biomet/93.1.137
text/html
Access to full text is restricted to subscribers.
Boxin Tang
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:859-8742013-03-04RePEc:oup:biomet
article
Bayesian nonparametric inference on stochastic ordering
We consider Bayesian inference about collections of unknown distributions subject to a partial stochastic ordering. To address problems in testing of equalities between groups and estimation of group-specific distributions, we propose classes of restricted dependent Dirichlet process priors. These priors have full support in the space of stochastically ordered distributions, and can be used for collections of unknown mixture distributions to obtain a flexible class of mixture models. Theoretical properties are discussed, efficient methods are developed for posterior computation using Markov chain Monte Carlo simulation and the methods are illustrated using data from a study of DNA damage and repair. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
859
874
http://hdl.handle.net/10.1093/biomet/asn043
application/pdf
Access to full text is restricted to subscribers.
David B. Dunson
Shyamal D. Peddada
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:367-3832013-03-04RePEc:oup:biomet
article
Nonparametric estimation with left-truncated semicompeting risks data
Nonparametric estimators for competing risks data can be applied to semicompeting risks data, a type of multi-state data where a terminating event may censor a nonterminating event, after forcing the data into the competing risks format. Complications may arise with left truncation of the terminating event, where the competing risks analysis naively truncates the nonterminating event using the left-truncation time for the terminating event, which may lead to large efficiency losses. We propose nonparametric estimators which use all semicompeting risks information and do not require artificial truncation. The uniform consistency and weak convergence of the estimators are established and variance estimators are provided. Simulation studies and an analysis of a diabetes registry demonstrate large efficiency gains over the naive estimators. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
367
383
http://hdl.handle.net/10.1093/biomet/93.2.367
text/html
Access to full text is restricted to subscribers.
L. Peng
J. P. Fine
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:201-2112013-03-04RePEc:oup:biomet
article
On fuzzy familywise error rate and false discovery rate procedures for discrete distributions
Fuzzy multiple comparisons procedures are introduced as a solution to the problem of multiple comparisons for discrete test statistics. The critical function of the randomized p-values is proposed as a measure of evidence against the null hypotheses. The classical concept of randomized tests is extended to multiple comparisons. This approach makes all theory of multiple comparisons developed for continuously distributed statistics automatically applicable to the discrete case. Examples of familywise error rate and false discovery rate procedures are discussed and an application to linkage disequilibrium testing is given. Software for implementing the procedures is available. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
201
211
http://hdl.handle.net/10.1093/biomet/asn061
application/pdf
Access to full text is restricted to subscribers.
Elena Kulinskaya
Alex Lewin
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:861-8752013-03-04RePEc:oup:biomet
article
Influence functions and outlier detection under the common principal components model: A robust approach
The common principal components model for several groups of multivariate observations assumes equal principal axes but different variances along these axes among the groups. Influence functions for plug-in and projection-pursuit estimates under a common principal component model are obtained. Asymptotic variances are derived from them. Outlier detection is possible using partial influence functions. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
861
875
Graciela Boente
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:371-3842013-03-04RePEc:oup:biomet
article
Direction estimation in single-index regressions
We propose a general dimension-reduction method that combines the ideas of likelihood, correlation, inverse regression and information theory. We do not require that the dependence be confined to particular conditional moments, nor do we place restrictions on the predictors or on the regression that are necessary for methods like ordinary least squares and sliced-inverse regression. Although we focus on single-index regressions, the underlying idea is applicable more generally. Illustrative examples are presented. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
371
384
http://hdl.handle.net/10.1093/biomet/92.2.371
text/html
Access to full text is restricted to subscribers.
Xiangrong Yin
R. Dennis Cook
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:486-4892013-03-04RePEc:oup:biomet
article
Understanding nonparametric estimation for clustered data
In this note we give an alternative formulation of the nonparametric estimators of Wang (2003) with the identity link. This results in a closed form of the estimator that has computational advantages and gives insight into the rationale behind the estimator. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
486
489
http://hdl.handle.net/10.1093/biomet/93.2.486
text/html
Access to full text is restricted to subscribers.
Richard Huggins
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:601-6192013-03-04RePEc:oup:biomet
article
Joint modelling of paired sparse functional data using principal components
We propose a modelling framework to study the relationship between two paired longitudinally observed variables. The data for each variable are viewed as smooth curves measured at discrete time-points plus random errors. While the curves for each variable are summarized using a few important principal components, the association of the two longitudinal variables is modelled through the association of the principal component scores. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed-effects model framework for model fitting, prediction and inference. The proposed method can be applied in the difficult case in which the measurement times are irregular and sparse and may differ widely across individuals. Use of functional principal components enhances model interpretation and improves statistical and numerical stability of the parameter estimates. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
601
619
http://hdl.handle.net/10.1093/biomet/asn035
application/pdf
Access to full text is restricted to subscribers.
Lan Zhou
Jianhua Z. Huang
Raymond J. Carroll
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:230-2372013-03-04RePEc:oup:biomet
article
Using empirical likelihood methods to obtain range restricted weights in regression estimators for surveys
Design weights in surveys are often adjusted to accommodate auxiliary information and to meet pre-specified range restrictions, typically via some ad hoc algorithmic adjustment to a generalised regression estimator. In this paper, we present a simple solution to this problem using empirical likelihood methods or generalised regression. We first develop algorithms for computing empirical likelihood estimators and model-calibrated empirical likelihood estimators. The first algorithm solves the computational problem of the empirical likelihood method in general, both in survey and non-survey settings, and theoretically guarantees its convergence. The second exploits properties of the model-calibration method and is particularly simple. The algorithms are adapted for handling benchmark constraints and pre-specified range restrictions on the weight adjustments. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
230
237
J. Chen
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:435-4502013-03-04RePEc:oup:biomet
article
Mean estimating equation approach to analysing cluster-correlated data with nonignorable cluster sizes
Most methods for analysing cluster-correlated biological data implicitly assume the ignorability of cluster sizes. When this assumption fails, the resulting inferences may be asymptotically invalid. Hoffman et al. (2001) proposed a simple but computationally intensive method, based on a large number of within-cluster resamples and associated separate estimating equations, that leads to asymptotically valid inferences whether the cluster sizes are ignorable or not. We study a simple method, based on a single inverse cluster size-weighted estimating equation, that avoids resampling and yet leads to asymptotically valid inferences. Simulation results are presented to assess the performance of the proposed method. We also propose Wald tests for ignorability of cluster sizes. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
435
450
http://hdl.handle.net/10.1093/biomet/92.2.435
text/html
Access to full text is restricted to subscribers.
E. Benhin
J. N. K. Rao
A. J. Scott
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:425-4382013-03-04RePEc:oup:biomet
article
Effects of the reference set on frequentist inferences
We employ second-order likelihood asymptotics to investigate how ideal frequentist inferences depend on the probability model for the data through more than the likelihood function, referring to this as the effect of the reference set. There are two aspects of higher-order corrections to first-order likelihood methods, namely (i) that involving effects of fitting nuisance parameters and leading to the modified profile likelihood, and (ii) another part pertaining to limitation in adjusted information. Generally, each of these involves a first-order adjustment depending on the reference set. However, we show that, for some important settings, likelihood-irrelevant model specifications have a second-order effect on both of these adjustments; this result includes specification of the censoring model for survival data. On the other hand, for sequential experiments the likelihood-irrelevant specification of the stopping rule has a second-order effect on adjustment (i) but a first-order effect on adjustment (ii). These matters raise the issue of what are 'ideal' frequentist inferences, since consideration of 'exact' frequentist inferences will not suffice. We indicate that to second order ideal frequentist inferences may be based on the distribution of the ordinary likelihood ratio statistic, without commonly considered adjustments thereto. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
425
438
http://hdl.handle.net/10.1093/biomet/93.2.425
text/html
Access to full text is restricted to subscribers.
Donald A. Pierce
Ruggero Bellio
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:957-9642013-03-04RePEc:oup:biomet
article
The optimal confidence region for a random parameter
Suppose that, under a two-level hierarchical model, the distribution of the vector of random parameters is known or can be estimated well. The data are generated via a fixed, but unobservable, realisation of the vector. We derive the smallest confidence region for a specific component of this random vector under a joint Bayesian/frequentist paradigm. On average this optimal region can be much smaller than the corresponding Bayesian highest posterior density region. The new estimation procedure is especially appealing when one deals with data generated under a highly parallel structure. The new proposal is illustrated with a dataset from a multi-centre clinical study and also with one from a typical microarray experiment. The performance of our procedure is examined via simulation studies. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
957
964
http://hdl.handle.net/10.1093/biomet/92.4.957
text/html
Access to full text is restricted to subscribers.
Hajime Uno
Lu Tian
L. J. Wei
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:457-4612013-03-04RePEc:oup:biomet
article
The sampling properties of conditional independence graphs for structural vector autoregressions
Structural vector autoregressions allow contemporaneous series dependence and assume errors with no contemporaneous correlation. Models of this form, that also have a recursive structure, can be described by a directed acyclic graph. An important tool for identification of these models is the conditional independence graph constructed from the contemporaneous and lagged values of the process. We determine the large-sample properties of statistics used to test for the presence of links in this graph. A simple example illustrates how these results may be applied. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
457
461
Marco Reale
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:37-502013-03-04RePEc:oup:biomet
article
Partial and latent ignorability in missing-data problems
When an assumption of missing at random is untenable, it becomes necessary to model missing-data indicators, which carry information about the parameters of the complete-data population. Within a given application, however, researchers may believe that some aspects of missingness are ignorable but others are not. We argue that there are two different ways to formalize the notion that only part of the missingness is ignorable. These approaches correspond to assumptions that we call partially missing at random and latently missing at random. We explain these concepts and apply them in a latent-class analysis of survey questions with item nonresponse. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
37
50
http://hdl.handle.net/10.1093/biomet/asn069
application/pdf
Access to full text is restricted to subscribers.
Ofer Harel
Joseph L. Schafer
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:383-3922013-03-04RePEc:oup:biomet
article
Multimodality of the likelihood in the bivariate seemingly unrelated regressions model
We analyse the simplest two-equation seemingly unrelated regressions model and demonstrate that its likelihood may have up to five stationary points, and thus there may be up to three local modes. Consequently the estimates obtained via iterative estimation methods may depend on starting values. We further show that the probability of multimodality vanishes asymptotically. Monte Carlo simulations suggest that multimodality rarely occurs if the seemingly unrelated regressions model is true, but can become more frequent if the model is misspecified. The existence of multimodality in the likelihood for seemingly unrelated regressions models contradicts several claims in the literature. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
383
392
Mathias Drton
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:277-2912013-03-04RePEc:oup:biomet
article
Gamma frailty transformation models for multivariate survival times
We propose a class of transformation models for multivariate failure times. The class of transformation models generalizes the usual gamma frailty model and yields a marginally linear transformation model for each failure time. Nonparametric maximum likelihood estimation is used for inference. The maximum likelihood estimators for the regression coefficients are shown to be consistent and asymptotically normal, and their asymptotic variances attain the semiparametric efficiency bound. Simulation studies show that the proposed estimation procedure provides asymptotically efficient estimates and yields good inferential properties for small sample sizes. The method is illustrated using data from a cardiovascular study. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
277
291
http://hdl.handle.net/10.1093/biomet/asp008
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Qingxia Chen
Joseph G. Ibrahim
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:929-9412013-03-04RePEc:oup:biomet
article
A paradox concerning nuisance parameters and projected estimating functions
This paper is concerned with a paradox associated with parameter estimation in the presence of nuisance parameters. In a statistical model with unknown nuisance parameters, the efficiency of an estimator of a parameter usually increases when the nuisance parameters are known. However the opposite phenomenon can sometimes occur. In this paper, we elucidate the occurrence of this paradox by examining estimating functions. In particular, we focus on the projected estimating function, which is defined by the projection of the score function on to a given estimating function. A sufficient condition for the paradox to occur is the orthogonality of the two components of the projected estimating functions corresponding to parameters of interest and nuisance parameters. In addition, a numerical assessment is conducted in the context of a simple model to investigate the improvement of the asymptotic efficiency of estimators. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
929
941
http://hdl.handle.net/10.1093/biomet/91.4.929
text/html
Access to full text is restricted to subscribers.
Masayuki Henmi
Shinto Eguchi
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:399-4102013-03-04RePEc:oup:biomet
article
Optimal testing of multiple hypotheses with common effect direction
We present a theoretical basis for testing related endpoints. Typically, it is known how to construct tests of the individual hypotheses, but not how to combine them into a multiple test procedure that controls the familywise error rate. Using the closure method, we emphasize the role of consonant procedures, from an interpretive as well as a theoretical viewpoint. Surprisingly, even if each intersection test has an optimality property, the overall procedure obtained by applying closure to these tests may be inadmissible. We introduce a new procedure, which is consonant and has a maximin property under the normal model. The results are then applied to PROactive, a clinical trial designed to investigate the effectiveness of a glucose-lowering drug on macrovascular outcomes among patients with type 2 diabetes. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
399
410
http://hdl.handle.net/10.1093/biomet/asp006
application/pdf
Access to full text is restricted to subscribers.
Richard M. Bittman
Joseph P. Romano
Carlos Vallarino
Michael Wolf
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:777-7902013-03-04RePEc:oup:biomet
article
Observation-driven models for Poisson counts
This paper is concerned with a general class of observation-driven models for time series of counts whose conditional distributions given past observations and explanatory variables follow a Poisson distribution. These models provide a flexible framework for modelling a wide range of dependence structures. Conditions for stationarity and ergodicity of these processes are established from which the large-sample properties of the maximum likelihood estimators can be derived. Simulations are provided to give additional insight into the finite-sample behaviour of the estimators. Finally an application to a regression model for daily counts of asthma presentations at a Sydney hospital is described. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
777
790
Richard A. Davis
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:917-9312013-03-04RePEc:oup:biomet
article
Additive hazards models with latent treatment effectiveness lag time
In many clinical trials for evaluating treatment efficacy, it is believed that there may exist latent treatment effectiveness lag times after which a medical treatment procedure or chemical compound would be in full effect. In this paper, semiparametric regression models are proposed and studied for estimating the treatment effect accounting for such latent lag times. The new models take advantage of the invariant property of the additive hazards model in marginalising over an additive latent variable; parameters in the models are thus easily estimated and interpreted, while the flexibility of not having to specify the baseline hazard function is preserved. Monte Carlo simulation studies demonstrate the appropriateness of the proposed semiparametric estimation procedure. The methodology is applied to data collected in a randomised clinical trial, which evaluates the efficacy of biodegradable carmustine polymers for treatment of recurrent brain tumours. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
917
931
Y. Q. Chen
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:411-4212013-03-04RePEc:oup:biomet
article
Goodness-of-fit test for complete spatial randomness against mixtures of regular and clustered spatial point processes
A goodness-of-fit test statistic for spatial point processes is proposed and shown to have an asymptotic chi-squared distribution if the underlying point process is Poisson. Simulations demonstrate that the test, when testing for complete spatial randomness, is more sensitive to mixtures of regular and clustered point processes than the tests using the nearest neighbour distance distribution, the second- or third-order characteristics. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
411
421
P. Grabarnik
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:721-7342013-03-04RePEc:oup:biomet
article
Semiparametric model-based inference in the presence of missing responses
We consider a semiparametric model that parameterizes the conditional density of the response, given covariates, but allows the marginal distribution of the covariates to be completely arbitrary. Responses may be missing. A likelihood-based imputation estimator and a semi-empirical-likelihood-based estimator for the parameter vector describing the conditional density are defined and proved to be asymptotically normal. Semi-empirical loglikelihood functions for the parameter vector and the response mean are derived. It is shown that the two semi-empirical loglikelihood functions are distributed asymptotically as weighted χ-super-2 and scaled χ-super-2, respectively. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
721
734
http://hdl.handle.net/10.1093/biomet/asn032
application/pdf
Access to full text is restricted to subscribers.
Qihua Wang
Pengjie Dai
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:763-7752013-03-04RePEc:oup:biomet
article
Analysing panel count data with informative observation times
In this paper, we study panel count data with informative observation times. We assume nonparametric and semiparametric proportional rate models for the underlying event process, where the form of the baseline rate function is left unspecified and a subject-specific frailty variable inflates or deflates the rate function multiplicatively. The proposed models allow the event processes and observation times to be correlated through their connections with the unobserved frailty; moreover, the distributions of both the frailty variable and observation times are considered as nuisance parameters. The baseline rate function and the regression parameters are estimated by maximising a conditional likelihood function of observed event counts and solving estimation equations. Large-sample properties of the proposed estimators are studied. Numerical studies demonstrate that the proposed estimation procedures perform well for moderate sample sizes. An application to a bladder tumour study is presented. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
763
775
http://hdl.handle.net/10.1093/biomet/93.4.763
text/html
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Mei-Cheng Wang
Ying Zhang
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:375-3882013-03-04RePEc:oup:biomet
article
Local multiple imputation
Dealing with missing data via parametric multiple imputation methods usually implies stating several strong assumptions both about the distribution of the data and about underlying regression relationships. If such parametric assumptions do not hold, the multiply imputed data are not appropriate and might produce inconsistent estimators and thus misleading results. In this paper, a fully nonparametric and a semiparametric imputation method are studied, both based on local resampling principles. It is shown that the final estimator, based on these local imputations, is consistent under fewer or no parametric assumptions. Asymptotic expressions for bias, variance and mean squared error are derived, showing the theoretical impact of the different smoothing parameters. Simulations illustrate the usefulness and applicability of the method. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
375
388
Marc Aerts
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:23-402013-03-04RePEc:oup:biomet
article
Reference priors for discrete graphical models
The combination of graphical models and reference analysis represents a powerful tool for Bayesian inference in highly multivariate settings. It is typically difficult to derive reference priors in complex problems. In this paper we present a suitable mixed parameterisation for a discrete decomposable graphical model and derive the corresponding reference prior. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
23
40
http://hdl.handle.net/10.1093/biomet/93.1.23
text/html
Access to full text is restricted to subscribers.
Guido Consonni
Valentina Leucari
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:303-3162013-03-04RePEc:oup:biomet
article
Variable selection for multivariate failure time data
In this paper, we propose a penalised pseudo-partial likelihood method for variable selection with multivariate failure time data with a growing number of regression coefficients. Under certain regularity conditions, we show the consistency and asymptotic normality of the penalised likelihood estimators. We further demonstrate that, for certain penalty functions with proper choices of regularisation parameters, the resulting estimator can correctly identify the true model, as if it were known in advance. Based on a simple approximation of the penalty function, the proposed method can be easily carried out with the Newton--Raphson algorithm. We conduct extensive Monte Carlo simulation studies to assess the finite sample performance of the proposed procedures. We illustrate the proposed method by analysing a dataset from the Framingham Heart Study. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
303
316
http://hdl.handle.net/10.1093/biomet/92.2.303
text/html
Access to full text is restricted to subscribers.
Jianwen Cai
Jianqing Fan
Runze Li
Haibo Zhou
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:451-4672013-03-04RePEc:oup:biomet
article
Model diagnosis for parametric regression in high-dimensional spaces
We study tools for checking the validity of a parametric regression model. When the dimension of the regressors is large, many of the existing tests face the curse of dimensionality or require some ordering of the data. Our tests are based on the residual empirical process marked by proper functions of the regressors. They are able to detect local alternatives converging to the null at parametric rates. Parametric and nonparametric alternatives are considered. In the latter case, through a proper principal component decomposition, we are able to derive smooth directional tests which are asymptotically distribution-free under the null model. The new tests take into account precisely the 'geometry of the model'. A simulation study is carried through and an application to a real dataset is illustrated. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
451
467
http://hdl.handle.net/10.1093/biomet/asm095
application/pdf
Access to full text is restricted to subscribers.
W. Stute
W. L. Xu
L. X. Zhu
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:209-2222013-03-04RePEc:oup:biomet
article
Bürmann expansion and test for additivity
We propose a Lagrange multiplier test for additivity based on the Bürmann expansion of a conditional mean function. The asymptotic null distribution of the test is shown to be χ-super-2, under some regularity conditions. In contrast, the Lagrange multiplier test proposed by Chen et al. (1995) is based on the Volterra expansion of the conditional mean function. We discuss some desirable advantages of the Bürmann expansion over the Volterra expansion for nonlinear time series modelling. We also report an empirical study which shows that, in terms of empirical power, the Lagrange multiplier test motivated by the Bürmann expansion outperforms the test of Chen et al. (1995) for the cases for which the Lagrange multiplier test is designed. For other cases for which none of the tests is specifically designed, the empirical powers of the two tests are comparable. Finally, we illustrate the use of the Lagrange multiplier test with a blowfly experimental system. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
209
222
K. S. Chan
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:283-2982013-03-04RePEc:oup:biomet
article
A flexible additive multiplicative hazard model
We present a new additive-multiplicative hazard model which consists of two components. The first component contains additive covariate effects through an additive Aalen model while the second component contains multiplicative covariate effects through a Cox regression model. The Aalen model allows for time-varying covariate effects, while the Cox model allows only a common time-dependence through the baseline. Approximate maximum likelihood estimators are derived by solving the simultaneous score equations for the nonparametric and parametric components of the model. The suggested estimators are provided with large-sample properties and are shown to be efficient. The efficient estimators depend, however, on some estimated weights. We therefore also consider unweighted estimators and describe their large-sample properties. We finally extend the model to allow for time-varying covariate effects in the multiplicative part of the model as well. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
283
298
Torben Martinussen
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:985-9912013-03-04RePEc:oup:biomet
article
Importance Sampling Via the Estimated Sampler
Monte Carlo importance sampling for evaluating numerical integration is discussed. We consider a parametric family of sampling distributions and propose the use of the sampling distribution estimated by maximum likelihood. The proposed method of importance sampling using the estimated sampling distribution is shown to improve the asymptotic variance of the ordinary method using the true sampling distribution. The argument is closely related to the discussion of the paradox in Henmi & Eguchi (2004). We focus on a condition under which the estimated integration value obtained by the proposed method has asymptotic zero variance. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
985
991
http://hdl.handle.net/10.1093/biomet/asm076
application/pdf
Access to full text is restricted to subscribers.
Masayuki Henmi
Ryo Yoshida
Shinto Eguchi
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:705-7222013-03-04RePEc:oup:biomet
article
Empirical Bayes block shrinkage of wavelet coefficients via the noncentral χ-super-2 distribution
Empirical Bayes approaches to the shrinkage of empirical wavelet coefficients have generated considerable interest in recent years. Much of the work to date has focussed on shrinkage of individual wavelet coefficients in isolation. In this paper we propose an empirical Bayes approach to simultaneous shrinkage of wavelet coefficients in a block, based on the block sum of squares. Our approach exploits a useful identity satisfied by the noncentral χ-super-2 density and provides some tractable Bayesian block shrinkage procedures. Our numerical results indicate that the new procedures perform very well. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
705
722
http://hdl.handle.net/10.1093/biomet/93.3.705
text/html
Access to full text is restricted to subscribers.
Xue Wang
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:873-8922013-03-04RePEc:oup:biomet
article
A General Approach to the Predictability Issue in Survival Analysis with Applications
Very often in survival analysis one has to study martingale integrals where the integrand is not predictable and where the counting process theory of martingales is not directly applicable, as for example in nonparametric and semiparametric applications where the integrand is based on a pilot estimate. We call this the predictability issue in survival analysis. The problem has been resolved by approximations of the integrand by predictable functions which have been justified by ad hoc procedures. We present a general approach to the solution of this problem. The usefulness of the approach is shown in three applications. In particular, we argue that earlier ad hoc procedures do not work in higher-dimensional smoothing problems in survival analysis. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
873
892
http://hdl.handle.net/10.1093/biomet/asm062
application/pdf
Access to full text is restricted to subscribers.
Enno Mammen
Jens Perch Nielsen
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:195-2092013-03-04RePEc:oup:biomet
article
Generalised likelihood ratio tests for spectral density
There are few techniques available for testing whether or not a family of parametric time series models fits a set of data reasonably well without serious restrictions on the forms of alternative models. In this paper, we consider generalised likelihood ratio tests of whether or not the spectral density function of a stationary time series admits certain parametric forms. We propose a bias correction method for the generalised likelihood ratio test of Fan et al. (2001). In particular, our methods can be applied to test whether or not a residual series is white noise. Sampling properties of the proposed tests are established. A bootstrap approach is proposed for estimating the null distribution of the test statistics. Simulation studies investigate the accuracy of the proposed bootstrap estimate and compare the power of the various ways of constructing the generalised likelihood ratio tests as well as some classic methods like the Cramér--von Mises and Ljung--Box tests. Our results favour the newly proposed bias reduction method using the local likelihood estimator. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
195
209
Jianqing Fan
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:641-6542013-03-04RePEc:oup:biomet
article
Confidence intervals in group sequential trials with random group sizes and applications to survival analysis
A new ordering scheme for defining quantiles of the multivariate distribution of a stopping time and a stopped stochastic process is introduced. This ordering scheme is used in conjunction with resampling methods to construct confidence intervals for a population mean following a group sequential test with random group sizes, and for the regression parameter of a proportional hazards model following a time-sequential clinical trial with censored survival data. It is shown that this approach resolves the long-standing difficulties in inference due to two different time scales in time-sequential trials, and that the confidence intervals thus constructed have coverage probabilities close to the nominal values and provide marked improvements over those based on alternative ordering schemes and normal approximations. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
641
654
http://hdl.handle.net/10.1093/biomet/93.3.641
text/html
Access to full text is restricted to subscribers.
Tze Leung Lai
Wenzhi Li
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:732-7402013-03-04RePEc:oup:biomet
article
Rank-based regression with repeated measurements data
A rank-based regression method is proposed for repeated measurements data. It is a generalisation of the classical Wilcoxon--Mann--Whitney rank statistic for independent observations. The method is valid under a weak condition on the error terms that can accommodate certain heteroscedasticity and within-subject dependency. The asymptotic normality of the proposed estimator is proved using empirical process theory. A variance estimator, shown to be consistent, is also constructed. The proposed method is illustrated using data from a clinical trial on treating labour pain. Robustness and efficiency of the estimator are demonstrated in simulation studies. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
732
740
Sin-Ho Jung
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:769-7862013-03-04RePEc:oup:biomet
article
Bayesian Nonparametric Estimation of the Probability of Discovering New Species
We consider the problem of evaluating the probability of discovering a certain number of new species in a new sample of population units, conditional on the number of species recorded in a basic sample. We use a Bayesian nonparametric approach. The different species proportions are assumed to be random and the observations from the population exchangeable. We provide a Bayesian estimator, under quadratic loss, for the probability of discovering new species which can be compared with well-known frequentist estimators. The results we obtain are illustrated through a numerical example and an application to a genomic dataset concerning the discovery of new genes by sequencing additional single-read sequences of cDNA fragments. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
769
786
http://hdl.handle.net/10.1093/biomet/asm061
application/pdf
Access to full text is restricted to subscribers.
Antonio Lijoi
Ramsés H. Mena
Igor Prünster
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:167-1832013-03-04RePEc:oup:biomet
article
Inference for clustered data using the independence loglikelihood
We use the properties of independence estimating equations to adjust the 'independence' loglikelihood function in the presence of clustering. The proposed adjustment relies on the robust sandwich estimator of the parameter covariance matrix, which is easily calculated. The methodology competes favourably with established techniques based on independence estimating equations; we provide some insight as to why this is so. The adjustment is applied to examples relating to the modelling of wind speed in Europe and annual maximum temperatures in the U.K. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
167
183
http://hdl.handle.net/10.1093/biomet/asm015
application/pdf
Access to full text is restricted to subscribers.
Richard E. Chandler
Steven Bate
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:961-9772013-03-04RePEc:oup:biomet
article
Estimating the false discovery rate using the stochastic approximation algorithm
Testing of multiple hypotheses involves statistics that are strongly dependent in some applications, but most work on this subject is based on the assumption of independence. We propose a new method for estimating the false discovery rate of multiple hypothesis tests, in which the density of test scores is estimated parametrically by minimizing the Kullback--Leibler distance between the unknown density and its estimator using the stochastic approximation algorithm, and the false discovery rate is estimated using the ensemble averaging method. Our method is applicable under general dependence between test statistics. Numerical comparisons between our method and several competitors, conducted on simulated and real data examples, show that our method achieves more accurate control of the false discovery rate in almost all scenarios. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
961
977
http://hdl.handle.net/10.1093/biomet/asn036
application/pdf
Access to full text is restricted to subscribers.
Faming Liang
Jian Zhang
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:891-8982013-03-04RePEc:oup:biomet
article
Minimum aberration construction results for nonregular two-level fractional factorial designs
Nonregular two-level fractional factorial designs are designs which cannot be specified in terms of a set of defining contrasts. The aliasing properties of nonregular designs can be compared by using a generalisation of the minimum aberration criterion called minimum $G_2$-aberration. Until now, the only nontrivial designs that are known to have minimum $G_2$-aberration are designs for n runs and m >= n - 5 factors. In this paper, a number of construction results are presented which allow minimum $G_2$-aberration designs to be found for many of the cases with n = 16, 24, 32, 48, 64 and 96 runs and m >= n/2 - 2 factors. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
891
898
Neil A. Butler
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:717-7232013-03-04RePEc:oup:biomet
article
Estimating subject-specific survival functions under the accelerated failure time model
We use the semiparametric accelerated failure time model to predict the survival function and its related quantities for future subjects with a given set of covariates. We derive the large-sample distribution for the subject-specific cumulative hazard function estimate. We then propose a simple resampling technique for constructing pointwise confidence intervals and simultaneous bands for the corresponding survival function and its quantile function over a properly selected time interval. The new proposals are illustrated with the Mayo primary biliary cirrhosis data. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
717
723
Yuhyun Park
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:303-3132013-03-04RePEc:oup:biomet
article
Linear life expectancy regression with censored data
In the statistical literature, life expectancy is usually characterised by the mean residual life function. Regression models are thus needed to study the association between the mean residual life functions and their covariates. In this paper, we consider a linear mean residual life model and develop inference procedures in the presence of potential censoring. The new model and inference procedures are applied to the Stanford heart transplant data. Semiparametric efficiency calculations and information bounds are also considered. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
303
313
http://hdl.handle.net/10.1093/biomet/93.2.303
text/html
Access to full text is restricted to subscribers.
Y. Q. Chen
S. Cheng
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:237-2422013-03-04RePEc:oup:biomet
article
A note on cause-specific residual life
In medical research, investigators often wish to characterize the distributions of remaining lifetimes. While nonparametric analyses of residual life distributions have been widely studied with independently right-censored data, residual life analysis has not been examined in the competing risks setting, with multiple, potentially dependent, failure types. We define the cause-specific residual life distribution as the residual cumulative incidence function conditionally on survival to a given time. Because of the improper form of the cause-specific distribution, the mean cause-specific residual lifetime does not exist, theoretically. We develop nonparametric inferences for the cause-specific residual life function and its corresponding quantiles, which may exist. Theoretical justification, including uniform consistency and weak convergence, is established. Simulation studies and a breast cancer data analysis demonstrate the practical utility of the methods. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
237
242
http://hdl.handle.net/10.1093/biomet/asn063
application/pdf
Access to full text is restricted to subscribers.
J.-H. Jeong
J. P. Fine
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:111-1282013-03-04RePEc:oup:biomet
article
Varying-coefficient models and basis function approximations for the analysis of repeated measurements
A global smoothing procedure is developed using basis function approximations for estimating the parameters of a varying-coefficient model with repeated measurements. Inference procedures based on a resampling subject bootstrap are proposed to construct confidence regions and to perform hypothesis testing. Conditional biases and variances of our estimators and their asymptotic consistency are developed explicitly. Finite sample properties of our procedures are investigated through a simulation study. Application of the proposed approach is demonstrated through an example in epidemiology. In contrast to the existing methods, this approach applies whether or not the covariates are time-invariant and does not require binning of the data when observations are sparse at distinct observation times. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
111
128
Jianhua Z. Huang
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:87-992013-03-04RePEc:oup:biomet
article
The unobserved heterogeneity distribution in duration analysis
In a large class of hazard models with proportional unobserved heterogeneity, the distribution of the heterogeneity among survivors converges to a gamma distribution. This convergence is often rapid. We derive this result as a general result for exponential mixtures and explore its implications for the specification and empirical analysis of univariate and multivariate duration models. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
87
99
http://hdl.handle.net/10.1093/biomet/asm013
application/pdf
Access to full text is restricted to subscribers.
Jaap H. Abbring
Gerard J. Van Den Berg
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:667-6782013-03-04RePEc:oup:biomet
article
Additive partial linear models with measurement errors
We consider statistical inference for additive partial linear models when the linear covariate is measured with error. We propose attenuation-to-correction and simulation-extrapolation (SIMEX) estimators of the parameter of interest. It is shown that the first resulting estimator is asymptotically normal and requires no undersmoothing. This is an advantage of our estimator over existing backfitting-based estimators for semiparametric additive models which require undersmoothing of the nonparametric component in order for the estimator of the parametric component to be root-n consistent. This feature stems from a decrease of the bias of the resulting estimator, which is appropriately derived using a profile procedure. A similar characteristic in semiparametric partially linear models was obtained by Wang et al. (2005). We also discuss the asymptotics of the proposed SIMEX approach. Finite-sample performance of the proposed estimators is assessed by simulation experiments. The proposed methods are applied to a dataset from a semen study. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
667
678
http://hdl.handle.net/10.1093/biomet/asn024
application/pdf
Access to full text is restricted to subscribers.
Hua Liang
Sally W. Thurston
David Ruppert
Tatiyana Apanasovich
Russ Hauser
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:213-2202013-03-04RePEc:oup:biomet
article
Fast block variance estimation procedures for inhomogeneous spatial point processes
We introduce two new variance estimation procedures that use non-overlapping and overlapping blocks, respectively. The non-overlapping blocks estimator can be viewed as the limit of the thinned block bootstrap estimator recently proposed in Guan & Loh (2007), by letting the number of thinned processes and bootstrap samples therein both increase to infinity. The non-overlapping blocks estimator can be obtained quickly since it does not require any thinning or bootstrap steps, and it is more stable. The overlapping blocks estimator further improves the performance of the non-overlapping blocks with a modest increase in computation time. A simulation study demonstrates the superiority of the proposed estimators over the thinned block bootstrap estimator. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
213
220
http://hdl.handle.net/10.1093/biomet/asn072
application/pdf
Access to full text is restricted to subscribers.
Yongtao Guan
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:849-8622013-03-04RePEc:oup:biomet
article
A semiparametric changepoint model
A semiparametric changepoint model is considered and the empirical likelihood method is applied to detect the change from a distribution to a weighted distribution in a sequence of independent random variables. The maximum likelihood changepoint estimator is shown to be consistent. The empirical likelihood ratio test statistic is proved to have the same limit null distribution as that with parametric models. A data-based test for the validity of the models is also proposed. Simulation shows the sensitivity and robustness of the semiparametric approach. The methods are applied to some classical datasets such as the Nile River data and stock price data. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
849
862
http://hdl.handle.net/10.1093/biomet/91.4.849
text/html
Access to full text is restricted to subscribers.
Zhong Guan
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:95-1062013-03-04RePEc:oup:biomet
article
Bayesian-inspired minimum aberration two- and four-level designs
Motivated by a Bayesian framework, we propose a new minimum aberration-type criterion for designing experiments with two- and four-level factors. The Bayesian approach helps in overcoming the ad hoc nature of effect ordering in the existing minimum aberration-type criteria. The approach is also capable of distinguishing between qualitative and quantitative factors. Numerous examples are given to demonstrate its advantages. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
95
106
http://hdl.handle.net/10.1093/biomet/asn062
application/pdf
Access to full text is restricted to subscribers.
V. Roshan Joseph
Mingyao Ai
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:633-6452013-03-04RePEc:oup:biomet
article
A construction principle for multivariate extreme value distributions
We present a construction principle for the spectral density of a multivariate extreme value distribution. It generalizes the pairwise beta model introduced in the literature recently and may be used to obtain new parametric models from lower dimensional spectral densities. We illustrate the flexibility of this new class of models and apply it to a wind speed dataset. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
633
645
http://hdl.handle.net/10.1093/biomet/asr034
application/pdf
Access to full text is restricted to subscribers.
F. Ballani
M. Schlather
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:489-5072013-03-04RePEc:oup:biomet
article
Diagnostic measures for empirical likelihood of general estimating equations
We develop diagnostic measures for assessing the influence of individual observations when using empirical likelihood with general estimating equations, and we use these measures to construct goodness-of-fit statistics for testing possible misspecification in the estimating equations. Our diagnostics include case-deletion measures, local influence measures and pseudo-residuals. Our goodness-of-fit statistics include the sum of local influence measures and the processes of pseudo-residuals. Simulation studies are conducted to evaluate our methods, and real datasets are analyzed to illustrate the use of our diagnostic measures and goodness-of-fit statistics. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
489
507
http://hdl.handle.net/10.1093/biomet/asm094
application/pdf
Access to full text is restricted to subscribers.
Hongtu Zhu
Joseph G. Ibrahim
Niansheng Tang
Heping Zhang
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:455-4632013-03-04RePEc:oup:biomet
article
A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations
We introduce a family of multivariate binary distributions with a certain conditional linear property. This family is particularly useful for efficient and easy simulation of correlated binary variables with a given marginal mean vector and correlation matrix. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
455
463
Bahjat F. Qaqish
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:721-7312013-03-04RePEc:oup:biomet
article
Nested orthogonal array-based Latin hypercube designs
We propose two methods for constructing a new type of design, called a nested orthogonal array-based Latin hypercube design, intended for multi-fidelity computer experiments. Such designs are two nested space-filling designs in which the large design achieves stratification in both bivariate and univariate margins and the small design achieves stratification in univariate margins. These designs have better space-filling properties than nested Latin hypercube designs in which the large design possesses uniformity in univariate margins only. The first method expands an ordinary Latin hypercube design to a larger design that achieves uniformity in any one- or two-dimensional projection. The second method uses an orthogonal array with strength two to simultaneously construct a pair of nested orthogonal array-based Latin hypercube designs. Examples are given to illustrate the proposed methods. Sampling properties of the proposed designs are derived. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
721
731
http://hdl.handle.net/10.1093/biomet/asr028
application/pdf
Access to full text is restricted to subscribers.
Xu He
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:738-7422013-03-04RePEc:oup:biomet
article
Measurement exchangeability and normal one-factor models
The one-factor model restricts the covariance structure of the observed variables on the basis of assumptions about their relationship with an unobserved variable. It is hard to justify these assumptions on substantive or empirical grounds. In this paper, alternative measurement models are proposed that are based on exchangeability of variables after admissible scale transformations. They provide an alternative interpretation of the model and do not involve unobserved variables. They also yield a new one-factor model for sum scales. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
738
742
Henk Kelderman
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:487-4932013-03-04RePEc:oup:biomet
article
Some results on D-optimal designs for nonlinear models with applications
Sufficient conditions are established for the locally D-optimal design for a nonlinear model to have a minimal number of support points. The conditions are applied to obtain locally D-optimal designs for a one-compartment pharmacokinetic model and a Poisson regression model. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
487
493
http://hdl.handle.net/10.1093/biomet/asp004
application/pdf
Access to full text is restricted to subscribers.
Gang Li
Dibyen Majumdar
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:952-9572013-03-04RePEc:oup:biomet
article
On an exact probability matching property of right-invariant priors
The paper considers priors which are right invariant with respect to the Haar measure. It is shown that the posterior coverage probabilities of certain invariant Bayesian predictive regions exactly match the corresponding frequentist probabilities. Several examples are given to illustrate the main result. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
952
957
Thomas A. Severini
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:15-252013-03-04RePEc:oup:biomet
article
Equivalence of prospective and retrospective models in the Bayesian analysis of case-control studies
The natural likelihood to use for a case-control study is a 'retrospective' likelihood, i.e. a likelihood based on the probability of exposure given disease status. Prentice & Pyke (1979) showed that, when a logistic regression form is assumed for the probability of disease given exposure, the maximum likelihood estimators and asymptotic covariance matrix of the log odds ratios obtained from the retrospective likelihood are the same as those obtained from the 'prospective' likelihood, i.e. that based on probability of disease given exposure. We prove a similar result for the posterior distribution of the log odds ratios in a Bayesian analysis. This means that the Bayesian analysis of case-control studies may be done using a relatively simple model, the logistic regression model, which treats data as though generated prospectively and which does not involve nuisance parameters for the exposure distribution. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
15
25
Shaun R. Seaman
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:551-5662013-03-04RePEc:oup:biomet
article
Generalised structured models
We present a general class of nonlinear regression and time series models that we call generalised structured models. The class is a natural generalisation of generalised additive models, and it includes generalised interaction models, structured volatility models, visual GARCH, generalised autoregressive conditional heteroscedasticity, models and varying coefficient models. We discuss estimation principles including smoothing splines and a generalisation of the projection approach of Mammen et al. (1999). We finish the paper with some theoretical considerations about the asymptotic performance of the estimator for the general class of generalised structured models. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
551
566
Enno Mammen
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:65-802013-03-04RePEc:oup:biomet
article
Quasi-variances
In statistical models of dependence, the effect of a categorical variable is typically described by contrasts among parameters. For reporting such effects, quasi-variances provide an economical and intuitive method which permits approximate inference on any contrast by subsequent readers. Applications include generalised linear models, generalised additive models and hazard models. The present paper exposes the generality of quasi-variances, emphasises the need to control relative errors of approximation, gives simple methods for obtaining quasi-variances and bounds on the approximation error involved, and explores the domain of accuracy of the method. Conditions are identified under which the quasi-variance approximation is exact, and numerical work indicates high accuracy in a variety of settings. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
65
80
David Firth
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:95-1122013-03-04RePEc:oup:biomet
article
Small-area estimation based on natural exponential family quadratic variance function models and survey weights
We propose pseudo empirical best linear unbiased estimators of small-area means based on natural exponential family quadratic variance function models when the basic data consist of survey-weighted estimators of these means, area-specific covariates and certain summary measures involving the weights. We also provide explicit approximate mean squared errors of these estimators in the spirit of Prasad & Rao (1990), and these estimators can be readily evaluated. A simulation study is undertaken to evaluate the performance of the proposed inferential procedure. We also estimate the proportion of poor children in the 5--17 years age-group for the different counties in one of the states in the United States. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
95
112
Malay Ghosh
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:283-3012013-03-04RePEc:oup:biomet
article
Additive hazards Markov regression models illustrated with bone marrow transplant data
When there are covariate effects to be considered, multi-state survival analysis is dominated either by parametric Markov regression models or by semiparametric Markov regression models using Cox's (1972) proportional hazards models for transition intensities between the states. The purpose of this research work is to study alternatives to Cox's model in a general finite-state Markov process setting. We shall look at two alternative models, Aalen's (1989) nonparametric additive hazards model and Lin & Ying's (1994) semiparametric additive hazards model. The former allows the effects of covariates to vary freely over time, while the latter assumes that the regression coefficients are constant over time. With the basic tools of the product integral and the functional delta-method, we present an estimator of the transition probability matrix and develop the large-sample theory for the estimator under each of these two models. Data on 1459 HLA identical sibling transplants for acute leukaemia from the International Bone Marrow Transplant Registry serve as illustration. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
283
301
http://hdl.handle.net/10.1093/biomet/92.2.283
text/html
Access to full text is restricted to subscribers.
Youyi Shu
John P. Klein
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:751-7572013-03-04RePEc:oup:biomet
article
Efficient recursions for general factorisable models
Let n S-valued categorical variables be jointly distributed according to a distribution known only up to an unknown normalising constant. For an unnormalised joint likelihood expressible as a product of factors, we give an algebraic recursion which can be used for computing the normalising constant and other summations. A saving in computation is achieved when each factor contains a lagged subset of the components combining in the joint distribution, with maximum computational efficiency as the subsets attain their minimum size. If each subset contains at most r+1 of the n components in the joint distribution, we term this a lag-r model, whose normalising constant can be computed using a forward recursion in $O(S^{r+1})$ computations, as opposed to $O(S^{n})$ for the direct computation. We show how a lag-r model represents a Markov random field and allows a neighbourhood structure to be related to the unnormalised joint likelihood. We illustrate the method by showing how the normalising constant of the Ising or autologistic model can be computed. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
751
757
R. Reeves
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:163-1772013-03-04RePEc:oup:biomet
article
Semiparametric efficient estimation of survival distributions in two-stage randomisation designs in clinical trials with censored data
Two-stage randomisation designs are useful in the evaluation of combination therapies where patients are initially randomised to an induction therapy and then, depending upon their response and consent, are randomised to a maintenance therapy. In this paper we derive the best regular asymptotically linear estimator for the survival distribution and related quantities of treatment regimes. We propose an estimator which is easily computable and is more efficient than existing estimators. Large-sample properties of the proposed estimator are derived and comparisons with other estimators are made using simulation. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
163
177
http://hdl.handle.net/10.1093/biomet/93.1.163
text/html
Access to full text is restricted to subscribers.
Abdus S. Wahed
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:249-2492013-03-04RePEc:oup:biomet
article
'Shape, Procrustes tangent projections and bilateral symmetry'
1
2005
92
March
Biometrika
249
249
http://hdl.handle.net/10.1093/biomet/92.1.249
text/html
Access to full text is restricted to subscribers.
J. T. Kent
K. V. Mardia
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:73-842013-03-04RePEc:oup:biomet
article
Confidence regions when the Fisher information is zero
We examine the asymptotic behaviour of confidence regions in identifiable one-dimensional parametric models with smooth likelihood function and information equal to zero at a critical point of the parameter space. Confidence regions are based on inversion of the likelihood ratio test statistic and of some common forms of the score and Wald test statistics. For fixed parameter values other than the critical point, all these statistics have limiting $\chi^2_{(1)}$ distributions, but for most of them the convergence is not uniform near the critical point. When it is not, confidence regions based on inverting the tests, using the $\chi^2_{(1)}$ approximation, do not asymptotically have the nominal level. The exception to this lack of locally uniform convergence occurs with the score test standardised by expected, rather than observed, information. For the regions based on the score test standardised by observed information and on the likelihood ratio test, conservative procedures that do not rely on the $\chi^2_{(1)}$ approximation can be developed, but they are much too conservative near the critical parameter value. The regions based on the Wald tests have asymptotic level less than �, regardless of the procedure used. Our results suggest that no procedure based solely on the likelihood function will be satisfactory. Whether or not this is the case is an open problem. A simulation study illustrates the results of this paper. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
73
84
Matteo Bottai
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:179-1952013-03-04RePEc:oup:biomet
article
A shrinkage estimator for spectral densities
We propose a shrinkage estimator for spectral densities based on a multilevel normal hierarchical model. The first level captures the sampling variability via a likelihood constructed using the asymptotic properties of the periodogram. At the second level, the spectral density is shrunk towards a parametric time series model. To avoid selecting a particular parametric model for the second level, a third level is added which induces an estimator that averages over a class of parsimonious time series models. The estimator derived from this model, the model averaged shrinkage estimator, is consistent, is shown to be highly competitive with other spectral density estimators via simulations, and is computationally inexpensive. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
179
195
http://hdl.handle.net/10.1093/biomet/93.1.179
text/html
Access to full text is restricted to subscribers.
Carsten H. Botts
Michael J. Daniels
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:599-6142013-03-04RePEc:oup:biomet
article
Testing parametric assumptions of trends of a nonstationary time series
The paper considers testing whether the mean trend of a nonstationary time series is of certain parametric forms. A central limit theorem for the integrated squared error is derived, and a hypothesis-testing procedure is proposed. The method is illustrated in a simulation study, and is applied to assess the mean pattern of lifetime-maximum wind speeds of global tropical cyclones from 1981 to 2006. We also revisit the trend pattern in the central England temperature series. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
599
614
http://hdl.handle.net/10.1093/biomet/asr017
application/pdf
Access to full text is restricted to subscribers.
Ting Zhang
Wei Biao Wu
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:331-3432013-03-04RePEc:oup:biomet
article
On semiparametric transformation cure models
A general class of semiparametric transformation cure models is studied for the analysis of survival data with long-term survivors. It combines a logistic regression for the probability of event occurrence with the class of transformation models for the time of occurrence. Included as special cases are the proportional hazards cure model (Farewell, 1982; Kuk & Chen, 1992; Sy & Taylor, 2000; Peng & Dear, 2000) and the proportional odds cure model. Generalised estimating equations are proposed for parameter estimation. It is shown that the resulting estimators are asymptotically normal, with variance-covariance matrix that has a closed form and can be consistently estimated by the usual plug-in method. Simulation studies show that the proposed approach is appropriate for practical use. An application to data from a breast cancer study is given to illustrate the methodology. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
331
343
Wenbin Lu
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:647-6622013-03-04RePEc:oup:biomet
article
Marginal methods for correlated binary data with misclassified responses
Misclassification is a longstanding concern in medical research. Although there has been much research concerning error-prone covariates, relatively little work has been directed to problems with response variables subject to error. In this paper we focus on misclassification in clustered or longitudinal outcomes. We propose marginal analysis methods to handle binary responses which are subject to misclassification. The proposed methods have several appealing features, including simultaneous inference for both marginal mean and association parameters, and they can handle misclassified responses for a number of practical scenarios, such as the case with a validation subsample or replicates. Furthermore, the proposed methods are robust to model misspecification in the sense that no full distributional assumptions are required. Numerical studies demonstrate satisfactory performance of the proposed methods under a variety of settings. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
647
662
http://hdl.handle.net/10.1093/biomet/asr035
application/pdf
Access to full text is restricted to subscribers.
Zhijian Chen
Grace Y. Yi
Changbao Wu
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:533-5492013-03-04RePEc:oup:biomet
article
Modified profile likelihoods in models with stratum nuisance parameters
It is well known, at least through many examples, that when there are many nuisance parameters modified profile likelihoods often perform much better than the profile likelihood. Ordinary asymptotics almost totally fail to deal with this issue. For this reason, we study asymptotic properties of the profile and modified profile likelihoods in models for stratified data in a two-index asymptotics setting. This means that both the sample size of the strata, m, and the dimension of the nuisance parameter, q, may increase to infinity. It is shown that in this asymptotic setting modified profile likelihoods give improvements, with respect to the profile likelihood, in terms of consistency of estimators and of asymptotic distributional properties. In particular, the modified profile likelihood based statistics have the usual asymptotic distribution, provided that $1/m = o(q^{-1/3})$, while the analogous condition for the profile likelihood is $1/m = o(q^{-1})$. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
533
549
N. Sartori
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-10262013-03-04RePEc:oup:biomet
article
Amendments and Corrections
Arising from an omitted term in a calculation in the Appendix, variance formulae in the paper should be adjusted. In particular, the constants in the numerators of equations (2·4) and (2·15) should be 6 rather than 18. Variances are, however, still higher than in the case of least-squares estimators. The changes are implied by the following corrections to the Appendix. On p. 423, $2c\delta\Delta'_{\cos}(\omega^{(k)})$ should be included within braces on lines 11 and 17, and $2c\delta\Delta'_{\sin}(\omega^{(k)})$ should be added within braces on lines 12 and 18, leading to the extra term $2cm^{-3/2}\{\Delta'_{\sin}(\omega^{(k)})\Gamma_{\sin}^{(k)} + \Delta'_{\cos}(\omega^{(k)})\Gamma_{\cos}^{(k)}\}$ on line 21. We are grateful to Barry Quinn for drawing our attention to this error. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1026
http://hdl.handle.net/10.1093/biomet/93.4.1025-b
text/html
Access to full text is restricted to subscribers.
Peter Hall
Ming Li
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:597-6112013-03-04RePEc:oup:biomet
article
Inference about a secondary process following a sequential trial
We consider the following sequential testing problem. A group-sequential or fully-sequential test is carried out for a primary parameter, using a score process or an effective score process to eliminate nuisance parameters. After stopping, the possibility of additional parameters is considered, and appropriate tests and estimators are desired that recognise the sequential stopping rule. We formulate an asymptotic multi-dimensional Gaussian process form of such problems, and then construct tests and confidence procedures. Optimality conditions are given, and an example is summarised. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
597
611
W. J. Hall
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:37-472013-03-04RePEc:oup:biomet
article
Graphical identifiability criteria for causal effects in studies with an unobserved treatment/response variable
We consider the problem of using data in studies with an unobserved treatment/response variable in order to evaluate average causal effects, when cause-effect relationships between variables can be described by a directed acyclic graph and the corresponding recursive factorization of a joint distribution. The paper proposes graphical criteria to test whether average causal effects are identifiable even if a treatment/response variable is unobserved. If the answer is affirmative, we provide further formulations for average causal effects from the observed data. The graphical criteria enable us to evaluate average causal effects when it is difficult to observe a treatment/response variable. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
37
47
http://hdl.handle.net/10.1093/biomet/asm005
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:1002-10052013-03-04RePEc:oup:biomet
article
On an internal method for deriving a summary measure
Some preliminary comments are made about the reasons for combining component observations into composite or derived variables. A method for forming derived variables sensitive to specified changes in the underlying multivariate distribution is described and illustrated by an issue in a study of animal pathology. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
1002
1005
http://hdl.handle.net/10.1093/biomet/asn040
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:471-4772013-03-04RePEc:oup:biomet
article
Estimating ordered binomial proportions with the use of group testing
This paper considers group testing when the probability of response is increasing across the levels of an observed covariate. We illustrate how previously known results in order-restricted inference can be extended to situations wherein data are collected according to a group-testing protocol, and we derive maximum likelihood estimators for proportions under the increasing order restriction and group-testing model. Finally, we show how the use of group testing can dramatically reduce the bias and mean squared error of isotonic regression estimators obtained from one-at-a-time testing. These proposed methods are illustrated using data from an observational HIV study conducted in Houston, Texas. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
471
477
Joshua M. Tebbs
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:821-8302013-03-04RePEc:oup:biomet
article
Forward adaptive banding for estimating large covariance matrices
We propose a simple forward adaptive banding method for estimating large covariance matrices using the modified Cholesky decomposition. This approach requires the fitting of a prespecified set of models due to the adaptive banding structure and can be efficiently implemented. Aside from its computational attractiveness, we propose a novel Bayes information criterion that gives consistent model selection for estimating high dimensional covariance matrices. The method compares favourably to its competitors in a simulation study. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
821
830
http://hdl.handle.net/10.1093/biomet/asr045
application/pdf
Access to full text is restricted to subscribers.
Chenlei Leng
Bo Li
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:213-2272013-03-04RePEc:oup:biomet
article
Exploiting occurrence times in likelihood inference for componentwise maxima
Multivariate extreme value distributions arise as the limiting distributions of normalised componentwise maxima. They are often used to model multivariate data that can be regarded as the componentwise maxima of some unobserved underlying multivariate process. In many applications we have extra information. We often know the locations of the maxima within the underlying process. If the process is temporal this knowledge is frequently available through the dates on which the maxima are recorded. We show how to incorporate this extra information into maximum likelihood procedures. Asymptotic and small-sample efficiency results are presented for the dependence parameter in the logistic parametric sub-class of bivariate extreme value distributions. We conclude with an application to sea levels. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
213
227
http://hdl.handle.net/10.1093/biomet/92.1.213
text/html
Access to full text is restricted to subscribers.
Alec Stephenson
Jonathan Tawn
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:555-5712013-03-04RePEc:oup:biomet
article
Using calibration weighting to adjust for nonresponse under a plausible model
When we estimate the population total for a survey variable or variables, calibration forces the weighted estimates of certain covariates to match known or alternatively estimated population totals called benchmarks. Calibration can be used to correct for sample-survey nonresponse, or for coverage error resulting from frame undercoverage or unit duplication. The quasi-randomization theory supporting its use in nonresponse adjustment treats response as an additional phase of random sampling. The functional form of a quasi-random response model is assumed to be known, its parameter values estimated implicitly through the creation of calibration weights. Unfortunately, calibration depends upon known benchmark totals while the covariates in a plausible model for survey response may not be the benchmark covariates. Moreover, it may be prudent to keep the number of covariates in a response model small. We use calibration to adjust for nonresponse when the benchmark model and covariates may differ, provided the number of the former is at least as great as that of the latter. We discuss the estimation of a total for a vector of survey variables that do not include the benchmark covariates, but that may include some of the model covariates. We show how to measure both the additional asymptotic variance due to the nonresponse in a calibration-weighted estimator and the full asymptotic variance of the estimator itself. All variances are determined with respect to the randomization mechanism used to select the sample, the response model generating the subset of sample respondents, or both. Data from the U.S. National Agricultural Statistical Service's 2002 Census of Agriculture and simulations are used to illustrate alternative adjustments for nonresponse. The paper concludes with some remarks about adjustment for coverage error. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
555
571
http://hdl.handle.net/10.1093/biomet/asn022
application/pdf
Access to full text is restricted to subscribers.
Ted Chang
Phillip S. Kott
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:229-2362013-03-04RePEc:oup:biomet
article
A note on profile likelihood for exponential tilt mixture models
Suppose that independent observations are drawn from multiple distributions, each of which is a mixture of two component distributions such that their log density ratio satisfies a linear model with a slope parameter and an intercept parameter. Inference for such models has been studied using empirical likelihood, and mixed results have been obtained. The profile empirical likelihood of the slope and intercept has an irregularity at the null hypothesis so that the two component distributions are equal. We derive a profile empirical likelihood and maximum likelihood estimator of the slope alone, and obtain the usual asymptotic properties for the estimator and the likelihood ratio statistic regardless of the null. Furthermore, we show the maximum likelihood estimator of the slope and intercept jointly is consistent and asymptotically normal regardless of the null. At the null, the joint maximum likelihood estimator falls along a straight line through the origin with perfect correlation asymptotically to the first order. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
229
236
http://hdl.handle.net/10.1093/biomet/asn059
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:724-7272013-03-04RePEc:oup:biomet
article
Identifiability and censored data
It is well known that, without the assumption of independence between two nonnegative random variables X and Y, the survival function of X is not identifiable on the basis of the joint distribution function of Z = min(X, Y) and δ = I(Z = Y). In this paper, we provide a simple condition in the form of conditional distribution of Y given X. We show that our condition is equivalent to the constant-sum condition proposed by Williams & Lagakos (1977). As a result the survival function of X can be identified from the joint distribution of Z and δ and the Kaplan–Meier estimator with Greenwood's formula for its variance remains valid. Examples which satisfy the condition are given. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
724
727
Nader Ebrahimi
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:559-5712013-03-04RePEc:oup:biomet
article
Locally-efficient robust estimation of haplotype-disease association in family-based studies
Modelling human genetic variation is critical to understanding the genetic basis of complex disease. The Human Genome Project has discovered millions of binary DNA sequence variants, called single nucleotide polymorphisms, and millions more may exist. As coding for proteins takes place along chromosomes, organisation of polymorphisms along each chromosome, the haplotype phase structure, may prove to be most important in discovering genetic variants associated with disease. As haplotype phase is often uncertain, procedures that model the distribution of parental haplotypes can, if this distribution is misspecified, lead to substantial bias in parameter estimates even when complete genotype information is available. Using a geometric approach to estimation in the presence of nuisance parameters, we address this problem and develop locally-efficient estimators of the effect of haplotypes on disease that are robust to incorrect estimates of haplotype frequencies. The methods are demonstrated with a simulation study of a case-parent design. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
559
571
http://hdl.handle.net/10.1093/biomet/92.3.559
text/html
Access to full text is restricted to subscribers.
Andrew S. Allen
Glen A. Satten
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:679-6942013-03-04RePEc:oup:biomet
article
Improving the efficiency of the log-rank test using auxiliary covariates
Under the assumption of proportional hazards, the log-rank test is optimal for testing the null hypothesis $H_0\colon \beta = 0$, where $\beta$ denotes the logarithm of the hazard ratio. However, if there are additional covariates that correlate with survival times, making use of their information will increase the efficiency of the log-rank test. We apply the theory of semiparametrics to characterize a class of regular and asymptotically linear estimators for $\beta$ when auxiliary covariates are incorporated into the model, and derive estimators that are more efficient. The Wald tests induced by these estimators are shown to be more powerful than the log-rank test. Simulation studies are used to illustrate the gains in efficiency. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
679
694
http://hdl.handle.net/10.1093/biomet/asn003
application/pdf
Access to full text is restricted to subscribers.
Xiaomin Lu
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:19-362013-03-04RePEc:oup:biomet
article
Efficient nonparametric estimation of causal effects in randomized trials with noncompliance
Causal approaches based on the potential outcome framework provide a useful tool for addressing noncompliance problems in randomized trials. We propose a new estimator of causal treatment effects in randomized clinical trials with noncompliance. We use the empirical likelihood approach to construct a profile random sieve likelihood and take into account the mixture structure in outcome distributions, so that our estimator is robust to parametric distribution assumptions and provides substantial finite-sample efficiency gains over the standard instrumental variable estimator. Our estimator is asymptotically equivalent to the standard instrumental variable estimator, and it can be applied to outcome variables with a continuous, ordinal or binary scale. We apply our method to data from a randomized trial of an intervention to improve the treatment of depression among depressed elderly patients in primary care practices. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
19
36
http://hdl.handle.net/10.1093/biomet/asn056
application/pdf
Access to full text is restricted to subscribers.
Jing Cheng
Dylan S. Small
Zhiqiang Tan
Thomas R. Ten Have
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:921-9372013-03-04RePEc:oup:biomet
article
Empirical Likelihood Semiparametric Regression Analysis for Longitudinal Data
A semiparametric regression model for longitudinal data is considered. The empirical likelihood method is used to estimate the regression coefficients and the baseline function, and to construct confidence regions and intervals. It is proved that the maximum empirical likelihood estimator of the regression coefficients achieves asymptotic efficiency and the estimator of the baseline function attains asymptotic normality when a bias correction is made. Two calibrated empirical likelihood approaches to inference for the baseline function are developed. We propose a groupwise empirical likelihood procedure to handle the inter-series dependence for the longitudinal semiparametric regression model, and employ bias correction to construct the empirical likelihood ratio functions for the parameters of interest. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, the empirical likelihood does not require consistent estimators for the asymptotic variance and bias. A simulation compares the empirical likelihood and normal-based methods in terms of coverage accuracies and average areas/lengths of confidence regions/intervals. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
921
937
http://hdl.handle.net/10.1093/biomet/asm066
application/pdf
Access to full text is restricted to subscribers.
Liugen Xue
Lixing Zhu
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:927-9412013-03-04RePEc:oup:biomet
article
Modelling of covariance structures in generalised estimating equations for longitudinal data
When used for modelling longitudinal data generalised estimating equations specify a working structure for the within-subject covariance matrices, aiming to produce efficient parameter estimators. However, misspecification of the working covariance structure may lead to a large loss of efficiency of the estimators of the mean parameters. In this paper we propose an approach for joint modelling of the mean and covariance structures of longitudinal data within the framework of generalised estimating equations. The resulting estimators for the mean and covariance parameters are shown to be consistent and asymptotically Normally distributed. Real data analysis and simulation studies show that the proposed approach yields efficient estimators for both the mean and covariance parameters. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
927
941
http://hdl.handle.net/10.1093/biomet/93.4.927
text/html
Access to full text is restricted to subscribers.
Huajun Ye
Jianxin Pan
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:105-1182013-03-04RePEc:oup:biomet
article
Theory for penalised spline regression
Penalised spline regression is a popular new approach to smoothing, but its theoretical properties are not yet well understood. In this paper, mean squared error expressions and consistency results are derived by using a white-noise model representation for the estimator. The effect of the penalty on the bias and variance of the estimator is discussed, both for general splines and for the case of polynomial splines. The penalised spline regression estimator is shown to achieve the optimal nonparametric convergence rate established by Stone (1982). Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
105
118
http://hdl.handle.net/10.1093/biomet/92.1.105
text/html
Access to full text is restricted to subscribers.
Peter Hall
J. D. Opsomer
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:1-132013-03-04RePEc:oup:biomet
article
Nonparametric estimation in nonlinear mixed effects models
A nonparametric approach is developed herein to estimate parameters in nonlinear mixed effects models. Asymptotic properties of the nonparametric maximum likelihood estimators and associated computational algorithms are provided. Empirical Bayes estimators of functionals of the random effects are also developed. Applications to population pharmacokinetics are given. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
1
13
Tze Leung Lai
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:519-5282013-03-04RePEc:oup:biomet
article
A note on composite likelihood inference and model selection
A composite likelihood consists of a combination of valid likelihood objects, usually related to small subsets of data. The merit of composite likelihood is to reduce the computational complexity so that it is possible to deal with large datasets and very complex models, even when the use of standard likelihood or Bayesian methods is not feasible. In this paper, we aim to suggest an integrated, general approach to inference and model selection using composite likelihood methods. In particular, we introduce an information criterion for model selection based on composite likelihood. We also describe applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful geyser dataset. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
519
528
http://hdl.handle.net/10.1093/biomet/92.3.519
text/html
Access to full text is restricted to subscribers.
Cristiano Varin
Paolo Vidoni
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:553-5662013-03-04RePEc:oup:biomet
article
Bayesian analysis of covariance matrices and dynamic models for longitudinal data
Parsimonious modelling of the within-subject covariance structure while heeding its positive-definiteness is of great importance in the analysis of longitudinal data. Using the Cholesky decomposition and the ensuing unconstrained and statistically meaningful reparameterisation, we provide a convenient and intuitive framework for developing conditionally conjugate prior distributions for covariance matrices and show their connections with generalised inverse Wishart priors. Our priors offer many advantages with regard to elicitation, positive definiteness, computations using Gibbs sampling, shrinking covariances toward a particular structure with considerable flexibility, and modelling covariances using covariates. Bayesian estimation methods are developed and the results are compared using two simulation studies. These simulations suggest simpler and more suitable priors for the covariance structure of longitudinal data. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
553
566
Michael J. Daniels
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:135-1522013-03-04RePEc:oup:biomet
article
Extending conventional priors for testing general hypotheses in linear models
We consider that observations come from a general normal linear model and that it is desirable to test a simplifying null hypothesis about the parameters. We approach this problem from an objective Bayesian, model-selection perspective. Crucial ingredients for this approach are 'proper objective priors' to be used for deriving the Bayes factors. Jeffreys-Zellner-Siow priors have good properties for testing null hypotheses defined by specific values of the parameters in full-rank linear models. We extend these priors to deal with general hypotheses in general linear models, not necessarily of full rank. The resulting priors, which we call 'conventional priors', are expressed as a generalization of recently introduced 'partially informative distributions'. The corresponding Bayes factors are fully automatic, easily computed and very reasonable. The methodology is illustrated for the change-point problem and the equality of treatments effects problem. We compare the conventional priors derived for these problems with other objective Bayesian proposals like the intrinsic priors. It is concluded that both priors behave similarly although interesting subtle differences arise. We adapt the conventional priors to deal with nonnested model selection as well as multiple-model comparison. Finally, we briefly address a generalization of conventional priors to nonnormal scenarios. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
135
152
http://hdl.handle.net/10.1093/biomet/asm014
application/pdf
Access to full text is restricted to subscribers.
M.J. Bayarri
Gonzalo García-Donato
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:435-4442013-03-04RePEc:oup:biomet
article
Closed-form likelihoods for Arnason–Schwarz models
We provide a general framework for the computationally efficient analysis, both Bayesian and classical, of integrated multi-site recovery/recapture models in the presence of individual-level covariates by extending the basic Arnason–Schwarz models and deriving closed-form likelihood expressions, together with corresponding sufficient statistics. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
435
444
R. King
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:75-922013-03-04RePEc:oup:biomet
article
Predicting future responses based on possibly mis-specified working models
Under a general regression setting, we propose an optimal unconditional prediction procedure for future responses. The resulting prediction intervals or regions have a desirable average coverage level over a set of covariate vectors of interest. When the working model is not correctly specified, the traditional conditional prediction method is generally invalid. On the other hand, one can empirically calibrate the above unconditional procedure and also obtain its crossvalidated counterpart. Various large and small sample properties of these unconditional methods are examined analytically and numerically. We find that the 𝒦-fold crossvalidated procedure performs exceptionally well even for cases with rather small sample sizes. The new proposals are illustrated with two real examples, one with a continuous response and the other with a binary outcome. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
75
92
http://hdl.handle.net/10.1093/biomet/asm078
application/pdf
Access to full text is restricted to subscribers.
Tianxi Cai
Lu Tian
Scott D. Solomon
L.J. Wei
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:41-522013-03-04RePEc:oup:biomet
article
Efficient Bayes factor estimation from the reversible jump output
We propose a class of estimators of the Bayes factor which is based on an extension of the bridge sampling identity of Meng & Wong (1996) and makes use of the output of the reversible jump algorithm of Green (1995). Within this class we give the optimal estimator and also a suboptimal one which may be simply computed on the basis of the acceptance probabilities used within the reversible jump algorithm for jumping between models. The proposed estimators are very easily computed and lead to a substantial gain of efficiency in estimating the Bayes factor over the standard estimator based on the reversible jump output. This is illustrated through a series of Monte Carlo simulations involving a linear and a logistic regression model. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
41
52
http://hdl.handle.net/10.1093/biomet/93.1.41
text/html
Access to full text is restricted to subscribers.
Francesco Bartolucci
Luisa Scaccia
Antonietta Mira
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:63-742013-03-04RePEc:oup:biomet
article
Shared parameter models under random effects misspecification
A common objective in longitudinal studies is the investigation of the association structure between a longitudinal response process and the time to an event of interest. An attractive paradigm for the joint modelling of longitudinal and survival processes is the shared parameter framework, where a set of random effects is assumed to induce their interdependence. In this work, we propose an alternative parameterization for shared parameter models and investigate the effect of misspecifying the random effects distribution in the parameter estimates and their standard errors. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
63
74
http://hdl.handle.net/10.1093/biomet/asm087
application/pdf
Access to full text is restricted to subscribers.
Dimitris Rizopoulos
Geert Verbeke
Geert Molenberghs
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:787-7992013-03-04RePEc:oup:biomet
article
Symmetric diagnostics for the analysis of the residuals in regression models
Typical alternative hypotheses in the analysis of residuals of a standard regression model are considered, and for each one a Bayesian diagnostic based on a symmetric form of the Kullback--Leibler divergence is determined. The results include an explicit expression for the diagnostic when the alternative hypothesis is that the errors are generated by an unknown distribution function with a Dirichlet process prior. This expression is immediately interpretable, exactly computable and endowed with important asymptotic connections. A linear approximation of the diagnostic reveals close links with the class of Lagrange multiplier test statistics. When the alternative hypothesis is that the errors are generated by an autoregressive process the linear approximation is proportional to the Box--Pierce statistic or to the Ljung--Box statistic, according to the characteristics of the prior, if the observations have zero mean; it depends on the Durbin--Watson statistic if the errors are first-order autoregressive, and it is related to the Cliff--Ord statistic if they are generated by a first-order spatial autoregression. The sensitivity to the prior of the diagnostic and of its linear approximation is also discussed. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
787
799
http://hdl.handle.net/10.1093/biomet/92.4.787
text/html
Access to full text is restricted to subscribers.
Cinzia Carota
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:17-332013-03-04RePEc:oup:biomet
article
Distortion of effects caused by indirect confounding
Undetected confounding may severely distort the effect of an explanatory variable on a response variable, as defined by a stepwise data-generating process. The best known type of distortion, which we call direct confounding, arises from an unobserved explanatory variable common to a response and its main explanatory variable of interest. It is relevant mainly for observational studies, since it is avoided by successful randomization. By contrast, indirect confounding, which we identify in this paper, is an issue also for intervention studies. For general stepwise-generating processes, we provide matrix and graphical criteria to decide which types of distortion may be present, when they are absent and how they are avoided. We then turn to linear systems without other types of distortion, but with indirect confounding. For such systems, the magnitude of distortion in a least-squares regression coefficient is derived and shown to be estimable, so that it becomes possible to recover the effect of the generating process from the distorted coefficient. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
17
33
http://hdl.handle.net/10.1093/biomet/asm092
application/pdf
Access to full text is restricted to subscribers.
Nanny Wermuth
D. R. Cox
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:133-1482013-03-04RePEc:oup:biomet
article
Model checking in regression via dimension reduction
Lack-of-fit checking for parametric and semiparametric models is essential in reducing misspecification. The efficiency of most existing model-checking methods drops rapidly as the dimension of the covariates increases. We propose to check a model by projecting the fitted residuals along a direction that adapts to the systematic departure of the residuals from the desired pattern. Consistency of the method is proved for parametric and semiparametric regression models. A bootstrap implementation is also discussed. Simulation comparisons with several existing methods are made, suggesting that the proposed methods are more efficient than the existing methods when the dimension increases. Air pollution data from Chicago are used to illustrate the procedure. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
133
148
http://hdl.handle.net/10.1093/biomet/asn074
application/pdf
Access to full text is restricted to subscribers.
Yingcun Xia
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:49-602013-03-04RePEc:oup:biomet
article
Fuzzy p-values in latent variable problems
We consider the problem of testing a statistical hypothesis where the scientifically meaningful test statistic is a function of latent variables. In particular, we consider detection of genetic linkage, where the latent variables are patterns of inheritance at specific genome locations. Introduced by Geyer & Meeden (2005), fuzzy p-values are random variables, described by their probability distributions, that are interpreted as p-values. For latent variable problems, we introduce the notion of a fuzzy p-value as having the conditional distribution of the latent p-value given the observed data, where the latent p-value is the random variable that would be the p-value if the latent variables were observed. The fuzzy p-value provides an exact test using two sets of simulations of the latent variables under the null hypothesis, one unconditional and the other conditional on the observed data. It provides not only an expression of the strength of the evidence against the null hypothesis but also an expression of the uncertainty in that expression owing to lack of knowledge of the latent variables. We illustrate these features with an example of simulated data mimicking a real example of the detection of genetic linkage. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
49
60
http://hdl.handle.net/10.1093/biomet/asm001
application/pdf
Access to full text is restricted to subscribers.
Elizabeth A. Thompson
Charles J. Geyer
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:248-2522013-03-04RePEc:oup:biomet
article
Testing hypotheses in order
In certain circumstances, one wishes to test one hypothesis only if certain other hypotheses have been rejected. This ordering of hypotheses simplifies the task of controlling the probability of rejecting any true hypothesis. In an example from an observational study, a treated group is shown to be further from both of two control groups than the two control groups are from each other. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
248
252
http://hdl.handle.net/10.1093/biomet/asm085
application/pdf
Access to full text is restricted to subscribers.
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:709-7192013-03-04RePEc:oup:biomet
article
The Benjamini--Hochberg method with infinitely many contrasts in linear models
Benjamini and Hochberg's method for controlling the false discovery rate is applied to the problem of testing infinitely many contrasts in linear models. Exact, easily calculated critical values are derived, defining a new multiple comparisons method for testing contrasts in linear models. The method is adaptive, depending on the data through the F-statistic, like the Waller--Duncan Bayesian multiple comparisons method. Comparisons with Scheffé's method are given, and the method is extended to the simultaneous confidence intervals of Benjamini and Yekutieli. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
709
719
http://hdl.handle.net/10.1093/biomet/asn033
application/pdf
Access to full text is restricted to subscribers.
Peter H. Westfall
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:53-642013-03-04RePEc:oup:biomet
article
Latent-model robustness in structural measurement error models
We present methods for diagnosing the effects of model misspecification of the true-predictor distribution in structural measurement error models. We first formulate latent-model robustness theoretically. Then we provide practical techniques for examining the adequacy of an assumed latent predictor model. The methods are illustrated via analytical examples, application to simulated data and with data from a study of coronary heart disease. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
53
64
http://hdl.handle.net/10.1093/biomet/93.1.53
text/html
Access to full text is restricted to subscribers.
Xianzheng Huang
Leonard A. Stefanski
Marie Davidian
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:861-8752013-03-04RePEc:oup:biomet
article
Covariate selection for the nonparametric estimation of an average treatment effect
Observational studies in which the effect of a nonrandomized treatment on an outcome of interest is estimated are common in domains such as labour economics and epidemiology. Such studies often rely on an assumption of unconfounded treatment when controlling for a given set of observed pre-treatment covariates. The choice of covariates to control in order to guarantee unconfoundedness should primarily be based on subject matter theories, although the latter typically give only partial guidance. It is tempting to include many covariates in the controlling set to try to make the assumption of an unconfounded treatment realistic. Including unnecessary covariates is suboptimal when the effect of a binary treatment is estimated nonparametrically. For instance, when using an n^{1/2}-consistent estimator, a loss of efficiency may result from using covariates that are irrelevant for the unconfoundedness assumption. Moreover, bias may dominate the variance when many covariates are used. Embracing the Neyman--Rubin model typically used in conjunction with nonparametric estimators of treatment effects, we characterize subsets from the original reservoir of covariates that are minimal in the sense that the treatment ceases to be unconfounded given any proper subset of these minimal sets. These subsets of covariates are shown to be identified under mild assumptions. These results lead us to propose data-driven algorithms for the selection of minimal sets of covariates. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
861
875
http://hdl.handle.net/10.1093/biomet/asr041
application/pdf
Access to full text is restricted to subscribers.
Xavier De Luna
Ingeborg Waernbaum
Thomas S. Richardson
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:491-5152013-03-04RePEc:oup:biomet
article
Uniform consistency in causal inference
There is a long tradition of representing causal relationships by directed acyclic graphs (Wright, 1934). Spirtes (1994), Spirtes et al. (1993) and Pearl & Verma (1991) describe procedures for inferring the presence or absence of causal arrows in the graph even if there might be unobserved confounding variables, and/or an unknown time order, and that under weak conditions, for certain combinations of directed acyclic graphs and probability distributions, are asymptotically, in sample size, consistent. These results are surprising since they seem to contradict the standard statistical wisdom that consistent estimators of causal effects do not exist for nonrandomised studies if there are potentially unobserved confounding variables. We resolve the apparent incompatibility of these views by closely examining the asymptotic properties of these causal inference procedures. We show that the asymptotically consistent procedures are 'pointwise consistent', but 'uniformly consistent' tests do not exist. Thus, no finite sample size can ever be guaranteed to approximate the asymptotic results. We also show the nonexistence of valid, consistent confidence intervals for causal effects and the nonexistence of uniformly consistent point estimators. Our results make no assumption about the form of the tests or estimators. In particular, the tests could be classical independence tests, they could be Bayes tests or they could be tests based on scoring methods such as BIC or AIC. The implications of our results for observational studies are controversial and are discussed briefly in the last section of the paper. The results hinge on the following fact: it is possible to find, for each sample size n, distributions P and Q such that P and Q are empirically indistinguishable and yet P and Q correspond to different causal effects. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
491
515
James M. Robins
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:939-9462013-03-04RePEc:oup:biomet
article
Forward search added-variable t-tests and the effect of masked outliers on model selection
Monitoring the t-tests for individual regression coefficients in 'forward' search fails to identify the importance of observations to the significance of the individual regressors. This failure is due to the ordering of the data by the search. We introduce an added-variable test which has the desired properties since the projection leading to residuals destroys the effect of the ordering. An example illustrates the effect of several masked outliers on model selection. Comments are given on the related test for response transformations. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
939
946
Anthony C. Atkinson
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:831-8442013-03-04RePEc:oup:biomet
article
Nonparametric estimation of large covariance matrices of longitudinal data
Estimation of an unstructured covariance matrix is difficult because of its positive-definiteness constraint. This obstacle is removed by regressing each variable on its predecessors, so that estimation of a covariance matrix is shown to be equivalent to that of estimating a sequence of varying-coefficient and varying-order regression models. Our framework is similar to the use of increasing-order autoregressive models in approximating the covariance matrix or the spectrum of a stationary time series. As an illustration, we adopt Fan & Zhang's (2000) two-step estimation of functional linear models and propose nonparametric estimators of covariance matrices which are guaranteed to be positive definite. For parsimony a suitable order for the sequence of (auto)regression models is found using penalised likelihood criteria like AIC and BIC. Some asymptotic results for the local polynomial estimators of components of a covariance matrix are established. Two longitudinal datasets are analysed to illustrate the methodology. A simulation study reveals the advantage of the nonparametric covariance estimator over the sample covariance matrix for large covariance matrices. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
831
844
Wei Biao Wu
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:961-9722013-03-04RePEc:oup:biomet
article
Isotonic logistic discrimination
We propose an isotonic logistic discrimination procedure which generalises linear logistic discrimination by allowing linear boundaries to be more flexibly shaped as monotone functions of the discriminant variables. Under each of three familiar sampling schemes for obtaining a training dataset, namely prospective, mixture and retrospective, we provide the corresponding likelihood-based inference. An application to a cancer study is given. In addition, we consider theoretical comparisons of our method with two recent algorithmic monotone discrimination procedures. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
961
972
http://hdl.handle.net/10.1093/biomet/93.4.961
text/html
Access to full text is restricted to subscribers.
Sungyoung Auh
Allan R. Sampson
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:911-9262013-03-04RePEc:oup:biomet
article
A functional-based distribution diagnostic for a linear model with correlated outcomes
In this paper we present an easy-to-implement graphical distribution diagnostic for linear models with correlated errors. Houseman et al. (2004) constructed quantile--quantile plots for the marginal residuals of such models, suitably transformed. We extend the pointwise asymptotic theory to address the global stochastic behaviour of the corresponding empirical cumulative distribution function, and describe a simulation technique that serves as a computationally efficient parametric bootstrap for generating representatives of its stochastic limit. Thus, continuous functionals of the empirical cumulative distribution function may be used to form global tests of normality. Through the use of projection matrices, we generalised our methods to include tests that are directed at assessing the normality of particular components of the error. Thus, tests proposed by Lange & Ryan (1989) follow as a special case. Our method works well both for models having independent units of sampling and for those in which all observations are correlated. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
911
926
http://hdl.handle.net/10.1093/biomet/93.4.911
text/html
Access to full text is restricted to subscribers.
E. Andres Houseman
Brent A. Coull
Louise M. Ryan
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:249-2652013-03-04RePEc:oup:biomet
article
An asymptotic theory for model selection inference in general semiparametric problems
Hjort & Claeskens (2003) developed an asymptotic theory for model selection, model averaging and subsequent inference using likelihood methods in parametric models, along with associated confidence statements. In this article, we consider a semiparametric version of this problem, wherein the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model. We show that all the results of Hjort & Claeskens hold in the semiparametric context, if the Fisher information matrix for parametric models is replaced by the semiparametric information bound for semiparametric models, and if maximum likelihood estimators for parametric models are replaced by semiparametric efficient profile estimators. Our methods of proof employ Le Cam's contiguity lemmas, leading to transparent results. The results also describe the behaviour of semiparametric model estimators when the parametric component is misspecified, and also have implications for pointwise-consistent model selectors. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
249
265
http://hdl.handle.net/10.1093/biomet/asm034
application/pdf
Access to full text is restricted to subscribers.
Gerda Claeskens
Raymond J. Carroll
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:943-9592013-03-04RePEc:oup:biomet
article
Multi-level modelling under informative sampling
We consider a model-dependent approach for multi-level modelling that accounts for informative probability sampling of first- and lower-level population units. The proposed approach consists of first extracting the hierarchical model holding for the sample data given the selected sample, as a function of the corresponding population model and the first- and lower-level sample selection probabilities, and then fitting the resulting sample model using Bayesian methods. An important implication of the use of the model holding for the sample is that the sample selection probabilities feature in the analysis as additional data that possibly strengthen the estimators. A simulation experiment is carried out in order to study the performance of this approach and compare it to the use of 'design-based' methods. The simulation study indicates that both approaches perform in general equally well in terms of point estimation, but the model-dependent approach yields confidence/credibility intervals with better coverage properties. Another simulation study assesses the impact of misspecification of the models assumed for the sample selection probabilities. The use of maximum likelihood estimation is also considered. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
943
959
http://hdl.handle.net/10.1093/biomet/93.4.943
text/html
Access to full text is restricted to subscribers.
Danny Pfeffermann
Fernando Antonio Da Silva Moura
Pedro Luis Do Nascimento Silva
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:65-742013-03-04RePEc:oup:biomet
article
Using intraslice covariances for improved estimation of the central subspace in regression
Popular methods for estimating the central subspace in regression require slicing a continuous response. However, slicing can result in loss of information and in some cases that loss can be substantial. We use intraslice covariances to construct improved inference methods for the central subspace. These methods are optimal within a class of quadratic inference functions and permit chi-squared tests of conditional independence hypotheses involving the predictors. Our experience gained through simulation is that the new method is never worse than existing methods, and can be substantially better. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
65
74
http://hdl.handle.net/10.1093/biomet/93.1.65
text/html
Access to full text is restricted to subscribers.
R. Dennis Cook
Liqiang Ni
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:937-9502013-03-04RePEc:oup:biomet
article
Can the strengths of AIC and BIC be shared? A conflict between model indentification and regression estimation
A traditional approach to statistical inference is to identify the true or best model first with little or no consideration of the specific goal of inference in the model identification stage. Can the pursuit of the true model also lead to optimal regression estimation? In model selection, it is well known that BIC is consistent in selecting the true model, and AIC is minimax-rate optimal for estimating the regression function. A recent promising direction is adaptive model selection, in which, in contrast to AIC and BIC, the penalty term is data-dependent. Some theoretical and empirical results have been obtained in support of adaptive model selection, but it is still not clear if it can really share the strengths of AIC and BIC. Model combining or averaging has attracted increasing attention as a means to overcome the model selection uncertainty. Can Bayesian model averaging be optimal for estimating the regression function in a minimax sense? We show that the answers to these questions are basically in the negative: for any model selection criterion to be consistent, it must behave suboptimally for estimating the regression function in terms of minimax rate of convergence; and Bayesian model averaging cannot be minimax-rate optimal for regression estimation. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
937
950
http://hdl.handle.net/10.1093/biomet/92.4.937
text/html
Access to full text is restricted to subscribers.
Yuhong Yang
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:1001-10062013-03-04RePEc:oup:biomet
article
Empirical likelihood and quantile regression in longitudinal data analysis
We propose a novel quantile regression approach for longitudinal data analysis which naturally incorporates auxiliary information from the conditional mean model to account for within-subject correlations. The efficiency gain is quantified theoretically and demonstrated empirically via simulation studies and the analysis of a real dataset. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
1001
1006
http://hdl.handle.net/10.1093/biomet/asr050
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Chenlei Leng
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:747-7582013-03-04RePEc:oup:biomet
article
Conditional properties of unconditional parametric bootstrap procedures for inference in exponential families
Higher-order inference about a scalar parameter in the presence of nuisance parameters can be achieved by bootstrapping, in circumstances where the parameter of interest is a component of the canonical parameter in a full exponential family. The optimal test, which is approximated, is a conditional one based on conditioning on the sufficient statistic for the nuisance parameter. A bootstrap procedure that ignores the conditioning is shown to have desirable conditional properties in providing third-order relative accuracy in approximation of p-values associated with the optimal test, in both continuous and discrete models. The bootstrap approach is equivalent to third-order analytical approaches, and is demonstrated in a number of examples to give very accurate approximations even for very small sample sizes. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
747
758
http://hdl.handle.net/10.1093/biomet/asn011
application/pdf
Access to full text is restricted to subscribers.
Thomas J. Diciccio
G. Alastair Young
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:831-8432013-03-04RePEc:oup:biomet
article
The Aalen additive gamma frailty hazards model
In this paper, we consider clustered right-censored time-to-event data. Such data can be analysed either using a marginal model if one is interested in population effects or using so-called frailty models if one is interested in covariate effects on the individual level and in estimation of correlation. The Cox frailty model has been studied extensively in the last decade or so and estimation techniques and large sample results are now available. It is, however, difficult to deal with time-changing covariate effects when using the Cox model. An appealing alternative model is the Aalen additive hazards model, in which it is easy to work with time dynamics. In this paper, we describe an innovative approach to estimation in the Aalen additive gamma frailty hazards model. We give the large sample properties of the estimators and investigate their small sample properties by Monte Carlo simulation. A real example is provided for illustration. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
831
843
http://hdl.handle.net/10.1093/biomet/asr049
application/pdf
Access to full text is restricted to subscribers.
Torben Martinussen
Thomas H. Scheike
David M. Zucker
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:45-632013-03-04RePEc:oup:biomet
article
Bayesian criterion based model assessment for categorical data
We propose a general Bayesian criterion for model assessment for categorical data called the weighted L measure, which is constructed from the posterior predictive distribution of the data. The measure is based on weighting the observations according to the sampling variance of their future response vector. The weight component in the weighted L measure plays the role of a penalty term in the criterion, in which a greater weight assigned to covariate values implies a greater penalty term on the dimension of the model. A detailed justification is provided for such a weighting procedure and several theoretical properties of the weighted L measure are presented for a wide variety of discrete data models. For these models, we examine properties of the weighted L measure, and show that it can perform better than the unweighted L measure in a variety of settings. In addition, we show that the weighted quadratic loss L measure is more attractive than the unweighted L measure and the deviance loss L measure for categorical data. Moreover, a calibration for the weighted L measure is motivated and proposed, which allows us to compare formally the L measure values of competing models. A detailed simulation study is presented to examine the performance of the weighted L measure, and it is compared to other established model-selection methods. Finally, the method is applied to a real dataset using a bivariate ordinal response model. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
45
63
Ming-Hui Chen
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:451-4562013-03-04RePEc:oup:biomet
article
Nonparametric state estimation of diffusion processes
The paper presents a method for estimating nonparametrically the states of one-dimensional diffusion processes. Once certain nuisance parameters have been estimated from the time series, states of a diffusion process can be estimated by the Kalman filter algorithm, so that the method is also useful for filtering and smoothing the states of the process. Numerical comparison of the method with the case of fitting a linear model to data shows that the method is clearly superior in terms of prediction errors. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
451
456
Isao Shoji
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:1-222013-03-04RePEc:oup:biomet
article
A class of logistic-type discriminant functions
In two-group discriminant analysis, the Neyman--Pearson Lemma establishes that the ROC, receiver operating characteristic, curve for an arbitrary linear function is everywhere below the ROC curve for the true likelihood ratio. The weighted area between these two curves can be used as a risk function for finding good discriminant functions. The weight function corresponds to the objective of the analysis, for example to minimise the expected cost of misclassification, or to maximise the area under the ROC. The resulting discriminant functions can be estimated by iteratively reweighted logistic regression. We investigate some asymptotic properties in the 'near-logistic' setting, where we assume the covariates have been chosen such that a linear function gives a reasonable, but not necessarily exact, approximation to the true log likelihood ratio. Some examples are discussed, including a study of medical diagnosis in breast cytology. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
1
22
Shinto Eguchi
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:427-4432013-03-04RePEc:oup:biomet
article
Double block bootstrap confidence intervals for dependent data
The block bootstrap confidence interval for dependent data can outperform the conventional normal approximation only with nontrivial studentization which, in the case of complicated statistics, calls for specialist treatment and often results in unstable endpoints. We propose two double block bootstrap approaches for improving the accuracy of the block bootstrap confidence interval under very general conditions. The first approach calibrates the nominal coverage level and the second calculates studentizing factors directly from a block bootstrap series without the need for nontrivial analytical treatment. We prove that the two approaches reduce the coverage error of the block bootstrap interval by an order of magnitude with simple tuning of block lengths at the two block bootstrapping levels. Empirical properties of the procedures are investigated by simulations and application to an econometric time series. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
427
443
http://hdl.handle.net/10.1093/biomet/asp018
application/pdf
Access to full text is restricted to subscribers.
Stephen M. S. Lee
P. Y. Lai
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:539-5532013-03-04RePEc:oup:biomet
article
A new approach to weighting and inference in sample surveys
The validity of design-based inference is not dependent on any model assumption. However, it is well known that estimators derived through design-based theory may be inefficient for the estimation of population totals when the design weights are weakly related to the variables of interest and have widely dispersed values. We propose estimators that have the potential to improve the efficiency of any estimator derived under the design-based theory. Our main focus is limited to the improvement of the Horvitz--Thompson estimator, but we also discuss the extension to calibration estimators. The new estimators are obtained by smoothing design or calibration weights using an appropriate model. Our approach to inference requires the modelling of only one variable, the weight, and it leads to a single set of smoothed weights in multipurpose surveys. This is to be contrasted with other model-based approaches, such as the prediction approach, in which it is necessary to postulate and validate a model for each variable of interest leading potentially to variable-specific sets of weights. Our proposed approach is first justified theoretically and then evaluated through a simulation study. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
539
553
http://hdl.handle.net/10.1093/biomet/asn028
application/pdf
Access to full text is restricted to subscribers.
Jean-François Beaumont
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:529-5412013-03-04RePEc:oup:biomet
article
Case-control current status data
In this paper, we show that the distribution function of survival times is identified, up to a one-parameter family of distribution functions, based on information from case-control current status data. With supplementary information on the population frequency of cases relative to controls, a simple weighted version of the nonparametric maximum likelihood estimator for prospective current status data provides a natural estimator for case-control samples. Following the parametric results of Scott & Wild (1997), we show that this estimator is, in fact, the nonparametric maximum likelihood estimator. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
529
541
Nicholas P. Jewell
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:755-7592013-03-04RePEc:oup:biomet
article
On a generalization of a result of W. G. Cochran
A relationship due to W. G. Cochran showing the effect on least squares regression coefficients of marginalizing over or conditioning on an explanatory variable is generalized to quantile regression coefficients. The condition under which conditioning does not induce interaction or effect reversal is shown. Examples are given. The discussion is simplest when all variables are continuous; the extension to discrete variables is outlined. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
755
759
http://hdl.handle.net/10.1093/biomet/asm046
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:741-7472013-03-04RePEc:oup:biomet
article
Construction of φ_p-optimal exact designs with minimum experimental run size for a linear log contrast model in mixture experiments
We propose a new method with minimum experimental run size using the properties of Hadamard matrices through which some φ_p-optimal exact designs including A-, D- and E-optimal designs are constructed for a linear log contrast model in mixture experiments. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/asr014
application/pdf
Access to full text is restricted to subscribers.
Baisuo Jin
Mong-Na Lo Huang
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:765-7752013-03-04RePEc:oup:biomet
article
Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function
Random effects logistic regression models are often used to model clustered binary response data. Regression parameters in these models have a conditional, subject-specific interpretation in that they quantify regression effects for each cluster. Very often, the logistic functional shape conditional on the random effects does not carry over to the marginal scale. Thus, parameters in these models usually do not have an explicit marginal, population-averaged interpretation. We study a bridge distribution function for the random effect in the random intercept logistic regression model. Under this distributional assumption, the marginal functional shape is still of logistic form, and thus regression parameters have an explicit marginal interpretation. The main advantage of this approach is that likelihood inference can be obtained for either marginal or conditional regression inference within a single model framework. The generality of the results and some properties of the bridge distribution functions are discussed. An example is used for illustration. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
765
775
Zengri Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:615-6312013-03-04RePEc:oup:biomet
article
Aggregation-cokriging for highly multivariate spatial data
Best linear unbiased prediction of spatially correlated multivariate random processes, often called cokriging in geostatistics, requires the solution of a large linear system based on the covariance and cross-covariance matrix of the observations. For many problems of practical interest, it is impossible to solve the linear system with direct methods. We propose an efficient linear unbiased predictor based on a linear aggregation of the covariables. The primary variable together with this single meta-covariable is used to perform cokriging. We discuss the optimality of the approach under different covariance structures, and use it to create reanalysis type high-resolution historical temperature fields. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
615
631
http://hdl.handle.net/10.1093/biomet/asr029
application/pdf
Access to full text is restricted to subscribers.
Reinhard Furrer
Marc G. Genton
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:153-1652013-03-04RePEc:oup:biomet
article
Modelling the effects of partially observed covariates on Poisson process intensity
We propose an estimating function for parameters in a model for Poisson process intensity when time- or space-varying covariates are observed for both the events of the process and at sample times or locations selected from a probability-based sampling design. We investigate the large-sample properties of the proposed estimator under increasing domain asymptotics, demonstrating that it is consistent and asymptotically normally distributed. We illustrate our approach using data from an ecological momentary assessment of smoking. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
153
165
http://hdl.handle.net/10.1093/biomet/asm009
application/pdf
Access to full text is restricted to subscribers.
Stephen L. Rathbun
Saul Shiffman
Chad J. Gwaltney
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:687-7042013-03-04RePEc:oup:biomet
article
A Haar--Fisz technique for locally stationary volatility estimation
We consider a locally stationary model for financial log-returns whereby the returns are independent and the volatility is a piecewise-constant function with jumps of an unknown number and locations, defined on a compact interval to enable a meaningful estimation theory. We demonstrate that the model explains well the common characteristics of log-returns. We propose a new wavelet thresholding algorithm for volatility estimation in this model, in which Haar wavelets are combined with the variance-stabilising Fisz transform. The resulting volatility estimator is mean-square consistent with a near-parametric rate, does not require any pre-estimates, is rapidly computable and is easily implemented. We also discuss important variations on the choice of estimation parameters. We show that our approach both gives a very good fit to selected currency exchange datasets, and achieves accurate long- and short-term volatility forecasts in comparison to the GARCH(1, 1) and moving window techniques. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
687
704
http://hdl.handle.net/10.1093/biomet/93.3.687
text/html
Access to full text is restricted to subscribers.
Piotr Fryzlewicz
Theofanis Sapatinas
Suhasini Subba Rao
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:255-2682013-03-04RePEc:oup:biomet
article
M-quantile models for small area estimation
Small area estimation techniques typically rely on regression models that use both covariates and random effects to explain variation between the areas. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier-robust inference. We describe a new approach to small area estimation that is based on modelling quantile-like parameters of the conditional distribution of the target variable given the covariates. This avoids the problems associated with specification of random effects, allowing inter-area differences to be characterised by area-specific M-quantile coefficients. The proposed approach is easily made robust against outlying data values and can be adapted for estimation of a wide range of area-specific parameters, including quantiles of the distribution of the target variable in the different small areas. The differences between M-quantile and random effects models are discussed and the alternative approaches to small area estimation are compared using both simulated and real data. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
255
268
http://hdl.handle.net/10.1093/biomet/93.2.255
text/html
Access to full text is restricted to subscribers.
Ray Chambers
Nikos Tzavidis
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:321-3302013-03-04RePEc:oup:biomet
article
Analysis of longitudinal data in case-control studies
Case-control studies for longitudinal data are considered. Among repeated binary measurements of disease status in each subject, the exposure levels of risk factors for all diseased cases are identified and the exposure levels for only a small fraction of disease-free cases, to be regarded as controls, are identified. Case-control studies for longitudinal data bring about economies in cost and time when the disease is rare and when assessing the exposure level of risk factors is difficult. We propose a way of using an ordinary logistic model to analyse case-control longitudinal data. We prove that the proposed estimator is consistent and asymptotically normally distributed provided that the choice of control observations is independent of the covariates for those subjects. We also discuss the validity of the generalised estimating equation method for case-control longitudinal data. Simulation results are provided, and a real example is presented. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
321
330
Eunsik Park
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:169-1862013-03-04RePEc:oup:biomet
article
Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models
Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorized into marginal and conditional methods. The former integrate out analytically the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional methods impute the Dirichlet process and update it as a component of the Gibbs sampler. Since this requires imputation of an infinite-dimensional process, implementation of the conditional method has relied on finite approximations. In this paper, we show how to avoid such approximations by designing two novel Markov chain Monte Carlo algorithms which sample from the exact posterior distribution of quantities of interest. The approximations are avoided by the new technique of retrospective sampling. We also show how the algorithms can obtain samples from functionals of the Dirichlet process. The marginal and the conditional methods are compared and a careful simulation study is included, which involves a non-conjugate model, different datasets and prior specifications. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
169
186
http://hdl.handle.net/10.1093/biomet/asm086
application/pdf
Access to full text is restricted to subscribers.
Omiros Papaspiliopoulos
Gareth O. Roberts
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:249-2622013-03-04RePEc:oup:biomet
article
Nonparametric Bayes local partition models for random effects
This paper focuses on the problem of choosing a prior for an unknown random effects distribution within a Bayesian hierarchical model. The goal is to obtain a sparse representation by allowing a combination of global and local borrowing of information. A local partition process prior is proposed, which induces dependent local clustering. Subjects can be clustered together for a subset of their parameters, and one learns about similarities between subjects increasingly as parameters are added. Some basic properties are described, including simple two-parameter expressions for marginal and conditional clustering probabilities. A slice sampler is developed which bypasses the need to approximate the countably infinite random measure in performing posterior computation. The methods are illustrated using simulation examples, and an application to hormone trajectory data. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
249
262
http://hdl.handle.net/10.1093/biomet/asp021
application/pdf
Access to full text is restricted to subscribers.
David B. Dunson
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:427-4412013-03-04RePEc:oup:biomet
article
Uncertainty in prior elicitations: a nonparametric approach
A key task in the elicitation of expert knowledge is to construct a distribution from the finite, and usually small, number of statements that have been elicited from the expert. These statements typically specify some quantiles or moments of the distribution. Such statements are not enough to identify the expert's probability distribution uniquely, and the usual approach is to fit some member of a convenient parametric family. There are two clear deficiencies in this solution. First, the expert's beliefs are forced to fit the parametric family. Secondly, no account is then taken of the many other possible distributions that might have fitted the elicited statements equally well. We present a nonparametric approach which tackles both of these deficiencies. We also consider the issue of the imprecision in the elicited probability judgements. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
427
441
http://hdl.handle.net/10.1093/biomet/asm031
application/pdf
Access to full text is restricted to subscribers.
Jeremy E. Oakley
Anthony O'Hagan
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:985-9902013-03-04RePEc:oup:biomet
article
A note on methods of restoring consistency to the bootstrap
We consider the property of consistency and its relevance for determining the performance of the bootstrap. We analyse various parametric bootstrap approximations to the distributions of the Hodges and Stein estimators, whose behaviour is typical of that of super-efficient estimators employed in wavelet regression, kernel density estimation and nonparametric curve fitting. Our results reveal not only some of the difficulties in selecting good modifications to the intuitive bootstrap, but also that inconsistent bootstrap approximations may perform better than consistent versions even in large samples. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
985
990
Richard Samworth
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:1-182013-03-04RePEc:oup:biomet
article
Maxima of discretely sampled random fields, with an application to 'bubbles'
A smooth Gaussian random field with zero mean and unit variance is sampled on a discrete lattice, and we are interested in the exceedance probability or P-value of the maximum in a finite region. If the random field is smooth relative to the mesh size, then the P-value can be well approximated by results for the continuously sampled smooth random field (Adler, 1981; Worsley, 1995a; Taylor & Adler, 2003; Adler & Taylor, 2007). If the random field is not smooth, so that adjacent lattice values are nearly independent, then the usual Bonferroni bound is very accurate. The purpose of this paper is to bridge the gap between the two, and derive a simple, accurate upper bound for intermediate mesh sizes. The result uses a new improved Bonferroni-type bound based on discrete local maxima. We give an application to the 'bubbles' technique for detecting areas of the face used to discriminate fear from happiness. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
1
18
http://hdl.handle.net/10.1093/biomet/asm004
application/pdf
Access to full text is restricted to subscribers.
J. E. Taylor
K. J. Worsley
F. Gosselin
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:399-4142013-03-04RePEc:oup:biomet
article
Least absolute deviation estimation for fractionally integrated autoregressive moving average time series models with conditional heteroscedasticity
We consider a unified least absolute deviation estimator for stationary and nonstationary fractionally integrated autoregressive moving average models with conditional heteroscedasticity. Its asymptotic normality is established when the second moments of errors and innovations are finite. Several other alternative estimators are also discussed and are shown to be less efficient and less robust than the proposed approach. A diagnostic tool, consisting of two portmanteau tests, is designed to check whether or not the estimated models are adequate. The simulation experiments give further support to our model and the results for the absolute returns of the Dow Jones Industrial Average Index daily closing price demonstrate their usefulness in modelling time series exhibiting the features of long memory, conditional heteroscedasticity and heavy tails. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
399
414
http://hdl.handle.net/10.1093/biomet/asn014
application/pdf
Access to full text is restricted to subscribers.
Guodong Li
Wai Keung Li
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:585-5962013-03-04RePEc:oup:biomet
article
Using logistic regression procedures for estimating receiver operating characteristic curves
Estimation of a receiver operating characteristic, ROC, curve is usually based either on a fully parametric model such as a normal model or on a fully nonparametric model. In this paper, we explore a semiparametric approach by assuming a density ratio model for disease and disease-free densities. This model has a natural connection with the logistic regression model. The proposed semiparametric approach is more robust than a fully parametric approach and is more efficient than a fully nonparametric approach. Two real examples demonstrate that the ROC curve estimated by our semiparametric method is much smoother than that estimated by the nonparametric method. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
585
596
Jing Qin
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:603-6162013-03-04RePEc:oup:biomet
article
A simple and efficient simulation smoother for state space time series analysis
A simulation smoother in state space time series analysis is a procedure for drawing samples from the conditional distribution of state or disturbance vectors given the observations. We present a new technique for this which is both simple and computationally efficient. The treatment includes models with diffuse initial conditions and regression effects. Computational comparisons are made with the previous standard method. Two applications are provided to illustrate the use of the simulation smoother for Gibbs sampling for Bayesian inference and importance sampling for classical inference. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
603
616
J. Durbin
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:669-6822013-03-04RePEc:oup:biomet
article
A Poisson model for the coverage problem with a genomic application
Suppose a population has infinitely many individuals and is partitioned into unknown N disjoint classes. The sample coverage of a random sample from the population is the total proportion of the classes observed in the sample. This paper uses a nonparametric Poisson mixture model to give new understanding and results for inference on the sample coverage. The Poisson mixture model provides a simplified framework for inferring any general abundance-K coverage, the sum of the proportions of those classes that contribute exactly k individuals in the sample for some k in K, with K being a set of nonnegative integers. A new moment-based derivation of the well-known Turing estimators is presented. As an application, a gene-categorisation problem in genomic research is addressed. Since Turing's approach is a moment-based method, maximum likelihood estimation and minimum distance estimation are indicated as alternatives for the coverage problem. Finally, it will be shown that any Turing estimator is asymptotically fully efficient. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
669
682
Chang Xuan Mao
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:585-6012013-03-04RePEc:oup:biomet
article
Implications of influence function analysis for sliced inverse regression and sliced average variance estimation
Sliced inverse regression, sliced inverse regression II and sliced average variance estimation are three related dimension-reduction methods that require relatively mild model assumptions. As an approximation for the relative influence of single observations from large samples, the influence function is used to compare the sensitivity of the three methods to particular observational types. The analysis carried out here helps to explain why there is a lack of agreement concerning the preferability of these dimension-reduction procedures in general. An efficient sample version of the influence function is also developed and evaluated. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
585
601
http://hdl.handle.net/10.1093/biomet/asm055
application/pdf
Access to full text is restricted to subscribers.
Luke A. Prendergast
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:543-5572013-03-04RePEc:oup:biomet
article
Smooth quantile ratio estimation
We propose a novel approach to estimating the mean difference between two highly skewed distributions. The method, which we call smooth quantile ratio estimation, smooths, over percentiles, the ratio of the quantiles of the two distributions. The method defines a large class of estimators, including the sample mean difference, the maximum likelihood estimator under log-normal samples and the L-estimator. We derive asymptotic properties such as consistency and asymptotic normality, and also provide a closed-form expression for the asymptotic variance. In a simulation study, we show that smooth quantile ratio estimation has lower mean squared error than several competitors, including the sample mean difference and the log-normal parametric estimator in several realistic situations. We apply the method to the 1987 National Medicare Expenditure Survey to estimate the difference in medical expenditures between persons suffering from the smoking attributable diseases, lung cancer and chronic obstructive pulmonary disease, and persons without these diseases. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
543
557
http://hdl.handle.net/10.1093/biomet/92.3.543
text/html
Access to full text is restricted to subscribers.
Francesca Dominici
Leslie Cope
Daniel Q. Naiman
Scott L. Zeger
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:23-372013-03-04RePEc:oup:biomet
article
The analysis of retrospective family studies
Case-control samples allow straightforward calculation of estimates of the association between covariates and disease status by fitting a prospective logistic regression model. In genetic studies of disease, investigators often gather additional information on response and covariate variables from family members of cases and controls. The objective is to model the responses of all the family members in terms of the covariate data. Whittemore (1995) has discussed maximum likelihood methods for fitting a special class of logistic models to family data collected according to a particular design. In the present paper, we show that we can obtain efficient semiparametric maximum likelihood estimates for an arbitrary multivariate binary regression model by fitting a modified prospective model for a wide class of retrospective designs. However, in contrast to the situation with simple case-control studies, the prospective model will differ from the original model even when the model is logistic. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
23
37
J. Neuhaus
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:982-9842013-03-04RePEc:oup:biomet
article
Conditional and marginal association for binary random variables
The relationship between marginal and conditional distributions of binary random variables is analysed via a log-linear model. Conditions for the Yule--Simpson effect are established and the implications for latent class analysis examined. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
982
984
D. R. Cox
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:827-8412013-03-04RePEc:oup:biomet
article
Auxiliary mixture sampling for parameter-driven models of time series of counts with applications to state space modelling
We consider parameter-driven models of time series of counts, where the observations are assumed to arise from a Poisson distribution with a mean changing over time according to a latent process. Estimation of these models is carried out within a Bayesian framework using data augmentation and Markov chain Monte Carlo methods. We suggest a new auxiliary mixture sampler, which possesses a Gibbsian transition kernel, where we draw from full conditional distributions belonging to standard distribution families only. Emphasis lies on application to state space modelling of time series of counts, but we show that auxiliary mixture sampling may be applied to a wider range of parameter-driven models, including random-effects models and panel data models based on the Poisson distribution. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
827
841
http://hdl.handle.net/10.1093/biomet/93.4.827
text/html
Access to full text is restricted to subscribers.
Sylvia Frühwirth-Schnatter
Helga Wagner
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:671-6862013-03-04RePEc:oup:biomet
article
Models for interval censoring and simulation-based inference for lifetime distributions
Interval-censored lifetime data arise when individuals in a study are inspected intermittently so that a lifetime is observed to lie between two successive times. In settings where only these two times are available, methods exist for nonparametric or parametric estimation of lifetime distributions. However, there has been virtually no discussion of how inspection processes may be estimated or identified. Such estimates are needed if one is to generate interval-censored data by simulation. This paper identifies which aspects of an independent inspection process are estimable from interval-censored data, and shows how to obtain nonparametric estimates. The results allow interval-censored data from any specified distribution to be generated, and give new simulation procedures for estimation or testing. A new omnibus goodness-of-fit test is introduced. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
671
686
http://hdl.handle.net/10.1093/biomet/93.3.671
text/html
Access to full text is restricted to subscribers.
J. F. Lawless
Denise Babineau
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:819-8292013-03-04RePEc:oup:biomet
article
Estimation of nonstationary spatial covariance structure
We introduce a method for estimating nonstationary spatial covariance structure from space-time data and apply the method to an analysis of Sydney wind patterns. Our method constructs a process honouring a given spatial covariance matrix at observing stations and uses one or more stationary processes to describe conditional behaviour given observing site values. The stationary processes give a localised description of the spatial covariance structure. The method is computationally attractive, and can be extended to the assessment of covariance for multivariate processes. The technique is illustrated for data describing the east-west component of Sydney winds. For this example, our own methods are contrasted with a geometrically appealing though computationally intensive technique which describes spatial correlation via an isotropic process and a deformation of the geographical space. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
819
829
David J. Nott
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:627-6462013-03-04RePEc:oup:biomet
article
Simulation and inference for stochastic volatility models driven by Lévy processes
We study Ornstein-Uhlenbeck stochastic processes driven by Lévy processes, and extend them to more general non-Ornstein-Uhlenbeck models. In particular, we investigate the means of making the correlation structure in the volatility process more flexible. For one model, we implement a method for introducing quasi long-memory into the volatility model. We demonstrate that the models can be fitted to real share price returns data. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
627
646
http://hdl.handle.net/10.1093/biomet/asm048
application/pdf
Access to full text is restricted to subscribers.
Matthew P. S. Gander
David A. Stephens
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:877-8912013-03-04RePEc:oup:biomet
article
Generalised incomplete Trojan designs
Generalised incomplete (m x n)/k Trojan designs for m replicates of nk treatments based on sets of k cyclic generators are discussed. Normal equations for plots-within-columns, plots-within-blocks and blocks-within-columns treatment effects are developed. The nk treatments are divided into k subsets each of size n and the conditional plots-within-blocks and blocks-within-columns information matrix for each subset is defined. Efficient conditional treatment estimates are discussed and efficient generators for the various strata are discussed. Balanced (m x n)/k incomplete Trojan designs based on Youden generators are constructed and designs based on multiples of a single generator are discussed. Some ideas for constructing efficient general (m x n)/2 designs are outlined and some advantages of generalised incomplete Trojan designs are discussed. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
877
891
R. N. Edmondson
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:119-1332013-03-04RePEc:oup:biomet
article
Multiscale generalised linear models for nonparametric function estimation
We present a method for extracting information about both the scale and trend of local components of an inhomogeneous function in a nonparametric generalised linear model. Our multiscale framework combines recursive partitions, which allow for the incorporation of scale in a natural manner, with systems of piecewise polynomials supported on the partition intervals, which serve to summarise the smooth trend within each interval. Our estimators are formulated as solutions of complexity-penalised likelihood optimisations, where the penalty seeks to limit the number of intervals used to model the data. The actual calculation of the estimators may be accomplished using standard software routines for generalised linear models, within the context of efficient, tree-based, polynomial-time algorithms. A risk analysis shows that these estimators achieve the same asymptotic rates in the nonparametric generalised linear model as the classical wavelet-based estimators in the Gaussian 'function plus noise' model, for suitably defined ranges of Besov spaces. Numerical simulations show that the method tends to perform at least as well as, and often better than, alternative wavelet-based methodologies in the context of finite samples, while applications to gamma-ray burst data in astronomy and packet loss data in computer network traffic analysis confirm its practical relevance. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
119
133
http://hdl.handle.net/10.1093/biomet/92.1.119
text/html
Access to full text is restricted to subscribers.
Eric D. Kolaczyk
Robert D. Nowak
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:51-652013-03-04RePEc:oup:biomet
article
Orthogonal and nearly orthogonal designs for computer experiments
We introduce a method for constructing a rich class of designs that are suitable for use in computer experiments. The designs include Latin hypercube designs and two-level fractional factorial designs as special cases and fill the vast vacuum between these two familiar classes of designs. The basic construction method is simple, building a series of larger designs based on a given small design. If the base design is orthogonal, the resulting designs are orthogonal; likewise, if the base design is nearly orthogonal, the resulting designs are nearly orthogonal. We present two generalizations of our basic construction method. The first generalization improves the projection properties of the basic method; the second generalization gives rise to designs that have smaller correlations. Sample constructions are presented and properties of these designs are discussed. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
51
65
http://hdl.handle.net/10.1093/biomet/asn057
application/pdf
Access to full text is restricted to subscribers.
Derek Bingham
Randy R. Sitter
Boxin Tang
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:996-10022013-03-04RePEc:oup:biomet
article
Identification of a competing risks model with unknown transformations of latent failure times
This paper is concerned with identification of a competing risks model with unknown transformations of latent failure times. The model includes, as special cases, competing risks versions of proportional hazards, mixed proportional hazards and accelerated failure time models. It is shown that covariate effects on latent failure times, cause-specific link functions and the joint survivor function of the disturbance terms can be identified without relying either on parametric modelling of the dependence between latent failure times or on an exclusion restriction among covariates. As a result, the paper provides an identification result about the joint survivor function of the latent failure times conditional on covariates. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
996
1002
http://hdl.handle.net/10.1093/biomet/93.4.996
text/html
Access to full text is restricted to subscribers.
Sokbae Lee
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:238-2442013-03-04RePEc:oup:biomet
article
Multiple imputation methods for testing treatment differences in survival distributions with missing cause of failure
We propose a method for comparing survival distributions when cause-of-failure information is missing for some individuals. We use multiple imputation to impute missing causes of failure, where the probability that a missing cause is that of interest may depend on auxiliary covariates, and combine log-rank statistics computed from several 'completed' datasets into a test statistic that achieves asymptotically the nominal level. Simulations demonstrate the relevance of the theory in finite samples. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
238
244
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:153-1642013-03-04RePEc:oup:biomet
article
Design sensitivity in observational studies
Outside the field of statistics, the literature on observational studies offers advice about research designs or strategies for judging whether or not an association is causal, such as multiple operationalism or a dose-response relationship. These useful suggestions are typically informal and qualitative. A quantitative measure, design sensitivity, is proposed for measuring the contribution such strategies make in distinguishing causal effects from hidden biases. Several common strategies are then evaluated in terms of their contribution to design sensitivity. A related method for computing the power of a sensitivity analysis is also developed. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
153
164
Paul R. Rosenbaum
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:465-4762013-03-04RePEc:oup:biomet
article
Saddlepoint approximations for the Bingham and Fisher–Bingham normalising constants
The Fisher–Bingham distribution is obtained when a multivariate normal random vector is conditioned to have unit length. Its normalising constant can be expressed as an elementary function multiplied by the density, evaluated at 1, of a linear combination of independent noncentral χ_1^2 random variables. Hence we may approximate the normalising constant by applying a saddlepoint approximation to this density. Three such approximations, implementation of each of which is straightforward, are investigated: the first-order saddlepoint density approximation, the second-order saddlepoint density approximation and a variant of the second-order approximation which has proved slightly more accurate than the other two. The numerical and theoretical results we present show that this approach provides highly accurate approximations in a broad spectrum of cases. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
465
476
http://hdl.handle.net/10.1093/biomet/92.2.465
text/html
Access to full text is restricted to subscribers.
A. Kume
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:861-8762013-03-04RePEc:oup:biomet
article
Forming post-strata via Bayesian treed capture-recapture models
For the problem of dual system estimation, we propose a Bayesian treed capture-recapture model to account for heterogeneity of capture probabilities where individual auxiliary information is available. The model uses a binary tree to partition the covariate space into 'homogeneous' regions, within each of which the capture response can be described adequately by a simple model that assumes equal catchability. The attractive features of the proposed model include reduction of correlation bias, robustness and practical flexibility as well as simplicity and interpretability. In addition, it provides a systematic and effective way of forming post-strata for the Sekar–Deming estimator of population size. We compare the performance of estimators based on this model to those of alternative estimators in three scenarios. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
861
876
http://hdl.handle.net/10.1093/biomet/93.4.861
text/html
Access to full text is restricted to subscribers.
Xinlei Wang
Johan Lim
S. Lynne Stokes
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:185-1982013-03-04RePEc:oup:biomet
article
Partially linear models with missing response variables and error-prone covariates
We consider partially linear models of the form Y = X^Tβ + ν(Z) + ε when the response variable Y is sometimes missing with missingness probability π depending on (X, Z), and the covariate X is measured with error, where ν(z) is an unspecified smooth function. The missingness structure is therefore missing not at random, rather than the usual missing at random. We propose a class of semiparametric estimators for the parameter of interest β, as well as for the population mean E(Y). The resulting estimators are shown to be consistent and asymptotically normal under general assumptions. To construct a confidence region for β, we also propose an empirical-likelihood-based statistic, which is shown to have a chi-squared distribution asymptotically. The proposed methods are applied to an AIDS clinical trial dataset. A simulation study is also reported. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
185
198
http://hdl.handle.net/10.1093/biomet/asm010
application/pdf
Access to full text is restricted to subscribers.
Hua Liang
Suojin Wang
Raymond J. Carroll
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:245-2502013-03-04RePEc:oup:biomet
article
A note on testing for nonlinearity with partially observed time series
We have implemented a Lagrange multiplier test for the alternative hypothesis of a nonlinear continuous-time autoregressive model with the instantaneous mean having multiple degrees of nonlinearity. This test is an extension of a Lagrange multiplier test proposed by Tsai & Chan (2000), with the alternative model analogous to the model used in Tsay's (1986) discrete-time work. The performance of the test in the finite-sample case is compared with several existing tests for nonlinearity including Keenan's (1985) test, Petruccelli & Davies' (1986) test, Tsay's (1986, 1989) tests and Tsai & Chan's (2000) test. The comparison is based on simulated data from some linear autoregressive models, self-exciting threshold autoregressive models, bilinear models and the nonlinear continuous-time autoregressive models for which the Lagrange multiplier test is designed. In general, the test is more powerful than all the other tests. The test is further illustrated with the annual sunspot data and the lynx data. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
245
250
Henghsiu Tsai
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:127-1372013-03-04RePEc:oup:biomet
article
Implementing matching priors for frequentist inference
Nuisance parameters do not pose any problems in Bayesian inference as marginalisation allows for study of the posterior distribution solely in terms of the parameter of interest. However, no general solution is available for removing nuisance parameters under the frequentist paradigm. In this paper, we merge the two approaches to construct a general procedure for frequentist elimination of nuisance parameters through the use of matching priors. In particular, we perform Bayesian marginalisation with respect to a prior distribution under which posterior inferences have approximate frequentist validity. Matching priors are constructed as solutions to a partial differential equation. Unfortunately, except in simple cases, these partial differential equations do not yield to analytical or even standard numerical methods of solution. We present a numerical/Monte Carlo algorithm for obtaining the matching prior, in general, as a solution to the appropriate partial differential equation and draw posterior inferences. To be specific, we develop an automated routine through an implementation of the Metropolis–Hastings algorithm for deriving frequentist valid inferences via the matching prior. We illustrate our results in the contexts of fitting random effects models, fitting logistic regression models and fitting teratological data by beta-binomial models. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
127
137
Richard A. Levine
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:779-7862013-03-04RePEc:oup:biomet
article
Covariance decomposition in undirected Gaussian graphical models
The covariance between two variables in a multivariate Gaussian distribution is decomposed into a sum of path weights for all paths connecting the two variables in an undirected independence graph. These weights are useful in determining which variables are important in mediating correlation between the two path endpoints. The decomposition arises in undirected Gaussian graphical models and does not require or involve any assumptions of causality. This covariance decomposition is derived using basic linear algebra. The decomposition is feasible for very large numbers of variables if the corresponding precision matrix is sparse, a circumstance that arises in examples such as gene expression studies in functional genomics. Additional computational efficiencies are possible when the undirected graph is derived from an acyclic directed graph. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
779
786
http://hdl.handle.net/10.1093/biomet/92.4.779
text/html
Access to full text is restricted to subscribers.
Beatrix Jones
Mike West
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:277-2902013-03-04RePEc:oup:biomet
article
Semiparametric regression analysis for doubly censored data
We analyse doubly censored data using semiparametric transformation models. We provide inference procedures for the regression parameters and derive the asymptotic distributions of the proposed estimators. Procedures for model checking and model selection are also discussed. We illustrate our approach with a viral-load dataset from a recent AIDS clinical trial. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
277
290
T. Cai
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:821-8302013-03-04RePEc:oup:biomet
article
Estimating residual variance in nonparametric regression using least squares
We propose a new estimator for the error variance in a nonparametric regression model. We estimate the error variance as the intercept in a simple linear regression model with squared differences of paired observations as the dependent variable and squared distances between the paired covariates as the regressor. For the special case of a one-dimensional domain with equally spaced design points, we show that our method reaches an asymptotic optimal rate which is not achieved by some existing methods. We conduct extensive simulations to evaluate finite-sample performance of our method and compare it with existing methods. Our method can be extended to nonparametric regression models with multivariate functions defined on arbitrary subsets of normed spaces, possibly observed on unequally spaced or clustered designed points. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
821
830
http://hdl.handle.net/10.1093/biomet/92.4.821
text/html
Access to full text is restricted to subscribers.
Tiejun Tong
Yuedong Wang
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:891-9052013-03-04RePEc:oup:biomet
article
Model diagnostic tests for selecting informative correlation structure in correlated data
In the generalized method of moments approach to longitudinal data analysis, unbiased estimating functions can be constructed to incorporate both the marginal mean and the correlation structure of the data. Increasing the number of parameters in the correlation structure corresponds to increasing the number of estimating functions. Thus, building a correlation model is equivalent to selecting estimating functions. This paper proposes a chi-squared test to choose informative unbiased estimating functions. We show that this methodology is useful for identifying which source of correlation it is important to incorporate when there are multiple possible sources of correlation. This method can also be applied to determine the optimal working correlation for the generalized estimating equation approach. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
891
905
http://hdl.handle.net/10.1093/biomet/asn051
application/pdf
Access to full text is restricted to subscribers.
Annie Qu
J. Jack Lee
Bruce G. Lindsay
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:75-842013-03-04RePEc:oup:biomet
article
Efficient semiparametric estimator for heteroscedastic partially linear models
We study the heteroscedastic partially linear model with an unspecified partial baseline component and a nonparametric variance function. An interesting finding is that the performance of a naive weighted version of the existing estimator could deteriorate when the smooth baseline component is badly estimated. To avoid this, we propose a family of consistent estimators and investigate their asymptotic properties. We show that the optimal semiparametric efficiency bound can be reached by a semiparametric kernel estimator in this family. Building upon our theoretical findings and heuristic arguments about the equivalence between kernel and spline smoothing, we conjecture that a weighted partial-spline estimator could also be semiparametric efficient. Properties of the proposed estimators are presented through theoretical illustration and numerical simulations. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
75
84
http://hdl.handle.net/10.1093/biomet/93.1.75
text/html
Access to full text is restricted to subscribers.
Yanyuan Ma
Jeng-Min Chiou
Naisyin Wang
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:423-4342013-03-04RePEc:oup:biomet
article
Measures for designs in experiments with correlated errors
In this paper we consider optimal design of experiments in the case of correlated observations. We use and further develop the concept of design measures introduced by Pázman & Müller (1998) for the construction of a simple, quick and elegant design algorithm. We support the construction of this algorithm for a general correlation structure by an interpretation in terms of norms. Examples demonstrate that our results are useful for generating exact designs by sampling from the obtained design measures. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
423
434
Werner G. Müller
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:643-6542013-03-04RePEc:oup:biomet
article
A multiple-imputation Metropolis version of the EM algorithm
In this paper we introduce a new stochastic variant of the EM algorithm. The algorithm combines the principle of multiple imputation and the theory of simulated annealing to deal with cases where the E-step and the M-step can be intractable or numerically inefficient. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
643
654
Carlo Gaetan
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:841-8502013-03-04RePEc:oup:biomet
article
Testing ignorable missingness in estimating equation approaches for longitudinal data
We address the matter of determining whether or not missing data in longitudinal studies are ignorable with regard to quasilikelihood or estimating equations approaches. This involves testing for whether or not the zero-mean property of estimating equations holds true. Chen & Little (1999) proposed testing for significant differences among parameter estimators calculated from sample subsets with different patterns of missing data, whereas we propose a more unified generalised score-type test. This avoids exhaustive estimation of parameters for each missing-data pattern, testing instead with a single quadratic score test statistic whether or not there is a common parameter under which the means of all the pattern-specific estimating equations are zero. Comparisons are made for the two approaches with both simulations and real data examples. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
841
850
Annie Qu
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:234-2392013-03-04RePEc:oup:biomet
article
Compatibility among marginal densities
In the Lancaster representation a joint density is decomposed into a sum of additive interactions. Using these interactions, we derive conditions for checking compatibility among a collection of marginal densities. The representation also shows how to construct an all-positive joint density additively from a given set of compatible marginals. An algorithm is proposed for reducing the dimension of the marginal densities so that compatibility can be checked in sequential increments. The representation may yield insights into the construction and simulation of models represented by undirected graphs. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
234
239
Yuchung J. Wang
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:187-2042013-03-04RePEc:oup:biomet
article
Two-stage sampling from a prediction point of view when the cluster sizes are unknown
We consider the problem of estimating the population total in two-stage cluster sampling when cluster sizes are known only for the sampled clusters, making use of a population model arising from a variance component model. The problem can be considered as one of predicting the unobserved part Z of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood. For a more general distribution-free model, by application of an analysis of variance approach instead of maximum likelihood for parameter estimation, the predictor obtained from the predictive likelihood is shown to be approximately uniformly optimal for large sample size and large number of clusters, in the sense of uniformly minimizing the mean-squared error in a partially linear class of model-unbiased predictors. Three prediction intervals for Z based on three similar predictive likelihoods are studied. For a small number n_0 of sampled clusters, they differ significantly, but for large n_0, the three intervals are practically identical. Model-based and design-based coverage properties of the prediction intervals are studied based on a comprehensive simulation study. The simulation study indicates that for large sample sizes, the coverage measures achieve approximately the nominal level 1 - α and are slightly less than 1 - α for moderately large sample sizes. For small sample sizes, the coverage measures are about 1 - 2α, being raised to 1 - α for a modified interval based on the distribution. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
187
204
http://hdl.handle.net/10.1093/biomet/asm098
application/pdf
Access to full text is restricted to subscribers.
Jan F. Bjørnstad
Elinor Ytterstad
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:909-9202013-03-04RePEc:oup:biomet
article
First-order intrinsic autoregressions and the de Wijs process
We discuss intrinsic autoregressions for a first-order neighbourhood on a two-dimensional rectangular lattice and give an exact formula for the variogram that extends known results to the asymmetric case. We obtain a corresponding asymptotic expansion that is more accurate and more general than previous ones and use this to derive the de Wijs variogram under appropriate averaging, a result that can be interpreted as a two-dimensional spatial analogue of Brownian motion obtained as the limit of a random walk in one dimension. This provides a bridge between geostatistics, where the de Wijs process was once the most popular formulation, and Markov random fields, and also explains why statistical analysis using intrinsic autoregressions is usually robust to changes of scale. We briefly describe corresponding calculations in the frequency domain, including limiting results for higher-order autoregressions. The paper closes with some practical considerations, including applications to irregularly-spaced data. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
909
920
http://hdl.handle.net/10.1093/biomet/92.4.909
text/html
Access to full text is restricted to subscribers.
Julian Besag
Debashis Mondal
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:325-3332013-03-04RePEc:oup:biomet
article
Objective Bayesian analysis for the Student-t regression model
We develop a Bayesian analysis based on two different Jeffreys priors for the Student-t regression model with unknown degrees of freedom. It is typically difficult to estimate the number of degrees of freedom: improper prior distributions may lead to improper posterior distributions, whereas proper prior distributions may dominate the analysis. We show that Bayesian analysis with either of the two considered Jeffreys priors provides a proper posterior distribution. Finally, we show that Bayesian estimators based on Jeffreys analysis compare favourably to other Bayesian estimators based on priors previously proposed in the literature. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
325
333
http://hdl.handle.net/10.1093/biomet/asn001
application/pdf
Access to full text is restricted to subscribers.
Thaís C. O. Fonseca
Marco A. R. Ferreira
Helio S. Migon
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:649-6582013-03-04RePEc:oup:biomet
article
Efficient estimation in additive hazards regression with current status data
Current status data arise when the exact timing of an event is unobserved, and it is only known at a given point in time whether or not the event has occurred. Recently Lin et al. (1998) studied the additive semiparametric hazards model for current status data. They showed that the analysis of current status data under the additive hazards model reduces to ordinary Cox regression under the assumption that a proportional hazards model may be used to describe the monitoring intensity. This analysis does not make efficient use of data, and in some cases it may not be appropriate to assume a proportional hazards model for the monitoring times. We study the semiparametric hazards model for current status data but make use of the semiparametric efficient score function. The suggested approach has the advantages that it is efficient in that it reaches the semiparametric information bound, and it does not involve any modelling of the monitoring times. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
649
658
Torben Martinussen
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:183-1972013-03-04RePEc:oup:biomet
article
Nonparametric estimation from current status data with competing risks
A great deal of recent attention has focused on the estimation of survival distributions based on current status data, an extreme form of interval censored data. This particular data structure arises in a wide variety of applications where cross-sectional observation either naturally occurs or is preferred to more traditional forms of follow-up. Here we consider current status data in the context of competing risks. We briefly consider simple parametric models as a backdrop to nonparametric procedures. We make some brief comparisons and remarks regarding the nonparametric maximum likelihood estimator. The ideas are illustrated on the data of Krailo & Pike (1983) which considers estimation of the age distribution at both natural and operative menopause. We also consider the case where there is exact observation of failure times due to one of the competing risks when failure occurs prior to the monitoring time. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
183
197
Nicholas P. Jewell
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:343-3552013-03-04RePEc:oup:biomet
article
Estimating the quality-of-life-adjusted gap time distribution of successive events subject to censoring
When treatment effects are studied in the context of successive or recurrent life events, separate analyses of the quality-of-life scores and of the inter-event (gap) times might lead to contradictory conclusions. In an attempt to reconcile this, we propose a unitary and more comprehensive nonparametric analysis that combines the two separate analyses by introducing the quality-of-life-adjusted gap time concept. Inverse probability of censoring estimators of the quality-of-life-adjusted gap time joint and conditional distributions are proposed and are shown to be consistent and asymptotically normal. Simulations performed in a variety of scenarios indicate that the joint and conditional quality-of-life-adjusted gap time distribution estimators are virtually unbiased, with properly estimated standard errors and asymptotic normality features. An example from the International Breast Cancer Study Group Trial V illustrates the use of the proposed estimators. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
343
355
http://hdl.handle.net/10.1093/biomet/93.2.343
text/html
Access to full text is restricted to subscribers.
Adin-Cristian Andrei
Susan Murray
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:685-7002013-03-04RePEc:oup:biomet
article
Conditional Akaike information under generalized linear and proportional hazards mixed models
We study model selection for clustered data, when the focus is on cluster-specific inference. Such data are often modelled using random effects, and conditional Akaike information was proposed in Vaida & Blanchard (2005) and used to derive an information criterion under linear mixed models. Here we extend the approach to generalized linear and proportional hazards mixed models. Outside the normal linear mixed models, exact calculations are not available and we resort to asymptotic approximations. In the presence of nuisance parameters, a profile conditional Akaike information is proposed. Bootstrap methods are considered for their potential advantage in finite samples. Simulations show that the bootstrap and the analytic criteria perform comparably, with bootstrap demonstrating some advantages for larger cluster sizes. The proposed criteria are applied to two cancer datasets to select models when the cluster-specific inference is of interest. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
685
700
http://hdl.handle.net/10.1093/biomet/asr023
application/pdf
Access to full text is restricted to subscribers.
M. C. Donohue
R. Overholser
R. Xu
F. Vaida
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:243-2482013-03-04RePEc:oup:biomet
article
Plant-capture estimation of the size of a homogeneous population
We consider maximum likelihood estimation of the size of a target population to which has been added a known number of planted individuals. The standard equal-catchability model used in mark-recapture is assumed to be applicable to the augmented population. After proving the unimodality of the profile likelihood for the target population size, we obtain both the maximum likelihood estimator of this size and interval estimators based on its asymptotic distribution. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
243
248
http://hdl.handle.net/10.1093/biomet/asm012
application/pdf
Access to full text is restricted to subscribers.
I. B. J. Goudie
P. E. Jupp
J. Ashbridge
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:1007-10142013-03-04RePEc:oup:biomet
article
Generalized linear time series regression
We consider a cross-section model that contains an individual component, a deterministic time trend and an unobserved latent common time series component. We show the following oracle property: the parameters of the latent time series and the parameters of the deterministic time trend can be estimated with the same asymptotic accuracy as if the parameters of the individual component were known. We consider this model in two settings: least squares fits of linear specifications of the individual component and the parameters of the deterministic time trend and, more generally, quasilikelihood estimation in a generalized linear time series model. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
1007
1014
http://hdl.handle.net/10.1093/biomet/asr044
application/pdf
Access to full text is restricted to subscribers.
Enno Mammen
Jens Perch Nielsen
Bernd Fitzenberger
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:507-5172013-03-04RePEc:oup:biomet
article
Likelihood ratio tests in curved exponential families with nuisance parameters present only under the alternative
For submodels of an exponential family, we consider likelihood ratio tests for hypotheses that render some parameters nonidentifiable. First, we establish the asymptotic equivalence between the likelihood ratio test and the score test. Secondly, the score-test representation is used to derive the asymptotic distribution of the likelihood ratio test. These results are derived for general submodels of an exponential family without assuming compactness of the parameter space. We then exemplify the results on a class of multivariate normal models, where null hypotheses concerning the covariance structure lead to loss of identifiability of a parameter. Our motivating problem throughout the paper is to test a random intercepts model against an alternative covariance structure allowing for serial correlation. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
507
517
http://hdl.handle.net/10.1093/biomet/92.3.507
text/html
Access to full text is restricted to subscribers.
Christian Ritz
Ib M. Skovgaard
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:484-4892013-03-04RePEc:oup:biomet
article
Hypothesis testing when a nuisance parameter is present only under the alternative: Linear model case
The results of Davies (1977, 1987) are extended to a linear model situation with unknown residual variance. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
484
489
Robert B. Davies
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:269-2782013-03-04RePEc:oup:biomet
article
Applying the Horvitz-Thompson criterion in complex designs: A computer-intensive perspective for estimating inclusion probabilities
A modification of the Horvitz-Thompson estimator is proposed for complex sampling designs. The inclusion probabilities are estimated by means of independent replications of the sampling scheme. The properties of the resulting estimator are derived. Guidelines for choosing the appropriate number of replications are given and some applications are considered. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
269
278
http://hdl.handle.net/10.1093/biomet/93.2.269
text/html
Access to full text is restricted to subscribers.
Lorenzo Fattorini
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:1-172013-03-04RePEc:oup:biomet
article
Semiparametric analysis of short-term and long-term hazard ratios with two-sample survival data
Standard approaches to semiparametric modelling of two-sample survival data are not appropriate when the two survival curves cross. We introduce a two-sample model that accommodates crossing survival curves. The two scalar parameters of the model have the interpretations of being the short-term and long-term hazard ratios respectively. The time-varying hazard ratio is expressed semiparametrically by the two scalar parameters and an unspecified baseline distribution. The new model includes the Cox model and the proportional odds model as submodels. For inference we use a pseudo maximum likelihood approach that can be expressed via some simple estimating equations, analogous to that for the maximum partial likelihood estimator of the Cox model, that provide consistent and asymptotically normal estimators. Simulation studies show that the estimators perform well for moderate sample sizes. We also illustrate the methods with a real-data example. The new model can be extended easily to the regression setting. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
1
17
http://hdl.handle.net/10.1093/biomet/92.1.1
text/html
Access to full text is restricted to subscribers.
Song Yang
Ross Prentice
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:295-3052013-03-04RePEc:oup:biomet
article
A family of Bayes multiple testing procedures
Under the model of independent test statistics, we propose a two-parameter family of Bayes multiple testing procedures. The two parameters can be viewed as tuning parameters. Using the Benjamini–Hochberg step-up procedure for controlling false discovery rate as a baseline for conservativeness, we choose the tuning parameters to compromise between the operating characteristics of that procedure and a less conservative procedure that focuses on alternatives that a priori might be considered likely or meaningful. The Bayes procedures do not have the theoretical and practical shortcomings of the popular stepwise procedures. In terms of the number of mistakes, simulations for two examples indicate that over a large segment of the parameter space, the Bayes procedure is preferable to the step-up procedure. Another desirable feature of the procedures is that they are computationally feasible for any number of hypotheses. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
295
305
http://hdl.handle.net/10.1093/biomet/asn013
application/pdf
Access to full text is restricted to subscribers.
Arthur Cohen
H. B. Sackrowitz
Minya Xu
Steven Buyske
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:107-1172013-03-04RePEc:oup:biomet
article
Confidence intervals for spectral mean and ratio statistics
We propose a new method, to construct confidence intervals for spectral mean and related ratio statistics of a stationary process, that avoids direct estimation of their asymptotic variances. By introducing a bandwidth, a self-normalization procedure is adopted and the distribution of the new statistic is asymptotically nuisance-parameter free. The bandwidth is chosen using information criteria and a moving average sieve approximation. Through a simulation study, we demonstrate good finite sample performance of our method when the sample size is moderate, while a comparison with an empirical likelihood-based method for ratio statistics is made, confirming a wider applicability of our method. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
107
117
http://hdl.handle.net/10.1093/biomet/asn067
application/pdf
Access to full text is restricted to subscribers.
Xiaofeng Shao
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:958-9612013-03-04RePEc:oup:biomet
article
A note on a partial empirical likelihood
A partial profile empirical likelihood for a semiparametric mixture model (Zou et al., 2002) is shown to originate in a conditional likelihood involving additional nuisance parameters. The partial likelihood is the conditional likelihood with the nuisance parameters replaced by their estimators from the full likelihood. The conditional likelihood suggests alternative estimators. We demonstrate that the partial likelihood estimator is more efficient than an estimator for which the nuisance parameters are known. The practical implications of this counter-intuitive result are discussed. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
958
961
F. Zou
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:881-8902013-03-04RePEc:oup:biomet
article
Second-order power comparisons for a class of nonparametric likelihood-based tests
This paper compares the second-order power properties of a broad class of nonparametric likelihood tests recently introduced by Baggerly (1998) as a generalisation of Owen's (1988) empirical likelihood. It is shown that in a multi-parameter setting identity of power up to first order does not imply identity up to second order unless one considers the average power criterion. It is also shown that the empirical likelihood ratio enjoys an optimality property in terms of local maximinity. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
881
890
Francesco Bravo
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:933-9462013-03-04RePEc:oup:biomet
article
Multiple imputation when records used for imputation are not used or disseminated for analysis
When some of the records used to estimate the imputation models in multiple imputation are not used or available for analysis, the usual multiple imputation variance estimator has positive bias. We present an alternative approach that enables unbiased estimation of variances and, hence, calibrated inferences in such contexts. First, using all records, the imputer samples m values of the parameters of the imputation model. Second, for each parameter draw, the imputer simulates the missing values for all records n times. From these mn completed datasets, the imputer can analyse or disseminate the appropriate subset of records. We develop methods for interval estimation and significance testing for this approach. Methods are presented in the context of multiple imputation for measurement error. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
933
946
http://hdl.handle.net/10.1093/biomet/asn042
application/pdf
Access to full text is restricted to subscribers.
Jerome P. Reiter
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:749-7542013-03-04RePEc:oup:biomet
article
On protected estimation of an odds ratio model with missing binary exposure and confounders
We describe an estimator of the parameter indexing a model for the conditional odds ratio between a binary exposure and a binary outcome given a high-dimensional vector of confounders, when the exposure and a subset of the confounders are missing, not necessarily simultaneously, in a subsample. We argue that a recently proposed estimator restricted to complete-cases confers more protection to model misspecification than existing ones in the sense that the set of data laws under which it is consistent strictly contains each set of data laws under which each of the previous estimators are consistent. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
749
754
http://hdl.handle.net/10.1093/biomet/asr027
application/pdf
Access to full text is restricted to subscribers.
E. J. Tchetgen Tchetgen
A. Rotnitzky
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:355-3662013-03-04RePEc:oup:biomet
article
A serially correlated gamma frailty model for longitudinal count data
A Poisson-gamma model is introduced to account for between-subjects heterogeneity and within-subjects serial correlation occurring in longitudinal count data. The model extends the usual time-constant shared frailty approach to allow time-varying serially correlated gamma frailty whilst retaining standard marginal assumptions. A composite likelihood approach to estimation and testing for serial correlation is proposed. The work is motivated by a clinical trial on patient-controlled analgesia where the number of analgesic doses taken by hospital patients in successive time intervals following abdominal surgery is recorded. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
355
366
Robin Henderson
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:139-1562013-03-04RePEc:oup:biomet
article
A dependence measure for multivariate and spatial extreme values: Properties and inference
We present properties of a dependence measure that arises in the study of extreme values in multivariate and spatial problems. For multivariate problems the dependence measure characterises dependence at the bivariate level, for all pairs and all higher orders up to and including the dimension of the variable. Necessary and sufficient conditions are given for subsets of dependence measures to be self-consistent, that is to guarantee the existence of a distribution with such a subset of values for the dependence measure. For pairwise dependence, these conditions are given in terms of positive semidefinite matrices and non-differentiable, positive definite functions. We construct new nonparametric estimators for the dependence measure which, unlike all naive nonparametric estimators, impose these self-consistency properties. As the new estimators provide an improvement on the naive methods, both in terms of the inferential and interpretability properties, their use in exploratory extreme value analyses should aid the identification of appropriate dependence models. The methods are illustrated through an analysis of simulated multivariate data, which shows that a lack of self-consistency is frequently a problem with the existing estimators, and by a spatial analysis of daily rainfall extremes in south-west England, which finds a smooth decay in extremal dependence with distance. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
139
156
Martin Schlather
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:205-2202013-03-04RePEc:oup:biomet
article
Predicting cumulative incidence probability by direct binomial regression
We suggest a new simple approach for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. We consider a semiparametric regression model where some effects may be time-varying and some may be constant over time. Our estimator can be implemented by standard software. Our simulation study shows that the estimator works well and has finite-sample properties comparable with the subdistribution approach. We apply the method to bone marrow transplant data and estimate the cumulative incidence of death in complete remission following a bone marrow transplantation. Here death in complete remission and relapse are two competing events. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
205
220
http://hdl.handle.net/10.1093/biomet/asm096
application/pdf
Access to full text is restricted to subscribers.
Thomas H. Scheike
Mei-Jie Zhang
Thomas A. Gerds
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:845-8602013-03-04RePEc:oup:biomet
article
Optimizing randomized trial designs to distinguish which subpopulations benefit from treatment
It is a challenge to evaluate experimental treatments where it is suspected that the treatment effect may only be strong for certain subpopulations, such as those having a high initial severity of disease, or those having a particular gene variant. Standard randomized controlled trials can have low power in such situations. They also are not optimized to distinguish which subpopulations benefit from a treatment. With the goal of overcoming these limitations, we consider randomized trial designs in which the criteria for patient enrollment may be changed, in a preplanned manner, based on interim analyses. Since such designs allow data-dependent changes to the population enrolled, care must be taken to ensure strong control of the familywise Type I error rate. Our main contribution is a general method for constructing randomized trial designs that allow changes to the population enrolled based on interim data using a prespecified decision rule, for which the asymptotic, familywise Type I error rate is strongly controlled at a specified level α. As a demonstration of our method, we prove new, sharp results for a simple, two-stage enrichment design. We then compare this design to fixed designs, focusing on each design's ability to determine the overall and subpopulation-specific treatment effects. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
845
860
http://hdl.handle.net/10.1093/biomet/asr055
application/pdf
Access to full text is restricted to subscribers.
M. Rosenblum
M. J. van der Laan
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:947-9602013-03-04RePEc:oup:biomet
article
Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data
We consider a class of semiparametric normal transformation models for right-censored bivariate failure times. Nonparametric hazard rate models are transformed to a standard normal model and a joint normal distribution is assumed for the bivariate vector of transformed variates. A semiparametric maximum likelihood estimation procedure is developed for estimating the marginal survival distribution and the pairwise correlation parameters. This produces an efficient estimator of the correlation parameter of the semiparametric normal transformation model, which characterizes the dependence of bivariate survival outcomes. In addition, a simple positive-mass-redistribution algorithm can be used to implement the estimation procedures. Since the likelihood function involves infinite-dimensional parameters, empirical process theory is utilized to study the asymptotic properties of the proposed estimators, which are shown to be consistent, asymptotically normal and semiparametric efficient. A simple estimator for the variance of the estimates is derived. Finite sample performance is evaluated via extensive simulations. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
947
960
http://hdl.handle.net/10.1093/biomet/asn049
application/pdf
Access to full text is restricted to subscribers.
Yi Li
Ross L. Prentice
Xihong Lin
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:992-9982013-03-04RePEc:oup:biomet
article
Use of the Gibbs Sampler to Obtain Conditional Tests, with Applications
A random sample is drawn from a distribution which admits a minimal sufficient statistic for the parameters. The Gibbs sampler is proposed to generate samples, called conditionally sufficient or co-sufficient samples, from the conditional distribution of the sample given its value of the sufficient statistic. The procedure is illustrated for the gamma distribution. Co-sufficient samples may be used to give exact tests of fit; for the gamma distribution these are compared for size and power with approximate tests based on the parametric bootstrap. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
992
998
http://hdl.handle.net/10.1093/biomet/asm065
application/pdf
Access to full text is restricted to subscribers.
Richard A. Lockhart
Federico J. O'Reilly
Michael A. Stephens
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:699-7082013-03-04RePEc:oup:biomet
article
Sequential tests and estimators after overrunning based on maximum-likelihood ordering
Often in sequential trials some additional data become available after a stopping boundary has been reached. A method for incorporating such information from overrunning is developed, based on a maximum-likelihood ordering of the sample space after overrunning. This yields a p-value for the primary test and a median-unbiased estimator and confidence intervals for the parameter under test. The context is that of observing a Brownian motion with drift, with either linear stopping boundaries in continuous time or discrete-time group-sequential boundaries. The methods apply to many clinical trials and are exemplified with data from a survival-analysis-based sequential clinical trial. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
699
708
W. J. Hall
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:655-6692013-03-04RePEc:oup:biomet
article
Estimating survival under a dependent truncation
The product-limit estimator calculated from data subject to random left-truncation relies on the testable assumption of quasi-independence between the failure time and the truncation time. In this paper, we propose a model for a truncated sample of pairs (X_i, Y_i) satisfying Y_i > X_i. A possible dependency between the truncation time and the variable of interest is modelled with a parametric family of copulas. The model also features a distribution function F_X(.) and a survival distribution S_Y(.) associated with the marginal behaviours of X and Y in the observable region Y > X. Semiparametric estimators for these two functions are proposed; they do not make any parametric assumption about either F_X(.) or S_Y(.). We derive an estimator for the copula parameter α based on the conditional Kendall's tau. We generalise the copula-graphic estimators of Zheng & Klein (1995) to truncated variables. The asymptotic distributions of all these estimators are then investigated. The methods are illustrated with a real dataset on HIV infection by transfusion and by simulations. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
655
669
http://hdl.handle.net/10.1093/biomet/93.3.655
text/html
Access to full text is restricted to subscribers.
Lajmi Lakhal Chaieb
Louis-Paul Rivest
Belkacem Abdous
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:251-2622013-03-04RePEc:oup:biomet
article
Profile-kernel versus backfitting in the partially linear models for longitudinal/clustered data
We study the profile-kernel and backfitting methods in partially linear models for clustered/longitudinal data. For independent data, despite the potential root-n inconsistency of the backfitting estimator noted by Rice (1986), the two estimators have the same asymptotic variance matrix, as shown by Opsomer & Ruppert (1999). In this paper, theoretical comparisons of the two estimators for multivariate responses are investigated. We show that, for correlated data, backfitting often produces a larger asymptotic variance than the profile-kernel method; that is, for clustered data, in addition to its bias problem, the backfitting estimator does not have the same asymptotic efficiency as the profile-kernel estimator. Consequently, the common practice of using the backfitting method to compute profile-kernel estimates is no longer advised. We illustrate this in detail by following Zeger & Diggle (1994) and Lin & Carroll (2001) with a working independence covariance structure for nonparametric estimation and a correlated covariance structure for parametric estimation. Numerical performance of the two estimators is investigated through a simulation study. Their application to an ophthalmology dataset is also described. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
251
262
Zonghui Hu
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:555-5712013-03-04RePEc:oup:biomet
article
Structured multicategory support vector machines with analysis of variance decomposition
The support vector machine has been a popular choice of classification method for many applications in machine learning. While it often outperforms other methods in terms of classification accuracy, the implicit nature of its solution renders the support vector machine less attractive in providing insights into the relationship between covariates and classes. Use of structured kernels can remedy the drawback. Borrowing the flexible model-building idea of functional analysis of variance decomposition, we consider multicategory support vector machines with analysis of variance kernels in this paper. An additional penalty is imposed on the sum of weights of functional subspaces, which encourages a sparse representation of the solution. Incorporation of the additional penalty enhances the interpretability of a resulting classifier with often improved accuracy. The proposed method is demonstrated through simulation studies and an application to real data. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
555
571
http://hdl.handle.net/10.1093/biomet/93.3.555
text/html
Access to full text is restricted to subscribers.
Yoonkyung Lee
Yuwon Kim
Sangjun Lee
Ja-Yong Koo
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:819-8342013-03-04RePEc:oup:biomet
article
A crossvalidation method for estimating conditional densities
We extend the idea of crossvalidation to choose the smoothing parameters of the 'double-kernel' local linear regression for estimating a conditional density. Our selection rule optimises the estimated conditional density function by minimising the integrated squared error. We also discuss three other bandwidth selection rules, an ad hoc method used by Fan et al. (1996), a bootstrap method of Hall et al. (1999) for bandwidth selection in the estimation of conditional distribution functions, modified by Bashtannyk & Hyndman (2001) to cover conditional density functions, and finally a simple approach proposed by Hyndman & Yao (2002). The performance of the new approach is compared with these three methods by simulation studies, and our method performs outstandingly well. The method is illustrated by an application to estimating the transition density and the Value-at-Risk of treasury-bill data. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
819
834
http://hdl.handle.net/10.1093/biomet/91.4.819
text/html
Access to full text is restricted to subscribers.
Jianqing Fan
Tsz Ho Yim
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:747-7622013-03-04RePEc:oup:biomet
article
Censored linear regression for case-cohort studies
Right-censored data from a classical case-cohort design and a stratified case-cohort design are considered. In the classical case-cohort design the subcohort is obtained as a simple random sample of the entire cohort, whereas in the stratified design this subcohort is selected by independent Bernoulli sampling with arbitrary selection probabilities. For each design and under a linear regression model, methods for estimating the regression parameters are proposed and analysed. These methods are derived by modifying the linear ranks tests and estimating equations that arise from full-cohort data using methods that are similar to the pseudolikelihood estimating equation that has been used in relative risk regression for these models. The estimators so obtained are shown to be consistent and asymptotically normal. Variance estimation and numerical illustrations are also provided. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
747
762
http://hdl.handle.net/10.1093/biomet/93.4.747
text/html
Access to full text is restricted to subscribers.
Bin Nan
Menggang Yu
John D. Kalbfleisch
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:514-5202013-03-04RePEc:oup:biomet
article
A new class of average moment matching priors
We derive a new class of priors for the variance component in the Fay–Herriot model, a mixed regression model widely used in small area estimation. This class includes the well-known uniform or superharmonic prior. Through simulation we illustrate the use of our class of priors. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
514
520
http://hdl.handle.net/10.1093/biomet/asn008
application/pdf
Access to full text is restricted to subscribers.
N. Ganesh
P. Lahiri
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:1-142013-03-04RePEc:oup:biomet
article
Bayesian correlation estimation
We propose prior probability models for variance-covariance matrices in order to address two important issues. First, the models allow a researcher to represent substantive prior information about the strength of correlations among a set of variables. Secondly, even in the absence of such information, the increased flexibility of the models mitigates dependence on strict parametric assumptions in standard prior models. For example, the model allows a posteriori different levels of uncertainty about correlations among different subsets of variables. We achieve this by including a clustering mechanism in the prior probability model. Clustering is with respect to variables and pairs of variables. Our approach leads to shrinkage towards a mixture structure implied by the clustering. We discuss appropriate posterior simulation schemes to implement posterior inference in the proposed models, including the evaluation of normalising constants that are functions of parameters of interest. The normalising constants result from the restriction that the correlation matrix be positive definite. We discuss examples based on simulated data, a stock return dataset and a population genetics dataset. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
1
14
John C. Liechty
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:992-9962013-03-04RePEc:oup:biomet
article
On the consequences of overstratification
It is common, in particular in observational studies in epidemiology, to impose stratification to adjust for possible effects of age and other variables on the binary outcome of interest. Overstratification may lower the precision of the estimated effects of interest. Understratification risks bias. These issues are studied analytically. Asymptotic results show that loss of efficiency depends on the true effect and on a measure of the average imbalance across strata between exposed and unexposed individuals. Bias depends on the correlation between stratum-specific size imbalances and event rates in the unexposed. Approximate results are also given. An example is used. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
992
996
http://hdl.handle.net/10.1093/biomet/asn039
application/pdf
Access to full text is restricted to subscribers.
B. L. De Stavola
D. R. Cox
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:635-6512013-03-04RePEc:oup:biomet
article
Adjustment uncertainty in effect estimation
Often there is substantial uncertainty in the selection of confounders when estimating the association between an exposure and health. We define this type of uncertainty as 'adjustment uncertainty'. We propose a general statistical framework for handling adjustment uncertainty in exposure effect estimation for a large number of confounders, we describe a specific implementation, and we develop associated visualization tools. Theoretical results and simulation studies show that the proposed method provides consistent estimators of the exposure effect and its variance. We also show that, when the goal is to estimate an exposure effect accounting for adjustment uncertainty, Bayesian model averaging with posterior model probabilities approximated using information criteria can fail to estimate the exposure effect and can over- or underestimate its variance. We compare our approach to Bayesian model averaging using time series data on levels of fine particulate matter and mortality. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
635
651
http://hdl.handle.net/10.1093/biomet/asn015
application/pdf
Access to full text is restricted to subscribers.
Ciprian M. Crainiceanu
Francesca Dominici
Giovanni Parmigiani
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:619-6322013-03-04RePEc:oup:biomet
article
Semiparametric Box–Cox power transformation models for censored survival observations
The accelerated failure time model specifies that the logarithm of the failure time is linearly related to the covariate vector without assuming a parametric error distribution. In this paper, we consider the semiparametric Box–Cox transformation model, which includes the above regression model as a special case, to analyse possibly censored failure time observations. Inference procedures for the transformation and regression parameters are proposed via a resampling technique. Prediction of the survival function of future subjects with a specific covariate vector is also provided via pointwise and simultaneous interval estimates. All the proposals are illustrated with datasets from two clinical studies. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
619
632
http://hdl.handle.net/10.1093/biomet/92.3.619
text/html
Access to full text is restricted to subscribers.
Tianxi Cai
Lu Tian
L. J. Wei
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:831-8462013-03-04RePEc:oup:biomet
article
Model-assisted estimation for complex surveys using penalised splines
Estimation of finite population totals in the presence of auxiliary information is considered. A class of estimators based on penalised spline regression is proposed. These estimators are weighted linear combinations of sample observations, with weights calibrated to known control totals. They allow straightforward extensions to multiple auxiliary variables and to complex designs. Under standard design conditions, the estimators are design consistent and asymptotically normal, and they admit consistent variance estimation using familiar design-based methods. Data-driven penalty selection is considered in the context of unequal probability sampling designs. Simulation experiments show that the estimators are more efficient than parametric regression estimators when the parametric model is incorrectly specified, while being approximately as efficient when the parametric specification is correct. An example using Forest Health Monitoring survey data from the U.S. Forest Service demonstrates the applicability of the methodology in the context of a two-phase survey with multiple auxiliary variables. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
831
846
http://hdl.handle.net/10.1093/biomet/92.4.831
text/html
Access to full text is restricted to subscribers.
F. J. Breidt
G. Claeskens
J. D. Opsomer
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:987-9942013-03-04RePEc:oup:biomet
article
The distribution of the difference between two t-variates
In this paper, the difference between two correlated t variables is divided by a function of their sample correlation and the distribution of the resulting quantity is examined. Functions of the sample correlation are found for which this quantity is approximately pivotal and has a t distribution, asymptotically. Simulations show that the asymptotic results hold well for small sample sizes. The results yield a useful test for comparing the difference in standardised scores of an individual with those of a group of controls. The test assumes that sampling is from a bivariate normal distribution and robustness of the test to departure from normality is examined. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
987
994
http://hdl.handle.net/10.1093/biomet/91.4.987
text/html
Access to full text is restricted to subscribers.
Paul H. Garthwaite
John R. Crawford
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:591-6022013-03-04RePEc:oup:biomet
article
Testing model adequacy for dynamic panel data with intercorrelation
We give several definitions of residual autocorrelations and derive their joint asymptotic distribution for the panel time series model of Hjellvik & Tjøstheim (1999a). A portmanteau goodness-of-fit test arises naturally from the asymptotic distribution. Simulation results show that the asymptotic standard errors compared satisfactorily with the empirical standard errors, that the goodness-of-fit test has reasonable empirical size, and that it is powerful enough to be useful with a modest sample size. The results of this paper are illustrated with a real-data example. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
591
602
Bo Fu
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:979-9852013-03-04RePEc:oup:biomet
article
False discovery rate for scanning statistics
The false discovery rate is a criterion for controlling Type I error in simultaneous testing of multiple hypotheses. For scanning statistics, due to local dependence, clusters of neighbouring hypotheses are likely to be rejected together. In such situations, it is more intuitive and informative to group neighbouring rejections together and count them as a single discovery, with the false discovery rate defined as the proportion of clusters that are falsely declared among all declared clusters. Assuming that the number of false discoveries, under this broader definition of a discovery, is approximately Poisson and independent of the number of true discoveries, we examine approaches for estimating and controlling the false discovery rate, and provide examples from biological applications. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
979
985
http://hdl.handle.net/10.1093/biomet/asr057
application/pdf
Access to full text is restricted to subscribers.
D. O. Siegmund
N. R. Zhang
B. Yakir
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:717-7232013-03-04RePEc:oup:biomet
article
Comparison of hierarchical likelihood versus orthodox best linear unbiased predictor approaches for frailty models
Hierarchical likelihood provides a statistically efficient procedure for frailty models. Recently, a method using the computationally attractive orthodox best linear unbiased predictor has been proposed; this uses Pearson-type estimation. We compare both approaches and discuss their relative merits. With semiparametric frailty models difficulties can arise for the orthodox method, if the number of nuisance parameters increases with the sample size. This difficulty is avoided by the use of the hierarchical-likelihood method. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
717
723
http://hdl.handle.net/10.1093/biomet/92.3.717
text/html
Access to full text is restricted to subscribers.
Il Do Ha
Youngjo Lee
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:559-5782013-03-04RePEc:oup:biomet
article
Fractional hot deck imputation
To compensate for item nonresponse, hot deck imputation procedures replace missing values with values that occur in the sample. Fractional hot deck imputation replaces each missing observation with a set of imputed values and assigns a weight to each imputed value. Under the model in which observations in an imputation cell are independently and identically distributed, fractional hot deck imputation is shown to be an effective imputation procedure. A consistent replication variance estimation procedure for estimators computed with fractional imputation is suggested. Simulations show that fractional imputation and the suggested variance estimator are superior to multiple imputation estimators in general, and much superior to multiple imputation for estimating the variance of a domain mean. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
559
578
Jae Kwang Kim
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:809-8252013-03-04RePEc:oup:biomet
article
Bayesian model selection for partially observed diffusion models
We present an approach to Bayesian model selection for finitely observed diffusion processes. We use data augmentation by treating the paths between observed points as missing data. For a fixed model formulation, the strong dependence between the missing paths and the volatility of the diffusion can be broken down by adopting the method of Roberts & Stramer (2001). We describe how this method may be extended to the case of model selection via reversible jump Markov chain Monte Carlo. In addition we extend the formulation of a diffusion model to capture a potential non-Markov state dependence in the drift. Issues of appropriate choices of priors and efficient transdimensional proposal distributions for the reversible jump algorithm are also addressed. The approach is illustrated using simulated data and an example from finance. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
809
825
http://hdl.handle.net/10.1093/biomet/93.4.809
text/html
Access to full text is restricted to subscribers.
Petros Dellaportas
Nial Friel
Gareth O. Roberts
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:425-4462013-03-04RePEc:oup:biomet
article
Bayes linear kinematics and Bayes linear Bayes graphical models
Probability kinematics (Jeffrey, 1965, 1983) furnishes a method for revising a prior probability specification based upon new probabilities over a partition. We develop a corresponding Bayes linear kinematic for a Bayes linear analysis given information which changes our beliefs about a random vector in some generalised way. We derive necessary and sufficient conditions for commutativity of successive Bayes linear kinematics which depend upon the eigenstructure of the joint kinematic resolution transform. As an application we introduce the Bayes linear Bayes graphical model, which is a mixture of fully Bayesian and Bayes linear graphical models, combining the simplicity of Gaussian graphical models with the ability to allow conditioning on marginal distributions of any form, and exploit Bayes linear kinematics to embed full conditional updates within Bayes linear belief adjustments. The theory is illustrated with a treatment of partition testing for software reliability. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
425
446
Michael Goldstein
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:199-2082013-03-04RePEc:oup:biomet
article
A nonparametric test for panel count data
Panel count data arise when a recurrent event is under investigation and each study subject is observed only at discrete time points. In this situation, observed data include only the numbers of occurrences of the event of interest between observation time points and no information is available on subjects between their observation time points. We propose a nonparametric test for comparing the point processes characterising the recurrent event when only panel count data are available. The asymptotic distribution of the test statistic is derived and a simulation study is conducted to evaluate its performance. The method is illustrated using data from a medical follow-up study. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
199
208
Jianguo Sun
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:494-4962013-03-04RePEc:oup:biomet
article
Dimension reduction in time series and the dynamic factor model
This note shows that the dimension reduction method proposed by Li & Shedden (2002) is equivalent to the dynamic factor model introduced by Peña & Box (1987). Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
494
496
http://hdl.handle.net/10.1093/biomet/asp009
application/pdf
Access to full text is restricted to subscribers.
Daniel Peña
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:567-5822013-03-04RePEc:oup:biomet
article
Semiparametric inference in mixture models with predictive recursion marginal likelihood
Predictive recursion is an accurate and computationally efficient algorithm for nonparametric estimation of mixing densities in mixture models. In semiparametric mixture models, however, the algorithm fails to account for any uncertainty in the additional unknown structural parameter. As an alternative to existing profile likelihood methods, we treat predictive recursion as a filter approximation by fitting a fully Bayes model, whereby an approximate marginal likelihood of the structural parameter emerges and can be used for inference. We call this the predictive recursion marginal likelihood. Convergence properties of predictive recursion under model misspecification also lead to an attractive construction of this new procedure. We show pointwise convergence of a normalized version of this marginal likelihood function. Simulations compare the performance of this new approach with that of existing profile likelihood methods and with Dirichlet process mixtures in density estimation. Mixed-effects models and an empirical Bayes multiple testing application in time series analysis are also considered. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
567
582
http://hdl.handle.net/10.1093/biomet/asr030
application/pdf
Access to full text is restricted to subscribers.
Ryan Martin
Surya T. Tokdar
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:847-8622013-03-04RePEc:oup:biomet
article
Bivariate current status data with univariate monitoring times
For bivariate current status data with univariate monitoring times, the identifiable part of the joint distribution is three univariate cumulative distribution functions, namely the two marginal distributions and the bivariate cumulative distribution function evaluated on the diagonal. We show that smooth functionals of these univariate cumulative distribution functions can be efficiently estimated with easily computed nonparametric maximum likelihood estimators based on reduced data consisting of univariate current status observations. This theory is then applied to functionals that address independence of the two survival times and the goodness-of-fit of a copula model used by Wang & Ding (2000). Some brief simulations are provided along with an illustration based on data on HIV transmission. Extensions of the ideas to incorporate covariates, possibly time-dependent, are discussed. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
847
862
http://hdl.handle.net/10.1093/biomet/92.4.847
text/html
Access to full text is restricted to subscribers.
Nicholas P. Jewell
Mark van der Laan
Xiudong Lei
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:59-742013-03-04RePEc:oup:biomet
article
Local polynomial regression analysis of clustered data
This paper proposes a classical weighted least squares type of local polynomial smoothing for the analysis of clustered data, with the key idea of using generalised inverses of correlation matrices. The estimator has a simple closed-form expression. Simplicity is achieved also for nonparametric generalised linear models with arbitrary link function via a transformation. Our approach can be characterised by 'local observations with local variances', which yields intuitively correct results in the sense that correct/incorrect specification of within-cluster correlation has respective positive/negative effects. The approach is a natural extension of classical local polynomial smoothing. Consequently, existing theory can be largely carried over and important issues such as bandwidth selection can be tackled in the classical fashion. Moreover, the approach can handle various types of covariate, such as cluster-level, subject-level or partially cluster-level. Numerical studies support the theoretical results. The method is illustrated with a real example on luteinising hormone levels in cows. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
59
74
http://hdl.handle.net/10.1093/biomet/92.1.59
text/html
Access to full text is restricted to subscribers.
Kani Chen
Zhezhen Jin
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:253-2562013-03-04RePEc:oup:biomet
article
A note on repeated p-values for group sequential designs
One-sided confidence intervals and overall p-values for group-sequential designs are typically based on a sample space ordering which determines both the overall p-value and the corresponding confidence bound. Accordingly, the strength of evidence against the null hypothesis is consistently measured by both quantities such that the order of the p-values of two distinct sample points is consistent with the order of the respective confidence bounds. An exception is the commonly used repeated p-values and repeated confidence intervals. We show that they are not ordering-consistent in the above sense and propose an alternative repeated p-value which is ordering-consistent and has the monitoring property of the classical repeated p-value in being valid even when deviating from the prefixed stopping rule. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
253
256
http://hdl.handle.net/10.1093/biomet/asm080
application/pdf
Access to full text is restricted to subscribers.
Martin Posch
Gernot Wassmer
Werner Brannath
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:995-9992013-03-04RePEc:oup:biomet
article
Wild bootstrap for quantile regression
The existing theory of the wild bootstrap has focused on linear estimators. In this note, we broaden its validity by providing a class of weight distributions that is asymptotically valid for quantile regression estimators. As most weight distributions in the literature lead to biased variance estimates for nonlinear estimators of linear regression, we propose a modification of the wild bootstrap that admits a broader class of weight distributions for quantile regression. A simulation study on median regression is carried out to compare various bootstrap methods. With a simple finite-sample correction, the wild bootstrap is shown to account for general forms of heteroscedasticity in a regression model with fixed design points. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
995
999
http://hdl.handle.net/10.1093/biomet/asr052
application/pdf
Access to full text is restricted to subscribers.
Xingdong Feng
Xuming He
Jianhua Hu
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:1006-10082013-03-04RePEc:oup:biomet
article
A note on nonparametric quantile inference for competing risks and more complex multistate models
Nonparametric quantile inference for competing risks has recently been studied by Peng & Fine (2007). Their key result establishes uniform consistency and weak convergence of the inverse of the Aalen-Johansen estimator of the cumulative incidence function, using the representation of the cumulative incidence estimator as a sum of independent and identically distributed random variables. The limit process is of a form similar to that of the standard survival result, but with the cause-specific hazard of interest replacing the all-causes hazard. We show that this fact is not a coincidence, but can be derived from a general Hadamard differentiation result. We discuss a simplified proof and extensions of the approach to more complex multistate models. As a further consequence, we find that the bootstrap works. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
1006
1008
http://hdl.handle.net/10.1093/biomet/asn044
application/pdf
Access to full text is restricted to subscribers.
Jan Beyersmann
Martin Schumacher
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:85-982013-03-04RePEc:oup:biomet
article
Covariance matrix selection and estimation via penalised normal likelihood
We propose a nonparametric method for identifying parsimony and for producing a statistically efficient estimator of a large covariance matrix. We reparameterise a covariance matrix through the modified Cholesky decomposition of its inverse or the one-step-ahead predictive representation of the vector of responses and reduce the nonintuitive task of modelling covariance matrices to the familiar task of model selection and estimation for a sequence of regression models. The Cholesky factor containing these regression coefficients is likely to have many off-diagonal elements that are zero or close to zero. Penalised normal likelihoods in this situation with L-sub-1 and L-sub-2 penalties are shown to be closely related to Tibshirani's (1996) LASSO approach and to ridge regression. Adding either penalty to the likelihood helps to produce more stable estimators by introducing shrinkage to the elements in the Cholesky factor, while, because of its singularity, the L-sub-1 penalty will set some elements to zero and produce interpretable models. An algorithm is developed for computing the estimator and selecting the tuning parameter. The proposed maximum penalised likelihood estimator is illustrated using simulation and a real dataset involving estimation of a 102 × 102 covariance matrix. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
85
98
http://hdl.handle.net/10.1093/biomet/93.1.85
text/html
Access to full text is restricted to subscribers.
Jianhua Z. Huang
Naiping Liu
Mohsen Pourahmadi
Linxu Liu
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:683-7032013-03-04RePEc:oup:biomet
article
Temporal process regression
We consider regression for response and covariates which are temporal processes observed over intervals. A functional generalised linear model is proposed which includes extensions of standard models in multi-state survival analysis. Simple nonparametric estimators of time-indexed parameters are developed using 'working independence' estimating equations and are shown to be uniformly consistent and to converge weakly to Gaussian processes. The procedure does not require smoothing or a Markov assumption, unlike approaches based on transition intensities. The usual definition of optimal estimating equations for parametric models is then generalised to the functional model and the optimum is identified in a class of functional generalised estimating equations. Simulations demonstrate large efficiency gains relative to working independence at times where censoring is heavy. The estimators are the basis for new tests of the covariate effects and for the estimation of models in which greater structure is imposed on the parameters, providing novel goodness-of-fit tests. The methodology's practical utility is illustrated in a data analysis. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
683
703
J. P. Fine
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:543-5512013-03-04RePEc:oup:biomet
article
The weighted log-rank class of permutation tests: P-values and confidence intervals using saddlepoint methods
Test statistics from the weighted log-rank class are commonly used to compare treatment with control when there is right censoring. This paper uses saddlepoint methods to determine mid-p-values from the null permutation distributions of tests from the weighted log-rank class. Analytical saddlepoint computations replace the permutation simulations and provide mid-p-values that are virtually exact for all practical purposes. The speed of these saddlepoint computations makes it practicable to invert the weighted log-rank tests to determine nominal 95% confidence intervals for the treatment effect with right-censored data. Such analytical inversions lead to permutation confidence intervals that are easily computed and virtually identical to the exact intervals that would normally require massive amounts of simulation. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
543
551
http://hdl.handle.net/10.1093/biomet/asm060
application/pdf
Access to full text is restricted to subscribers.
Ehab F. Abd-Elfattah
Ronald W. Butler
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:19-352013-03-04RePEc:oup:biomet
article
Model selection and estimation in the Gaussian graphical model
We propose penalized likelihood methods for estimating the concentration matrix in the Gaussian graphical model. The methods lead to a sparse and shrinkage estimator of the concentration matrix that is positive definite, and thus conduct model selection and estimation simultaneously. The implementation of the methods is nontrivial because of the positive definite constraint on the concentration matrix, but we show that the computation can be done effectively by taking advantage of the efficient maxdet algorithm developed in convex optimization. We propose a BIC-type criterion for the selection of the tuning parameter in the penalized likelihood methods. The connection between our methods and existing methods is illustrated. Simulations and real examples demonstrate the competitive performance of the new methods. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
19
35
http://hdl.handle.net/10.1093/biomet/asm018
application/pdf
Access to full text is restricted to subscribers.
Ming Yuan
Yi Lin
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:731-7432013-03-04RePEc:oup:biomet
article
On the applicability of regenerative simulation in Markov chain Monte Carlo
We consider the central limit theorem and the calculation of asymptotic standard errors for the ergodic averages constructed in Markov chain Monte Carlo. Chan & Geyer (1994) established a central limit theorem for ergodic averages by assuming that the underlying Markov chain is geometrically ergodic and that a simple moment condition is satisfied. While it is relatively straightforward to check Chan & Geyer's conditions, their theorem does not lead to a consistent and easily computed estimate of the variance of the asymptotic normal distribution. Conversely, Mykland et al. (1995) discuss the use of regeneration to establish an alternative central limit theorem with the advantage that a simple, consistent estimator of the asymptotic variance is readily available. However, their result assumes a pair of unwieldy moment conditions whose verification is difficult in practice. In this paper, we show that the conditions of Chan & Geyer's theorem are sufficient to establish the central limit theorem of Mykland et al. This result, in conjunction with other recent developments, should pave the way for more widespread use of the regenerative method in Markov chain Monte Carlo. Our results are illustrated in the context of the slice sampler. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
731
743
James P. Hobert
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:93-1062013-03-04RePEc:oup:biomet
article
Flexible generalized t-link models for binary response data
A critical issue in modelling binary response data is the choice of the links. We introduce a new link based on the generalized t-distribution. There are two parameters in the generalized t-link: one parameter purely controls the heaviness of the tails of the link and the second parameter controls the scale of the link. Two major advantages are offered by the generalized t-links. First, a symmetric generalized t-link with an unknown shape parameter is much more identifiable than a Student t-link with unknown degrees of freedom and a known scale parameter. Secondly, skewed generalized t-links with both unknown shape and scale parameters provide much more flexible and improved skewed link regression models than the existing skewed links. Various theoretical properties and attractive features of the proposed links are examined and explored in detail. An efficient Markov chain Monte Carlo algorithm is developed for sampling from the posterior distribution. The deviance information criterion measure is used for guiding the choice of links. The proposed methodology is motivated and illustrated by prostate cancer data. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
93
106
http://hdl.handle.net/10.1093/biomet/asm079
application/pdf
Access to full text is restricted to subscribers.
Sungduk Kim
Ming-Hui Chen
Dipak K. Dey
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:893-9072013-03-04RePEc:oup:biomet
article
Lower bounds for the number of false null hypotheses for multiple testing of associations under general dependence structures
We propose probabilistic lower bounds for the number of false null hypotheses when testing multiple hypotheses of association simultaneously. The bounds are valid under general and unknown dependence structures between the test statistics. The power of the proposed estimator to detect the full proportion of false null hypotheses is discussed and compared to other estimators. The proposed estimator is shown to deliver a tight probabilistic lower bound for the number of false null hypotheses in a multiple testing situation even under strong dependence between test statistics. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
893
907
http://hdl.handle.net/10.1093/biomet/92.4.893
text/html
Access to full text is restricted to subscribers.
Nicolai Meinshausen
Peter Buhlmann
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:313-3342013-03-04RePEc:oup:biomet
article
Inference on fractal processes using multiresolution approximation
We consider Bayesian inference via Markov chain Monte Carlo for a variety of fractal Gaussian processes on the real line. These models have unknown parameters in the covariance matrix, requiring inversion of a new covariance matrix at each Markov chain Monte Carlo iteration. The processes have no suitable independence properties so this becomes computationally prohibitive. We surmount these difficulties by developing a computational algorithm for likelihood evaluation based on a 'multiresolution approximation' to the original process. The method is computationally very efficient and widely applicable, making likelihood-based inference feasible for large datasets. A simulation study indicates that this approach leads to accurate estimates for underlying parameters in fractal models, including fractional Brownian motion and fractional Gaussian noise, and functional parameters in the recently introduced multifractional Brownian motion. We apply the method to a variety of real datasets and illustrate its application to prediction and to model selection. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
313
334
http://hdl.handle.net/10.1093/biomet/asm025
application/pdf
Access to full text is restricted to subscribers.
Kenneth Falconer
Carmen Fernández
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:411-4212013-03-04RePEc:oup:biomet
article
Weighted chi-squared tests for partial common principal component subspaces
We consider tests of the null hypothesis that g covariance matrices have a partial common principal component subspace of dimension s. Our approach uses a dimensionality matrix which has its rank equal to s when the hypothesis holds. The test can then be based on a statistic computed from the eigenvalues of an estimate of this dimensionality matrix. The asymptotic distribution of this statistic is that of a linear combination of independent one-degree-of-freedom chi-squared random variables. Simulation results indicate that this test yields significance levels that come closer to the nominal level than do those of a previously proposed method. The procedure is also extended to a test that g correlation matrices have a partial common principal component subspace. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
411
421
James R. Schott
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:976-9812013-03-04RePEc:oup:biomet
article
Conditional likelihood inference under complex ascertainment using data augmentation
In many applications, particularly in genetics, samples are drawn under complex ascertainment rules. For example, families may only be selected for study if two or more siblings have trait values exceeding some threshold. The correct likelihood for inference in such situations involves the probabilities of ascertainment, and these are frequently intractable. A consistent, but not fully efficient, method of analysis of such studies is proposed. The main idea is to augment the data with additional pseudo-observations simulated under the ascertainment scheme, and to analyse using a conditional likelihood for discrimination between true observations and pseudo-observations. Ascertainment probabilities cancel in this likelihood. The method is illustrated with a simple example involving left-truncated failure times. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
976
981
David Clayton
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:482-4882013-03-04RePEc:oup:biomet
article
On sufficient conditions for Bayesian consistency
This paper contributes to the theory of Bayesian consistency for a sequence of posterior and predictive distributions arising from an independent and identically distributed sample. A new sufficient condition for posterior Hellinger consistency is presented which provides motivation for recent results appearing in the literature. Such motivation is important since current sufficient conditions are not known to be necessary. It also provides new insights into Bayesian consistency. A new consistency theorem for the sequence of predictive densities is given. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
482
488
Stephen Walker
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:233-2402013-03-04RePEc:oup:biomet
article
Nonparametric estimation of cause-specific cross hazard ratio with bivariate competing risks data
We propose an alternative representation of the cause-specific cross hazard ratio for bivariate competing risks data. The representation leads to a simple plug-in estimator, unlike an existing ad hoc procedure. The large sample properties of the resulting inferences are established. Simulations and a real data example demonstrate that the proposed methodology may substantially reduce the computational burden of the existing procedure, while maintaining similar efficiency properties. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
233
240
http://hdl.handle.net/10.1093/biomet/asm089
application/pdf
Access to full text is restricted to subscribers.
Yu Cheng
Jason P. Fine
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:451-4582013-03-04RePEc:oup:biomet
article
An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants
Maximum likelihood parameter estimation and sampling from Bayesian posterior distributions are problematic when the probability density for the parameter of interest involves an intractable normalising constant which is also a function of that parameter. In this paper, an auxiliary variable method is presented which requires only that independent