Projection of a high-dimensional dataset onto a two-dimensional space is a useful tool to visualise structures and relationships in the dataset. However, a single two-dimensional visualisation may not display all the intrinsic structure. Therefore, hierarchical/multi-level visualisation methods have been used to extract more detailed understanding of the data. Here we propose a multi-level Gaussian process latent variable model (MLGPLVM). MLGPLVM works by segmenting data (with e.g. K-means, Gaussian mixture model or interactive clustering) in the visualisation space and then fitting a visualisation model to each subset. To measure the quality of multi-level visualisation (with respect to parent and child models), metrics such as trustworthiness, continuity, mean relative rank errors, visualisation distance distortion and the negative log-likelihood per point are used. We evaluate the MLGPLVM approach on the Oil Flow dataset and a dataset of protein electrostatic potentials for the Major ...
Understanding user behavior in software applications is of significant interest to software developers and companies. By having a better understanding of user needs and usage patterns, developers can design a more efficient workflow, add new features, or even automate the users' workflow. In this thesis, I propose novel latent variable models to understand, predict and eventually automate the user interaction with a software application. I start by analyzing users' clicks using time series models; I introduce models and inference algorithms for time series segmentation which are scalable to large-scale user datasets. Next, using deep generative models (e.g. conditional variational autoencoder) and some related models, I introduce a framework for automating the user interaction with a software application. I focus on photo enhancement applications, but this framework can be applied to any domain where segmentation, prediction and personalization are valuable. Finally, by combining ...
Statistics - Exponential distribution - The exponential distribution (or negative exponential distribution) is a probability distribution that describes the time between events in a Poisson process.
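A minimal simulation sketch (Python/NumPy, not from the source; the rate parameter lam is illustrative) of the relationship just stated: exponentially distributed waiting times imply Poisson-distributed event counts per unit time.

# Sketch: exponential inter-event times imply Poisson event counts per unit time.
import numpy as np

rng = np.random.default_rng(0)
lam = 3.0                                                  # events per unit time (illustrative)
waits = rng.exponential(scale=1.0 / lam, size=200_000)     # inter-event times
arrival_times = np.cumsum(waits)

# Count events falling in each unit-length window over [0, T).
T = int(arrival_times[-1])
counts = np.bincount(arrival_times[arrival_times < T].astype(int), minlength=T)

print("mean inter-event time:", waits.mean(), "(theory: 1/lam =", 1 / lam, ")")
print("mean count per window:", counts.mean(), "variance:", counts.var(), "(theory: lam =", lam, ")")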
Finite mixture models have now been used for more than a hundred years (Newcomb (1886), Pearson (1894)). They are a very popular statistical modeling technique given that they constitute a flexible and easily extensible model class for (1) approximating general distribution functions in a semi-parametric way and (2) accounting for unobserved heterogeneity. The number of applications has increased tremendously in the last decades as model estimation in both frequentist and Bayesian frameworks has become feasible with the computing power now easily available. The simplest finite mixture models are finite mixtures of distributions, which are used for model-based clustering. In this case the model is given by a convex combination of a finite number of different distributions, where each of the distributions is referred to as a component. More complicated mixtures have been developed by inserting different kinds of models for each component. An obvious extension is to estimate a generalized linear model
Downloadable (with restrictions)! The multivariate probit model is very useful for analyzing correlated multivariate dichotomous data. Recently, this model has been generalized with a confirmatory factor analysis structure to accommodate a more general covariance structure, and it is called the MPCFA model. The main purpose of this paper is to consider local influence analysis, which is a well-recognized important step of data analysis beyond maximum likelihood estimation, for the MPCFA model. As the observed-data likelihood associated with the MPCFA model is intractable, the famous Cook's approach cannot be applied to obtain local influence measures. Hence, the local influence measures are developed via Zhu and Lee's [Local influence for incomplete data models, J. Roy. Statist. Soc. Ser. B 63 (2001) 111-126] approach, which is closely related to the EM algorithm. The diagnostic measures are derived from the conformal normal curvature of an appropriate function. The building blocks are computed via a
At the crossroads between statistics and machine learning, probabilistic graphical models provide a powerful formal framework to model complex data. Probabilistic graphical models are probabilistic models whose graphical components denote conditional independence structures between random variables. The probabilistic framework makes it possible to deal with data uncertainty, while the conditional independence assumption helps process high-dimensional and complex data. Examples of probabilistic graphical models are Bayesian networks and Markov random fields, which represent two of the most popular classes of such models. With the rapid advancements of high-throughput technologies and the ever-decreasing costs of these next-generation technologies, a fast-growing volume of biological data of ...
This MATLAB function returns a logical value (h) with the rejection decision from conducting a likelihood ratio test of model specification.
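A generic sketch of the same idea in Python (illustrative only, not the MATLAB lratiotest API): twice the log-likelihood gain of the unrestricted model is compared against a chi-square reference with degrees of freedom equal to the number of restrictions.

# Generic likelihood ratio test sketch (not the MATLAB lratiotest signature).
# uLogL, rLogL: maximized log-likelihoods of the unrestricted and restricted models.
from scipy.stats import chi2

def likelihood_ratio_test(uLogL, rLogL, dof, alpha=0.05):
    stat = 2.0 * (uLogL - rLogL)          # LR statistic
    p_value = chi2.sf(stat, dof)          # asymptotic chi-square p-value
    h = p_value < alpha                   # True = reject the restricted model
    return h, p_value, stat

# Usage with illustrative log-likelihood values:
print(likelihood_ratio_test(uLogL=-680.2, rLogL=-683.7, dof=2))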
Downloadable! Monetary policy rule parameters are usually estimated at the mean of the interest rate distribution conditional on inflation and an output gap. This is an incomplete description of monetary policy reactions when the parameters are not uniform over the conditional distribution of the interest rate. I use quantile regressions to estimate parameters over the whole conditional distribution of the Federal Funds Rate. Inverse quantile regressions are applied to deal with endogeneity. Realtime data of inflation forecasts and the output gap are used. I find significant and systematic variations of parameters over the conditional distribution of the interest rate.
In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + unit.[1] The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model. A probit model is a popular specification for an ordinal[2] or a binary response model. As such it treats the same set of problems as does logistic regression using similar techniques. The probit model, which employs a probit link function, is most often estimated using the standard maximum likelihood procedure, such an estimation being called a probit regression. ...
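A brief sketch of probit regression by maximum likelihood using statsmodels (the data below are simulated purely for illustration):

# Probit regression sketch: simulate binary data and fit by maximum likelihood.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)                 # intercept + one covariate
p = norm.cdf(-0.5 + 1.2 * x)           # true probabilities via the probit link (illustrative values)
y = rng.binomial(1, p)

fit = sm.Probit(y, X).fit(disp=0)      # probit regression (MLE)
print(fit.params)                      # estimates near (-0.5, 1.2)
print(fit.predict(X)[:5])              # fitted event probabilities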
A Goodness-of-Fit Test for Multivariate Normal Distribution Using Modified Squared Distance. Keywords: multivariate normal distribution; goodness-of-fit test; empirical distribution function; modified squared distance.
TY - JOUR. T1 - Adaptive Piecewise Linear Bits Estimation Model for MPEG Based Video Coding. AU - Cheng, Jia Bao. AU - Hang, Hsueh-Ming. PY - 1997/3. Y1 - 1997/3. N2 - In many video compression applications, it is essential to control precisely the bit rate produced by the encoder. One critical element in a bits/buffer control algorithm is the bits model that predicts the number of compressed bits when a certain quantization stepsize is used. In this paper, we propose an adaptive piecewise linear bits estimation model with a tree structure. Each node in the tree is associated with a linear relationship between the compressed bits and the activity measure divided by stepsize. The parameters in this relationship are adjusted by the least mean squares algorithm. The effectiveness of this algorithm is demonstrated by an example of digital VCR application. Simulation results indicate that this bits model has a fast adaptation speed even during scene changes. Compared to the bits model derived from ...
This paper analyzes the consistency properties of classical estimators for limited dependent variables models, under conditions of serial correlation in the unobservables. A unified method of proof is used to show that for certain cases (e.g., Probit, Tobit and Normal Switching Regimes models, which are normality-based) estimators that neglect particular types of serial dependence (specifically, corresponding to the class of
Theory and lecture notes on the chi-square goodness-of-fit test, along with key concepts such as interpreting the claim. Tutorsglobe offers homework help, assignment help and tutor assistance on the chi-square goodness-of-fit test.
Analysis of potentially multimodal data is a natural application of finite mixture models. In this case, the modeling is complicated by the question of the variance for each of the components. Using identical variances for each component could obscure underlying structure, but the additional flexibility granted by component-specific variances might introduce spurious features. You can use PROC HPFMM to prepare analyses for equal and unequal variances and use one of the available fit statistics to compare the resulting models. You can use the model selection facility to explore models with varying numbers of mixture components, say from three to seven as investigated in Roeder (1990). The following statements select the best unequal-variance model using Akaike's information criterion (AIC), which has a built-in penalty for model complexity: ...
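The same equal- versus unequal-variance comparison can be mimicked in Python with scikit-learn; a sketch (illustrative data, not the SAS syntax) that compares shared-covariance ("tied") and component-specific ("full") mixtures by AIC over three to seven components:

# Compare equal- vs unequal-variance Gaussian mixtures with AIC (sketch only).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Illustrative bimodal data; replace with the data set of interest.
X = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 2, 200)]).reshape(-1, 1)

for cov_type in ("tied", "full"):          # tied = shared variance, full = per-component variance
    for k in range(3, 8):                  # three to seven components
        gm = GaussianMixture(n_components=k, covariance_type=cov_type, random_state=0).fit(X)
        print(cov_type, k, "AIC:", round(gm.aic(X), 1))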
Yes, but more than that -- they tend to be heavily right-skewed, and the variability tends to increase when the mean gets larger. Here's an example of a claim-size distribution for vehicle claims: https://ars.els-cdn.com/content/image/1-s2.0-S0167668715303358-gr5.jpg (Fig. 5 from Garrido, Genest & Schulz (2016), Generalized linear models for dependent frequency and severity of insurance claims, Insurance: Mathematics and Economics, Vol. 70, Sept., pp. 205-215. https://www.sciencedirect.com/science/article/pii/S0167668715303358). This shows a typical right skew and heavy right tail. However, we must be very careful because this is a marginal distribution, and we are writing a model for the conditional distribution, which will typically be much less skewed (the marginal distribution we look at if we just do a histogram of claim sizes being a mixture of these conditional distributions). Nevertheless, it is typically the case that if we look at the claim size in subgroups of the predictors (perhaps ...
As you have described it, there is not enough information to determine the conditional probability of the child given the parents. You have described that you have the marginal probabilities of each node; this tells you nothing about the relationship between nodes. For example, if you observed that 50% of people in a study take a drug (and the others take placebo), and then you later note that 20% of the people in the study had an adverse outcome, you do not have enough information to know how the probability of the child (adverse outcome) depends on the probability of the parent (taking the drug). You need to know the joint distribution of the parents and child to learn the conditional distribution. The joint distribution requires that you know the probability of every combination of possible values for the parents and the children. From the joint distribution, you can use the definition of conditional probability to find the conditional distribution of the child given the parents.
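A tiny numerical sketch of the point being made, using hypothetical joint probabilities for the drug/outcome example (the numbers are invented but consistent with the 50% drug and 20% adverse-outcome marginals above):

# From a joint distribution P(parent, child) we can derive the conditional
# P(child | parent); the marginals alone are not enough.
import numpy as np

# Rows: parent = placebo, drug; columns: child = no adverse event, adverse event.
joint = np.array([[0.45, 0.05],     # hypothetical joint probabilities
                  [0.35, 0.15]])

marg_parent = joint.sum(axis=1)                           # P(parent): [0.5, 0.5]
marg_child = joint.sum(axis=0)                            # P(child):  [0.8, 0.2]
cond_child_given_parent = joint / marg_parent[:, None]    # P(child | parent)

print(cond_child_given_parent)
# [[0.9 0.1]   adverse-event risk 10% on placebo
#  [0.7 0.3]]  adverse-event risk 30% on drug -- not recoverable from the marginals alone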
BACKGROUND: In addition to their use in detecting undesired real-time PCR products, melting temperatures are useful for detecting variations in the desired target sequences. Methodological improvements in recent years allow the generation of high-resolution melting-temperature (Tm) data. However, there is currently no convention on how to statistically analyze such high-resolution Tm data. RESULTS: Mixture model analysis was applied to Tm data. Models were selected based on Akaike's information criterion. Mixture model analysis correctly identified categories in Tm data obtained for known plasmid targets. Using simulated data, we investigated the number of observations required for model construction. The precision of the reported mixing proportions from data fitted to a preconstructed model was also evaluated. CONCLUSION: Mixture model analysis of Tm data allows the minimum number of different sequences in a set of amplicons and their relative frequencies to be determined. This approach allows Tm data
While conventional approaches to causal inference are mainly based on conditional (in)dependences, recent methods also account for the shape of (conditional) distributions. The idea is that the causal hypothesis "X causes Y" imposes that the marginal distribution P(X) and the conditional distribution P(Y|X) represent independent mechanisms of nature. Recently it has been postulated that the shortest description of the joint distribution P(X,Y) should therefore be given by separate descriptions of P(X) and P(Y|X). Since description length in the sense of Kolmogorov complexity is uncomputable, practical implementations ...
Guilin Li, Szu Hui Ng, Matthias Hwai-yong Tan (2018-11-20). Bayesian Optimal Designs for Efficient Estimation of the Optimum Point with Generalised Linear Models. Quality Technology & Quantitative Management 17 (01) : 89-107. [email protected] Repository. https://doi.org/10.1080/16843703.2018. ...
This Appendix describes a method for fitting a random coefficient model by maximum likelihood. We prefer to use the scaled variance matrices $\bW_j = \sigma^{-2} \bV_j$ and $\bOmega = \sigma^{-2} \bSigma_{\rm B\,}$, so that $\bW_j = \bI_{n_d} + \bZ_j \bOmega \bZ_j \tra$ does not depend on $\sigma^2$. The log-likelihood in \ref{Fisc1} is equal to $$ \label{Fisc2} l \left (\bbeta, \sigma^2, \bOmega \right ) \,=\, -\frac{1}{2} \sum_{j=1}^m \left [ n_d \log \left ( \sigma^2 \right ) + \log \left \{ \det \left ( \bW_j \right ) \right \} \,+\, \frac{1}{\sigma^2} \, \be_j \tra \bW_j^{-1} \be_j \right ] \,, \tag{6}$$ where $\be_j = \by_j - \bX_j \bbeta$ is the vector of residuals for cluster $j$. We have the following closed-form expressions for the inverse and determinant of $\bW_{j\,}$: \begin{eqnarray} \label{FiscI} \bW_j^{-1} &=& \bI_{n_d} - \bZ_j \bOmega \bG_j^{-1} \bZ_j \tra \nonumber \\ \det \left ( \bW_j \right ) &=& \det \left ( \bG_j \right ) \,, \end{eqnarray} where $\bG_j = ...
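A quick numerical check of these closed-form expressions (a sketch, assuming the truncated definition is $\bG_j = \bI + \bZ_j \tra \bZ_j \bOmega$, which is what the Woodbury identity and the determinant lemma give; not taken from the source):

# Verify W^{-1} = I - Z Omega G^{-1} Z' and det(W) = det(G) numerically,
# assuming G = I_q + Z' Z Omega (the definition truncated above).
import numpy as np

rng = np.random.default_rng(3)
n_d, q = 6, 2
Z = rng.normal(size=(n_d, q))
A = rng.normal(size=(q, q))
Omega = A @ A.T                                        # symmetric positive (semi-)definite

W = np.eye(n_d) + Z @ Omega @ Z.T
G = np.eye(q) + Z.T @ Z @ Omega

W_inv = np.eye(n_d) - Z @ Omega @ np.linalg.solve(G, Z.T)
print(np.allclose(W_inv, np.linalg.inv(W)))            # True
print(np.isclose(np.linalg.det(W), np.linalg.det(G)))  # True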
Conservative robust estimation methods do not appear to be currently available in the standard mixed model methods for R, where by conservative robust estimation I mean methods which work almost as well as the methods based on assumptions of normality when the assumption of normality *IS* satisfied. We are considering adding such a conservative robust estimation option for the random effects to our AD Model Builder mixed model package, glmmADMB, for R, and perhaps extending it to do robust estimation for linear mixed models at the same time. An obvious candidate is to assume something like a mixture of normals. I have tested this in a simple linear mixed model using 5% contamination with a normal with 3 times the standard deviation, which seems to be a common assumption. Simulation results indicate that when the random effects are normally distributed this estimator is about 3% less efficient, while when the random effects are contaminated with 5% outliers the estimator is about 23% more ...
Abstract: Estimating survival functions has interested statisticians for numerous years. A survival function gives information on the probability of a time-to-event of interest. Research in the area of survival analysis has increased greatly over the last several decades because of its large usage in areas related to biostatistics and the pharmaceutical industry. Among the methods which estimate the survival function, several are widely used and available in popular statistical software programs. One purpose of this research is to compare the efficiency between competing estimators of the survival function. Results are given for simulations which use nonparametric and parametric estimation methods on censored data. The simulated data sets have right-, left-, or interval-censored time points. Comparisons are done on various types of data to see which survival function estimation methods are more suitable. We consider scenarios where distributional assumptions or censoring type assumptions are ...
Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and proportional hazard Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework ...
The primary area of statistical expertise in the Qian-Li Xue Lab is the development and application of statistical methods for: (1) handling the truncation of information on underlying or unobservable outcomes (e.g., disability) as a result of screening, (2) missing data, including outcome (e.g., frailty) censoring by a competing risk (e.g., mortality), and (3) trajectory analysis of multivariate outcomes. Other areas of methodologic research interest include multivariate, latent variable models. In the Women's Health and Aging Studies, we have closely collaborated with scientific investigators on the design and analysis of longitudinal data relating biomarkers of inflammation, hormonal dysregulation and micronutrient deficiencies to the development and progression of frailty and disability, as well as characterizing the natural history of change in cognitive and physical function over time. Research Areas: epidemiology, disabilities, longitudinal data, hormonal dysregulation, women's health, ...
THE CHI-SQUARE GOODNESS-OF-FIT TEST The chi-square goodness-of-fit test is used to analyze probabilities of multinomial distribution trials along a single
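A minimal worked example of the chi-square goodness-of-fit test for multinomial counts (Python/SciPy sketch; the counts and probabilities are made up for illustration):

# Chi-square goodness-of-fit test: do observed category counts match
# hypothesized multinomial probabilities?
import numpy as np
from scipy.stats import chisquare

observed = np.array([18, 55, 27])            # counts in three categories (illustrative)
probs = np.array([0.25, 0.50, 0.25])         # hypothesized category probabilities
expected = probs * observed.sum()

stat, p_value = chisquare(observed, f_exp=expected)   # df = k - 1 = 2
print(stat, p_value)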
Latent variable models have a broad set of applications in domains such as social networks, natural language processing, computer vision and computational biology. Training them on a large scale is challenging due to non-convexity of the objective function. We propose a unified framework that exploits tensor algebraic constraints of the (low order) moments of the models.
The objective of this course is to familiarize students with the basic concepts of probability and the most common distributions. This knowledge will be useful not only for future courses in Statistics or Stochastic Processes, but is also directly applicable in many situations where chance or randomness prevail. Combinatorial methods. Binomial coefficients. Sample spaces. Probability rules. Conditional probability, independence. Bayes' theorem. Probability distributions. Continuous random variables, density functions. Multivariate distributions. Marginal distributions. Conditional distributions. Expected value. Moments, Chebyshev's theorem. Moment generating functions. Product moments. Moments of linear combinations, conditional expectation. Uniform, Bernoulli, binomial. Negative binomial, geometric, hypergeometric. Poisson. Multinomial, multivariate hypergeometric. Uniform, gamma, exponential, chi-squared. Beta distribution. Normal distribution. Normal approximation to the binomial. Normal ...
Parametric statistical methods are traditionally employed in functional magnetic resonance imaging (fMRI) for identifying areas in the brain that are active with a certain degree of statistical significance. These parametric methods, however, have two major drawbacks. First, it is assumed that the observed data are Gaussian distributed and independent; assumptions that generally are not valid for fMRI data. Second, the statistical test distribution can be derived theoretically only for very simple linear detection statistics. In this work it is shown how the computational power of the Graphics Processing Unit (GPU) can be used to speed up non-parametric tests, such as random permutation tests. With random permutation tests it is possible to calculate significance thresholds for any test statistic. As an example, fMRI activity maps from the General Linear Model (GLM) and Canonical Correlation Analysis (CCA) are compared at the same significance level.
Sufficient replication within subpopulations is required to make the Pearson and deviance goodness-of-fit tests valid. When there are one or more continuous predictors in the model, the data are often too sparse to use these statistics. Hosmer and Lemeshow (2000) proposed a statistic that they show, through simulation, is distributed as chi-square when there is no replication in any of the subpopulations. This test is available only for binary response models. First, the observations are sorted in increasing order of their estimated event probability. The event is the response level specified in the response variable option EVENT= , or the response level that is not specified in the REF= option, or, if neither of these options was specified, then the event is the response level identified in the Response Profiles table as Ordered Value 1. The observations are then divided into approximately 10 groups according to the following scheme. Let N be the total number of subjects. Let M be the ...
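A sketch of the grouping and statistic computation just described (my own minimal Python implementation, not the SAS procedure; ties and degenerate groups are ignored):

# Hosmer-Lemeshow-style statistic: sort by predicted event probability,
# split into ~10 groups, compare observed and expected event counts.
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, n_groups=10):
    order = np.argsort(p_hat)
    y, p_hat = y[order], p_hat[order]
    groups = np.array_split(np.arange(len(y)), n_groups)   # approximately equal-size groups
    stat = 0.0
    for g in groups:
        obs = y[g].sum()                                    # observed events in the group
        exp = p_hat[g].sum()                                # expected events in the group
        n_g = len(g)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n_g))  # binomial variance in the denominator
    return stat, chi2.sf(stat, n_groups - 2)                # df = number of groups - 2

# Usage: y is a 0/1 outcome vector, p_hat the model's estimated event probabilities.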
A family of scaling corrections aimed at improving the chi-square approximation of goodness-of-fit test statistics in small samples, large models, and nonnormal data was proposed in Satorra and Bentler (1994). For structural equation models, Satorra-Bentler's (SB) scaling corrections are available in standard computer software. Often, however, the interest is not in the overall fit of a model, but in a test of the restrictions that a null model, say ${\cal M}_0$, implies on a less restricted one, ${\cal M}_1$. If $T_0$ and $T_1$ denote the goodness-of-fit test statistics associated with ${\cal M}_0$ and ${\cal M}_1$, respectively, then typically the difference $T_d = T_0 - T_1$ is used as a chi-square test statistic with degrees of freedom equal to the difference in the number of independent parameters estimated under the models ${\cal M}_0$ and ${\cal M}_1$. As in the case of the goodness-of-fit test, it is of interest to scale the statistic $T_d$ in order to improve its chi-square approximation in ...
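For reference, the commonly used scaled difference statistic of Satorra and Bentler (2001) can be sketched as below. This is written from memory, assuming c0 and c1 denote the scaling correction factors and d0, d1 the degrees of freedom of the nested and comparison models; consult the original papers before relying on it.

# Scaled chi-square difference test sketch (Satorra-Bentler 2001 style).
# T0, T1: fit statistics; c0, c1: scaling corrections; d0, d1: model degrees of freedom.
from scipy.stats import chi2

def scaled_difference_test(T0, T1, c0, c1, d0, d1):
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)    # scaling correction for the difference
    Td_scaled = (T0 - T1) / cd              # scaled difference statistic
    df = d0 - d1
    return Td_scaled, df, chi2.sf(Td_scaled, df)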
Mayo and Gray introduced the leverage residual-weighted elemental (LRWE) classification of regression estimators and a new method of estimation called trimmed elemental estimation (TEE), showing the efficiency and robustness of TEE point estimates. Using bootstrap methods, the properties of various trimmed elemental estimator interval estimates that allow for inference are examined and compared with ordinary least squares (OLS) and least sum of absolute values (LAV). Confidence intervals and coverage probabilities for the estimators are examined using a variety of error distributions, sample sizes, and numbers of parameters. To reduce computational intensity, randomly selecting elemental subsets to calculate the parameter estimates was investigated. For the distributions considered, randomly selecting 50% of the elemental regressions led to highly accurate estimates.
The aim of this thesis is to extend some methods of change-point analysis, where classically, measurements in time are examined for structural breaks, to random field data which is observed over a grid of points in multidimensional space. The thesis is concerned with the a posteriori detection and estimation of changes in the marginal distribution of such random field data. In particular, the focus lies on constructing nonparametric asymptotic procedures which take the possible stochastic dependence into account. In order to avoid having to restrict the results to specific distributional assumptions, the tests and estimators considered here use a nonparametric approach where the inference is justified by the asymptotic behavior of the considered procedures (i.e. their behavior as the sample size goes towards infinity). This behavior can often be derived from functional central limit theorems which make it possible to write the limit variables of the statistics as functionals of Wiener processes, ...
I am always struck by this same issue. Here's what I think is going on: 1. What goes in a paper is up to the author. If the author struggled with a step or found it a bit tricky to think about themselves, then the struggle goes into the paper, even if it might be obvious to someone with more experience in a field. I was just reading a paper with a very detailed exposition of EM for a latent logistic regression problem, with conditional probability derivations, etc. (the JMLR paper Learning from Crowds by Raykar et al., which is an awesome paper, even if it suffers from this flaw and number 3). 2. What goes in a paper is up to the editors. If the editors don't understand something, they'll ask for details, even if they should be obvious to the entire field. This is agreeing with Robert's point, I think. Editors like to see the author sweat, because of some kind of no-pain, no-gain aesthetic that seems to permeate academic journal publishing. It's so hard to boil something down, then when you do, you get ...
By G. Jogesh Babu and C.R. Rao, The Pennsylvania State University, University Park, USA. SUMMARY. Several nonparametric goodness-of-fit tests are based on the empirical distribution function. In the presence of nuisance parameters, the tests are generally constructed by first estimating these nuisance parameters. In such a case, it is well known that critical values shift, and the asymptotic null distribution of the test statistic may depend in a complex way on the unknown parameters. In this paper we use bootstrap methods to estimate the null distribution. We shall consider both parametric and nonparametric bootstrap methods. We shall first demonstrate that, under very general conditions, the process obtained by subtracting the population distribution function with estimated parameters from the empirical distribution has the same weak limit as the corresponding bootstrap version. Of course, in the nonparametric bootstrap case a bias correction is needed. This result is used to show that the ...
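A small sketch (Python, simulated data, not from the paper) of the parametric bootstrap idea just described: re-estimate the nuisance parameters within each bootstrap sample and use the resulting statistics as the null reference distribution.

# Parametric bootstrap null distribution for an EDF-based GOF statistic with
# estimated nuisance parameters (here: Kolmogorov-Smirnov with a fitted normal).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.5, size=100)                     # illustrative sample

mu, sd = np.mean(x), np.std(x, ddof=1)                 # estimated nuisance parameters
obs_stat = stats.kstest(x, "norm", args=(mu, sd)).statistic

boot = []
for _ in range(999):
    xb = rng.normal(mu, sd, size=len(x))               # simulate under the fitted model
    mub, sdb = np.mean(xb), np.std(xb, ddof=1)         # re-estimate within each replicate
    boot.append(stats.kstest(xb, "norm", args=(mub, sdb)).statistic)

p_value = (1 + sum(b >= obs_stat for b in boot)) / (1 + len(boot))
print(obs_stat, p_value)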
We evaluate data on choices made from Convex Time Budgets (CTB) in Andreoni and Sprenger (2012a) and Augenblick et al (2015), two influential studies that proposed and applied this experimental technique. We use the Weak Axiom of Revealed Preference (WARP) to test for external consistency relative to pairwise choice, and demand, wealth and impatience monotonicity to test for internal consistency. We find that choices made by subjects in the original Andreoni and Sprenger (2012a) paper violate WARP frequently; violations of all three internal measures of monotonicity are concentrated in subjects who take advantage of the novel feature of CTB by making interior choices. Wealth monotonicity violations are more prevalent and pronounced than either demand or impatience monotonicity violations. We substantiate the importance of our desiderata of choice consistency in examining effort allocation choices made in Augenblick et al (2015), where we find considerably more demand monotonicity violations, as ...
Non-parametric smoothers can be used to test parametric models. Forms of tests: differences in in-sample performance; differences in generalization performance; whether the parametric model's residuals have expectation zero everywhere. Constructing a test statistic based on in-sample performance. Using bootstrapping from the parametric model to find the null distribution of the test statistic. An example where the parametric model is correctly specified, and one where it is not. Cautions on the interpretation of goodness-of-fit tests. Why use parametric models at all? Answers: speed of convergence when correctly specified; and the scientific interpretation of parameters, if the model actually comes from a scientific theory. Mis-specified parametric models can predict better, at small sample sizes, than either correctly specified parametric models or non-parametric smoothers, because of their favorable bias-variance characteristics; an example. Reading: Notes, chapter 10, Advanced Data Analysis ...
Get network security expert advice on VPN risk analysis and learn the risk estimation model necessary to assess SSL VPN implementation.
KernelMixtureDistribution[{x1, x2, ...}] represents a kernel mixture distribution based on the data values xi. KernelMixtureDistribution[{{x1, y1, ...}, {x2, y2, ...}, ...}] represents a multivariate kernel mixture distribution based on data values {xi, yi, ...}. KernelMixtureDistribution[..., bw] represents a kernel mixture distribution with bandwidth bw. KernelMixtureDistribution[..., bw, ker] represents a kernel mixture distribution with bandwidth bw and smoothing kernel ker.
Courvoisier, D. S., Eid, M., & Nussbeck, F. W. (2007). Mixture distribution latent state-trait analysis: Basic ideas and applications. Psychological Methods, 12(1), 80-104. doi:10.1037/1082-989X.12.1. ...
This talk will focus on the use of advanced multivariate latent variable models to aid the accelerated development of the product and the process, as
Right, so my understanding of how fractional randomness is implemented is that for each interspike interval, the interval depends on the values set for s.interval and s.noise (assuming s = new NetStim(x)). Specifically, the magnitude of s.noise (from 0 to 1) controls the proportion by which the interval depends on s.interval versus random values sampled from a negative exponential distribution. For example, if s.noise = 0.2, the actual interval will be 0.8*s.interval plus a random duration sampled from a negative exponential distribution with a mean duration of 0.2*s.interval. I am mostly unsure about how this negative exponential distribution (i.e. X) is represented mathematically. Honestly, from my limited understanding I would've written it as follows ...
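Based only on the description above (check the NEURON source for the authoritative definition), the intervals could be simulated like this (Python sketch, illustrative parameter values):

# Sketch of the fractional-noise interval rule described above:
# interval = (1 - noise) * mean_interval + Exponential(mean = noise * mean_interval).
import numpy as np

def netstim_like_intervals(mean_interval, noise, n, seed=0):
    rng = np.random.default_rng(seed)
    fixed = (1.0 - noise) * mean_interval                             # deterministic part
    random_part = rng.exponential(scale=noise * mean_interval, size=n)  # negative exponential part
    return fixed + random_part

ivals = netstim_like_intervals(mean_interval=10.0, noise=0.2, n=100_000)
print(ivals.mean())   # ~10.0: the expected interval is preserved for any noise in [0, 1]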
Motivated by problems in molecular biology and molecular physics, we propose a five-parameter torus analogue of the bivariate normal distribution for modelling the distribution of two circular random variables. The conditional distributions of the proposed distribution are von Mises. The marginal distributions are symmetric around their means and are either unimodal or bimodal. The type of shape d
The naive Bayes model is a simple model that has been used for many decades, often as a baseline, for both supervised and unsupervised learning. With a latent class variable it is one of the simplest latent variable models ...
The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Agreement between two raters is depicted in a 2 × 2 table, and the kappa statistic introduced by Cohen [1] is calculated from it following formula (1). The standard error of the kappa statistic can be estimated by a cluster bootstrap method, since bootstrap sampling is conducted on clusters only [24, 25, 26]. In our study a cluster is a physician and the observations within the cluster are patients. 2.2 Bootstrap sampling of clusters (physicians): 1. Sample clusters (physicians) with replacement from the original data. 2. Repeat this B times to generate independent bootstrap samples Z1, ..., ZB. 3. Calculate the kappa statistic corresponding to each bootstrap sample Zb following formula (1). 4. Calculate the bootstrap standard error estimate and the 100(1 − α)% confidence interval following [23], with some ...
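A compact sketch of this cluster (physician-level) bootstrap in Python; cohen_kappa_score from scikit-learn stands in for the article's formula (1), and the data structure is hypothetical:

# Cluster bootstrap of Cohen's kappa: resample physicians (clusters) with
# replacement, pool their patients, recompute kappa, repeat B times.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def cluster_bootstrap_kappa_se(ratings, B=1000, seed=0):
    # ratings: dict mapping cluster id -> (rater1 array, rater2 array) of 0/1 ratings
    rng = np.random.default_rng(seed)
    ids = list(ratings)
    kappas = []
    for _ in range(B):
        chosen = rng.choice(ids, size=len(ids), replace=True)   # resample clusters
        r1 = np.concatenate([ratings[c][0] for c in chosen])
        r2 = np.concatenate([ratings[c][1] for c in chosen])
        kappas.append(cohen_kappa_score(r1, r2))
    return np.std(kappas, ddof=1)        # bootstrap standard error estimate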
The fundamental construct in probability is the random variable; the fundamental construct in statistics is the random sample. A random sample is a sampling process from a hypothetical population. Traditional statistics assumes large $n$, small $p$ ($n$ for observations, $p$ for parameters measured), while in modern statistics the problem typically is small $n$, large $p$. A model in statistics is a probability distribution of one or more variables: univariate models; regression models. Parametric and nonparametric methods do not have an essential difference or comparative superiority: both are collections of models and take random samples as the sole input for estimation (frequentist). Parametric methods are algorithms selecting one from a subspace of probabilistic models, which is indexed by model parameters. Nonparametric methods are algorithms selecting one from another subspace of probabilistic models, only without an index. Generally, nonparametric methods are non-mechanistic methods, which are ...
Many studies have reported on the pattern of neuropsychological test performance across varied seizure diagnosis populations. Far fewer studies have evaluated the accuracy of the clinical neuropsychologist in formulating an impression of the seizure diagnosis based on results of neuropsychological assessment, or compared the accuracy of clinical neuropsychological judgment to results of statistical prediction. Accuracy of clinical neuropsychological versus statistical prediction was investigated in four seizure classification scenarios. While both methods outperformed chance, accuracy of clinical neuropsychological classification was either equivalent or superior to statistical prediction. Results support the utility and validity of clinical neuropsychological judgment in epilepsy treatment settings
TY - BOOK. T1 - Limiting conditional distributions for transient Markov chains on the nonnegative integers conditioned on recurrence to zero. AU - Coolen-Schrijner, Pauline. PY - 1994. Y1 - 1994. KW - METIS-142900. M3 - Report. T3 - Memorandum Faculty of Mathematical Sciences. BT - Limiting conditional distributions for transient Markov chains on the nonnegative integers conditioned on recurrence to zero. PB - University of Twente, Faculty of Mathematical Sciences. ER - ...
This article reviews the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) in model selection and the appraisal of psychological theory. The focus is on latent variable models, given their growing use in theory testing and construction. Theoretical statistical results i …
We discuss the use of the beta-binomial distribution for the description of plant disease incidence data, collected on the basis of scoring plants as either diseased or healthy. The beta-binomial is a discrete probability distribution derived by regarding the probability of a plant being diseased (a constant in the binomial distribution) as a beta-distributed variable. An important characteristic of the beta-binomial is that its variance is larger than that of the binomial distribution with the same mean. The beta-binomial distribution, therefore, may serve to describe aggregated disease incidence data. Using maximum likelihood, we estimated beta-binomial parameters p (mean disease incidence) and ϑ (an index of aggregation) for four previously published sets of disease incidence data in which there were some indications of aggregation. Goodness-of-fit tests showed that, in all these cases, the beta-binomial provided a good description of the observed data and resulted in a better fit than did ...
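A short SciPy sketch (parameter values invented for illustration) of the overdispersion property mentioned above: a beta-binomial has larger variance than a binomial with the same mean.

# Beta-binomial vs binomial with the same mean: the beta-binomial is overdispersed.
from scipy.stats import betabinom, binom

n, a, b = 20, 2.0, 6.0                 # illustrative values: n plants per sampling unit
p = a / (a + b)                        # mean disease incidence implied by (a, b)

print("means:", betabinom.mean(n, a, b), binom.mean(n, p))        # equal means
print("variances:", betabinom.var(n, a, b), binom.var(n, p))      # beta-binomial variance is larger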
A zero-inflated model assumes that zero outcomes are due to two different processes. For instance, in the example of fishing presented here, the two processes are that a subject has gone fishing vs. not gone fishing. If not gone fishing, the only outcome possible is zero. If gone fishing, it is then a count process. The two parts of a zero-inflated model are a binary model, usually a logit model, to model which of the two processes the zero outcome is associated with, and a count model, in this case a negative binomial model, to model the count process. The expected count is expressed as a combination of the two processes. Taking the example of fishing again: $$ E(y_{\text{fish caught}}) = P(\text{not gone fishing}) \times 0 + P(\text{gone fishing}) \times E(y_{\text{fish caught}} \mid \text{gone fishing}) $$ To understand the zero-inflated negative binomial regression, let's start with the negative binomial model. There are multiple parameterizations of the negative binomial model; we focus on NB2. The negative ...
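A small simulation sketch of this two-process structure (Python; the mixing probability, mean, and dispersion are invented for illustration), confirming that the expected count is the mixture of the two parts:

# Zero-inflated negative binomial as two processes: a binary part that produces
# structural zeros ("not gone fishing") and an NB2 count part ("gone fishing").
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
pi0 = 0.3                      # P(not gone fishing): structural zero
mu, alpha = 4.0, 0.5           # NB2 mean and dispersion (var = mu + alpha * mu**2)

gone_fishing = rng.random(n) > pi0
# NB2 counts generated as a gamma-Poisson mixture:
lam = rng.gamma(shape=1 / alpha, scale=alpha * mu, size=n)
counts = np.where(gone_fishing, rng.poisson(lam), 0)

print(counts.mean(), (1 - pi0) * mu)          # both approximately (1 - pi0) * mu
print((counts == 0).mean())                   # zeros arise from both processes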
TY - JOUR. T1 - On estimation of partially linear transformation models. AU - Lu, Wenbin. AU - Zhang, Hao Helen. PY - 2010/6. Y1 - 2010/6. N2 - We study a general class of partially linear transformation models, which extend linear transformation models by incorporating nonlinear covariate effects in survival data analysis. A new martingale-based estimating equation approach, consisting of both global and kernel-weighted local estimation equations, is developed for estimating the parametric and nonparametric covariate effects in a unified manner. We show that with a proper choice of the kernel bandwidth parameter, one can obtain consistent and asymptotically normal parameter estimates for the linear effects. Asymptotic properties of the estimated nonlinear effects are established as well. We further suggest a simple resampling method to estimate the asymptotic variance of the linear estimates and show its effectiveness. To facilitate the implementation of the new procedure, an iterative ...
Introduction: Measurement errors can seriously affect the quality of clinical practice and medical research. It is therefore important to assess such errors by conducting studies to estimate a coefficient's reliability and assess its precision. The intraclass correlation coefficient (ICC), defined on a model in which an observation is a sum of information and random error, has been widely used to quantify reliability for continuous measurements. Sample size formulas have been derived for explicit incorporation of a prespecified probability of achieving the prespecified precision, i.e., the width or lower limit of a confidence interval for the ICC. Although the concept of the ICC is applicable to binary outcomes, existing sample size formulas for this case can only provide about 50% assurance probability to achieve the desired precision. Methods: A common correlation model was adopted to characterize binary data arising from reliability studies. A large sample variance estimator for the ICC was derived, which was then used
Marginal structural models are a class of statistical models used for causal inference in epidemiology. Such models handle the issue of time-dependent confounding in the evaluation of the efficacy of interventions by inverse probability weighting for receipt of treatment. For instance, in the study of the effect of zidovudine on AIDS-related mortality, CD4 lymphocyte count is used for treatment indication, is influenced by treatment, and affects survival. Time-dependent confounders are typically highly prognostic of health outcomes and applied in dosing or indication for certain therapies, such as body weight or lab values such as alanine aminotransferase or bilirubin. Robins, James; Hernán, Miguel; Brumback, Babette (September 2000). Marginal Structural Models and Causal Inference in Epidemiology (PDF). Epidemiology. 11 (5): 550-60. doi:10.1097/00001648-200009000-00011. PMID 10955408. https://epiresearch.org/ser50/serplaylists/introduction-to-marginal-structural-models ...
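A minimal sketch of the inverse probability weighting step in Python (single time point, one simulated confounder; purely illustrative of the weighting idea, not a full marginal structural model for time-varying treatment):

# Inverse probability of treatment weighting: weight each subject by
# 1 / P(received own treatment | confounder), then compare weighted outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 5000
L = rng.normal(size=n)                              # confounder (e.g. a lab value)
A = rng.binomial(1, 1 / (1 + np.exp(-L)))           # treatment assignment depends on L
Y = 1.0 * A + 2.0 * L + rng.normal(size=n)          # outcome depends on treatment and L

ps = LogisticRegression().fit(L.reshape(-1, 1), A).predict_proba(L.reshape(-1, 1))[:, 1]
w = np.where(A == 1, 1 / ps, 1 / (1 - ps))          # inverse probability weights

ate = np.average(Y[A == 1], weights=w[A == 1]) - np.average(Y[A == 0], weights=w[A == 0])
print(ate)    # close to the true treatment effect of 1.0 used in the simulation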
Preface xiii. 1 Introduction: Distributions and Inference for Categorical Data 1. 1.1 Categorical Response Data, 1. 1.2 Distributions for Categorical Data, 5. 1.3 Statistical Inference for Categorical Data, 8. 1.4 Statistical Inference for Binomial Parameters, 13. 1.5 Statistical Inference for Multinomial Parameters, 17. 1.6 Bayesian Inference for Binomial and Multinomial Parameters, 22. Notes, 27. Exercises, 28. 2 Describing Contingency Tables 37. 2.1 Probability Structure for Contingency Tables, 37. 2.2 Comparing Two Proportions, 43. 2.3 Conditional Association in Stratified 2 × 2 Tables, 47. 2.4 Measuring Association in I × J Tables, 54. Notes, 60. Exercises, 60. 3 Inference for Two-Way Contingency Tables 69. 3.1 Confidence Intervals for Association Parameters, 69. 3.2 Testing Independence in Two-way Contingency Tables, 75. 3.3 Following-up Chi-Squared Tests, 80. 3.4 Two-Way Tables with Ordered Classifications, 86. 3.5 Small-Sample Inference for Contingency Tables, 90. 3.6 Bayesian ...
TY - GEN. T1 - Empirical Analysis of the performance of variance estimators in sequential single-run ranking & selection. T2 - 2016 Winter Simulation Conference, WSC 2016. AU - Pedrielli, Giulia. AU - Zhu, Yinchao. AU - Lee, Loo Hay. AU - Li, Haobin. PY - 2017/1/17. Y1 - 2017/1/17. N2 - Ranking and Selection has acquired an important role in the Simulation-Optimization field, where the different alternatives can be evaluated by discrete event simulation (DES). Black box approaches have dominated the literature by interpreting the DES as an oracle providing i.i.d. observations. Another relevant family of algorithms, instead, runs each simulator once and observes time series. This paper focuses on such a method, Time Dilation with Optimal Computing Budget Allocation (TD-OCBA), recently developed by the authors. One critical aspect of TD-OCBA is estimating the response given correlated observations. In this paper, we are specifically concerned with the estimator of the variance of the response ...
Finite mixture models emerge in many applications, particularly in biology, psychology and genetics. This dissertation focused on detecting associations between a quantitative explanatory variable and a dichotomous response variable in a situation where the population consists of a mixture. That is, there is a fraction of the population for whom there is an association between the quantitative predictor and the response, and there is a fraction of individuals for whom there is no association between the quantitative predictor and the response. We developed the Likelihood Ratio Test (LRT) in the context of ordinary logistic regression models and logistic regression mixture models. However, the classical theorem for the null distribution of the LRT statistic cannot be applied to finite mixture alternatives. Thus, we conjectured an asymptotic null distribution for the LRT statistic. We investigated how the empirical and fitted null distributions of the LRT statistic compared with our ...
In this paper, we propose a new ridge-type estimator called the new mixed ridge estimator (NMRE) by unifying the sample and prior information in linear measurement error model with additional stochastic linear restrictions. The new estimator is a generalization of the mixed estimator (ME) and ridge estimator (RE). The performances of this ...
Abstract: In regular statistical models, the leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of the cross-validation remains unknown. In previous studies, we established the singular learning theory and proposed a widely applicable information criterion, the expectation value of which is asymptotically equal to the average Bayes generalization loss. In the present paper, we theoretically compare the Bayes cross-validation loss and the widely applicable information criterion and prove two theorems. First, the Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable. Therefore, model selection and hyperparameter optimization using these two values are asymptotically equivalent. Second, the sum of the Bayes generalization error and the Bayes cross-validation error is asymptotically equal to ...
Bivariate multinomial data, such as the left and right eyes' retinopathy status data, are analyzed either by using a joint bivariate probability model or by exploiting certain odds ratio-based association models. However, the joint bivariate probability model yields marginal probabilities which are complicated functions of the marginal and association parameters for both variables, and the odds ratio-based association model treats the odds ratios involved in the joint probabilities as working parameters, which are consequently estimated through certain arbitrary working regression models. Also, this latter odds ratio-based model does not provide any easy interpretation of the correlations between two categorical variables. On the basis of pre-specified marginal probabilities, in this paper, we develop a bivariate normal type linear conditional multinomial probability model to understand the correlations between two categorical variables. The parameters involved in the model are consistently estimated
Yongli Shuai of the Department of Biostatistics defends his dissertation on Multinomial Logistic Regression and Prediction Accuracy for Interval-Censored Competing Risks Data. Graduate faculty of the University and all other interested parties are invit...
TY - GEN. T1 - Clustering patient length of stay using mixtures of Gaussian models and phase type distributions. AU - Garg, Lalit. AU - McClean, Sally. AU - Meenan, BJ. AU - El-Darzi, Elia. AU - Millard, Peter. PY - 2009. Y1 - 2009. N2 - Gaussian mixture distributions and Coxian phase type distributions have been popular choices for model-based clustering of patients' length of stay data. This paper compares these models and presents an idea for a mixture distribution comprising components of both of the above distributions. Also, a mixed distribution survival tree is presented. A stroke dataset available from the English Hospital Episode Statistics database is used as a running example. AB - Gaussian mixture distributions and Coxian phase type distributions have been popular choices for model-based clustering of patients' length of stay data. This paper compares these models and presents an idea for a mixture distribution comprising components of both of the above distributions. Also, a mixed ...
The count data model studied in the paper extends the Poisson model by allowing for overdispersion and serial correlation. Alternative approaches to estimating nuisance parameters, required for the correction of the Poisson maximum likelihood covariance matrix estimator and for a quasi-likelihood estimator, are studied. The estimators are evaluated by finite sample Monte Carlo experimentation. It is found that the Poisson maximum likelihood estimator with corrected covariance matrix estimators provides reliable inferences for longer time series. Overdispersion test statistics are well-behaved, while conventional portmanteau statistics for white noise have sizes that are too large. Two empirical illustrations are included.
Changes in the spatial distributions of vegetation across the globe are routinely monitored by satellite remote sensing, in which the reflectance spectra over land surface areas are measured with spatial and temporal resolutions that depend on the satellite instrumentation. The use of multiple synchronized satellite sensors permits long-term monitoring with high spatial and temporal resolutions. However, differences in the spatial resolution of images collected by different sensors can introduce systematic biases, called scaling effects, into the biophysical retrievals. This study investigates the mechanism by which the scaling effects distort normalized difference vegetation index (NDVI). This study focused on the monotonicity of the area-averaged NDVI as a function of the spatial resolution. A monotonic relationship was proved analytically by using the resolution transform model proposed in this study in combination with a two-endmember linear mixture model. The monotonicity allowed the inherent
The Cardiovascular Risk Factors, Aging, and Incidence of Dementia (CAIDE) risk score is the only currently available midlife risk score for dementia. We compared CAIDE to the Framingham cardiovascular Risk Score (FRS) and the FINDRISC diabetes score as predictors of dementia and assessed the role of age in their associations with dementia. We then examined whether these risk scores were associated with dementia in those free of cardiometabolic disease over the follow-up. A total of 7553 participants, 39-63 years old in 1991-1993, were followed for cardiometabolic disease (diabetes, coronary heart disease, stroke) and dementia (N = 318) for a mean of 23.5 years. Cox regression was used to model associations of age at baseline, CAIDE, FRS, and FINDRISC risk scores with incident dementia. Predictive performance was assessed using Royston's R2, Harrell's C-index, Akaike's information criterion (AIC), the Greenwood-Nam-D'Agostino (GND) test, and calibration-in-the-large. The age effect was also assessed by stratifying analyses by
Following the tradition of Carleton University, yet another International Conference on Nonparametric Methods for Measurement Error Models and Related Topics has been arranged. In recent years, the scope of application of measurement error models has widened in biostatistics, bio-pharmacokinetics, and DNA analysis, to name a few among others. Our aim is to rejuvenate research activity in nonparametric statistics by bringing scholars from all around the globe to this conference to exchange ideas for future developments to meet the needs of applications. Although there is a vast amount of literature on this topic, there is slow growth in developing robust inference techniques such as rank tests and estimation, shrinkage estimation, and S-tests and estimation in measurement error models. This conference will focus on current activities in parametric and nonparametric methods and consider new directions on the approach one should take in developing these areas of research ...
Alessi, L., Barigozzi, M. and Capasso, M. (2010). Improved penalization for determining the number of factors in approximate factor models. Statistics and Probability Letters, 80, 1806-1813. Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71, 135-171. Bai, J. and Li, K. (2012). Statistical analysis of factor models of high dimension. Ann. Statist., 40, 436-465. Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70, 191-221. Bickel, P. and Levina, E. (2008a). Covariance regularization by thresholding. Ann. Statist., 36, 2577-2604. Bickel, P. and Levina, E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist., 36, 199-227. Bien, J. and Tibshirani, R. (2011). Sparse estimation of a covariance matrix. Biometrika, 98, 807-820. Breitung, J. and Tenhofen, J. (2011). GLS estimation of dynamic factor models. J. Amer. Statist. Assoc., 106, 1150-1166. Cai, T. and Liu, W. (2011). Adaptive ...
Statistical Inference Using Maximum Likelihood Estimation and the Generalized Likelihood Ratio when the True Parameter is on the Boundary of the Parameter Space* (Feng, Ziding; McCulloch, Charles E.) 13 ...
TY - JOUR. T1 - Statistical analysis of a class of factor time series models. AU - Taniguchi, Masanobu. AU - Maeda, Kousuke. AU - Puri, Madan L.. PY - 2006/7/1. Y1 - 2006/7/1. N2 - For a class of factor time series models, which are called multivariate time series variance component (MTV) models, we consider the problem of testing whether an observed time series belongs to this class. We propose a test statistic and derive its asymptotic null distribution. Asymptotic optimality of the proposed test is discussed in view of local asymptotic normality. Also, numerical evaluation of the local power illuminates some interesting features of the test. AB - For a class of factor time series models, which are called multivariate time series variance component (MTV) models, we consider the problem of testing whether an observed time series belongs to this class. We propose a test statistic and derive its asymptotic null distribution. Asymptotic optimality of the proposed test is discussed in ...
Description: In analyzing human genetic disorders, association analysis is one of the most commonly used approaches. However, there are challenges with association analysis, including differential misclassification in data that inflates the false-positive rate. In this thesis, I present a new statistical method for testing the association between disease phenotypes and multiple single nucleotide polymorphisms (SNPs). This method uses next-generation sequencing (NGS) raw data and is robust to sequencing differential misclassification. By incorporating the expectation-maximization (EM) algorithm, this method computes the test statistic and estimates important parameters of the model, including misclassification. By performing simulation studies, I report that this method maintains correct type I error rates and may obtain high statistical power. ...
function [logPrior,gradient] = logPDFBVS(params,mu,vin,vout,pGamma,a,b)
%logPDFBVS Log joint prior for Bayesian variable selection
%   logPDFBVS is the log of the joint prior density of a
%   normal-inverse-gamma mixture conjugate model for a Bayesian linear
%   regression model with numCoeffs coefficients. logPDFBVS passes
%   params(1:end-1), the coefficients, to the PDF of a mixture of normal
%   distributions with hyperparameters mu, vin, vout, and pGamma, and also
%   passes params(end), the disturbance variance, to an inverse gamma
%   density with shape a and scale b.
%
%   params: Parameter values at which the densities are evaluated, a
%       (numCoeffs + 1)-by-1 numeric vector. The first numCoeffs elements
%       correspond to the regression coefficients and the last element
%       corresponds to the disturbance variance.
%
%   mu: Multivariate normal component means, a numCoeffs-by-1 numeric
%       vector of prior means for the regression coefficients.
%
%   vin: Multivariate normal component scales, a numCoeffs-by-1 vector ...
During functional magnetic resonance imaging (fMRI) brain examinations, the signal extraction from a large number of images is used to evaluate changes in blood oxygenation levels by applying statistical methodology. Image registration is essential as it assists in providing accurate fractional positioning accomplished by using interpolation between sequentially acquired fMRI images. Unfortunately, current subvoxel registration methods found in standard software may produce significant bias in the variance estimator when interpolating with fractional, spatial voxel shifts. It was found that interpolation schemes, as currently applied during the registration of functional brain images, could introduce statistical bias, but there is a possible correction scheme. This bias was shown to result from the weighted-averaging process employed by conventional implementation of interpolation schemes. The most severe consequence of inaccurate variance estimators is the undesirable violation of the ...
In survival analysis, the estimation of patient-specific survivor functions that are conditional on a set of patient characteristics is of special interest. In general, knowledge of the conditional survival probabilities of a patient at all relevant time points allows better assessment of the patient's risk than summary statistics, such as median survival time. Nevertheless, standard methods for analysing survival data seldom estimate the survivor function directly. Therefore, we propose the application of conditional transformation models (CTMs) for the estimation of the conditional distribution function of survival times given a set of patient characteristics. We used the inverse probability of censoring weighting approach to account for right-censored observations. Our proposed modelling approach allows the prediction of patient-specific survivor functions. In addition, CTMs constitute a flexible model class that is able to deal with proportional as well as non-proportional hazards. The ...
In Part I, titled Empirical Bayes Estimation, we discuss the estimation of a heteroscedastic multivariate normal mean in terms of the ensemble risk. We first derive the ensemble minimax properties of various estimators that shrink towards zero through the empirical Bayes method. We then generalize our results to the case where the variances are given as a common unknown but estimable chi-squared random variable scaled by different known factors. We further provide a class of ensemble minimax estimators that shrink towards the common mean. We also compare results from the heteroscedastic case with those from the homoscedastic model and highlight their differences. In Part II, titled Causal Inference Analysis, we study the estimation of the causal effect of treatment on survival probability up to a given time point among those subjects who would comply with the assignment to both treatment and control, when both administrative censoring and noncompliance occur. In many clinical studies with a survival
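As a concrete illustration of empirical Bayes shrinkage towards zero for a heteroscedastic normal mean (a generic textbook-style sketch, not the ensemble-minimax estimators studied in the thesis): with X_i ~ N(theta_i, sigma_i^2) and a N(0, tau^2) prior, the posterior mean is tau^2/(tau^2 + sigma_i^2) * X_i, and tau^2 can be estimated by moment matching.

import numpy as np

def eb_shrink_to_zero(x, sigma2):
    """Empirical Bayes posterior means under X_i ~ N(theta_i, sigma2_i),
    theta_i ~ N(0, tau^2), with tau^2 estimated by moment matching."""
    x = np.asarray(x, float)
    sigma2 = np.asarray(sigma2, float)
    tau2 = max((x**2 - sigma2).mean(), 0.0)   # E[X^2] = tau^2 + sigma_i^2
    return tau2 / (tau2 + sigma2) * x

rng = np.random.default_rng(1)
sigma2 = rng.uniform(0.5, 2.0, size=500)
theta = rng.normal(0.0, 1.0, size=500)
x = rng.normal(theta, np.sqrt(sigma2))
theta_hat = eb_shrink_to_zero(x, sigma2)
# shrinkage typically lowers the ensemble risk relative to the raw estimates
print(((theta_hat - theta)**2).mean(), ((x - theta)**2).mean())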
The self-controlled case series (SCCS) method is commonly used to investigate associations between vaccine exposures and adverse events (side effects). It is an alternative to cohort and case-control study designs. It requires information only on cases, i.e. individuals who have experienced the adverse event at least once, and automatically controls for all fixed confounders that could modify the true association between exposure and adverse event. However, time-varying confounders (age, season) are not automatically controlled. The SCCS method has parametric and semi-parametric versions in terms of controlling the age effect. The parametric method uses piecewise constant functions with a priori chosen age groups, and the semi-parametric method leaves the age effect unspecified. Mis-specification of age groups in the parametric version may lead to biased estimates of the exposure effect, and the semi-parametric approach runs into computational problems when the sample size is moderately large. ...
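For the parametric version with piecewise constant age groups, one standard device is to exploit the equivalence between the SCCS conditional likelihood and a Poisson regression on interval-level data with per-case intercepts, age-group and exposure indicators, and the log interval length as an offset. A hedged sketch using statsmodels (the input file and column names are assumptions):

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# one row per case x (age group x exposure) interval, with columns:
#   case, age_group, exposed (0/1), length (interval length), events (count)
df = pd.read_csv("sccs_intervals.csv")          # hypothetical input file

fit = smf.glm(
    "events ~ C(case) + C(age_group) + exposed",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["length"]),
).fit()

# exponentiated exposure coefficient = relative incidence during exposure
print(np.exp(fit.params["exposed"]))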
Many real data sets are naturally represented as a multidimensional array called a tensor. In classical regression and time series models, the predictors and covariates are treated as vectors. However, because of the high dimensionality of the predictor variables, such models are inefficient for analyzing multidimensional data. In contrast, tensor-structured models use predictors and covariates in tensor form. Tensor regression and tensor time series models can reduce high-dimensional data to a low-dimensional framework and lead to efficient estimation and prediction. In this thesis, we discuss modeling and estimation procedures for both tensor regression models and tensor time series models. The results of simulation studies and a numerical analysis are provided.
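As a minimal illustration of the low-rank idea (a generic sketch, not the specific models developed in the thesis), the code below fits a rank-1 coefficient B = b1 b2' for matrix-valued predictors X_i by alternating least squares, so that y_i is approximated by <X_i, B> = b1' X_i b2.

import numpy as np

def rank1_tensor_regression(X, y, iters=50):
    """X: (n, p, q) matrix predictors, y: (n,) responses.
    Fits y_i ~ b1' X_i b2 with a rank-1 coefficient by alternating LS."""
    n, p, q = X.shape
    b1, b2 = np.ones(p), np.ones(q)
    for _ in range(iters):
        # fix b2: regress y on Z_i = X_i @ b2 (an n x p design)
        Z = X @ b2
        b1, *_ = np.linalg.lstsq(Z, y, rcond=None)
        # fix b1: regress y on W_i = X_i' b1 (an n x q design)
        W = np.einsum("ipq,p->iq", X, b1)
        b2, *_ = np.linalg.lstsq(W, y, rcond=None)
    return np.outer(b1, b2)

# usage: B_hat = rank1_tensor_regression(X, y)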
Several information criteria, the Schwarz information criterion (SIC), the Akaike information criterion (AIC), and the modified Akaike information criterion ($AIC_c$), are proposed to locate a change point in the multiple linear regression model. These methods are applied to a stock exchange data set and their results are compared.
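A minimal sketch of locating a single change point with SIC/BIC (an illustration under one common convention, not necessarily the paper's exact formulation): for each candidate split, fit separate least-squares regressions on the two segments and compare n*log(RSS/n) + (#parameters)*log(n) against the no-change model.

import numpy as np

def sic_change_point(X, y, min_seg=10):
    """Locate a single change point in y = X @ b + e by minimising the SIC."""
    n, p = X.shape

    def rss(Xs, ys):
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        return float(np.sum((ys - Xs @ beta) ** 2))

    best_k = None
    best_sic = n * np.log(rss(X, y) / n) + (p + 1) * np.log(n)   # no change
    for k in range(min_seg, n - min_seg):
        r = rss(X[:k], y[:k]) + rss(X[k:], y[k:])
        sic_k = n * np.log(r / n) + (2 * p + 2) * np.log(n)       # two segments
        if sic_k < best_sic:
            best_k, best_sic = k, sic_k
    return best_k          # None means the no-change model is preferred

# usage: k_hat = sic_change_point(X, y)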
The other two models assumed mixture distributions for the SNP effects, reflecting the assumption that there is a large number of SNPs with zero or near-zero effects and a second, smaller set of SNPs with larger significant effects. A Bayes A/B hybrid method was used. This approximation to Bayes B [1] was used to keep computational and time demands reasonable. In this algorithm, after every k Bayes A iterations, Bayes B is run via the reversible jump algorithm. The reversible jump algorithm [3] is run multiple times per SNP, and any SNP with a final state of zero in the current Bayes B iteration is set to zero for the subsequent k iterations of Bayes A. This maintains the correct transitions between models of differing dimensionality. The prior distributions are identical to those of the original Bayes B, using a mixture prior distribution for the SNP variance that allows a proportion, 1-π, to be set to zero. The remaining proportion π is sampled from the same mixture distribution as in Bayes A. ...
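The scheduling logic described above can be summarised in a short control-flow sketch. The sampler functions below (bayesA_gibbs_update, bayesB_rjmcmc_update) are hypothetical placeholders for the actual Gibbs and reversible-jump updates; only the alternation and the zeroing of excluded SNPs is illustrated, not the samplers themselves.

import numpy as np

def bayes_ab_hybrid(state, n_iter, k, n_rj_per_snp,
                    bayesA_gibbs_update, bayesB_rjmcmc_update):
    """Alternate k Bayes A Gibbs sweeps with one Bayes B reversible-jump pass.
    `state` holds SNP effects; `included` marks SNPs currently in the model."""
    n_snps = len(state["effects"])
    included = np.ones(n_snps, dtype=bool)
    for it in range(n_iter):
        # one Bayes A sweep, with excluded SNP effects held at zero
        state = bayesA_gibbs_update(state, active=included)
        state["effects"][~included] = 0.0
        if (it + 1) % k == 0:
            # one Bayes B pass: several reversible-jump proposals per SNP
            for snp in range(n_snps):
                for _ in range(n_rj_per_snp):
                    state = bayesB_rjmcmc_update(state, snp)
            # SNPs whose final state is zero stay at zero for the next k sweeps
            included = state["effects"] != 0.0
    return state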
The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "... the wealth of material on statistics concerning the multivariate normal distribution is quite exceptional. As such it is a very useful source of information for the general statistician and a must for anyone wanting to penetrate deeper into the multivariate field." (Mededelingen van het Wiskundig Genootschap) "This book is a comprehensive and clearly written text on multivariate analysis from a theoretical point of view." (The Statistician) Aspects of Multivariate Statistical Theory presents a classical mathematical treatment of the techniques, distributions, and inferences based on the multivariate normal distribution. Noncentral ...
TY - JOUR. T1 - Statistical models for genetic susceptibility in toxicological and epidemiological investigations. AU - Piegorsch, W. W.. PY - 1994/1/1. Y1 - 1994/1/1. N2 - Models are presented for use in assessing genetic susceptibility to cancer (or other diseases) with animal or human data. Observations are assumed to be in the form of proportions, hence a binomial sampling distribution is considered. Generalized linear models are employed to model the response as a function of the genetic component; these include logistic and complementary log forms. Susceptibility is measured via odds ratios of response relative to a background genetic group. Significance tests and confidence intervals for these odds ratios are based on maximum likelihood estimates of the regression parameters. Additional consideration is given to the problem of gene-environment interactions and to testing whether certain genetic identifiers/categories may be collapsed into a smaller set of categories. The collapsibility ...
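A hedged sketch of the logistic-link version of such a model: binomial counts per genetic group are modelled with a GLM, and susceptibility is summarised as odds ratios relative to a baseline (background) group. The group labels and counts below are made up for illustration and are not from the paper.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical proportion data: responders out of n subjects per genetic group
df = pd.DataFrame({
    "group": ["background", "variant_A", "variant_B"],
    "cases": [12, 30, 22],
    "n":     [200, 180, 150],
})
df["noncases"] = df["n"] - df["cases"]

# binomial GLM with a logistic link; 'background' is the reference group
fit = smf.glm(
    "cases + noncases ~ C(group, Treatment(reference='background'))",
    data=df,
    family=sm.families.Binomial(),
).fit()

odds_ratios = np.exp(fit.params.iloc[1:])     # odds ratios vs the background group
or_ci = np.exp(fit.conf_int().iloc[1:])       # Wald confidence intervals
print(odds_ratios)
print(or_ci)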
Let X, Y be independent, standard normal random variables, and let U = X + Y and V = X - Y. (a) Find the joint probability density function of (U, V) and specify its domain. (b) Find the marginal probability density function of U.
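A worked solution (standard change-of-variables argument): inverting the transformation gives $x = (u+v)/2$, $y = (u-v)/2$ with Jacobian of absolute value $1/2$, so
$$f_{U,V}(u,v) = \tfrac12\, f_X\!\Big(\tfrac{u+v}{2}\Big) f_Y\!\Big(\tfrac{u-v}{2}\Big) = \frac{1}{4\pi}\exp\!\Big(-\frac{u^2+v^2}{4}\Big), \qquad (u,v)\in\mathbb{R}^2.$$
(b) The joint density factorises in $u$ and $v$, so $U$ and $V$ are independent with $U \sim N(0,2)$ and marginal density $f_U(u) = \frac{1}{2\sqrt{\pi}}\exp\!\big(-u^2/4\big)$, $u \in \mathbb{R}$.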
The bootstrap method is a computer-intensive statistical method that is widely used in performing nonparametric inference. Categorical data analysis, in particular the analysis of contingency tables, is commonly used in applied fields. This work considers nonparametric bootstrap tests for the analysis of contingency tables. There are only a few research papers which explore this field. The p-values of tests in contingency tables are discrete, whereas p-values should be uniformly distributed under the null hypothesis. The results of this article show that the corresponding bootstrap versions work better than the standard tests. Properties of the proposed tests are illustrated and discussed using Monte Carlo simulations. The article concludes with an analytical example that examines the performance of the proposed tests and the confidence interval of the association coefficient. ...
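A minimal sketch of one such bootstrap test, resampling tables under the fitted independence null and comparing the chi-squared statistic (an illustration of the general idea, not the article's exact procedure):

import numpy as np
from scipy.stats import chi2_contingency

def bootstrap_independence_test(table, B=5000, seed=0):
    """Bootstrap p-value for independence in a two-way contingency table."""
    rng = np.random.default_rng(seed)
    table = np.asarray(table, float)
    n = table.sum()
    stat_obs = chi2_contingency(table)[0]
    # null model: cell probabilities from the product of the margins
    p_null = np.outer(table.sum(1), table.sum(0)).ravel() / n**2
    count = 0
    for _ in range(B):
        boot = rng.multinomial(int(n), p_null).reshape(table.shape)
        # skip degenerate resamples with an empty row or column
        # (a simplification that makes the test slightly conservative)
        if (boot.sum(0) == 0).any() or (boot.sum(1) == 0).any():
            continue
        count += chi2_contingency(boot)[0] >= stat_obs
    return count / B

# usage
print(bootstrap_independence_test([[20, 15], [10, 30]]))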
ABSTRACT. In operational risk measurement, the estimation of severity distribution parameters is the main driver of capital estimates, yet this remains a nontrivial challenge for many reasons. Maximum likelihood estimation (MLE) does not adequately meet this challenge because of its well-documented nonrobustness to modest violations of idealized textbook model assumptions: specifically, that the data are independent and identically distributed (iid), an assumption clearly violated by operational loss data. Yet even with iid data, capital estimates based on MLE are biased upward, sometimes dramatically, due to Jensen's inequality. This overstatement of the true risk profile increases as the heaviness of the severity distribution tail increases, so dealing with data collection thresholds by using truncated distributions, which have thicker tails, increases MLE-related capital bias considerably. Truncation also augments correlation between a distribution's parameters, and this exacerbates the ...
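The Jensen's-inequality effect is easy to reproduce in a toy simulation: a high quantile such as exp(mu + z*sigma) of a lognormal severity is a convex function of the fitted parameters, so the average MLE plug-in quantile sits above the true quantile, and the gap grows as the tail gets heavier and the sample smaller. A hedged sketch (illustrative, not from the paper):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 250, 2000
z = norm.ppf(0.999)
true_q = np.exp(mu + z * sigma)          # true 99.9% severity quantile

estimates = []
for _ in range(reps):
    x = rng.lognormal(mu, sigma, size=n)
    logx = np.log(x)
    mu_hat, sigma_hat = logx.mean(), logx.std(ddof=0)   # lognormal MLEs
    estimates.append(np.exp(mu_hat + z * sigma_hat))    # plug-in quantile

print(np.mean(estimates) / true_q)   # > 1: upward bias of the MLE-based estimate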
TY - JOUR. T1 - A flexible two-part random effects model for correlated medical costs. AU - Liu, Lei. AU - Strawderman, Robert L.. AU - Cowen, Mark E.. AU - Shih, Ya Chen T. PY - 2010/1/1. Y1 - 2010/1/1. N2 - In this paper, we propose a flexible two-part random effects model (Olsen and Schafer, 2001; Tooze et al., 2002) for correlated medical cost data. Typically, medical cost data are right-skewed, involve a substantial proportion of zero values, and may exhibit heteroscedasticity. In many cases, such data are also obtained in hierarchical form, e.g., on patients served by the same physician. The proposed model specification therefore consists of two generalized linear mixed models (GLMMs), linked together by correlated random effects. Conditionally on the random effects and covariates, we model the odds of cost being positive (Part I) using a GLMM with a logistic link, and the mean cost given that costs were actually incurred (Part II) using a generalized gamma regression ...
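A simplified sketch of the two-part idea, ignoring the correlated random effects and substituting an ordinary gamma GLM for the generalized gamma, so this is not the authors' full model: Part I models whether any cost is incurred with a logistic GLM, and Part II models the mean of the positive costs with a log-link gamma GLM. The input file and column names (cost, age, comorbidity) are hypothetical.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("medical_costs.csv")            # hypothetical input file
df["any_cost"] = (df["cost"] > 0).astype(int)

# Part I: probability of incurring any cost (logistic link)
part1 = smf.glm("any_cost ~ age + comorbidity", data=df,
                family=sm.families.Binomial()).fit()

# Part II: mean positive cost (gamma GLM with log link)
pos = df[df["cost"] > 0]
part2 = smf.glm("cost ~ age + comorbidity", data=pos,
                family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# Expected cost per subject: P(cost > 0) * E[cost | cost > 0]
expected_cost = part1.predict(df) * part2.predict(df)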