Classification problems often suffer from small samples in conjunction with a large number of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, so the same data are used for both classifier design and error estimation. Error estimation can then suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper is concerned with the evaluation of error rate estimators in two-group discriminant analysis with multivariate binary variables. The behaviour of the eight most commonly used estimators is compared and contrasted by means of Monte Carlo simulation. The criterion used for comparing these error rate estimators is the sum squared error rate (SSE). Four experimental factors are considered in the simulation, namely: the number of variables, the sample size relative to the number of variables, the prior probability, and the correlation between the variables in the populations. From the analysis carried out, the estimators can be ranked as follows: DS, O, OS, U, R, JK, P and D.
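The kind of Monte Carlo comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's procedure: the abstract does not define the eight estimators, so the sketch compares two standard ones (resubstitution and leave-one-out), uses a simple independence rule with add-one smoothing as the classifier, assumes uncorrelated binary variables, and takes SSE to mean the sum over replications of the squared difference between the estimated and the actual error rate. All function names and parameter values are hypothetical.

```python
import itertools
import math
import random


def draw_sample(rng, n, probs):
    """Draw n vectors of independent Bernoulli variables (assumption: zero correlation)."""
    return [[1 if rng.random() < q else 0 for q in probs] for _ in range(n)]


def fit(x1, x2):
    """Estimate per-variable success probabilities in each group (add-one smoothing)."""
    def est(xs):
        return [(sum(col) + 1) / (len(xs) + 2) for col in zip(*xs)]
    prior1 = len(x1) / (len(x1) + len(x2))
    return est(x1), est(x2), prior1


def log_score(x, probs, prior):
    """Log of prior times the independence-model probability of pattern x."""
    return math.log(prior) + sum(math.log(q if v else 1 - q) for v, q in zip(x, probs))


def classify(model, x):
    """Assign x to group 1 or 2, whichever has the larger score."""
    p1, p2, prior1 = model
    return 1 if log_score(x, p1, prior1) >= log_score(x, p2, 1 - prior1) else 2


def resubstitution(x1, x2):
    """Error rate of the fitted rule on the same data used to fit it."""
    model = fit(x1, x2)
    wrong = sum(classify(model, x) != 1 for x in x1)
    wrong += sum(classify(model, x) != 2 for x in x2)
    return wrong / (len(x1) + len(x2))


def leave_one_out(x1, x2):
    """Refit without each observation in turn, then classify it (needs >= 2 per group)."""
    wrong = 0
    for i, x in enumerate(x1):
        wrong += classify(fit(x1[:i] + x1[i + 1:], x2), x) != 1
    for i, x in enumerate(x2):
        wrong += classify(fit(x1, x2[:i] + x2[i + 1:]), x) != 2
    return wrong / (len(x1) + len(x2))


def actual_error(model, probs1, probs2, prior1):
    """Exact error of the fitted rule under the true distribution, over all 2^p patterns."""
    err = 0.0
    for x in itertools.product((0, 1), repeat=len(probs1)):
        f1 = math.prod(q if v else 1 - q for v, q in zip(x, probs1))
        f2 = math.prod(q if v else 1 - q for v, q in zip(x, probs2))
        err += (1 - prior1) * f2 if classify(model, x) == 1 else prior1 * f1
    return err


def simulate(probs1, probs2, n1, n2, reps=200, seed=0):
    """Accumulate the squared deviation of each estimate from the actual error rate."""
    rng = random.Random(seed)
    prior1 = n1 / (n1 + n2)
    sse = {"resubstitution": 0.0, "leave_one_out": 0.0}
    for _ in range(reps):
        x1 = draw_sample(rng, n1, probs1)
        x2 = draw_sample(rng, n2, probs2)
        truth = actual_error(fit(x1, x2), probs1, probs2, prior1)
        sse["resubstitution"] += (resubstitution(x1, x2) - truth) ** 2
        sse["leave_one_out"] += (leave_one_out(x1, x2) - truth) ** 2
    return sse


if __name__ == "__main__":
    # Hypothetical setting: 3 binary variables, 10 observations per group.
    print(simulate([0.7, 0.6, 0.8], [0.3, 0.4, 0.2], n1=10, n2=10))
```

A smaller SSE indicates an estimator that tracks the actual error rate more closely in the chosen setting. Varying the number of variables, the group sizes and the Bernoulli parameters corresponds loosely to the experimental factors listed in the abstract; correlated variables would require a different sampling routine.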
Published in: American Journal of Theoretical and Applied Statistics (Volume 5, Issue 4)
DOI: 10.11648/j.ajtas.20160504.12
Page(s): 173-179
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2016. Published by Science Publishing Group
Keywords: Discriminant Analysis, Error Rate, Monte Carlo Simulation, Error Rate Estimators
APA Style
Egbo Ikechukwu. (2016). Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. American Journal of Theoretical and Applied Statistics, 5(4), 173-179. https://doi.org/10.11648/j.ajtas.20160504.12
ACS Style
Egbo Ikechukwu. Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. Am. J. Theor. Appl. Stat. 2016, 5(4), 173-179. doi: 10.11648/j.ajtas.20160504.12
AMA Style
Egbo Ikechukwu. Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. Am J Theor Appl Stat. 2016;5(4):173-179. doi: 10.11648/j.ajtas.20160504.12
@article{10.11648/j.ajtas.20160504.12,
  author = {Egbo Ikechukwu},
  title = {Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {5},
  number = {4},
  pages = {173-179},
  doi = {10.11648/j.ajtas.20160504.12},
  url = {https://doi.org/10.11648/j.ajtas.20160504.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20160504.12},
  year = {2016}
}
TY  - JOUR
T1  - Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables
AU  - Egbo Ikechukwu
Y1  - 2016/06/04
PY  - 2016
N1  - https://doi.org/10.11648/j.ajtas.20160504.12
DO  - 10.11648/j.ajtas.20160504.12
T2  - American Journal of Theoretical and Applied Statistics
JF  - American Journal of Theoretical and Applied Statistics
JO  - American Journal of Theoretical and Applied Statistics
SP  - 173
EP  - 179
PB  - Science Publishing Group
SN  - 2326-9006
UR  - https://doi.org/10.11648/j.ajtas.20160504.12
VL  - 5
IS  - 4
ER  -