Bayesian Network Parameter Learning from Data Contains Missing Values
Subject Areas : electrical and computer engineeringK. Etminani 1 * , M. Naghibzadeh 2 , M. Emadi 3 , A. R. Razavi 4
1 - Ferdosi University
2 - Ferdosi University
3 - Ferdosi University
4 -
Keywords: Bayesian network parameters likelihood Bayesian network missing value,
Abstract :
Learning Bayesian network structure from data has attracted a great deal of research in recent years. It is shown that finding the optimal network is an NP-hard problem when data is complete. This problem gets worse when data is incomplete i.e. contains missing values and/or hidden variables. Generally, there are two cases of learning Bayesian networks from incomplete data: in a known structure, and unknown structure. In this paper, we try to find the best parameters for a known structure by introducing the “effective parameter”, in a way that the likelihood of the network structure given the completed data being maximized. This approach can be attached to any algorithm such as SEM (structural expectation maximization) that needs the best parameters to be known to reach the optimal Bayesian network structure. We prove that the proposed method gains the optimal parameters with respect to the likelihood function. Results of applying the proposed method to some known Bayesian networks show the speed of the proposed method compared to the well-known methods.
[1] N. Friedman, M. Linial, I. Nachman, and D. Peer, "Using Bayesian networks to analyze expression data," Computational Biology, vol. 7, no. 3-4, pp. 601-620, 2000.
[2] J. Uebersax, Breast Cancer Risk Modeling: An Application of Bayes Networks, Technical Report, Ravenpack International, Spain, 2004.
[3] L. M. de Campos, J. M. Fernandez - Luna, and J. F. Huete, "Bayesian networks and information retrieval: an introduction to the special issue," Information Processing and Management, vol. 40, no. 5, pp. 727-733, Sep. 2004.
[4] F. J. Diez, J. Mira, E. Iturralde, S. Zubillaga, and A. Diaval, "A Bayesian expert system for echocardiography," Artificial Intelligence in Medicine, vol. 10, no. 1, pp. 59-73, May 1997.
[5] D. M. Chickering, "Learning bayesian network is NP - complete," Learning from Data: Artificial Intelligence and Statistics V, pp. 121-130, 1996.
[6] D. M. Chickering, C. Meek, and D. Heckerman, "Large-sample learning of Bayesian networks is NP-Hard," in Proc. 19th Conf. on Uncertainty in Artificial Intelligence, UAI'03, pp. 124-133, 2003.
[7] Z. Kebaili and A. Aussem, "A novel hybrid Bayesian network structure learning algorithm based on correlated itemset mining techniques," Int. J. of Computational Intelligence Research, vol. 5, no. 1, pp. 16-21, 2009.
[8] I. Tsamardinos, L. E. Brown, and C. F. Aliferis, "The max - min hill - climbing Bayesian network structure learning algorithm," Machine Learning, vol. 65, no. 1, pp. 31-78, 2006.
[9] D. Rubin, "Inference and missing data," Biometrika, vol. 63, no. 3, pp. 581-592, 1976.
[10] R. Little and D. Rubin, Statistical Analysis with Missing Data, Wiley - Interscience, 2002.
[11] G. Elidan, I. Nachman, and N. Friedman, "Ideal parent structure learning for continuous variable Bayesian networks," J. of Machine Learning Research, vol. 8, pp. 1799-1833, 2007.
[12] W. Buntine, "Theory refinement on Bayesian networks," in Proc. 7th Conf. on Uncertainty in Artificial Intelligence, UAI'91, pp. 52-60, 1991.
[13] T. Silander and P. Myllymaki, "A simple approach for finding the globally optimal Bayesian network structure," in Proc. 22nd Conf. on Uncertainty in Artificial Intelligence, UAI'06, pp. 445-452, 2006.
[14] A. P. Singh and A. W. Moore, Finding Optimal Bayesian Networks by Dynamic Programming, Technical Report, Carnegie Mellon University CALD-05-106, 2005.
[15] M. Koivisto and K. Sood, "Exact bayesian structure discovery in bayesian networks," J. of Machine Learning Research, vol. 5, pp. 549-573, 2004.
[16] J. Suzuki, "Learning Bayesian belief networks based on the minimum description length principle: an efficient algorithm using the branch and bound technique," IEICE Trans. Inf. & Syst., vol. E82-D, no. 2, pp. 356-367, Feb. 1999.
[17] C. P. de Campos, Z. Zeng, and Q. Ji, "Structure learning of Bayesian networks using constraints," in Proc. 26th Int. Conf. on Machine Learning, ICML'09, 2009.
[18] K. Etminani, M. Naghibzadeh, and A. R. Razavi, "Globally optimal structure learning of bayesian networks from data," Lecture Notes in Computer Science, vol. 6352, pp. 101-106, 2010.
[19] D. M. Chickering and D. Heckerman, "Efficient approximations for the marginal likelihood of bayesian networks with hidden variables," Machine Learning, vol. 29, no. 2-3, pp. 181-212, 1997.
[20] M. Ramoni and P. Sebastiani, "Learning Bayesian networks from incomplete databases," in Proc. of the Conf. on Uncertainty in AI, pp. 401-408, 1997.
[21] N. Friedman, "Learning belief network in the presence of missing values and hidden variables," in Proc.14th Int. Conf. on Machine Learning, pp. 125-133, 1997.
[22] N. Friedman, "The bayesian structural EM algorithm," in Proc. 14th Conf. on Uncertainty in Artificial Intelligence, 1998.
[23] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. of the Royal Statistical Society B, vol. 39, no. 1, pp. 1-39, 1977.
[24] P. Leray and O. Franois, "Bayesian network structural learning and incomplete data," in Proc. Int. and Interdisciplinary Conf. on Adaptive Knowledge Representation and Reasoning, 2005.
[25] N. Friedman and M. Goldszmidt, "Discretizing continuous attributes while learning bayesian networks," in Proc. 13th Int. Conf. on Machine Learning, ICML'96, pp. 157-165, 1996.
[26] G. Cooper and E. Herskovits, "A bayesian method for the induction of probabilistic networks from data," Machine Learning, vol. 9, no. 4, pp. 309-347, 1992.
[27] D. Heckerman, D. Geiger, and D. M. Chickering, "Learning bayesian networks: the combination of knowledge and statistical data," Machine Learning, vol. 20, no. 3, pp. 197-243, 1995.
[28] G. Schwartz, "Estimating the dimensions of a model," Annals of Statistics, vol. 6, no. 2, pp. 461-464, 1978.
[29] J. Rissanen, "Stochastic complexity (with discussion)," J. of the Royal Statistical Society, vol. 49, no. 3, pp. 223-239, 1987.
[30] H. Akaike, "A new look at the statistical model identification," IEEE Trans. on Automatic Control, vol. 19, no. 6, pp. 716-723, 1974.
[31] S. L. Lauritzen and D. J. Spiegelhalter, "Local computations with probabilitieson graphical structures and their application to expert systems," J. of Royal Statistics Society, vol. 50, no. 2, pp. 157-224, 1988.
[32] R. E. Neapolitan, Probabilistic Reasoning in Expert Systems: Theory and Algorithms, John Wiley & Sons, Inc., pp. 179-180, 1990.