Association Rule Mining in Big Data using MapReduce approach in Hadoop

J. Jenifer Nancy, Kalasalingam University; M. Jansi Rani; Dr. D. Devaraj

Hadoop, MapReduce, Association rule mining, Data mining, big data

Association rule mining is an important task in data mining. In the case of big data, the sheer volume of data makes it impossible to generate rules quickly. By exploiting parallel execution in Hadoop through the MapReduce framework, the rules can be generated faster and more efficiently. The existing method transforms the input dataset into a binomial representation before processing it with MapReduce. However, binomial conversion is not user-friendly, since it becomes complex for continuous values. In this paper, an improved and scalable algorithm is proposed for association rule mining that converts the input dataset into key-value pairs instead of a binomial representation. All stages of the proposed association rule mining algorithm are parallelized using MapReduce. The proposed algorithm works on high-cardinality features, so no dimension detection is needed.
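The key-value idea described above can be illustrated with a minimal sketch. The abstract does not give the details of the proposed algorithm, so everything below is an assumption made for illustration: the `attribute=value` item encoding, the function names (`to_key_value_items`, `map_phase`, `reduce_phase`, `frequent_itemsets`), and the toy records are all hypothetical, and the map and reduce steps are simulated in plain Python rather than run on Hadoop.

```python
from collections import Counter
from itertools import combinations


def to_key_value_items(record):
    """Encode a record as sorted 'attribute=value' items.

    Assumed encoding: each attribute/value pair becomes one item,
    so no binomial (one column per value) conversion is required.
    """
    return sorted(f"{attr}={val}" for attr, val in record.items())


def map_phase(record, k):
    """Map step: emit (itemset, 1) for every k-item combination in a record."""
    items = to_key_value_items(record)
    for itemset in combinations(items, k):
        yield itemset, 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each itemset key."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(records, k, min_support):
    """Keep the k-itemsets whose relative support reaches min_support."""
    pairs = (pair for record in records for pair in map_phase(record, k))
    counts = reduce_phase(pairs)
    return {s: c for s, c in counts.items() if c / len(records) >= min_support}


# Hypothetical toy dataset, used only to exercise the sketch.
records = [
    {"outlook": "sunny", "temp": "hot", "play": "no"},
    {"outlook": "sunny", "temp": "mild", "play": "no"},
    {"outlook": "rain", "temp": "mild", "play": "yes"},
    {"outlook": "sunny", "temp": "hot", "play": "yes"},
]
result = frequent_itemsets(records, k=2, min_support=0.5)
```

With these four records, two item pairs appear in at least half of the records: `("outlook=sunny", "play=no")` and `("outlook=sunny", "temp=hot")`, each with a count of 2. In a real Hadoop job the map and reduce functions would run in parallel across input splits; the sketch only shows the shape of the key-value pairs that flow between the two steps.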
Paper ID: GRDCF002039
Published in: Conference: International Conference on Innovations in Engineering and Technology (ICIET - 2016)
Page(s): 179 - 186