Alliance Rules-based Algorithm on Detecting Duplicate Entry Email
DOI: https://doi.org/10.35134/jcsitech.v7i2.7
Keywords: Data cleaning, Algorithm, Alliance rule, Duplication
Abstract
Email is undeniably of great importance in modern business communication. Every day, large volumes of email are sent from organizations to clients and suppliers, from employees to their managers, and from one colleague to another. As a result, data warehouses hold a vast amount of email data. Data cleaning is performed on the data sets of a data warehouse to improve and maintain the quality and consistency of the data. This paper highlights the issues related to dirty data, in particular the detection of duplicates in an email column. It examines data cleaning strategy from a different point of view and provides an algorithm for discovering errors and duplicate entries in the data sets of an existing data warehouse. The paper defines alliance rules, based on the concept of mathematical association rules, to determine duplicate entries in the email column of a data set.
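The abstract does not spell out the alliance rules themselves, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes an alliance rule can be approximated by association-rule confidence, conf(A ⇒ B) = supp(A ∪ B) / supp(A), applied to the token sets of two email addresses: the confidence of "tokens of address A imply tokens of address B" is the share of A's tokens that also occur in B. The function names (`normalize`, `tokens`, `confidence`, `find_duplicates`), the tokenization on `.`, `_`, `+` and `-`, and the 0.8 threshold are all hypothetical choices made for illustration. Lowercasing the whole address is also an assumption: the SMTP standard (RFC 5321) permits case-sensitive local parts, even though most providers ignore case.

```python
import re
from itertools import combinations


def normalize(email: str) -> str:
    """Trim surrounding whitespace and lowercase the address.
    Assumption: the whole address is compared case-insensitively."""
    return email.strip().lower()


def tokens(email: str) -> frozenset[str]:
    """Split an address into tokens: the pieces of the local part
    (split on '.', '_', '+', '-') plus the domain as a single token."""
    local, _, domain = normalize(email).partition("@")
    parts = [p for p in re.split(r"[._+\-]", local) if p]
    return frozenset(parts + ([domain] if domain else []))


def confidence(a: frozenset[str], b: frozenset[str]) -> float:
    """Association-rule-style confidence of the rule a -> b:
    |a & b| / |a|, the share of a's tokens that also appear in b."""
    return len(a & b) / len(a) if a else 0.0


def find_duplicates(
    emails: list[str], threshold: float = 0.8
) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every pair of rows whose rule holds in
    both directions with confidence >= threshold (hypothetical criterion)."""
    toks = [tokens(e) for e in emails]
    pairs = []
    for i, j in combinations(range(len(emails)), 2):
        # Take the weaker direction so the test is symmetric.
        score = min(confidence(toks[i], toks[j]), confidence(toks[j], toks[i]))
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    column = [
        "John.Smith@example.com",
        " john.smith@example.com ",
        "john_smith@example.com",
        "jane.doe@example.com",
    ]
    print(find_duplicates(column))
    # → [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)]
```

Rows 0 and 1 differ only in case and whitespace, while row 2 differs only in its separator, so all three share the same token set and are flagged; row 3 shares only the domain token (confidence 1/3) and is kept. Token matching of this kind treats `john_smith` and `john.smith` as the same person, which may not hold for every mail system, so flagged pairs are best treated as candidates for review rather than as proven duplicates.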