Validation de clustering des données dans un contexte big data
dc.contributor.author | Medfouni, Hayet | |
dc.contributor.author | Khantoul, Bilel | |
dc.date.accessioned | 2018-12-05T11:06:11Z | |
dc.date.available | 2018-12-05T11:06:11Z | |
dc.date.issued | 2018 | |
dc.description.abstract | For more than five decades, computing has been at the heart of our businesses, hospitals, ministries, and homes. This heavy use of computing has generated volumes of data too large to be managed by conventional software and hardware. Consider large companies such as Google and Microsoft, which must store billions of data items. The difficulty of managing such volumes gave rise to Big Data. Potentially unbounded quantities of data, and the constraints that follow from them, raise many processing problems. These constraints include the impossibility of storing all of the data, the difficulty of partitioning it into homogeneous groups without knowing the number of clusters a priori, and the need to produce these clusters in real time. In this work, we propose a distributed, parallel approach for scaling external clustering validation to large data sets, considering one index: the Jaccard coefficient. To do this, we use the Hadoop platform, one of the leading Big Data platforms, which relies on the MapReduce paradigm. The results obtained show the validity of the models developed on the Hadoop platform. | ar |
dc.identifier.uri | http://hdl.handle.net/123456789/6933 | |
dc.language.iso | fr | ar |
dc.publisher | Université Oum El Bouaghi | ar |
dc.subject | Big Data | ar |
dc.subject | Clustering | ar |
dc.subject | Clustering validation | ar |
dc.subject | External validation | ar |
dc.subject | Jaccard coefficient | ar |
dc.title | Validation de clustering des données dans un contexte big data | ar |
dc.type | Other | ar |
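
The abstract names the Jaccard coefficient as the external validation index and MapReduce as the execution model, but this record does not include the thesis's Hadoop code. As a minimal sketch only, and not the authors' implementation, the Python below simulates the idea in a single process. It assumes the input arrives as splits of (predicted label, reference label) pairs. The map step counts label pairs per split, the reduce step merges the counts into a contingency table, and the pair-counting Jaccard index J = a / (a + b + c) is computed from that table. All function names are hypothetical.

```python
from collections import Counter
from functools import reduce
from math import comb  # requires Python 3.8+


def map_chunk(chunk):
    """Map step: count (predicted, reference) label pairs in one data split."""
    return Counter(chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial contingency tables."""
    return left + right


def jaccard_from_contingency(cells):
    """Pair-counting Jaccard index from a {(pred, ref): count} table."""
    # a: pairs of points placed together in both partitions
    same_both = sum(comb(n, 2) for n in cells.values())
    pred_sizes, ref_sizes = Counter(), Counter()
    for (pred, ref), n in cells.items():
        pred_sizes[pred] += n
        ref_sizes[ref] += n
    # a + b: pairs together in the predicted clustering
    same_pred = sum(comb(n, 2) for n in pred_sizes.values())
    # a + c: pairs together in the reference partition
    same_ref = sum(comb(n, 2) for n in ref_sizes.values())
    denom = same_pred + same_ref - same_both  # a + b + c
    # Assumption: if no pair is co-clustered anywhere, treat the result as 1.0
    return same_both / denom if denom else 1.0


def jaccard_index(chunks):
    """Run the simulated map and reduce steps over all splits."""
    cells = reduce(reduce_counts, map(map_chunk, chunks), Counter())
    return jaccard_from_contingency(cells)
```

Because only the contingency counts cross the map/reduce boundary, the data exchanged between steps grows with the number of distinct label pairs rather than with the number of points, which is why this formulation is convenient for large data sets.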