An Improved Data Clustering Algorithm for Mining Web Documents

This paper formulates, simulates, and assesses an improved data clustering algorithm for mining web documents, with a view to preserving their conceptual similarities and addressing the problem of speed while increasing accuracy. The improved data clustering algorithm was formulated using the concept o...

Bibliographic Details
Main authors:	Odukoya, O. H., Aderounmu, Ganiyu A., Adagunodo, E. R.
Format:	Journal
Language:	English
Published:	2023
Subjects:	improved data clustering algorithm mining web documents
Online access:	https://ir.oauife.edu.ng/123456789/5537

_version_	1810764579721969664
author	Odukoya, O. H. Aderounmu, Ganiyu A. Adagunodo, E. R.
author_facet	Odukoya, O. H. Aderounmu, Ganiyu A. Adagunodo, E. R.
author_sort	Odukoya, O. H.
collection	DSpace
description	This paper formulates, simulates, and assesses an improved data clustering algorithm for mining web documents, with a view to preserving their conceptual similarities and addressing the problem of speed while increasing accuracy. The improved algorithm was formulated using the concept of the K-means algorithm. Real and artificial datasets were used to test the proposed and existing algorithms. The proposed algorithm was simulated using the fuzzy logic and statistical toolboxes in Matlab 7.0. The simulated results were compared with those of the existing data clustering algorithm using accuracy, response time, adjusted Rand index, and entropy as performance parameters. The result is an improved data clustering algorithm with a new initialization method based on finding a set of medians extracted from the dimension with maximum variance. The simulation showed that accuracy peaks when the number of clusters is three and falls as the number of clusters increases. The proposed clustering algorithm achieved an accuracy of 89.3%, while the existing algorithm achieved 88.9%. The entropy was stable for both algorithms, with a value of 0.2485 at k = 3. It decreased as the number of clusters increased until the number of clusters reached eight, where it increased slightly. The adjusted Rand index values varied from 0 to 1 for both clustering algorithms. When the number of clusters was five, the existing method achieved an adjusted Rand index of 53%, compared with 63.7% for the proposed method. In addition, the response time decreased from 0.0451 seconds to 0.0439 seconds when the number of clusters was three, a reduction of about 2.7% relative to K-means clustering. This study has shown that the proposed data clustering algorithm could be adapted by web search engine developers for more efficient web search result clustering.
format	Journal
id	oai:ir.oauife.edu.ng:123456789-5537
institution	My University
language	English
publishDate	2023
record_format	dspace
spelling	oai:ir.oauife.edu.ng:123456789-5537 2023-05-13T17:55:08Z An Improved Data Clustering Algorithm for Mining Web Documents Odukoya, O. H. Aderounmu, Ganiyu A. Adagunodo, E. R. improved data clustering algorithm mining web documents 2023-05-13T17:55:08Z 2023-05-13T17:55:08Z 2010-12 Journal Odukoya, O. H., Aderounmu, G. A., Adagunodo, E. R. (2010). An Improved Data Clustering Algorithm for Mining Web Documents. International Conference on Computational Intelligence and Software Engineering, CiSE 2010. DOI: 10.1109/CISE.2010.5676720 https://ir.oauife.edu.ng/123456789/5537 en text/plain
spellingShingle	improved data clustering algorithm mining web documents Odukoya, O. H. Aderounmu, Ganiyu A. Adagunodo, E. R. An Improved Data Clustering Algorithm for Mining Web Documents
title	An Improved Data Clustering Algorithm for Mining Web Documents
title_sort	improved data clustering algorithm for mining web documents
topic	improved data clustering algorithm mining web documents
url	https://ir.oauife.edu.ng/123456789/5537

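The abstract describes the new initialization only as "finding a set of medians extracted from a dimension with maximum variances" and gives no further detail. The Python sketch below is one possible reading of that sentence, not the authors' implementation (which was simulated in Matlab 7.0): it assumes the points are sorted along the highest-variance dimension, cut into k roughly equal slices, and the middle point of each slice is used as an initial centroid. The function name `median_init` and the slicing scheme are illustrative assumptions.

```python
from statistics import pvariance


def median_init(points, k):
    """Return k initial centroids for K-means.

    Assumed reading of the abstract: take the dimension with the largest
    variance, sort the points along it, split them into k contiguous
    slices, and use the middle (median) point of each slice.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    dims = len(points[0])
    # Index of the dimension whose values have the largest variance.
    split_dim = max(
        range(dims), key=lambda d: pvariance([p[d] for p in points])
    )

    # Order the points along that dimension.
    ordered = sorted(points, key=lambda p: p[split_dim])
    n = len(ordered)

    centroids = []
    for i in range(k):
        # Slice i covers ordered[start:end]; since k <= n it is never empty.
        start = i * n // k
        end = (i + 1) * n // k
        # Middle element of the slice (lower median for even-sized slices).
        centroids.append(ordered[(start + end - 1) // 2])
    return centroids


if __name__ == "__main__":
    # Hypothetical toy data: two groups separated along the first dimension.
    data = [(1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1)]
    print(median_init(data, 2))  # [(2, 0), (11, 1)]
```

The returned points would then seed an ordinary K-means run in place of random starting centroids. Because the procedure is deterministic, repeated runs on the same data start from the same centroids, which is one plausible reason such a scheme could affect both accuracy and response time.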