An Improved Data Clustering Algorithm for Mining Web Documents