Evaluation of DUST (Duplicate URLs with Similar Text) Using Multiple Sequence of Alignment

Authors

  • K. P. Bhagwat, M. S. Chaudhari, Department of CSE, Priyadarshani Bhagwati College of Engineering, Nagpur, India

Abstract

Today, the World Wide Web is a commonly used medium for searching for information, which search engines gather using web crawlers. Some of the pages collected by web crawlers have duplicate content. Different URLs that lead to pages with similar text are known as DUST (Duplicate URLs with Similar Text). To improve the performance of search engines, a new method called DUSTER is proposed. The proposed method aligns entire URLs using multiple sequence alignment and removes the duplicate URLs. It uses normalization rules to convert duplicate URLs into a single canonical form. Using this method, a large reduction in the number of duplicate URLs is achieved.
Keywords: URL (Uniform Resource Locator), Search Engine, DUST (Duplicate URLs with Similar Text), Normalization Rules
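The abstract does not list the normalization rules the method learns, so the following is only a minimal Python sketch of the final step it describes: mapping URLs to a single canonical form and grouping duplicates under that form. The three rules (strip a leading `www.`, treat `index.html`/`index.htm` as the enclosing directory, drop the hypothetical content-free parameters `sid`, `sessionid` and `utm_source` and sort the rest) are illustrative assumptions, not rules reported in the paper. The alignment step used to learn rules is not shown.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query parameters assumed to carry no content (e.g. session ids).
# These are illustrative only; a real rule set would be learned, not hard-coded.
IRRELEVANT_PARAMS = {"sid", "sessionid", "utm_source"}


def normalize(url: str) -> str:
    """Map a URL to a canonical form using a few illustrative rules."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):  # rule 1: "www.example.com" == "example.com"
        host = host[4:]
    path = parts.path or "/"
    if path.endswith(("/index.html", "/index.htm")):  # rule 2: index page == directory
        path = path[: path.rfind("/") + 1]
    # rule 3: drop assumed-irrelevant parameters and sort the remaining ones
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IRRELEVANT_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def remove_duplicates(urls):
    """Group URLs by canonical form; each key stands for one set of DUST."""
    groups = defaultdict(list)
    for u in urls:
        groups[normalize(u)].append(u)
    return dict(groups)


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/news/index.html?sid=42",
        "http://example.com/news/",
        "http://Example.com/news/index.htm",
        "http://example.com/news/story?id=7&sid=99",
        "http://example.com/news/story?id=7",
    ]
    for canonical, originals in remove_duplicates(crawled).items():
        print(canonical, len(originals))
    # http://example.com/news/ 3
    # http://example.com/news/story?id=7 2
```

Keeping each rule as a small, ordered transformation makes it straightforward to replace the hard-coded examples with rules produced by a learning step; the crawler then fetches only one URL per canonical key.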

Published

2017-08-30

How to Cite

Bhagwat, K. P., & Chaudhari, M. S. (2017). Evaluation of DUST (Duplicate URLs with Similar Text) Using Multiple Sequence of Alignment. International Journal of Engineering Technology and Computer Research, 5(4). Retrieved from https://www.ijetcr.org/index.php/ijetcr/article/view/406

Section

Articles