Identification of Query Forms for Retrieving the Information From Deep Web
Keywords: Crawler, deep web, wrapper generation, attribute
Web databases are now ubiquitous. Search engines and web crawlers cannot access Deep Web data directly; the only way to reach a hidden database is through its query interface, which means filling in a number of HTML forms for a specific domain. This paper presents QFORT (Query Form Retrieval Technique), a technique for identifying relevant query forms. Retrieving information from deep web pages using wrappers is a fundamental problem that arises across a wide range of web pages of practical interest. We propose a novel approach to identifying query forms in Web pages, which is one of the key problems in automatic extraction. Many authors have addressed this problem using different techniques. Extensive experiments on real web sites show that the proposed technique extracts the desired data with high accuracy in most cases.
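To make the notion of identifying a query form concrete, the following is a minimal sketch of one simple heuristic. It is not the QFORT technique itself, whose details are not given here. It assumes that a form with a free-text or selection field and no password field is a candidate query interface; the names `FormCollector`, `is_candidate_query_form`, and `find_query_forms` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects the action and field types of every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per completed form
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "fields": []}
        elif self._current is not None:
            if tag == "input":
                # An <input> without a type attribute defaults to a text field.
                field_type = (attrs.get("type") or "text").lower()
                self._current["fields"].append(field_type)
            elif tag in ("select", "textarea"):
                self._current["fields"].append(tag)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def is_candidate_query_form(form):
    """Assumed heuristic: no password field, at least one text-like field."""
    fields = form["fields"]
    if "password" in fields:  # most likely a login or registration form
        return False
    return any(f in ("text", "search", "select") for f in fields)


def find_query_forms(html):
    """Return the forms in `html` that pass the heuristic above."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return [f for f in parser.forms if is_candidate_query_form(f)]
```

A heuristic of this kind only separates likely search forms from login forms; deciding whether a form is relevant to a specific domain would need further analysis of its labels and attributes.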