International Journal For Multidisciplinary Research
E-ISSN: 2582-2160
Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Scrapy-Based Incremental Housing Rental Information Crawling System Design
Author(s) | Qichen Shao, Dongxiao Ren |
---|---|
Country | China |
Abstract | Training a housing rental system model requires massive data sets, while crawlers built on the Scrapy framework always crawl the entire site and access the database multiple times. To address both problems, a web-controlled incremental crawling system is designed for crawling property rental information from websites. To achieve incremental crawling, a downloader middleware is added to the Scrapy framework. When the crawler starts, the system loads the seed pages, the visited URLs and their hash lists, and the control page list. It then obtains the URLs of the sub-level pages and stores them in the database, crawls the sub-level pages in bulk, and parses the property information they contain. The data is cleaned by validating data formats, completing missing items, removing duplicate data and detecting abnormal data, which yields the eligible property data. |
Keywords | Scrapy crawler, incremental crawling, download middleware |
Field | Computer Applications |
Published In | Volume 5, Issue 3, May-June 2023 |
Published On | 2023-06-08 |
Cite This | Scrapy-Based Incremental Housing Rental Information Crawling System Design - Qichen Shao, Dongxiao Ren - IJFMR Volume 5, Issue 3, May-June 2023. DOI 10.36948/ijfmr.2023.v05i03.3488 |
DOI | https://doi.org/10.36948/ijfmr.2023.v05i03.3488 |
Short DOI | https://doi.org/gscdqz |
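The abstract gives no code. The following is a hedged illustration of how a Scrapy downloader middleware can make crawling incremental. It skips detail pages whose URL hash is already in the visited list and always re-fetches the control (list) pages so that new listings are still discovered. Several details are assumptions and not the authors' implementation: the class name `IncrementalDownloaderMiddleware`, the SHA-1 hashing, the in-memory set standing in for the database, and the example.com URLs.

```python
import hashlib
from types import SimpleNamespace

try:
    from scrapy.exceptions import IgnoreRequest  # Scrapy's exception for dropping a request
except ImportError:  # fallback so this sketch also runs without Scrapy installed
    class IgnoreRequest(Exception):
        pass


def url_fingerprint(url: str) -> str:
    """Hash a URL so the visited list stores fixed-length keys (SHA-1, 40 hex chars)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


class IncrementalDownloaderMiddleware:
    """Hypothetical downloader middleware that skips already-crawled detail pages.

    visited_hashes: URL hashes assumed to be loaded from the database at start-up.
    control_pages:  list/seed pages that are always re-fetched so that newly
                    posted listings are still discovered.
    """

    def __init__(self, visited_hashes=(), control_pages=()):
        self.visited = set(visited_hashes)
        self.control_pages = set(control_pages)

    def process_request(self, request, spider):
        if request.url in self.control_pages:
            return None  # always re-crawl control pages
        if url_fingerprint(request.url) in self.visited:
            raise IgnoreRequest(f"already crawled: {request.url}")
        return None  # returning None lets the download proceed

    def process_response(self, request, response, spider):
        # Record a page as visited only after a successful download, so that
        # failed pages are retried on the next run. A real system would also
        # persist the new hash to the database here.
        if response.status == 200 and request.url not in self.control_pages:
            self.visited.add(url_fingerprint(request.url))
        return response


# Usage with stand-in request/response objects (URLs are hypothetical).
def is_skipped(mw, url):
    try:
        mw.process_request(SimpleNamespace(url=url), spider=None)
    except IgnoreRequest:
        return True
    return False


mw = IncrementalDownloaderMiddleware(
    visited_hashes=[url_fingerprint("https://example.com/house/1")],
    control_pages=["https://example.com/list"],
)
```

In a Scrapy project, a class like this would be enabled through the `DOWNLOADER_MIDDLEWARES` setting, for example `{"myproject.middlewares.IncrementalDownloaderMiddleware": 543}`. The module path and priority shown are placeholders. Loading the stored hashes from the database would typically happen in a `from_crawler` classmethod.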
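The abstract names four cleaning steps: format validation, completion of missing items, duplicate removal and abnormal-data detection. A plain-Python sketch of these steps follows. The abstract does not give the paper's actual rules, so the following choices are hypothetical: the field names (`title`, `rent`, `area`, `district`), the `"unknown"` fill value, and the fixed rent/area bounds used for anomaly detection.

```python
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_number(value):
    """Return the first number in strings like '3,500' or '60 sqm', else None."""
    match = NUMBER_RE.search(str(value).replace(",", ""))
    return float(match.group()) if match else None


def clean_listings(raw, min_rent=100.0, max_rent=100_000.0, max_area=1_000.0):
    """Hypothetical cleaning pipeline for scraped rental listings."""
    cleaned, seen = [], set()
    for item in raw:
        # 1. Format validation: rent and area must contain a number.
        rent = parse_number(item.get("rent", ""))
        area = parse_number(item.get("area", ""))
        if rent is None or area is None:
            continue
        # 2. Completion of missing items with a placeholder value.
        title = (item.get("title") or "").strip()
        district = (item.get("district") or "").strip() or "unknown"
        # 3. Duplicate removal on the normalized record.
        key = (title, district, rent, area)
        if key in seen:
            continue
        seen.add(key)
        # 4. Abnormal-data detection with fixed plausibility bounds.
        if not (min_rent <= rent <= max_rent and 0 < area <= max_area):
            continue
        cleaned.append({"title": title, "rent": rent, "area": area, "district": district})
    return cleaned


# Hypothetical sample records.
sample = [
    {"title": "2BR near metro", "rent": "3,500", "area": "60 sqm", "district": "District A"},
    {"title": "2BR near metro ", "rent": "3500", "area": "60", "district": "District A"},  # duplicate
    {"title": "Studio", "rent": "2000", "area": "25"},  # missing district
    {"title": "Bad rent", "rent": "negotiable", "area": "30", "district": "District B"},  # invalid format
    {"title": "Typo", "rent": "9", "area": "40", "district": "District C"},  # abnormal rent
]
cleaned = clean_listings(sample)
```

In this sketch, deduplication runs before the bounds check to mirror the order given in the abstract. Swapping the two steps would not change the output, because duplicates share the same rent and area values.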
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.