International Journal For Multidisciplinary Research
E-ISSN: 2582-2160
Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Scrapy-Based Incremental Housing Rental Information Crawling System Design
Author(s) | Qichen Shao, Dongxiao Ren |
---|---|
Country | China |
Abstract | Training a housing rental system model requires a massive data set, yet crawlers built on the Scrapy framework typically rely on site-wide crawling and repeated database accesses. To address both problems, a web-controlled incremental crawling system is designed for property rental information on websites. Incremental crawling is achieved by adding a download middleware to the Scrapy framework. When the crawler starts, the system loads the seed page, the list of visited URLs with their hashes, and the control page list. It then obtains the URLs of the sub-level pages, enters them into the database, crawls the sub-level pages in bulk, and parses the property information they contain. The data is cleaned by verifying its format, completing missing items, removing duplicates, and detecting abnormal values, which yields the eligible property data. |
Keywords | Scrapy crawler, incremental crawling, download middleware |
Field | Computer Applications |
Published In | Volume 5, Issue 3, May-June 2023 |
Published On | 2023-06-08 |
Cite This | Scrapy-Based Incremental Housing Rental Information Crawling System Design - Qichen Shao, Dongxiao Ren - IJFMR Volume 5, Issue 3, May-June 2023. DOI 10.36948/ijfmr.2023.v05i03.3488 |
DOI | https://doi.org/10.36948/ijfmr.2023.v05i03.3488 |
Short DOI | https://doi.org/gscdqz |
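
The abstract describes two mechanisms that a short sketch may help make concrete: skipping URLs whose hashes were loaded at start-up, and cleaning the parsed property records. The Python below is an illustration only, not the authors' implementation. The SHA-1 hash, the field names (`title`, `price`, `area`, `district`), the price range, and every function and class name are assumptions introduced for the example. In a Scrapy project, the `should_crawl` check would most plausibly live in a downloader middleware's `process_request` method, which can raise `scrapy.exceptions.IgnoreRequest` to drop an already visited URL.

```python
import hashlib
from typing import Iterable, List, Optional


def url_fingerprint(url: str) -> str:
    """Return a stable hash of a URL (SHA-1 is an assumption, not from the paper)."""
    return hashlib.sha1(url.strip().encode("utf-8")).hexdigest()


class VisitedFilter:
    """In-memory set of URL hashes, loaded once when the crawler starts."""

    def __init__(self, visited_urls: Iterable[str] = ()):
        self._seen = {url_fingerprint(u) for u in visited_urls}

    def should_crawl(self, url: str) -> bool:
        """Return True for a new URL and record it; False if already visited."""
        fp = url_fingerprint(url)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


REQUIRED = ("title", "price", "area")  # hypothetical required fields


def clean_listing(raw: dict, price_range=(100.0, 100000.0)) -> Optional[dict]:
    """Verify format, complete missing optional items, reject abnormal values."""
    # Trim whitespace from string fields.
    item = {k: (v.strip() if isinstance(v, str) else v) for k, v in raw.items()}
    # Format check: every required field must be present and non-empty.
    if any(not item.get(k) for k in REQUIRED):
        return None
    try:
        item["price"] = float(item["price"])
        item["area"] = float(item["area"])
    except (TypeError, ValueError):
        return None
    # Complete a missing optional item with a placeholder value.
    item.setdefault("district", "unknown")
    # Abnormal-data check: price outside the assumed range or non-positive area.
    low, high = price_range
    if not (low <= item["price"] <= high) or item["area"] <= 0:
        return None
    return item


def clean_all(rows: Iterable[dict]) -> List[dict]:
    """Clean every record and drop duplicates keyed on (title, price, area)."""
    seen, cleaned = set(), []
    for raw in rows:
        item = clean_listing(raw)
        if item is None:
            continue
        key = (item["title"], item["price"], item["area"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(item)
    return cleaned
```

Keeping the hashes in memory means each candidate URL is checked without a database query, which is one plausible reading of how the design avoids multiple database accesses; the paper itself should be consulted for the actual mechanism.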
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.