Adaptive Cross-Platform Web Crawling System Design via Deep Reinforcement Learning and Privacy Protection
Modern web and mobile platforms increasingly deploy complex anti-crawling mechanisms and enforce strict privacy regulations, making large-scale, compliant data acquisition a persistent challenge. In this paper, we propose a novel cross-platform adaptive web crawling framework that integrates deep reinforcement learning (DRL), federated learning (FL), and local differential privacy (LDP) to address the dual demands of operational efficiency and legal compliance. We formulate the crawling process as a Markov Decision Process (MDP) and leverage a PPO-based policy to enable dynamic decision-making under adversarial conditions, including CAPTCHA triggers, tokenized APIs, and platform switching. The system adopts a privacy-by-design architecture: federated training avoids raw data exposure, LDP ensures local feature desensitization, and blockchain-based audit logging provides immutable, transparent behavior tracking. Extensive experiments on real-world platforms—ranging from e-commerce sites to mobile social applications—demonstrate that our framework achieves superior success rates, adaptive behavior, and compliance scores compared to traditional, heuristic, and non-private baselines. The proposed system offers a practical and legally conscious solution for next-generation web crawling in dynamic, regulated ecosystems.
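Two building blocks named in the abstract can be made concrete with a minimal sketch. This record does not include the paper's implementation, so everything below is an assumption for illustration: the function names (`ppo_clipped_objective`, `ldp_laplace`), the clipping range `eps=0.2`, and the choice of the Laplace mechanism as the LDP perturbation are hypothetical and are not confirmed as the author's method. The first function computes the standard PPO clipped surrogate objective over a batch; the second clamps a numeric feature to a known range and adds Laplace noise on the client side.

```python
import random


def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A).

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates, one per sample
    eps        -- clipping range (0.2 is a common default, assumed here)
    """
    terms = [
        min(r * a, clip(r, 1.0 - eps, 1.0 + eps) * a)
        for r, a in zip(ratios, advantages)
    ]
    return sum(terms) / len(terms)


def ldp_laplace(value, lo, hi, epsilon, rng=random):
    """Perturb one numeric feature locally with the Laplace mechanism.

    The value is clamped to [lo, hi], so its sensitivity is (hi - lo),
    and the noise scale is sensitivity / epsilon. A Laplace(0, scale)
    sample is drawn as the difference of two exponential samples.
    """
    clamped = clip(value, lo, hi)
    scale = (hi - lo) / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clamped + noise


if __name__ == "__main__":
    # Hypothetical batch: one action became more likely, one less likely.
    print(ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
    # Hypothetical feature in [0, 10] perturbed with epsilon = 1.0.
    print(ldp_laplace(5.0, 0.0, 10.0, epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` gives stronger privacy but noisier features, which is the usual trade-off a system of this kind would have to tune against crawling performance.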
Related Topics
- Type: article
- Language: en
- Landing Page: https://doi.org/10.63619/ijai4s.v1i2.001
- OA Status: hybrid
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4409380158
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4409380158 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.63619/ijai4s.v1i2.001 (Digital Object Identifier)
- Title: Adaptive Cross-Platform Web Crawling System Design via Deep Reinforcement Learning and Privacy Protection (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-04-12 (full publication date if available)
- Authors: Wen Zeng (list of authors in order)
- Landing page: https://doi.org/10.63619/ijai4s.v1i2.001 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: hybrid (open access status per OpenAlex)
- OA URL: https://doi.org/10.63619/ijai4s.v1i2.001 (direct OA link when available)
- Concepts: Crawling, Reinforcement learning, Computer science, Computer security, World Wide Web, Internet privacy, Human–computer interaction, Artificial intelligence, Biology, Anatomy (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4409380158 |
|---|---|
| doi | https://doi.org/10.63619/ijai4s.v1i2.001 |
| ids.doi | https://doi.org/10.63619/ijai4s.v1i2.001 |
| ids.openalex | https://openalex.org/W4409380158 |
| fwci | 0.0 |
| type | article |
| title | Adaptive Cross-Platform Web Crawling System Design via Deep Reinforcement Learning and Privacy Protection |
| biblio.issue | 2 |
| biblio.volume | 1 |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12016 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9473000168800354 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1710 |
| topics[0].subfield.display_name | Information Systems |
| topics[0].display_name | Web Data Mining and Analysis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C100368936 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8416177034378052 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1411725 |
| concepts[0].display_name | Crawling |
| concepts[1].id | https://openalex.org/C97541855 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6270140409469604 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q830687 |
| concepts[1].display_name | Reinforcement learning |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.5559982061386108 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C38652104 |
| concepts[3].level | 1 |
| concepts[3].score | 0.48076507449150085 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[3].display_name | Computer security |
| concepts[4].id | https://openalex.org/C136764020 |
| concepts[4].level | 1 |
| concepts[4].score | 0.38703739643096924 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q466 |
| concepts[4].display_name | World Wide Web |
| concepts[5].id | https://openalex.org/C108827166 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3840716481208801 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q175975 |
| concepts[5].display_name | Internet privacy |
| concepts[6].id | https://openalex.org/C107457646 |
| concepts[6].level | 1 |
| concepts[6].score | 0.32411402463912964 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q207434 |
| concepts[6].display_name | Human–computer interaction |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.18304693698883057 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C86803240 |
| concepts[8].level | 0 |
| concepts[8].score | 0.08463439345359802 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[8].display_name | Biology |
| concepts[9].id | https://openalex.org/C105702510 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q514 |
| concepts[9].display_name | Anatomy |
| keywords[0].id | https://openalex.org/keywords/crawling |
| keywords[0].score | 0.8416177034378052 |
| keywords[0].display_name | Crawling |
| keywords[1].id | https://openalex.org/keywords/reinforcement-learning |
| keywords[1].score | 0.6270140409469604 |
| keywords[1].display_name | Reinforcement learning |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.5559982061386108 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/computer-security |
| keywords[3].score | 0.48076507449150085 |
| keywords[3].display_name | Computer security |
| keywords[4].id | https://openalex.org/keywords/world-wide-web |
| keywords[4].score | 0.38703739643096924 |
| keywords[4].display_name | World Wide Web |
| keywords[5].id | https://openalex.org/keywords/internet-privacy |
| keywords[5].score | 0.3840716481208801 |
| keywords[5].display_name | Internet privacy |
| keywords[6].id | https://openalex.org/keywords/human–computer-interaction |
| keywords[6].score | 0.32411402463912964 |
| keywords[6].display_name | Human–computer interaction |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.18304693698883057 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/biology |
| keywords[8].score | 0.08463439345359802 |
| keywords[8].display_name | Biology |
| language | en |
| locations[0].id | doi:10.63619/ijai4s.v1i2.001 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S5407048883 |
| locations[0].source.issn | 3067-3593 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | 3067-3593 |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | International Journal of Artificial Intelligence for Science (IJAI4S) |
| locations[0].source.host_organization | |
| locations[0].source.host_organization_name | |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | International Journal of Artificial Intelligence for Science (IJAI4S) |
| locations[0].landing_page_url | https://doi.org/10.63619/ijai4s.v1i2.001 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5042875971 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4556-5534 |
| authorships[0].author.display_name | Wen Zeng |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Weipeng Zeng |
| authorships[0].is_corresponding | True |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.63619/ijai4s.v1i2.001 |
| open_access.oa_status | hybrid |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Adaptive Cross-Platform Web Crawling System Design via Deep Reinforcement Learning and Privacy Protection |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T12016 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9473000168800354 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1710 |
| primary_topic.subfield.display_name | Information Systems |
| primary_topic.display_name | Web Data Mining and Analysis |
| related_works | https://openalex.org/W4393220254, https://openalex.org/W4321258516, https://openalex.org/W2051833850, https://openalex.org/W4287845917, https://openalex.org/W3156164993, https://openalex.org/W2385015894, https://openalex.org/W2171573941, https://openalex.org/W4360873893, https://openalex.org/W4390135167, https://openalex.org/W4317382653 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.63619/ijai4s.v1i2.001 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S5407048883 |
| best_oa_location.source.issn | 3067-3593 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | 3067-3593 |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | International Journal of Artificial Intelligence for Science (IJAI4S) |
| best_oa_location.source.host_organization | |
| best_oa_location.source.host_organization_name | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | International Journal of Artificial Intelligence for Science (IJAI4S) |
| best_oa_location.landing_page_url | https://doi.org/10.63619/ijai4s.v1i2.001 |
| primary_location.id | doi:10.63619/ijai4s.v1i2.001 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S5407048883 |
| primary_location.source.issn | 3067-3593 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | 3067-3593 |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | International Journal of Artificial Intelligence for Science (IJAI4S) |
| primary_location.source.host_organization | |
| primary_location.source.host_organization_name | |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | International Journal of Artificial Intelligence for Science (IJAI4S) |
| primary_location.landing_page_url | https://doi.org/10.63619/ijai4s.v1i2.001 |
| publication_date | 2025-04-12 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 20, 28, 66, 73, 94, 152 |
| abstract_inverted_index.In | 23 |
| abstract_inverted_index.We | 60 |
| abstract_inverted_index.as | 65 |
| abstract_inverted_index.in | 162 |
| abstract_inverted_index.of | 54 |
| abstract_inverted_index.on | 119 |
| abstract_inverted_index.to | 49, 76, 125, 142 |
| abstract_inverted_index.we | 26 |
| abstract_inverted_index.LDP | 103 |
| abstract_inverted_index.The | 91, 148 |
| abstract_inverted_index.and | 2, 10, 44, 57, 71, 88, 108, 138, 145, 154 |
| abstract_inverted_index.for | 158 |
| abstract_inverted_index.our | 130 |
| abstract_inverted_index.raw | 100 |
| abstract_inverted_index.the | 51, 62 |
| abstract_inverted_index.web | 1, 32, 160 |
| abstract_inverted_index.data | 18, 101 |
| abstract_inverted_index.deep | 37 |
| abstract_inverted_index.dual | 52 |
| abstract_inverted_index.from | 122 |
| abstract_inverted_index.that | 35, 129 |
| abstract_inverted_index.this | 24 |
| abstract_inverted_index.(FL), | 43 |
| abstract_inverted_index.(LDP) | 48 |
| abstract_inverted_index.(MDP) | 70 |
| abstract_inverted_index.APIs, | 87 |
| abstract_inverted_index.audit | 110 |
| abstract_inverted_index.legal | 58 |
| abstract_inverted_index.local | 45, 105 |
| abstract_inverted_index.novel | 29 |
| abstract_inverted_index.sites | 124 |
| abstract_inverted_index.under | 80 |
| abstract_inverted_index.(DRL), | 40 |
| abstract_inverted_index.Markov | 67 |
| abstract_inverted_index.Modern | 0 |
| abstract_inverted_index.adopts | 93 |
| abstract_inverted_index.avoids | 99 |
| abstract_inverted_index.deploy | 6 |
| abstract_inverted_index.enable | 77 |
| abstract_inverted_index.making | 15 |
| abstract_inverted_index.mobile | 3, 126 |
| abstract_inverted_index.offers | 151 |
| abstract_inverted_index.paper, | 25 |
| abstract_inverted_index.policy | 75 |
| abstract_inverted_index.rates, | 135 |
| abstract_inverted_index.scores | 140 |
| abstract_inverted_index.social | 127 |
| abstract_inverted_index.strict | 12 |
| abstract_inverted_index.system | 92, 150 |
| abstract_inverted_index.CAPTCHA | 84 |
| abstract_inverted_index.Process | 69 |
| abstract_inverted_index.address | 50 |
| abstract_inverted_index.complex | 7 |
| abstract_inverted_index.demands | 53 |
| abstract_inverted_index.dynamic | 78 |
| abstract_inverted_index.enforce | 11 |
| abstract_inverted_index.ensures | 104 |
| abstract_inverted_index.feature | 106 |
| abstract_inverted_index.legally | 155 |
| abstract_inverted_index.logging | 111 |
| abstract_inverted_index.privacy | 13, 47 |
| abstract_inverted_index.process | 64 |
| abstract_inverted_index.propose | 27 |
| abstract_inverted_index.success | 134 |
| abstract_inverted_index.Decision | 68 |
| abstract_inverted_index.achieves | 132 |
| abstract_inverted_index.adaptive | 31, 136 |
| abstract_inverted_index.behavior | 115 |
| abstract_inverted_index.compared | 141 |
| abstract_inverted_index.crawling | 33, 63, 161 |
| abstract_inverted_index.dynamic, | 163 |
| abstract_inverted_index.learning | 39, 42 |
| abstract_inverted_index.leverage | 72 |
| abstract_inverted_index.platform | 89 |
| abstract_inverted_index.proposed | 149 |
| abstract_inverted_index.provides | 112 |
| abstract_inverted_index.solution | 157 |
| abstract_inverted_index.superior | 133 |
| abstract_inverted_index.training | 98 |
| abstract_inverted_index.Extensive | 117 |
| abstract_inverted_index.PPO-based | 74 |
| abstract_inverted_index.behavior, | 137 |
| abstract_inverted_index.compliant | 17 |
| abstract_inverted_index.conscious | 156 |
| abstract_inverted_index.exposure, | 102 |
| abstract_inverted_index.federated | 41, 97 |
| abstract_inverted_index.formulate | 61 |
| abstract_inverted_index.framework | 34, 131 |
| abstract_inverted_index.including | 83 |
| abstract_inverted_index.platforms | 4 |
| abstract_inverted_index.practical | 153 |
| abstract_inverted_index.regulated | 164 |
| abstract_inverted_index.tokenized | 86 |
| abstract_inverted_index.tracking. | 116 |
| abstract_inverted_index.triggers, | 85 |
| abstract_inverted_index.baselines. | 147 |
| abstract_inverted_index.challenge. | 22 |
| abstract_inverted_index.compliance | 139 |
| abstract_inverted_index.e-commerce | 123 |
| abstract_inverted_index.efficiency | 56 |
| abstract_inverted_index.heuristic, | 144 |
| abstract_inverted_index.immutable, | 113 |
| abstract_inverted_index.integrates | 36 |
| abstract_inverted_index.mechanisms | 9 |
| abstract_inverted_index.persistent | 21 |
| abstract_inverted_index.real-world | 120 |
| abstract_inverted_index.switching. | 90 |
| abstract_inverted_index.acquisition | 19 |
| abstract_inverted_index.adversarial | 81 |
| abstract_inverted_index.compliance. | 59 |
| abstract_inverted_index.conditions, | 82 |
| abstract_inverted_index.ecosystems. | 165 |
| abstract_inverted_index.experiments | 118 |
| abstract_inverted_index.non-private | 146 |
| abstract_inverted_index.operational | 55 |
| abstract_inverted_index.transparent | 114 |
| abstract_inverted_index.differential | 46 |
| abstract_inverted_index.increasingly | 5 |
| abstract_inverted_index.large-scale, | 16 |
| abstract_inverted_index.regulations, | 14 |
| abstract_inverted_index.traditional, | 143 |
| abstract_inverted_index.anti-crawling | 8 |
| abstract_inverted_index.architecture: | 96 |
| abstract_inverted_index.reinforcement | 38 |
| abstract_inverted_index.cross-platform | 30 |
| abstract_inverted_index.decision-making | 79 |
| abstract_inverted_index.next-generation | 159 |
| abstract_inverted_index.blockchain-based | 109 |
| abstract_inverted_index.desensitization, | 107 |
| abstract_inverted_index.privacy-by-design | 95 |
| abstract_inverted_index.platforms—ranging | 121 |
| abstract_inverted_index.applications—demonstrate | 128 |
| cited_by_percentile_year | |
| corresponding_author_ids | https://openalex.org/A5042875971 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 1 |
| citation_normalized_percentile.value | 0.13737079 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
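
The `abstract_inverted_index.*` rows in the payload above store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of turning such a mapping back into running text follows; the function name `rebuild_abstract` is hypothetical, and `sample_index` is a truncated excerpt covering only positions 0 to 4 of the rows above, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Truncated excerpt (positions 0-4 only) taken from the payload rows above.
sample_index = {
    "Modern": [0],
    "web": [1],
    "and": [2],
    "mobile": [3],
    "platforms": [4],
}

if __name__ == "__main__":
    print(rebuild_abstract(sample_index))  # prints "Modern web and mobile platforms"
```

Applied to the complete index, the same function yields the abstract shown at the top of this record, since every token carries its original punctuation.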