Wikipedia:Bots/Requests for approval/PageLinkScraper
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was
Denied.
New to bots on Wikipedia? Read these primers!
- Approval process – How this discussion works
- Overview/Policy – What bots are/What they can (or can't) do
- Dictionary – Explains bot-related jargon
Operator: ChintanP04 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 04:14, Monday, March 10, 2025 (UTC)
Function overview: This bot's only function is to crawl the wiki and extract page titles, page URLs, and all outgoing links from each page.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: Will be made available upon request.
Links to relevant discussions (where appropriate):
Edit period(s): Continuous
Estimated number of pages affected: Thousands (depending on API rate limits and query depth)
Namespace(s): Mainspace (Articles)
Exclusion compliant (Yes/No): Yes
Adminbot (Yes/No): No
Function details: This bot will:
- Query Wikipedia's API to retrieve article titles in batches (up to 500 per request).
- Fetch the full URL of each article.
- Extract all internal (outgoing) links from each article.
- Store the data for further graph-based visualization or analysis.
The bot will not edit any Wikipedia pages. It will strictly read data and follow Wikipedia's API usage policies, including rate limiting to avoid placing excessive load on Wikipedia's servers.
I realize I could download the Wikipedia database dumps instead, but I specifically want to do this project via HTTP requests to the API. If the request-based approach proves too slow, I will fall back to the database download method.
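For concreteness, here is a minimal sketch of the intended read-only crawl, assuming the standard MediaWiki Action API at en.wikipedia.org; the User-Agent string, batch size, and sleep interval are illustrative placeholders, not a final implementation:

```python
# Minimal sketch of the proposed read-only crawl, assuming the standard
# MediaWiki Action API. The User-Agent string, sleep interval, and output
# handling are illustrative placeholders.
import time
import requests

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "PageLinkScraper/0.1 (contact placeholder)"}  # hypothetical UA

def crawl(batch_limit=500, delay=1.0):
    """Yield (title, url, outgoing_links) for mainspace articles."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "allpages",
        "gapnamespace": 0,        # mainspace only
        "gaplimit": batch_limit,  # up to 500 titles per request
        "prop": "links|info",
        "plnamespace": 0,         # internal article links only
        "pllimit": "max",
        "inprop": "url",          # full URL of each article
        "maxlag": 5,              # back off when the servers are lagged
    }
    while True:
        resp = requests.get(API, params=params, headers=HEADERS, timeout=30)
        data = resp.json()
        for page in data.get("query", {}).get("pages", {}).values():
            links = [l["title"] for l in page.get("links", [])]
            yield page["title"], page.get("fullurl", ""), links
        cont = data.get("continue")
        if not cont:
            break
        params.update(cont)  # follow the API's continuation cursor
        time.sleep(delay)    # crude rate limiting between batches
```

Note that with generator continuation a page's link list can arrive split across batches, so stored results would need to be merged per title.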
Discussion
Denied. WP:BOTNOTNOW. Please edit Wikipedia for a while to gain experience and understanding of our policies. Also, please note that the API is not the best method to do this, and bots are not allowed to download large amounts of content by requesting many individual pages. If you need such content, download the database dumps instead. – DreamRimmer (talk) 07:50, 10 March 2025 (UTC)
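For reference, a minimal sketch of the dump-based approach recommended above, streaming a pages-articles dump with only the Python standard library; the file name and export schema version are placeholders that vary by dump, and the regex is only a rough approximation of proper wikitext parsing:

```python
# Minimal sketch of streaming a pages-articles dump instead of requesting
# individual pages via the API. File name and schema version are placeholders.
import bz2
import re
import xml.etree.ElementTree as ET

DUMP = "enwiki-latest-pages-articles.xml.bz2"       # from dumps.wikimedia.org
NS = "{http://www.mediawiki.org/xml/export-0.11/}"  # schema version varies by dump
LINK = re.compile(r"\[\[([^\]|#]+)")                # [[wikilink]] target, before any | or #

def iter_articles(path=DUMP):
    """Yield (title, outgoing_links) for each mainspace page in the dump."""
    with bz2.open(path, "rb") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag == NS + "page":
                ns = elem.findtext(NS + "ns")
                title = elem.findtext(NS + "title")
                text = elem.findtext(f"{NS}revision/{NS}text") or ""
                if ns == "0":  # mainspace articles only
                    yield title, LINK.findall(text)
                elem.clear()   # free parsed content as we stream
```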
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.