@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Disallow webcrawlers to follow Paste line number anchor links

Summary:
Paste provides a line anchor link on every single line of a paste.
If webcrawlers follow these links, they index the very same Paste again.
Thus disallow these links in robots.txt to reduce unneeded traffic and indexing time.

Closes T15662

Test Plan:
Go to `/robots.txt` in the web browser.
Cross fingers that more webcrawlers abide by RFC 9309.
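
The manual check in the test plan could also be scripted. The following is a minimal sketch, not part of the commit: it assumes the rule appears on its own line exactly as written in the diff, and the helper names `has_paste_anchor_rule` and `fetch_robots_txt` are hypothetical.

```python
import urllib.request

# Rule string copied from the diff in this commit.
EXPECTED_RULE = "Disallow: /P*%24*"


def has_paste_anchor_rule(robots_txt: str) -> bool:
    """Return True if some line of the robots.txt body is exactly the expected rule."""
    return any(line.strip() == EXPECTED_RULE for line in robots_txt.splitlines())


def fetch_robots_txt(base_url: str) -> str:
    """Download /robots.txt from an install; base_url is a placeholder you supply."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/robots.txt") as response:
        return response.read().decode("utf-8")
```

For example, `has_paste_anchor_rule(fetch_robots_txt(base_url))` should return `True` once the patch is deployed on the install at `base_url`.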

Reviewers: O1 Blessed Committers, valerio.bozzolan

Reviewed By: O1 Blessed Committers, valerio.bozzolan

Subscribers: tobiaswiese, valerio.bozzolan, Matthew, Cigaryno

Maniphest Tasks: T15662

Differential Revision: https://we.phorge.it/D25461

+7
src/applications/system/controller/robots/PhabricatorRobotsPlatformController.php
@@ -19,6 +19,13 @@
  $out[] = 'Disallow: /diffusion/';
  $out[] = 'Disallow: /source/';
 
+ // See T15662. Prevent indexing line anchor links in Pastes. Per RFC 9309
+ // section 2.2.3, percentage-encode "$" to avoid interpretation as end of
+ // match pattern. However, crawlers may not abide by it but follow the
+ // original standard at https://www.robotstxt.org/orig.html with no mention
+ // how to interpret characters like "$" and thus entirely ignore this rule.
+ $out[] = 'Disallow: /P*%24*';
+
  // Add a small crawl delay (number of seconds between requests) for spiders
  // which respect it. The intent here is to prevent spiders from affecting
  // performance for users. The possible cost is slower indexing, but that
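
To illustrate how a crawler that follows RFC 9309 might evaluate the new rule, here is a simplified sketch. It is not taken from Phorge or from any real crawler: `normalize` and `rule_matches` are hypothetical helpers, only the `*` wildcard and a trailing `$` anchor are modelled, and longest-match precedence between rules is ignored. It assumes Paste line anchors have the form `/P123$5` and that a literal `$` in the requested path is percent-encoded to `%24` before comparison.

```python
import re


def normalize(path: str) -> str:
    """Percent-encode a literal "$" so it compares equal to "%24" (assumed behaviour)."""
    return path.replace("$", "%24")


def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern against a URL path, RFC 9309 style.

    "*" matches any run of characters; a trailing "$" anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, normalize(path)) is not None


RULE = "/P*%24*"  # pattern from the Disallow line added in this diff

print(rule_matches(RULE, "/P123$5"))    # line anchor link, literal "$"
print(rule_matches(RULE, "/P123%245"))  # same link, already percent-encoded
print(rule_matches(RULE, "/P123"))      # the Paste itself
```

Under these assumptions both anchor forms are disallowed while the plain Paste URL stays crawlable. A crawler that only implements the original prefix-matching standard would treat `*` and `%24` literally, so the rule would never match for it.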