The code and data behind xeiaso.net

chore(k8s/xesite): update anubis to use yaml config

Signed-off-by: Xe Iaso <me@xeiaso.net>

+63 -2
+51
manifest/xesite/anubis/botPolicies.yaml
+ ## Anubis has the ability to let you import snippets of configuration into the main
+ ## configuration file. This allows you to break up your config into smaller parts
+ ## that get logically assembled into one big file.
+ ##
+ ## Of note, a bot rule can either have inline bot configuration or import a
+ ## bot config snippet. You cannot do both in a single bot rule.
+ ##
+ ## Import paths can either be prefixed with (data) to import from the common/shared
+ ## rules in the data folder in the Anubis source tree or will point to absolute/relative
+ ## paths in your filesystem. If you don't have access to the Anubis source tree, check
+ ## /usr/share/docs/anubis/data or in the tarball you extracted Anubis from.
+
+ bots:
+   # Pathological bots to deny
+   - # This correlates to data/bots/ai-robots-txt.yaml in the source tree
+     import: (data)/bots/ai-robots-txt.yaml
+   - import: (data)/bots/cloudflare-workers.yaml
+   - import: (data)/bots/headless-browsers.yaml
+   - import: (data)/bots/us-ai-scraper.yaml
+
+   # Search engines to allow
+   - import: (data)/crawlers/googlebot.yaml
+   - import: (data)/crawlers/bingbot.yaml
+   - import: (data)/crawlers/duckduckbot.yaml
+   - import: (data)/crawlers/qwantbot.yaml
+   - import: (data)/crawlers/internet-archive.yaml
+   - import: (data)/crawlers/kagibot.yaml
+   - import: (data)/crawlers/marginalia.yaml
+   - import: (data)/crawlers/mojeekbot.yaml
+
+   # Allow common "keeping the internet working" routes (well-known, favicon, robots.txt)
+   - import: (data)/common/keep-internet-working.yaml
+   - import: /xe/cfg/anubis/xesite-rss-feeds.yaml
+
+   # # Punish any bot with "bot" in the user-agent string
+   # # This is known to have a high false-positive rate, use at your own risk
+   # - name: generic-bot-catchall
+   #   user_agent_regex: (?i:bot|crawler)
+   #   action: CHALLENGE
+   #   challenge:
+   #     difficulty: 16 # impossible
+   #     report_as: 4 # lie to the operator
+   #     algorithm: slow # intentionally waste CPU cycles and time
+
+   # Generic catchall rule
+   - name: generic-browser
+     user_agent_regex: >
+       Mozilla|Opera
+     action: CHALLENGE
+
+ dnsbl: false
+9
manifest/xesite/anubis/xesite-rss-feeds.yaml
+ - name: blog-rss-feed
+   action: ALLOW
+   path_regex: ^/blog.rss$
+ - name: blog-json-feed
+   action: ALLOW
+   path_regex: ^/blog.json$
+ - name: xecast-rss-feed
+   action: ALLOW
+   path_regex: ^/xecast.rss$
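
A note on the `path_regex` patterns in xesite-rss-feeds.yaml: the `.` in each one is unescaped, so it matches any single character (for example `^/blog.rss$` would also match `/blogXrss`). This is harmless if no such paths exist on the site. If the intent is to match only the literal feed paths, a stricter variant might look like the sketch below. The escaped form is an assumption about intent and is not part of this commit.

```yaml
# Hypothetical stricter variant: "\." matches only a literal dot.
- name: blog-rss-feed
  action: ALLOW
  path_regex: ^/blog\.rss$
- name: blog-json-feed
  action: ALLOW
  path_regex: ^/blog\.json$
- name: xecast-rss-feed
  action: ALLOW
  path_regex: ^/xecast\.rss$
```

The patterns are left as unquoted YAML scalars, where the backslash is passed through literally; inside a double-quoted YAML string it would need to be written as `\\.`.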
+1 -1
manifest/xesite/deployment.yaml
  - name: "METRICS_BIND"
    value: ":9090"
  - name: "POLICY_FNAME"
-   value: "/xe/cfg/anubis/botPolicies.json"
+   value: "/xe/cfg/anubis/botPolicies.yaml"
  - name: "SERVE_ROBOTS_TXT"
    value: "false"
  - name: "TARGET"
+2 -1
manifest/xesite/kustomization.yaml
  - name: anubis-cfg
    behavior: create
    files:
-     - ./anubis/botPolicies.json
+     - ./anubis/botPolicies.yaml
+     - ./anubis/xesite-rss-feeds.yaml