@title Things You Should Do Now
@group sundry

Describes things you should do now when building software, because the cost to
do them increases over time and eventually becomes prohibitive or impossible.


= Overview =

If you're building a hot new web startup, there are a lot of decisions to make
about what to focus on. Most things you'll build will take about the same amount
of time to build regardless of what order you build them in, but there are a few
technical things which become vastly more expensive to fix later.

If you don't do these things early in development, they'll become very hard or
impossible to do later. This is basically a list of things that would have saved
Facebook huge amounts of time and effort down the road if someone had spent
a tiny amount of time on them earlier in the development process.

See also @{article:Things You Should Do Soon} for things that scale less
drastically over time.


= Start IDs At a Gigantic Number =

If you're using integer IDs to identify data or objects, **don't** start your
IDs at 1. Start them at a huge number (e.g., 2^33) so that no object ID will
ever appear in any other role in your application (like a count, a natural
index, a byte size, a timestamp, etc). This takes about 5 seconds if you do it
before you launch and rules out a huge class of nasty bugs for all time. It
becomes incredibly difficult as soon as you have production data.

The kind of bug that this causes is accidental use of some other value as an ID:

  COUNTEREXAMPLE
  // Load the user's friends, returns a map of friend_id => true
  $friend_ids = user_get_friends($user_id);

  // Get the first 8 friends.
  $first_few_friends = array_slice($friend_ids, 0, 8);

  // Render those friends.
  render_user_friends($user_id, array_keys($first_few_friends));

Because array_slice() in PHP discards array indices and renumbers them, this
doesn't render the user's first 8 friends but the users with IDs 0 through 7,
e.g. Mark Zuckerberg (ID 4) and Dustin Moskovitz (ID 6). If you have IDs in this
range, sooner or later something that isn't an ID will get treated like an ID
and the operation will be valid and cause unexpected behavior. This is
completely avoidable if you start your IDs at a gigantic number.


= Only Store Valid UTF-8 =

For the most part, you can ignore UTF-8 and Unicode until later. However, there
is one aspect of Unicode you should address now: store only valid UTF-8 strings.

Assuming you're storing data internally as UTF-8 (this is almost certainly the
right choice and definitely the right choice if you have no idea how Unicode
works), you just need to sanitize all the data coming into your application and
make sure it's valid UTF-8.

If your application emits invalid UTF-8, other systems (like browsers) will
break in unexpected and interesting ways. You will eventually be forced to
ensure you emit only valid UTF-8 to avoid these problems. If you haven't
sanitized your data, you'll basically have two options:

  - do a huge migration on literally all of your data to sanitize it; or
  - forever sanitize all data on its way out on the read pathways.

As of 2011 Facebook is in the second group, and spends several milliseconds of
CPU time sanitizing every display string on its way to the browser, which
multiplies out to hundreds of servers' worth of CPUs sitting in a datacenter
paying the price for the invalid UTF-8 in the databases.

You can likely learn enough about Unicode to be confident in an implementation
which addresses this problem within a few hours. You don't need to learn
everything, just the basics. Your language probably already has a function which
does the sanitizing for you.


= Never Design a Denylist-Based Security System =

When you have an alternative, don't design security systems which are default
permit, denylist-based, or otherwise attempt to enumerate badness. When
Facebook launched Platform, it launched with a denylist-based CSS filter, which
basically tried to enumerate all the "bad" parts of CSS and filter them out.
This was a poor design choice and led to basically infinite security holes for
all time.

It is very difficult to enumerate badness in a complex system and badness is
often a moving target. Instead of trying to do this, design allowlist-based
security systems where you list allowed things and reject anything you don't
understand. Assume things are bad until you verify that they're OK.

It's tempting to design denylist-based systems because they're easier to write
and accept more inputs. In the case of the CSS filter, the product goal was for
users to just be able to use CSS normally and feel like this system was no
different from systems they were familiar with. An allowlist-based system would
reject some valid, safe inputs and create product friction.

But this is a much better world than the alternative, where the denylist-based
system fails to reject some dangerous inputs and creates //security holes//. It
//also// creates product friction because when you fix those holes you break
existing uses, and that backward-compatibility friction makes it very difficult
to move the system from a denylist to an allowlist. So you're basically in
trouble no matter what you do, and have a bunch of security holes you need to
unbreak immediately, so you won't even have time to feel sorry for yourself.

Designing denylist-based security is one of the worst now-vs-future tradeoffs
you can make. See also "The Six Dumbest Ideas in Computer Security":

http://www.ranum.com/security/computer_security/


= Fail Very Loudly when SQL Syntax Errors Occur in Production =

This doesn't apply if you aren't using SQL, but if you are: detect when a query
fails because of a syntax error (in MySQL, it is error 1064). If the failure
happened in production, fail in the loudest way possible. (I implemented this in
2008 at Facebook and had it just email me and a few other people directly. The
system was eventually refined.)

This basically creates a high-signal stream that tells you where you have SQL
injection holes in your application. It will have some false positives and could
theoretically have false negatives, but at Facebook it was pretty high signal
considering how important the signal is.

Of course, the real solution here is to not have SQL injection holes in your
application, ever. As far as I'm aware, this system correctly detected the one
SQL injection hole we had from mid-2008 until I left in 2011, which was in a
hackathon project on an underisolated semi-production tier and didn't use the
query escaping system the rest of the application does.

Hopefully, whatever language you're writing in has good query libraries that
can handle escaping for you. If so, use them. If you're using PHP and don't have
a solution in place yet, the Phorge implementation of `qsprintf()` is
similar to Facebook's system and was successful there.
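
As a rough illustration of the "fail loudly" idea, here is a minimal sketch of
the decision logic. It is not Facebook's or Phorge's implementation: the
function `report_query_failure()`, its parameters, and the alert callback are
all hypothetical names invented for this example. The only fact taken from
above is that MySQL reports syntax errors as error 1064. How you obtain the
error number depends on your database layer (with mysqli, for instance, it is
available as `$conn->errno`).

```php
<?php

// MySQL reports a query syntax error as error number 1064.
const MYSQL_SYNTAX_ERROR = 1064;

/**
 * Hypothetical helper: call this from wherever your query layer handles a
 * failed query. $alert is any callable that emails or pages a human.
 * Returns true if an alert was raised.
 */
function report_query_failure(
  int $errno,
  string $query,
  bool $is_production,
  callable $alert): bool {

  if ($errno !== MYSQL_SYNTAX_ERROR) {
    // Not a syntax error; leave it to normal error handling.
    return false;
  }

  if (!$is_production) {
    // Syntax errors in development are usually just typos.
    return false;
  }

  $alert(sprintf(
    'SQL syntax error in production (possible injection hole): %s',
    $query));

  return true;
}

// Usage sketch: collect alerts in an array instead of sending email.
$alerts = array();
$collect = function (string $message) use (&$alerts) {
  $alerts[] = $message;
};

// An unescaped quote breaks the query: this raises an alert.
report_query_failure(
  1064,
  "SELECT * FROM user WHERE name = 'O'Brien'",
  true,
  $collect);

// Some other error number in production: no alert.
report_query_failure(1062, 'INSERT INTO user (id) VALUES (1)', true, $collect);
```

Keeping the alert as a callback means the same check can email a few people at
first and be pointed at something more refined later without touching the
query layer.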