@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Explain upstream attitudes toward CLI exit codes

Summary: Ref T5991. See D14116. We are consistent but nonstandard in our use of exit codes. This document explains what we use exit codes for and why we do this.

Test Plan: Read it.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T5991

Differential Revision: https://secure.phabricator.com/D14173

+243
src/docs/user/field/exit_codes.diviner
@title Command Line Exit Codes
@group fieldmanual

Explains the use of exit codes in Phabricator command line scripts.

Overview
========

When you run a command from the command line, it exits with an //exit code//.
This code is normally not shown on the CLI, but you can examine the exit code
of the last command you ran by looking at `$?` in your shell:

  $ ls
  ...
  $ echo $?
  0

Programs which run commands can operate on exit codes, and shell constructs
like `cmdx && cmdy` operate on exit codes.

The code `0` means success. Other codes signal some sort of error or status
condition, depending on the system and command.

With rare exception, Phabricator uses //all other codes// to signal
**catastrophic failure**.

This is an explicit architectural decision and one we are unlikely to deviate
from: generally, we will not accept patches which give a command a nonzero exit
code to indicate an expected state, an application status, or a minor abnormal
condition.

Generally, this decision reflects a philosophical belief that attaching
application semantics to exit codes is a relic of a simpler time, and that
they are not appropriate for communicating application state in a modern
operational environment. This document explains the reasoning behind our use of
exit codes in more detail.

In particular, this approach is informed by a focus on operating Phabricator
clusters at scale. This is not a common deployment scenario, but we consider it
the most important one. Our use of exit codes makes it easier to deploy and
operate a Phabricator cluster at larger scales. It makes it slightly harder to
deploy and operate a small cluster or single host by gluing together `bash`
scripts. We are willingly trading the small scale away for advantages at larger
scales.


Problems With Exit Codes
========================

We do not use exit codes to communicate application state because doing so
makes it harder to write correct scripts, and the primary benefit is that it
makes it easier to write incorrect ones.

This is somewhat at odds with the philosophy of "worse is better", but a modern
operations environment faces different forces than the interactive shell did
in the 1970s, particularly at scale.

We consider correctness to be very important to modern operations environments.
In particular, we manage a Phabricator cluster (Phacility) and believe that
having reliable, repeatable processes for provisioning, configuration and
deployment is critical to maintaining and scaling our operations. Our use of
exit codes makes it easier to implement processes that are correct and reliable
on top of Phabricator management scripts.

Exit codes as signals for application state are problematic because they are
ambiguous: you can't use them to distinguish between dissimilar failure states
which should prompt very different operational responses.

Exit codes primarily make writing things like `bash` scripts easier, but we
think you shouldn't be writing `bash` scripts in a modern operational
environment if you care very much about your software working.

Software environments which are powerful enough to handle errors properly are
also powerful enough to parse command output to unambiguously read and react to
complex state. Communicating application state through exit codes almost
exclusively makes it easier to handle errors in a haphazard way which is often
incorrect.


Exit Codes are Ambiguous
========================

In many cases, exit codes carry very little information and many different
conditions can produce the same exit code, including conditions which should
prompt very different responses.

The command line tool `grep` searches for text. For example, you might run
a command like this:

  $ grep zebra corpus.txt

This searches for the text `zebra` in the file `corpus.txt`. If the text is
not found, `grep` exits with a nonzero exit code (specifically, `1`).

Suppose you run `grep zebra corpus.txt` and observe a nonzero exit code. What
does that mean? These are //some// of the possible conditions which are
consistent with your observation:

  - The text `zebra` was not found in `corpus.txt`.
  - `corpus.txt` does not exist.
  - You do not have permission to read `corpus.txt`.
  - `grep` is not installed.
  - You do not have permission to run `grep`.
  - There is a bug in `grep`.
  - Your `grep` binary is corrupt.
  - `grep` was killed by a signal.

If you're running this command interactively on a single machine, it's probably
OK for all of these conditions to be conflated. You aren't going to examine the
exit code anyway (it isn't even visible to you by default), and `grep` likely
printed useful information to `stderr` if you hit one of the less common issues.

If you're running this command from operational software (like deployment,
configuration or monitoring scripts) and you care about the correctness and
repeatability of your process, we believe conflating these conditions is not
OK. The operational response to text not being present in a file should almost
always differ substantially from the response to the file not being present or
`grep` being broken.
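
To make the conflation concrete, here is a minimal Python sketch of what a caller can actually observe when it launches a command. It is an illustration only, not part of Phabricator: the function `classify`, the helper `exits_with`, and the labels they return are hypothetical names introduced here. A few of the conditions listed above (a missing or non-executable binary, death by signal) surface separately, but every other condition collapses into the same "exited with code N" label.

```python
import subprocess
import sys


def classify(argv):
    """Run argv and describe how it ended, without reading its output."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except FileNotFoundError:
        # The program could not be found at all.
        return "not-installed"
    except PermissionError:
        # The program exists but we are not allowed to execute it.
        return "not-executable"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return "killed-by-signal"
    # Everything else is just a number. "Text not found", "file missing" and
    # "bug in the tool" can all end up here with the same value.
    return "exit-%d" % proc.returncode


def exits_with(code):
    """Hypothetical stand-in child that exits with the given code.

    Used so the sketch does not depend on `grep` or `corpus.txt` existing.
    """
    return [sys.executable, "-c", "raise SystemExit(%d)" % code]
```

Calling `classify(["grep", "zebra", "corpus.txt"])` would run the command from the example above. Whatever label comes back, the caller still cannot tell from it alone //why// `grep` chose that exit code.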

In a particularly bad case, a broken `grep` might cause a careless deployment
script to continue down an inappropriate path and cascade into a more serious
failure.

Even in a less severe case, unexpected conditions should be detected and raised
to operations staff. `grep` being broken or a file that is expected to exist
not existing are both detectable, unexpected, and likely severe conditions, but
they cannot be differentiated and handled by examining the exit code of
`grep`. It is much better to detect and raise these problems immediately than
discover them after a lengthy root cause analysis.

Some of these conditions can be differentiated by examining the specific exit
code of the command instead of acting on all nonzero exit codes. However, many
failure conditions produce the same exit codes (particularly code `1`) and
there is no way to guarantee that a particular code signals a particular
condition, especially across systems.

Realistically, it is also relatively rare for scripts to even make an effort to
distinguish between exit codes, and all nonzero exit codes are often treated
the same way.


Bash Scripts are not Robust
============================

Exit codes that indicate application status make writing `bash` scripts (or
scripts in other tools which provide a thin layer on top of what is essentially
`bash`) a lot easier and more convenient.

For example, it is pretty tricky to parse JSON in `bash` or with standard
command-line tools, and much easier to react to exit codes. This is sometimes
used as an argument for communicating application status in exit codes.

We reject this because we don't think you should be writing `bash` scripts if
you're doing real operations.
Fundamentally, `bash` shell scripts are not a
robust building block for creating correct, reliable operational processes.

Here is one problem with using `bash` scripts to perform operational tasks.
Consider this command:

  $ mysqldump | gzip > backup.sql.gz

Now, consider this command:

  $ mysqldermp | gzip > backup.sql.gz

These commands represent a fairly standard way to accomplish a task (dumping
a compressed database backup to disk) in a `bash` script.

Note that the second command contains a typo (`dermp` instead of `dump`) which
will cause `mysqldermp` itself to fail immediately with a nonzero exit code.

However, both of these statements as a whole run to completion and exit with
exit code `0` (indicating success), because the statement reports the exit
code of `gzip`. Both will create a `backup.sql.gz` file. One backs up
your data; the other never backs up your data. This second command will never
work and never do what the author intended, but will appear successful under
casual inspection.

These behaviors are the same under `set -e`.

This fragile attitude toward error handling is endemic to `bash` scripts. The
default behavior is to continue on errors, and it isn't easy to change this
default. Options like `set -e` are unreliable and it is difficult to detect and
react to errors in fundamental constructs like pipes. The tools that `bash`
scripts employ (like `grep`) emit ambiguous error codes. Scripts cannot help
but propagate this ambiguity no matter how careful they are with error handling.

It is likely //possible// to implement these things safely and correctly in
`bash`, but it is not easy or straightforward. More importantly, it is not the
default: the default behavior of `bash` is to ignore errors and continue.
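
For contrast, here is a minimal Python sketch of the same two-stage pipe in an environment where a failure in either stage is an error by default. It is an illustration under stated assumptions, not a recommended backup procedure: `run_pipeline` and `py` are hypothetical helpers introduced here, and the stand-in child processes exist only so the sketch does not depend on `mysqldump` or `gzip` being installed.

```python
import subprocess
import sys


def run_pipeline(producer_argv, consumer_argv):
    """Pipe producer into consumer; raise unless both stages exit 0."""
    # A mistyped program name raises FileNotFoundError right here instead of
    # being silently replaced by an empty input stream.
    producer = subprocess.Popen(producer_argv, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_argv, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Drop our copy of the read end so the producer gets SIGPIPE if the
    # consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer_code = producer.wait()
    if producer_code != 0:
        raise RuntimeError("producer failed with exit code %d" % producer_code)
    if consumer.returncode != 0:
        raise RuntimeError(
            "consumer failed with exit code %d" % consumer.returncode
        )
    return output


def py(code):
    """Hypothetical stand-in: a child Python process running `code`."""
    return [sys.executable, "-c", code]
```

With real tools, the call would look like `run_pipeline(["mysqldump"], ["gzip"])`, with the caller writing the returned bytes to disk. The design point is only that both exit codes are checked, so a failed first stage cannot hide behind a successful second stage.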

Gluing commands together in `bash` or something that sits on top of `bash`
makes it easy and convenient to get a process that works fairly well most of
the time at small scales, but we are not satisfied that it represents a robust
foundation for operations at larger scales.


Reacting to State
=================

Instead of communicating application state through exit codes, we generally
communicate application state through machine-parseable output with a success
(`0`) exit code. All nonzero exit codes indicate catastrophic failure which
requires operational intervention.

Callers are expected to request machine-parseable output if necessary (for
example, by passing a `--json` flag or other similar flags), verify the command
exits with a `0` exit code, parse the output, then react to the state it
communicates as appropriate.

In a sufficiently powerful scripting environment (e.g., one with data
structures and a JSON parser), this is straightforward and makes it easy to
react precisely and correctly. It also allows scripts to communicate
arbitrarily complex state. Provided your environment gives you an appropriate
toolset, it is much more powerful and not significantly more complex than using
exit codes.

Most importantly, it allows the calling environment to treat nonzero exit
statuses as catastrophic failure by default.


Moving Forward
==============

Given these concerns, we are generally unwilling to bring changes which use
exit codes to communicate application state (other than catastrophic failure)
into the upstream. There are some exceptions, but these are rare. In
particular, ease of use in a `bash` environment is not a compelling motivation.

We are broadly willing to make output machine parseable or provide an explicit
machine output mode (often a `--json` flag) if there is a reasonable use case
for it. However, we operate a large production cluster of Phabricator instances
with the tools available in the upstream, so the lack of machine parseable
output is not sufficient to motivate adding such output on its own: we also
need to understand the problem you're facing, and why it isn't a problem we
face. A simpler or cleaner approach to the problem may already exist.

If you just want to write `bash` scripts on top of Phabricator scripts and you
are unswayed by these concerns, you can often just build a composite command to
get roughly the same effect that you'd get out of an exit code.

For example, you can pipe things to `grep` to convert output into exit codes.
This should generally have failure rates that are comparable to the background
failure level of relying on `bash` as a scripting environment.
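
For callers who do use a machine output mode, the calling convention described under "Reacting to State" (verify exit code `0`, parse the output, then react) might look like the following Python sketch. Everything specific in it is an assumption made for illustration: `read_state`, `react`, `CatastrophicFailure`, the stand-in command, and the `daemons` field are hypothetical and do not describe the output of any real Phabricator script.

```python
import json
import subprocess
import sys


class CatastrophicFailure(Exception):
    """The command did not exit 0; this needs operational intervention."""


def read_state(argv):
    """Run argv, insist on exit code 0, and return its parsed JSON output."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Any nonzero exit code is treated the same way: stop and escalate.
        raise CatastrophicFailure(
            "%r exited with %d: %s"
            % (argv, proc.returncode, proc.stderr.strip())
        )
    return json.loads(proc.stdout)


# Hypothetical stand-in for a management script run with a `--json`-style
# flag. The field name and value are invented for this sketch.
FAKE_STATUS_COMMAND = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"daemons": "stopped"}))',
]


def react(state):
    """Decide what to do based on application state, not on the exit code."""
    # "stopped" is an expected application state, so it arrives as data.
    if state["daemons"] == "stopped":
        return "start-daemons"
    return "nothing-to-do"
```

Here `react(read_state(FAKE_STATUS_COMMAND))` evaluates to `"start-daemons"`, while a command that exits nonzero raises `CatastrophicFailure` before any state is interpreted.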