The code and data behind xeiaso.net

The Social Quandry of Devops (#440)

* the social quandry of devops

Signed-off-by: Xe Iaso <me@christine.website>

* the social quandry of devops: more better

Signed-off-by: Xe Iaso <me@christine.website>

authored by Xe Iaso and committed by GitHub
f45ca40a 10a086d6

+260
blog/social-quandry-devops-2022-03-17.markdown
---
title: Technical Solutions Poorly Solve Social Problems
date: 2022-03-17
tags:
  - devops
---

[I just wanna lead this article off by saying that _I do not have all the
answers here_. I really wish I did, but I also feel that I shouldn't have to
have an answer in mind in order to raise a question. Please also keep in mind
that this is coming from someone who has been working in devops for most of
their career.](conversation://Cadey/coffee)

## Or: The Social Quandary of Devops

Technology is the cornerstone of our society. As a people we have seen the
catalytic things that technology has enabled us to do. Through technology and
new and innovative ways of applying it, we can help solve many problems. This
leads some to envision technology as a panacea, a mythical cure-all that will
make all our problems go away with the right use of it.

This does not extend to social problems. Technical fixes for social problems are
how we end up with an inadequate mess that can make the problem a lot worse than
it was before. You've almost certainly been able to see this in action with
social media (under the belief that allowing people to connect is so morally
correct that it will bring in a new age of humanity that will be objectively
good for everyone). The example I want to focus on today is the Devops
philosophy. Devops is a technical solution (creating a new department) that
helps work around social problems in workplaces (fundamental differences in
priorities and end goals), and in the process it doesn't solve either very well.

There are a lot of skillset paths that you can end up with in tech, but the two
biggest ones are development (making the computer do new things) and systems
administration (making computers keep doing those things).
There are many other
silos in the industry (technical writing, project/product management, etc.), but
development and systems administration are the main ones. These two groups
have vastly different priorities, skillsets, needs and future goals, and as a
result of this there is very little natural cross-pollination between the two
silos. I have seen this evolve into cultural resentment.

[Not to say that this phenomenon is exclusive to inter-department ties; I've
also seen it happen intra-department over choice of programming language.](conversation://Cadey/coffee)

As far as the main differences go, development usually sees what could be: what
new things could exist and what steps you need to take to get people there. This
usually involves designing and implementing new software. The systems
administration side of things is more likely to see it as a matter of
integrating things into an existing whole, and then ensuring that whole is
reliable and proven so they don't have to worry about it constantly. This causes
a slower velocity forward and can result in extra process, slow momentum and
stagnation. These two forces naturally come into conflict because they are
vastly different things and have vastly different requirements and expectations.

Development may want to use a new version of the compiler to support a language
feature that will eliminate a lot of repetitive boilerplate. The sysadmins may
not be able to ship that compiler in the production build toolstack because of
conflicting dependencies elsewhere, but they may also not want to ship that
compiler because of fears over trusting unproven software in production.

[This fear sounds really odd at first glance, but this is a paraphrased version
of a problem I actually encountered in the real world at one of my first big
tech jobs.
This place had some unique tech choices such as making their own fork
of Ubuntu for "stability reasons", and the process to upgrade tools was a huge
pain on the sysadmin side because it meant retesting and deploying a lot of
internal tooling, which took a lot longer than the engineering team had patience
for. This may not be the best example from a technical standpoint, but things
don't have to make sense for them to exist.](conversation://Cadey/coffee)

This tension builds over a long period of time and can cause problems when the
sysadmin team is chronically underfunded (due to the idea that they are
successful when nothing goes wrong, also incurring the problem of success being
a negative, which can make the sysadmin team look like a money pit when they are
actually the very thing that is making the money generator generate money). This
can also lead to avoidable burnout, unwarranted anxiety issues and unneeded
suffering on both ends of the conflict.

So given the unstoppable force of development and the immovable wall of
sysadmin, an organizational compromise was made. This started out as many things
with many names, but as the idea rippled throughout people's heads the name
"devops" ended up sticking. Devops is a hybrid of traditional software
development and systems administration. On paper this should be great. The silos
will shrink. People will understand the limits and needs of the others. Managers
will be able to have more flexible employees.

Unfortunately though, a lot of the ideas behind devops and the overall
philosophy really do require you to radically burn down everything and start
from scratch. This tends to really not be conducive to engineering timetables
and overall system stability during the age of turbulence.

[What's the problem with burning everything down?
Fire cleanses all things and
purifies away the unworthy!](conversation://Numa/delet)

[Not when you're the one being burned!](conversation://Cadey/angy)

[Wait, so what actually happens then? Does it just end up being a sysadmin team
made up out of coders?](conversation://Mara/hmm)

[Yeeeeeeeeep.](conversation://Numa/stare)

Yeah, in practice this ends up being a "new team" or a reboot of an existing
team in a way that is suddenly compelling or sexy to executives because a new
buzzword is on the scene. Realistically, devops did end up getting a proper
definition at a buzzword conference level (being able to handle development and
deployment of services from editor to production), but in practice this ends up
being just some random developers that you tricked into caring about production
now while also telling them that they're better than the sysadmins.

[Two jobs for the price of one!](conversation://Numa/delet)

This ends up shafting the sysadmin team even harder because the new fancy devops
team has things they can talk about as positives for their quarters, so people
can more easily make a case for promotion. As a sysadmin, your "success" case is
"bad things didn't happen", which means success can't stand out on reviews.
Consider "scaled production faster than our customer acquisition rate"
against "set up continuous delivery to ensure velocity on our team, saving 50
hours of effort per week". Which one of those do you think gets you promoted?
Which one of those do you think gets headcount for new hires?

This has human costs too. I paid some of them at one of my past jobs doing more
sysadmin-y things (it was marketed as a devops hybrid role, but the "hybrid"
part was more of "frantically patch up the sinking ship with code" and not
traditional software development).
Sleep is really essential to helping you function properly to do
your job. During the times when I was pager bitch, there was at least a 1/8
chance that I would be woken up in the middle of the night to handle a problem.
I had to change my pager tone 15 times and I still get goosebumps hearing those
old sounds nearly a decade later. This ended up being a huge factor in my
developing anxiety issues that I still feel today. I ended up getting addicted
to weed really bad for a few years. I admit that I'm really not the most robust
person in the world, but these things add up.

[I guess "addicted to weed" isn't totally accurate or inaccurate here, it's more
that I was addicted to the feeling of being high rather than dependent on the
drug itself. Either way, it was bad and weed was my cope. It also probably
really didn't help that I was starting hormone replacement therapy at the
time, so I was going through second puberty as well. This is the
kind of human capital cost you pay when dealing with dysfunction like this. I've
always been kind of afraid to speak up about this.](conversation://Cadey/coffee)

However, there are real technical problems that can only really be solved from a
devops perspective. Tools like Docker would probably never have happened in the
way they did if the devops philosophy didn't exist.

![A three panel meme with an old man talking to a child. The child says "it
works on my machine". The old man replies with "then we'll ship your machine".
The last panel says "and that is how docker was
born".](https://cdn.christine.website/file/christine-static/blog/1BDBBB94-7052-4E4C-AE32-CFEE4226CBA8.jpeg)

In a way, Docker is one of the perfect examples of the devops philosophy. It
allows developers to have their own custom versions of everything.
They can use
custom compilers that the sysadmins don't have to integrate into everything.
They can experiment with new toolstacks, languages and build systems without
worrying about how they integrate into existing processes. And in the process it
defaults to things that are so hilariously unsafe that you only really realize
the problems when they own you. It makes it easy to ship around configurations
for services, yes, but it doesn't make supply chain management easy at all.

[Wait, what about that? How does that make any sense?](conversation://Mara/wat)

Okay, let's consider this basic Dockerfile that builds a Go service. If you
start from very little knowledge of what's going on, you'd probably end up with
something like this:

```Dockerfile
FROM golang:1.17

WORKDIR /usr/src/app

COPY go.mod go.sum ./
RUN go mod download && go mod verify

COPY . .
RUN go build -v -o /usr/local/bin/app ./...

CMD ["app"]
```

This allows you to pin the versions of things like the Go compiler without
bothering the sysadmin team to make it available, but in the process you also
don't know what version of the compiler you are actually running. Let's say that
you have all your Docker images built with CI and that CI has an image cache set
up (as is the default in many CI systems). On your laptop you may end up getting
the latest release of Go 1.17 (at the time of writing, this is version 1.17.8),
but CI may have seen this tag before and may still have an old version of the
`1.17` tag cached. This would mean that despite your efforts at making things
easy to recreate, you've just accidentally put [an ASN.1 parsing
DoS](https://github.com/golang/go/issues/50165) into production, even though
your local machine will never have this issue!
And that's not to mention what happens if the image
you're using has a glibc bug, a DNS parsing bug or any issue with one of the
packages that makes up the image.

[So as a side effect of burning down everything and starting over you don't
actually get a lot of the advantages that the old system had in spite of the
dysfunction?](conversation://Mara/hmm)

[Yep! Realistically though you can get around this by using exact sha256 hashes
of the precise Docker image you want, however this isn't the _default_ behavior
so nobody will really know about it. There are ways to work around this with
tools like Nix, but that is a topic for another day.](conversation://Cadey/coffee)

This is what the devops experience feels like: chaining together tools that
require careful handling to avoid accidental security flaws in ways that the
traditional sysadmin team approach fundamentally avoided by design. By
sidestepping the sysadmin team's stability and process, you learn nothing from
what they were doing.

[This is all of course assuming that at the same time as you go devops, you also
avow the grandeur of the cloud. Statistics say that these two usually go hand in
hand as the cloud is sold to executives as good for
devops.](conversation://Cadey/coffee)

As for how to get out of this mess though, I'm not sure. Like I said, this is a
_social_ problem that people are trying to solve through a _business
organizational_ fix. I am a technical solutions kind of person and as such I'm
really not the right person to ask about all this. I don't want to propose a
solution here. I've thought out several ideas, but I got nowhere with them fast.

I remember at one of my jobs where I was a devops I ended up also having to be
the tutor on how fundamental parts of the programming language they were using
work.
This one service that was handling a lot of production load had issues
where it would just panic and die randomly when a very large customer was trying
to view a list of things that was two orders of magnitude larger than what other
customers of that service had. I eventually ended up figuring out where the
issue was, but then I had an even harder time explaining what concurrency does at
a fundamental level and how race conditions can make things crash due to
undefined behavior. I think it ended up being a 3 line fix too.

I guess the thing that would really help with this is education and helping
people hone their skills as developers. I understand that there's a learning
curve and not everyone is going to become a programming god overnight, but every
little bit sets off butterfly effects that will ripple down in other ways. Any
solution that requires everyone be a programming god isn't viable for anyone,
including programming gods.

[This whole mentorship thing only really works when the company you work for
doesn't de facto punish you for mentoring people like that. If you aren't
careful about how you frame this, doing that could make it difficult for you to
prove yourself come review time. "Helped other people do their jobs better"
doesn't really look good for a promotion committee.](conversation://Numa/delet)

[Yeah but what are you supposed to do if that kind of mentorship is what really
helps motivate you as a person and is what you really enjoy doing? I don't
really see "mentor" as a job title on any postings.](conversation://Mara/hmm)

[There's always getting tired of trying to change things from within and then
writing things out on a publicly visible blog, building up a bunch of articles
over time.
Then you could use that body of work as a way to meme yourself into
hiring pipelines thanks to people sharing your links on aggregators like the
orange site. It'd probably help if you also got a reputation as a shitposter;
usually when people are able to openly joke about something, that signals that
they are pretty damn experienced in it.](conversation://Numa/stare)

[You're describing this blog aren't you.](conversation://Cadey/facepalm)

Like I said though, this is hard. A lot of the problems are actually structural
problems in how companies do the science part of computer science. Structural
problems cannot be solved overnight. These things take time, effort and patience
to truly figure out and in the process you will fail to invent a light bulb many
times over. Devops is probably a necessary evil, but I really wish that
situations weren't toxic enough in the first place to require that evil.
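
As a postscript to the Dockerfile from earlier: the sha256 pinning mentioned
there could look roughly like the sketch below. This is only an illustration
under a few assumptions, not a description of any real build setup.
`GO_IMAGE_DIGEST` is a made-up build argument name, and the digest itself has to
be one you looked up and verified yourself (for example from the `Digest:` line
that `docker pull golang:1.17.8` prints).

```Dockerfile
# GO_IMAGE_DIGEST is a hypothetical build argument, chosen for this sketch.
# Pass a digest you verified yourself at build time, for example:
#   docker build --build-arg GO_IMAGE_DIGEST=sha256:<digest you verified> .
ARG GO_IMAGE_DIGEST

# Reference the base image by digest instead of the mutable 1.17 tag, so a
# stale CI cache cannot hand you an older build of the same tag.
FROM golang@${GO_IMAGE_DIGEST}

WORKDIR /usr/src/app

COPY go.mod go.sum ./
RUN go mod download && go mod verify

COPY . .
RUN go build -v -o /usr/local/bin/app ./...

CMD ["app"]
```

A digest names one exact image, so the laptop build and the CI build should
start from the same bytes. The tradeoff is that nothing updates on its own
anymore: somebody has to bump the digest when a fixed release ships.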