{ "id": "https://ryan.freumh.org/claude-code.html", "title": "A Week With Claude Code", "link": "https://ryan.freumh.org/claude-code.html", "updated": "2025-04-21T00:00:00", "published": "2025-04-21T00:00:00", "summary": "
I tried using Claude\nCode while writing Caledonia, and these\nare the notes I took on the experience. It’s possible some of the\ndeficiencies are due to the model’s smaller training set of OCaml code\ncompared to more popular languages, but there’s work being done\nto improve this situation.
\nIt needs a lot of hand-holding, and often finds it\nvery difficult to recover from simple mistakes. For example, it frequently\nforgot to bracket nested match expressions,
\nmatch expr1 with\n| Pattern1 ->\n  match expr2 with\n  | Pattern2a -> result2a\n  | Pattern2b -> result2b\n| Pattern2 -> result2\n\nand it found it difficult to fix this, as the\ncompiler error message only showed the line with Pattern2. An interesting note here is that tools\nthat are easy for humans to use, e.g. with great error messages, are\nalso easy for the LLM to use. But, unlike (I hope) a human, even after\nadding a rule to avoid this in CLAUDE.md,\nit frequently ignored it.
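A minimal, self-contained sketch of the fix: parenthesising the inner match so the outer match’s remaining cases aren’t swallowed (the names here are placeholders for illustration, not Caledonia’s code):

```ocaml
(* Without the parentheses around the inner match, the final [None]
   case would be parsed as a third case of the inner match, and the
   outer match would be inexhaustive. *)
let classify (outer : int option) (inner : bool) : string =
  match outer with
  | Some _ ->
      (match inner with
       | true -> "some/true"
       | false -> "some/false")
  | None -> "none"

let () =
  assert (classify (Some 1) true = "some/true");
  assert (classify None false = "none")
```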
It often makes code very verbose or inelegant,\nespecially after repeated rounds of back-and-forth with the compiler. It\nrarely shortens code, whereas some of the best changes I make to\ncodebases have a negative impact on the lines of code (LoC) count. I\nthink this is how you end up with 35k LoC recipe\napps, and I wonder how maintainable these codebases will\nbe.
\nIf you give it a high-level task, even after\ncreating an architecture plan, it often makes poor design decisions that\ndon’t consider future scenarios. For example, it combined all the .ics files into a single calendar, which will make it\nimpossible to write edits back when it comes to modifying individual\nevents. Another example of where it unnecessarily constrained interfaces\nwas by making query and sorting parameters variants, whereas porting\nto a lambda and comparator allowed for more expressivity with the same\nbrevity.
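A sketch of that refactor under invented names (not Caledonia’s actual types): a closed variant of sort orders versus a plain comparator function:

```ocaml
type event = { summary : string; start : float }

(* Closed variant: every new sort order needs a new constructor and a
   new branch here. *)
type sort_order = By_start | By_summary

let sort_variant order events =
  match order with
  | By_start -> List.sort (fun a b -> compare a.start b.start) events
  | By_summary -> List.sort (fun a b -> compare a.summary b.summary) events

(* Comparator: the caller passes any ordering directly, with the same
   brevity at the call site. *)
let sort_by cmp events = List.sort cmp events

let () =
  let es = [ { summary = "b"; start = 2.0 }; { summary = "a"; start = 1.0 } ] in
  let by_start = sort_by (fun a b -> compare a.start b.start) es in
  assert ((List.hd by_start).summary = "a");
  assert (sort_variant By_start es = by_start)
```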
But while programming I often find myself doing a\nlot of ‘plumbing’ things through, and it excels at these more mundane\ntasks. It’s also able to do more intermediate tasks, with some back and\nforth about design decisions. For example, once I got the list command\nworking it was able to get the query command working without me writing\nany code – just prompting with design suggestions like pulling common\nparameters into a separate module (see the verbosity point again).\nAnother example of a task where it excels is writing command-line\nargument-parsing logic, with more documentation than I would have the\nwill to write myself.
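As a flavour of that kind of task, a stdlib-only sketch using OCaml’s standard Arg module (the flags and defaults are invented, not Caledonia’s interface; the real thing would more likely use a library such as Cmdliner, which also generates man pages):

```ocaml
let calendar = ref "all"
let verbose = ref false

(* Each spec line carries the documentation string shown by --help. *)
let speclist =
  [ ("--calendar", Arg.Set_string calendar,
     "NAME  Restrict the query to the calendar NAME (default: all)");
    ("--verbose", Arg.Set verbose,
     "  Print extra detail about each event") ]

let usage = "Usage: query [--calendar NAME] [--verbose]"

let () =
  Arg.parse speclist (fun anon -> raise (Arg.Bad ("unexpected: " ^ anon))) usage;
  Printf.printf "querying events in %s\n" !calendar
```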
\nIt’s also awesome to get it to write tests that I\nwould never otherwise write for a personal project, even with the above\ncaveats applying to them. Tests also give the model something to check\nagainst when making changes, though when a test fails it tends to\nchange the test so that it passes incorrectly, rather than fixing the\nunderlying problem.
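For instance, the kind of small stdlib-only test it will happily generate (the duration-formatting helper is hypothetical, for illustration only):

```ocaml
(* A tiny helper and the assert-style checks the agent writes readily.
   [format_duration] is an invented example, not Caledonia code. *)
let format_duration minutes =
  if minutes < 60 then Printf.sprintf "%dm" minutes
  else Printf.sprintf "%dh%02dm" (minutes / 60) (minutes mod 60)

let () =
  assert (format_duration 45 = "45m");
  assert (format_duration 90 = "1h30m");
  assert (format_duration 120 = "2h00m");
  print_endline "all duration tests passed"
```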
\nIt’s somewhat concerning that this agent is running\nwithout any sandboxing. There is some degree of control over what\ndirectories it can access, and what tools it can invoke, but I’m sure a\nsufficiently motivated adversary could trivially get around all of them.\nWhile deploying Enki on hippo\nI tested out using it to change the NixOS config, and after making the\nchange it successfully invoked sudo to do\na nixos-rebuild switch as I had just used\nsudo myself in the same shell session. Patrick’s work on shelter could\nprove invaluable for this, while also giving the agent ‘rollback’\ncapabilities!
Something I’m wondering about while using these\nagents is whether they’ll just be another tool to augment the\ncapabilities of software engineers, or whether they’ll increasingly replace\nthe need for software engineers entirely.
\nI tend towards the former, but only time will\ntell.
\nIf you have any questions or comments on this feel\nfree to get in touch.
", "content": "I tried using Claude\nCode while writing Caledonia, and these\nare the notes I took on the experience. It’s possible some of the\ndeficiencies are due to the model’s smaller training set of OCaml code\ncompared to more popular languages, but there’s work being done\nto improve this situation.
\nIt needs a lot of hand-holding, and often finds it\nvery difficult to recover from simple mistakes. For example, it frequently\nforgot to bracket nested match expressions,
\nmatch expr1 with\n| Pattern1 ->\n  match expr2 with\n  | Pattern2a -> result2a\n  | Pattern2b -> result2b\n| Pattern2 -> result2\n\nand it found it difficult to fix this, as the\ncompiler error message only showed the line with Pattern2. An interesting note here is that tools\nthat are easy for humans to use, e.g. with great error messages, are\nalso easy for the LLM to use. But, unlike (I hope) a human, even after\nadding a rule to avoid this in CLAUDE.md,\nit frequently ignored it.
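A minimal, self-contained sketch of the fix: parenthesising the inner match so the outer match’s remaining cases aren’t swallowed (the names here are placeholders for illustration, not Caledonia’s code):

```ocaml
(* Without the parentheses around the inner match, the final [None]
   case would be parsed as a third case of the inner match, and the
   outer match would be inexhaustive. *)
let classify (outer : int option) (inner : bool) : string =
  match outer with
  | Some _ ->
      (match inner with
       | true -> "some/true"
       | false -> "some/false")
  | None -> "none"

let () =
  assert (classify (Some 1) true = "some/true");
  assert (classify None false = "none")
```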
It often makes code very verbose or inelegant,\nespecially after repeated rounds of back-and-forth with the compiler. It\nrarely shortens code, whereas some of the best changes I make to\ncodebases have a negative impact on the lines of code (LoC) count. I\nthink this is how you end up with 35k LoC recipe\napps, and I wonder how maintainable these codebases will\nbe.
\nIf you give it a high-level task, even after\ncreating an architecture plan, it often makes poor design decisions that\ndon’t consider future scenarios. For example, it combined all the .ics files into a single calendar, which will make it\nimpossible to write edits back when it comes to modifying individual\nevents. Another example of where it unnecessarily constrained interfaces\nwas by making query and sorting parameters variants, whereas porting\nto a lambda and comparator allowed for more expressivity with the same\nbrevity.
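A sketch of that refactor under invented names (not Caledonia’s actual types): a closed variant of sort orders versus a plain comparator function:

```ocaml
type event = { summary : string; start : float }

(* Closed variant: every new sort order needs a new constructor and a
   new branch here. *)
type sort_order = By_start | By_summary

let sort_variant order events =
  match order with
  | By_start -> List.sort (fun a b -> compare a.start b.start) events
  | By_summary -> List.sort (fun a b -> compare a.summary b.summary) events

(* Comparator: the caller passes any ordering directly, with the same
   brevity at the call site. *)
let sort_by cmp events = List.sort cmp events

let () =
  let es = [ { summary = "b"; start = 2.0 }; { summary = "a"; start = 1.0 } ] in
  let by_start = sort_by (fun a b -> compare a.start b.start) es in
  assert ((List.hd by_start).summary = "a");
  assert (sort_variant By_start es = by_start)
```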
But while programming I often find myself doing a\nlot of ‘plumbing’ things through, and it excels at these more mundane\ntasks. It’s also able to do more intermediate tasks, with some back and\nforth about design decisions. For example, once I got the list command\nworking it was able to get the query command working without me writing\nany code – just prompting with design suggestions like pulling common\nparameters into a separate module (see the verbosity point again).\nAnother example of a task where it excels is writing command-line\nargument-parsing logic, with more documentation than I would have the\nwill to write myself.
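As a flavour of that kind of task, a stdlib-only sketch using OCaml’s standard Arg module (the flags and defaults are invented, not Caledonia’s interface; the real thing would more likely use a library such as Cmdliner, which also generates man pages):

```ocaml
let calendar = ref "all"
let verbose = ref false

(* Each spec line carries the documentation string shown by --help. *)
let speclist =
  [ ("--calendar", Arg.Set_string calendar,
     "NAME  Restrict the query to the calendar NAME (default: all)");
    ("--verbose", Arg.Set verbose,
     "  Print extra detail about each event") ]

let usage = "Usage: query [--calendar NAME] [--verbose]"

let () =
  Arg.parse speclist (fun anon -> raise (Arg.Bad ("unexpected: " ^ anon))) usage;
  Printf.printf "querying events in %s\n" !calendar
```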
\nIt’s also awesome to get it to write tests that I\nwould never otherwise write for a personal project, even with the above\ncaveats applying to them. Tests also give the model something to check\nagainst when making changes, though when a test fails it tends to\nchange the test so that it passes incorrectly, rather than fixing the\nunderlying problem.
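For instance, the kind of small stdlib-only test it will happily generate (the duration-formatting helper is hypothetical, for illustration only):

```ocaml
(* A tiny helper and the assert-style checks the agent writes readily.
   [format_duration] is an invented example, not Caledonia code. *)
let format_duration minutes =
  if minutes < 60 then Printf.sprintf "%dm" minutes
  else Printf.sprintf "%dh%02dm" (minutes / 60) (minutes mod 60)

let () =
  assert (format_duration 45 = "45m");
  assert (format_duration 90 = "1h30m");
  assert (format_duration 120 = "2h00m");
  print_endline "all duration tests passed"
```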
\nIt’s somewhat concerning that this agent is running\nwithout any sandboxing. There is some degree of control over what\ndirectories it can access, and what tools it can invoke, but I’m sure a\nsufficiently motivated adversary could trivially get around all of them.\nWhile deploying Enki on hippo\nI tested out using it to change the NixOS config, and after making the\nchange it successfully invoked sudo to do\na nixos-rebuild switch as I had just used\nsudo myself in the same shell session. Patrick’s work on shelter could\nprove invaluable for this, while also giving the agent ‘rollback’\ncapabilities!
Something I’m wondering about while using these\nagents is whether they’ll just be another tool to augment the\ncapabilities of software engineers, or whether they’ll increasingly replace\nthe need for software engineers entirely.
\nI tend towards the former, but only time will\ntell.
\nIf you have any questions or comments on this feel\nfree to get in touch.
", "content_type": "html", "categories": [], "source": "https://ryan.freumh.org/atom.xml" }