There have been a lot of developments since I wrote about my first stab at creating an app via LLM, and by leveraging them I've since speedrun an AI-generated concept from blank slate to passing the App Store acceptance gauntlet in ~20 minutes (less than 5% of the time my first attempt took). My conviction about the best way to capitalize on the monumental shift in software development brought about by frontier LLMs has only grown stronger, and the tools that empower it have (predictably) become even better.
In this post I walk through what's new and, based on my hands-on experience, how best to apply it. Along the way I created a tool to help with this sort of workflow called arcode, which I'll also compare and contrast with other broadly available solutions.
When I first started experimenting with creating an app in a language I didn't know using modern generative AI, I took an intentionally primitive approach. Given the still relatively nascent nature of capable LLMs, I wanted to ensure there were as few external dependencies and abstractions as possible. The more variables I could control for, the better. This led me to initially interfacing largely with various vendors' consumer-facing UIs directly - an arrangement which came with its own set of pros and cons.
Most notably, this workflow incurred significant friction from shuttling code back and forth via copy/paste - but I generally got more bang for my buck thanks to the flat monthly billing model versus per-token pricing. Since my process involved dumping every relevant complete file from my codebase into the prompt and requesting complete modified files (without truncation) as output, I was generating a lot of throughput - and that volume commonly triggered Anthropic's and OpenAI's rate limits (at the time of my initial experiments, Gemini's UI already constrained input message length below what my purposes required).
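The whole-file prompting approach described above can be sketched as a simple prompt builder - this is a hypothetical illustration of the pattern, not the actual code I used; the delimiter format and instruction wording are my own assumptions:

```python
def build_prompt(files: dict[str, str], request: str) -> str:
    """Assemble a single prompt from complete source files plus a change
    request, asking the model for complete modified files back (no snippets).

    `files` maps each relative path to its full contents - a stand-in for
    however you gather the relevant files from your codebase.
    """
    # One section per file, with a header line naming the path.
    parts = [f"--- {path} ---\n{contents}" for path, contents in files.items()]
    parts.append(request)
    # The key instruction: whole files out, so output can be pasted back in.
    parts.append("Return every modified file in full, without truncation.")
    return "\n\n".join(parts)
```

The tradeoff is obvious in hindsight: sending every relevant file in full keeps the model from hallucinating around missing context, but it burns through input tokens (and, in a consumer UI, message-length limits) quickly.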
Still, I was getting a great deal. Looking back at the billing incurred once I ultimately switched from monthly-billed consumer UIs to API calls, I'd easily run through a month's subscription ($20) worth of usage in just a few days on average.
Given this value proposition, I was more than happy to keep playing proxy for the robots - but the providers got wise and tightened the screws. Both OpenAI and Anthropic began limiting my message size arbitrarily below the max token length of the underlying models when using their UIs - a huge blow given that the CrayEye codebase had grown beyond the size of these new limits.
It's worth noting that, as of this writing, Claude has since increased its UI message length constraints at least enough to accommodate the codebase it previously limited - alongside adding some other cool (and relevant) features like artifacts and projects. The direction of their UI is, in my opinion, the current gold standard for a consumer-facing LLM product - and continues to improve. That said, when the limit initially broke my workflow it left me without a way to continue building the way I had been, and served as the perfect impetus to embark on the next stage of my workflow optimization.
I was aware of Devin and Copilot Workspace, but wasn't hip enough to get off the waitlist for either. Of the myriad freely available solutions, Aider looked the most interesting to me - it was a CLI tool, supported sending and receiving complete files, and was open-source. I fired it up and tried to make some minor changes to CrayEye, but its default mode generated and interpolated snippets of code rather than dealing with whole files, and an error in that process broke the app. I decided that, given my previous experience expediting development via holistic AI code generation, it was worth taking a quick pass at recreating a workflow I already knew worked, using tools and UX I was well familiar with. This is all to say that yes, there are other (frankly more sophisticated) tools available for AI-driven development today (more on that later) - but we're already at the point where, at least for certain simple workflows, it's just as quick to build something bespoke as it is to come up to speed on something new that you aren't acquainted with.
In order to unblock work on CrayEye as quickly as possible, I prioritized re-enabling the process exactly as I'd performed it before (i.e. prompting for complete file changesets with a large input context), with further optimizations and enhancements to follow. Fairly quickly, I had a scrappy CLI that would let me copy the modified files one by one, and as I started using it to build itself it quickly grew the ability to write the full changeset, support image input and followup prompts, and include context from remote URLs. It was almost surreal watching the tool build itself, and while it later became trivial to use a stable version to build changes, in the very beginning it was effectively performing surgery on itself (breaking itself in the process more than once).
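The "write the full changeset" step boils down to parsing complete files back out of the model's reply. Here's a minimal sketch of that idea - the `--- path ---` delimiter format is a hypothetical convention for illustration, not arcode's actual protocol:

```python
import re

def parse_changeset(reply: str) -> dict[str, str]:
    """Split a model reply containing complete files, each introduced by a
    '--- path ---' header line, back into a {path: contents} mapping.

    Assumes the (hypothetical) convention that every file in the reply is
    emitted whole, never as a snippet or diff.
    """
    files = {}
    # Capture each header and the body up to the next header (or end of reply).
    pattern = re.compile(
        r"^--- (.+?) ---\n(.*?)(?=^--- .+? ---\n|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    for match in pattern.finditer(reply):
        path, body = match.groups()
        files[path] = body.rstrip("\n") + "\n"  # normalize trailing newline
    return files
```

From there, writing the changeset is just looping over the mapping and saving each file to disk - which is exactly why insisting on complete, untruncated files in the output matters: a snippet can't be written out blindly, but a whole file can.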
One highly underrated aspect of not writing the code for a project, particularly an open-source one, is the implicit detachment of shame for imperfections or incompleteness. When it's the robots writing the thing, you can let yourself work less inhibited by the need to be obsessive and write unimpeachably perfect code - even if you're going to show the world that code.
Minolta X700 / CineStill 800T / Cyclist in Manhattan, 2024