Scaling in a New Direction
We’re in the age of the hyperscaler. Sutton’s Bitter Lesson is in full swing, and the price of RAM is increasing faster than ever. Everything is more, more, more (except for our budgets).
Personally, this all feels a bit dull. There’s a reason it’s called the bitter lesson and not the fun finding - throwing more compute at a problem is an exceedingly unsatisfying solution, and one with huge knock-on consequences (from energy consumption to basically propping up the US economy).
This might be why we at Origami Labs typically find ourselves in constrained scenarios where you can’t simply add another data centre and call it a day. If scaling is ultimately more efficient and you’re not a hyperscaler, you’re going to lose. So our solution is to play a different game.
Our bitter lesson is that using a lot of computers costs a lot of money and energy.
Now this isn’t to say we won’t use the compute we have - as recently mentioned, we’re using the Isambard-AI supercomputer cluster to train new models. But we try to position ourselves in areas with limited SWaP (size, weight, and power) budgets, where you have to be more creative. Bigger and better hardware will eventually arrive and outscale you, but once something is deployed you can’t just duct-tape a new GPU to the side of it.
We know that constraints breed innovation; old games are a great source of technological creativity, since the hardware was limited and realtime requirements were constantly pushing against its boundaries. The implementation of the fast inverse square root in Quake 3 is an often-cited example, even if it’s commonly misattributed.
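For the curious, here’s a minimal sketch of that trick in C. This is the widely circulated form of the routine, rewritten with memcpy instead of the original pointer cast so it’s well-defined in modern C; the magic constant 0x5f3759df is the one from the released Quake III source, and fast_rsqrt is just our name for it here.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Approximate 1/sqrt(x) without calling sqrt or dividing.
   The bit-level hack gives a cheap initial guess, which one
   Newton-Raphson step then refines. */
static float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t i;

    memcpy(&i, &x, sizeof i);        /* reinterpret the float's bits as an integer */
    i = 0x5f3759df - (i >> 1);       /* magic-constant initial guess */
    memcpy(&x, &i, sizeof x);

    x = x * (1.5f - half * x * x);   /* one Newton-Raphson iteration */
    return x;
}

int main(void)
{
    printf("fast_rsqrt(4.0f) = %f (exact: 0.5)\n", fast_rsqrt(4.0f));
    return 0;
}
```

One Newton-Raphson step brings the bit-hack guess to within a fraction of a percent of the true value - plenty for normalising vectors in a renderer without touching the comparatively slow divide and square-root hardware of the era.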
In our opinion, one of the more overlooked aspects of the bitter lesson is that you can gain a short-term advantage by using cleverer approaches, but that this will likely be overtaken by scale. It turns out that on the battlefield, a short-term advantage can be absolutely crucial.
To us, this suggests that we need a faster need - research - deploy - repeat cycle. Bonus points if we can reuse the same hardware by running a new algorithm discovered later through scale and search. Not everything needs to be an exquisite system we can’t afford to replace; instead, we should start thinking in terms of advantage cycles. If I deploy this now, how long do I have an advantage for?
The cycle length needs to be proportional to the advantage window, which it certainly isn’t today. Lengthening the time from need to deployment both shrinks the advantage window and, perversely, makes you want to keep using the current solution for longer because it was so hard to get in the first place. This is lose-lose.
Nothing lasts forever, and the bitter lesson shows that wedding ourselves to approaches and concepts can inhibit further progress. Instead, be creative, and be quick about it. In a way, that’s creating new solutions through search and learning, but instead of scaling compute, we’re scaling research/procurement cycles. No amount of agentic AI will help us if we can’t procure and deploy new solutions at pace.