Rabbit R1: the new 'everything tool' to know about
The Consumer Electronics Show (CES) was once a hotbed of interesting tech, but its relevance has become increasingly debatable over the years. Every year the big TV companies show off their new TVs, and there are loads of travel docks, chargers & batteries, plus new versions of the same kind of stuff.
Occasionally, a company swoops in and steals the show from everyone else by showing off something genuinely interesting. At CES 2024, my heart was stolen by Jesse Lyu, the CEO of Rabbit, during his introductory demonstration of the Rabbit R1 "pocket companion."
The Rabbit R1 is described as a walkie-talkie for your own personal AI assistant, one that can do things for you rather than just answer questions. It's not the first time something like this has been tried, but there are several reasons I personally believe this one is special, and I'll explain them. First, if you're interested, have a look at the official announcement & reveal video. The demo starts about 9 minutes in if you want to skip ahead.
So here's why I think this is a big deal.
The problem with current AI assistants
...is that they try to do everything on the backend. In software, you have the front end, which is what the user sees and interacts with, such as the GUI or "graphical user interface." Then you have the backend, which is kind of like backstage. When you ask Siri to do something, she uses an API (application programming interface) to pull a string backstage to make it happen. The catch with that approach is that someone has to hand Siri each of those individual strings to pull, by providing an API for every possible action. This means Siri doesn't know how to do things the way you do. She has her own secret back channels that she goes through. So does Google Assistant on Android.
Building these back channels (APIs) so Siri can interface with your app is extra work, which means that for a lot of apps it never happens, because developers have little incentive to implement them. This is one of the reasons Siri still sucks. On top of being spaghetti code, and being an acquisition (Apple didn't develop it in the first place), Siri has always been difficult to maintain, much less build upon. Apple essentially passed the work on to users and the community with the Shortcuts app.
How the R1 is different
Essentially, they taught an AI to use a computer like a person. They call this foundation the "Large Action Model" or LAM.
The Large Action Model is the cornerstone of rabbit OS. LAM is a new type of foundation model that understands human intentions on computers. With LAM, rabbit OS understands what you say and gets things done.
Here's how they describe it working.
First, rabbit OS will understand what you mean by what you said. Human intentions are deeply personal, have layers, may be incomplete, and could change on a whim. rabbit OS uses its long-term memory of you to translate your requests into actionable steps and responses that LAM could leverage in real-time. LAM then comprehends how applications and services you use daily [work] instead of relying on application programming interfaces (APIs). LAM can learn to see and act [on information & apps] like humans do. LAM has seen most interfaces of consumer applications on the internet and is more capable as we feed it with more data of actions taken on them. LAM completes these tasks for you on virtual environments in our cloud, from basic tasks such as booking a flight or reservation to complex ones like editing images on Photoshop or streaming music and movies. There is no need for a complex local setup, such as installing an app, a Chrome plugin, or typing code into a command line. Simply talk to rabbit OS, and it will carry out the tasks for you.
So rabbit OS is different because instead of relying on developers to build their apps with APIs to interface with a voice assistant, they're building the voice assistant to interface with the apps themselves.
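To make the API-versus-interface distinction concrete, here is a toy sketch in Python. Everything in it is hypothetical: `api_assistant`, `ui_agent`, `FakeScreen`, and the ride-booking scenario are made up for illustration, and this is not how Siri, Google Assistant, or rabbit OS is actually implemented. It only shows the shape of the two approaches: the first assistant can do only what a developer registered a handler for, while the second performs the same typing and clicking a person would.

```python
from dataclasses import dataclass, field
from typing import Callable

# API-style assistant (hypothetical): it can only do what developers explicitly exposed.
API_HANDLERS: dict[str, Callable[[str], str]] = {
    "set_timer": lambda arg: f"timer set for {arg}",
    # No "order_ride" entry: in this toy world, the ride app never shipped an API.
}


def api_assistant(intent: str, arg: str) -> str:
    handler = API_HANDLERS.get(intent)
    if handler is None:
        return f"Sorry, I can't do '{intent}' yet."
    return handler(arg)


@dataclass
class FakeScreen:
    """Hypothetical stand-in for an app's GUI: text fields plus buttons."""
    fields: dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def type_into(self, name: str, text: str) -> None:
        self.fields[name] = text

    def click(self, button: str) -> None:
        if button == "Confirm":
            self.submitted = True


def ui_agent(screen: FakeScreen, destination: str) -> str:
    # UI-style agent (hypothetical): it works through the same screen a person
    # would use, so the app developer doesn't have to build anything extra.
    screen.type_into("Where to?", destination)
    screen.click("Confirm")
    if screen.submitted:
        return f"ride requested to {screen.fields['Where to?']}"
    return "failed"


print(api_assistant("order_ride", "airport"))  # the API route has no handler
print(ui_agent(FakeScreen(), "airport"))       # the UI route just uses the app
```

The trade-off the toy glosses over is reliability: an API call either succeeds or fails cleanly, while an agent driving a real interface has to cope with layouts that change, which is presumably why Rabbit emphasizes training LAM on lots of interfaces.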
So, to summarize, here's what sets Rabbit apart from Siri (for now), and why I think it's worth paying attention to:
- It's designed to learn what you want to do when you just naturally ask for it, instead of making you learn specific commands
- It claims to be able to get to know you better over time
- It works directly with apps and websites to do what you ask based on what you mean, without needing special instructions from the app makers
The demo implies that the model is capable of deep contextual understanding and detailed, complex tasks, like planning and purchasing an entire vacation in one prompt (travel, lodging & entertainment reservations all created with one voice command, each verified by the user before purchasing). This example is similar to the original intent behind Siri, before Apple acquired it. Siri was originally imagined as an automated travel concierge that could proactively react to changes, such as automatically booking you a nearby hotel if your flight was delayed. This is the kind of multi-step intention-solving that Rabbit is targeting.
"Teach Mode"
By far the biggest deal about rabbit OS is what they call "teach mode," which lets you teach a rabbit to do "anything." The demo showed Jesse recording what amounts to a video tutorial for the AI to watch. "Today I'm going to show you how to generate an image using MidJourney," he says, and proceeds to explain that it's done using Discord. Click here, then click there, find this option and select it, then enter this text, this is the formatting you use, then hit enter. Like that. Then he shows it following his instructions. It takes a while, but so does MidJourney.
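Rabbit hasn't published how teach mode works under the hood, so what follows is only a guess at the general idea, written as a minimal Python sketch: record a demonstration as a list of steps, mark the parts that change from request to request as placeholders, and replay the steps with new values. The `Step` class, the `replay` function, and the UI element names are all hypothetical. `/imagine` and `--ar` (aspect ratio) are real MidJourney syntax, used here only as example text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded action from a teaching session (hypothetical format)."""
    action: str         # e.g. "click", "select", "type", "press"
    target: str         # the UI element the teacher pointed at
    template: str = ""  # text to enter; {placeholders} are filled in at replay time


# A hypothetical lesson loosely modeled on the MidJourney-in-Discord demo.
# The element names are made up for illustration.
MIDJOURNEY_LESSON = [
    Step("click", "MidJourney channel"),
    Step("click", "message box"),
    Step("select", "/imagine command"),
    Step("type", "prompt field", "{prompt} --ar {aspect}"),
    Step("press", "Enter"),
]


def replay(lesson: list[Step], **params: str) -> list[str]:
    """Dry-run a lesson with new parameters and return the actions it would take.

    A real agent would drive an actual screen; this sketch only builds a log.
    """
    log = []
    for step in lesson:
        line = f"{step.action} {step.target}"
        if step.template:
            # Fill the teacher's placeholders with this request's values.
            line += ": " + step.template.format(**params)
        log.append(line)
    return log


for action in replay(MIDJOURNEY_LESSON, prompt="a robot rabbit", aspect="16:9"):
    print(action)
```

Running it prints the five recorded steps, with the fourth becoming `type prompt field: a robot rabbit --ar 16:9`. The hard part a real system has to solve, and this sketch skips entirely, is finding "the message box" on a screen it has never seen in exactly that layout.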
The Hardware
The R1 is a very cute, soft, bright orange gadget designed by Teenage Engineering, whose work I have enjoyed from afar for a long time. They make fun, playful gadgets, and this is one of them.
It has a button where your thumb rests when you hold it, which works as a walkie-talkie button to the AI. It also has a camera that rotates into the body when not in use, for privacy and because it's cool. It has microphones and a speaker to hear and answer queries, a headphone jack, Wi-Fi, Bluetooth, and a SIM card slot, meaning you can use it with your mobile data plan. There's no word yet on whether it can make or take phone calls or send SMS messages, though we think probably not.
It also has a little wheely thingy for scrolling through content on the 1.8-inch screen, and it answers your questions in a cute robot rabbit voice, complete with an animated rabbit character. I love the theming on this thing, as you can tell. It's even orange, because carrots. I wonder if they considered calling it the "Carrot" at one point but decided that Rabbit was friendlier.
Wrap-up
I'm excited for it. I preordered one, but I have to wait until, like, July before mine gets here. Until then, I'll keep my ear to the ground about what kinds of things people are able to teach theirs to do. Jesse hinted in the Discord that he taught his to play Diablo 4.