The Rabbit R1 Reveal Keynote

a transcript of the reveal of the Rabbit R1 device from January 2024

Hi everyone. My name is Jesse and I'm the founder and CEO of Rabbit.

I'm so excited to be here today to present to you two things we've been working on. A revolutionary new foundation model and a groundbreaking consumer mobile device powered by it.

Our mission is to create the simplest computer, something so intuitive that you don't need to learn how to use it.

The best way to achieve this is to break away from the app-based operating systems currently used by smartphones. Instead, we envision a natural-language-centered approach.

The computer we're building, which we call a companion, should be something you can talk to, that understands you and, more importantly, gets things done for you. The future of human-machine interfaces should be more intuitive.

Mission: Create the Simplest Computer | Companion

Now, before we get started, let's take a look at the existing mobile devices that we use daily. The one device that's in your pocket: the smartphone, like the iPhone and Android phones. These have been around for years, and we've grown tired of them. The problem with these devices is the app-based operating system. Want to get a ride to the office? There's an app for that. Want to buy groceries? There's another app for that. Each time you want to do something, you fumble through multiple pages and folders to find the app you want to use, and there are always endless buttons you need to click: add to cart, go to the next page, check the boxes, jump back and forth, and so on. The smartphone was supposed to be intuitive, but with hundreds of apps on your phone today that don't work together, it no longer is.

If you look at the top-ranking apps on app stores today, you'll find that most of them focus on entertainment. Our smartphones have become the best devices for killing time instead of saving time. It has simply become harder to get things done with them.

Many people before us have tried to build simpler and more intuitive computers with AI. A decade ago, companies like Apple, Microsoft, and Amazon made Siri, Cortana, and Alexa, along with their smart speakers. Often, they either don't know what you're talking about or fail to accomplish the tasks we ask for. Recent achievements in large language models, or LLMs, a type of AI technology, have made it much easier for machines to understand you. The popularity of LLM chatbots over the past years has shown that a natural-language-based experience is the path forward. However, where these assistants still struggle is getting things done. For example, if you go to ChatGPT and use its Expedia plugin to book a ticket, it can suggest options, but it ultimately cannot assist you in completing the booking process from start to finish. Things like ChatGPT are extremely good at understanding your intentions, but they could be better at triggering actions.

Another hot topic is the field of research around what are called agents, which has caught the eye of many open-source projects and productivity software companies. What remains to be solved is getting these agents to perform tasks end-to-end, accurately and quickly.

The problem is forcing a model to perform a task it is not designed for, whether by having a language model reason about a web page using elaborate prompts or screenshots. We have yet to produce an agent as good as a user simply clicking the buttons. To fulfill our vision of a delightful, intuitive companion, we must go beyond a piece of complex software. We want it to be in the hands of everyone.

So we first set out to fundamentally understand how computer apps are structured and, more importantly, how humans interact with them. We wanted to find a way for our AI to trigger actions on behalf of users across all environments. We wanted it to be universal: not just a Chrome plugin or a limited set of apps, but everything on iOS, Android, and desktop. These applications share something in common: the interface. They all have a user interface.

So, at a philosophical level, if we can make an AI trigger actions on any kind of interface, just like a human would, it will solve the problem. This insight led us to create the large action model, or LAM, as we call it. It is a new foundation model that understands and executes human intentions on computers, driven by our research in neuro-symbolic systems.

LAM

With the large action model, we have found a fundamental solution to the challenges that apps, APIs, and agents face. We solve it with interfaces.

LAM can learn any interface from any software, regardless of which platform it's running on. In short, the large language model understands what you say, but the large action model gets things done. We use LAM to bring AI from words to action.

Finally, we can build a computer that, in addition to understanding what you're trying to say, can actually do things on your behalf. We've packed the large action model into Rabbit OS, an advanced operating system built for real-time interactions between you and the rabbit, powered by LAM.

The large action model's concept and test results are so powerful that we decided to make a one-of-a-kind mobile device.

Introducing R1, your pocket companion.

This is the Rabbit R1. Designed in collaboration with teenage engineering, the R1 is a fully standalone device, primarily driven by natural language, and the accumulation of countless hours of work from our research, design, and software teams. It is simple to use and equipped with everything it needs to interact with you and perceive your surroundings.

It comes with

  • a touchscreen,
  • a push-to-talk button,
  • an analog scroll wheel,
  • a microphone and speakers,
  • and a computer-vision-enabled 360-degree rotational camera we call the Rabbit Eye.

It is Bluetooth and Wi-Fi capable, and it has a SIM card slot that supports global 4G LTE networks.

And now, let's see how it works.

Demo

I can ask Rabbit anything, just as if I were using ChatGPT. The difference, though, is the speed. With Rabbit OS, I get a response 10 times faster than most voice AI projects; Rabbit answers my questions within 500 milliseconds. With the push-to-talk button, you don't need to say anything to wake it up. Just press and hold the button and talk, like a walkie-talkie. Now, let's give it a shot.

What's the nature of reality?

The nature of reality is a topic that has captivated the minds of philosophers and thinkers throughout history. According to Bertrand Russell, reality is composed of hard data derived from sensory experiences and logical analysis. It is the amalgamation of our perceptions, thoughts, and the external world.

I can speak or type with R1. If you shake R1, a keyboard will pop up.

You will also notice that the response speed is much faster than using some of the LLM assistant apps with web browsing, such as ChatGPT.

What's the stock price of Coca-Cola?

Searching for the stock price of Coca-Cola. The stock price of Coca-Cola (KO) is $59.76.

Who played the role of Oppenheimer in Christopher Nolan's latest movie?

In Christopher Nolan's most recent film, "Oppenheimer," the role of J. Robert Oppenheimer is played by Cillian Murphy. The film is a biographical thriller that explores the story of American scientist J. Robert Oppenheimer and his involvement in the development of the atomic bomb.

Now, let's see how the large action model works. With LAM, Rabbit R1 can interact with all kinds of applications.

Introducing the Rabbit Hole Web portal.

I can log into different services and unlock functionalities for R1. It's just like iCloud. Let's say I want to listen to some music. With the Rabbit Hole Web portal, I have complete control over which services I want to activate on my R1, and which provider I prefer. For music, I use Spotify. So let's go ahead and connect my Spotify to the Rabbit Hole.

Before I start the login process, I want to emphasize our commitment to privacy. In the way we designed the infrastructure for the apps we interact with, we don't create fake users or spam users, and we don't hack their infrastructure. And of course, we do not store any of your third-party credentials. Instead, we redirect you to their login page. We never save your username and password, and we don't track your data. Authentication happens on the respective apps.
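Rabbit hasn't published how this hand-off works internally, but the description matches a standard OAuth-style authorization-code redirect: the user is sent to the provider's own login page, and only an authorization grant, never a password, comes back. A minimal sketch of building such a redirect URL, assuming a generic OAuth 2.0 flow (the `client_id`, `redirect_uri`, and scope below are hypothetical; only Spotify's public authorize endpoint is real):

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, scope: str, state: str) -> str:
    """Build the URL the user is redirected to, so they authenticate
    directly with the service provider; their password never touches us."""
    params = {
        "response_type": "code",  # request an authorization code, not credentials
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,           # opaque anti-CSRF token, checked on return
    }
    return f"{authorize_endpoint}?{urlencode(params)}"

# Hypothetical values for illustration; only the Spotify authorize
# endpoint is real, the client_id and redirect_uri are made up.
url = build_authorization_url(
    "https://accounts.spotify.com/authorize",
    client_id="example-client-id",
    redirect_uri="https://portal.example.com/callback",
    scope="user-read-playback-state",
    state="random-opaque-token",
)
```

After the user logs in on the provider's page, the provider redirects back to the `redirect_uri` with a short-lived code that the portal exchanges for an access token, which is why the service can be used without the portal ever seeing the username or password.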

Now I'm logging in with Spotify. Rabbit OS recognizes that Spotify has been authenticated on my device. Before this presentation, I pre-authenticated some other apps like Uber and DoorDash. Just choose your preferred services and log in through them to get authenticated on your device. It's that easy. Now let's play some music.

Play Kraftwerk’s Pocket Calculator.

Playing Pocket Calculator by Kraftwerk for you.

Play another song from the album.

Sure, playing Computer Love for you.

That's all the tracks from the same album. Who wrote the lyrics for this song?

Ralf Hütter and Emil Schult wrote the lyrics for the song Computer Love by Kraftwerk.

Which band sampled this song?

It seems that Coldplay's Talk is the only known example of a song that sampled Computer Love.

What do you think about this song?

Computer Love by Kraftwerk is widely regarded as a groundbreaking and influential song in the realm of electronic music. Its fusion of catchy melodies, robotic vocals, and innovative use of synthesizers has left a lasting impact on the genre.

I can also use R1 to call a ride.

Get me a ride from my office to home now.

Of course, I will book an Uber ride for you from your office to your home. Please confirm the ride.

I have six people with three pieces of luggage. Find an Uber that can fit all of us.

For six people and three pieces of luggage, I recommend booking an Uber XL as it provides ample space for all passengers and luggage. Please confirm the ride.

The ride shows up, and I just hit confirm. My Uber is on its way.

I can also use R1 to order food.

Get me a 12-inch pizza from Pizza Hut delivered to here. The most ordered option on the app is fine.

Ordering a 12-inch pizza from Pizza Hut. Since you mentioned that the most ordered option is fine, I will select that for you. I just created an order for a 12-inch pizza. It's going to be hand-tossed with a classic marinara sauce and topped with regular cheese. Please confirm your order.

That sounds really good. I just confirm the order here. Now I'm waiting for my pizza.

More complex tasks

I can also make R1 do a set of more complex actions that may take multiple apps to finish on a smartphone. For example, if I want to plan a trip to London, I can just describe what I wish in full to R1.

I want to take my family to London. It's going to be two of us and a child aged 12. We're thinking of January 30th to February 5th. Can you plan the entire trip for me? We like cheap non-stop flights, grouped seats, a cool SUV, and a nice hotel that has Wi-Fi.

Exploring ticketing options to make your trip a reality…
For your trip, I found various flight options, a range of hotels to choose from, and car rentals available. Please confirm each option individually for further details and booking.

So it's all been planned out. I just confirm, confirm, confirm, and that's it.

Could you come up with a schedule for fun things to do while I'm over there?

On it…
I have prepared a detailed travel plan for your trip. In summary, you will be exploring London's iconic landmarks, visiting museums, enjoying delicious meals at local restaurants, and experiencing the vibrant atmosphere of different neighborhoods. Navigation details are also prepared, and I can help you book tickets for any events you'd like to attend.

R1 just planned the entire trip for me. That's awesome. But,

it seems like this is a little bit too intense. Can you plan an easy schedule for us?

Sure, I'm working on it.
Please take a look at it and let me know what you think.

It gave me a more relaxed schedule every day with all the details, sounds really good. And I just confirm that.

Communication

R1 can help me reach others just like a phone, but smarter. It comes with a built-in real-time translator and a note-taker, removing the final barriers to communication. R1 automatically detects the language spoken in the environment and provides bi-directional translation.

Vision

R1 has an eye, an onboard camera designed for advanced computer vision. It can analyze surroundings and take actions in real time. To activate the eye, just double-tap the button.

This is what I got in the fridge. Can you make me a nice dish that's low in calories?

Let me see…
How about a green garden omelet? It's a delicious and low-calorie dish that combines the freshness of broccoli and cabbage with the creaminess of eggs. Here's a simple recipe for you.

Nice. It recognized all the stuff and gave me the actual recipes.

Rabbit eye can also help you with your documents.

Here's a table I've been working on for a job questionnaire. Can you create an additional column that matches candidates who mentioned Rabbit in their questions about how they found us?

Sure. Let me take a look at the table and add the matching column for you…
I’ve processed the table and sent you an email with the results.

Okay, now let's check our email. I can continue to interact with Rabbit OS even beyond the R1. Let's reply to this email directly.

Can you add another column that matches candidates who have included Rabbit in their question and are LA-based?

I just replied to the email, hit send, and I got a refined version back from Rabbit OS through my email.

Teach Mode

Let's say I have a unique routine or task I cannot do on my phone. R1 can do that too. We are experimenting with what we call teach mode.

Just like I can teach a friend how to skateboard, I can show R1 how to do something and it will learn from me. This means that any user, regardless of technical background, can teach R1 new skills. So you go to teach mode and start a new session…

Teach Mode Demo

"Today I will show you how to generate an image of a puppy on Midjourney from a prompt, using Discord. First I will go to the servers page and click one of my own servers. Since this is only general image generation, I'll go to the Midjourney text channel. Then I will use the imagine command along with the prompt. Here I'm putting: a cute baby wild dog with big eyes, animated cartoon, Unreal, 8K. Let's wait a minute for the engine to start generating the images. Once it's done, let's click on the image to get the link."

I will then explain to Rabbit OS how to use this recording and annotate it so that I can generate anything, not just puppies. So let's go back to our web portal and submit the request. It takes seconds for the web portal to finish processing. And that's it. It's that simple. Now, once we finish the training, I can go back to my R1.
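Rabbit hasn't disclosed how teach mode represents a recorded routine internally, but the watch-annotate-replay workflow described above can be pictured as a recorded list of UI steps in which the annotated parts become parameters. A minimal illustrative sketch in Python (every step name, target, and command below is hypothetical, not Rabbit's actual format):

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str      # e.g. "click" or "type"
    target: str      # the UI element the action applies to
    value: str = ""  # text to enter; may contain a {prompt} placeholder

# A recorded "generate an image via Discord" routine, annotated so the
# prompt becomes a parameter (all identifiers are made up for illustration).
routine = [
    Step("click", "server:my-server"),
    Step("click", "channel:midjourney"),
    Step("type",  "message-box", "/imagine prompt: {prompt}"),
    Step("click", "send"),
]

def replay(routine: list[Step], **params) -> list[str]:
    """Substitute the annotated parameters and return the concrete
    actions an agent would execute, in order."""
    return [
        f"{s.action} {s.target}"
        + (f" '{s.value.format(**params)}'" if s.value else "")
        for s in routine
    ]

# Replaying the taught routine with a new prompt, as in the demo.
actions = replay(routine, prompt="a bunny in pixel art style")
```

The point of the sketch is the separation between a fixed recorded skeleton and the annotated slots: once "puppy" is marked as a parameter, the same routine can generate anything.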

Now let's use Midjourney, as I taught it, to generate a picture of a bunny in pixel art style.

Certainly, Jesse. I will use Midjourney to generate a picture of a bunny in pixel art style for you. Please give me a moment to create the image.

Now, here you go. You got an image generated on Midjourney through teach mode. Watch, learn, and repeat. That's teach mode. It's that simple.

That's all the demos for today. With LAM fast evolving, my R1 will eventually help me do things that could never be achieved on an app-based phone.

"Why isn't it just an app?"

Speaking of the current app-based phones, the first question we ask ourselves is: why would I need a new device if I already have a $1,000 iPhone? Because my iPhone can't do any of this at all. We did not build the Rabbit R1 to replace your phone. It's simply a different generation of device.

The app-based system was introduced more than 15 years ago, and a new generation of natively AI-powered devices is just getting started.

Here's a quick recap.

R1 is our companion that hosts the large action model, operated through natural language. I can use it for a wide range of tasks: asking anything, direct actions, complex actions, AI-enhanced video calls, a note-taker, a translator, computer vision with the Rabbit Eye, and the experimental teach mode. On the hardware side, we have a 360-degree rotational camera, a global 4G LTE SIM card slot, a push-to-talk button, and an analog scroll wheel.

One last thing...

What about the price?

Now, before we reveal our price, I want to do a quick comparison. Here are some of the best phones on the market right now: you've got the iPhone, you've got the latest Android phones. We're looking at somewhere around $700 to $1,000 for a top phone with an app-based system. I bought my new iPhone 15 Pro Max last year, and it's the same experience as my previous ones.

Here are the not-so-smart smart speakers. They're asking roughly $200, but they're all outdated. And finally, here are a couple of the new devices with only large language models: you've got the AI Pin asking $699 plus a monthly subscription for the base model, you've got Tab asking $600, and you've got the Meta Ray-Ban glasses asking roughly $300. Remember, these are devices with only a large language model. We still think they are too expensive.

We priced the Rabbit R1 at $199, no subscription, no hidden fees.

You can order the R1 now at rabbit.tech. And we are shipping Easter 2024. I can't wait for you to experience the R1 for yourself. Thank you.