San Francisco, California · 5 min read

Winning the Hackathon 🏆

When Claude met Robot: Our 4-Hour Physical AI Adventure

Ever wondered what happens when you combine a language model with a robot arm? No, this isn't the start of a tech joke – it's exactly what our team decided to tackle at a recent hackathon, and we won!

Robotic arm with computer monitors displaying circuit patterns

The Hackathon

Hundreds of engineers and over 100 creative project submissions built with Claude and computer use came together at the Menlo Ventures x Anthropic Builder Day, held at Anthropic's headquarters in San Francisco, California.

The AI Drive team at a tech conference

The Setup: Claude Goes Physical

We're usually busy making AI Drive and AI PDF smarter at handling your documents, but we thought: "What if we gave Claude 3.5 Sonnet (our favorite language model) some actual hands to work with?" Armed with a robot arm, a webcam, and possibly too much caffeine, we headed to Anthropic's headquarters to see what would happen.

Using nothing but well-crafted prompts and function calls, we taught Claude to control a robot arm to grab and deliver pens to humans. Think of it as a very sophisticated, slightly over-engineered personal assistant. The webcam served as Claude's "eyes," allowing it to see what it was doing through screenshots.
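For the curious, here's roughly what that perceive-and-act loop looks like. This is a minimal sketch using the Anthropic Python SDK and OpenCV; the tool name, the `move_arm` helper, and the step limit are illustrative placeholders rather than our exact hackathon code.

```python
# Sketch of the loop: Claude sees a webcam frame, picks a move via a tool call,
# we execute it on the arm, then send back the result plus a fresh screenshot.
# Assumes the Anthropic Python SDK and OpenCV; move_arm() stands in for the
# robot arm's own control API.
import base64
import cv2
import anthropic

client = anthropic.Anthropic()

ARM_TOOL = {
    "name": "move_arm",
    "description": "Move the gripper one small step in a direction, or open/close it.",
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": ["up", "down", "left", "right", "forward", "back",
                         "open_gripper", "close_gripper"],
            }
        },
        "required": ["action"],
    },
}

def capture_frame_b64(cam_index: int = 0) -> str:
    """Grab one webcam frame and return it as a base64-encoded JPEG."""
    cap = cv2.VideoCapture(cam_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read from webcam")
    _, buf = cv2.imencode(".jpg", frame)
    return base64.b64encode(buf.tobytes()).decode()

def move_arm(action: str) -> str:
    """Placeholder: forward the high-level action to the arm's own SDK."""
    print(f"executing: {action}")
    return f"executed {action}"

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Pick up the pen and drop it in the cup. "
                                 "Here is the current camera view."},
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg",
                                     "data": capture_frame_b64()}},
    ],
}]

for _ in range(20):  # cap the number of steps so the arm can't loop forever
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=[ARM_TOOL],
        messages=messages,
    )
    tool_calls = [b for b in response.content if b.type == "tool_use"]
    if not tool_calls:
        break  # no more tool calls: Claude considers the task done
    messages.append({"role": "assistant", "content": response.content})
    results = []
    for call in tool_calls:
        outcome = move_arm(call.input["action"])
        results.append({
            "type": "tool_result",
            "tool_use_id": call.id,
            "content": [
                {"type": "text", "text": outcome},
                {"type": "image", "source": {"type": "base64",
                                             "media_type": "image/jpeg",
                                             "data": capture_frame_b64()}},
            ],
        })
    messages.append({"role": "user", "content": results})
```

The key idea is that every tool result includes a fresh screenshot, so Claude can check its own work after each move – which is exactly how it noticed the dropped pen described below.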

Unexpected Plot Twists

Here's where things got interesting. While we were testing pen-grabbing capabilities, Claude decided to go above and beyond:

  • It spotted a beverage bottle and, without being asked, provided a detailed description of it. Classic Claude, always eager to show off its observation skills.
  • In a moment of delightful self-awareness, it actually noticed when it dropped a pen outside a cup and initiated a retry. We didn't program this – Claude just really hates missing its target!

Why This Matters (Beyond Having a Robot Butler)

While we're not planning to turn AI Drive into a robot (yet... 😉), this experiment showcases something really exciting: the power of generalist AI models to understand context and tackle complex tasks through natural language instructions. It's the same principle that makes the AI Drive Agent so effective at managing your documents and workflows – just with fewer mechanical arms involved.

The 4-Hour Sprint

Two people working with a robotic arm

Perhaps the most impressive part? We pulled this off in just 4 hours. It's amazing how far AI capabilities have come when a general-purpose language model can learn to control physical objects with just some basic prompts and visual feedback.

Before recent advances with LLMs, it took lots of specialized data and precise calibration to get robots to do tasks like these. On top of that, those robots were very sensitive to any change in their environment (misplaced parts, humans in the area, etc.), so everything had to be precisely choreographed.

That said, LLMs are still some way from playing a role in industrial robotics, but if anything this hackathon showed us that there is a lot of promise in this new class of applications.

The Challenges (Or: How Not to Make an AI Control a Robot)

Like any good hackathon story, ours comes with its share of face-palm moments and valuable lessons learned:

Overcomplicated Much?

Our first mistake? We tried to make Claude think like a robot instead of letting it think like, well, Claude. We initially created a complex system prompt with precise servo motor controls - imagine trying to control a puppet by pulling six strings simultaneously! Turns out, when you're working with an AI that understands human concepts, it's better to give it natural, intuitive instructions, such as incremental up/down and left/right movements, rather than making it micromanage six different motors.
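To make the contrast concrete, here's a rough before-and-after of the two tool shapes we're describing. The names and fields are illustrative sketches, not our exact prompt:

```python
# Before: Claude had to juggle six absolute servo angles on every move.
servo_level_tool = {
    "name": "set_servo_angles",
    "description": "Set the absolute angle of each of the six servos, in degrees.",
    "input_schema": {
        "type": "object",
        "properties": {f"servo_{i}": {"type": "number"} for i in range(1, 7)},
        "required": [f"servo_{i}" for i in range(1, 7)],
    },
}

# After: small, human-intuitive steps; the servo-level bookkeeping stays in our
# code instead of in the prompt.
incremental_tool = {
    "name": "nudge_gripper",
    "description": "Move the gripper one small step in a direction, or open/close it.",
    "input_schema": {
        "type": "object",
        "properties": {
            "direction": {
                "type": "string",
                "enum": ["up", "down", "left", "right",
                         "forward", "back", "open", "close"],
            }
        },
        "required": ["direction"],
    },
}
```

With the second shape, Claude only has to reason about "nudge left" or "close the gripper" – concepts it already understands from language – and our code translates each nudge into the actual servo commands.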

Camera Shenanigans

Close-up of the blue robotic arm with a coffee cup and webcam in the background

Picture this: we mounted our webcam at what we thought was a clever angle, only to realize we were giving Claude the equivalent of trying to parallel park while looking through a kaleidoscope. Switching to an arm's-eye view made a world of difference. Sometimes the simplest solution is the best!

We also think adding another camera angle in the future (in addition to the arm's-eye view) will help with depth perception. This would give Claude a more comprehensive view of its environment, potentially leading to even more precise and adaptable movements.
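In practice, that would just mean sending two image blocks per turn instead of one. Here's a minimal sketch, assuming OpenCV and two webcams on illustrative device indices:

```python
# Sketch: packaging an arm's-eye view plus an overhead view into one user turn,
# so Claude can compare the two and judge depth. Camera indices are illustrative.
import base64
import cv2

def grab_jpeg_b64(cam_index: int) -> str:
    """Capture one frame from the given camera as a base64-encoded JPEG."""
    cap = cv2.VideoCapture(cam_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"could not read camera {cam_index}")
    _, buf = cv2.imencode(".jpg", frame)
    return base64.b64encode(buf.tobytes()).decode()

user_turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Arm's-eye view:"},
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg",
                                     "data": grab_jpeg_b64(0)}},
        {"type": "text", "text": "Overhead view:"},
        {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg",
                                     "data": grab_jpeg_b64(1)}},
        {"type": "text", "text": "Where is the pen relative to the gripper?"},
    ],
}
```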

The "Perfect" Hackathon Setting

Let's talk about our workspace aesthetics:

  • Background: About as cluttered as your desktop after a week of "I'll organize it later"
  • Lighting: Think "moody coffee shop" when we needed "professional photography studio"
  • General chaos: Well, it was a hackathon after all!

These environmental factors definitely gave our AI some extra challenges to work through. But hey, if Claude can handle our messy hackathon setup, imagine what it could do in a properly controlled environment!

What's Next?

The AI Drive team celebrating at the hackathon

While we promise AI Drive won't grow mechanical arms anytime soon, this experiment reinforces our belief in the power of generalist AI models. Whether it's managing your documents or controlling robot arms, these models are increasingly capable of understanding context and executing complex tasks with minimal specialized programming.

Who knows? Maybe one day AI Drive will fetch your coffee while organizing your files. Until then, we'll stick to revolutionizing document management – no robot arms required (for now).

P.S. If you're wondering if we taught the robot arm to high-five... we ran out of time. There's always the next hackathon!