What about Claude AI Sonnet 3.5's computer use capabilities?

Hugo Pinto
Oct 24, 2024

Just days after Microsoft's announcement of their new army of agents, here's another foundation-rocking release from Anthropic: Claude 3.5 Sonnet, with the ability to use your computer for you.

The new model's ability to directly interact with computer interfaces (cursor movement, clicks, typing) represents a significant leap beyond existing models' capabilities. While GPT-4 can interpret screenshots and Gemini can analyse visual inputs, direct computer control is unique.

Let's look at what it can do for coding...

And for automation (this one has cataclysmic consequences for the RPA market...)

While everyone's obsessing over whether it has graduate-level or intern-level intelligence, I think it's far more valuable to start running processes with it.

While developing a new venture, I found an immediate use: having this feature complete a process in a third-party platform removes the need for an integration and makes it possible to deliver a full end-to-end service.

So will reality match the expectations set by this new capability?

What will OpenAI and the remaining competitors bring to market?

What are the big questions?

What is going to be the cost of running this at scale, and are LLM vendors finally going to provide the right set of tools without the need for extensive and expensive technical teams?

Anthropic has announced a couple of interesting additions to its pricing, with cost savings of up to 90% from prompt caching and 50% from the Message Batches API, but it will take testing, and a model of what running an operation at scale looks like, to really know.
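One way to start modelling this is a back-of-the-envelope sketch like the Python below. It is not Anthropic's pricing calculator. The per-token prices are placeholder values, the names (`Pricing`, `estimate_cost`) are hypothetical, and it assumes that the caching and batch discounts stack and that every cached token gets the full "up to 90%" saving, which is the best case. It also ignores any extra charge for writing to the cache.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Placeholder prices in USD per million tokens; replace with real ones.
    input_per_mtok: float
    output_per_mtok: float
    # Headline figures from the announcement, treated here as best-case discounts.
    cache_discount: float = 0.90
    batch_discount: float = 0.50


def estimate_cost(pricing, runs, input_tokens, output_tokens,
                  cached_fraction=0.0, batched=False):
    """Rough total cost in USD for `runs` identical process executions.

    Assumption: the batch discount applies on top of the caching discount.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached input tokens are billed at the discounted rate.
    billable_input = fresh + cached * (1 - pricing.cache_discount)
    per_run = (billable_input * pricing.input_per_mtok
               + output_tokens * pricing.output_per_mtok) / 1_000_000
    if batched:
        per_run *= 1 - pricing.batch_discount
    return runs * per_run


if __name__ == "__main__":
    placeholder = Pricing(input_per_mtok=1.0, output_per_mtok=2.0)
    for label, kwargs in [
        ("baseline", {}),
        ("80% of input cached", {"cached_fraction": 0.8}),
        ("cached + batched", {"cached_fraction": 0.8, "batched": True}),
    ]:
        cost = estimate_cost(placeholder, runs=10_000,
                             input_tokens=20_000, output_tokens=1_000, **kwargs)
        print(f"{label}: ${cost:,.2f}")
```

Even a toy model like this shows how much the answer depends on how much of each prompt can actually be cached and whether the process can tolerate batch turnaround, which is exactly what needs testing.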

The next obvious question: are business leaders ready to embrace this new tech strategy? How does it fit with current operating models and governance structures, which can get very expensive and complex to manage?

What are the big opportunities?

Self-learning via observation. Setting the obvious privacy and IP questions aside, this is the opportunity to really deepen the knowledge of how people actually do things - not what is written in the manual, not the way it's supposed to be done, but how they actually do it.

I can see another component of technology and product development being supercharged: Product! Exciting, to say the least...

What's left for us humans to do?

Assess current capabilities and frame applications of the technology
- Audit existing processes IRL (in real life)
- Identify advisory-appropriate touch points
- Map human-AI collaboration opportunities
Design for adoption, value and governance
- Focus on knowledge-intensive tasks
- Establish clear handoff protocols
- Build feedback loops
- Review and moderate governance at the function, business, process and transformation levels
- Define processes to manage and mitigate risk
(Re)train, measure & improve continuously
- Upskill / Retrain
- Track efficiency gains
- Monitor quality improvements
- Assess team adoption rates
- Learn and iterate

This will definitely add to the flurry of new businesses emerging, and they all have some questions of their own:

How fast can we generate revenue?
How fast can we raise funding?
How many people do we actually need to run our business?

Wanna find out?

Try it out - or share your questions and thoughts...

FYI

Pets at Home is using an agentic approach with Microsoft's Copilot agents
