Microsoft’s AI Agents idea for Windows 11 is insane, it could change how you use PCs

What if your Windows 11 PC could understand and perform tasks like a human? Microsoft has an insane idea that could shape the future. WindowsLatest.com had the opportunity to take a closer look at the idea and discuss Windows Agent Arena in detail with one of the researchers at Microsoft AI.

You might have heard the term ‘AI Agent’ in the news thanks to Anthropic’s AI agent announcement for Claude, but Microsoft has been working on the “AI Agent” concept for the past several months. It even has a research paper, and the project, “Windows Agent Arena,” was made open-source in September.

If you’re like me and closely follow Microsoft, you probably know that Microsoft is leading the AI race. But that’s not all. AI researchers within Microsoft’s AI division are coming up with their own projects to help independent developers and researchers experiment with large or small language models.

Microsoft AI has been working on the completely open-source “Windows Agent Arena,” which allows researchers and developers to build and test their AI agents. It’s a full-fledged open-source framework that has everything you need to build and benchmark your AI agents for Windows 11, but what exactly is an AI agent on a PC?

First, let’s take a closer look at some of the AI agents you might find useful.

Every morning, instead of opening your email, calendar, and favourite news website one by one, you can simply say, “Start my morning setup,” and the AI agent will open all those apps for you.

Another example of a Windows 11 AI agent is one that listens to you and changes your PC’s settings. If you’re concerned about your online privacy and want to enable the “Do Not Track” feature in Microsoft Edge, an AI agent can do it for you.

Here’s how it will work:

  • The AI agent will understand your request. In this case, you want it to open Edge and change the privacy settings so that websites are asked not to track you.
  • After getting the request, it will open Microsoft Edge.
  • It will access the main menu by clicking on the three horizontal dots. Yes, an AI agent will perform all of this with zero human interaction.
  • From the dropdown menu, the agent will select “Settings”.
  • Within the Settings page, it will head to the “Privacy, search, and services” section and scroll through the page to find the ‘Do Not Track’ toggle.

It will then automatically turn on the “Do Not Track” toggle, right in front of your eyes.
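To make the walkthrough above more concrete, here is a minimal sketch of the kind of step-by-step plan an agent might produce for the “Do Not Track” request. The `Step` class, the action names and the dry-run executor are hypothetical illustrations, not Windows Agent Arena’s actual API; the executor only records each step instead of clicking anything.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str  # hypothetical action verbs: "open", "click", "scroll_until", "toggle"
    target: str  # human-readable description of the UI element


# Hypothetical plan mirroring the steps described in the article.
DO_NOT_TRACK_PLAN = [
    Step("open", "Microsoft Edge"),
    Step("click", "main menu (three horizontal dots)"),
    Step("click", "Settings"),
    Step("click", "Privacy, search, and services"),
    Step("scroll_until", "'Do Not Track' toggle"),
    Step("toggle", "'Do Not Track' toggle"),
]


def run_plan(plan: list[Step]) -> list[str]:
    """Dry-run executor: logs each step rather than touching the real UI."""
    log = []
    for i, step in enumerate(plan, start=1):
        log.append(f"{i}. {step.action} -> {step.target}")
    return log


for line in run_plan(DO_NOT_TRACK_PLAN):
    print(line)
```

A real agent would not follow a fixed script like this; it would decide each step after looking at the screen. Writing the plan out simply shows how many separate actions a “simple” settings change involves.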

[Image: How an AI Agent works (Windows Agent AI chain method)]

Here are some more examples shared by Microsoft on its Applied Sciences blog post:

Example 1: AI Agent turns on Do Not Track in Microsoft Edge for you

Example 2: AI Agent installs the Pylance extension in VS Code

Example 3: AI Agent can change your search engine

Example 4: AI Agent can change the folder VLC uses to store recordings

Example 5: AI Agent can open Paint and do the drawing for you

Example 6: AI Agent can change your Edge profile name

Insane, right?

Windows Agent Arena is where things start to get super interesting, and these are only a few examples of the idea. The possibilities are limitless, especially on an OS like Windows 11.

The idea behind Windows Agent Arena is to provide an open-source framework so devs and researchers can build their own AI Agents for Windows 11 and benchmark their performance.

What exactly is Windows Agent Arena?

“AI assistants like Copilot and ChatGPT have become really helpful for millions of people. These assistants use advanced language models to help with all kinds of tasks, like fixing code or coming up with dinner ideas. As these models get smarter, we’re thinking about what the future holds for AI assistants,” Francesco Bonacci, one of the Microsoft AI researchers behind the project, told me in a statement.

“We introduce Windows Agent Arena, a framework to test and develop AI agents that can perform tasks on a Windows computer. Think of these AI agents as smart assistants that can see what’s on your screen, understand it, and then interact with your computer by clicking, typing, or opening apps to help you complete tasks—just like you would do manually.”

For those unaware, Microsoft AI is a new division within Microsoft that works on Copilot, Edge, and other AI stuff. Remember that excellent small language model Phi-3? It was also developed by Microsoft AI. The division is headed by former Google DeepMind executive Mustafa Suleyman, who now serves as CEO at Microsoft AI.

Researchers at Microsoft AI are building Windows Agent Arena (WAA) to help developers and researchers build, test, and benchmark AI agents specifically designed for Windows 11.

The core idea is to bring more people on board and encourage them to build AI Agents for Windows 11 that automate tasks on your PC. It’s completely open-source and flexible, so developers can either run it on their local machine or use Microsoft’s Azure Machine Learning (Azure ML) cloud infrastructure to test and run multiple agents at the same time.

[Image: How Azure helps devs run multiple AI Agents if they prefer not to use a local environment]

Whether it runs locally or in Azure, WAA uses a realistic Windows 11 environment, which means devs can explore how an AI agent would operate in an actual Windows 11 installation. We’re not talking about a limited simulation or some special version of Windows 11.

This might be technical for typical users, but let’s try to dumb down how AI Agents are developed:

  • Devs get access to Windows Agent Arena, which is a platform to code, test and benchmark AI agents for Windows 11.
  • Microsoft has designed default AI Agents, which serve as templates and a starting point for developers.
  • Devs can use the template offered by Microsoft to begin building unique AI Agents to address problems people face on Windows 11.
  • For example, if you have many photos in your Desktop, Documents, or Pictures folders and you’d like to rename them, compress them and change their file extension automatically, you can use an AI agent to automate the task. This is one example of how an AI Agent could solve a real-life problem on Windows 11, and it runs locally.
  • In addition to building AI Agents, devs can benchmark them. Since AI Agents run locally in Windows 11, there are some concerns around performance, but WAA includes Microsoft’s own benchmarking tools to measure how well agents complete tasks.
  • To get started, devs need Docker with WSL 2, an OpenAI or Azure OpenAI API key, and Python 3.9. They then clone the WAA repository, install the dependencies, and finally use the Windows Enterprise Evaluation ISO.
  • Devs can test their AI Agents locally or use Azure’s cloud infrastructure.
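Before cloning the repository, a quick pre-flight check for the prerequisites listed above can save some time. This is a minimal sketch using only Python’s standard library. The environment variable names `OPENAI_API_KEY` and `AZURE_API_KEY` are assumptions for illustration; check the WAA README for the names it actually expects. The script does not verify WSL 2 or the ISO.

```python
import os
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Rough pre-flight check for the requirements listed above.

    Assumptions: Docker and Git are detected via executables on PATH, and the
    API key is read from OPENAI_API_KEY or AZURE_API_KEY (illustrative names).
    """
    return {
        "python>=3.9": sys.version_info >= (3, 9),
        "docker on PATH": shutil.which("docker") is not None,
        "git on PATH": shutil.which("git") is not None,
        "API key set": bool(
            os.environ.get("OPENAI_API_KEY") or os.environ.get("AZURE_API_KEY")
        ),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"[{'ok' if ok else 'missing'}] {name}")
```

Returning a dictionary rather than raising on the first failure lets you see everything that is missing in one run.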

As Microsoft’s Francesco Bonacci told us, researchers can use this framework to improve their AI models, making them better at understanding and interacting with a typical desktop environment.

This platform is open-source, meaning anyone can use it, and it allows for testing AI agents on multiple computers at once through Azure, making the testing process faster and more scalable. The ultimate goal is to create AI agents that can significantly improve productivity by automating tasks that we usually do manually on our computers.

How powerful is Windows Agent Arena?

In a research paper titled “Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale,” a group of researchers, including Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, and Zack Hui, revealed that WAA’s initial benchmark covers more than 150 different tasks on Windows 11.

[Image: Windows 11 AI Agents, 150 tasks]

What could all those 150 tasks be? It depends, but they cover much of what you do on your PC.

“For example, you might tell the AI to install a browser extension, change settings, or even draw something in a simple paint program. The AI uses large language and vision models to understand both the text and images on your screen, helping it to decide what actions to take. Windows Agent Arena provides a way to evaluate how well these AI agents perform in a variety of tasks, from using web browsers to editing documents, all within a real Windows operating system,” said Microsoft’s Francesco Bonacci, sharing some examples of tasks AI agents can perform.

One of the 150 tasks could be related to Microsoft Edge or Chrome, where you ask an AI Agent to change some settings, such as turning on privacy mode, clearing cookies or switching the default search engine.

You can also work with an AI Agent in LibreOffice Writer or Calc to edit documents and spreadsheets. If you’re a developer, an AI Agent could install extensions or edit code while you sit at your desk and watch it work.

Those are some of the examples I can think of, but the opportunities are endless. I mean, we’re talking about Windows 11 here. There could be an AI Agent for interacting with all the apps you can think of, such as Notepad, Paint, or even Clock. Here are some more examples:

  • Save the Paint image as “circle.png” in Downloads folder
  • Change my desktop background to a solid color
  • Turn off notifications for my system
  • Enable night light and set to 7 pm to sunrise
  • Export the current document into PDF
  • Make the first two paragraphs double line spaced
  • Please separate each sentence by creating one empty line space after each.
  • Help me center align the heading in LibreOffice
  • Help me change the 2 in my text to subscript
  • Make the first letter of each word uppercase
  • Make Times New Roman the default font
  • Help me rename sheet1 “LARSScienceAssessment”
  • Sort the list of employees according to their birthday
  • Fill the Sequence Numbers as “No. #” in the “Seq No.” column
  • Enable the ‘Do Not Track’ feature in Edge to enhance my online privacy
  • Set the default font size to the largest
  • Save this webpage I’m looking at now
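How does a benchmark decide whether an agent actually completed one of these tasks? One common approach, and the assumption behind this sketch, is to check the final state of the machine rather than the exact clicks the agent made. The `Task` class, the checker functions and the dictionary-based “state” below are hypothetical simplifications, not WAA’s real task format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    instruction: str
    check: Callable[[dict], bool]  # hypothetical: inspects the final machine state


TASKS = [
    Task(
        'Save the Paint image as "circle.png" in Downloads folder',
        lambda s: "circle.png" in s.get("downloads", []),
    ),
    Task(
        "Turn off notifications for my system",
        lambda s: s.get("notifications_enabled") is False,
    ),
]


def evaluate(task: Task, final_state: dict) -> bool:
    """Score a task on the resulting state, not on how the agent got there."""
    return task.check(final_state)


# Hypothetical state after an agent run: the file was saved, notifications are still on.
state_after_agent = {"downloads": ["circle.png"], "notifications_enabled": True}
print([evaluate(t, state_after_agent) for t in TASKS])  # → [True, False]
```

Checking outcomes instead of steps lets two agents take completely different routes to the same result and still be scored fairly.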

But how powerful is the Windows Agent Arena platform for developers? As mentioned, devs can use local hardware or the cloud to scale using Azure Machine Learning (Azure ML). This means that instead of testing AI agents one at a time on a PC, developers can run multiple agents at the same time in the cloud.
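The benefit of running agents in parallel can be sketched with Python’s `concurrent.futures`. This is a local stand-in for the idea, not the Azure ML SDK: `run_agent_on_task` is a hypothetical placeholder that fakes an outcome (even task IDs “succeed”) so the example runs without any Windows VM. In a real setup, each worker would drive its own Windows 11 instance.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent_on_task(task_id: int) -> tuple[int, bool]:
    """Hypothetical placeholder for one agent episode in its own Windows VM.

    The outcome is faked (even IDs succeed) so the sketch runs anywhere.
    """
    return task_id, task_id % 2 == 0


def run_in_parallel(task_ids: list[int], workers: int = 4) -> dict[int, bool]:
    # Up to `workers` episodes run at once, like several cloud VMs side by side.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_agent_on_task, task_ids))


results = run_in_parallel(list(range(8)))
print(sum(results.values()), "of", len(results), "tasks succeeded")  # → 4 of 8 tasks succeeded
```

With one machine, eight episodes run back to back; with four workers, they finish in roughly a quarter of the time, which is the same reason WAA spreads its benchmark across many cloud machines.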

[Image: Agent Arena and Azure]

In the research paper, Microsoft also discussed its own AI Agent, Navi, which successfully completed 19.5% of the benchmark tasks. That is well below the 74.5% success rate achieved by humans, but it’s a significant milestone for an AI Agent.
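A success rate is simply the number of tasks completed divided by the number attempted. As a rough illustration on a benchmark of about 150 tasks, 19.5% works out to around 29 completed tasks, compared with around 112 for humans at 74.5%. The outcome list below is hypothetical and only demonstrates the arithmetic.

```python
def success_rate(outcomes: list[bool]) -> float:
    """Percentage of attempted tasks that were completed successfully."""
    if not outcomes:
        raise ValueError("no tasks were run")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical outcomes: 39 successes out of 200 attempts.
print(success_rate([True] * 39 + [False] * 161))  # → 19.5
```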

Microsoft noted that Navi uses “chain-of-thought prompting,” where it reasons step by step through a task and how to carry it out on Windows 11.

It works out what it has to do, what it is doing, and what it needs to do next by looking at your screen and processing what’s on it, such as where the cursor is. It then decides on the next action and repeats the cycle until the task is finished.
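The observe, decide, act cycle described above can be sketched as a small loop. Everything here is a toy stand-in: `FakeScreen` replaces a real screenshot, and `decide` is a hypothetical rule where a real agent like Navi would prompt a language model with the goal, the parsed screen and the action history. None of these names come from WAA itself.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Toy stand-in for a screenshot: just the names of visible UI elements."""
    visible: list[str]
    toggled: set[str] = field(default_factory=set)


def observe(screen: FakeScreen) -> list[str]:
    return list(screen.visible)


def decide(goal: str, elements: list[str], history: list[str]) -> str:
    """Hypothetical policy. A real agent would ask an LLM to reason step by
    step over the goal, the visible elements and the history."""
    if goal in elements:
        return f"toggle:{goal}"
    return "scroll"


def act(screen: FakeScreen, action: str) -> None:
    if action.startswith("toggle:"):
        screen.toggled.add(action.split(":", 1)[1])
    elif action == "scroll":
        # In this toy model, scrolling reveals the setting we are looking for.
        screen.visible.append("Do Not Track")


def run_agent(goal: str, screen: FakeScreen, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        action = decide(goal, observe(screen), history)
        act(screen, action)
        history.append(action)
        if goal in screen.toggled:
            break  # task finished
    return history


print(run_agent("Do Not Track", FakeScreen(["Privacy", "Cookies"])))
# → ['scroll', 'toggle:Do Not Track']
```

The `max_steps` cap matters in practice: an agent that misreads the screen could otherwise loop forever.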

As part of its efforts to help everyone build their own AI Agents, Microsoft went a step further and open-sourced “OmniParser”, a powerful screen-understanding model.
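A screen-understanding model is useful because it turns pixels into something an agent can act on, such as labelled regions of the screen. This sketch assumes a simplified output of labelled bounding boxes (the `Element` class and its fields are hypothetical, not OmniParser’s actual format) and shows how an agent could turn a label into a point to click. The coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Element:
    label: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


def click_point(elements: list[Element], wanted: str) -> tuple[int, int]:
    """Return the centre of the first element whose label contains `wanted`."""
    for el in elements:
        if wanted.lower() in el.label.lower():
            left, top, right, bottom = el.box
            return (left + right) // 2, (top + bottom) // 2
    raise LookupError(f"no element matching {wanted!r}")


# Hypothetical parsed elements from an Edge window.
parsed = [
    Element("Main menu", (1850, 10, 1890, 40)),
    Element("Settings", (1600, 500, 1880, 530)),
]
print(click_point(parsed, "Settings"))  # → (1740, 515)
```

Clicking the centre of a box is a simple heuristic; if the model draws the box in the wrong place, the click misses, which is one reason tasks like drawing in Paint remain hard for agents.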

What’s next for AI Agents on Windows 11?

WAA is more than a concept, and I wouldn’t be surprised if Microsoft brings its own versions of AI Agents to Windows 11.

For now, it’s still an open-source project under development with a low success rate, and we don’t know when Windows 11 will get its own AI Agent, but it’s definitely coming at some point in the future.

AI Agents could soon be able to learn your daily habits, suggest better ways to do things, or even automate tasks without you asking.

AI agents still have limitations, such as understanding what’s on your screen and deciding where to move the mouse cursor, especially when asked to perform a task like drawing in Paint.

The post Microsoft’s AI Agents idea for Windows 11 is insane, it could change how you use PCs appeared first on Windows Latest