Last year, Microsoft confirmed it was working on Copilot Vision and rolled out the feature to all Copilot Pro subscribers in the United States. Today, Windows Latest noticed that Copilot Vision, which lets you chat with Copilot about any web page, is rolling out to free users (without Copilot Pro). However, it still works only in the United States.
Windows Latest was able to try this new Copilot feature in Edge on a virtual machine set to a US location. To get started, we did a Bing search for Copilot Vision in the browser and clicked on a matching search result. Then, we selected “Try it now” and accepted the terms and conditions.
Microsoft followed up with a quick preview of how Vision worked, but we weren’t satisfied with it because it didn’t explain how to use Vision.
To actually use Vision in Edge, you need to open Copilot via the sidebar and click on the voice icon. Once you do, a new glasses icon appears automatically at the bottom of the screen, alongside the mic button and two other buttons.
Whenever the glasses icon is highlighted, it means that Vision is active and can view the webpage.
Hands on with Copilot Vision in Microsoft Edge on Windows 11
I’m using Windows 11, and I accessed Copilot Vision via Edge’s sidebar. Before launching Vision, I opened the official Vision page to understand how it works.
I asked Copilot Vision to describe what the webpage was about, but it didn’t work properly because Copilot stopped responding in the middle of the conversation.
I thought Copilot was having temporary issues, so I repeated my question, and Vision tried to answer me. Still, it spoke for about 15 seconds and stopped abruptly again, only to come up with a different response later.
Vision kept getting stuck in a loop of incomplete responses, so it took a few attempts to get a proper answer. It’s impossible to have a decent back-and-forth conversation with Copilot, which struggles to understand the webpage and to answer smoothly.
At this point, we’re still on Microsoft’s website. I asked Copilot to tell me how many buttons there are on the page. Copilot told me that there was only one prominent button on the page, and that is “Try it”.
While that’s true, Copilot was unable to recognize the second button, which plays the video on Microsoft’s website.
I asked Copilot to click the button or interact with the web page, but the assistant declined my request.
It also turned down my request to play the embedded video on the page or close the webpage. Later, I asked Copilot Vision to stop speaking, but it politely declined that request, too, citing its inability to access anything on the page.
Based on these responses, we can conclude that, at the moment, it cannot interact with any page element or pause itself.
Copilot Vision needs a lot of work to be useful
After closing that window, I opened www.windowslatest.com and navigated to one of our recent stories. I invoked Copilot Vision again, and this time, it correctly described the article to me.
Then, I scrolled a bit and asked Copilot to talk about the author of the story, and it again answered correctly.
At this point, I was convinced that Copilot Vision has potential, but it requires a lot of work to be useful.
To test how well it can understand what’s on my screen, I opened Amazon UK, and asked Copilot to tell me about the best SSD based on Amazon’s search results. Copilot started talking about every SSD it could see on the screen, and then I asked it to compare SSDs from Acer and WD.
While Copilot did compare the two drives, it based the comparison only on what it could see in Amazon’s search results. Copilot Vision does not try to look up information on the web or on other pages. When I asked it to talk about the performance of the SSDs, it couldn’t find the write speed of the Acer SSD because it wasn’t mentioned on the page.
Then, I asked Copilot to look up the answer on Bing, but it refused.
So, you can use it to quickly sift through a page, but you cannot completely rely on it. You’ll also have to apply your own judgment; otherwise, it might recommend a bad product.
I asked Copilot to tell me which sponsored items were on the page, and it could only name SanDisk. However, Amazon was also promoting Samsung on the same web page.
It appears that it can only scan the visible region of the screen and cannot browse the whole web page, which is why it couldn’t see Samsung.
Only when we scrolled down was it able to recognize the other sponsored items, but by then, it could no longer remember that it had previously identified SanDisk.
It’s a mess and Copilot Vision offers very little value, which explains why it’s being rolled out to everyone for free.
Vision also tries too hard to please the user and becomes visibly subservient once you point out its mistake.
We think Microsoft could at least add a scroll function to Vision or allow it to scan the complete page, regardless of what we’re currently viewing.
The post Microsoft just added Copilot Vision to Edge for free on Windows 11 (hands on) appeared first on Windows Latest