Microsoft Open Sources Browser Agent Magentic-UI

Microsoft has announced on its official website that it has open-sourced Magentic-UI, an agent designed specifically for browser-based web tasks.

Magentic-UI builds on Magentic-One, a project Microsoft open-sourced earlier, and lets the user and the agent share control of a task so that execution is more efficient and more accurate.

On the GAIA benchmark, when simulated users supplied auxiliary information, Magentic-UI’s task completion rate rose from 30.3% in autonomous mode to 51.9%, a relative improvement of about 71%. Magentic-UI asked the simulated users for help in only 10% of tasks, averaging just 1.1 requests in those tasks.
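The 71% figure is a relative gain over the autonomous baseline, not a difference in percentage points. A quick check of the arithmetic, using only the two completion rates quoted above (the variable names are ours):

```python
# Completion rates quoted in the article (GAIA, with simulated users).
autonomous = 30.3  # percent, autonomous mode
assisted = 51.9    # percent, with simulated-user help

absolute_gain = assisted - autonomous              # in percentage points
relative_gain = absolute_gain / autonomous * 100   # as a percent of the baseline

print(f"absolute: {absolute_gain:.1f} points, relative: {relative_gain:.0f}%")
# prints "absolute: 21.6 points, relative: 71%"
```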

The source code is available at: https://github.com/microsoft/magentic-ui

Magentic-UI is Human-Centric

Magentic-UI’s defining feature is its human-centric approach. Unlike traditional agents, it involves the user at every stage of task execution instead of simply pursuing complete automation.

Traditional agents often aim for autonomous task completion, emphasizing machine independence and automation. Users may not fully understand the agent’s specific operations and decision-making processes, and it can be difficult to intervene and correct issues in a timely manner if they arise.

In contrast, Magentic-UI adopts a human-computer collaboration model that treats the user as an active participant. It completes tasks in close collaboration with users, who can control the agent’s behavior in real time, adjusting and guiding it as needed.

During the planning phase, Magentic-UI engages in collaborative planning with the user. It doesn’t directly formulate a task plan based on preset programs or algorithms. Instead, it communicates with the user to understand their needs and expectations, then generates a preliminary step-by-step plan. Users can modify this plan directly in a plan editor or by giving text feedback.

Users can add, delete, reorder, or even rewrite certain steps in the plan based on their experience and understanding of the task, ensuring the plan better meets actual requirements. This collaborative planning method enables users to integrate their expertise and experience into the task plan, thereby improving the quality and efficiency of task completion.
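The four editing operations named above (add, delete, reorder, rewrite) map naturally onto an ordered list of steps. The following is a minimal sketch only: the `Plan` class, its method names, and the example steps are hypothetical and are not taken from Magentic-UI’s code.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical editable plan; not Magentic-UI's actual data model."""

    steps: list[str] = field(default_factory=list)

    def add(self, index: int, text: str) -> None:
        self.steps.insert(index, text)

    def delete(self, index: int) -> None:
        del self.steps[index]

    def move(self, src: int, dst: int) -> None:
        # Remove the step at `src`, then reinsert it at position `dst`.
        self.steps.insert(dst, self.steps.pop(src))

    def rewrite(self, index: int, text: str) -> None:
        self.steps[index] = text


# A preliminary plan as the agent might propose it (invented example).
plan = Plan([
    "Search flights to Seattle",
    "Open the airline site",
    "Subscribe to the newsletter",
    "Book the cheapest flight",
])

plan.move(1, 0)                                    # reorder: open the site first
plan.delete(2)                                     # delete: drop the newsletter step
plan.rewrite(2, "List the three cheapest flights") # rewrite: do not book yet
plan.add(3, "Summarize the options in a table")    # add: a final reporting step
```

After these edits the plan reads: open the site, search, list the three cheapest flights, summarize. The user has turned a plan that would have made a purchase into one that only gathers options.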

During task execution, Magentic-UI likewise works alongside the user. It tells the user what it is about to do, such as which button it will click, what content it will input, or which webpage it will visit, and it reports what it observes on each webpage as it goes.

Users can pause the agent’s operations at any time, provide feedback to the agent through natural language, point out issues, offer suggestions, or make corrections. They can even directly take over browser operations, complete certain steps themselves, and then hand control back to the agent. This collaborative execution allows users to promptly identify and resolve potential issues that may arise during the agent’s execution, preventing task failures or undesirable outcomes due to incorrect agent operations.
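One way to picture the pause, feedback, and take-over behaviour described above is as a small state machine in which the agent may act only while it holds control. This is an illustrative sketch under our own assumptions; `Mode`, `Session`, and every method name are hypothetical and do not come from Magentic-UI.

```python
from enum import Enum, auto


class Mode(Enum):
    AGENT_RUNNING = auto()  # agent drives the browser
    PAUSED = auto()         # agent halted, waiting for the user
    USER_CONTROL = auto()   # user drives the browser directly


class Session:
    """Hypothetical control hand-off between user and agent."""

    def __init__(self) -> None:
        self.mode = Mode.AGENT_RUNNING
        self.feedback: list[str] = []

    def pause(self) -> None:
        self.mode = Mode.PAUSED

    def give_feedback(self, text: str) -> None:
        # Natural-language notes for the agent to read before continuing.
        self.feedback.append(text)

    def take_over(self) -> None:
        self.mode = Mode.USER_CONTROL

    def hand_back(self) -> None:
        self.mode = Mode.AGENT_RUNNING

    def agent_may_act(self) -> bool:
        return self.mode is Mode.AGENT_RUNNING


session = Session()
session.pause()
session.give_feedback("Wrong date selected; use next Friday.")
session.take_over()                 # user fixes the date picker by hand
blocked = session.agent_may_act()   # False while the user is in control
session.hand_back()
resumed = session.agent_may_act()   # True once control is returned
```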

Magentic-UI also features an “action guards” mechanism, which asks for the user’s permission before performing potentially irreversible operations. These operations may include closing tabs, clicking buttons with side effects, or submitting forms.

Users decide whether to allow each of these actions, which avoids the risk of the agent acting blindly. Magentic-UI also runs the browser and the code executor in isolated sandbox environments, which further limits the damage a misbehaving agent could do.
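The permission check described above can be sketched as a wrapper that consults the user before any action on a guarded list. This is a sketch under stated assumptions: the action labels, `guarded_execute`, and the `ask_user` callback are hypothetical, and the article does not say how Magentic-UI decides which actions count as irreversible.

```python
from typing import Callable

# Assumed labels for actions treated as potentially irreversible.
GUARDED_ACTIONS = {"close_tab", "submit_form", "click_side_effect_button"}


def guarded_execute(
    action: str,
    run: Callable[[], str],
    ask_user: Callable[[str], bool],
) -> str:
    """Run `run` only if the action is unguarded or the user approves it."""
    if action in GUARDED_ACTIONS and not ask_user(action):
        return f"skipped: {action}"
    return run()


def deny_all(action: str) -> bool:
    # Stand-in for a real approval prompt: the user rejects everything.
    return False


scrolled = guarded_execute("scroll_down", lambda: "done: scroll_down", deny_all)
submitted = guarded_execute("submit_form", lambda: "done: submit_form", deny_all)
```

With a user who rejects every request, the harmless scroll still runs while the form submission is skipped, so the guard only interrupts the flow where an action might be hard to undo.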

Magentic-UI Framework Overview

When a user submits a task to Magentic-UI, the input can be a simple text command or a more complex request with accompanying images. Magentic-UI’s core component, the Orchestrator, uses its underlying large language model (LLM) to generate a preliminary step-by-step plan from that input. The plan details the steps required to complete the task, including webpages to visit, actions to perform, and other tools that may need to be invoked.

After generating the preliminary plan, Magentic-UI doesn’t immediately begin execution. Instead, it enters a crucial collaborative planning phase. In this stage, users can directly modify the plan generated by Magentic-UI through an intuitive plan editing interface. Users can add, delete, or adjust steps in the plan, or even completely rewrite certain steps.

Magentic-UI responds to the user’s suggested changes in real time and adjusts the plan accordingly. This process lets users build their expertise and expectations into the task plan, improving the accuracy and efficiency of task completion.

Once the user has confirmed or modified the plan, it moves to the execution phase. Magentic-UI’s execution process is highly transparent and collaborative. The system shows the user in real time the specific actions it is about to take, for example clicking buttons, entering search terms, or visiting specific web pages.

At the same time, Magentic-UI also provides real-time feedback to the user on the information it observes on the webpage. Users can pause Magentic-UI’s operations at any time and provide feedback through natural language, pointing out issues or offering suggestions. If users believe a certain step requires manual operation, they can even directly take over browser operations, complete specific steps, and then hand control back to Magentic-UI.

Another important feature of Magentic-UI is plan learning. After completing a task, it can learn from user feedback and from the execution itself, saving the step-by-step plan to a plan library.

In future tasks, when users input similar tasks, Magentic-UI can quickly retrieve and invoke the corresponding plans, significantly improving task execution efficiency. Furthermore, users can view and modify saved plans at any time, making adjustments and optimizations as needed to better handle different task scenarios.
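The save-and-retrieve behaviour described above could look roughly like the sketch below. The article does not say how Magentic-UI matches a new task to a saved plan, so the word-overlap (Jaccard) similarity, the 0.5 threshold, and the `PlanLibrary` class are all our own assumptions, chosen only to make the idea concrete.

```python
from typing import Optional


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; an assumed stand-in for real matching."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


class PlanLibrary:
    """Hypothetical store of saved plans, keyed by task description."""

    def __init__(self) -> None:
        self._plans: dict[str, list[str]] = {}

    def save(self, task: str, steps: list[str]) -> None:
        self._plans[task] = list(steps)

    def retrieve(self, task: str, threshold: float = 0.5) -> Optional[list[str]]:
        # Return a copy of the most similar saved plan, if it is similar enough.
        best_task, best_score = None, 0.0
        for saved_task in self._plans:
            score = jaccard(task, saved_task)
            if score > best_score:
                best_task, best_score = saved_task, score
        if best_task is None or best_score < threshold:
            return None
        return list(self._plans[best_task])


library = PlanLibrary()
library.save(
    "find cheapest flight to Seattle",
    ["Open the airline site", "Search flights", "Sort by price"],
)

reused = library.retrieve("find cheapest flight to Boston")  # 4 of 6 words shared
missing = library.retrieve("order a pizza")                  # no overlap
```

Returning a copy instead of the stored list means the user can edit a retrieved plan for the new task without silently changing the saved version.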
