Kimi Agent

Back

Imagining an agentic future

Role

Design Owner

Role

Design Owner

Team

1 product manager

1 designer

Kimi Engineering Team

Kimi RL Team

Team

1 product manager

1 designer

Kimi Engineering Team

Kimi RL Team

Skills

Design System

Competitive Analysis

Prototyping

Usability Testing

Agent Interaction Design

Skills

Design System

Competitive Analysis

Prototyping

Usability Testing

Agent Interaction Design

Duration

August 2024 - April 2025

Duration

August 2024 - April 2025

The Agentic Leap

The buzz around “agents” or “agentic systems” signals a significant shift in the AI industry. While powerful, traditional LLMs often act as passive information processors, unable to directly act on a user’s behalf. At Moonshot AI, we see agents as an inevitable step towards an AGI future, precisely because they bridge this gap.

Agents move beyond passive knowledge retrieval to active task completion by understanding context, reasoning, planning, and executing actions to achieve complex goals. This potential creates tangible value in specific, design-critical scenarios.

Targeting Value Through Design

AI agents excel in two key scenarios. First, they can tackle tasks humans simply don’t want to do - the repetitive, time-consuming processes. Functioning as intelligent automation, they handle multi-step workflows efficiently, often in the background.

The design priority here is providing reliable oversight and clear status visibility. Users need confidence that the agent is performing correctly without constant supervision. This can be achieved by transparent action logs, clear progress indicators, and robust controls for starting, stopping, and managing error cases.

Second, agents assist with tasks humans don’t know how to do optimally, acting as intelligent guides or collaborators on complex goals.

For these agents, the core design challenge shifts towards building user trust and enabling effective collaboration. This requires deep transparency into the agent’s reasoning (its CoT) and tool use. A collaborator design should also provide controls that allow users to intervene, provide clarification, and guide the process, eventually balancing agent autonomy with user partnership.

Ultimately, addressing these distinct design priorities thoughtfully is key to guiding users on a journey to build robust trust in agentic systems, from initial interaction to deep collaboration.

Case Study 1: The Kimi Browser Agent

Why Browser Agent as my first step?

My exploration into agentic design began with the Kimi Browser Agent, building directly on the success of Kimi’s popular Chrome browser extension. While the extension excelled at explaining webpage content or summarizing articles, I envisioned an agent capable of much more:

Expanding Access to Information: While Search APIs provide broad coverage, many valuable information sources remain less accessible - think dynamic social media feeds and content behind logins and CAPTCHAs. A Browser Agent, by directly interacting with web pages like a human, could tap into these previously hard-to-reach information wells.

Enabling True Task Completion: More profoundly, the Browser Agent allowed users to delegate actual tasks, moving beyond simple information retrieval. This meant Kimi could not only summarize an article about ergonomic office setups but also actively help choosing and purchasing those appliances on Amazon.

The Not-Very-Successful First Attempt

My initial design allowed users to switch from “Chat Mode” to “Agent Mode” via the launcher to assign a task. The agent would then operate by opening new tabs within the same Chrome browser window, displaying a status indicator at the bottom.

However, usability testing quickly surfaced critical issues:

Lack of Education: Some users were still asking questions about the website while some others assigning tasks that were out of the scope of a browser. They didn't fully understand what could be a best task for the Browser Agent, even though the participants were told to think about what they typically do with a browser.

Failed Asynchronous Use: Users tended to watch the agent operate rather than working asynchronously as I’d intended. This behavior might be triggered by a mix of curiosity about the novel (and still evolving) agent technology and an initial lack of trust. This undermined my goal of an autonomous, background assistant.

Low Process Visibility & Confusing Help Requests: Users had minimal visibility into the specific steps the agent was performing. This lack of transparency, combined with a technical limitation where the agent’s screen perception couldn’t detect if users had provided requested assistance (like completing a login or filling a form with sensitive payment info), led to significant confusion. Users struggled to understand what help the agent needed or the appropriate moment to click “resume,” severely hampering task completion rates.

Improving the Usability of the Browser Agent

Subsequent design iterations aimed to directly address the issues identified in the initial usability tests.

For education and promoting async use:

First, the Browser Agent was given a dedicated homepage accessible from the launcher, allowing users to start new tasks, view examples, or check task history. Tasks now run in a separate, dedicated Chrome window. This design choice reinforce the concept of an autonomous, background assistant and, practically, reduced task failures caused by users accidentally closing the agent’s working tabs.

For process visibility and control:

Second, I introduced a refined status bar at the bottom of the agent window, paired with an action log displayed in a sidebar. This combination clearly communicated the agent’s current status, the steps it had taken, and provided improved user controls. This transparency aimed to build user confidence, reassuring them that they didn't need to constantly supervise the agent but could easily review progress or provide help when needed.

For better help requests:

This enhanced interface also made it much easier for users to provide assistance. With the status indicator, action log, and a clear view of the live webpage, users could quickly understand where the agent got stuck and provide the necessary input.

For promoting async use and better help requests:

Third, if the extension detected that the user was not active in the agent’s dedicated browser window (which is great, this indicates a successful parallel workflow), it would display a minimal, non-intrusive status indicator in the corner of the user's active window. This ensured that if the agent completed its task, encountered an error, or needed help, the user would be immediately notified and could quickly switch to the agent’s window. This feature dramatically improved user responsiveness: in usability tests, 83% of help requests were noticed and resolved before timeout, a significant increase from 49% with the initial design.

Case Study 2: The Kimi Researcher Agent

Navigating Complexity for Deeper Insights

I designed Kimi’s Researcher Agent for complex, in-depth investigations, guided by the principle that intelligence truly shines when knowledge is purposefully put into action. Unlike simpler queries, this agent can leverage a suite of tools — Search APIs, Browser Interaction, Code execution, and more — powered by multiple cycles of reasoning to select appropriate tools and synthesize information.

This deeper approach naturally produced far richer, more comprehensive outputs, often substantial enough to be considered standalone “deliverables” rather than just chat replies. These capabilities, while powerful, introduced unique UX complexities around managing longer task times, visualizing intricate processes, and presenting these rich outputs effectively.

Three Core Design Questions Guiding My Exploration

How should human's input bandwidth be improved? This is a core dilemma in human-AI interaction, especially for chatbot products: humans have a relatively low input bandwidth (around 40 words per minutes on average), far below a LLM can process. While LLMs need more detailed and highly structured input for high quality output, people simply don't want to write an essay for each prompt.

How should the research process (Chain-of-Thought, tool calls) be presented? While some user feedback on earlier Kimi reasoning models questioned the purpose of showing detailed CoT, for the Researcher Agent, I believed showcasing the process — including tool interactions like web browse or Python code execution — is key. It provides transparency, building trust in both the methodology and the results. Furthermore, observing the process demonstrates Kimi’s capabilities, and the “labor illusion”—the perceived effort—can enhance the output’s value.

How should the final research results be delivered? While an in-chat markdown report is familiar, presenting the report as a standalone artifact offers distinct advantages. An artifact, opened in a sidebar, allows for a more immersive viewing and export experience for very long outputs. It also creates the possibility of incorporating direct editing with Kimi’s assistance or supporting diverse output types, like interactive webpages.

To address these considerations, I explored three primary design directions:

Direction 1: Familiar In-Chat Experience
This concept prioritized consistency with existing Kimi interactions, placing both the research process and the final report in the main chat stream. Browsed web content was accessible via a sidebar. This aimed to brand “Researcher” as an advanced mode offering comprehensive output at the cost of more time, rather than an entirely new feature.

Direction 2: Process in Sidebar, Report in Chat
To declutter the main chat for conversation and final report, this design moved the detailed research process (CoT, tool logs) to an optional sidebar. The sidebar’s “Research” tab would switch to “References” after the report was generated.

Direction 3: Process in Chat, Live Tool Output in Sidebar
This approach showed the main research process in the chat, while a dynamic half-screen sidebar provided a live view of tool outputs, such as active web browse or code execution.

After extensive rounds of usability testing and interviews with professional users — including bankers, lawyers, engineers, and students tackling final projects — I built an iterated version based on their valuable feedback.

Increased Human Supervision at a Low Cost

After users assign a research topic, the Researcher Agent will ask follow-up questions to proactively seek for clarification about the scale, the focus, and the type of deliverable. Users can type in a few words, or let the agent decide on its own with one click.

This additional, guided input, although could be really simple, made significant contribution to the research. When we presented the research report to our pro users in legal and health domain, 40% of them agreed that the agent's planning, the research relevance, and the quality of the report were greatly improved.

Balancing Transparency, Focus, and Rich Interaction

The final design integrated research progress display, live tool interaction, and result presentation within a cohesive interface, primarily utilizing a dynamic sidebar.

Research progress is shown in this sidebar, with inactive steps automatically collapsing. This provides a clean overview of the agent's path without overwhelming users with an overly detailed action log, while still offering a clear sense of the steps taken. A key change here is the “Workbench” concept within the lower part of this sidebar. The Workbench displays live, detailed insights into the agent's current tool usage. Users can even drill down into specific actions; for example, when the agent is using the browser, arrow buttons allow users to see individual steps like scrolling, clicking, or media interactions.

A core design principle was to maintain a single visual focal point at any given time: the user’s attention is directed either to the agent's reasoning (CoT), the live tool activity in the Workbench, or, once completed, the final report. This prevents cognitive overload.

The research report itself is delivered as a markdown file, which automatically opens in the sidebar, replacing the research progress view, once the agent begins outputting it. This artifact-based approach significantly improves the reading experience for long-form content, charts, or HTML-rendered outputs. More importantly, it enables a powerful interactive editing loop: users can select text or code within the artifact and ask Kimi to make changes (e.g., "expand," "shorten," "adjust"), with Kimi then generating a new version of the file. This offers far greater flexibility and capability than a static chat message, and users can easily download the report or share the webpage.

This design, despite introducing engineering complexity, was highly endorsed by our professional users during testing. It also proved highly scalable structure to accommodate a growing array of new tools and capabilities the agent might use in the future (such as MCP).

Designing for Trust and Value in AI Agent

My work designing Kimi's very first agentic systems was an insightful, iterative exploration into the evolving landscape of human-AI interaction. It reinforced my conviction that creating truly effective agentic systems isn't just about harnessing technical capability, but about a deep commitment to human-centered design.

This involves thoughtfully balancing robust agent autonomy with clear user oversight, demystifying underlying complexity through transparent operations, and always prioritizing the user's sense of control and the tangible value delivered in every interaction.

Essentially, by designing AI agents, we are building a new type of partnership, instead of a new, sophisticated hammer.

To read more about how Kimi Researcher was trained, here is an article from the team.