How Trae Achieves 68.2% on SWE-bench Verified

May 13, 2025 | By Dr. Pengfei Gao

1. Basic Configuration

We provided the Agent with the following four tools:

str_replace_editor: Enables the Agent to browse files, edit code, etc.
Bash: Allows the Agent to execute arbitrary shell commands.
ckg_tools: Builds a Code Knowledge Graph (CKG) for the code repository, enabling the Agent to efficiently perform search_class and search_function operations.
sequential_thinking_tool: Facilitates step-by-step reasoning for the Agent.
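The post does not describe how ckg_tools builds its graph. The sketch below is a minimal Python illustration, assuming the repository is Python source. Nodes are class and function definitions, and each node records its enclosing definition as a "contains" edge. search_class and search_function are then lookups by name. The class names, fields, and parsing strategy are hypothetical, not Trae's implementation.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    """One node in the graph: a class or function definition (hypothetical)."""
    name: str
    kind: str                # "class" or "function"
    path: str
    lineno: int
    parent: Optional[str]    # enclosing definition, i.e. a "contains" edge


@dataclass
class CodeKnowledgeGraph:
    """Hypothetical minimal CKG: definition nodes indexed by name."""
    by_name: dict[str, list[Symbol]] = field(default_factory=dict)

    def add_source(self, path: str, source: str) -> None:
        """Parse one file and register every class and function it defines."""
        self._visit(ast.parse(source, filename=path), path, parent=None)

    def _visit(self, node: ast.AST, path: str, parent: Optional[str]) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                kind = "class"
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "function"
            else:
                # Not a definition: keep descending (e.g. defs inside if-blocks).
                self._visit(child, path, parent)
                continue
            sym = Symbol(child.name, kind, path, child.lineno, parent)
            self.by_name.setdefault(child.name, []).append(sym)
            # Definitions nested inside this one record it as their parent.
            self._visit(child, path, child.name)

    def search_class(self, name: str) -> list[Symbol]:
        return [s for s in self.by_name.get(name, []) if s.kind == "class"]

    def search_function(self, name: str) -> list[Symbol]:
        return [s for s in self.by_name.get(name, []) if s.kind == "function"]


if __name__ == "__main__":
    ckg = CodeKnowledgeGraph()
    ckg.add_source(
        "pkg/models.py",
        "class Model:\n    def save(self):\n        pass\n\ndef save(obj):\n    pass\n",
    )
    for hit in ckg.search_function("save"):
        print(hit.path, hit.lineno, hit.parent)
```

Running the example prints two hits for save: the method on line 2 (parent Model) and the module-level function on line 5 (no parent). A name lookup like this returns every definition with that name, which is cheaper for an agent than grepping the repository.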

The success rate for solving tasks in a single run ranged from 60.6% to 62.8%.

2. Patch Selection

We ran Trae five times in parallel as independent solving attempts. Patch selection was based on the ensembler from Augment's SWE-bench agent. In addition, we integrated the regression-testing module of Agentless to filter out patches that did not pass all regression tests before making the final selection.
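The regression filter can be sketched as follows. This Python sketch assumes each attempt's patch has already been run against the regression tests and that the names of the failing tests were recorded. The Candidate structure is hypothetical, and so is the fallback when no patch passes every test (keep the patches with the fewest failures). The post does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One attempt's patch plus its regression-test outcome (hypothetical structure)."""
    attempt_id: int
    diff: str
    failed_tests: frozenset[str]  # regression tests that failed with this patch applied


def filter_by_regression(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only patches that pass every regression test.

    If no patch passes them all, fall back to the patches with the fewest
    failures (an assumption of this sketch, not documented Trae behaviour).
    """
    if not candidates:
        return []
    clean = [c for c in candidates if not c.failed_tests]
    if clean:
        return clean
    fewest = min(len(c.failed_tests) for c in candidates)
    return [c for c in candidates if len(c.failed_tests) == fewest]
```

With five attempts where only attempts 1 and 3 pass every regression test, the filter returns those two patches. The final selector then chooses between them.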

We then used OpenAI o1 to select the single patch most likely to be correct.
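The post does not give the selection prompt. The following is only a sketch of how a single-patch choice could be requested from a model and read back. It numbers the candidates, asks for an "ANSWER: n" line, and parses the last such line. ask_model is a hypothetical callable (prompt in, reply text out) standing in for whatever client wraps o1. The prompt wording and the fall-back-to-first-patch rule are assumptions.

```python
import re
from typing import Callable, Optional


def build_selection_prompt(issue: str, patches: list[str]) -> str:
    """Number the candidate patches so the model can answer with a single index."""
    parts = [
        "You are reviewing candidate fixes for the issue below.",
        f"Issue:\n{issue}",
    ]
    for i, patch in enumerate(patches):
        parts.append(f"--- Patch {i} ---\n{patch}")
    parts.append(
        "Reply with 'ANSWER: <n>', where <n> is the number of the single "
        "patch most likely to resolve the issue correctly."
    )
    return "\n\n".join(parts)


def parse_choice(reply: str, n_patches: int) -> Optional[int]:
    """Take the last 'ANSWER: n' in the reply; None if absent or out of range."""
    matches = re.findall(r"ANSWER:\s*(\d+)", reply)
    if not matches:
        return None
    choice = int(matches[-1])
    return choice if 0 <= choice < n_patches else None


def select_patch(issue: str, patches: list[str], ask_model: Callable[[str], str]) -> str:
    """Ask a selector model (hypothetical `ask_model`) to pick exactly one patch.

    Falls back to the first patch if the reply cannot be parsed (assumption).
    """
    if not patches:
        raise ValueError("no candidate patches")
    if len(patches) == 1:
        return patches[0]  # nothing to choose between; skip the model call
    reply = ask_model(build_selection_prompt(issue, patches))
    choice = parse_choice(reply, len(patches))
    return patches[choice if choice is not None else 0]
```

Taking the last "ANSWER:" line tolerates a model that reasons aloud before committing to a choice. Keeping the model behind a plain callable means the parsing logic can be tested with a stub reply.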

Although the best single run resolved 62.8% of tasks, the selection process raised the overall success rate to 68.2%.
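Selection can beat the best single run only because different runs fail on different tasks. On SWE-bench Verified's 500 tasks, 62.8% corresponds to 314 resolved tasks and 68.2% to 341. No selector can exceed the share of tasks that at least one of the five runs solved. The small Python sketch below computes both quantities from a per-run, per-task result matrix. The toy data is made up for illustration and is not Trae's.

```python
def single_run_rates(results: list[list[bool]]) -> list[float]:
    """results[r][t] is True if run r solved task t; return each run's success rate."""
    return [sum(run) / len(run) for run in results]


def oracle_rate(results: list[list[bool]]) -> float:
    """Fraction of tasks solved by at least one run: the ceiling for any selector."""
    n_tasks = len(results[0])
    solved = sum(any(run[t] for run in results) for t in range(n_tasks))
    return solved / n_tasks


if __name__ == "__main__":
    # Toy example: three runs over five tasks, each run solving three tasks.
    toy = [
        [True, True, False, False, True],
        [True, False, True, False, True],
        [False, True, True, False, True],
    ]
    print(single_run_rates(toy))  # each run: 3/5
    print(oracle_rate(toy))       # 4/5: only task 3 is unsolved by every run
```

In the toy data every run scores 60%, yet a perfect selector could reach 80%. A real selector lands somewhere between the best single run and this ceiling.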

3. Future Work

Our future work will focus on:

Improving single-run success rates: Exploring strategies to enhance the Agent's performance in a single solving attempt.
Expanding the sampling space: Investigating whether a larger sampling space enables the model to find more correct solutions.

Contributions

Contributors: Pengfei Gao, Zhao Tian and Xiangxin Meng
Project Lead: Chao Peng

Meet Us at FSE

We are attending FSE 2025, presenting our paper, AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions, and organising an AI-IDE workshop. You can find us at our booth near the registration area and at the workshop on June 27th.

About Trae

Trae (/treɪ/) IDE is your helpful coding partner. It offers features like AI Q&A, code auto-completion, and agent-based AI programming capabilities. When developing projects with Trae, you can collaborate with AI to enhance your development efficiency.