MarsCode Agent ranks 1st on SWE-bench Lite

Oct 23, 2024

We are delighted to announce that MarsCode Agent ranks 1st on SWE-bench Lite, a benchmark that evaluates large language models and agents on solving real-world GitHub issues.


Recent advances in large language models (LLMs) have shown significant potential to automate various software development tasks, including code completion, test generation, and bug fixing. However, the application of LLMs for automated bug fixing remains challenging due to the complexity and diversity of real-world software systems.

We introduce MarsCode Agent, a novel framework that leverages LLMs to automatically identify and repair bugs in software code. MarsCode Agent combines LLMs with advanced code analysis techniques to accurately localize faults and generate patches. Our approach follows a systematic process of planning, bug reproduction, fault localization, candidate patch generation, and validation to ensure high-quality bug fixes. We evaluated MarsCode Agent on SWE-bench, a comprehensive benchmark of real-world software projects, and our results show that it achieves a higher bug-fixing success rate than most existing automated approaches.
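To make the five-stage process above more concrete, here is a minimal sketch of how such stages could be chained. It is an illustration only, not MarsCode Agent's actual implementation: `RepairState`, `run_pipeline`, and every stage callable are hypothetical names invented for this example, and the stages are toy stubs rather than real LLM or code-analysis calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RepairState:
    """Hypothetical container passed between stages (an assumption, not from the report)."""
    issue: str
    plan: List[str] = field(default_factory=list)
    reproduced: bool = False
    suspect_locations: List[str] = field(default_factory=list)
    candidate_patches: List[str] = field(default_factory=list)
    accepted_patch: Optional[str] = None


def run_pipeline(
    issue: str,
    make_plan: Callable[[str], List[str]],
    reproduce: Callable[[str], bool],
    localize: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], List[str]],
    validate: Callable[[str], bool],
) -> RepairState:
    """Run the stages in the order named in the text: planning, bug
    reproduction, fault localization, candidate patch generation, validation."""
    state = RepairState(issue=issue)
    state.plan = make_plan(issue)
    state.reproduced = reproduce(issue)
    state.suspect_locations = localize(issue)
    state.candidate_patches = generate(issue, state.suspect_locations)
    # Accept the first candidate that passes validation, if any.
    for patch in state.candidate_patches:
        if validate(patch):
            state.accepted_patch = patch
            break
    return state


if __name__ == "__main__":
    # Toy stubs standing in for real stage implementations.
    result = run_pipeline(
        issue="divide() crashes on zero",
        make_plan=lambda issue: ["reproduce", "localize", "patch", "validate"],
        reproduce=lambda issue: True,
        localize=lambda issue: ["calc.py:divide"],
        generate=lambda issue, locations: ["patch-a", "patch-b"],
        validate=lambda patch: patch == "patch-b",
    )
    print(result.accepted_patch)  # prints "patch-b"
```

Passing each stage in as a callable keeps the control flow separate from how any single stage is implemented, so a stage can be swapped or stubbed in tests. How the real system orders, repeats, or skips stages is described in the technical report, not here.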

Check out our technical report at MarsCode Agent: AI-native Automated Bug Fixing.