<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://debianws.lexgopc.com/wiki143/index.php?action=history&amp;feed=atom&amp;title=Agent-oriented_software_engineering</id>
	<title>Agent-oriented software engineering - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://debianws.lexgopc.com/wiki143/index.php?action=history&amp;feed=atom&amp;title=Agent-oriented_software_engineering"/>
	<link rel="alternate" type="text/html" href="http://debianws.lexgopc.com/wiki143/index.php?title=Agent-oriented_software_engineering&amp;action=history"/>
	<updated>2026-05-04T16:46:07Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>http://debianws.lexgopc.com/wiki143/index.php?title=Agent-oriented_software_engineering&amp;diff=4712721&amp;oldid=prev</id>
		<title>imported&gt;Maxeto0910 at 19:12, 1 January 2025</title>
		<link rel="alternate" type="text/html" href="http://debianws.lexgopc.com/wiki143/index.php?title=Agent-oriented_software_engineering&amp;diff=4712721&amp;oldid=prev"/>
		<updated>2025-01-01T19:12:15Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{Short description|Software}}&lt;br /&gt;
{{Multiple issues|&lt;br /&gt;
{{essay-like|date=December 2008}}&lt;br /&gt;
{{No footnotes|date=April 2009}}&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Agent-oriented software engineering&amp;#039;&amp;#039;&amp;#039; (&amp;#039;&amp;#039;&amp;#039;AOSE&amp;#039;&amp;#039;&amp;#039;) is a software engineering [[paradigm]] that arose to apply best practice in the development of complex [[Multi-agent systems|multi-agent systems]] (MAS) by using agents, and organizations (communities) of agents, as the main abstractions. The field of [[Product Family Engineering|software product line]]s (SPL) covers the entire [[software]] development lifecycle needed to develop a family of products in which concrete products are derived systematically and rapidly.&lt;br /&gt;
&lt;br /&gt;
==Commentary==&lt;br /&gt;
With the advent of biologically inspired, pervasive, and [[autonomic computing]], the advantages of, and need for, agent-based technologies and MASs have become apparent{{Citation needed|date=December 2008}}. However, current AOSE methodologies are dedicated to developing single MASs, even though many MASs make use of substantially the same techniques, adaptations, and approaches. The field is thus ripe for exploiting the benefits of SPL, such as reduced costs and improved time-to-market, and for enhancing agent technology so that it is more industrially applicable.&lt;br /&gt;
&lt;br /&gt;
Multiagent Systems Product Lines (MAS-PL) is a research field devoted to combining the two approaches by applying the SPL philosophy to building MASs. The aim is to bring the advantages of SPLs to MAS development and make it more practical.&lt;br /&gt;
&lt;br /&gt;
==Benchmarks==&lt;br /&gt;
Several benchmarks have been developed to evaluate the capabilities of AI coding agents and large language models on software engineering and related tasks. Some of them are listed below:&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Agentic software engineering benchmarks&lt;br /&gt;
! Benchmark !! Description&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.swebench.com/ SWE-bench]&lt;br /&gt;
| Assesses the ability of AI models to solve real-world software engineering issues sourced from GitHub repositories. The benchmark involves:&lt;br /&gt;
* Providing agents with a code repository and issue description&lt;br /&gt;
* Challenging them to generate a patch that resolves the described problem&lt;br /&gt;
* Evaluating the generated patch against unit tests&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/snap-stanford/MLAgentBench ML-Agent-Bench]&lt;br /&gt;
| Designed to evaluate AI agent performance on machine learning tasks&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/sierra-research/tau-bench τ-Bench]&lt;br /&gt;
| τ-Bench is a benchmark developed by Sierra AI to evaluate AI agent performance and reliability in real-world settings. It focuses on:&lt;br /&gt;
* Testing agents on complex tasks with dynamic user and tool interactions&lt;br /&gt;
* Assessing the ability to follow domain-specific policies&lt;br /&gt;
* Measuring consistency and reliability at scale&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/web-arena-x/webarena WebArena]&lt;br /&gt;
| Evaluates AI agents in a simulated web environment. The benchmark tasks include:&lt;br /&gt;
* Navigating complex websites to complete user-driven tasks&lt;br /&gt;
* Extracting relevant information from the web&lt;br /&gt;
* Testing the adaptability of agents to diverse web-based challenges&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/THUDM/AgentBench AgentBench]&lt;br /&gt;
| A benchmark designed to assess large language models acting as agents across several interactive environments. The environments include:&lt;br /&gt;
* Operating system and database interaction&lt;br /&gt;
* Knowledge graph querying, games, and puzzles&lt;br /&gt;
* Web shopping and web browsing&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/aryopg/mmlu-redux MMLU-Redux]&lt;br /&gt;
| A manually re-annotated subset of the MMLU benchmark, built to identify and correct errors in the original questions while still evaluating AI models across a broad range of academic subjects and domains. It measures:&lt;br /&gt;
* Subject matter expertise across multiple disciplines&lt;br /&gt;
* Ability to handle complex problem-solving tasks&lt;br /&gt;
* Consistency in providing accurate answers across topics&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/MCEVAL/McEval McEval]&lt;br /&gt;
| A multilingual coding benchmark designed to test AI models&amp;#039; ability to solve programming tasks across many programming languages. The benchmark evaluates:&lt;br /&gt;
* Code correctness and efficiency&lt;br /&gt;
* Ability to handle diverse programming languages&lt;br /&gt;
* Performance across different coding paradigms and tasks&lt;br /&gt;
|-&lt;br /&gt;
| [https://csbench.github.io/ CS-Bench]&lt;br /&gt;
| A specialized benchmark for evaluating AI performance in computer science-related tasks. The key focus areas include:&lt;br /&gt;
* Algorithms and data structures&lt;br /&gt;
* Computational complexity and optimization&lt;br /&gt;
* Theoretical and applied computer science concepts&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/allenai/WildBench WildBench]&lt;br /&gt;
| Evaluates language models on challenging tasks drawn from real user queries collected in the wild. It emphasizes:&lt;br /&gt;
* Tasks selected from real human-chatbot conversation logs&lt;br /&gt;
* Handling open-ended and loosely specified requests&lt;br /&gt;
* Automated, checklist-based scoring of model responses&lt;br /&gt;
|-&lt;br /&gt;
| [https://huggingface.co/datasets/baharef/ToT Test of Time]&lt;br /&gt;
| A benchmark that focuses on evaluating AI models&amp;#039; ability to reason about temporal sequences and events over time. It assesses:&lt;br /&gt;
* Understanding of temporal logic and sequence prediction&lt;br /&gt;
* Ability to make decisions based on time-dependent data&lt;br /&gt;
* Performance in tasks requiring long-term planning and foresight&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Software engineering agent systems ==&lt;br /&gt;
&lt;br /&gt;
There are several software engineering (SWE) agent systems in development. Here are some examples:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ List of SWE Agent Systems&lt;br /&gt;
! SWE Agent System !! Backend LLM&lt;br /&gt;
|-&lt;br /&gt;
| [https://salesforce-research-dei-agents.github.io/ Salesforce Research DEIBASE-1] || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| [https://cosine.sh/ Cosine Genie] || Fine-tuned OpenAI GPT&lt;br /&gt;
|-&lt;br /&gt;
| [https://aide.dev/ CodeStory Aide] || GPT-4o + Claude 3.5 Sonnet&lt;br /&gt;
|-&lt;br /&gt;
| [https://mentat.ai/blog/mentatbot-sota-coding-agent AbanteAI MentatBot] || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| Salesforce Research DEIBASE-2 || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| Salesforce Research DEI-Open || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.marscode.com/ Bytedance MarsCode] || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| [https://arxiv.org/abs/2406.01422 Alibaba Lingma] || gpt-4-1106-preview&lt;br /&gt;
|-&lt;br /&gt;
| [https://www.factory.ai/ Factory Code Droid] || Anthropic + OpenAI&lt;br /&gt;
|-&lt;br /&gt;
| [https://autocoderover.dev/ AutoCodeRover] || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| [https://aws.amazon.com/q/developer/ Amazon Q Developer] || (unknown)&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/NL2Code/CodeR CodeR] || gpt-4-1106-preview&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/masai-dev-agent/masai MASAI] || (unknown)&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240706_sima_gpt4o SIMA] || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/OpenAutoCoder/Agentless Agentless] || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/aorwall/moatless-tools Moatless Tools] || Claude 3.5 Sonnet&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240612_IBM_Research_Agent101 IBM Research Agent] || (unknown)&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/paul-gauthier/aider Aider] || GPT-4o + Claude 3 Opus&lt;br /&gt;
|-&lt;br /&gt;
| [https://docs.all-hands.dev/ OpenDevin + CodeAct] || GPT-4o&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/FSoft-AI4Code/AgileCoder AgileCoder] || (various)&lt;br /&gt;
|-&lt;br /&gt;
| [https://chatdev.ai/ ChatDev] || (unknown)&lt;br /&gt;
|-&lt;br /&gt;
| [https://github.com/geekan/MetaGPT MetaGPT] || GPT-4o&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== External links ==&lt;br /&gt;
* &amp;#039;&amp;#039;Agent-Oriented Software Engineering: Reflections on Architectures, Methodologies, Languages, and Frameworks&amp;#039;&amp;#039; {{ISBN|978-3642544316}}&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
* Michael Winikoff and Lin Padgham. &amp;#039;&amp;#039;Agent Oriented Software Engineering&amp;#039;&amp;#039;. Chapter 15 (pages 695–757) in G. Weiss (Ed.), [http://mitpress.mit.edu/multiagentsystems Multiagent Systems], 2nd edition. MIT Press. {{ISBN|978-0-262-01889-0}} (a survey of the field)&lt;br /&gt;
* Site of the MaCMAS methodology, which applies MAS-PL: https://web.archive.org/web/20100922120209/http://james.eii.us.es/MaCMAS/index.php/Main_Page&lt;br /&gt;
* MAS Product Lines site: https://web.archive.org/web/20140518122645/http://mas-productlines.org/&lt;br /&gt;
* Joaquin Peña, Michael G. Hinchey, and Antonio Ruiz-Cortés. Multiagent system product lines: Challenges and benefits. Communications of the ACM, December 2006, volume 49, issue number 12. {{doi|10.1145/1183236.1183272}}&lt;br /&gt;
* {{cite journal | last1 = Peña | first1 = Joaquin | last2 = Hinchey | first2 = Michael G. | last3 = Resinas | first3 = Manuel | last4 = Sterritt | first4 = Roy | last5 = Rash | first5 = James L. | title = Designing and Managing Evolving Systems using a MAS-Product-Line Approach | doi = 10.1016/j.scico.2006.10.007 | journal = Journal of Science of Computer Programming | year = 2007 | volume =  66| pages =  71–86| url = https://pure.ulster.ac.uk/en/publications/0a91f377-9421-4585-957b-77060a458644 | doi-access = free }}&lt;br /&gt;
* Joaquin Peña, Michael G. Hinchey, Antonio Ruiz-Cortés, and Pablo Trinidad. Building the Core Architecture of a NASA Multiagent System Product Line. In 7th International Workshop on Agent Oriented Software Engineering 2006, page to be published, Hakodate, Japan, May 2006. LNCS. https://doi.org/10.1007%2F978-3-540-70945-9_13&lt;br /&gt;
* Joaquin Peña, Michael G. Hinchey, Manuel Resinas, Roy Sterritt, James L. Rash. Managing the Evolution of an Enterprise Architecture using a MAS-Product-Line Approach. 5th Int. Workshop on System/Software Architectures (IWSSA’06). Nevada, USA. 2006&lt;br /&gt;
* Soe-Tsyr Yuan. MAS Building Environments with Product-Line-Architecture Awareness.&lt;br /&gt;
* [https://web.archive.org/web/20070517214904/http://www.cs.iastate.edu/~dehlinge/publications.html Josh Dehlinger] and [[Robyn Lutz]] have several publications in this field.&lt;br /&gt;
* [https://web.archive.org/web/20091231195122/http://james.eii.us.es/MaCMAS/images/6/69/Current-Research-MAS-PL-TF4-Lisbon.pdf MAS-PL -- Current research]. In [http://www.irit.fr/ACTIVITES/EQ_SMI/SMAC/TFG4_CFP.html THE FOURTH TECHNICAL FORUM (TF4) of AgentLink]. December 2006.&lt;br /&gt;
&lt;br /&gt;
[[Category:Software project management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{software-eng-stub}}&lt;/div&gt;</summary>
		<author><name>imported&gt;Maxeto0910</name></author>
	</entry>
</feed>