Supporting data for the manuscript "Can LLM-Augmented Autonomous Agents Cooperate? An Evaluation of Their Cooperative Capabilities through Melting Pot"
2024 · Open Access

DOI: https://doi.org/10.5281/zenodo.11221749 · OpenAlex: W4310280911
The repository data corresponds partially to the manuscript titled "Can LLM-Augmented Autonomous Agents Cooperate? An Evaluation of Their Cooperative Capabilities through Melting Pot," submitted to IEEE Transactions on Artificial Intelligence. The dataset comprises experiments conducted with Large Language Model-Augmented Autonomous Agents (LAAs), as implemented in the ["Cooperative Agents" repository](https://github.com/Cooperative-IA/CooperativeGPT/tree/main), using substrates from the Melting Pot framework.

## Dataset Scope

The dataset is divided into two main experiment categories:

- **Personality__experiments**: focus on a single scenario (Commons Harvest) to assess various agent personalities and their cooperative dynamics.
- **Comparison_baselines__experiments**: include three distinct scenarios designed by Melting Pot:
  - Commons Harvest Open
  - Externally Mushrooms
  - Coins

These scenarios evaluate different cooperative and competitive behaviors among agents and are used to compare the decision-making architectures of LAAs against reinforcement learning (RL) baselines. Unlike the Personality__experiments, these comparisons do not involve bots; they exclusively analyze RL and LAA architectures.

## Scenarios and Metrics

The metrics and indicators extracted from the experiments depend on the scenario being evaluated:

- **Commons Harvest Open**
  - Focus: resource consumption and environmental impact.
  - Metrics include the number of apples consumed and the devastation of trees (i.e., depletion of resources).
- **Externally Mushrooms**
  - Focus: self-interest vs. collective benefit. Agents consume mushrooms with different outcomes:
    - mushrooms that benefit the individual;
    - mushrooms that benefit everyone;
    - mushrooms that benefit only others;
    - mushrooms that benefit the individual but penalize others.
  - Metrics evaluate the trade-offs between individual gain and collective welfare.
- **Coins**
  - Focus: reciprocity and fairness. Agents collect coins with two options:
    - collect a coin of their own color for a reward, or
    - collect a coin of a different color, which grants a reward to the agent but penalizes the other.
  - Metrics include reciprocity rates and the balance of mutual benefits.

## Objectives of the Comparison Experiments

The Comparison_baselines__experiments aim to:

- Assess how LAAs compare to RL baselines in cooperative and competitive tasks across diverse scenarios.
- Compare decision-making architectures within LAAs, including chain-of-thought and generative approaches.

These experiments help evaluate the robustness of LAAs in scenarios of varying complexity and social dilemmas, providing insights into their potential applications in real-world cooperative systems.

## Simulation Details (Applicable to All Experiments)

In each simulation:

- **Participants**: experiments involve predefined numbers of LAAs or RL agents. No bots are included in Comparison_baselines__experiments.
- **Action dynamics**: each agent performs high-level actions sequentially. Simulations conclude either after reaching a preset maximum number of rounds (typically 100) or prematurely if the scenario's resources are fully depleted.
- **Metrics and indicators**: extracted metrics depend on the scenario and include measures of individual performance, collective outcomes, and agent reciprocity.

This repository enables reproducibility and serves as a benchmark for advancing research into cooperative and competitive behaviors in LLM-based agents.
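As a minimal illustration of the simulation loop described above (sequential high-level actions, a round cap, and early termination on resource depletion), the following Python sketch uses hypothetical stand-ins; `CommonsHarvest`, `run_simulation`, and the greedy policy are illustrative only and are not the repository's actual API:

```python
from dataclasses import dataclass

MAX_ROUNDS = 100  # typical preset round limit used in the experiments


@dataclass
class CommonsHarvest:
    """Toy stand-in for a Melting Pot substrate: a shared apple pool."""
    apples: int = 20

    def resources_depleted(self) -> bool:
        return self.apples <= 0

    def step(self, action: str) -> float:
        # Harvesting consumes one shared apple and yields a reward of 1.
        if action == "harvest" and self.apples > 0:
            self.apples -= 1
            return 1.0
        return 0.0


def run_simulation(agent_names, env, policy, max_rounds=MAX_ROUNDS):
    """Run one episode; return per-agent scores and the stop reason."""
    scores = {name: 0.0 for name in agent_names}
    for round_idx in range(1, max_rounds + 1):
        for name in agent_names:  # agents act sequentially within a round
            scores[name] += env.step(policy(name, env))
        if env.resources_depleted():  # early stop: resources exhausted
            return scores, f"depleted at round {round_idx}"
    return scores, "round limit reached"


# Example: two greedy agents always harvest; a pool of 20 apples is
# exhausted after 10 rounds, ending the episode early.
env = CommonsHarvest(apples=20)
scores, reason = run_simulation(
    ["agent_1", "agent_2"], env, policy=lambda name, env: "harvest")
print(scores, reason)
```

With a cooperative policy (e.g., abstaining when the pool runs low), the same loop would instead terminate at the round limit, which is how consumption-based metrics such as apples consumed and tree devastation separate cooperative from greedy behavior.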