import numpy as np
from swarmsim.Simulators import Simulator
class GymSimulator(Simulator):
"""
OpenAI Gym-compatible simulator for reinforcement learning applications.
This simulator extends the base Simulator class to provide a Gym-style interface
suitable for reinforcement learning algorithms. It implements the standard
``reset()``, ``step()``, ``render()``, and ``close()`` methods expected by
RL frameworks while maintaining full compatibility with the swarmsim ecosystem.
The GymSimulator is specifically designed for scenarios where one population
(typically herders or controllers) receives actions from an RL agent, while
other populations follow their programmed dynamics. This makes it ideal for
training agents in multi-agent environments like shepherding, swarm control,
or cooperative robotics.
Key Features
------------
**RL Compatibility**:
- Standard Gym interface (reset, step, render, close)
- Configurable render modes including "rgb_array" for headless training
- Episode-based simulation management
- Action space integration with swarmsim populations
**Multi-Agent Integration**:
- Seamless integration with existing swarmsim components
- Support for mixed controlled/autonomous populations
- Interaction computation between all agent types
- Environment state management across episodes
**Training Optimization**:
- Efficient episode reset without full reinitialization
- Configurable rendering for training vs evaluation
- Memory management for long training sessions
- Component state preservation between episodes
Parameters
----------
populations : list of Population
List of agent populations, where populations[1] is typically controlled by RL actions.
interactions : list of Interaction
Inter-agent interaction models applied during simulation.
environment : Environment
The environment instance containing spatial and physical constraints.
integrator : Integrator
Numerical integration scheme for updating agent states.
logger : Logger
Data recording component for tracking training metrics.
renderer : Renderer
Visualization component with configurable render modes.
render_mode : str, optional
Rendering mode: "human" for display, "rgb_array" for numpy arrays. Default is None.
config_path : str, optional
Path to YAML configuration file with simulation parameters.
Attributes
----------
render_mode : str or None
Current rendering mode configuration.
Methods
-------
reset()
Reset all simulation components to initial states for new episode.
step(action)
Execute one simulation timestep with the provided action.
render()
Render current simulation state according to render_mode.
close()
Clean up simulation resources and close rendering.
Examples
--------
Basic RL setup for shepherding:
.. code-block:: python

    from swarmsim.Simulators import GymSimulator
    from swarmsim.Populations import BrownianMotion, SimpleIntegrators
    from swarmsim.Environments import ShepherdingEnvironment

    # Create populations (sheep and herders)
    sheep = BrownianMotion(config_path="sheep_config.yaml")
    herders = SimpleIntegrators(config_path="herder_config.yaml")

    # Create the RL-compatible simulator
    gym_sim = GymSimulator(
        populations=[sheep, herders],   # populations[1] (herders) receives the RL actions
        interactions=[repulsion, attraction],
        environment=shepherding_env,
        integrator=integrator,
        logger=logger,
        renderer=renderer,
        render_mode="rgb_array"         # for headless training
    )

    # RL training loop
    for episode in range(1000):
        gym_sim.reset()
        done = False
        while not done:
            action = rl_agent.get_action()    # shape: (n_herders, action_dim)
            gym_sim.step(action)
            frame = gym_sim.render()          # returns a numpy array
            done = gym_sim.logger.check_termination()
Integration with popular RL libraries:
.. code-block:: python

    import gym
    from stable_baselines3 import PPO

    # Wrap GymSimulator in a Gym environment
    class SwarmEnv(gym.Env):
        def __init__(self):
            self.simulator = GymSimulator(...)
            self.action_space = gym.spaces.Box(...)
            self.observation_space = gym.spaces.Box(...)

        def reset(self):
            self.simulator.reset()
            return self._get_observation()

        def step(self, action):
            self.simulator.step(action)
            obs = self._get_observation()
            reward = self._compute_reward()
            done = self._check_done()
            return obs, reward, done, {}

    # Train with stable-baselines3
    env = SwarmEnv()
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=100000)
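Recent stable-baselines3 releases (2.x) target the Gymnasium API instead, where ``reset()``
returns ``(observation, info)`` and ``step()`` returns a five-tuple; a sketch of the same
wrapper under that assumption (spaces and helper methods remain placeholders):

.. code-block:: python

    import gymnasium
    from stable_baselines3 import PPO

    class SwarmGymnasiumEnv(gymnasium.Env):
        def __init__(self):
            super().__init__()
            self.simulator = GymSimulator(...)
            self.action_space = gymnasium.spaces.Box(...)
            self.observation_space = gymnasium.spaces.Box(...)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.simulator.reset()
            return self._get_observation(), {}

        def step(self, action):
            self.simulator.step(action)
            obs = self._get_observation()
            reward = self._compute_reward()
            terminated = self._check_done()
            return obs, reward, terminated, False, {}   # truncated is always False here

    model = PPO("MlpPolicy", SwarmGymnasiumEnv(), verbose=1)
    model.learn(total_timesteps=100000)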
Notes
-----
- The controlled population is assumed to be populations[1] by convention
- Actions are directly assigned to the controlled population's input (u)
- All populations and interactions are reset between episodes
- Logger provides episode termination signals through done flags
- Rendering behavior depends on the render_mode configuration
"""
def __init__(self, populations, interactions, environment, integrator, logger, renderer, render_mode=None,
config_path=None) -> None:
"""
Initializes the Simulator class with configuration parameters from a YAML file.
Args:
config_path (str): The path to the YAML configuration file.
"""
# Load config params from YAML file
super().__init__(populations=populations,
interactions=interactions,
environment=environment,
controllers=None,
integrator=integrator,
logger=logger,
renderer=renderer,
config_path=config_path)
self.render_mode = render_mode  # rendering mode: "human", "rgb_array", or None
def reset(self):
"""
Reset all simulation components to initial states for new episode.
This method prepares the simulator for a new episode by resetting all
populations, interactions, and logger to their initial configurations.
It ensures that each episode starts from a clean, reproducible state.
Reset Operations
----------------
1. **Population Reset**: All agent populations return to initial positions and states
2. **Interaction Reset**: Interaction models clear any accumulated state
3. **Logger Reset**: Logging system prepares for new episode data collection
Performance Notes
-----------------
- Optimized to avoid full component reinitialization
- Reuses existing object instances for memory efficiency
- Faster than creating new simulator instance
Notes
-----
- Called at the beginning of each RL episode
- Does not reset renderer or environment (typically persistent)
- Maintains component configurations while resetting states
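Examples
--------
A minimal sketch of an episode boundary; ``gym_sim`` is a configured GymSimulator as in
the class-level example, and ``rl_agent`` and the loop lengths are placeholders:

.. code-block:: python

    for episode in range(100):
        gym_sim.reset()                          # re-sample initial states, reuse component instances
        for t in range(500):
            gym_sim.step(rl_agent.get_action())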
"""
# RESET INITIAL CONDITIONS OF THE POPULATIONS AND ENVIRONMENT AND LOGGER
for population in self.populations:
population.reset()
for interaction in self.interactions:
interaction.reset()
self.logger.reset()
def step(self, action):
"""
Execute one simulation timestep with the provided RL action.
This method advances the simulation by one timestep, applying the provided
action to the controlled population and updating all system components
according to the standard simulation pipeline.
Parameters
----------
action : np.ndarray
Control action for the controlled population. Shape should match
the controlled population's input dimension: (n_agents, input_dim).
Simulation Pipeline
-------------------
1. **Action Application**: Assign action to controlled population's input (u)
2. **Interaction Computation**: Calculate forces between all agent populations
3. **State Integration**: Update agent positions and velocities using integrator
4. **Force Reset**: Clear interaction forces for next timestep
5. **Environment Update**: Update environmental conditions
Action Space
------------
The action space depends on the controlled population and application:
- **Velocity Control**: Direct velocity commands (vx, vy)
- **Acceleration Control**: Force/acceleration inputs (fx, fy)
Notes
-----
- Assumes populations[1] is the controlled population by convention
- Action dimensions must match controlled population's input_dim
- Forces are automatically reset after each timestep
- Environment state can change dynamically during simulation
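Examples
--------
A minimal sketch of one timestep with a zero action; only the shape convention is assumed,
and ``gym_sim`` is a configured GymSimulator:

.. code-block:: python

    import numpy as np

    controlled = gym_sim.populations[1]
    action = np.zeros((controlled.N, controlled.input_dim))   # zero command for every controlled agent
    gym_sim.step(action)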
"""
self.populations[1].u = action  # populations[1] is assumed to be the controlled population
# Compute the interactions between the agents
for interact in self.interactions:
interact.target_population.f += interact.get_interaction()
# Update the state of the agents
self.integrator.step(self.populations)
# Reset interaction forces
for population in self.populations:
population.f = np.zeros([population.N, population.input_dim])
# Update the environment
self.environment.update()
def render(self):
"""
Render the current simulation state according to configured render mode.
This method provides flexible rendering output depending on the render_mode
setting, supporting both human visualization and programmatic access to
rendered frames for analysis or recording.
Returns
-------
np.ndarray or None
- If render_mode == "rgb_array": Returns numpy array of shape (height, width, 3)
- Otherwise: Returns None, displays visualization
Render Modes
------------
**"rgb_array" Mode**:
- Returns rendered frame as numpy array
- Suitable for headless training and automated analysis
- Efficient for batch processing and video generation
- Compatible with gym.wrappers.RecordVideo
**Other Mode**:
- Displays visualization in real-time window
- Interactive visualization with user controls
- Suitable for debugging and demonstration
- May block execution depending on renderer implementation
Notes
-----
- Render mode can be changed dynamically during simulation
- Frame dimensions depend on renderer configuration
- Some renderers may not support all render modes
- Rendering quality vs performance trade-offs are renderer-dependent
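Examples
--------
A sketch of collecting frames during a rollout, assuming ``render_mode="rgb_array"``;
``gym_sim`` and ``action`` are placeholders from the class-level example:

.. code-block:: python

    import numpy as np

    gym_sim.reset()
    frames = []
    for _ in range(200):
        gym_sim.step(action)
        frames.append(gym_sim.render())   # one (height, width, 3) array per step
    video = np.stack(frames)              # (n_frames, height, width, 3), e.g. for saving a video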
"""
if self.render_mode == "rgb_array":
return self.renderer.render()
else:
self.renderer.render()
def close(self):
"""
Clean up simulation resources and shut down rendering.
This method properly terminates the simulation by closing rendering
windows and releasing allocated resources. It should be called at
the end of training or evaluation to ensure a clean shutdown.
Cleanup Operations
------------------
- **Renderer Shutdown**: Close visualization windows and graphics contexts
- **Resource Release**: Free allocated memory and system resources
Notes
-----
- Should be called after training completion
- Safe to call multiple times (idempotent)
- Essential for preventing resource leaks in long training sessions
- May save final logs or visualizations depending on configuration
"""
# self.logger.close()
self.renderer.close()