External Application API

In some cases, such as when interacting with an externally hosted simulator or a production environment, it makes more sense to treat RLlib as an independently running service rather than have RLlib host the simulations itself. This is possible via RLlib’s external application API.

class ray.rllib.env.policy_client.PolicyClient(address: str, inference_mode: str = 'local', update_interval: float = 10.0)[source]

REST client to interact with an RLlib policy server.

start_episode(episode_id: Optional[str] = None, training_enabled: bool = True) → str[source]

Record the start of one or more episode(s).

Parameters
  • episode_id (Optional[str]) – Unique string id for the episode or None for it to be auto-assigned.

  • training_enabled – Whether to use experiences for this episode to improve the policy.

Returns

Unique string id for the episode.

Return type

str

get_action(episode_id: str, observation: Union[Any, Dict[Any, Any]]) → Union[Any, Dict[Any, Any]][source]

Record an observation and get the on-policy action.

Parameters
  • episode_id – Episode id returned from start_episode().

  • observation – Current environment observation.

Returns

Action from the env action space.

Return type

Union[Any, Dict[Any, Any]]
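The per-episode call sequence is start_episode(), then repeated get_action() / log_returns() pairs, then end_episode(). The sketch below exercises that sequence with a hypothetical in-process stand-in (FakePolicyClient, not part of RLlib) so the control flow runs without a server; against a live PolicyServerInput you would construct PolicyClient("localhost:9900") instead.

```python
import uuid


class FakePolicyClient:
    """Hypothetical in-process stand-in mirroring PolicyClient's interface."""

    def __init__(self):
        self.log = []  # record of calls, for illustration only

    def start_episode(self, episode_id=None, training_enabled=True):
        episode_id = episode_id or uuid.uuid4().hex
        self.log.append(("start", episode_id))
        return episode_id

    def get_action(self, episode_id, observation):
        self.log.append(("action", episode_id))
        return 0  # a real client returns an action sampled from the served policy

    def log_returns(self, episode_id, reward, info=None, multiagent_done_dict=None):
        self.log.append(("reward", reward))

    def end_episode(self, episode_id, observation):
        self.log.append(("end", episode_id))


client = FakePolicyClient()
eps_id = client.start_episode()
obs = [0.0, 0.0, 0.0, 0.0]           # e.g. a CartPole-like observation
for _ in range(3):
    action = client.get_action(eps_id, obs)  # observation in, on-policy action out
    client.log_returns(eps_id, 1.0)          # reward for the action just taken
client.end_episode(eps_id, obs)
```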

log_action(episode_id: str, observation: Union[Any, Dict[Any, Any]], action: Union[Any, Dict[Any, Any]]) → None[source]

Record an observation and (off-policy) action taken.

Parameters
  • episode_id – Episode id returned from start_episode().

  • observation – Current environment observation.

  • action – Action for the observation.
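log_action() is the off-policy counterpart of get_action(): the action comes from your own controller and is only recorded, not sampled from the served policy. The snippet below is an illustrative sketch; ExternalController and the standalone log_action function are hypothetical stand-ins, not RLlib APIs.

```python
class ExternalController:
    """Hypothetical external controller (e.g. a hand-coded or legacy policy)."""

    def compute_action(self, observation):
        return 1 if observation[0] > 0 else 0


records = []


def log_action(episode_id, observation, action):
    # Mirrors PolicyClient.log_action: record (obs, action), return nothing.
    records.append((episode_id, tuple(observation), action))


controller = ExternalController()
obs = [0.3, -0.1]
action = controller.compute_action(obs)  # action chosen off-policy
log_action("ep-1", obs, action)          # recorded for training, not sampled
```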

log_returns(episode_id: str, reward: float, info: Optional[Union[dict, Dict[Any, Any]]] = None, multiagent_done_dict: Optional[Dict[Any, Any]] = None) → None[source]

Record returns from the environment.

The reward will be attributed to the previous action taken by the episode. Rewards accumulate until the next action. If no reward is logged before the next action, a reward of 0.0 is assumed.

Parameters
  • episode_id – Episode id returned from start_episode().

  • reward – Reward from the environment.

  • info – Extra info dict.

  • multiagent_done_dict – Multi-agent done information.
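The accumulation rule can be illustrated with a toy accumulator (a sketch of the documented semantics, not RLlib's implementation): rewards logged between two actions are summed and credited to the earlier action, and if nothing is logged before the next action, that action is credited 0.0.

```python
class RewardAccumulator:
    """Toy model of log_returns() semantics: rewards logged between two
    actions are summed and credited to the earlier action."""

    def __init__(self):
        self.pending = 0.0
        self.per_action = []  # reward credited to each past action

    def log_returns(self, reward):
        self.pending += reward  # accumulate until the next action

    def next_action(self):
        # Taking the next action flushes the accumulated reward,
        # crediting it to the previous action.
        self.per_action.append(self.pending)
        self.pending = 0.0


acc = RewardAccumulator()
# Action 1 was just taken (via get_action); log two rewards for it.
acc.log_returns(0.5)
acc.log_returns(0.5)
acc.next_action()  # action 2 taken: action 1 is credited 0.5 + 0.5 = 1.0
acc.next_action()  # action 3 taken: nothing logged, so action 2 is credited 0.0
```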

end_episode(episode_id: str, observation: Union[Any, Dict[Any, Any]]) → None[source]

Record the end of an episode.

Parameters
  • episode_id – Episode id returned from start_episode().

  • observation – Current environment observation.

update_policy_weights() → None[source]

Query the server for new policy weights, if local inference is enabled.

class ray.rllib.env.policy_server_input.PolicyServerInput(ioctx, address, port, idle_timeout=3.0)[source]

REST policy server that acts as an offline data source.

This launches a multi-threaded server that listens on the specified host and port to serve policy requests and forward experiences to RLlib. To support high-performance experience collection, it implements the InputReader interface.

For an example, run examples/serving/cartpole_server.py along with examples/serving/cartpole_client.py --inference-mode=local|remote.

Examples

>>> import gym
>>> from ray.rllib.algorithms.pg import PGConfig
>>> from ray.rllib.env.policy_client import PolicyClient
>>> from ray.rllib.env.policy_server_input import PolicyServerInput
>>> addr, port = ... 
>>> config = ( 
...     PGConfig()
...     .environment("CartPole-v1")
...     .offline_data(
...         input_=lambda ioctx: PolicyServerInput(ioctx, addr, port)
...     )
...     # Run just 1 server (in the Algorithm's WorkerSet).
...     .rollouts(num_rollout_workers=0)
... )
>>> pg = config.build()
>>> # Server side: serve policy requests while training.
>>> while True:
>>>     pg.train()
>>> # Client side (separate process): connect to the running server.
>>> client = PolicyClient(
...     "localhost:9900", inference_mode="local")
>>> eps_id = client.start_episode()
>>> env = gym.make("CartPole-v1")
>>> obs = env.reset()
>>> action = client.get_action(eps_id, obs)
>>> reward = env.step(action)[1]
>>> client.log_returns(eps_id, reward)
>>> client.end_episode(eps_id, obs)
next()[source]

Returns the next batch of read experiences.

Returns

The experience read (SampleBatch or MultiAgentBatch).