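Apart from the many environment definitions, Gym's core code is actually quite small. Below is the directory structure of the Gym source: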

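The envs directory takes up most of the code base, but the core parts are spaces and core.py. Together they define Gym's main classes: Env (the environment) and Space (the value domain). Spaces describe the ranges of valid actions and observations. These are the basic building blocks of the agent-environment loop.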
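To make the loop concrete, here is a minimal sketch in plain Python. `CoinFlipEnv` and `run_episode` are hypothetical names invented for illustration and are not part of Gym. The sketch assumes the classic convention in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of Gym): guess a coin flip."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.steps = 0
        return 0  # dummy observation; this toy env reveals nothing up front

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        coin = self.rng.randrange(2)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        done = self.steps >= self.episode_length
        return coin, reward, done, {}


def run_episode(env, policy_rng):
    """Run one episode with a random policy and return the total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy_rng.randrange(2)  # stands in for action_space.sample()
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


env = CoinFlipEnv(episode_length=5, seed=0)
total = run_episode(env, random.Random(1))
print(total)
```

In a real Gym environment the random policy would draw from the environment's action space, which is where the Space classes described next come in.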

The spaces package defines several kinds of value domains. Space is the base class and defines the interface that every space type implements:
from gym.utils import seeding  # provides np_random(), used by seed() below


class Space(object):
    """Defines the observation and action spaces, so you can write generic
    code that applies to any Env. For example, you can choose a random
    action.
    """

    def __init__(self, shape=None, dtype=None):
        import numpy as np  # takes about 300-400ms to import, so we load lazily
        self.shape = None if shape is None else tuple(shape)
        self.dtype = None if dtype is None else np.dtype(dtype)
        self.np_random = None
        self.seed()

    def sample(self):
        """Randomly sample an element of this space. Can be
        uniform or non-uniform sampling based on boundedness of space."""
        raise NotImplementedError

    def seed(self, seed=None):
        """Seed the PRNG of this space. """
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def contains(self, x):
        """
        Return boolean specifying if x is a valid
        member of this space
        """
        raise NotImplementedError

    def __contains__(self, x):
        return self.contains(x)

    def to_jsonable(self, sample_n):
        """Convert a batch of samples from this space to a JSONable data type."""
        # By default, assume identity is JSONable
        return sample_n

    def from_jsonable(self, sample_n):
        """Convert a JSONable data type to a batch of samples from this space."""
        # By default, assume identity is JSONable
        return sample_n
The two most important space types are Discrete and Box.
Discrete
Discrete(n) is the set of discrete integers {0, 1, 2, ..., n-1}. Note that n itself is not included.
In [1]: from gym.spaces import *
In [2]: space = Discrete(5)
In [3]: space.sample()
Out[3]: 2
In [4]: space.sample()
Out[4]: 1
In [5]: space.sample()
Out[5]: 4
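To show how a concrete space fills in the Space interface, here is a minimal sketch of a Discrete-like space in plain Python. `ToyDiscrete` is a hypothetical stand-in and not Gym's implementation. It uses the standard library's `random.Random` in place of Gym's NumPy-based generator, and it covers {0, ..., n-1}.

```python
import random


class ToyDiscrete:
    """Hypothetical stand-in for a Space subclass covering {0, ..., n-1}."""

    def __init__(self, n, seed=None):
        assert n > 0, "n must be positive"
        self.n = n
        self.seed(seed)

    def seed(self, seed=None):
        """Seed this space's private random number generator."""
        self.rng = random.Random(seed)
        return [seed]

    def sample(self):
        """Draw a uniformly random element of the space."""
        return self.rng.randrange(self.n)

    def contains(self, x):
        """Return True if x is an integer in [0, n)."""
        return isinstance(x, int) and 0 <= x < self.n

    def __contains__(self, x):
        # Enables the idiomatic `x in space` membership test.
        return self.contains(x)


space = ToyDiscrete(5, seed=42)
first = [space.sample() for _ in range(3)]
space.seed(42)  # re-seeding replays the same sequence
second = [space.sample() for _ in range(3)]
```

Re-seeding with the same value reproduces the same samples, which is why each space carries its own seedable generator.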
Box
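Box is a continuous, multi-dimensional space of real numbers, a (possibly unbounded) box in R^n. It represents the Cartesian product of n closed intervals. Each interval takes one of the following forms: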
[a, b], (-oo, b], [a, oo), or (-oo, oo)
Box is commonly used in two ways. The first uses the same bounds for every dimension:
In [1]: from gym.spaces import *
In [2]: import numpy as np
In [3]: space = Box(low=-1.0, high=2.0, shape=(3, 4), dtype=np.float32)
In [4]: space.sample()
Out[4]:
array([[-0.49652585,  0.9263435 ,  0.38507813,  0.783846  ],
       [ 0.85791075,  1.8828201 , -0.9763712 , -0.7506176 ],
       [ 0.5605676 ,  0.58183783, -0.43566808,  0.50398904]],
      dtype=float32)
In [5]: space.sample()
Out[5]:
array([[ 1.0351197 , -0.26707068,  1.2349498 , -0.03579823],
       [ 0.35440695,  1.6972734 , -0.94597757,  0.43317792],
       [ 1.7264552 ,  0.7422606 , -0.641941  ,  1.9083056 ]],
      dtype=float32)
The second uses different bounds for each dimension:
In [7]: space = Box(low=np.array([-1.0, -2.0]), high=np.array([2.0, 4.0]), dtype=np.float32)
In [8]: space.sample()
Out[8]: array([-0.01725261, -1.8928218 ], dtype=float32)
In [9]: space.sample()
Out[9]: array([-0.37487462, -0.4833201 ], dtype=float32)
In [10]: space.sample()
Out[10]: array([-0.2957047, 1.2446854], dtype=float32)
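The four interval forms raise the question of how to sample when a side is unbounded, since a uniform draw is only possible on [a, b]. The sketch below is a plain-Python illustration, not Gym's code. The helper names are hypothetical, and using an exponential distribution for half-bounded intervals and a normal distribution for fully unbounded ones is an assumption of this sketch, in line with the base class's note that sampling may be non-uniform depending on boundedness.

```python
import math
import random


def sample_interval(low, high, rng):
    """Sample one coordinate from [low, high]; either bound may be infinite."""
    low_bounded = low > -math.inf
    high_bounded = high < math.inf
    if low_bounded and high_bounded:
        return rng.uniform(low, high)       # [a, b]: uniform
    if low_bounded:
        return low + rng.expovariate(1.0)   # [a, oo): shifted exponential
    if high_bounded:
        return high - rng.expovariate(1.0)  # (-oo, b]: mirrored exponential
    return rng.gauss(0.0, 1.0)              # (-oo, oo): standard normal


def sample_box(lows, highs, rng):
    """Sample a point from the Cartesian product of per-dimension intervals."""
    return [sample_interval(lo, hi, rng) for lo, hi in zip(lows, highs)]


def box_contains(lows, highs, point):
    """Check that every coordinate lies within its own interval."""
    return len(point) == len(lows) and all(
        lo <= x <= hi for lo, hi, x in zip(lows, highs, point)
    )


rng = random.Random(0)
lows = [-1.0, -2.0, 0.0, -math.inf]
highs = [2.0, 4.0, math.inf, math.inf]
point = sample_box(lows, highs, rng)
print(box_contains(lows, highs, point))
```

The first two dimensions reuse the bounds from the example above; the last two show a half-bounded and a fully unbounded interval.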
MultiDiscrete
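MultiDiscrete is a multi-dimensional discrete integer space, which is especially useful for game-style controls. For example, a Nintendo game controller can be modeled as a multi-dimensional discrete space. As in most environments, 0 stands for no operation (NOOP):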
1) Arrow Keys: Discrete 5 - NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4] - params: min: 0, max: 4
2) Button A: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
3) Button B: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
This controller can be represented as MultiDiscrete([ 5, 2, 2 ]):
In [11]: space = MultiDiscrete([ 5, 2, 2 ])
In [12]: space.sample()
Out[12]: array([2, 1, 0])
In [13]: space.sample()
Out[13]: array([2, 1, 1])
In [14]: space.sample()
Out[14]: array([3, 0, 0])
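A sampled vector such as [2, 1, 0] is easier to read once it is mapped back to the controller layout listed above. The sketch below does that in plain Python. `sample_multi_discrete` and `decode_action` are hypothetical helpers written for this example, not Gym functions.

```python
import random

ARROWS = ["NOOP", "UP", "RIGHT", "DOWN", "LEFT"]
NVEC = [len(ARROWS), 2, 2]  # arrow keys, button A, button B


def sample_multi_discrete(nvec, rng):
    """Sample each component i independently from {0, ..., nvec[i] - 1}."""
    return [rng.randrange(n) for n in nvec]


def decode_action(action):
    """Translate an action vector into a readable controller state."""
    arrow, button_a, button_b = action
    return {
        "arrow": ARROWS[arrow],
        "A": bool(button_a),
        "B": bool(button_b),
    }


rng = random.Random(0)
action = sample_multi_discrete(NVEC, rng)
state = decode_action(action)

decoded = decode_action([2, 1, 0])
print(decoded)  # {'arrow': 'RIGHT', 'A': True, 'B': False}
```

Each component has its own range, so one vector can encode several independent controls at once.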
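The remaining space types are essentially built from Discrete and Box. For example, MultiBinary is very similar to MultiDiscrete, except that each component can only take the binary values 0 or 1. Dict and Tuple correspond to the dictionary and tuple types, respectively: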
In [16]: space = Dict({"position": Discrete(2), "velocity": Discrete(3)})
In [17]: space.sample()
Out[17]: OrderedDict([('position', 1), ('velocity', 2)])
In [18]: space.sample()
Out[18]: OrderedDict([('position', 0), ('velocity', 0)])
In [19]: space = Tuple((Discrete(2), Discrete(3)))
In [20]: space.sample()
Out[20]: (0, 1)
In [21]: space.sample()
Out[21]: (1, 0)
In [22]: space.sample()
Out[22]: (1, 2)
In [23]: space = MultiBinary(5)
In [24]: space.sample()
Out[24]: array([0, 1, 1, 1, 1], dtype=int8)
In [25]: space.sample()
Out[25]: array([1, 0, 1, 0, 1], dtype=int8)
In [26]: space.sample()
Out[26]: array([1, 0, 1, 0, 1], dtype=int8)
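The way composite spaces delegate to their components can be sketched as a small recursive sampler. This is a plain-Python illustration under stated assumptions, not Gym's implementation. `sample_space` is a hypothetical helper in which an integer n stands for Discrete(n), and dicts and tuples stand for Dict and Tuple spaces.

```python
import random


def sample_space(spec, rng):
    """Recursively sample from a nested spec.

    An int n means a Discrete(n)-like component; a dict or tuple
    composes sub-specs and is sampled component by component.
    """
    if isinstance(spec, int):
        return rng.randrange(spec)
    if isinstance(spec, dict):
        return {key: sample_space(sub, rng) for key, sub in spec.items()}
    if isinstance(spec, tuple):
        return tuple(sample_space(sub, rng) for sub in spec)
    raise TypeError(f"unsupported spec: {spec!r}")


rng = random.Random(0)
dict_sample = sample_space({"position": 2, "velocity": 3}, rng)
tuple_sample = sample_space((2, 3), rng)
binary_sample = sample_space((2,) * 5, rng)  # MultiBinary(5)-like: five 0/1 values
print(dict_sample, tuple_sample, binary_sample)
```

Because the sampler recurses, specs can be nested arbitrarily, for example a dict whose values are tuples.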