<<VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation>>是Google Waymo在2020年放出Paper，如论文题目所表达的，其核心贡献在于提出了一种针对HDMap和车辆轨迹的矢量化Encoding方式，相对于以往图像+Conv的编码方式，不仅极大的降低了神经网络的训练参数，在实际的车辆轨迹预测中也取得了不错的效果。

论文链接: https://arxiv.org/pdf/2005.04259.pdf

网上关于论文的解读已经很多，本文主要记录下代码复现过程和自己的一些理解。

Part1环境准备

论文中使用了公开数据集Argoverse Dataset。

数据集和高精地图下载地址：

https://www.argoverse.org/data.html#download-link

高精地图API:

除了高精地图数据，Argoverse还提供了开源的地图查询的API，GitHub地址如下：

https://github.com/argoai/argoverse-api

安装的方法wiki中有详细的说明，不赘述。API提供了很多数据加载查询和可视化的方法。

Argoverse运动预测竞赛

除此之外，Argoverse还有一个运动预测的挑战赛，官方提供了一些Base Line的算法实现，有些代码可以直接拿过用。代码的Github地址：https://github.com/jagjeet-singh/argoverse-forecasting

Part2数据预处理

高精地图数据和轨迹数据下载完成之后，进行模型训练前先要对数据进行预处理。

Argoverse Motion Forecasting数据集的每个数据序列中类型为Agent的轨迹是用于预测的轨迹，该轨迹采样频率为10HZ，采样时长为5s，通常我们用前2s的轨迹作为观察序列，后3s作为轨迹预测的真值来训练神经网络。

seq_df = pd.read_csv(seq_path)
agent_x = seq_df[seq_df['OBJECT_TYPE'] == 'AGENT']['X']

agent_y = seq_df[seq_df['OBJECT_TYPE'] == 'AGENT']['Y']

agent_traj = np.column_stack((agent_x, agent_y))
agent_timestamp = seq_df[seq_df['OBJECT_TYPE'] == 'AGENT']['TIMESTAMP']

city_name = seq_df['CITY_NAME'].values[0]

1坐标系处理

为了避免神经网络对绝对坐标的依赖，同时降低神经网络的训练难度，论文中对坐标系进行了标准化处理：将坐标系的原点移到Tatget Agent的Observe序列的最后一个点，同时以该点的朝向作为x轴的正向。

论文中对坐标系的描述如下：

实现代码如下：

base_pt = agent_traj[args.obs_len - 1]

offset_agent_traj, angle = normalized_traj(args, agent_traj, base_pt)

平移和旋转坐标序列的代码如下。函数返回标准化的轨迹坐标序列以及旋转的角度。高精地图和所有车辆轨迹的标准化都使用这个函数。

def normalized_traj(args, xy_seq, base_pt):
# First apply translation

m = [1, 0, 0, 1, -base_pt[0], -base_pt[1]]

ls = LineString(xy_seq)
# Now apply rotation, taking care of edge cases

ls_offset = affine_transform(ls, m)

end = ls_offset.coords[args.obs_len]

if end[0] == 0 and end[1] == 0:

angle = 0.0

elif end[0] == 0:

angle = -90.0 if end[1] > 0 else 90.0

elif end[1] == 0:

angle = 0.0 if end[0] > 0 else 180.0

else:

angle = math.degrees(math.atan(end[1] / end[0]))

if (end[0] > 0 and end[1] > 0) or (end[0] > 0 and end[1] < 0):

angle = -angle

else:

angle = 180.0 - angle
# Rotate the trajetory

ls_rotate = rotate(ls_offset, angle, origin=(0, 0)).coords[:]
# Normalized trajectory

norm_xy = np.array(ls_rotate)

return norm_xy, angle

2高精地图数据预处理

车道中心线插值

论文中用均匀采样的离散点序列表达高精地图，具体的处理手法如下：

Argoverse Motion Prediction数据集中只提供了矢量的车道中心线，道路边界、车道边界都是多边形，不好处理，因此我只使用了车道中心线。

在将车道中心线表述为向量之前，先对中心线进行插值。这里以0.1m的间隔对车道中心线进行插值。插值代码如下:

def interpolate_polyline_n(polyline, num_points):    tck, u = interp.splprep(polyline.T, s=0)    u = np.linspace(0.0, 1.0, num_points)

    return np.column_stack(interp.splev(u, tck))

高精地图的向量表达

高精地图训练数据的总体表达分为三层：每个Seq是一个Graph，每一个车道中心线是一个PolyLine，每个Polyline包含多个vector。在我们的实现中，每个vector是长度为9为向量，每个Polyline最多包含100个vector，每个Graph最多包含200个Polyline。

--Graph_1

------PolyLine_1

-----------vector_1

-----------vector_2

-----------......

------PolyLine_2

-----------vector_1

-----------vector_2

-----------......

--Graph_2

--.........

Vector数据组织

按照论文的表述，将车道中心线表达为向量形式。

其中：、、、是相邻两个Point的坐标，是元素类型(车道、车辆轨迹)，否有红绿灯，是转向类型(左转、右转、直行)，是是否在路口内，id是元素的polyline的id，同一个Lane所有向量的id是相同的。

实现代码如下:

has_traffic_contrl = avm.lane_has_traffic_control_measure(lane_id, city_name)

turn = avm.get_lane_turn_direction(lane_id, city_name)

in_intersection = avm.lane_is_in_intersection(lane_id, city_name)

origin_center_line = avm.get_lane_segment_centerline(lane_id, city_name)
...
lane_features = []

for cl_indx in range(1, len(center_line)):

lane_feature = []
cl_0 = center_line[cl_indx - 1]

cl_1 = center_line[cl_indx]
# offset to the last observed point

lane_feature.extend([cl_0[0], cl_0[1]])

lane_feature.extend([cl_1[0], cl_1[1]])
# object type: lane

lane_feature.append(1)
if has_traffic_contrl == True:

lane_feature.append(1)

else:

lane_feature.append(-1)
if turn == 'LEFT':

lane_feature.append(1)

elif turn == 'RIGHT':

lane_feature.append(-1)

else:

lane_feature.append(0.5)
if in_intersection == True:

lane_feature.append(1)

else:

lane_feature.append(-1)
lane_feature.append(lane_id)

lane_features.append(lane_feature)

解决车道线长短不一的问题

由于车道中心线长短不一，为了解决向量长度差异比较大的问题，将长度超过100m的车道做了打断处理。

Polyline的数据组织

Polyline包含多个Vector，我们代码中设置最多包含100个Vector，不足100个用0补齐，多于100个直接截断；同时设置一个mask数组，记录哪些是真实数据，哪些是补齐的数据。

代码如下:

lane_features_mask = [True] * len(lane_features)

if len(lane_features) < args.max_features_in_elems:    pad_mask = [False] * (args.max_features_in_elems - len(lane_features))    lane_features_mask.extend(pad_mask)

    pad_feature = [[0] * args.max_feature_dim] * (args.max_features_in_elems - len(lane_features))    lane_features.extend(pad_feature)else:    lane_features = lane_features[:args.max_features_in_elems]    lane_features_mask = lane_features_mask[:args.max_features_in_elems]

sub_graph.append(lane_features)sub_graph_mask.append(lane_features_mask)

Graph的数据组织

Graph包含多个Polyline，我们代码中设置最多包含200个Polyline，不足200个用0补齐，多于200个直接截断；同时也设置一个mask数组，记录哪些是真实数据，哪些是补齐的数据。

代码如下:

graph_mask = [True] * len(sub_graph)
if len(sub_graph) < args.max_elems_in_sub_graph:

pad_mask = [[False] * args.max_features_in_elems] * (args.max_elems_in_sub_graph - len(sub_graph))

sub_graph_mask.extend(pad_mask)
graph_mask.extend([False] * (args.max_elems_in_sub_graph - len(sub_graph)))

pad_feature = [[[0] * args.max_feature_dim] * args.max_features_in_elems] * (args.max_elems_in_sub_graph - len(sub_graph)) sub_graph.extend(pad_feature) else: sub_graph_mask = sub_graph_mask[:args.max_elems_in_sub_graph] graph_mask = graph_mask[:args.max_elems_in_sub_graph] sub_graph = sub_graph[:args.max_elems_in_sub_graph]

其它

除了地图数据之外，论文中还进行了Graph Completion和Position Encoding。

Graph Completion

随机Mask的代码如下：

node_mask = np.array(graph_mask)lane_idxs = np.argwhere(node_mask == True)

random_cnt = 5 # (int)(lane_idxs.shape[0] * 0.3)indexs = np.random.choice(np.arange(1, lane_idxs.shape[0]), size = random_cnt, replace = False)lane_random_idxs = lane_idxs[indexs]node_mask = np.full(node_mask.shape, False)node_mask[lane_random_idxs] = True

Position Encoding

Position Encoding的实现很简单，取每个polyline的坐标最小数值即可。

sub_pids.append([np.array(lane_features)[:, [0, 2]].min(), np.array(lane_features)[:, [1, 3]].min()])

3Agent轨迹数据预处理

Agent分为Target Agent和Other Agent两个分类，Target Agent是待预测的车辆轨迹，Other Agent是其它车辆的轨迹，这两类轨迹数据的处理方法相同，其向量形式表达如下：

组织代码如下:

agent_features = []

# obs_trajfor agent_idx in range(1, len(offset_agent_traj)):    agent_feature = []    agent_0 = offset_agent_traj[agent_idx - 1]    agent_1 = offset_agent_traj[agent_idx]

    agent_feature.extend([agent_0[0], agent_0[1]])    agent_feature.extend([agent_1[0], agent_1[1]])

    # object type: agent_traj    agent_feature.append(obj_type)

    #timestamp    diff_timestamp = agent_timestamp[agent_idx] - agent_timestamp[agent_idx - 1]    agent_feature.append(diff_timestamp)

    unit_vec = (agent_1 - agent_0)    agent_feature.append(unit_vec[0])    agent_feature.append(unit_vec[1])

    agent_feature.append(agent_id)

    agent_features.append(agent_feature)

在处理Other Agent的轨迹，过滤掉了静止的车辆。过滤的方式采用了Argoverse Baseline的代码，将速度低于1m/s的Agent过滤掉。

def compute_velocity(track_df: pd.DataFrame) -> List[float]:

x_coord = track_df['X'].values

y_coord = track_df['Y'].values

timestamp = track_df['TIMESTAMP'].values

vel_x, vel_y = zip(*[(

x_coord[i] - x_coord[i - 1] /

(float(timestamp[i]) - float(timestamp[i - 1])),

y_coord[i] - y_coord[i - 1] /

(float(timestamp[i]) - float(timestamp[i - 1])),

) for i in range(1, len(timestamp))])

vel = [np.sqrt(x**2 + y**2) for x, y in zip(vel_x, vel_y)]
return vel
def get_is_track_stationary(track_df: pd.DataFrame) -> bool:

STATIONARY_THRESHOLD = 13
vel = compute_velocity(track_df)

sorted_vel = sorted(vel)

threshold_vel = sorted_vel[STATIONARY_THRESHOLD]
VELOCITY_THRESHOLD = 1.0

return True if threshold_vel < VELOCITY_THRESHOLD else False

读取并处理Other Agent的数据。

EXIST_THRESHOLD = 15other_agent_groups = seq_df.groupby('TRACK_ID')for group_name, group_data in other_agent_groups:

# Check if the track is long enoughif len(group_data) < EXIST_THRESHOLD:    continue

# Skip if agent trackif group_data['OBJECT_TYPE'].iloc[0] == 'AGENT':   continue

if get_is_track_stationary(group_data):  continue

x_coord = group_data['X'].valuesy_coord = group_data['Y'].valuestimestamp = group_data['TIMESTAMP'].values

other_agent_traj = np.column_stack((x_coord, y_coord))

offset_other_agent_traj = normalized_map(other_agent_traj, base_pt, angle)

create_agent_feature(args, offset_other_agent_traj, timestamp, 3, origin_center_lines, sub_pids, sub_graph, sub_graph_mask)

读取并处理Target Agent数据。

agent_x = seq_df[seq_df['OBJECT_TYPE'] == 'AGENT']['X']

agent_y = seq_df[seq_df['OBJECT_TYPE'] == 'AGENT']['Y']

agent_traj = np.column_stack((agent_x, agent_y))

agent_timestamp = seq_df[seq_df['OBJECT_TYPE'] == 'AGENT']['TIMESTAMP'].values
offset_agent_traj, angle = normalized_traj(args, agent_traj, base_pt)
base_pt = agent_traj[args.obs_len - 1]
offset_agent_traj, angle = normalized_traj(args, agent_traj, base_pt)

create_agent_feature(args, offset_agent_traj[:args.obs_len], agent_timestamp, 2, origin_center_lines, sub_pids, sub_graph, sub_graph_mask)

Part3训练效果

完整训练Argoverse数据耗时比较长，先用200个Sequence进行训练调试。

20/25, loss:55.78622817993164------------------------------------------------{'minADE': 4.3394534013715225, 'minFDE': 8.642288635816033, 'MR': 0.90625}------------------------------------------------21/25, loss:55.540924072265625------------------------------------------------{'minADE': 4.290202449056427, 'minFDE': 8.579150848481632, 'MR': 0.90625}------------------------------------------------22/25, loss:55.3294792175293------------------------------------------------{'minADE': 4.2439700423643965, 'minFDE': 8.519883228494539, 'MR': 0.9375}------------------------------------------------23/25, loss:55.14278030395508------------------------------------------------{'minADE': 4.200904121030685, 'minFDE': 8.466275328781865, 'MR': 0.9375}------------------------------------------------24/25, loss:54.96728515625------------------------------------------------{'minADE': 4.1585633970311235, 'minFDE': 8.41244784105069, 'MR': 0.9375}------------------------------------------------

下图中黄色是Observe序列，紫色是Ground Truth，绿色是预测结果。