Pointer Networks

2 Jan 2017 | Oriol Vinyals*, Meire Fortunato*, Navdeep Jaitly
Pointer Networks (Ptr-Nets) are a neural architecture that learns the conditional probability of an output sequence whose elements are discrete tokens corresponding to positions in an input sequence. Problems of this kind are not handled well by existing methods such as sequence-to-sequence models or Neural Turing Machines, because the size of the output dictionary depends on the length of the input. Ptr-Nets resolve this by repurposing a neural attention mechanism to select elements of the input as the outputs, which lets them learn approximate solutions to three challenging geometric problems, planar convex hulls, Delaunay triangulations, and the planar Travelling Salesman Problem (TSP), from training examples alone.

The architecture differs from earlier attention-based models in how attention is used: instead of blending the encoder hidden states into a context vector at each decoder step, the attention distribution itself, a softmax over input positions, acts as a "pointer" that selects one input element as the next output. Because this distribution is defined over the input positions, the output dictionary automatically scales with the input, and the model can generalize to sequences longer than any it was trained on.

The model was evaluated on these three combinatorial optimization problems. On the convex hull task it achieved high accuracy and near-complete area coverage, even for large inputs. On Delaunay triangulation it achieved high triangle coverage on small point sets, with quality degrading as the number of points grows. On TSP it produced near-optimal tours on small instances, competitive with or better than simple heuristics, and it could extrapolate, with some loss of quality, to instance sizes beyond those seen in training.

The ability to handle variable-sized outputs and to generalize to unseen input sizes is a significant improvement over sequence-to-sequence models with fixed output dictionaries. The architecture is simple and effective, and it opens up new possibilities for applying neural networks to discrete problems without imposing artificial assumptions on the output space. The results demonstrate that a purely data-driven approach can learn approximate solutions to computationally intractable problems.
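The pointing mechanism itself is compact. In the paper's notation, each decoding step i scores every input position j with u^i_j = v^T tanh(W1 e_j + W2 d_i), where e_j are the encoder hidden states and d_i is the current decoder state, and the output distribution is p(C_i | C_1, ..., C_{i-1}, P) = softmax(u^i). The sketch below illustrates this single pointing step in PyTorch; the class name, tensor shapes, and the greedy-decoding example are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    """One pointing step: score every encoder position against the current
    decoder state and return a softmax distribution over input positions."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W1 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects encoder states e_j
        self.W2 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects decoder state d_i
        self.v = nn.Linear(hidden_dim, 1, bias=False)            # reduces each position to a scalar score

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, seq_len, hidden_dim), one vector per input element
        # dec_state:  (batch, hidden_dim), current decoder hidden state
        scores = self.v(torch.tanh(
            self.W1(enc_states) + self.W2(dec_state).unsqueeze(1)
        )).squeeze(-1)                                           # (batch, seq_len)
        # The softmax over input positions *is* the output distribution:
        # there is no fixed output vocabulary, so the dictionary size tracks seq_len.
        return torch.softmax(scores, dim=-1)

# Hypothetical usage: greedy decoding picks the input index with the
# highest pointer probability at each step.
if __name__ == "__main__":
    torch.manual_seed(0)
    attn = PointerAttention(hidden_dim=64)
    enc = torch.randn(2, 10, 64)    # 10 encoded input points per example
    dec = torch.randn(2, 64)
    probs = attn(enc, dec)          # (2, 10), sums to 1 over the 10 input positions
    print(probs.argmax(dim=-1))     # index of the input element the model "points" at
```

In practice, masking positions that have already been selected (for example, setting their scores to negative infinity before the softmax) is a common way to keep decoded convex hulls or TSP tours valid; the paper uses a similar constraint when beam-searching TSP solutions.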