Paper notes: Learning task-oriented grasping for tool manipulation from simulated self-supervision

1. Overview

1.1 What the paper does

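The paper proposes a Task-Oriented Grasping Network. Much current robot-arm grasping work addresses grasping alone; this paper targets task-oriented grasping, where the robot grasps a tool and then uses it to perform an operation. Two tasks are studied: sweeping and hammering.
Four key aspects of learning task-oriented tool usage: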

  • understanding the desired effect (the expected outcome)
  • identifying properties of an object that make it a suitable tool (attributes of the object to be grasped)
  • determining the correct orientation of the tool prior to usage (grasping)
  • manipulating the tool (manipulation)

1.2 How it works

a two-stage procedure:

  • robot picks up a tool
  • manipulates this grasped tool to complete a task

The approach uses a self-supervised learning paradigm: training labels are collected by the robot performing grasping and manipulation attempts in a trial-and-error fashion.
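
The trial-and-error label collection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: `simulate_grasp`, `simulate_task`, the two-dimensional toy grasp, and the success probabilities are all hypothetical stand-ins for a physics simulator.

```python
import random

def simulate_grasp(grasp, rng):
    """Hypothetical stand-in for a simulated grasp attempt; returns 1 on success."""
    # Toy rule (assumption): grasps closer to the origin succeed more often.
    p_success = max(0.0, 1.0 - abs(grasp[0]) - abs(grasp[1]))
    return 1 if rng.random() < p_success else 0

def simulate_task(grasp, rng):
    """Hypothetical stand-in for a simulated manipulation attempt."""
    # Toy rule (assumption): the task succeeds half of the time.
    return 1 if rng.random() < 0.5 else 0

def collect_labels(num_trials, seed=0):
    """Run trial-and-error attempts and record binary success labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_trials):
        # Sample a random toy grasp (x, y) to try.
        grasp = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        grasp_success = simulate_grasp(grasp, rng)
        # Stage two runs only when the tool was picked up; otherwise the task fails.
        task_success = simulate_task(grasp, rng) if grasp_success == 1 else 0
        data.append({
            "grasp": grasp,
            "grasp_success": grasp_success,
            "task_success": task_success,
        })
    return data
```

No human annotation enters the loop: each label is just the recorded outcome of an attempt, which is what makes the supervision "self-generated".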

1.3 Key contributions

  • learning-based model
  • develop a mechanism for generating large-scale simulated self-supervision
  • generalize well in both simulation and real world

2. Related work

2.1. Task-agnostic grasping

Training only on rendered depth images, rather than rendered RGB images, allowed the trained models to transfer to a real robot without further fine-tuning, because physical depth cameras produce images that are largely similar to rendered depth images.

2.2. Task-oriented grasping

incorporate semantic constraints

  • 1. dataset
  • 2. do not entail the success of the downstream manipulation tasks

2.3. Affordance learning


3. Problem statement


3.1. Notation of grasping

Observation space $O$: the camera observation, a point cloud.
Possible grasps $G$: grasps perpendicular to the table plane, $g = (g_x, g_y, g_z, g_\phi)$.
Given $o \in O$ and $g \in G$, $S_G(o,g) \in \{0,1\}$ denotes a binary-valued grasp success metric.
The probability of grasp success is $Q_G(o,g) = \Pr(S_G = 1 \mid o, g)$; $S_G$ is task-agnostic.
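
As a small illustration of this notation, $Q_G(o,g)$ for a fixed observation and grasp can be approximated by the fraction of successful attempts over repeated trials. The `Grasp` class and the `estimate_grasp_quality` helper are hypothetical names for this sketch, and reading $g_\phi$ as a gripper rotation angle is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Grasp:
    """A grasp g = (g_x, g_y, g_z, g_phi) perpendicular to the table plane."""
    x: float
    y: float
    z: float
    phi: float  # assumed to be a gripper rotation angle, in radians

def estimate_grasp_quality(outcomes: Sequence[int]) -> float:
    """Empirical estimate of Q_G = Pr(S_G = 1 | o, g) from binary outcomes S_G."""
    if not outcomes:
        raise ValueError("need at least one trial outcome")
    if any(s not in (0, 1) for s in outcomes):
        raise ValueError("outcomes must be 0 or 1")
    # The mean of a 0/1 variable is the observed success frequency.
    return sum(outcomes) / len(outcomes)
```

For example, three successes in four attempts of the same grasp give an estimate of 0.75.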

3.2. Problem setup

  • grasp stage
  • manipulation stage: a policy $\pi$ produces actions to interact with the environment once the object is grasped

$S_T(o,g) \in \{0,1\}$: a binary-valued task-specific success metric.
$Q_T^\pi$: the probability of task success under policy $\pi$, $Q_T^\pi(o,g) = \Pr(S_T = 1 \mid o, g)$.
The overall learning objective is to train both policies simultaneously such that
$$g^*, \pi^* = \arg\max_{g,\pi} Q_T^\pi(o,g)$$

4. Task-oriented grasping for tool manipulation

4.1. Task-oriented grasp prediction

Grasp quality: find the grasp $g$ that maximizes $Q_G(o,g) = \Pr(S_G = 1 \mid o, g)$.
Conditional task quality: $Q_{T|G}^\pi(o,g) = \Pr_\pi(S_T = 1 \mid S_G = 1, o, g)$, the probability of task success conditioned on a successful grasp.
$Q_{T|G}^\pi \ge \delta$
Because the task can only succeed when the grasp succeeds, $S_T = 1$ implies $S_G = 1$, so the overall task quality factorizes:
$$
\begin{aligned}
Q_T^\pi(o,g) &= \Pr_\pi(S_T = 1 \mid o, g) \\
&= \Pr_\pi(S_T = 1, S_G = 1 \mid o, g) \\
&= \Pr_\pi(S_T = 1 \mid S_G = 1, o, g) \times \Pr(S_G = 1 \mid o, g) \\
&= Q_{T|G}^\pi(o,g) \times Q_G(o,g)
\end{aligned}
$$

The predicted values are denoted $\hat Q_G(o,g;\theta_1)$ and $\hat Q_{T|G}^\pi(o,g;\theta_2)$, where $\theta_1$ and $\theta_2$ are the neural network parameters.
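
One way to read the factorization is as a ranking rule: score each candidate grasp by the product of the two predicted values and keep the best one. The sketch below assumes this reading. `select_grasp` is a hypothetical helper, and the two scoring callables stand in for the networks $\hat Q_G$ and $\hat Q_{T|G}^\pi$ evaluated at a fixed observation; none of this is the paper's actual code.

```python
from typing import Callable, Sequence, Tuple

# (g_x, g_y, g_z, g_phi), following the notation in these notes.
GraspTuple = Tuple[float, float, float, float]

def select_grasp(
    candidates: Sequence[GraspTuple],
    q_grasp: Callable[[GraspTuple], float],
    q_task_given_grasp: Callable[[GraspTuple], float],
) -> GraspTuple:
    """Return the candidate maximizing Q_T = Q_{T|G} * Q_G."""
    if not candidates:
        raise ValueError("need at least one candidate grasp")

    def task_quality(g: GraspTuple) -> float:
        # Overall task quality as the product of the two predicted factors.
        return q_task_given_grasp(g) * q_grasp(g)

    return max(candidates, key=task_quality)
```

A grasp that is easy to hold but poor for the task (high $Q_G$, low $Q_{T|G}^\pi$) loses to one that balances both, which is the point of scoring by the product rather than by grasp quality alone.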

4.2. Manipulation policy

Use a Gaussian policy:
$\pi(a \mid o,g;\theta_3) = \mathcal{N}(f(o,g;\theta_3), \Sigma)$, where $f(o,g;\theta_3)$ predicts the mean and $\Sigma$ is a diagonal covariance matrix.
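
Sampling from such a policy is straightforward when the covariance is diagonal, because each action dimension is then an independent one-dimensional Gaussian. The sketch below assumes the mean has already been computed by $f$; `sample_action` and its arguments are hypothetical names, and `diag_std` holds the square roots of the diagonal entries of $\Sigma$.

```python
import random
from typing import List, Sequence

def sample_action(
    mean: Sequence[float],
    diag_std: Sequence[float],
    rng: random.Random,
) -> List[float]:
    """Draw one action a ~ N(mean, diag(diag_std ** 2))."""
    if len(mean) != len(diag_std):
        raise ValueError("mean and diag_std must have the same length")
    if any(s < 0 for s in diag_std):
        raise ValueError("standard deviations must be non-negative")
    # With a diagonal covariance the dimensions are independent,
    # so each one is sampled separately.
    return [rng.gauss(m, s) for m, s in zip(mean, diag_std)]
```

Usage: `sample_action([0.2, -0.1], [0.05, 0.05], random.Random(0))` returns a two-dimensional action scattered around the predicted mean; setting every standard deviation to zero returns the mean itself.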
