Paper notes: Learning task-oriented grasping for tool manipulation from simulated self-supervision

1. Overview

1.1 What the paper does

The paper builds a Task-Oriented Grasping Network. Most existing robotic grasping work treats grasping as an end in itself; this paper instead studies task-oriented grasping: the robot grasps a tool and then uses it to perform a manipulation. The two tasks considered are sweeping and hammering.
Four key aspects of learning task-oriented tool usage:

  • understanding the desired effect (the intended outcome)
  • identifying properties of an object that make it a suitable tool (the properties of the object to be grasped)
  • determining the correct orientation of the tool prior to usage (grasping)
  • manipulating the tool (manipulation)

1.2 How it is done

A two-stage procedure:

  • the robot picks up a tool
  • it manipulates the grasped tool to complete a task

Dataset: the paper uses a self-supervised learning paradigm; training labels are collected by the robot performing grasping and manipulation attempts in a trial-and-error fashion (sketched below).
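A minimal sketch of such a trial-and-error collection loop, assuming a hypothetical simulator wrapper `sim` with `reset`, `render_depth`, `execute_grasp`, and `run_task` methods (these names are illustrative, not the paper's actual API):

```python
def collect_labels(sim, grasp_sampler, policy, num_trials=10000):
    """Self-supervised label collection: outcomes of grasp and manipulation
    attempts become the training labels. All simulator methods below are
    hypothetical stand-ins."""
    dataset = []
    for _ in range(num_trials):
        sim.reset()                      # new scene with a randomly placed tool
        o = sim.render_depth()           # depth observation of the scene
        g = grasp_sampler(o)             # sampled grasp candidate (x, y, z, phi)
        s_grasp = sim.execute_grasp(g)   # S_G: 1 if the tool was lifted, else 0
        s_task = 0
        if s_grasp:
            a = policy(o, g)             # manipulation action with the grasped tool
            s_task = sim.run_task(a)     # S_T: 1 if sweeping/hammering succeeded
        dataset.append((o, g, s_grasp, s_task))
    return dataset
```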

1.3 Contributions

  • a learning-based model
  • a mechanism for generating large-scale simulated self-supervision
  • the model generalizes well in both simulation and the real world

2. Related work

2.1 Task-agnostic grasping

Training only on rendered depth images, as opposed to rendered RGB images, enables the trained models to transfer to a real robot without further fine-tuning, because physical depth cameras produce images that are largely similar to rendered depth images.

2.2 Task-oriented grasping

Prior work incorporates semantic constraints, but:

  • 1. it relies on datasets
  • 2. the semantic constraints do not entail the success of the downstream manipulation tasks

In contrast, the authors' grasps are optimized directly for the downstream task.

2.3 Affordance learning

Affordances describe the functional properties of objects.

3. Problem statement

Goal: use an object to complete a functional manipulation. The task is split into two stages: 1. grasping the object; 2. manipulating the object.

3.1 Notation of grasping

Observation space $O$: the camera observation is a point cloud.
Set of possible grasps $G$: grasps are perpendicular to the table plane, $g = (g_x, g_y, g_z, g_\phi)$.
Given $o \in O$ and $g \in G$, let $S_G(o, g) \in \{0, 1\}$ denote a binary-valued grasp success metric.
The probability of grasp success is $Q_G(o, g) = \Pr(S_G = 1 \mid o, g)$; note that $S_G$ is task-agnostic.
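As a concrete reading of this notation, here is a small sketch; the `Grasp` container and the Monte Carlo estimate of $Q_G$ are illustrative, not from the paper:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Grasp:
    """Top-down grasp perpendicular to the table: position plus wrist rotation."""
    x: float
    y: float
    z: float
    phi: float  # rotation about the table normal, in radians

def empirical_q_g(outcomes: np.ndarray) -> float:
    """Monte Carlo estimate of Q_G = Pr(S_G = 1 | o, g) from repeated
    binary grasp outcomes collected for the same (o, g)."""
    return float(np.mean(outcomes))

print(empirical_q_g(np.array([1, 1, 0, 1])))  # three successes in four tries -> 0.75
```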

3.2 Problem setup

  • grasp stage
  • manipulation stage: once the object is grasped, a policy $\pi$ produces actions to interact with the environment

$S_T(o, g) \in \{0, 1\}$: a binary-valued task-specific success metric.
$Q_T^\pi$ is the probability of task success under policy $\pi$: $Q_T^\pi(o, g) = \Pr(S_T = 1 \mid o, g)$. The overall learning objective is to train both simultaneously such that

$$g^*, \pi^* = \arg\max_{g, \pi} Q_T^\pi(o, g)$$
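In practice the $\arg\max$ over $g$ is approximated by scoring a finite set of sampled grasp candidates. A minimal sketch, with a hypothetical scoring function `q_t` standing in for $Q_T^\pi$:

```python
import numpy as np

def select_best_grasp(o, candidates, q_t):
    """Return g* = argmax_g Q_T^pi(o, g) over a finite candidate set.
    `q_t(o, g)` is a hypothetical stand-in for the learned Q_T^pi."""
    scores = np.array([q_t(o, g) for g in candidates])
    return candidates[int(np.argmax(scores))]
```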

4. Task-oriented grasping for tool manipulation


4.1 Task-oriented grasp prediction

The first stage finds the grasp $g$ that maximizes the grasp quality $Q_G(o, g) = \Pr(S_G = 1 \mid o, g)$.
$Q_{T|G}^\pi = \Pr_\pi(S_T = 1 \mid S_G = 1, o, g)$ is the task success probability conditioned on a successful grasp, and is required to satisfy $Q_{T|G}^\pi \ge \delta$.
Since task success implies grasp success, the task success probability factorizes:

$$Q_T^\pi(o, g) = \Pr_\pi(S_T = 1 \mid o, g) = \Pr(S_T = 1, S_G = 1 \mid o, g) = \Pr_\pi(S_T = 1 \mid S_G = 1, o, g) \times \Pr(S_G = 1 \mid o, g) = Q_{T|G}^\pi(o, g) \times Q_G(o, g)$$

The predicted values are denoted $\hat Q_G(o, g; \theta_1)$ and $\hat Q_{T|G}^\pi(o, g; \theta_2)$, where $\theta_1$ and $\theta_2$ are the neural network parameters.
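A sketch of how the two predicted values could be combined at grasp-selection time, enforcing the $Q_{T|G}^\pi \ge \delta$ constraint; `q_g_net`, `q_tg_net`, and the threshold value `delta=0.5` are assumptions standing in for the two trained networks and the paper's $\delta$:

```python
import numpy as np

def score_and_select(o, candidates, q_g_net, q_tg_net, delta=0.5):
    """Pick the grasp maximizing hat Q_T = hat Q_{T|G} * hat Q_G,
    subject to hat Q_{T|G} >= delta. `delta` is an assumed value."""
    q_g = np.array([q_g_net(o, g) for g in candidates])    # hat Q_G(o, g; theta_1)
    q_tg = np.array([q_tg_net(o, g) for g in candidates])  # hat Q_{T|G}(o, g; theta_2)
    scores = q_g * q_tg                                    # factorized hat Q_T(o, g)
    scores[q_tg < delta] = -np.inf                         # enforce Q_{T|G} >= delta
    return candidates[int(np.argmax(scores))]
```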

4.2 Manipulation policy

The paper uses a Gaussian policy:
$\pi(a \mid o, g; \theta_3) = \mathcal{N}(f(o, g; \theta_3), \Sigma)$, where $f(o, g; \theta_3)$ predicts the mean and $\Sigma$ is a diagonal covariance matrix.
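A minimal PyTorch sketch of sampling from such a Gaussian policy; the feature size, action dimensionality, and the stand-in network `f` are assumptions for illustration, not the paper's architecture:

```python
import torch

action_dim = 4                               # assumed action dimensionality
f = torch.nn.Linear(16, action_dim)          # stand-in for f(o, g; theta_3)
log_std = torch.zeros(action_dim)            # learnable diagonal log-std in practice

features = torch.randn(16)                   # stand-in for encoded (o, g)
mean = f(features)                           # predicted action mean
dist = torch.distributions.Normal(mean, log_std.exp())  # N(f(o,g), Sigma), Sigma diagonal
a = dist.sample()                            # action sampled from pi(a | o, g)
log_prob = dist.log_prob(a).sum()            # usable for policy-gradient training
```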