Pursuit Evasion Differential Games

Perfect State Information Game

Consider a Perfect State Information (PSI) game, in which the state x is available. The Linear Time-Invariant (LTI) system, with matrices of appropriate dimensions, is given by:

$$\dot{x} = Ax + Bu + Dw$$

where A is the dynamics matrix, B is the pursuer control vector, and D is the evader control vector. The problem is to find a control u that minimizes, and a control w that maximizes, the cost function:

[Equation: cost function]

Here x_f = x(t_f) is the final state, Q is the weight matrix on the final state, R and W are the control weight matrices of the pursuer and the evader, respectively, and γ is the disturbance attenuation ratio.
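
The cost function is not reproduced in this text. A common way to write it for a linear-quadratic differential game, assumed here and consistent with the definitions above (terminal weight Q, control weights R and W, and γ scaling the evader's effort), is:

```latex
J(u, w) = \frac{1}{2} x_f^{T} Q \, x_f
        + \frac{1}{2} \int_{t_0}^{t_f} \left( u^{T} R \, u - \gamma^{2} w^{T} W w \right) dt,
\qquad x_f = x(t_f)
```

Under this assumed form the pursuer pays for its own control effort, while the evader's effort enters with a negative sign scaled by γ², so a larger γ makes evasive control more costly to the evader.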

Imperfect State Information Game

Consider an Imperfect State Information (ISI) game, in which x is not directly available and only noisy measurements of it are. The LTI system, with matrices of appropriate dimensions, is given by:

$$\dot{x} = Ax + Bu + Dw, \qquad z = Hx + v$$

Here z is the measurement, H is the observation matrix, and v is the additive noise. The problem is now to find a control u that minimizes, and a w that maximizes, the following augmented cost function:

[Equation: augmented cost function]

The uncertainty in the initial condition is weighted by Y_0, and the uncertainty in the additive noise (its magnitude) is weighted by V. Both weights are chosen by the game designer.
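
The augmented cost is not reproduced here either. One form used in the disturbance-attenuation literature (see, e.g., [7]), assumed here, treats the initial-state error and the measurement noise as extra "controls" of the maximizing player. The prior guess $\hat{x}_0$ of the initial state is an assumed symbol, not one defined in the original post:

```latex
J_{ISI} = \frac{1}{2} x_f^{T} Q \, x_f
  + \frac{1}{2} \int_{t_0}^{t_f} \left( u^{T} R \, u - \gamma^{2} w^{T} W w
  - \gamma^{2} v^{T} V^{-1} v \right) dt
  - \frac{\gamma^{2}}{2} \left( x_0 - \hat{x}_0 \right)^{T} Y_0^{-1} \left( x_0 - \hat{x}_0 \right)
```

Weighting by the inverses means that larger Y_0 and V allow the maximizing player larger initial errors and larger noise, much as covariances do in a Kalman filter.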

Perfect State Information Game: Solution

The solution of this kind of game, a zero-sum game, satisfies the “saddle point inequality”:

$$J(u^*, w) \le J(u^*, w^*) \le J(u, w^*)$$

where the strategies u* and w* are called the saddle point controls (strategies). The saddle point inequality is satisfied if the strategy pair is:

[Equation: saddle point strategies u* and w*]

where X is generated by a Riccati Differential Equation (RDE), given at the end of this post. These strategies are optimal for each player with respect to the cost function: neither player can improve its outcome by deviating unilaterally.
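
Assuming the cost has the form $J = \frac{1}{2}x_f^T Q x_f + \frac{1}{2}\int (u^T R u - \gamma^2 w^T W w)\,dt$, stationarity of the Hamiltonian with costate $\lambda = Xx$ gives saddle point strategies and an RDE of the following form. This is a sketch under that assumption, not a transcription of the original equations:

```latex
\begin{aligned}
u^{*} &= -R^{-1} B^{T} X x, \qquad w^{*} = \gamma^{-2} W^{-1} D^{T} X x, \\
-\dot{X} &= A^{T} X + X A - X \left( B R^{-1} B^{T} - \gamma^{-2} D W^{-1} D^{T} \right) X,
\qquad X(t_f) = Q
\end{aligned}
```

The evader's term enters the quadratic part with the opposite sign to the pursuer's, so unlike the LQR Riccati equation, X can escape to infinity in finite time when γ is too small.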

A full derivation of the optimal strategies is given below for a one-dimensional case. The state equation, the cost function, and the RDE are:

[Equation: state equation, cost function, and RDE for the one-dimensional case]
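
A one-dimensional setup consistent with the rest of this derivation (terminal weight b > 0, $X_f = b$, $x_0 = 0$, and the condition γ > 1) is assumed here: unit pursuer and evader gains and no running state cost, that is, the scalar case A = 0, B = D = R = W = 1, Q = b of the general problem:

```latex
\begin{aligned}
&\dot{x} = u + w, \qquad x(0) = x_0 = 0, \\
&J = \frac{b}{2} x_f^{2} + \frac{1}{2} \int_{0}^{t_f} \left( u^{2} - \gamma^{2} w^{2} \right) dt, \\
&\dot{X} = \left( 1 - \gamma^{-2} \right) X^{2}, \qquad X(t_f) = b
\end{aligned}
```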

In this example we penalize the state x at the final time through a positive weight b > 0. Consider the identity (where X_f = b and x_0 = 0):

[Equation: identity]
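
For an assumed scalar model $\dot{x} = u + w$, an identity of this kind follows from integrating the time derivative of $\frac{1}{2}Xx^2$ over the horizon:

```latex
\frac{1}{2} X(t_f) x_f^{2} - \frac{1}{2} X(0) x_0^{2}
  = \frac{1}{2} \int_{0}^{t_f} \left( \dot{X} x^{2} + 2 X x \left( u + w \right) \right) dt
```

With $X(t_f) = b$ and $x_0 = 0$, the left-hand side equals the terminal term of the cost, which lets that term be rewritten as an integral.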

The optimal strategies can be found as follows:

[Equation: derivation of the optimal strategies]
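
Under the assumed scalar setup ($\dot{x} = u + w$, $J = \frac{b}{2}x_f^2 + \frac{1}{2}\int (u^2 - \gamma^2 w^2)\,dt$, $X(t_f) = b$, $x_0 = 0$), substituting the identity into J and completing the square in u and w gives:

```latex
J = \frac{1}{2} \int_{0}^{t_f} \Big[ \left( u + X x \right)^{2}
  - \gamma^{2} \left( w - \gamma^{-2} X x \right)^{2}
  + \left( \dot{X} - \left( 1 - \gamma^{-2} \right) X^{2} \right) x^{2} \Big] dt
```

The last term vanishes by the RDE. The pursuer then minimizes J by zeroing the first square ($u = -Xx$), and the evader maximizes it by zeroing the second ($w = \gamma^{-2}Xx$).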

We get linear state feedback controls (the players' strategies):

[Equation: linear feedback strategies]
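
A small numerical check of the saddle point property, assuming the scalar model $\dot{x} = u + w$ with terminal weight b and the closed-form RDE solution $X(t) = b / (1 + b(1 - \gamma^{-2})(t_f - t))$. It integrates the closed loop with RK4 and compares the cost with $\frac{1}{2}X(0)x_0^2$, the value the completed square predicts. A nonzero $x_0$ is used so the cost is not trivially zero. The function names and the constant deviations `du`, `dw` are illustrative choices, not part of the original post:

```python
def riccati_x(t, t_f, b, gamma):
    """Closed-form solution of dX/dt = (1 - gamma**-2) X**2 with X(t_f) = b."""
    c = 1.0 - gamma ** -2
    denom = 1.0 + b * c * (t_f - t)
    if denom <= 0.0:
        raise ValueError("no finite Riccati solution at this t (conjugate point)")
    return b / denom


def game_cost(x0, t_f, b, gamma, du=0.0, dw=0.0, steps=2000):
    """RK4 integration of dx/dt = u + w with u = -X x + du, w = gamma**-2 X x + dw.

    Returns J = b/2 x(t_f)**2 + 1/2 * integral of (u**2 - gamma**2 w**2) dt.
    """

    def rhs(t, x):
        X = riccati_x(t, t_f, b, gamma)
        u = -X * x + du          # pursuer: saddle point feedback plus deviation
        w = X * x / gamma**2 + dw  # evader: saddle point feedback plus deviation
        return u + w, 0.5 * (u * u - gamma**2 * w * w)

    dt = t_f / steps
    x, running = x0, 0.0
    for k in range(steps):
        t = k * dt
        k1x, k1j = rhs(t, x)
        k2x, k2j = rhs(t + dt / 2, x + dt / 2 * k1x)
        k3x, k3j = rhs(t + dt / 2, x + dt / 2 * k2x)
        k4x, k4j = rhs(t + dt, x + dt * k3x)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        running += dt / 6 * (k1j + 2 * k2j + 2 * k3j + k4j)
    return 0.5 * b * x**2 + running


x0, t_f, b, gamma = 1.0, 2.0, 5.0, 2.0
j_star = game_cost(x0, t_f, b, gamma)
print(f"{j_star:.4f} {0.5 * riccati_x(0.0, t_f, b, gamma) * x0**2:.4f}")  # 0.2941 0.2941
print(game_cost(x0, t_f, b, gamma, du=0.3) > j_star)  # True: pursuer deviation raises J
print(game_cost(x0, t_f, b, gamma, dw=0.3) < j_star)  # True: evader deviation lowers J
```

By the completed square, a constant pursuer offset should raise J by exactly $\frac{1}{2}du^2 t_f$ and an evader offset should lower it by $\frac{1}{2}\gamma^2 dw^2 t_f$, which is what the assertions below check.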

For γ > 1, an optimal solution is guaranteed.
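
The role of γ can be seen from the assumed scalar RDE $\dot{X} = (1 - \gamma^{-2})X^2$, $X(t_f) = b$. Its solution $X(t) = b / (1 + b(1 - \gamma^{-2})(t_f - t))$ stays finite over any horizon when γ ≥ 1. For γ < 1 the denominator reaches zero (a conjugate point) once the horizon exceeds $1 / (b(\gamma^{-2} - 1))$. The helpers below are hypothetical illustrations of that check:

```python
import math


def escape_horizon(b, gamma):
    """Longest horizon t_f - t over which X(t) = b / (1 + b c (t_f - t)),
    with c = 1 - gamma**-2, stays finite; math.inf when c >= 0 (gamma >= 1)."""
    c = 1.0 - gamma ** -2
    return math.inf if c >= 0.0 else 1.0 / (b * -c)


def saddle_point_exists(t_f, b, gamma):
    """True if the scalar RDE has a finite solution on the whole interval [0, t_f]."""
    return t_f < escape_horizon(b, gamma)


print(saddle_point_exists(2.0, 5.0, 2.0))   # True: gamma > 1, any horizon works
print(saddle_point_exists(2.0, 5.0, 0.5))   # False: horizon 2.0 exceeds 1/15
print(saddle_point_exists(0.05, 5.0, 0.5))  # True: the horizon is short enough
```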

Imperfect State Information Game: Solution

In the ISI case, intuition suggests estimating the unavailable state and applying the same gain obtained for the PSI case. This is the idea of the “separation theorem”, which is valid in Linear Quadratic Gaussian (LQG) control but not in Linear Quadratic Differential Games (LQDG). The general solution to the ISI game is provided by Speyer: it uses a standard Kalman filter to estimate x from the noisy measurements and then applies a modified version of the pursuer gain (from the PSI solution) to the estimated value of x:

[Equation: Kalman filter estimate of x]
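
The estimation step can be illustrated with a scalar Kalman filter. The sketch below is a generic discrete-time filter, swapped in for the continuous-time filter of the ISI solution, whose exact equations are not reproduced here. The parameter names `a`, `bu`, `h`, `q`, `r` and the simulated constant state are illustrative assumptions. It estimates x from measurements z = h x + v:

```python
import random


def kalman_step(x_hat, p, z, u, a=1.0, bu=1.0, h=1.0, q=0.0, r=1.0):
    """One predict/update cycle of a scalar discrete-time Kalman filter.

    Assumed model: x[k+1] = a x[k] + bu u[k] + process noise (variance q),
                   z[k]   = h x[k] + v[k],  with v ~ N(0, r).
    """
    # Predict: propagate the estimate and its error variance through the model.
    x_pred = a * x_hat + bu * u
    p_pred = a * p * a + q
    # Update: correct the prediction with the gain-weighted measurement residual.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


rng = random.Random(0)
true_x = 3.0              # constant state (a = 1, q = 0, u = 0)
x_hat, p = 0.0, 100.0     # vague prior
for _ in range(500):
    z = true_x + rng.gauss(0.0, 1.0)  # noisy measurement with r = 1
    x_hat, p = kalman_step(x_hat, p, z, u=0.0)
print(x_hat, p)
```

With a constant state and no process noise the estimate approaches the measurement average and the error variance p shrinks roughly like 1/k.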

The optimal pursuer strategy [Speyer]:

[Equation: optimal pursuer strategy]

where X and Y are obtained from:

[Equation: Riccati differential equations for X and Y]
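
In the H∞ literature (e.g. [7]), the measurement-feedback pursuer strategy typically takes the central-controller form $u = -R^{-1}B^T X (I - \gamma^{-2} Y X)^{-1} \hat{x}$, which exists only while the spectral radius of YX stays below γ². Assuming that form applies here, a scalar sketch with a hypothetical helper `isi_pursuer_gain` shows how the estimation term Y inflates the PSI gain:

```python
def isi_pursuer_gain(x, y, gamma, b=1.0, r=1.0):
    """Scalar version of the assumed gain -R^-1 B^T X (I - gamma^-2 Y X)^-1.

    Returns k such that u = k * x_hat. Raises ValueError if the coupling
    condition X * Y < gamma**2 fails, since the gain is then undefined.
    """
    coupling = 1.0 - x * y / gamma**2
    if coupling <= 0.0:
        raise ValueError("coupling condition X*Y < gamma**2 violated")
    return -(b / r) * x / coupling


psi_gain = isi_pursuer_gain(2.0, 0.0, gamma=2.0)  # Y = 0: reduces to the PSI gain
isi_gain = isi_pursuer_gain(2.0, 1.0, gamma=2.0)  # Y > 0: uncertain estimate
print(psi_gain, isi_gain)  # -2.0 -4.0
```

With Y = 0 the gain reduces to the PSI gain $-R^{-1}B^T X$. As Y grows toward γ²/X, the pursuer reacts more aggressively to its estimate, and beyond that point the assumed strategy no longer exists.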

Summary

This post introduced pursuit evasion differential games in perfect and imperfect information settings, with a full derivation of the strategies for the former. Next, we will work through a simplified boat guidance example to demonstrate the mathematical power of this formulation.

Barak Or

www.barakor.com

Further reading

[1] Primer on Optimal Control Theory. Jason L. Speyer, David H. Jacobson. SIAM, 2010.

[2] Games in Aerospace: Homing Missile Guidance. Joseph Z. Ben-Asher, Jason L. Speyer. In Handbook of Dynamic Game Theory, Tamer Basar and Georges Zaccour (Eds.). Springer, 2017.

[3] An Adaptive Terminal Guidance Scheme Based on an Exponential Cost Criterion with Application to Homing Missile Guidance. Jason L. Speyer. IEEE, 1976.

[4] Advances in Missile Guidance Theory. J. Z. Ben-Asher, Isaac Yaesh. AIAA, 1998.

[5] Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. Rufus Isaacs. Wiley, 1965.

[6] Applied Optimal Control. Bryson and Ho. Hemisphere, 1975.

[7] H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach. Basar and Bernhard. Springer, 1991.

[8] H-Infinity Control and Estimation of State-multiplicative Linear Systems. Eli Gershon, Uri Shaked, Isaac Yaesh. LNCIS, Springer 2005.

[9] Pursuit Evasion Games with Imperfect Information Revisited. Barak Or, Joseph Z. Ben-Asher, Isaac Yaesh. The 58th Israel Annual Conference on Aerospace Sciences (IACAS).

[10] Estimation with Applications to Tracking and Navigation. Yaakov Bar-Shalom, X. Rong Li, Thiagalingam Kirubarajan. Wiley, 2001.

[11] Optimal Disturbance Attenuation Approach with Measurement Feedback to Missile Guidance. Barak Or, Joseph Z. Ben-Asher, Isaac Yaesh. arXiv preprint arXiv:2001.04308, 2020.

Source: https://towardsdatascience.com/pursuit-evasion-differential-games-3e778829b4e9