Notes from the course “Introduction to Data Science” at RWTH Aachen, winter semester 19/20, Professor Willibrordus van der Aalst

Table of contents

Two questions
Conformance checking

Fitness of the model

Generating supervised learning problems
Tooling

Two questions

Performance Problems & Compliance Problems

Reporting: What happened?
Diagnosis: Why did it happen?
Prediction: What will happen?
Recommendation: What is the best that can happen?

Conformance checking

Fitness of the model

$\text { fitness }(\sigma, N)=\frac{1}{2}\left(1-\frac{m}{c}\right)+\frac{1}{2}\left(1-\frac{r}{p}\right)$

Counting tokens while replaying:

p = produced tokens
c = consumed tokens
m = missing tokens
r = remaining tokens

At any time: $p+m\geq c \geq m$ (a missing token is added only when it is needed to fire a transition)
At the end: $r=p+m-c$
p = c does not necessarily hold
For instance:
[Figure: Process Mining Supervised]

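Initialization: p = 1 (the environment produces the token in the initial place)
Finalization: c = c + 1 (the environment consumes the token from the final place)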
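
The counting rules above can be sketched in a few lines of Python. This is only a minimal illustration, not the course's or any tool's implementation: the function names `replay` and `fitness`, the dictionary encoding of the net, and the three-step example net are all assumptions made for this sketch, and it assumes every activity label maps to exactly one transition (no silent or duplicate transitions).

```python
from collections import Counter


def replay(trace, net, initial_place, final_place):
    """Replay one trace on a Petri net and return the counters (p, c, m, r).

    `net` is a hypothetical encoding: activity -> (input places, output places).
    """
    marking = Counter({initial_place: 1})
    p, c, m = 1, 0, 0  # initialization: the environment produces the first token
    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] == 0:
                m += 1               # add a missing token only when it is needed
                marking[place] += 1
            marking[place] -= 1
            c += 1
        for place in outputs:
            marking[place] += 1
            p += 1
    # finalization: the environment consumes the token from the final place
    if marking[final_place] == 0:
        m += 1
        marking[final_place] += 1
    marking[final_place] -= 1
    c += 1
    r = sum(marking.values())        # tokens left behind in the net
    return p, c, m, r


def fitness(p, c, m, r):
    """Trace-level fitness: 1/2 (1 - m/c) + 1/2 (1 - r/p)."""
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)


# Assumed example net: start -> a -> p1 -> b -> p2 -> c -> end
net = {
    "a": (["start"], ["p1"]),
    "b": (["p1"], ["p2"]),
    "c": (["p2"], ["end"]),
}

fitting = replay(["a", "b", "c"], net, "start", "end")   # (4, 4, 0, 0)
deviating = replay(["a", "c"], net, "start", "end")      # (3, 3, 1, 1)
print(fitting, fitness(*fitting))
print(deviating, fitness(*deviating))
```

In the deviating trace, skipping `b` costs one missing token (in `p2`) and leaves one remaining token (in `p1`), so $r=p+m-c$ holds with 1 = 3 + 1 - 3 and the fitness drops to 2/3.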

Fitness at the log level
$\text { fitness }(L, N)=\frac{1}{2}\left(1-\frac{\sum_{\sigma \in L} L(\sigma) \times m_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times c_{N, \sigma}}\right)+\frac{1}{2}\left(1-\frac{\sum_{\sigma \in L} L(\sigma) \times r_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times p_{N, \sigma}}\right)$
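
As a rough sketch of the log-level formula, the per-trace counters can be weighted by the trace frequency $L(\sigma)$ and summed before the ratios are taken. The function name `log_fitness`, the tuple layout, and the example numbers are assumptions for illustration only.

```python
def log_fitness(counts):
    """Log-level fitness from per-variant counters.

    `counts` is a list of (frequency, p, c, m, r) tuples, one per distinct
    trace; `frequency` plays the role of L(sigma) in the formula.
    """
    total_p = sum(f * p for f, p, c, m, r in counts)
    total_c = sum(f * c for f, p, c, m, r in counts)
    total_m = sum(f * m for f, p, c, m, r in counts)
    total_r = sum(f * r for f, p, c, m, r in counts)
    return 0.5 * (1 - total_m / total_c) + 0.5 * (1 - total_r / total_p)


# Assumed example log: 8 fitting traces with (p, c, m, r) = (4, 4, 0, 0)
# and 2 deviating traces with (p, c, m, r) = (3, 3, 1, 1).
counts = [(8, 4, 4, 0, 0), (2, 3, 3, 1, 1)]
print(log_fitness(counts))  # equals 1 - 2/38
```

Because the sums are taken before dividing, this is not the same as averaging the trace-level fitness values: for the example above the plain frequency-weighted average would be (8 × 1 + 2 × 2/3) / 10 ≈ 0.933, while the log-level formula gives 1 − 2/38 ≈ 0.947.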

Generating supervised learning problems

[Figure: Process Mining Supervised]

Tooling

ProM
Disco
Celonis
Lexmark
Minit
myInvenio