Process Mining Supervised

Notes from the course “Introduction to Data Science” at RWTH Aachen, winter semester 2019/20, taught by Professor Willibrordus van der Aalst.

Two questions

Performance Problems & Compliance Problems

  • Reporting: What happened?
  • Diagnosis: Why did it happen?
  • Prediction: What will happen?
  • Recommendation: What is the best that can happen?

Conformance checking

Fitness of the model

$$\text{fitness}(\sigma, N)=\frac{1}{2}\left(1-\frac{m}{c}\right)+\frac{1}{2}\left(1-\frac{r}{p}\right)$$

Counting tokens while replaying:

  • p = produced tokens
  • c = consumed tokens
  • m = missing tokens
  • r = remaining tokens
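As a small illustration, the trace-level fitness formula can be evaluated directly from the four counters. This is a minimal sketch: the function name `trace_fitness` and the example counts are assumptions made for illustration and do not come from the course material.

```python
def trace_fitness(p: int, c: int, m: int, r: int) -> float:
    """Fitness of one trace from its token-replay counters.

    p = produced, c = consumed, m = missing, r = remaining tokens.
    """
    if p <= 0 or c <= 0:
        # Both fractions divide by these counters, so they must be positive.
        raise ValueError("p and c must be positive")
    missing_part = 1 - m / c      # penalty for tokens that had to be added
    remaining_part = 1 - r / p    # penalty for tokens left behind
    return 0.5 * missing_part + 0.5 * remaining_part


# Hypothetical counts: a perfect replay and a replay with one deviation.
perfect = trace_fitness(p=7, c=7, m=0, r=0)
deviating = trace_fitness(p=7, c=7, m=1, r=1)
print(perfect, deviating)
```

With no missing and no remaining tokens the value is 1. In the second call each half loses 1/7, so the result is 6/7, roughly 0.857.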

At any time: $p + m \geq c \geq m$ (a missing token is only added when it is actually needed)
At the end: $r = p + m - c$
Not necessarily p = c
For instance:
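Initialization: p = 1 (the environment produces the initial token in the source place)
Finalization: c = c + 1 (the environment consumes the final token from the sink place)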
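The counting procedure can be sketched as a small token replay. This is a simplified illustration, not the course's reference implementation. It assumes that every activity maps to exactly one transition, with no duplicate labels and no silent transitions. The net `NET`, the place names and the function `replay` are hypothetical.

```python
from collections import Counter

# Hypothetical sequential net: activity -> (input places, output places).
NET = {
    "a": (["start"], ["p1"]),
    "b": (["p1"], ["p2"]),
    "c": (["p2"], ["end"]),
}


def replay(trace, net, source="start", sink="end"):
    """Replay a trace on the net and return the counters (p, c, m, r)."""
    marking = Counter()
    p = c = m = 0

    # Initialization: the environment puts one token in the source place.
    marking[source] += 1
    p += 1

    for activity in trace:
        inputs, outputs = net[activity]
        for place in inputs:
            if marking[place] == 0:
                # The token is needed but absent: add it and count it as missing.
                marking[place] += 1
                m += 1
            marking[place] -= 1
            c += 1
        for place in outputs:
            marking[place] += 1
            p += 1

    # Finalization: the environment takes one token from the sink place.
    if marking[sink] == 0:
        marking[sink] += 1
        m += 1
    marking[sink] -= 1
    c += 1

    # Whatever is still in the net counts as remaining.
    r = sum(marking.values())
    return p, c, m, r


p, c, m, r = replay(["a", "c"], NET)  # activity "b" is skipped
assert r == p + m - c                 # the end-of-replay identity holds
```

Replaying the fitting trace a, b, c gives p = c = 4 and m = r = 0. Skipping b gives p = c = 3 with one missing token (in p2) and one remaining token (in p1).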

Fitness at the log level
$$\text{fitness}(L, N)=\frac{1}{2}\left(1-\frac{\sum_{\sigma \in L} L(\sigma) \times m_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times c_{N, \sigma}}\right)+\frac{1}{2}\left(1-\frac{\sum_{\sigma \in L} L(\sigma) \times r_{N, \sigma}}{\sum_{\sigma \in L} L(\sigma) \times p_{N, \sigma}}\right)$$
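The log-level formula sums each counter over all trace variants, weighted by how often the variant occurs, before taking the ratios. Below is a minimal sketch under assumed inputs: the tuple layout `(frequency, p, c, m, r)`, the function name `log_fitness` and the example numbers are illustrative only.

```python
def log_fitness(variants):
    """Log-level fitness from per-variant replay counters.

    variants: iterable of (frequency, p, c, m, r) tuples, one per
    distinct trace, where frequency plays the role of L(sigma).
    """
    total_p = total_c = total_m = total_r = 0
    for freq, p, c, m, r in variants:
        total_p += freq * p
        total_c += freq * c
        total_m += freq * m
        total_r += freq * r
    if total_p == 0 or total_c == 0:
        raise ValueError("the log must produce and consume at least one token")
    return 0.5 * (1 - total_m / total_c) + 0.5 * (1 - total_r / total_p)


# Hypothetical log: a fitting variant seen 3 times, a deviating one seen once.
example_log = [
    (3, 4, 4, 0, 0),
    (1, 3, 3, 1, 1),
]
print(log_fitness(example_log))
```

For this example the totals are m = r = 1 and p = c = 15, so the result is 14/15. This generally differs from the frequency-weighted average of the per-trace fitness values, because the counters are aggregated before the division.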

Generating supervised learning problems

Tooling

  • ProM
  • Disco
  • Celonis
  • Lexmark
  • Minit
  • myInvenio