形式语言与自动机_笔记整理(一)_有穷自动机与正则表达式
昨天晚上入睡的时候已经将近三点了,最近两天无所事事却又异常失眠。偶然发现****blog的访问量已经超过了QQ空间,可见两年多自己在CS之路走得要比十多年的社交大道平坦得多呀,决定暂时中断社交平台的互动,以彰显期末复习之志。
回看博客记录的东西,除了冗余可读性差的代码,就是抱大腿的刻骨铭心记忆。回忆这半个学期确实没做什么事情。开学时不知天高地厚选了很多课,上起来才真正觉得力不从心。我以为我可以再强大一点,但是我并没有意识到,很多理论课自学难度稍微大于可以偶尔翘掉课堂然后自学并完成实验的应用课程。
一直清楚地知道自己并不能从完成大作业中获得很大的满足感,短暂的既视成就感不是我想要的,从不玩任何游戏的我以后也绝对不会去做也做不了游戏开发,但是我的心中一直不变的做一个优秀的并且快乐的程序员的梦想没有走远。
昨天晚上看到有人重提“写字台像乌鸦”的梗,又开始了一波感时伤世,你说啊,多愁伤感怎么会是我在这个时候应该想到的事情呢?细数自己应该不剩多少时间去挥霍游浪了,接下来的旅程怎么说都应该是道阻且艰的,怎就有胆量如此放肆任性而不计较呢?
Outline
Part 1
Preliminaries
Mathematical Knowledge
String and Language
Part 2
Finite Automata and Regular Expression
Context Free Grammar and Pushdown Automata
Turing Automata
Part 3 Modeling
Transition System
Petri Net
Timed and Hybrid Automata
Message Sequence Chart
Part 4 Tutorials
Computability
Model Checking
Trustworthy Software
Preliminary
Math Preliminary
SET
Powerset of S = the set of all the subsets of S, P(S), 2^S
FUNCTIONS
Given two sets A and B, a function from A into B
associates with each a in A at most one element b of B.
If A = domain, then f is a total function, otherwise f is a partial function.
f: A -> B is a bijection
- f is total
- for all a and a’ in A, a!=a’ implies f(a)!=f(a’)
- for all b in B, there is a in A with f(a)=b
RELATIONS
A binary relation R over A is a partial order if it is reflexive, transitive, and antisymmetric.
A binary relation R over A is a total order(linear order) if it is a partial order and for all a, b in A, either aRb or bRa.
GRAPHS
- Walk is a sequence of adjacent edges
- A path is a walk where no edge is repeated
- A simple path is a path where no node is repeated
- A cycle is a walk from a node (base) to itself
- A simple cycle: only the base node is repeated
- Given a digraph G = (V, E) and nodes u and v, we say v is reachable from u, or u-reachable, if there is a path from u to v.
- A tree is a directed graph that has no cycle.
PROOF TECHNIQUES
- Proof by induction
- Proof by contradiction
Languages
A language is a set of strings.
String: A sequence of letters/symbols.
Symbols are defined over an alphabet.
String Operations
Concatenation
Reverse
Operations on Languages
- The usual set operations
- Complement
- Reverse
- Concatenation
- Star-Closure (Kleene *):
L∗=L0∪L1∪L2⋯ - Positive Closure:
L+=L∗−{λ}
Finite Automata
Language of an Automaton: The set of strings accepted by an automaton A is the language of A. Denoted
Deterministic Finite Automata
Alphabet: An alphabet is any finite set of symbols.
String: A string over an alphabet
Subtlety: 0 as a string, 0 as a symbol look the same.
A language is a subset of
Formal Definition
A formalism for defining languages, consisting of:
- A finite set of states (
Q , typically). - An input alphabet (
Σ , typically). - A transition function (
δ , typically).- Takes two arguments: a state and an input symbol.
-
δ(q,a) = the state that the DFA goes to when it is in stateq and inputa is received. -
Note:
-
δ is a total function - always a next state (add a dead state if no transition)
-
- A start state (
q0 , inQ , typically). - A set of final states (
F⊆Q , typically).- ”Final” and “accepting” are synonyms.
Graph Representation of DFA’ s
Example: Strings With NO“11”
- Nodes = states.
- Arcs represent transition function.
- Arc from state p to state q labeled by all those input symbols that have transitions from p to q.
- Arrow labeled “Start” to the start state.
- Final states indicated by double circles.
Transition Table of DFA’ s
Inductive Definition of Extended δ
Induction on length of string.
Basis: δ(q, ε) = q
Induction: δ(q, wa) = δ(δ(q, w), a)
Remember: w is a string; a is an input symbol, by convention.
We don’ t distinguish between the given delta and the extended delta or delta-hat.
Language of DFA
For a DFA A, L(A) is the set of strings labeling paths from the start state to a final state.
L(A) = the set of strings w such that
Proofs of Set Equivalence
TODO:
Important trick: Expand the inductive hypothesis to be more detailed than the statement you are trying to prove.
Regular Language
A language L is regular if it is the language accepted by some DFA.
Theorem: The reverse of a regular language is also regular.
Nondeterministic Finite Automata
Nondeterministic
A nondeterministic finite automaton has the ability to be in several states at once.
Transitions from a state on an input symbol can be to any set of states.
Start in one start state.
Accept if any sequence of choices leads to a final state.
Formal NFA
- A finite set of states, typically
Q . - An input alphabet, typically
Σ . - A transition function, typically δ.
- δ(q, a) is a set of states. Extend to strings as follows:
-
Basis:
δ(q,ε)=q -
Induction:
δ(q,wa) = the union over all states p inδ(q,w) ofδ(p,a)
-
Basis:
- δ(q, a) is a set of states. Extend to strings as follows:
- A start state in
Q , typicallyq0 . - A set of final states
F⊆Q .
Language of an NFA
A string w is accepted by an NFA if
The language of the NFA is the set of strings it accepts.
Equivalence of DFA’s, NFA’s
A DFA can be turned into an NFA that accepts the same language.
IfδD(q,a)=p , let the NFA haveδN(q,a)=p .-
Subset Construction
- Given an NFA with states
Q , inputsΣ , transition functionδN , state stateq0 , and final statesF , construct equivalent DFA with:- States
2Q (Set of subsets ofQ ). - Inputs
Σ . -
δD(q1,…,qk,a) is the union over alli=1,…,k ofδN(qi,a) . - Start state
{q0} . - Final states = all those with a member of
F .
- States
- Given an NFA with states
NFA’s With ε-Transitions
Closure of States
CL(q) = set of states you can reach from state q following only arcs labeled ε.
Closure of a set of states = union of the closure of each state.
Language of an ε-NFA
TODO:
Language of an ε-NFA is the set of strings w such that
Equivalence of NFA, ε-NFA
Every NFA is an ε-NFA.
- It just has no transitions on ε.
Converse requires us to take an ε-NFA and construct an NFA that accepts the same language.
-
We do so by combining ε-transitions with the next transition on a real input.
-
Transition function
δN(q,a) as follows:- Let
S=CL(q) . -
δN(q,a) is the union over allp inS ofδE(p,a) .
- Let
F′ = the set of statesq such thatCL(q) contains a state ofF .
-
Regular Expressions
Operations on Languages
- union,
- concatenation
- Kleene star
Equivalence of RE’s and Finite Automata
Converting a RE to an ε-NFA
Proof is an induction on the number of operators (+, concatenation, *) in the RE.
We always construct an automaton of a special form.
DFA-to-RE
k-paths
A k-path is a path through the graph of the DFA that goes though no state numbered higher than k.
- Endpoints are not restricted; they can be any state.
- n-paths are unrestricted.
- RE is the union of RE’s for the n-paths from the start state to each final state.
Algebraic Laws for RE’s
Union and concatenation behave sort of like addition and multiplication.
Exception: Concatenation is not commutative.
Decision Properties of Regular Languages
The Membership Problem
Assume L is represented by a DFA A.
Simulate the action of A on the sequence of input symbols forming w.
The Emptiness Problem
Given a regular language, does the language contain any string at all?
Assume representation is DFA.
Compute the set of states reachable from the start state.
If at least one final state is reachable, then yes, else no.
The Infiniteness Problem
Start with a DFA for the language.
Key idea: if the DFA has n states, and the language contains any string of length n or more, then the language is infinite.
Otherwise, the language is surely finite.
- Limited to strings of length strictly less than n
Second key idea: if there is a string of length > n (number of states) in L, then there is a string of length between n and 2n-1.
The Pumping Lemma
Decision Property: Equivalence
product DFA
Let these DFA’ s have sets of states Q and R, respectively.
- Product DFA has set of states Q
× R, i.e., pairs [q, r] with q in Q, r in R. - Start state =
[q0,r0] (the start states of the DFA’s for L, M). - Transitions:
δ([q,r],a)=[δL(q,a),δM(r,a)] -
δL ,δM are the transition functions for the DFA’s of L, M. - That is, we simulate the two DFA’s in the two state components of the product DFA.
-
Decision Property: Containment
Product DFA
Define the final states [q, r] of the product so its language is empty iff L
q is final; r is not.
Constructing the Minimum-State DFA
State Minimization
Algorithm is a recursion on the length of the shortest distinguishing string.
- Basis: Mark pairs with exactly one final state.
-
Induction: mark
[q,r] if for some input symbol a,[δ(q,a),δ(r,a)] is marked.
After no more marks are possible, the unmarked pairs are equivalent and can be merged into one state.
Transitivity of Indistinguishable
If state p is indistinguishable from q, and q is indistinguishable from r, then p is indistinguishable from r.
Eliminating Unreachable States
Unfortunately, combining indistinguishable states could leave us with unreachable states in the “minimum-state” DFA.
Thus, before or after, remove states that are not reachable from the start state.
Proof: No Unrelated, Smaller DFA
Closure Properties of Regular Languages
Closure Under Union
If L and M are regular languages, so is L M.
Proof:
Let L and M be the languages of regular expressions R and S, respectively.
Then R+S is a regular expression whose language is L M.
Closure Under Concatenation and Kleene Closure
Same idea:
RS is a regular expression whose language is LM.
R* is a regular expression whose language is L*.
Closure Under Intersection
If L and M are regular languages, then so is L M.
Proof:
Let A and B be DFA’ s whose languages are L and M, respectively.
Construct C, the product automaton of A and B.
Make the final states of C be the pairs consisting of final states of both A and B.
Closure Under Difference
If L and M are regular languages, then so is L – M = strings in L but not M.
Proof:
Let A and B be DFA’ s whose languages are L and M, respectively.
Construct C, the product automaton of A and B.
Final states of C are the pairs whose A-state is final but whose B-state is not.
Closure Under complement
The complement of a language L (with respect to an alphabet Σ such that Σ* contains L) is Σ* – L.
Proof:
Since Σ* is surely regular, the complement of a regular language is always regular.
Closure Under Reversal
Given language L, LR is the set of strings whose reversal is in L.
Example: L = {0, 01, 100}; LR = {0, 10, 001}.
Proof:
Let E be a regular expression for L.
We show how to reverse E, to provide a regular expression ER for LR.
Basis: If E is a symbol a, ε, or ∅, then ER = E.
Induction: If E is
- F+G, then ER = FR + GR.
- FG, then ER = GRFR
- F*, then ER = (FR)*.
Closure Under Homomorphism
A homomorphism on an alphabet is a function that gives a string for each symbol in that alphabet.
Example: h(0) = ab; h(1) = ε.
Extend to strings by h(a1…an) = h(a1)…h(an).
Example: h(01010) = ababab.
If L is a regular language, and h is a homomorphism on its alphabet, then h(L) = {h(w) | w is in L} is also a regular language.
Proof:
Let E be a regular expression for L.
Apply h to each symbol in E.
Language of resulting RE is h(L).
Example:
Let h(0) = ab; h(1) = ε.
Let L be the language of regular expression 01* + 10*.
Then h(L) is the language of regular expression abε* + ε(ab)*.
abε* + ε(ab)* can be simplified.
ε* = ε, so abε* = abε.
ε is the identity under concatenation.
That is, εE = Eε = E for any RE E.
Thus, abε + ε(ab)* = ab + (ab)*.
Finally, L(ab) is contained in L((ab)
Closure Under Inverse Homomorphism
Inverse Homomorphism
Let h be a homomorphism and L a language whose alphabet is the output language of h.
h−1(L) = {w | h(w) is in L}.
Example:
Let h(0) = ab; h(1) = ε.
Let L = {abab, baba}.
Proof:
An induction on |w| (omitted) shows that
Thus, B accepts w if and only if A accepts