Notes on Automata - Week 2

In week 2 of Automata Course, regular expressions are introduced, along with some applications of regular expressions. Not surprisely, the regular expressions describe exactly the regular languages, the same as Finite Automatas. Also, the properties of regular languages are introduced, including decisions properties and closure properties. The latter indicates that the regular class is closed under operations such as union, intersection, difference, etc, while a decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., DFA) and tells whether or not some property holds. In particular, the pumping lemma is useful to tell whether a language is regular or not.

Week 2 Snippets – Regular Expression and Properties of Regular Languages

Regular Expressions

Regular Expressions describe languages by an algebra. If $E$ is a regular expression, then $L(E)$ is the language it defines. The RE’s and their languages are defined recursively.
There are three basic operations performed on languages, union, concatenation and Kleene star. For language $L$ and $L’$, the union is the usual thing, since languages are sets. And the concatenation contains every string contanated by strings from $L$ and $L’$, which means $LL’ = \{wx| w \in L, , x \in L’ \}$. The Kleene star of $L$ concatenates zero or more strings from $L$, or formally $L^* = \{\epsilon\} \cup L \cup LL \cup \ldots$. The RE’s definition is as follows:

  • Basis
    • If $a$ is any symbol, than $\bf{a}$ is a RE, and $L(\bf{a}) = \{a\}$
    • $\bf{\epsilon}$ is a RE, and $L(\bf{\epsilon}) = \{\bf{\epsilon}\}$
    • $\phi$ is a RE, and $L(\Phi) = \phi$
  • Induction
    • If $E_1$ and $E_2$ is RE, then $E_1 + E_2$ is a RE, and $L(E_1+E_2) = L(E_1)\cup L(E_2)$
    • If $E_1$ and $E_2$ is RE, then $E_1E_2$ is a RE, and $L(E_1E_2) = L(E_1)L(E_2)$
    • If $E$ is a RE, then $E^*$ is a RE, and $L(E^*) = (L(E))^*$

Actually, the all RE’s defined on an alphabet consist an algebra. The plus operation is union, and multiply operation is concatenation. The $\phi$ is identity for plus, $\epsilon$ is identity for multiply, and $\phi$ is annihilator for multiply.
It can be proved that RE and Finite Automata are equivalent. For each RE, we construct a $\epsilon$-NFA that accepts the same language as below.

Also a finite automata (DFA) can be converted to an equivalent RE.
Proof Sketch: Suppose states of the DFA are named $1, 2, \ldots, n$, we define a $k$-path as a path through the graph of the DFA that goes through no state numbered higher than $k$, then the union of RE’s for the $n$-paths from the start state to each final state is a RE which is equivalent to the DFA.

Decision Properties of Regular Languages

A language class is a set of languages, a decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., DFA) and tells whether or not some property holds. We might want a “smallest” representation for a language, e.g., a minimum-state DFA or a shortest RE. Then we have to be able to decide “Are these two languages the same?”. Some examples are listed below,

  • The Membership Problem: “Is string $w$ in regular language $L$?”. Suppose $L$ is represented by a DFA $A$, then we can simulate the action of $A$ on the sequence of input symbols forming $w$.
  • The Emptiness Problem: “Given a regular language, does the language contain any string at all?” Assume representation is DFA, then the set of states reachable from the start state can be computed, if at least one final state is reachable, then yes, else no.
  • The Infiniteness Problem: “Is a given regular language infinite?” The conclusion is that we can test for membership all strings of length between $n$ and $2n-1$ ($n$ is number of states), if any are accepted, then infinite, else finite. This follows from the key idea that if the language contains any string of length $n$ or more, then the language is infinite, otherwise the language is finite and that if there is a string of length $\geq n$ in $L$ (the DFA representation), then there is a string of length between $n$ and $2n-1$. The former is intuitive, since there must be a state that appear twice on the path labeled $w$ from start state to final state, where $w$ is a string with length $\geq n$. The latter follows from that we can shorten the string by remove the cycles on the path.
    However, it’s terrible to test all $n^{|alphabet|}$ strings. As an alternative, we can find cycles between the start and a final state.
    • Eliminate states not reachable from the start state
    • Eliminate states that do not reach a final state
    • Test if the remaining transition grph has any cycles.
  • Equivalence: “Given regular languages $L$ and $M$, is $L=M$?” Algorithm involves constructing the product DFA from DFA of $L$ and $M$. Let these DFA’s have sets of states $Q$ and $R$, respectively, product DFA has set of states $Q\times R$, start state $[q_0, r_0]$, and transitions $\delta([q, r], a)=[\delta_L(q, a), \delta_M(r, a)]$. The final states of the product DFA are those states such that exactly one of $q$ and $r$ is a final state of its own DFA. Thus, the product DFA accepts $w$ iff $w$ is in exactly one of $L$ and $M$ and $L=M$ if and only if the product automaton’s language is empty.
  • Containment: “Given regular languages $L$ and $M$, is $L \subset M$?” The algorithm also uses the product DFA, except that the $[q, r]$ is a final state iff $q$ is final and $r$ is not.

The Pumping Lemma

In the infiniteness problem, we have, almost accidentally, proved a statement that is quite useful for showing certain languages are not regular, which is called as the pumping lemma for regular language.

For every regular language $L$, there is an integer $n$, such that for every string $w$ in $L$ of length $\geq n$, we can write write $w=xyz$ such that $|xy| \leq n, |y| > 0$ and for all $i\geq 0, xy^iz$ is in $L$.

Examples: $\{0^k1^k|\, k\geq 1\}$ is not a regular language.
Proof Sketch:Suppose it were, then there would be an associated $n$ for pumping lemma, let $w=0^n1^n$, we can write $w=xyz$, where $x$ and $y$ consists of 0’s, and $y\neq \epsilon$, but then $xyyz$ would be in $L$, and this string contains more 0’s than 1’s.

The Minimum-State DFA for a Regular Language

Naturally, we may want to construct “smallest” representation for a regular language, either with minimum-state DFA or shortest RE. In principle, since we can test equivalence of DFA, we can find the DFA with the fewest states that accepts the same regular language. That’s a rather terrible algorithm. The efficient algorithm is listed as follows,

  • Construct a table with all pairs of states, mark pairs with exactly one final state,
  • Mark $[q, r]$ if for some input symbol $a$, $[\delta(q, a), \delta(r, a)]$ is marked,
  • After no more marks are possible, the unmarked pairs are equivalent and can be merged into one state,
  • Before or after the previous steps, remove states that are not reachable from the start state.

The detailed proof can be found in the slides. The intuition is that we can divide the states of DFA into equivalent components, which is determined by the indistinguishable relation, and then replace each of these components with a single state. State $p$ is indistinguishable from state $q$ if the following two properties hold,

  • Both of $p$ and $q$ are final states or both of $p$ and $q$ are not final states
  • For any symbol $a$, $p$ and $q$ transite to indistinguishable states.

The algorithm recursively marks distinguishable states, and replaces indistinguishable states with a single state. It can be proved that resulting DFA has fewest states. Suppose $A$ is the resulting DFA, and there exists another smaller equivalent DFA $B$, consider an automaton with the states of $A$ and $B$ combined. Then every state of $A$ is indistinguishable from some state of $B$, which means $B$ has at least as many states as $A$.

Closure Properties of Regular Languages

The regular language class is closed under union, concatenation, Kleene star, intersection, difference, complementation, reversal, homomorphism and inverse homomorphism. In other words, if $L$ and $M$ are regular languages, so is $L\cup M$, $L\cap M$, $L-M$, $L^*$, $LM$, $\Sigma^* - L$.

  • Reversal: The reversal of $L$ consists of reversal of every string in $L$. The proof is completed by provide a regular expression for $L^R$, which is got by reversing the regular expression of $L$.
  • Homomorphism: A homomorphism $h$ on an alphabet is a function that gives a string for each symbol in that alphabet. The homomorphism can be extended to map on strings. For a regular language $L$, $h(L)=\{h(w)| \, w \in L \}$ is also a regular language. The proof also follows from homomorphism of regular expression.
  • Inverse homomorphism: Let $h$ a homomorphism and $L$ is a language whose alphabet is the output language of $h$, $h^{-1} = \{w| \, h(w) \in L\}$ is also a regular language.