IndProp: Inductively Defined Propositions (Part 2)

Case Study: Regular Expressions

Many of the examples above were simple and the ev property even a bit artificial. To give a better sense of the power of inductively defined propositions, we now show how to use them to model a classic concept in computer science: regular expressions.

Regular expressions are a natural language for describing sets of strings. Their syntax is defined as follows:
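
The Inductive declaration that introduces these constructors does not appear in the text as given. A reconstruction consistent with the Arguments commands that follow, and with the constructors used in the rest of this section, might read as below; the argument names t, r₁, r₂, and r are assumptions.

(* Reconstructed declaration; argument names are assumptions. *)
Inductive reg_exp (T : Type) : Type :=
  | EmptySet
  | EmptyStr
  | Char (t : T)
  | App (r₁ r₂ : reg_exp T)
  | Union (r₁ r₂ : reg_exp T)
  | Star (r : reg_exp T).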

Arguments EmptySet {T}.
Arguments EmptyStr {T}.
Arguments Char {T} _.
Arguments App {T} _ _.
Arguments Union {T} _ _.
Arguments Star {T} _.

Note that this definition is polymorphic: Regular expressions in reg_exp T describe strings with characters drawn from T -- which in this exercise we represent as lists with elements from T.
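
To make the polymorphism concrete, here is a small sketch of a regular expression over booleans rather than numbers. The name re_bool_ex is made up for this illustration, and the implicit type argument T is assumed to be inferred as bool from the annotation.

(* Hypothetical example: true followed by zero or more copies of false. *)
Definition re_bool_ex : reg_exp bool :=
  App (Char true) (Star (Char false)).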

(Technical aside: We depart slightly from standard practice in that we do not require the type T to be finite. This results in a somewhat different theory of regular expressions, but the difference is not significant for present purposes.)

We connect regular expressions and strings by defining when a regular expression matches some string. Informally this looks as follows:

The expression EmptySet does not match any string.
The expression EmptyStr matches the empty string [].
The expression Char x matches the one-character string [x].
If re₁ matches s₁, and re₂ matches s₂, then App re₁ re₂ matches s₁ ++ s₂.
If at least one of re₁ and re₂ matches s, then Union re₁ re₂ matches s.
Finally, if we can write some string s as the concatenation of a sequence of strings s = s_1 ++ ... ++ s_k, and the expression re matches each one of the strings s_i, then Star re matches s.

In particular, the sequence of strings may be empty, so Star re always matches the empty string [] no matter what re is.

We can easily translate this intuition into a set of rules, where we write s =~ re to mean that s matches the regular expression re:

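                    ----------------                    (MEmpty)
                     [] =~ EmptyStr

                    -----------------                   (MChar)
                     [x] =~ (Char x)

                 s₁ =~ re₁     s₂ =~ re₂
               -----------------------------            (MApp)
                (s₁ ++ s₂) =~ (App re₁ re₂)

                       s₁ =~ re₁
                 -----------------------                (MUnionL)
                  s₁ =~ (Union re₁ re₂)

                       s₂ =~ re₂
                 -----------------------                (MUnionR)
                  s₂ =~ (Union re₁ re₂)

                    -----------------                   (MStar0)
                     [] =~ (Star re)

                        s₁ =~ re
                     s₂ =~ (Star re)
                -------------------------               (MStarApp)
                 (s₁ ++ s₂) =~ (Star re)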

This directly corresponds to the following Inductive definition. We use the notation s =~ re in place of exp_match s re. (By "reserving" the notation before defining the Inductive, we can use it in the definition.)

Reserved Notation "s =~ re" (at level 80).

Inductive exp_match {T} : list T → reg_exp T → Prop :=
  | MEmpty : [] =~ EmptyStr
  | MChar x : [x] =~ (Char x)
  | MApp s₁ re₁ s₂ re₂
             (H₁ : s₁ =~ re₁)
             (H₂ : s₂ =~ re₂)
           : (s₁ ++ s₂) =~ (App re₁ re₂)
  | MUnionL s₁ re₁ re₂
                (H₁ : s₁ =~ re₁)
              : s₁ =~ (Union re₁ re₂)
  | MUnionR s₂ re₁ re₂
                (H₂ : s₂ =~ re₂)
              : s₂ =~ (Union re₁ re₂)
  | MStar0 re : [] =~ (Star re)
  | MStarApp s₁ s₂ re
                 (H₁ : s₁ =~ re)
                 (H₂ : s₂ =~ (Star re))
               : (s₁ ++ s₂) =~ (Star re)

  where "s =~ re" := (exp_match s re).

Notice that these rules are not quite the same as the intuition that we gave at the beginning of the section. First, we don't need to include a rule explicitly stating that no string matches EmptySet; we just don't happen to include any rule that would have the effect of some string matching EmptySet. (Indeed, the syntax of inductive definitions doesn't even allow us to give such a "negative rule.")

Second, the intuition we gave for Union and Star corresponds to two constructors each: MUnionL / MUnionR, and MStar0 / MStarApp. The result is logically equivalent to the original intuition but more convenient to use in Coq, since the recursive occurrences of exp_match are given as direct arguments to the constructors, making it easier to perform induction on evidence. (The exp_match_ex₁ and exp_match_ex₂ exercises below ask you to prove that the constructors given in the inductive declaration and the ones that would arise from a more literal transcription of the intuition are indeed equivalent.)

Let's illustrate these rules with a few examples.

Example reg_exp_ex₁ : [1] =~ Char 1.

Proof.
apply MChar.
Qed.

Example reg_exp_ex₂ : [1; 2] =~ App (Char 1) (Char 2).

Proof.
  apply (MApp [1]).
  - apply MChar.
  - apply MChar.
Qed.

(Notice how the last example applies MApp to the string [1] directly. Since the goal mentions [1; 2] instead of [1] ++ [2], Coq wouldn't be able to figure out how to split the string on its own.)
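
The remaining constructors are used in the same way. As a sketch, the following two examples (not part of the original text; the names reg_exp_ex_union and reg_exp_ex_star are made up) exercise MUnionR and MStarApp. In the second one we again supply the first piece of the string explicitly and rely on Coq to work out the rest, just as with MApp above.

Example reg_exp_ex_union : [2] =~ Union (Char 1) (Char 2).
Proof.
  (* Only the right-hand alternative can match [2]. *)
  apply MUnionR. apply MChar.
Qed.

Example reg_exp_ex_star : [1; 1] =~ Star (Char 1).
Proof.
  (* Split [1; 1] as [1] ++ [1], then [1] as [1] ++ []. *)
  apply (MStarApp [1]).
  - apply MChar.
  - apply (MStarApp [1]).
    + apply MChar.
    + apply MStar0.
Qed.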

Using inversion, we can also show that certain strings do not match a regular expression:

Example reg_exp_ex₃ : ¬ ([1; 2] =~ Char 1).

Proof.
intros H. inversion H.
Qed.

We can define helper functions for writing down regular expressions. The reg_exp_of_list function constructs a regular expression that matches exactly the string that it receives as an argument:

Fixpoint reg_exp_of_list {T} (l : list T) :=
  match l with
  | [] ⇒ EmptyStr
  | x :: l' ⇒ App (Char x) (reg_exp_of_list l')
  end.
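
For instance, unfolding the definition on a two-element list gives a right-nested chain of App nodes ending in EmptyStr. This check is a sketch added for illustration (the name reg_exp_of_list_ex is hypothetical) and assumes the list elements are natural numbers.

(* Hypothetical example: the regular expression built from [1; 2]. *)
Example reg_exp_of_list_ex :
  reg_exp_of_list [1; 2] = App (Char 1) (App (Char 2) EmptyStr).
Proof. reflexivity. Qed.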

Example reg_exp_ex₄ : [1; 2; 3] =~ reg_exp_of_list [1; 2; 3].

Proof.
  simpl. apply (MApp [1]).
  { apply MChar. }
  apply (MApp [2]).
  { apply MChar. }
  apply (MApp [3]).
  { apply MChar. }
  apply MEmpty.
Qed.

We can also prove general facts about exp_match. For instance, the following lemma shows that every string s that matches re also matches Star re.

Lemma MStar1 :
  ∀ T s (re : reg_exp T) ,
    s =~ re →
    s =~ Star re.

Proof.
  intros T s re H.
  rewrite <- (app_nil_r _ s).
  apply MStarApp.
  - apply H.
  - apply MStar0.
Qed.

(Note the use of app_nil_r to change the goal of the theorem to exactly the same shape expected by MStarApp.)

Exercise: 3 stars, standard (exp_match_ex₁)

The following lemmas show that the intuition about matching given at the beginning of the chapter can be obtained from the formal inductive definition.

Lemma EmptySet_is_empty : ∀ T (s : list T),
¬ (s =~ EmptySet ).
Proof.
(* FILL IN HERE *) Admitted.

Lemma MUnion' : ∀ T (s : list T) (re₁ re₂ : reg_exp T),
  s =~ re₁ ∨ s =~ re₂ →
  s =~ Union re₁ re₂.
Proof.
  (* FILL IN HERE *) Admitted.

The next lemma is stated in terms of the fold function from the Poly chapter: If ss : list (list T) represents a sequence of strings s₁, ..., sn, then fold app ss [] is the result of concatenating them all together.
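
As a quick sanity check of that reading, here is a sketch on a concrete list of strings. It assumes fold and app behave as defined in the Poly chapter; the name fold_app_ex is made up for this illustration.

(* Hypothetical example: folding app over three strings concatenates them. *)
Example fold_app_ex :
  fold app [[1; 2]; [3]; [4; 5]] [] = [1; 2; 3; 4; 5].
Proof. reflexivity. Qed.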

Lemma MStar' : ∀ T (ss : list (list T)) (re : reg_exp T),
  (∀ s, In s ss → s =~ re ) →
  fold app ss [] =~ Star re.
Proof.
  (* FILL IN HERE *) Admitted.
☐

Exercise: 2 stars, standard, optional (EmptyStr_not_needed)

It turns out that the EmptyStr constructor is actually not needed, since the regular expression matching the empty string can also be defined from Star and EmptySet:

Definition EmptyStr' {T:Type} := @Star T (EmptySet).

State and prove that this EmptyStr' definition matches exactly the same strings as the EmptyStr constructor.

(* FILL IN HERE *)
☐

Since the definition of exp_match has a recursive structure, we might expect that proofs involving regular expressions will often require induction on evidence.

For example, suppose we want to prove the following intuitive result: If a regular expression re matches some string s, then all elements of s must occur as character literals somewhere in re.

To state this as a theorem, we first define a function re_chars that lists all characters that occur in a regular expression:

Fixpoint re_chars {T} (re : reg_exp T) : list T :=
  match re with
  | EmptySet ⇒ []
  | EmptyStr ⇒ []
  | Char x ⇒ [x]
  | App re₁ re₂ ⇒ re_chars re₁ ++ re_chars re₂
  | Union re₁ re₂ ⇒ re_chars re₁ ++ re_chars re₂
  | Star re ⇒ re_chars re
  end.
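
To see what re_chars computes, consider the following sketch on a small expression over natural numbers (the name re_chars_ex is hypothetical). Note that EmptyStr contributes nothing and that Star simply passes through the characters of its body.

(* Hypothetical example: the characters of 1 followed by (2 or empty)*. *)
Example re_chars_ex :
  re_chars (App (Char 1) (Star (Union (Char 2) EmptyStr))) = [1; 2].
Proof. reflexivity. Qed.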

The main theorem:

Theorem in_re_match : ∀ T (s : list T) (re : reg_exp T) (x : T),
  s =~ re →
  In x s →
  In x (re_chars re).
Proof.
  intros T s re x Hmatch Hin.
  induction Hmatch
    as [| x'
        | s₁ re₁ s₂ re₂ Hmatch1 IH₁ Hmatch2 IH₂
        | s₁ re₁ re₂ Hmatch IH | s₂ re₁ re₂ Hmatch IH
        | re | s₁ s₂ re Hmatch1 IH₁ Hmatch2 IH₂].
  (* WORKED IN CLASS *)
  - (* MEmpty *)
    simpl in Hin. destruct Hin.
  - (* MChar *)
    simpl. simpl in Hin.
    apply Hin.
  - (* MApp *)
    simpl.

Something interesting happens in the MApp case. We obtain two induction hypotheses: One that applies when x occurs in s₁ (which matches re₁), and a second one that applies when x occurs in s₂ (which matches re₂).

    rewrite In_app_iff in *.
    destruct Hin as [Hin | Hin].
    + (* In x s₁ *)
      left. apply (IH₁ Hin).
    + (* In x s₂ *)
      right. apply (IH₂ Hin).
  - (* MUnionL *)
    simpl. rewrite In_app_iff.
    left. apply (IH Hin).
  - (* MUnionR *)
    simpl. rewrite In_app_iff.
    right. apply (IH Hin).
  - (* MStar0 *)
    destruct Hin.
  - (* MStarApp *)
    simpl.

Here again we get two induction hypotheses, and they illustrate why we need induction on evidence for exp_match, rather than induction on the regular expression re: The latter would only provide an induction hypothesis for strings that match re, which would not allow us to reason about the case In x s₂.

    rewrite In_app_iff in Hin.
    destruct Hin as [Hin | Hin].
    + (* In x s₁ *)
      apply (IH₁ Hin).
    + (* In x s₂ *)
      apply (IH₂ Hin).
Qed.

Exercise: 4 stars, standard, optional (re_not_empty)

Write a recursive function re_not_empty that tests whether a regular expression matches some string. Prove that your function is correct.

Fixpoint re_not_empty {T : Type} (re : reg_exp T) : bool
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.

Lemma re_not_empty_correct : ∀ T (re : reg_exp T),
(∃ s , s =~ re ) ↔ re_not_empty re = true.
Proof.
(* FILL IN HERE *) Admitted.
☐

The remember Tactic

One potentially confusing feature of the induction tactic is that it will let you try to perform an induction over a term that isn't sufficiently general. The effect of this is to lose information (much as destruct without an eqn: clause can do), and leave you unable to complete the proof. Here's an example:

Lemma star_app: ∀ T (s₁ s₂ : list T) (re : reg_exp T),
  s₁ =~ Star re →
  s₂ =~ Star re →
  s₁ ++ s₂ =~ Star re.
Proof.
  intros T s₁ s₂ re H₁.

Now, just doing an inversion on H₁ won't get us very far in the recursive cases. (Try it!). So we need induction (on evidence!). Here is a naive first attempt. (We can begin by generalizing s₂, since it's pretty clear that we are going to have to walk over both s₁ and s₂ in parallel.)
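
The tactic script for this step is not shown in the text as given. A sketch consistent with the surrounding discussion would be the following; the names in the as-pattern are assumptions, chosen to agree with the names that appear in the cases discussed next.

  (* Reconstructed step; the intro-pattern names are assumptions. *)
  generalize dependent s₂.
  induction H₁
    as [|x'|s₁ re₁ s₂' re₂ Hmatch1 IH₁ Hmatch2 IH₂
        |s₁ re₁ re₂ Hmatch IH|s₂' re₁ re₂ Hmatch IH
        |re''|s₁ s₂' re'' Hmatch1 IH₁ Hmatch2 IH₂].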

But now, although we get seven cases (as we would expect from the definition of exp_match), we have lost a very important bit of information from H₁: the fact that s₁ matched something of the form Star re. This means that we have to give proofs for all seven constructors of this definition, even though all but two of them (MStar0 and MStarApp) are contradictory. We can still get the proof to go through for a few constructors, such as MEmpty...

- (* MEmpty *)
simpl. intros s₂ H. apply H.

... but most cases get stuck. For MChar, for instance, we must show s₂ =~ Char x' → x' :: s₂ =~ Char x', which is clearly impossible.

- (* MChar. *) intros s₂ H. simpl. (* Stuck... *)
Abort.

The problem here is that induction over a Prop hypothesis only works properly with hypotheses that are "completely general," i.e., ones in which all the arguments are variables, as opposed to more complex expressions like Star re.

(In this respect, induction on evidence behaves more like destruct-without-eqn: than like inversion.)

A possible, but awkward, way to solve this problem is "manually generalizing" over the problematic expressions by adding explicit equality hypotheses to the lemma:

Lemma star_app: ∀ T (s₁ s₂ : list T) (re re' : reg_exp T),
  re' = Star re →
  s₁ =~ re' →
  s₂ =~ Star re →
  s₁ ++ s₂ =~ Star re.

We can now proceed by performing induction over evidence directly, because the argument to the first hypothesis is sufficiently general, which means that we can discharge most cases by inverting the re' = Star re equality in the context. This works, but it makes the statement of the lemma a bit ugly. Fortunately, there is a better way...

Abort.

The tactic remember e as x eqn:Eq causes Coq to (1) replace all occurrences of the expression e by the variable x, and (2) add an equation Eq : x = e to the context. Here's how we can use it to show the above result:

Lemma star_app: ∀ T (s₁ s₂ : list T) (re : reg_exp T),
  s₁ =~ Star re →
  s₂ =~ Star re →
  s₁ ++ s₂ =~ Star re.
Proof.
  intros T s₁ s₂ re H₁.
  remember (Star re) as re' eqn:Eq.

We now have Eq : re' = Star re.
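
The induction step itself is not shown in the text as given. A sketch consistent with the case-by-case script that follows would be the one below; the names in the as-pattern are assumptions, picked so that Hmatch1, IH₂, and re'' match their later uses.

  (* Reconstructed step; the intro-pattern names are assumptions. *)
  generalize dependent s₂.
  induction H₁
    as [|x'|s₁ re₁ s₂' re₂ Hmatch1 IH₁ Hmatch2 IH₂
        |s₁ re₁ re₂ Hmatch IH|s₂' re₁ re₂ Hmatch IH
        |re''|s₁ s₂' re'' Hmatch1 IH₁ Hmatch2 IH₂].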

The Eq is contradictory in most cases, allowing us to conclude immediately.

  - (* MEmpty *) discriminate.
  - (* MChar *) discriminate.
  - (* MApp *) discriminate.
  - (* MUnionL *) discriminate.
  - (* MUnionR *) discriminate.

The interesting cases are those that correspond to Star.

- (* MStar0 *)
intros s H. apply H.

  - (* MStarApp *)
    injection Eq as Eq'.
    intros s₂ H₁. rewrite <- app_assoc.
    apply MStarApp.
    + apply Hmatch1.
    + apply IH₂.
      * rewrite Eq'. reflexivity.
      * apply H₁.

Note that the induction hypothesis IH₂ on the MStarApp case mentions an additional premise Star re'' = Star re, which results from the equality generated by remember.

Qed.

Exercise: 4 stars, standard (exp_match_ex₂)

The MStar'' lemma below (combined with its converse, the MStar' exercise above) shows that our definition of exp_match for Star is equivalent to the informal one given previously.

Lemma MStar'' : ∀ T (s : list T) (re : reg_exp T),
  s =~ Star re →
  ∃ ss : list (list T),
    s = fold app ss []
    ∧ ∀ s', In s' ss → s' =~ re.
Proof.
  (* FILL IN HERE *) Admitted.
☐

Exercise: 5 stars, advanced, optional (weak_pumping)

One of the first really interesting theorems in the theory of regular expressions is the so-called pumping lemma, which states, informally, that any sufficiently long string s matching a regular expression re can be "pumped" by repeating some middle section of s an arbitrary number of times to produce a new string also matching re. (For the sake of simplicity in this exercise, we consider a slightly weaker theorem than is usually stated in courses on automata theory -- hence the name weak_pumping.)

To get started, we need to define "sufficiently long." Since we are working in a constructive logic, we actually need to be able to calculate, for each regular expression re, the minimum length for strings s to guarantee "pumpability."

Module Pumping.

Fixpoint pumping_constant {T} (re : reg_exp T) : nat :=
  match re with
  | EmptySet ⇒ 1
  | EmptyStr ⇒ 1
  | Char _ ⇒ 2
  | App re₁ re₂ ⇒
      pumping_constant re₁ + pumping_constant re₂
  | Union re₁ re₂ ⇒
      pumping_constant re₁ + pumping_constant re₂
  | Star r ⇒ pumping_constant r
  end.
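
As a sketch of how this function computes, consider a small expression over natural numbers (the name pumping_constant_ex is hypothetical). Each Char contributes 2, Star passes its body's constant through unchanged, and App adds the two sides.

(* Hypothetical example: 2 for Char 1, plus 2 for Star (Char 2). *)
Example pumping_constant_ex :
  pumping_constant (App (Char 1) (Star (Char 2))) = 4.
Proof. reflexivity. Qed.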

You may find these lemmas about the pumping constant useful when proving the pumping lemma below.

Lemma pumping_constant_ge_1 :
∀ T (re : reg_exp T),
pumping_constant re ≥ 1.

Proof.
  intros T re. induction re.
  - (* EmptySet *)
    apply le_n.
  - (* EmptyStr *)
    apply le_n.
  - (* Char *)
    apply le_S. apply le_n.
  - (* App *)
    simpl.
    apply le_trans with (n:=pumping_constant re₁).
    apply IHre1. apply le_plus_l.
  - (* Union *)
    simpl.
    apply le_trans with (n:=pumping_constant re₁).
    apply IHre1. apply le_plus_l.
  - (* Star *)
    simpl. apply IHre.
Qed.

Lemma pumping_constant_0_false :
∀ T (re : reg_exp T),
pumping_constant re = 0 → False.

Proof.
  intros T re H.
  assert (Hp₁ : pumping_constant re ≥ 1).
  { apply pumping_constant_ge_1. }
  inversion Hp₁ as [Hp₁'| p Hp₁' Hp₁''].
  - rewrite H in Hp₁'. discriminate Hp₁'.
  - rewrite H in Hp₁''. discriminate Hp₁''.
Qed.

Next, it is useful to define an auxiliary function that repeats a string (appends it to itself) some number of times.

Fixpoint napp {T} (n : nat) (l : list T) : list T :=
  match n with
  | 0 ⇒ []
  | S n' ⇒ l ++ napp n' l
  end.
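
For example, repeating a two-element list three times should give a six-element list. The following check is a sketch added for illustration; the name napp_ex is made up.

(* Hypothetical example: three copies of [1; 2] appended together. *)
Example napp_ex : napp 3 [1; 2] = [1; 2; 1; 2; 1; 2].
Proof. reflexivity. Qed.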

This auxiliary lemma might also be useful in your proof of the pumping lemma.

Lemma napp_plus: ∀ T (n m : nat) (l : list T),
napp (n + m) l = napp n l ++ napp m l.

Proof.
  intros T n m l.
  induction n as [|n IHn].
  - reflexivity.
  - simpl. rewrite IHn, app_assoc. reflexivity.
Qed.

Lemma napp_star :
  ∀ T m s₁ s₂ (re : reg_exp T),
    s₁ =~ re → s₂ =~ Star re →
    napp m s₁ ++ s₂ =~ Star re.

Proof.
  intros T m s₁ s₂ re Hs₁ Hs₂.
  induction m.
  - simpl. apply Hs₂.
  - simpl. rewrite <- app_assoc.
    apply MStarApp.
    + apply Hs₁.
    + apply IHm.
Qed.

The (weak) pumping lemma itself says that, if s =~ re and if the length of s is at least the pumping constant of re, then s can be split into three substrings s₁ ++ s₂ ++ s₃ in such a way that s₂ can be repeated any number of times and the result, when combined with s₁ and s₃, will still match re. Since s₂ is also guaranteed not to be the empty string, this gives us a (constructive!) way to generate strings matching re that are as long as we like.

Lemma weak_pumping : ∀ T (re : reg_exp T) s,
  s =~ re →
  pumping_constant re ≤ length s →
  ∃ s₁ s₂ s₃ ,
    s = s₁ ++ s₂ ++ s₃ ∧
    s₂ ≠ [] ∧
    ∀ m, s₁ ++ napp m s₂ ++ s₃ =~ re.

Complete the proof below. Several of the lemmas about le that were in an optional exercise earlier in this chapter may also be useful.

Exercise: 5 stars, advanced, optional (pumping)

Now here is the usual version of the pumping lemma. In addition to requiring that s₂ ≠ [], it also requires that length s₁ + length s₂ ≤ pumping_constant re.

Lemma pumping : ∀ T (re : reg_exp T) s,
  s =~ re →
  pumping_constant re ≤ length s →
  ∃ s₁ s₂ s₃ ,
    s = s₁ ++ s₂ ++ s₃ ∧
    s₂ ≠ [] ∧
    length s₁ + length s₂ ≤ pumping_constant re ∧
    ∀ m, s₁ ++ napp m s₂ ++ s₃ =~ re.

You may want to copy your proof of weak_pumping below.

End Pumping.
☐

Case Study: Improving Reflection

We've seen in the Logic chapter that we often need to relate boolean computations to statements in Prop. But performing this conversion as we did there can result in tedious proof scripts. Consider the proof of the following theorem:

Theorem filter_not_empty_In : ∀ n l,
filter (fun x ⇒ n =? x) l ≠ [] → In n l.

Proof.
  intros n l. induction l as [|m l' IHl'].
  - (* l = nil *)
    simpl. intros H. apply H. reflexivity.
  - (* l = m :: l' *)
    simpl. destruct (n =? m) eqn:H.
    + (* n =? m = true *)
      intros _. rewrite eqb_eq in H. rewrite H.
      left. reflexivity.
    + (* n =? m = false *)
      intros H'. right. apply IHl'. apply H'.
Qed.

In the first branch after destruct, we explicitly apply the eqb_eq lemma to the equation generated by destructing n =? m, to convert the assumption n =? m = true into the assumption n = m; then we have to rewrite using this assumption to complete the case.

We can streamline this sort of reasoning by defining an inductive proposition that yields a better case-analysis principle for n =? m. Instead of generating the assumption (n =? m) = true, which usually requires some massaging before we can use it, this principle gives us right away the assumption we really need: n = m.

Following the terminology introduced in Logic, we call this the "reflection principle for equality on numbers," and we say that the boolean n =? m is reflected in the proposition n = m.

Inductive reflect (P : Prop) : bool → Prop :=
| ReflectT (H : P) : reflect P true
| ReflectF (H : ¬ P) : reflect P false.

The reflect property takes two arguments: a proposition P and a boolean b. It states that the property P reflects (intuitively, is equivalent to) the boolean b: that is, P holds if and only if b = true.

To see this, notice that, by definition, the only way we can produce evidence for reflect P true is by showing P and then using the ReflectT constructor. If we invert this statement, this means that we can extract evidence for P from a proof of reflect P true.

Similarly, the only way to show reflect P false is by tagging evidence for ¬ P with the ReflectF constructor.
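
As a sketch of these two observations on concrete propositions (the names reflect_true_ex and reflect_false_ex are made up for this illustration): a true proposition can only be paired with true, and a false one only with false.

(* Hypothetical example: evidence for P is tagged with ReflectT. *)
Example reflect_true_ex : reflect (1 = 1) true.
Proof. apply ReflectT. reflexivity. Qed.

(* Hypothetical example: evidence for ¬ P is tagged with ReflectF. *)
Example reflect_false_ex : reflect (1 = 2) false.
Proof. apply ReflectF. intros H. discriminate H. Qed.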

To put this observation to work, we first prove that the statements P ↔ b = true and reflect P b are indeed equivalent. First, the left-to-right implication:

Theorem iff_reflect : ∀ P b, (P ↔ b = true ) → reflect P b.
Proof.
  (* WORKED IN CLASS *)
  intros P b H. destruct b eqn:Eb.
  - apply ReflectT. rewrite H. reflexivity.
  - apply ReflectF. rewrite H. intros H'. discriminate.
Qed.

Now you prove the right-to-left implication:

Exercise: 2 stars, standard, optional (reflect_iff)

Theorem reflect_iff : ∀ P b, reflect P b → (P ↔ b = true ).
Proof.
(* FILL IN HERE *) Admitted.
☐

We can think of reflect as a kind of variant of the usual "if and only if" connective; the advantage of reflect is that, by destructing a hypothesis or lemma of the form reflect P b, we can perform case analysis on b while at the same time generating appropriate hypotheses in the two branches (P in the first subgoal and ¬ P in the second).

Let's use reflect to produce a smoother proof of filter_not_empty_In.

We begin by recasting the eqb_eq lemma in terms of reflect:

Lemma eqbP : ∀ n m, reflect (n = m) (n =? m).
Proof.
intros n m. apply iff_reflect. rewrite eqb_eq. reflexivity.
Qed.

The proof of filter_not_empty_In now goes as follows. Notice how the calls to destruct and rewrite in the earlier proof of this theorem are combined here into a single call to destruct.

(To see this clearly, execute the two proofs of filter_not_empty_In with Coq and observe the differences in proof state at the beginning of the first case of the destruct.)

Theorem filter_not_empty_In' : ∀ n l,
  filter (fun x ⇒ n =? x) l ≠ [] →
  In n l.
Proof.
  intros n l. induction l as [|m l' IHl'].
  - (* l = [] *)
    simpl. intros H. apply H. reflexivity.
  - (* l = m :: l' *)
    simpl. destruct (eqbP n m) as [EQnm | NEQnm].
    + (* n = m *)
      intros _. rewrite EQnm. left. reflexivity.
    + (* n <> m *)
      intros H'. right. apply IHl'. apply H'.
Qed.
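
The same pattern works on any goal that mentions n =? m. Here is a smaller sketch, added for illustration under the assumption that eqbP is in scope as stated above (the name eqbP_if_ex is made up). Note that the hypothesis is introduced only after the destruct, so that the occurrence of n =? m in it is replaced by true or false in each branch.

(* Hypothetical example: case analysis on n =? m via eqbP. *)
Example eqbP_if_ex : ∀ n m : nat,
  (if n =? m then 0 else 1) = 0 → n = m.
Proof.
  intros n m.
  destruct (eqbP n m) as [EQnm | NEQnm].
  - (* n = m *)
    intros _. apply EQnm.
  - (* n <> m *)
    intros H. simpl in H. discriminate H.
Qed.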

Exercise: 3 stars, standard, optional (eqbP_practice)

Use eqbP as above to prove the following:

Fixpoint count n l :=
  match l with
  | [] ⇒ 0
  | m :: l' ⇒ (if n =? m then 1 else 0) + count n l'
  end.

Theorem eqbP_practice : ∀ n l,
  count n l = 0 → ~(In n l ).
Proof.
  intros n l Hcount. induction l as [| m l' IHl'].
  (* FILL IN HERE *) Admitted.
☐

This small example shows reflection giving us a small gain in convenience; in larger developments, using reflect consistently can often lead to noticeably shorter and clearer proof scripts. We'll see many more examples in later chapters and in Programming Language Foundations.

This use of reflect was popularized by SSReflect, a Coq library that has been used to formalize important results in mathematics, including the 4-color theorem and the Feit-Thompson theorem. The name SSReflect stands for small-scale reflection, i.e., the pervasive use of reflection to simplify small proof steps by turning them into boolean computations.

Additional Exercises

Exercise: 3 stars, standard, optional (nostutter_defn)

Formulating inductive definitions of properties is an important skill you'll need in this course. Try to solve this exercise without any help.

We say that a list "stutters" if it repeats the same element consecutively. (This is different from not containing duplicates: the sequence [1;4;1] has two occurrences of the element 1 but does not stutter.) The property "nostutter mylist" means that mylist does not stutter. Formulate an inductive definition for nostutter.

Inductive nostutter {X:Type} : list X → Prop :=
(* FILL IN HERE *)
.

Make sure each of these tests succeeds, but feel free to change the suggested proof (in comments) if the given one doesn't work for you. Your definition might be different from ours and still be correct, in which case the examples might need a different proof. (You'll notice that the suggested proofs use a number of tactics we haven't talked about, to make them more robust to different possible ways of defining nostutter. You can probably just uncomment and use them as-is, but you can also prove each example with more basic tactics.)

Example test_nostutter_1: nostutter [3;1;4;1;5;6].
(* FILL IN HERE *) Admitted.
(*
Proof. repeat constructor; apply eqb_neq; auto.
Qed.
*)

Example test_nostutter_2: nostutter (@nil nat).
(* FILL IN HERE *) Admitted.
(*
Proof. repeat constructor; apply eqb_neq; auto.
Qed.
*)

Example test_nostutter_3: nostutter [5].
(* FILL IN HERE *) Admitted.
(*
Proof. repeat constructor; auto. Qed.
*)

Example test_nostutter_4: not (nostutter [3;1;1;4]).
(* FILL IN HERE *) Admitted.
(*
  Proof. intro.
  repeat match goal with
    h: nostutter _ |- _ => inversion h; clear h; subst
  end.
  contradiction; auto. Qed.
*)

(* Do not modify the following line: *)
Definition manual_grade_for_nostutter : option (nat * string) := None.
☐

Exercise: 4 stars, advanced, optional (filter_challenge)

Let's prove that our definition of filter from the Poly chapter matches an abstract specification. Here is the specification, written out informally in English:

A list l is an "in-order merge" of l₁ and l₂ if it contains all the same elements as l₁ and l₂, in the same order as l₁ and l₂, but possibly interleaved. For example, [1;4;6;2;3] is an in-order merge of [1;6;2] and [4;3].

Now, suppose we have a set X, a function test: X→bool, and a list l of type list X. Suppose further that l is an in-order merge of two lists, l₁ and l₂, such that every item in l₁ satisfies test and no item in l₂ satisfies test. Then filter test l = l₁.

First define what it means for one list to be a merge of two others. Do this with an inductive relation, not a Fixpoint.

Inductive merge {X:Type} : list X → list X → list X → Prop :=
(* FILL IN HERE *)
.

Theorem merge_filter : ∀ (X : Set) (test: X →bool) (l l₁ l₂ : list X),
  merge l₁ l₂ l →
  All (fun n ⇒ test n = true) l₁ →
  All (fun n ⇒ test n = false) l₂ →
  filter test l = l₁.
Proof.
  (* FILL IN HERE *) Admitted.

(* FILL IN HERE *)
☐

Exercise: 5 stars, advanced, optional (filter_challenge_2)

A different way to characterize the behavior of filter goes like this: Among all subsequences of l with the property that test evaluates to true on all their members, filter test l is the longest. Formalize this claim and prove it.

(* FILL IN HERE *)
☐

Exercise: 4 stars, standard, optional (palindromes)

A palindrome is a sequence that reads the same backwards as forwards.

Define an inductive proposition pal on list X that captures what it means to be a palindrome. (Hint: You'll need three cases. Your definition should be based on the structure of the list; just having a single constructor like c : ∀ l, l = rev l → pal l may seem obvious, but will not work very well.)
Prove (pal_app_rev) that ∀ l, pal (l ++ rev l).
Prove (pal_rev) that ∀ l, pal l → l = rev l.

Inductive pal {X:Type} : list X → Prop :=
(* FILL IN HERE *)
.

Theorem pal_app_rev : ∀ (X:Type) (l : list X),
pal (l ++ (rev l )).
Proof.
(* FILL IN HERE *) Admitted.

Theorem pal_rev : ∀ (X:Type) (l: list X) , pal l → l = rev l.
Proof.
(* FILL IN HERE *) Admitted.
☐

Exercise: 5 stars, standard, optional (palindrome_converse)

Again, the converse direction is significantly more difficult, due to the lack of evidence. Using your definition of pal from the previous exercise, prove that
∀ l, l = rev l → pal l.

Theorem palindrome_converse: ∀ {X: Type} (l: list X),
l = rev l → pal l.
Proof.
(* FILL IN HERE *) Admitted.
☐

Exercise: 4 stars, advanced, optional (NoDup)

Recall the definition of the In property from the Logic chapter, which asserts that a value x appears at least once in a list l:

(* Fixpoint In (A : Type) (x : A) (l : list A) : Prop :=
   match l with
   | [] => False
   | x' :: l' => x' = x \/ In A x l'
   end *)
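
As a reminder of how this definition unfolds, here is a small sketch (the name In_ex is made up, and In is used with its type argument implicit, as elsewhere in this chapter). The claim In 2 [1; 2; 3] simplifies to 1 = 2 ∨ 2 = 2 ∨ 3 = 2 ∨ False, so we pick the second disjunct.

(* Hypothetical example: 2 is the second element of [1; 2; 3]. *)
Example In_ex : In 2 [1; 2; 3].
Proof. simpl. right. left. reflexivity. Qed.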

Your first task is to use In to define a proposition disjoint X l₁ l₂, which should be provable exactly when l₁ and l₂ are lists (with elements of type X) that have no elements in common.

(* FILL IN HERE *)

Next, use In to define an inductive proposition NoDup X l, which should be provable exactly when l is a list (with elements of type X) where every member is different from every other. For example, NoDup nat [1;2;3;4] and NoDup bool [] should be provable, while NoDup nat [1;2;1] and NoDup bool [true;true] should not be.

(* FILL IN HERE *)

Finally, state and prove one or more interesting theorems relating disjoint, NoDup and ++ (list append).

(* FILL IN HERE *)

(* Do not modify the following line: *)
Definition manual_grade_for_NoDup_disjoint_etc : option (nat * string) := None.
☐

Exercise: 4 stars, advanced, optional (pigeonhole_principle)

The pigeonhole principle states a basic fact about counting: if we distribute more than n items into n pigeonholes, some pigeonhole must contain at least two items. As often happens, this apparently trivial fact about numbers requires non-trivial machinery to prove, but we now have enough...

First prove an easy and useful lemma.

Lemma in_split : ∀ (X:Type) (x:X) (l:list X),
  In x l →
  ∃ l₁ l₂ , l = l₁ ++ x :: l₂.
Proof.
  (* FILL IN HERE *) Admitted.

Now define a property repeats such that repeats X l asserts that l contains at least one repeated element (of type X).

Inductive repeats {X:Type} : list X → Prop :=
(* FILL IN HERE *)
.

(* Do not modify the following line: *)
Definition manual_grade_for_check_repeats : option (nat * string) := None.

Now, here's a way to formalize the pigeonhole principle. Suppose list l₂ represents a list of pigeonhole labels, and list l₁ represents the labels assigned to a list of items. If there are more items than labels, at least two items must have the same label -- i.e., list l₁ must contain repeats.

This proof is much easier if you use the excluded_middle hypothesis to show that In is decidable, i.e., ∀ x l, (In x l) ∨ ¬ (In x l). However, it is also possible to make the proof go through without assuming that In is decidable; if you manage to do this, you will not need the excluded_middle hypothesis.

Theorem pigeonhole_principle: excluded_middle →
  ∀ (X:Type) (l₁ l₂:list X),
  (∀ x, In x l₁ → In x l₂ ) →
  length l₂ < length l₁ →
  repeats l₁.
Proof.
  intros EM X l₁. induction l₁ as [|x l₁' IHl1'].
  (* FILL IN HERE *) Admitted.
☐

Extended Exercise: A Verified Regular-Expression Matcher

We have now defined a match relation over regular expressions and polymorphic lists. We can use such a definition to manually prove that a given regex matches a given string, but it does not give us a program that we can run to determine a match automatically.

It would be reasonable to hope that we can translate the definitions of the inductive rules for constructing evidence of the match relation into cases of a recursive function that reflects the relation by recursing on a given regex. However, it does not seem straightforward to define such a function in which the given regex is a recursion variable recognized by Coq. As a result, Coq will not accept that the function always terminates.

Heavily-optimized regex matchers match a regex by translating a given regex into a state machine and determining if the state machine accepts a given string. However, regex matching can also be implemented using an algorithm that operates purely on strings and regexes without defining and maintaining additional datatypes, such as state machines. We'll implement such an algorithm, and verify that its value reflects the match relation.

We will implement a regex matcher that matches strings represented as lists of ASCII characters:

Require Import Coq.Strings.Ascii.

Definition string := list ascii.
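
With this definition, the match relation from earlier in the chapter applies to ASCII strings without change. As a sketch (the name ascii_match_ex is made up, and the "a"%char literal syntax is assumed to be available from the Ascii import above):

(* Hypothetical example: a one-character ASCII string matches Char. *)
Example ascii_match_ex : ["a"%char] =~ Char "a"%char.
Proof. apply MChar. Qed.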

The Coq standard library contains a distinct inductive definition of strings of ASCII characters. However, we will use the above definition of strings as lists of ASCII characters in order to apply the existing definition of the match relation.

We could also define a regex matcher over polymorphic lists, not lists of ASCII characters specifically. The matching algorithm that we will implement needs to be able to test equality of elements in a given list, and thus needs to be given an equality-testing function. Generalizing the definitions, theorems, and proofs that we define for such a setting is a bit tedious, but workable.

The proof of correctness of the regex matcher will combine properties of the regex-matching function with properties of the match relation that do not depend on the matching function. We'll go ahead and prove the latter class of properties now. Most of them have straightforward proofs, which have been given to you, although there are a few key lemmas that are left for you to prove.

Each provable Prop is equivalent to True.

Lemma provable_equiv_true : ∀ (P : Prop), P → (P ↔ True ).
Proof.
  intros.
  split.
  - intros. constructor.
  - intros _. apply H.
Qed.

Each Prop whose negation is provable is equivalent to False.

Lemma not_equiv_false : ∀ (P : Prop), ¬P → (P ↔ False ).
Proof.
  intros.
  split.
  - apply H.
  - intros. destruct H0.
Qed.

EmptySet matches no string.

Lemma null_matches_none : ∀ (s : string), (s =~ EmptySet ) ↔ False.
Proof.
  intros.
  apply not_equiv_false.
  unfold not. intros. inversion H.
Qed.

EmptyStr only matches the empty string.

Lemma empty_matches_eps : ∀ (s : string), s =~ EmptyStr ↔ s = [ ].
Proof.
  split.
  - intros. inversion H. reflexivity.
  - intros. rewrite H. apply MEmpty.
Qed.

EmptyStr matches no non-empty string.

Lemma empty_nomatch_ne : ∀ (a : ascii) s, (a :: s =~ EmptyStr ) ↔ False.
Proof.
  intros.
  apply not_equiv_false.
  unfold not. intros. inversion H.
Qed.

Char a matches no string that starts with a non-a character.

Lemma char_nomatch_char :
  ∀ (a b : ascii) s, b ≠ a → (b :: s =~ Char a ↔ False ).
Proof.
  intros.
  apply not_equiv_false.
  unfold not.
  intros.
  apply H.
  inversion H0.
  reflexivity.
Qed.

If Char a matches a non-empty string, then the string's tail is empty.

Lemma char_eps_suffix : ∀ (a : ascii) s, a :: s =~ Char a ↔ s = [ ].
Proof.
  split.
  - intros. inversion H. reflexivity.
  - intros. rewrite H. apply MChar.
Qed.

App re₀ re₁ matches string s iff s = s₀ ++ s₁, where s₀ matches re₀ and s₁ matches re₁.

Lemma app_exists : ∀ (s : string) re₀ re₁,
  s =~ App re₀ re₁ ↔
  ∃ s₀ s₁ , s = s₀ ++ s₁ ∧ s₀ =~ re₀ ∧ s₁ =~ re₁.
Proof.
  intros.
  split.
  - intros. inversion H. exists s1, s2. split.
    * reflexivity.
    * split. apply H3. apply H4.
  - intros [ s₀ [ s₁ [ Happ [ Hmat0 Hmat1 ] ] ] ].
    rewrite Happ. apply (MApp s₀ _ s₁ _ Hmat0 Hmat1).
Qed.
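To see how a logical equivalence like app_exists can be used, here is a minimal sketch that proves a concrete match by supplying the split of the string explicitly. It assumes the match relation and its constructors from earlier in the chapter are in scope; the name app_exists_use is hypothetical.

```coq
(* Hypothetical example: a two-character string matches the
   concatenation of two Char regexes. *)
Example app_exists_use : forall a b : ascii,
  [a; b] =~ App (Char a) (Char b).
Proof.
  intros a b.
  (* Use the right-to-left direction of app_exists. *)
  apply app_exists.
  (* Split [a; b] as [a] ++ [b]. *)
  exists [a], [b].
  split.
  - reflexivity.
  - split.
    + apply MChar.
    + apply MChar.
Qed.
```

The same fact follows directly from MApp, of course; the point is only that the equivalence turns a goal about App into a goal about its two components.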

Exercise: 3 stars, standard, optional (app_ne)

App re₀ re₁ matches a::s iff either (1) re₀ matches the empty string and a::s matches re₁, or (2) s = s₀ ++ s₁, where a::s₀ matches re₀ and s₁ matches re₁.

Even though this is purely a property of the match relation, it is a critical observation behind the design of our regex matcher. So (1) take time to understand it, (2) prove it, and (3) look for how you'll use it later.

Lemma app_ne : ∀ (a : ascii) s re₀ re₁,
  a :: s =~ (App re₀ re₁ ) ↔
  ([ ] =~ re₀ ∧ a :: s =~ re₁ ) ∨
  ∃ s₀ s₁ , s = s₀ ++ s₁ ∧ a :: s₀ =~ re₀ ∧ s₁ =~ re₁.
Proof.
  (* FILL IN HERE *) Admitted.
☐

s matches Union re₀ re₁ iff s matches re₀ or s matches re₁.

Lemma union_disj : ∀ (s : string) re₀ re₁,
  s =~ Union re₀ re₁ ↔ s =~ re₀ ∨ s =~ re₁.
Proof.
  intros. split.
  - intros. inversion H.
    + left. apply H2.
    + right. apply H1.
  - intros [ H | H ].
    + apply MUnionL. apply H.
    + apply MUnionR. apply H.
Qed.
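The equivalences above can also be combined to show that a string does not match. The following sketch, under the assumption that union_disj and char_nomatch_char are exactly as stated above, shows that a character different from both a and b is not matched by Union (Char a) (Char b). The name union_nomatch is hypothetical.

```coq
(* Hypothetical example: a one-character string [x] is rejected by
   Union (Char a) (Char b) whenever x differs from both a and b. *)
Example union_nomatch : forall a b x : ascii,
  x <> a -> x <> b -> ~ ([x] =~ Union (Char a) (Char b)).
Proof.
  intros a b x Ha Hb H.
  (* Turn the match on Union into a disjunction of simpler matches. *)
  apply union_disj in H.
  destruct H as [H | H].
  - (* [x] =~ Char a is equivalent to False because x <> a. *)
    apply (proj1 (char_nomatch_char a x [] Ha)). apply H.
  - apply (proj1 (char_nomatch_char b x [] Hb)). apply H.
Qed.
```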

Exercise: 3 stars, standard, optional (star_ne)

a::s matches Star re iff s = s₀ ++ s₁, where a::s₀ matches re and s₁ matches Star re. Like app_ne, this observation is critical, so understand it, prove it, and keep it in mind.

Hint: you'll need to perform induction. There are quite a few reasonable candidates for Prop's to prove by induction. The only one that will work is splitting the iff into two implications and proving one by induction on the evidence for a :: s =~ Star re. The other implication can be proved without induction.

In order to prove the right property by induction, you'll need to rephrase a :: s =~ Star re to be a Prop over general variables, using the remember tactic.

Lemma star_ne : ∀ (a : ascii) s re,
  a :: s =~ Star re ↔
  ∃ s₀ s₁ , s = s₀ ++ s₁ ∧ a :: s₀ =~ re ∧ s₁ =~ Star re.
Proof.
  (* FILL IN HERE *) Admitted.
☐

The definition of our regex matcher will include two fixpoint functions. The first function, given regex re, will evaluate to a value that reflects whether re matches the empty string. The function will satisfy the following property:

Definition refl_matches_eps m :=
∀ re : reg_exp ascii, reflect ([ ] =~ re) (m re).

Exercise: 2 stars, standard, optional (match_eps)

Complete the definition of match_eps so that it tests if a given regex matches the empty string:

Fixpoint match_eps (re: reg_exp ascii) : bool
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
☐

Exercise: 3 stars, standard, optional (match_eps_refl)

Now, prove that match_eps indeed tests if a given regex matches the empty string. (Hint: You'll want to use the reflection lemmas ReflectT and ReflectF.)

Lemma match_eps_refl : refl_matches_eps match_eps.
Proof.
(* FILL IN HERE *) Admitted.
☐

We'll define other functions that use match_eps. However, the only property of match_eps that you'll need to use in all proofs over these functions is match_eps_refl.

The key operation that will be performed by our regex matcher will be to iteratively construct a sequence of regex derivatives. For each character a and regex re, the derivative of re on a is a regex that matches exactly those strings s for which a :: s is matched by re, that is, the tails of the strings matched by re that start with a. I.e., re' is a derivative of re on a if they satisfy the following relation:

Definition is_der re (a : ascii) re' :=
∀ s, a :: s =~ re ↔ s =~ re'.
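As one concrete instance of this relation, the sketch below checks that EmptyStr is a derivative of Char a on a, using only the lemmas char_eps_suffix and empty_matches_eps proved above. The name is_der_char is hypothetical, and this covers a single case only; it is not a definition of a derivative function.

```coq
(* Hypothetical example: after consuming the character a, the regex
   Char a can only be followed by the empty string. *)
Example is_der_char : forall a : ascii, is_der (Char a) a EmptyStr.
Proof.
  intros a. unfold is_der. intros s.
  split.
  - (* a :: s =~ Char a gives s = [], hence s =~ EmptyStr. *)
    intros H. apply char_eps_suffix in H.
    apply empty_matches_eps. apply H.
  - (* s =~ EmptyStr gives s = [], hence a :: s =~ Char a. *)
    intros H. apply empty_matches_eps in H.
    apply char_eps_suffix. apply H.
Qed.
```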

A function d derives strings if, given character a and regex re, it evaluates to the derivative of re on a. I.e., d satisfies the following property:

Definition derives d := ∀ a re, is_der re a (d a re).
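A useful way to read this property is that applying a deriving function once per character consumes a prefix of the string. The following sketch, which holds for any function d satisfying derives (the lemma name derives_two is hypothetical), spells this out for a two-character prefix.

```coq
(* Hypothetical lemma: two derivative steps consume two characters. *)
Lemma derives_two : forall d, derives d ->
  forall (a b : ascii) s re,
    a :: b :: s =~ re <-> s =~ d b (d a re).
Proof.
  intros d Hd a b s re.
  unfold derives, is_der in Hd.
  split.
  - intros H.
    (* Peel off b, then a, using the left-to-right directions. *)
    apply (proj1 (Hd b (d a re) s)).
    apply (proj1 (Hd a re (b :: s))).
    apply H.
  - intros H.
    (* Put a, then b, back using the right-to-left directions. *)
    apply (proj2 (Hd a re (b :: s))).
    apply (proj2 (Hd b (d a re) s)).
    apply H.
Qed.
```

The projections are written out explicitly so that each step picks a definite direction of the equivalence.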

Exercise: 3 stars, standard, optional (derive)

Define derive so that it derives strings. One natural implementation uses match_eps in some cases to determine if key regexes match the empty string.

Fixpoint derive (a : ascii) (re : reg_exp ascii) : reg_exp ascii
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
☐

The derive function should pass the following tests. Each test establishes an equality between an expression that will be evaluated by our regex matcher and the final value that must be returned by the regex matcher. Each test is annotated with the match fact that it reflects.

Example c := ascii_of_nat 99.
Example d := ascii_of_nat 100.

"c" =~ EmptySet:

Example test_der0 : match_eps (derive c (EmptySet)) = false.
Proof.
(* FILL IN HERE *) Admitted.

"c" =~ Char c:

Example test_der1 : match_eps (derive c (Char c)) = true.
Proof.
(* FILL IN HERE *) Admitted.

"c" =~ Char d:

Example test_der2 : match_eps (derive c (Char d)) = false.
Proof.
(* FILL IN HERE *) Admitted.

"c" =~ App (Char c) EmptyStr:

Example test_der3 : match_eps (derive c (App (Char c) EmptyStr)) = true.
Proof.
(* FILL IN HERE *) Admitted.

"c" =~ App EmptyStr (Char c):

Example test_der4 : match_eps (derive c (App EmptyStr (Char c))) = true.
Proof.
(* FILL IN HERE *) Admitted.

"c" =~ Star (Char c):

Example test_der5 : match_eps (derive c (Star (Char c))) = true.
Proof.
(* FILL IN HERE *) Admitted.

"cd" =~ App (Char c) (Char d):

Example test_der6 :
match_eps (derive d (derive c (App (Char c) (Char d)))) = true.
Proof.
(* FILL IN HERE *) Admitted.

"cd" =~ App (Char d) (Char c):

Example test_der7 :
match_eps (derive d (derive c (App (Char d) (Char c)))) = false.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 4 stars, standard, optional (derive_corr)

Prove that derive in fact always derives strings.

Hint: one proof performs induction on re, although you'll need to carefully choose the property that you prove by induction by generalizing the appropriate terms.

Hint: if your definition of derive applies match_eps to a particular regex re, then a natural proof will apply match_eps_refl to re and destruct the result to generate cases with assumptions that the re does or does not match the empty string.

Hint: You can save quite a bit of work by using lemmas proved above. In particular, to prove many cases of the induction, you can rewrite a Prop over a complicated regex (e.g., s =~ Union re₀ re₁) to a Boolean combination of Props over simple regexes (e.g., s =~ re₀ ∨ s =~ re₁) using lemmas given above that are logical equivalences. You can then reason about these Props naturally using intro and destruct.

Lemma derive_corr : derives derive.
Proof.
(* FILL IN HERE *) Admitted.
☐

We'll define the regex matcher using derive. However, the only property of derive that you'll need to use in all proofs of properties of the matcher is derive_corr.

A function m matches regexes if, given string s and regex re, it evaluates to a value that reflects whether s is matched by re. I.e., m satisfies the following property:

Definition matches_regex m : Prop :=
∀ (s : string) re, reflect (s =~ re) (m s re).
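To make the meaning of this property concrete, the sketch below shows that any function satisfying matches_regex returns true exactly on matching inputs. It assumes that reflect_iff from the reflection case study earlier in the chapter is available with the statement ∀ P b, reflect P b → (P ↔ b = true); the lemma name matches_regex_iff is hypothetical.

```coq
(* Hypothetical lemma: a matcher that reflects the match relation
   returns true exactly when the string matches the regex. *)
Lemma matches_regex_iff : forall m, matches_regex m ->
  forall (s : string) re, s =~ re <-> m s re = true.
Proof.
  intros m Hm s re.
  unfold matches_regex in Hm.
  (* Assumes reflect_iff : forall P b, reflect P b -> (P <-> b = true). *)
  apply reflect_iff.
  apply Hm.
Qed.
```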

Exercise: 2 stars, standard, optional (regex_match)

Complete the definition of regex_match so that it matches regexes.

Fixpoint regex_match (s : string) (re : reg_exp ascii) : bool
(* REPLACE THIS LINE WITH ":= _your_definition_ ." *). Admitted.
☐

Exercise: 3 stars, standard, optional (regex_match_correct)

Finally, prove that regex_match in fact matches regexes.

Hint: if your definition of regex_match applies match_eps to regex re, then a natural proof applies match_eps_refl to re and destructs the result to generate cases in which you may assume that re does or does not match the empty string.

Hint: if your definition of regex_match applies derive to character x and regex re, then a natural proof applies derive_corr to x and re to prove that x :: s =~ re given s =~ derive x re, and vice versa.

Theorem regex_match_correct : matches_regex regex_match.
Proof.
(* FILL IN HERE *) Admitted.
☐

(* 2023-10-12 11:33 *)
