LEA
Fakultät für Informatik der Technischen Universität München
Lehrstuhl für Effiziente Algorithmen
Postadresse: 80290 München; Hausadresse: Arcisstr.21, 80333 München

Proving the PCP-Theorem

Volker Heun and Wolfgang Merkle and Ulrich Weigand


The PCP-Theorem states that NP is equal to the class PCP(log n,1) of languages which are recognized by verifiers which on an input of length n use at most O(log n) random bits and query only O(1) positions of their proofs. In the following we give an essentially self-contained account of the PCP-Theorem and its proof which is based on the original work due to Arora [Aro94], Lund, Motwani, Sudan, and Szegedy [ALMSS92], and Arora and Safra [ArSa92]. We also use ideas and concepts from the presentation given by Hougardy, Prömel, and Steger [HPS94].

It is well-known that the class NP can be characterized in the following way: A decision problem belongs to NP if each of its solutions has a membership proof which can be verified in polynomial time by a deterministic Turing machine. For instance, consider the language of satisfiable Boolean formulae in conjunctive normal form with exactly three literals per clause, abbreviated by 3-SAT. A short, efficiently checkable membership proof for a Boolean formula is a satisfying assignment of its variables. Such membership proofs for solutions of a decision problem belonging to NP are usually concise. On the other hand, it is in general necessary to read the whole membership proof to decide whether it is in fact a proof.

In case of probabilistically checkable proofs it is not necessary for a verifier to read the whole proof to decide whether a given instance is a solution of the considered decision problem. Here, a verifier is a polynomial time bounded Turing machine with additional access to a stream of random bits and random access to a given proof. After processing the input and some random bits, the verifier decides which bits of the proof will be read. Depending on these queried bits, the verifier either accepts or rejects the input. We say a language has a probabilistically checkable proof iff for each input belonging to the language there exists a proof which the verifier accepts for all possible streams of random bits, while, for all inputs not belonging to the language, the verifier rejects all proofs for most streams of random bits.

Obviously, all decision problems belonging to NP have probabilistically checkable proofs. Surprisingly, for each language in NP, there exists a probabilistically checkable proof and a verifier that reads only a constant number of bits of this proof and a `few' random bits. Such a probabilistically checkable proof must obviously be more structured than membership proofs used for the class NP. As we will see, for each language belonging to NP, it is relatively easy to construct a probabilistically checkable proof where a verifier reads only a constant number of bits, if the verifier is allowed to read a polynomial number of random bits. As a consequence, the length of those proofs grows exponentially. Highly nontrivial and amazing is the fact that, for each language in NP, it is possible to construct a probabilistically checkable proof of polynomial length such that only a constant number of bits of the proof are read by a suitable verifier.

In this section, we give an outline of the proof of the PCP-Theorem, which states that NP=PCP(log n,1). It can be easily verified that PCP(log n,1) subset NP. Hence it remains to show the reverse containment NP subset PCP(log n,1). Clearly, it is sufficient to show that an NP-complete problem is contained in PCP(log n,1). In the remainder of this chapter, we will prove that 3-SAT in PCP(log n,1), which completes the proof of the PCP-Theorem.

The structure of the proof that 3-SAT in PCP(log n,1) is illustrated in the following Figure:

As introduced in Chapter 4, PCP(r,q) denotes the class of languages with probabilistically checkable proofs such that a verifier requires O(r) random bits and queries O(q) bits. In this figure only, constant-PCP(r,q) denotes the restricted class of languages with probabilistically checkable proofs, where the verifier is restricted to query the O(q) bits from a constant number of segments. Here a segment is a contiguous block of O(q) bit positions.

Obviously, a satisfying assignment of a given Boolean formula is a proof of its satisfiability. But clearly it is not sufficient to read only a constant number of bits of this proof for a verification. A key idea is to extend such a proof using linear functions from F2m into F2, where the images of basis vectors depend on the given assignment in a more or less natural way. Due to the linearity the images of the basis vectors of F2 determine the images of all other vectors and m will be a polynomial in the length of the given Boolean formula. We will refer to this encoding as linear code and in general we call the conversion of an assignment into a function an arithmetization. A proof will be constructed from the linear code by listing all values of the considered linear function using some fixed ordering on the arguments.

The redundancy of such a proof has the following three important properties. It is possible to check whether a given proof is nearly a linear encoding of an assignment for 3-SAT by reading only a constant number of bits of the proof. Moreover, the linear encoding is fault-tolerant, i.e., whenever only a small fraction of the proof differs from a linear code, we are able to reconstruct the correct values using only a constant number of bits of the proof. Finally, a constant number of bits of the proof will suffice to decide whether the linear encoding corresponds to a satisfying assignment. Since the dimension of the considered vector space is polynomial in the size of the Boolean formula, the length of the corresponding proof is exponential in the size of the given Boolean formula and hence a polynomial number of random bits is required to get random access to the proof. This establishes that 3-SAT in PCP(n3,1).

The key idea for reducing the number of random bits, which also implies a reduction of the length of the proof, is a more concise encoding of an assignment. Instead of linear functions we use multivariate polynomials of low degree as an encoding. We will refer to such an encoding based on multivariate polynomials as the polynomial code. In this case the length of such an encoding will be polynomially related to the size of the given Boolean formula. The polynomial code has similar properties as the linear code. It is possible to verify whether a given proof is a polynomial encoding of an assignment and the polynomial code is fault-tolerant. Unfortunately, for the verification and the reconstruction a polylogarithmic number of bits of the proof have to be inspected. Nevertheless, this is the second major step in proving the PCP-Theorem. The proof of these details is rather complicated and requires mathematical tools like the so-called LFKN-Test and the Low Degree Test. This result can be improved in the following sense. There exists a probabilistically checkable proof for 3-SAT such that only a constant number of `contiguous blocks' of polylogarithmic length have to be inspected by the verifier instead of polylogarithmic bits at arbitrary positions.

It is possible to compose two verifiers. This is the main idea for a further reduction of the number of queried bits while the length of the proof does not increase significantly. First, the proof system based on the polynomial code is composed with itself obtaining a probabilistically checkable proof of polynomial length, where only a constant number of blocks of size O(poly(loglog n)) have to be inspected by the verifier. Then these blocks of length O(poly(loglog n)) in the composed proof system will be encoded using the proof system based on linear codes. So we obtain a proof system of polynomial length where only a constant number of bits have to be inspected by the verifier. As a consequence we obtain NP=PCP(log n,1).