\documentstyle{article} \input{preamble-light.tex} \setlength{\oddsidemargin}{-0.1in} \setlength{\evensidemargin}{-0.1in} \setlength{\textwidth}{6.7in} \setlength{\topmargin}{-0.6in} \setlength{\textheight}{8.9in} \newtheorem{theorem}{Theorem} \newtheorem{lemma}{Lemma} \begin{document} \lecture{6}{February 25, 1997}{Dan Spielman}{Yevgeniy Dodis} \section{Diagonalization and NP} The main result of this lecture is the following somewhat surprising theorem: \begin{theorem} There are (computable) oracles $A$ and $B$ such that $P^A = NP^A$, but $P^B\neq NP^B$. \end{theorem} Before proving the theorem, let's make a remark of its importance. A lot (most) of ``inequality results'' (i.e. the results stating that 2 classes are different) in complexity are proven using the {\it diagonalization method}. We somehow construct an object of one class that is guaranteed to be different from all objects of the second class, thus proving that the classes are different. A typical example of this approach is unsolvability of the Halting problem. Another one is time/space hierarchy theorem, where a ``stronger'' machine simulates any weaker one for a particular input (number of that machine) and reverses the output of the ``weaker'' machine for that particular input. It was believed initially that this powerful and common diagonalization idea could be used in proving that $P\neq NP$. The above result shows that diagonalization is very unlikely to help in this problem. The intuitive reason for that is that if diagonalization works to prove inequality of non-relativized classes $P$ and $NP$, we expect it to prove inequality of $P^A$ and $NP^A$, which are equal. On the other hand, even in a very unlikely case that $P=NP$, we don't expect the proof of this fact to be a direct simulation of any non-deterministic machine by a deterministic one, since that simulation is likely to work to prove equality of $P^B$ and $NP^B$, which are not equal. In essence, it the theorem is a further evidence that $P=NP$ question is hard to prove. Now, let's go to the proof of the theorem. a) Let's construct required language $A$, s.t. $P^A=NP^A$. IDEA: we need an oracle, for which non-determinism ``does not matter''. We know that $PSPACE=NPSPACE$, suggesting that we can try any $PSPACE$-complete language as a good candidate for $A$.\\ PROOF: Let $A=QBF$ (quantified boolean formulas language). We claim that $PSPACE\subseteq P^A$. Indeed, if $L\in PSPACE$, we construct $M^A$ as follows: since $QBF$ is $PSPACE$-complete, map instance of problem in $L$ to equivalent instance of $QBF$ using the polynomial time reduction, and then simply ask the oracle $QBF$ for an answer. Next, we clearly have that $P^A\subseteq NP^A$. Finally, we claim that $NP^A\subseteq NPSPACE$. Our $NPSPACE$ machine simply simulates the $NP^A$ machine, and whenever our $NP^A$-machine needs to ask the $QBF$-oracle, the $NPSPACE$ machine simply computes the answer by brute force in $PSPACE$, which it can do since $QBF\in PSPACE$. And recalling that $PSPACE = NPSPACE$, we get a chain of inclusions: \\ $PSPACE\subseteq P^A\subseteq NP^A\subseteq NPSPACE=PSPACE$, making $P^A=NP^A$. b) Now, let's construct oracle $B$, such that $P^B\ne NP^B$. We want to construct a language $L\in NP^B-P^B$. Clearly, $L$ should crucially depend on $B$. For any oracle B, define $$L(B)=\{x|\exists y\in B s.t. |y|=|x|\},$$ so $L(B)$ contains all strings of the same length as some element of $B$. First, it's easy to see that for any $B$ we have $L(B)\in NP^B$. Indeed, we just guess an appropriate string $y\in B$ of same length as input $x$ and ask our oracle $B$ whether $y\in B$. We will construct $B$ in such a way that $L(B)\in NP^B-P^B$, which by above remark is equivalent to saying that $L(B)\not\in P^B$. To decide $L(B)$ deterministically in polynomial time, some polynomial oracle TM $M^B$, given an input $x$ of size $n$, should be able to ``go through'' $2^n$ possible choices for $y$ in polynomial time, and so, it has to do it without going through all the choices for big enough $n$. But this suggests a way to construct $B$. $B$ will ``deceive'' any polynomial time oracle TM $M^?$ ($?$ stands for the fact that $M$ uses some, yet unspecified, oracle) in the following way: we choose large enough $n$, so that $M^?$ runs in less that $2^n$ time, and so asks less than $2^n$ questions to the oracle. We feed $M^?$ input $0^n$, and answer ``no'' to all queries that $M^?$ makes. If at the end $M^?$ rejects (which any reasonable machine should do when getting constant ``no''s, but $M^?$ might be ``stupid''), we find a string $y$ of length $n$ that was not queried by $M^?$ (it exists since $M^?$ asked less than $2^n$ queries), and declare $y\in B$, thus making $0^n\in L(B)$ and making $M^?$ be wrong on input $0^n$ in trying to decide $L(B)$. If, for a strange reason, $M^?$ accepts, we say that, indeed, no string of length $n$ is in $B$, thus making $0^n\not\in L(B)$ and making $M^?$ wrong again. This way we guarantee that $M^?$ does not decide $L(B)$. Any particular polynomial-time machine $M^?$ can be ``deceived'' for ANY large enough $n$. Now, we only need to make $B$ deceive ALL polynomial time oracle machines. Here we use brute force, by enumerating all such machines and picking larger and larger n for successive machines to avoid overlapping (so here we are using the diagonalization method despite the fact that we want to show that it's useless in proving $P\not=NP$). Here is precise (algorithmic) description of B: Let $M^{?}_{1}$, $M^{?}_{2}$, $M^{?}_{3}$, \ldots be an enumeration of all polynomial-time oracle TM's (for now, we don't care how to compute it, we are {\it defining} $B$, so we just need the existence of enumeration). \begin{enumerate} \item Let $B_0=\emptyset$, $n=0$ \item for $i=1,2,3,\ldots$ do \begin{enumerate} \item choose $n_i>n$, such that $M^{B_{i-1}}_i$ on input $0^{n_i}$ asks fewer than $2^{n_i}$ questions (such $n_{i}$ exists). \item If $M^{B_{i-1}}_i$ accepts $0^{n_i}$ then let $B_i=B_{i-1}$ (so we don't include any new strings to $B$). Else if it rejects Let $B_i=B_{i-1} \cup \{$ some word $x$ s.t. $|x|=n_i$ and $x$ was not asked by $M^{B_{i-1}}_i$\}. \item set $n=max($length of longest quiery asked by $M^{B_{i-1}}_i,$ on input $0^{n_{i}}$. \end{enumerate} \item set $B=\bigcup_{i=0}^{\infty}B_i$ \end{enumerate} From above remarks we see that $L(B)\in NP^B-P^B$, as $M^{B}_i$ gives incorrect answer for input $0^{n_i}$. Remark: Above description of $B$ was mathematical, but we can make $B$ computable if we modify a bit the above algorithm. Instead of enumerating all polynomial oracle TM's, we can enumerate only the oracle TM's with explicit polynomial timer that will stop the machine if it runs for too long. This will include eventually all different (i.e. not giving same answer on all inputs) polytime oracle TM's. Then we can pick the smallest $n_i$ satisfying the criteria we want. Same for picking a string $x$ to be included in $B$ when we extend $B$ with a new string of size $n_i$. Then to decide whether $x\in B$ we run the above algorithm until $n_i\ge |x|$ and see whether $x$ was included into $B$. This makes $B$ computable. \section{Randomized Complexity Classes} Very often a brute force deterministic approach to a problem requires examining a lot (or all) of the possibilities, which is very costly to do. On the other hand, we might be sure that a good fraction of the possibilities (say a half) will be ``good'', i.e. they will provide the answer we want, we just don't know how to organize the search in the deterministic manner to be sure that we soon encounter such a good choice. Of course, in theory we can use a non-deterministic guess, but that's not realistic. On the other hand, if we have a good source of random numbers, we can choose a possibility at random and have a good chance to succeed. Contrary to non-deterministic approach, this one is very realistic and quite powerful. A motivating example to show the power of randomness (that will lead directly to the definition of randomized complexity classes) is the problem of polynomial identities. We are given 2 multivariable polynomials in some (maybe different for the two) implicit form (say, one is a determinant, the other is a product of multinomials). Basically, we assume that the only thing we can do efficiently with either polynomial is to evaluate it at an arbitrary point (the details of other things we can do depends on what implicit form is used). We need to check whether the given multivariable polynomials $p$ and $q$ are identical. Let $r=p-q$. The simplest idea that comes in mind is to evaluate $p$ and $q$ over some appropriately chosen set of points, so that $p\equiv q$ iff for all above mentioned points $x$, $p(x)=q(x)$, i.e. we want a set of points containing at least one non-root of $r$, if there is any. Unfortunately, $r$ might have a lot (even infinite number) of zeros, while being not identically zero, and since we don't know much about $r$, we don't know how to determine whether $r\not\equiv0$ in a systematic fashion. Here comes randomization. Choose a point at random form an appropriate domain (see below). If $r\not\equiv0$, we should have a decent chance to catch it very-very quickly. That's the correct intuition. The details follow. {\bf Schwartz lemma:} Assume $p(x_1, \ldots,x_n)$ is a non-zero polynomial, where each variable $x_i$ has degree at most $d>0$. Let $I$ be any domain of size $\ge nd$. Then $$\Pr_{\{x_1,\ldots,x_n\}\in I^n}(p(x_1,\ldots,x_n)=0)\le\frac{dn}{|I|}$$ Proof: We use induction on $n$. Let $n=1$. A non-zero polynomial of degree $\le d$ can have at most $d$ roots, so when we choose $x_1$ from a domain of size $|I|$, we have at least $\frac{d}{|I|}$ chance to pick a non-root, exactly as inequality says. Now, the inductive step. Let $d\ge{d_1}=$ highest degree of $x_1$ in $p$. And let $p_2(x_2,\ldots,x_n)$ be the coefficient in front of $x_1^{d_1}$. We have by inductive hypothesis that $$\Pr_{\{x_2,\ldots,x_n\}\in I^{n-1}}(p(x_2,\ldots,x_n)=0)\le\frac{d(n-1)}{|I|}$$ We have: \begin{eqnarray*} &\Pr_{\{x_1,\ldots,x_n\}\in I^n}&(p(x_1,\ldots,x_n)=0)\\ &=&\Pr_{\{x_1,\ldots,x_n\}\in I^n}(p(x_1,\ldots,x_n)=0 | p_2(x_2,\ldots, x_n) = 0) \times \Pr_{\{x_2,\ldots,x_n\}\in I^{n-1}}(p(x_2,\ldots,x_n)=0) \\ &+& \Pr_{\{x_1,\ldots,x_n\}\in I^n}(p(x_1,\ldots,x_n)=0 | p_2(x_2,\ldots, x_n) \ne 0) \times \Pr_{\{x_2,\ldots,x_n\}\in I^{n-1}}(p(x_2,\ldots,x_n)\ne 0) \\ &\le& 1\times \frac{d(n-1)}{|I|} + \frac{d}{|I|}\times 1 = \frac{dn}{|I|}. \end{eqnarray*} The first bound $\frac{d(n-1)}{|I|}$ was made by the inductive assumption for $n-1$, the second bound $\frac{d}{|I|}$ - by observing that as long as we know that $p_2(x_2,\ldots,x_n)\ne 0$, no matter what $x_2,\ldots ,x_n$ are chosen, there can be at most $d_1 \le d$ out of $|I|$ possible values of $x_1$ making $p(x_1,\ldots,x_n)=0$, and so the above bound follows, and thus is the lemma. {\bf Corollary:} Let $|I|\ge 2nd$, $p(x_1,\ldots ,x_n), q(x_1,\ldots ,x_n)$ be 2 polynomials, and each variable in either of them has maximum degree $\le d$. If we set $r=p-q$ then Schwartz lemma gives us the following: \begin{enumerate} \item If $r\equiv 0$ (i.e. $p\equiv q$), then $$\Pr_{\{x_1,\ldots,x_n\}\in I^n}(r(x_1,\ldots,x_n)\ne 0) = 0$$ \item If $r\not\equiv 0$ (i.e. $p\not\equiv q$), then $$\Pr_{\{x_1,\ldots,x_n\}\in I^n}(r(x_1,\ldots,x_n)\ne 0) \ge \frac{1}{2}$$ \end{enumerate} As we'll define next lecture, above corollary means that the language of polynomial identities is in the randomized complexity class $RP$. Let's try to answer a big question though: can we derandomize (i.e. make deterministic) the above algorithm, and, in general, how essential is randomization? First, observe that if we make ${I}\ge knd$ for some $k$, then probability of error (i.e. $r\not\equiv 0$, but we chose a root of $r$ as a sample point; call such misleading points ``bad'' for a particular identity) is $\le \frac{1}{k}$. Now, assume we look at all possible polynomial identities that can be written using $m$ bits. There are no more than $2^m$ such identities. As we observed, at most $\frac{1}{k}$ of possible sample points is bad for a particular identity, so, by union bound, at most $\frac{2^m}{k}$ fraction of sample points can be bad for {\it at least one} of $2^m$ identities. But now, if we let $k>2^m$, say $k=2^{m + 1}$, we know that there must exist a sample point ${y_1,\ldots y_n}$ good for {\it all} identities. The magnitude of $k$ is exponential, but it has has polynomial length, so we are OK. What it means is that we can feed above point as an advice for all identities of size $m$, and those identities can be trivially verified in polynomial time, so it looks like we don't need randomization. But the problem, of course, is that we'll have different sample points for different input lengths $m$, and we don't know how to find such points efficiently (that's why we used randomization in the first place). What we showed, however, is that the problem of polynomial identities is in $P/poly$, i.e. we have a non-uniform polynomial algorithm for deciding it. As we'll see, it's a general pattern: any problem that can be solved in polynomial time using randomization, can be solved in non-uniform polynomial time (the general proof is almost identical copy of the one we have for polynomial identities), but it remains one of the biggest questions whether we can avoid randomization altogether, i.e. whether $P=RP$. \end{document}