\documentclass{article} \newcommand{\class}[1]{\mathsf{#1}} \newcommand{\vect}[1]{{\overline #1}} \newcommand{\VV}{\mbox{VV}} \newcommand{\pr}{\mathrm{Pr}} \newcommand{\GapP}{\class{Gap\!-\!P}} \newcommand{\mymod}[1]{\!\!\!\pmod{#1}} \begin{document} \input{preamble.tex} \lecture{11}{March 13, 1997}{Daniel A. Spielman}{Salil Vadhan} \section{$\class{AC_0}$, Probabilistic Polynomials, and\\ Toda's Theorem (cont.)} In this lecture, we will complete the proof of Toda's theorem that $\class{PH}\subset \class{P}^{\class{\#P}}$. Recall that, en route to Toda's result, we were trying to prove that $\class{AC_0}$ predicates could be computed by ``probabilistic polynomials'' of polylogarithmic degree using polylogarithmic random inputs. Before proceeding, we recall the notation from last time and introduce some more notation. For $\vect{u},\vect{v}\in\{0,1\}^m$, define $$\delta_\vect{u}(\vect{v}) = \cases{ 1, & if $\vect{u}=\vect{v}$\cr 0, & otherwise.}$$ Note that $\delta_\vect{u}$ can be represented by a polynomial of degree $m$ as follows: $$\delta_\vect{u}(\vect{v}) = \prod_{i=1}^m (u_iv_i+(1-u_i)(1-v_i))$$ Last time we defined the Valiant-Vazirani function $\VV(\vect{i},\vect{r})$, which can be written in terms of $\delta_\vect{u}$ functions as follows: $$\VV(\vect{i},\vect{r}) = \sum_{\vect{u}: \langle \vect{u},\vect{i} \rangle = 0} \delta_\vect{u}(\vect{r}) = \cases{ 1, & if $\langle \vect{r},\vect{i} \rangle=0$\cr 0, & otherwise.} $$ Thus, for a fixed $\vect{i}$, $\VV(\vect{i},\vect{r})$ is a polynomial of degree $m$ in $\vect{r}$. We then defined functions $S_j$ by $$S_j(x_1,\ldots,x_n,\vect{r}^{(1)},\ldots,\vect{r}^{(m)}) = \sum_{\vect{i}} x_\vect{i} \left( \prod_{k=1}^j \VV(\vect{i},\vect{r}^{(k)}) \right).$$ In this definition, $n=2^m$, and we consider the variables $x_1,\ldots,x_n$ to be labelled by vectors $\vect{i}\in \{0,1\}^m$. Notice that $S_j$ is of degree at most $m^2$. Finally, we defined functions $T_\ell$ of degree polylogarithmic in $m$ by $$T_\ell(x_1,\ldots,x_n,\vect{r}^{(1,1)},\ldots,\vect{r}^{(m,1)}, \vect{r}^{(2,1)},\ldots,\vect{r}^{(m,\ell)}) = \prod_{a=1}^\ell \prod_{j=0}^m (1-S_j(x_1,\ldots,x_n,\vect{r}^{(1,a)},\ldots,\vect{r}^{(m,a)})).$$ In order to clarify the meaning of these functions, let us explicitly describe how they parallel the proof of the Valiant-Vazirani Theorem. Recall that, in that proof, we took a formula $\phi$ on $m$ variables and repeatedly intersected its set of satisfying assignments with random hyperplanes to obtain new formulae $\phi_1,\phi_2,\ldots,\phi_m$, where $\phi_i$ is the formula obtained after $i$ restrictions. An ``isolation lemma'' told us that if $\phi$ had at least one satisfying assignment then, with probability at least 1/8, one of the $\phi_i$ would have at {\em exactly} one satisfying assignment. In the above functions, the $\vect{i}$ vectors correspond to the satisfying assignments in the proof of Valiant-Vazirani and the $\vect{r}$ vectors correspond to the random hyperplanes. The $\VV(\vect{i},\vect{r})$ function tells us whether $\vect{i}$ lies in the hyperplane defined by $\vect{r}$. Think of $x_\vect{i}$ as being 1 if $\vect{i}$ is a satisfying assignment and 0 otherwise. Then $\sum_\vect{i} x_{\vect{i}}$ corresponds to the number of satisfying assignments. Multiplying $x_\vect{i}$ by $\VV(\vect{i},\vect{r}^{(k)})$ kills $x_\vect{i}$ if $\vect{i}$ doesn't lie in the hyperplane defined by $\vect{r}$. Thus $S_j$ corresponds to the number of satisfying assignments after $j$ random restrictions have been imposed. Finally, the inner product in $T_\ell$ is 1 when the restrictions $\vect{r}^{(1,a)},\ldots,\vect{r}^{(m,a)}$ only produce formulae with 0 satisfying assignments and is 0 when at least one of the formulae has 1 satisfying assignment. Thus, by the ``isolation lemma'' we used to prove Valiant-Vazirani, we have the following: \begin{proposition} If $\sum_\vect{i} x_\vect{i}=0$ then $T_\ell(\cdots)=1$.\\ If $\sum_\vect{i} x_\vect{i}>0$ then $\pr_{\{r^{(k,a)}\}}[T_\ell(\vect{x},\{\vect{r}^{(k,a)}\})=0]> 1-\left(\frac{7}{8}\right)^\ell$. \end{proposition} Thus $T_\ell$ probabilistically ``computes'' the NOR of the $x_\vect{i}$'s. By replacing the $x_\vect{i}$'s with $1-y_\vect{i}$ or by replacing the $T_\ell$ with $1-T_\ell$, we get probabilistic polynomials which compute OR and AND. (NOT is trivial.) We can now prove what we wanted about $\class{AC_0}$. \begin{proposition} For any $\class{AC_0}$ predicate $A$ of depth $d$, there exists a family of probabilistic polynomials $P$ of degree $(\log n)^{O(d)}$ with $(\log n)^{O(1)}$ random inputs such that $$\pr_r[P(x,r)=A(x)] > \frac{2}{3}.$$ \end{proposition} \begin{proof-idea} Starting with the leaves, replace each gate in the $\class{AC_0}$ circuit with the corresponding AND/OR/NOT polynomial in its inputs. Use the same random inputs $r$ for each polynomial, and if a gate is used more than once, simply use the same polynomial. In constructing the polynomials, set $\ell=m^2$, so that the probability of any fixed one giving the wrong answer on random input $r$ is at most $(7/8)^{m^2} = 1/n^{\Omega(\log n)}$. Since there are only $n^{O(1)}$ gates in all, the probability that any one fails on random input $r$ is $n^{O(1)}/n^{\Omega(\log n)} < 1/3$ for sufficiently large $n$. \end{proof-idea} We've shown how to compute $\class{AC_0}$ predicates with probabilistic polynomials, but it'd be better if we could remove the random inputs from our polynomials. A first attempt at this would be to consider $$Q(x)=\sum_r P(x,r),$$ where $P(x,r)$ is the probabilistic polynomial which computes an $\class{AC_0}$ predicate $A$. We know that if $A(x)=0$, then $P(x,r)$ is zero for most $r$ and that if $A(x)=1$, then $P(x,r)$ is 1 for most $r$, so it seems like we should be able to recover $A(x)$ from $Q(x)$. However, for the $r$ on which $P(x,r)$ is incorrect, $P(x,r)$ can take on all kinds of crazy values other than 0 and 1 and completely throw off the summation. Toda offered a clever solution to this problem. His idea was to work modulo a large power of 2 with not the original values $P(x,r)$, but values $f(P(x,r))$ for a suitably chosen function $f$. Define $h^{(1)}(z)=4z^3+3z^4$, and inductively define $h^{(c)}(z)=h(h^{(c-1)}(z))$. Then we can relate the value of $h(z)$ to the values of $z$ modulo 2 as follows: \begin{proposition} If $z\equiv 0 \mymod{2}$, then $h^{(c)}(z)\equiv 0 \mymod{2^{2^c}}$.\\ If $z\equiv -1 \mymod{2}$, then $h^{(c)}(z)\equiv -1 \mymod{2^{2^c}}$. \end{proposition} \begin{proof} Straightforward induction on $c$. \end{proof} Intuitively, if we let $f=h^{(c)}$ for $c$ sufficiently large, then $\sum_r f(P(x,r))$ taken modulo $2^{2^c}$ should enable us to estimate the probability that $P(x,r)=1$, because even when $P(x,r)$ takes on nasty integer values other than 0 or 1, $f(P(x,r))$ will still be only $0$ or $-1$ modulo $2^{2^c}$, so these bad values can't throw off the sum modulo $2^{2^c}$ too much. Formally, we have: \begin{proposition} Let $P$ be a probabilistic polynomial which computes $\class{AC_0}$ predicate $A$ as constructed earlier. Set $c=\lceil \log R\rceil +1$, where $R$ is the number of random inputs used by $P$. Then $$\sum_r h^{(c)}(P(x,r)) \mymod{2^{2R}} \in [2^{2R}-2^R,2^{2R}-\frac{2}{3}2^R]$$ if and only if $A(x)=1$. \end{proposition} \begin{proof} Since $2^c \geq 2R$, $2^{2R}$ divides $2^{2^c}$, so any equation that holds modulo $2^{2^c}$ also holds modulo $2^{2R}$. If $A(x)=1$, we know that, for at least $(2/3)2^R$ values of $r$, $P(x,r)$ equals 1 and thus $f(P(x,r))$ equals -1 modulo $2^{2R}$. For the other values of $r$, $f(P(x,r))$ equals 0 or -1 modulo $2^{2R}$. Since there are only $2^R$ choices for $r$, $\sum_r f(P(x,r))$ must lie in the range $[-2^{R},-(2/3)2^{R}]$ modulo $2^{2R}$, which is the same as the range listed in the proposition. Similarly, if $A(x)=0$, then for at least $(2/3)2^R$ values of $r$, $f(P(x,r))$ equals $0$ modulo $2^{2R}$. Thus $f(P(x,r))$ equals $-1$ for at most $(1/3)2^R$ values of $R$ and $\sum_r f(P(x,r)) \mymod{2^{2R}}$ must lie in the range $[-(1/3)2^R,0]$, which does not intersect the range listed in the proposition. \end{proof} We now have all the machinery we need to prove Toda's Theorem: \begin{theorem} $\class{PH}\subset \class{P}^\class{\#P}$. \end{theorem} \begin{proof} Let $L=\mbox{\sc QBF}_k$, which is complete for $\Sigma_k^p$, and let $M$ be the alternating Turing machine with at most $k$ alternations which accepts $L$. WLOG, we may assume that, on inputs $\phi$ of length $\mu$, every computation of $M$ has $\mu$ existential quantifiers, followed by $\mu$ universal quantifiers, followed by $\mu$ existential quantifiers, and so on, for a total of $k$ alternations, where these quantifiers correspond to choosing a vector of values $\vect{i}\in \{0,1\}^{k\mu}$ for the variables of $\phi$ (and some dummy variables for this normalization), and that the order in which variables are quantified is independent of the computation path. The computation of $M$ on input $\phi$ can be regarded as an $\class{AC_0}$ circuit as follows: replace the universal quantifiers with AND gates, replace the existential quantifiers with OR gates, and consider $\phi(\vect{i})\in\{0,1\}$ as input at the leaf corresponding to the nondeterministic choices $\vect{i}$. This circuit has depth $k$ and $2^{k\mu}$ inputs. (Group a sequence $\mu$ of AND gates into a single AND gate of fan-in $2^\mu$.) The construction we just completed shows that there is a polynomial $Q$ on the $2^{k\mu}$ variables $x_\vect{i}$ of degree polynomial in $m=k\mu$ whose value modulo a suitable power of $2$ reveals the value of this $\class{AC_0}$ predicate. If we substitute $x_\vect{i} = \phi(\vect{i})$, then we can recover the output of $M$ on input $\phi$ from the value of $Q$. Since it is a polynomial-time computable function, $f_0(\phi,\vect{i})=\phi(\vect{i})\in \GapP$. Using the closure properties of $\GapP$ from last lecture to analyze our construction of $Q$, we can see that $g(\phi)=Q(\{\phi(\vect{i})\})\in \GapP$. Thus, a single call to a $\GapP$ oracle for $g$ enables us decide $L$. We now show a bit more explicitly, why the function $g$ described above is in $\GapP$. We already mentioned that $f_0(\phi,\vect{i})\in \GapP$. $\VV(\vect{i},\vect{r})$ is also in $\GapP$, because it too is polynomial-time computable. (Just have a machine with one computation which accepts if the function is 1 and rejects if the function is 0.) For the same reason, $$f_1(k,j)= \cases{ 1, & if $k\leq j$\cr 0, & otherwise}$$ is also in $\GapP$. Repeatedly applying the closure properties from last lecture, we get \begin{eqnarray*} f_2(\vect{i},\vect{r}^{(1)},\ldots,\vect{r}^{(m)},j) &=& \prod_{k=1}^j \VV(\vect{i},\vect{r}^{(k)})\\ &=& \prod_{k=1}^m f_2(k,j)VV(\vect{i},\vect{r}^{(k)}) \in \GapP\\ f_3(\phi,\vect{i},\vect{r}^{(1)},\ldots,\vect{r}^{(m)},j) &=& f_0(\phi,\vect{i})f_2(\vect{i},\vect{r}^{(1)},\ldots,\vect{r}^{(m)},j)\in \GapP\\ f_4(\phi,\vect{r}^{(1)},\ldots,\vect{r}^{(m)},j) &=& S_j(\{\phi(\vect{i})\},\vect{r}^{(1)},\ldots,\vect{r}^{(m)})\\ &=& \sum_\vect{i} f_3(\phi,\vect{i},\vect{r}^{(1)},\ldots,\vect{r}^{(m)},j) \in\GapP\\ f_5(\phi,\vect{r}^{(1)},\ldots,\vect{r}^{(m)}) &=& \prod_{j=0}^m (1-f_4(\phi,\vect{r}^{(1)},\ldots,\vect{r}^{(m)},j)) \in\GapP\\ f_6(\phi,\vect{r}^{(1,1)},\ldots,\vect{r}^{(m^2,m)}) &=& T_{m^2}(\{\phi(\vect{i})\},\vect{r}^{(1,1)},\ldots,\vect{r}^{(m^2,m)})\\ &=& \prod_{a=1}^{m^2} f_5(\phi,\vect{r}^{(a,1)},\ldots,\vect{r}^{(a,m)})\in \GapP \end{eqnarray*} This shows that the a single AND/OR gate in our circuit can be replaced by a $\GapP$ function of $\phi$ and the random variables $\vect{r}^{(a,k)}$. Notice that the above argument didn't use any special properties of the function $f_0(\phi,\vect{i})=\phi(\vect{i})$ other than it being in $\GapP$. Thus, the same argument tells us that the polynomials for gates higher up in our $\class{AC_0}$ circuit are in $\GapP$, since we can use $\GapP$ functions for their inputs. This can be written out formally as above, but the notation gets messy. So, we have shown that the probabilistic polynomial $P(\{\phi(\vect{i})\},\{\vect{r}^{(a,k)}\})$ is a $\GapP$ function $p(\phi,\{\vect{r}^{(a,k)}_{(a,k)}\})$ of $\phi$ and the random inputs. We would now like to conclude that $$g(\phi)= Q(\{\phi(\vect{i})\})= \sum_{\{\vect{r}^{(a,k)}\}} h^{(R)}(P(\{\phi(\vect{i})\},\{\vect{r}^{(a,k)}\})$$ is a $\GapP$ function of $\phi$, where $R=m^{O(1)}$. Unfortunately, none of our closure properties say anything about iterated composition, which is how we defined $h^{(R)}$. Fortunately, since $h(z)=4z^3+3z^4$ has such a simple form, the monomials in $h^{(R)}$ can be described explicitly, {\it i.e.} the function $C(d,1^R)$ which gives the coefficient of $z^d$ in $h^{(R)}$ can be computed in time polynomial in $R$. Thus, we have $$g(\phi) = \sum_{\{\vect{r}^{(a,k)}\}} \sum_d C(d,1^R) p(\phi,\{\vect{r}^{(a,k)}\}) \in \GapP.$$ and we are finished. \end{proof} \end{document}