Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak
Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS), 2018

Suppose we have many copies of an unknown \(n\)-qubit state \(\rho\).  We measure some copies of \(\rho\) using a known two-outcome measurement \(E_{1}\), then other copies using a measurement \(E_{2}\), and so on.  At each stage \(t\), we generate a current hypothesis \(\omega_{t}\) about the state \(\rho\), using the outcomes of the previous measurements. We show that it is possible to do this in a way that guarantees that \(\left\vert \operatorname{Tr}\!\left( E_{t} \omega_{t}\right) -\operatorname{Tr}\!\left( E_{t}\rho\right) \right\vert\), the error in our prediction for the next measurement, is at least \(\varepsilon\) for at most \(\operatorname{O}\!\left( n / \varepsilon^2 \right)\) values of \(t\). Even in the non-realizable setting – where there could be arbitrary noise in the measurement outcomes – we show how to output hypothesis states that incur at most \(\operatorname{O}\!\left( \sqrt{Tn}\, \right)\) excess loss over the best possible state on the first \(T\) measurements. These results generalize a 2007 theorem by Aaronson on the PAC-learnability of quantum states to the online and regret-minimization settings. We give three different ways to prove our results – using convex optimization, quantum postselection, and sequential fat-shattering dimension – which have different advantages in terms of parameters and portability.
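For intuition, the following is a minimal numerical sketch of the online setting, using a matrix-exponentiated-gradient (RFTL-style) update over density matrices, which is in the spirit of the convex-optimization proof but is not the paper's exact algorithm. The qubit count, learning rate, squared loss, random measurement model, and error threshold below are all assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

n = 2        # number of qubits (hypothetical small example)
d = 2 ** n   # Hilbert-space dimension
eta = 0.1    # learning rate (hypothetical choice)
T = 200      # number of rounds
eps = 0.2    # error threshold (hypothetical choice)

def random_density_matrix(d):
    """Sample a random d x d density matrix (PSD, trace 1)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def random_two_outcome_measurement(d):
    """Sample a random Hermitian E with eigenvalues in [0, 1], i.e. 0 <= E <= I."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w, V = np.linalg.eigh(A + A.conj().T)
    w = (w - w.min()) / (w.max() - w.min())   # rescale spectrum into [0, 1]
    return (V * w) @ V.conj().T

rho = random_density_matrix(d)               # unknown state, known only to "nature"
grad_sum = np.zeros((d, d), dtype=complex)   # running sum of loss gradients
mistakes = 0

for t in range(T):
    # Current hypothesis: exponentiate the accumulated gradients and
    # normalize, so omega is always a valid density matrix.
    omega = expm(-eta * grad_sum)
    omega = omega / np.trace(omega).real

    E = random_two_outcome_measurement(d)
    pred = np.trace(E @ omega).real          # our prediction Tr(E omega_t)
    truth = np.trace(E @ rho).real           # actual value Tr(E rho)

    if abs(pred - truth) >= eps:
        mistakes += 1

    # Gradient of the squared loss (pred - truth)^2 with respect to omega
    # is 2 * (pred - truth) * E; accumulate it for the next update.
    grad_sum = grad_sum + 2.0 * (pred - truth) * E

print(f"rounds with prediction error >= {eps}: {mistakes} of {T}")
```

In this toy run the hypothesis state is updated only through the observed values \(\operatorname{Tr}(E_{t}\rho)\), never through \(\rho\) itself, which mirrors the access model of the theorem; the mistake counter tracks how often the prediction error reaches the threshold \(\varepsilon\).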