We do Deep-Learning.
Jul 22, 2018 • Noon van der Silk
Hello! I thought it might be nice to occasionally review a few interesting papers that have come across my desk, that were interesting in one way or the other.
Let’s take a look at an odd bunch that I was interested in.
The background on this one is that no-one actually knows if quantum computers are fundamentally better than classical computers at any specific task. All the big results so far indicate that this could be the case; because we’ve not been able to find classical algorithms that are as good as the known quantum ones, for, say, factoring. But in the world of complexity theory, evidence isn’t enough; we want proof!
In order to get a feel for this result, we need to understand the basic idea of the “PH”: the Polynomial Hierarchy, and “BQP”: Bounded-Error Quantum Polynomial time.
Loosely, we can think of BQP as the set of problems that a quantum computer could solve with high probability.
Unfortunately, the Polynomial Hierarchy is harder to easily visualise. At the very least, loosely, it contains:
- all the problems that a classical computer can solve;
- all the problems that an NP-computer could solve: those problems for which one can easily verify the answer,
- all of the problems that a co-NP computer could solve: those problems for which one can easily prove that there is no answer,
- …, and so on, in that fashion
Lrrr (Futurama): We will begin with the firemen, then the math teachers, and so on in that fashion until everyone is eaten.
The main understanding to take away from this is that the computing power of the Polynomial Hierarchy is huge. Certainly we shouldn’t expect that a quantum computer could compute any arbitrary problem (efficiently) inside of it. The question we’d like to know, however, is, even if we had the entire power of PH, would we be able to efficiently compute any function that a quantum computer could? I.e. is PH at least as powerful as BQP?
This paper resolves this question, with respect to an Oracle, with the (somewhat-surprisng!) answer of “no”!
The “respect to an Oracle” part is interesting. Programmatically, one can think of an Oracle as a black-box function, that you can evaluate at any time for no cost; that is, it gives you the answer immediately.
While we were discussing this paper at the office, Ruth and I came up with the idea, here, of thinking of an oracle as a particular type of tool, that both classes, PH and BQP, have access to, but only BQP knows how to use really well. This is what allows the authors to show that, given this tool, a quantum computer is more powerful than, if you will, a PH-computer. Less generally and abstractly, it at least shows that a quantum computer with this particular tool, is more powerful than a classical computer with the same tool.
In many ways, this is a very abstract result, because it doesn’t yield any new algorithms for programming, nor is it proof that quantum computers are more powerful than classical computers, in an unrestricted way. The reason for this is that, in reality, these “Oracles” don’t exist, so the result is only, if you will, theoretical.
Never the less, this is, in fact, quite a big result in the world of quantum computing. To my mind, it proves some intuition that we’ve had about quantum computing for a while: it’s fundamentally different than classical computing.
This paper is particularly nice because it’s summarised nicely by a single picture:
Basically, the key idea in this paper is that predictions should be made for the “right” reasons; and that this notation can be enforced mathematically.
I like this paper because of its nice ideas, and because of the demonstration of a technique for handling bias that is based on a neat idea: when making gendered judgements, don’t use context, use direct evidence. I.e. don’t guess it’s a man because of the computer; guess it’s a man because you are looking at the man. This is very neat, because it’s an approach to managing bias in data sets that isn’t about adding more data to some dataset, to debias it; it’s about the invention of simple techniques (new loss terms, in the case of this paper) that make predictions in existing datasets better.
This is an awesome paper that I think has flown under the radar a little bit. Even better, Alexia, the author, even provided an implementation on GitHub.
They key idea of this paper is that there is a missing observation in the standard way GANs are formulated. In the paper the author explains and justifies the missing ingredient in a few ways. The main observation is that in a typical GAN, you have the two networks, G, and D. D is supposed to be able to tell the difference between fake and real data, while G is supposed to be able to generate fake data that D will think is real.
From this paper I learned that there are two ways of thinking of the loss functions in GANs, one is as an alternating maximisation/minimisation of the same loss function, to step-by-step improve the two networks, which is referred to as “saturating”. The other interpretation, “non-saturating”, is when we think about optimising the same loss function, but swapping fake data with real data. In this setting, we can do this by having a minibatch contain 50/50 real and fake data.
The observation of this paper is that, in this setting, what’s missing is the a priori knowledge that we construct the minibatch in this manner. The conclusion is that, because of this, the discriminator on real data, should decrease while the discriminator on fake data, should increase. In the readme of the code, Alexia explains how you can easily add these terms into your own GAN models.
Naturally, the proof is in the results, so check out the code for more!
This one is reasonably interesting. As we opened with, we’re still searching for an unconditional proof that quantum computers are exponentially more efficient than classical computers.
A while back, the now-famous HHL algorithm (see Quantum linear systems algorithms: a primer for an intro) was introduced. People were excited, but there were many caveats that restricted its use in “practical” settings (if having access to a quantum computer can be called practical …).
More recently, the HHL algorithm was used in a setting where all of these caveats were addressed; that of “Quantum recommendation systems”. Therein they achieved a “poly-log” algorithm in the number of products and users, whereas (until recently!) the best known classical algorithms run in time linear in the number of products and users.
Recommendations are basically the most common application of machine learning, and arguably the most successful; so an algorithm to make them radically faster is a big deal.
What’s fascinating is that Ewin has managed to obtain a fast classical algorithm from the quantum one! On the one hand, this indicates that this exact approach isn’t a way we can separate quantum from classical computing, but as mentioned on Scott Aaronson’s blog the upside is that we have classical computers now that we can program with Ewin’s algorithm!
There are some caveats, namely that the exponents on the terms are large, but judging from the comments on the Scott’s blog, we’ll surely be seeing some big improvements there in the coming months.
Thanks for reading! Let me know if you’ve been reading any cool papers, or if you’ve got any thoughts on the above! Always interested in collaborating with people.