Today, there were only 3 talks; and we had a free afternoon! So we climbed* the big mountain!

*: Okay okay, by “climbed” I mean “took the lifts”. But there were 3 lifts!

### Opening Talk: Learning from Quantum Data: Strengths and Weaknesses

by Ronald de Wolf

This was a great and classic talk in the same vein as many talks in the theory of quantum computation.

Ronald addressed the natural problem of what you could do if you your data was given you to as a quantum state (thereby ignoring all the problems that have been brought up with QRAM in the past few days; let’s just suppose we have the state!).

The proposal is to consider supervised learning in a very formal sense; imagine we have a function:

$$f: {0,1}^n \to {0,1}$$

Then, we can think of it as a supervised learning problem where we have $$n$$ binary features, producing a single binary output, and we have our examples in the form: $$(x, f(x))$$

Ronald wanted us to consider the so-called *sample* complexity, i.e. how many times do we have to evaluate $f$, instead of the more standard $time$ complexity. In this sense here, the ideas are at least related, because fewer samples will take less time.

In general we’d need $$2^n$$ examples to learn $$f$$ fully, but we’ll want to do *efficient* learning, so we’d like to learn from far fewer queries than that.

Ronald preferred to work in the framework of PAC-learnability, which was introduced by Leslie Valiant in 1984. Leslie has a nice readable book on the topic (I’ve unfortunately lost my copy a while ago) which is well worth a read.

It turns out that in 1995, Bshouty and Jackson introduced a quantum version of this notion (see also Quantum DNF Learnability Revisited).

Their idea is to consider the state of training data:

$$ \sum_{x \in {0,1}^n } \sqrt{D(x)} |x, f(x)\rangle $$

To demonstrate a speedup, suppose that we have the uniform distribution over the samples; so then our state becomes

$$ \frac{1}{2^n} \sum_{x \in {0,1}^n } |x, f(x)\rangle $$

The main idea is to hit this with the Fourier sampling tool, that is a classic trick in quantum algorithms.

The essential idea is, firstly support that $$(x) = \pm 1$$, then we can do another standard trick, the Hadamard transformation, and obtain the state

$$ \frac{1}{2^n} \sum_{x \in {0,1}^n } \hat{f}(s)|s\rangle $$

where $$\hat{f}(s) = \frac{1}{2^n} \sum_{x} f(x) (-1)^{s \cdot x}$$.

The point here is that when you measure this final state, the state you see is $$s$$ with probability $$\hat{f}(s)^2$$. For a certain choice of $$f$$, namely when $$f$$ is linear mod 2, then you can learn the function perfectly in exactly one query. Classically you would require at least $$n$$ queries. Great reduction!

He then went into a few more examples, getting a little bit more technical each time; one was for the so-called “Coupon collector” problem, which is simply: imagine you get a random baseball card from a store, how many times do you need to visit the store before you have the entire set of cards?

Again, because this is a uniform problem, we can use similar techniques to improve on it. Classical, one can find that the expected number of store visits (samples) is approximately $$N \log N$$ (where $$N$$ is the number of cards in total), but quantumly it can be done with $$O(N)$$ samples.

Perhaps of interest, was some conclusions that they were able to show that *no* quantum speedup can be obtained when for *all* distributions; i.e. there are some bad distributions that we will just struggle with. I don’t think anyone finds this particularly surprising, and shouldn’t have a big impact on real-world problems, because typically, most data isn’t, shall we say, adversially prepared; the distributions tend to be wildly different (a natural example being the set of all images of mountains as a subset of all possible images).

The main idea here is that by utilising standard tricks from quantum algorithms we can get significant speedups when we know something about the distribution that we are learning from.

### Limitations on learning of quantum processes

by Mario Ziman

This one was a little bit technical, but it had some interesting ideas.

Mario phrased the problem of quantum machine learning as follows:

The point being that in typical ML we want to modify the function $$f$$, and in quantum ML then we wish to modify the unitary $$U_f$$. This turns out to be quite problematic, because this function is truly quantum, and we know that quantum states suffer from the no-cloning principle: you cannot copy an unknown quantum state; so you can only use it once. Mario mentioned that this results in a *No-programming* theorem, which means that we cannot perfectly store an unknown quantum transformation (i.e., *even if* we find a $$U_f$$ that works well; we can’t save it! We have to know how many times we want to use it *in advance!*).

However, it turns out we can still make some progress by, instead of requiring some kind of exact copy, we aim for probabilistic performance. The only remaining thing I got from this talk was that probabilistic learning, in their sense involving some kind of “probabilistic Q learning” is related to quantum teleportation. You can read about this work morehere.

### Discovering physical concepts with neural networks

by Renato Renner

This was a talk I was quite excited about. The paper for it is here. They’ve also made the code available, in TensorFlow.

The fundamental idea is really nice: Maybe we can build a neural network to act exactly as a standard human physicist would: they shall observe data, try and form theories, ask questions of their theories, and then check the answers.

This is very nicely expressed by the picture from the paper:

The point is that they train an autoencoder on the data, but the put constraints on the latent vector so that the entries by adding the mutual information between them on to the loss so-as to require independence; i.e. the parameters that the network learns should be independent.

They then introduce this idea of questions and answers. Unfortunately, I think the way this idea doesn’t go far enough is by, it seems to me, treating the answers as the input data itself; so that truly this network only functions as a special kind of autoencoder (at least, this is how it is in the code, and as far as I can see in the paper).

In any case, they’re able to show several problems where it is able to look at experimental data and where the latent variables are correlated to the terms that they expected to see in the equations that model these phenomena. I think that’s pretty cool.

One thing Renato noted quite strongly was that perhaps this isn’t super surprising, and maybe the network received a lot of help by the way the data was sent to it. I think that this could be mitigated, maybe, by thinking a bit more about the quesiton/answer set up, and more generally thinking about how we can allow the network to reject data samples.

Some ideas I had during this talk related to the kind of “falsifiability” ideas; namely that if this system is forced to come up with theories for data, then how can we also ensure that the theories it has can be proven wrong?

My other idea was, what if instead of just having a latent variable, i..e an independent variable, have an entire neural network in there, that could perhaps serve as an “independent explanation”.

Overall, I quite enjoyed this talk and idea.

### Open areas of investigation

The interesting things that came to me regarding today were:

- What other cool quantum algorithm tricks should we be using?
- Bomb testing?
- The bomb-testing thing is a kind of learning anyway. Probably interesting?

- If we do really want to learn truly quantum unitaries, then what’s the deal with having to know how many times we want to use it!? That’s crazy!
- How can we make a better machine for generating theories?

### That’s it!

The day ended with a conference dinner. I’ll have a little bit to say about the organisation of this conference on the final day! Hope you’re enjoying these updates!

See also: