Post by Kang-Hsu Brian Wang
This is a summary of research I did while interning at Silverpond. I experimented with improving data efficiency by reducing the amount of labelled data needed to train a model. The approach I tried was a convolutional autoencoder combined with a support vector machine. This is a semi-supervised learning algorithm: the autoencoder is trained as a feature extractor on unlabelled data, and the extracted features, together with their labels, are then used to train the support vector machine classifier.
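The two-stage idea can be sketched with scikit-learn, using PCA as a stand-in for the convolutional autoencoder (both compress raw pixels into a small feature vector). The dataset, labelled fraction, and feature count below are illustrative choices, not the settings from the actual experiments:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Small stand-in dataset: 8x8 digit images flattened to 64 pixels.
X, y = load_digits(return_X_y=True)

# Pretend only 10% of the examples come with labels.
rng = np.random.default_rng(0)
labelled = rng.choice(len(X), size=len(X) // 10, replace=False)

# Stage 1: fit the feature extractor on ALL images -- labels are never used.
extractor = PCA(n_components=10).fit(X)

# Stage 2: train the classifier on the small labelled subset only.
clf = SVC().fit(extractor.transform(X[labelled]), y[labelled])

accuracy = clf.score(extractor.transform(X), y)
```

Here PCA is only a placeholder for the learned encoder; in the experiments described below, the CAE's encoder output plays the same role.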
Why do we need semi-supervised learning for deep learning models?
Supervised deep learning algorithms require heaps of labelled data to train on. However, in some fields or for some types of data, labelling is prohibitively expensive. Semi-supervised learning needs only a small amount of labelled data and is mostly trained on unlabelled data.
Why do we use autoencoder with SVM?
We use an autoencoder because of its simplicity: it is an unsupervised learning algorithm that acts as a feature extractor. Furthermore, SVMs are known to work well with small amounts of data and large numbers of features. This may not be the perfect method, but it is a good strategy to start from.
The applied semi-supervised method can be split into two parts: the Convolutional Autoencoder (CAE) and the Support Vector Machine (SVM).
1. CAE training
The extracted features and reconstruction performance are the two important aspects of the CAE.
However, the goal of training an autoencoder is to extract features useful for reconstructing the data, while the goal of our semi-supervised model is to classify the data accurately. Since the training goals differ, we shouldn't rely too heavily on the autoencoder's loss score.
CAE-SVM on MNIST:
The first problem I faced when constructing the CAE was that I did not set a bottleneck to force the model to learn a compressed representation. The model simply memorised the image instead of learning from it.
After adding a bottleneck to the autoencoder's architecture, it reached a loss of 0.023 and helped the SVM achieve 97% accuracy on MNIST.
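A minimal sketch of such a bottlenecked CAE, assuming PyTorch; the layer sizes and the 10-dimensional bottleneck are illustrative, not the exact architecture from the experiments:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Convolutional autoencoder with a narrow bottleneck (illustrative sizes)."""

    def __init__(self, bottleneck_dim=10):
        super().__init__()
        # Encoder: compress a 1x28x28 image down to `bottleneck_dim` features.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),   # -> 8x14x14
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # -> 16x7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * 7 * 7, bottleneck_dim),  # the bottleneck
        )
        # Decoder: reconstruct the image from the bottleneck features.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 16 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (16, 7, 7)),
            nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2),  # -> 8x14x14
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, kernel_size=2, stride=2),   # -> 1x28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder(bottleneck_dim=10)
x = torch.randn(4, 1, 28, 28)
features = model.encoder(x)   # (4, 10): this is what the SVM trains on
reconstruction = model(x)     # (4, 1, 28, 28): used for the reconstruction loss
```

For CIFAR10 the input would have three channels and, correspondingly, a wider bottleneck (e.g. 30 features).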
CAE-SVM on CIFAR10:
The second problem was that, even after tuning the learning rate and other hyper-parameters, the model only reached 31% train / 29% test accuracy on CIFAR10.
Since CIFAR10 is much more complicated than MNIST, I tried increasing the number of extracted features from 10 to 30, because CIFAR10 has three times as many input channels as MNIST. As a result, accuracy increased from 31% train / 29% test to 42% train / 39% test.
2. SVM training
Apart from the CAE-SVM, I also tested the performance of an SVM on its own to get a baseline to compare against.
The classifier I used was an SVM wrapped in OneVsRestClassifier for multiclass classification, with all the raw pixels as features. This pixel-feature SVM reached 85% train and 56% test accuracy.
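A sketch of this baseline, assuming scikit-learn and using its small 8x8 digits set in place of the full datasets:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Raw pixels as features: each 8x8 digit image is a 64-dimensional vector.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One binary SVM per class, combined one-vs-rest for multiclass prediction.
clf = OneVsRestClassifier(SVC(kernel="rbf"))
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
```

Note that scikit-learn's `SVC` already handles multiclass problems on its own (via one-vs-one); wrapping it in `OneVsRestClassifier` mainly makes the one-vs-rest strategy explicit.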
Results and conclusions
Increasing the number of features given to the SVM did improve the classifier's performance. However, this method has limitations: if the number of features grows too large, the bottleneck effectively disappears and we return to the first problem of the model not learning anything. This is the trade-off between feature size and the bottleneck. Judging by the rate of improvement, the model did not look like it would approach 100% accuracy before hitting this limit.
From the second comparison, we can see that the CNN has the best performance and the CAE-SVM model the worst. Despite sharing a similar network structure, the CNN is trained end to end, which may explain its better performance. Even the baseline, the pure SVM model, has higher accuracy than the CAE-SVM model, which suggests the CAE part is not providing useful features. Moreover, the pure SVM model performs similarly to the CNN on small datasets.
In conclusion, the SVM seems to be a good classifier, but the feature extractor needs to be improved, either by changing the CAE architecture or by trying other feature extractors.