WPS model optimization, a case study

Report by Bhavesh Sharma.

WPS and Silverpond's HighLighter

Wildlife Protection Solutions (WPS) is a non-profit organisation that provides solutions for wildlife conservation through its app, wpsWatch. It uses motion-activated cameras placed around protected areas to capture images of people and animals. Each image is reviewed for suspicious activity, such as dogs, trucks, or people in the protected areas at night. Because this manual review is tedious, it is a good candidate for machine learning. Silverpond was engaged to help solve this problem, which it did using HighLighter, its cloud-based machine learning platform. HighLighter analyses the images for WPS and raises an alert if it spots anomalies. The alert is sent to a WPS team member, who checks it and, if they deem it dangerous, notifies the rangers on site. This has given WPS a more streamlined process for identifying threats, with increased accuracy. You can read more about Silverpond’s work with WPS here.

Measuring performance

The Silverpond team was tasked with optimising the current model used by WPS. The WPS model is a Faster-RCNN (region-based convolutional neural network) model. An argument could be made for using more computationally efficient models, such as YOLO or SSD; however, given the rate at which images are passed to HighLighter, Faster-RCNN gives the best trade-off between speed and accuracy. Slower but more accurate models would consume additional computational resources and so increase the cost of running the system, whereas faster models would come at a cost in accuracy.

In 2018, Microsoft released the MegaDetector, an open-source model for use in conservation with some similar capabilities to the WPS model. Therefore, a key question in deciding how best to optimise the WPS model was: should we replace the WPS model with the MegaDetector? To answer this, we designed and carried out an evaluation to compare the performance of the existing WPS model to MegaDetector v3, the most recent version available at the time. The evaluation was designed to compare the models’ accuracy in detecting the presence of people in the park, since the MegaDetector did not have a class to detect vehicles and handles the detection of animals differently to the WPS model (detecting them as a single class ‘animal’, as opposed to the WPS model which distinguishes between different species).

The MegaDetector is another Faster-RCNN model, created to help conservation biologists by identifying the presence of people and animals in wildlife parks. In Microsoft’s own words: “The current model is based on Faster-RCNN with an InceptionResNetv2 base network, and was trained with the TensorFlow Object Detection API”. The data MegaDetector was trained on remains confidential, but the team that designed it has confirmed using a couple of open datasets.

With both models deployed, we fed them the same dataset: more than 50,000 random images captured by WPS cameras in production over the last year, including 6,942 images with people in them. This allowed us to compare the models head to head, and the results are as follows:

The MegaDetector model started out strong, identifying people with a recall of 62.9% and a precision of 99% on a sample of 6,942 images containing people, at a confidence threshold of 80%. It also exhibited a very low proportion of false positives (0.9%), with an F1 score of 76.9%. The low false positive count, however, can be attributed to the fact that the model tags fewer images: of the 6,942 people images, MegaDetector tagged 4,415. In layman’s terms, the model prefers to miss a person rather than raise a false alarm. This is not appropriate for WPS’s use case, where the cost of a false negative is potentially high (for example, if poachers enter the park).
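The F1 score quoted above is the harmonic mean of precision and recall. As a quick check, a minimal sketch (the function name is ours, chosen for illustration) reproduces it from the reported precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported for MegaDetector at the 80% confidence threshold.
megadetector_f1 = f1_score(precision=0.99, recall=0.629)
print(f"MegaDetector F1: {megadetector_f1:.1%}")  # prints "MegaDetector F1: 76.9%"
```

Because the harmonic mean is pulled towards the lower of the two values, a model with near-perfect precision can still score modestly on F1 when its recall is weak.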

The existing WPS model also shows a high identification rate (recall of 75.9% and precision of 94.3%) on the same sample of 6,942 people images, using the same confidence threshold of 80%. It too had a low proportion of false positives, at 5.71%, and it tagged more of the people images than the MegaDetector: 5,545. A further analysis would compare the models across a range of confidence thresholds, or at a fixed level of precision that results in an acceptable rate of false positives for WPS.
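The further analysis suggested here was not part of this evaluation, but it could be sketched roughly as follows. The code assumes each image has been reduced to a pair of (model confidence for "person", whether the image truly contains a person). The function names, thresholds, and toy data are all hypothetical and are not WPS results.

```python
from typing import List, Optional, Sequence, Tuple

# (model confidence for "person", image truly contains a person)
Scored = Tuple[float, bool]


def precision_recall_at(scored: List[Scored], threshold: float) -> Tuple[float, float]:
    """Precision and recall when every image scoring >= threshold is tagged."""
    tp = sum(1 for score, truth in scored if score >= threshold and truth)
    fp = sum(1 for score, truth in scored if score >= threshold and not truth)
    fn = sum(1 for score, truth in scored if score < threshold and truth)
    precision = tp / (tp + fp) if tp + fp else 1.0  # nothing tagged: no false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_recall_at_precision(
    scored: List[Scored], min_precision: float, thresholds: Sequence[float]
) -> Optional[Tuple[float, float]]:
    """Return (threshold, recall) with the highest recall whose precision meets the floor."""
    best = None
    for threshold in thresholds:
        precision, recall = precision_recall_at(scored, threshold)
        if precision >= min_precision and (best is None or recall > best[1]):
            best = (threshold, recall)
    return best


# Toy data for illustration only.
toy = [(0.95, True), (0.90, True), (0.85, False), (0.70, True),
       (0.60, True), (0.40, False), (0.30, True), (0.10, False)]
print(best_recall_at_precision(toy, min_precision=0.75, thresholds=[0.2, 0.5, 0.8]))
```

Running the same sweep for both models at a precision floor WPS finds acceptable would put them on an equal footing, rather than comparing them at a single shared threshold.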

Figure: Distribution of WCS camera trap images

Figure: Highlighting results for images containing persons

What we found

These results, though not decisive, motivated us to probe in a different direction. We changed our approach and compared the models using only images captured at night, since poachers usually roam the grounds under cover of darkness.

On this comparison, the results favoured the WPS model. It tagged 96% of the nighttime images containing people, whereas MegaDetector tagged 55%. The WPS model produced more false positives, but WPS finds these preferable to false negatives, which can prove fatal. However, our work was not done. These findings merely showed that the current WPS model is better suited to the use case than MegaDetector. But how could we improve it? To find the answer, we delved into the findings once again.
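A night-only comparison of this kind can be sketched as below. The definition of "night" (19:00 to 06:00), the record fields, and the toy records are assumptions made for illustration. The source does not state how nighttime images were selected.

```python
from datetime import datetime
from typing import Dict, List, Optional


def is_night(captured_at: datetime, dusk_hour: int = 19, dawn_hour: int = 6) -> bool:
    """Assumed definition of night: from dusk_hour up to (not including) dawn_hour."""
    return captured_at.hour >= dusk_hour or captured_at.hour < dawn_hour


def night_recall(records: List[Dict]) -> Optional[float]:
    """Share of nighttime images containing a person that the model tagged as 'person'."""
    night_people = [r for r in records
                    if r["has_person"] and is_night(r["captured_at"])]
    if not night_people:
        return None  # no nighttime people images to evaluate
    return sum(1 for r in night_people if r["tagged_person"]) / len(night_people)


# Hypothetical records: capture time, ground truth, and one model's output.
records = [
    {"captured_at": datetime(2020, 5, 1, 23, 15), "has_person": True, "tagged_person": True},
    {"captured_at": datetime(2020, 5, 2, 2, 40), "has_person": True, "tagged_person": False},
    {"captured_at": datetime(2020, 5, 2, 13, 5), "has_person": True, "tagged_person": True},
    {"captured_at": datetime(2020, 5, 2, 3, 0), "has_person": False, "tagged_person": False},
]
print(night_recall(records))  # prints 0.5
```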

The most peculiar finding was that half of the false positives (50%) for the WPS model came from a single camera. The camera in question is pointed at a guard post, through which people pass in and out of the protected area. Because the model was trained on data from this camera that frequently showed park rangers at the same locations, it associated those locations with the presence of a person and tends to return false positives there. This makes a strong case for retraining the model with more images from that camera, in particular images taken when no guards are present, to reduce the number of false positives. With proper retraining, the model can iron out those incidents and further reduce the false positive rate. This will improve the efficiency of the system, since each false positive creates overhead for WPS and the teams that work to keep the protected areas safe.
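A per-camera breakdown like the one that surfaced this finding can be sketched in a few lines. The record fields and camera names below are hypothetical, and the toy data is constructed only to show the shape of the output.

```python
from collections import Counter
from typing import Dict, List


def false_positive_share_by_camera(detections: List[Dict]) -> Dict[str, float]:
    """Fraction of all false positives contributed by each camera, largest first."""
    fp_counts = Counter(
        d["camera_id"] for d in detections
        if d["tagged_person"] and not d["has_person"]  # tagged, but nobody there
    )
    total = sum(fp_counts.values())
    if total == 0:
        return {}
    return {camera: count / total for camera, count in fp_counts.most_common()}


# Hypothetical detections: only the first four are false positives.
detections = [
    {"camera_id": "gate", "tagged_person": True, "has_person": False},
    {"camera_id": "gate", "tagged_person": True, "has_person": False},
    {"camera_id": "river", "tagged_person": True, "has_person": False},
    {"camera_id": "ridge", "tagged_person": True, "has_person": False},
    {"camera_id": "gate", "tagged_person": True, "has_person": True},
    {"camera_id": "river", "tagged_person": False, "has_person": False},
]
shares = false_positive_share_by_camera(detections)
print(shares)
```

A camera whose share is far above the others is a natural first candidate for targeted retraining data.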

On review with the client, we found that WPS was already aware of this issue and had recently moved the camera to another area of the park. Retraining would remove the need for this workaround and allow them to reinstall the camera at the park entrance if desired.

Another observation was that clustering sightings of people over a short period of time into a single incident can yield higher confidence for the whole incident and reduce the number of alerts sent to the WPS team. This could in theory reduce their workload and let them focus on whole incidents rather than individual sightings.
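One way such clustering might work is sketched below. Two assumptions are ours and not from the evaluation. First, sightings from the same camera are merged when they fall within five minutes of the previous one. Second, incident confidence is combined with a noisy-OR rule, which treats sightings as independent (an optimistic simplification). All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Sighting:
    camera_id: str
    seen_at: datetime
    confidence: float  # model confidence that a person is present


@dataclass
class Incident:
    camera_id: str
    sightings: List[Sighting] = field(default_factory=list)

    @property
    def confidence(self) -> float:
        """Noisy-OR: probability that at least one sighting is real (assumes independence)."""
        all_wrong = 1.0
        for s in self.sightings:
            all_wrong *= 1.0 - s.confidence
        return 1.0 - all_wrong


def cluster_sightings(sightings: List[Sighting],
                      max_gap: timedelta = timedelta(minutes=5)) -> List[Incident]:
    """Group each camera's sightings into incidents separated by gaps longer than max_gap."""
    incidents: List[Incident] = []
    latest_by_camera: Dict[str, Incident] = {}
    for s in sorted(sightings, key=lambda s: s.seen_at):
        current = latest_by_camera.get(s.camera_id)
        if current is not None and s.seen_at - current.sightings[-1].seen_at <= max_gap:
            current.sightings.append(s)
        else:
            current = Incident(camera_id=s.camera_id, sightings=[s])
            incidents.append(current)
            latest_by_camera[s.camera_id] = current
    return incidents


# Hypothetical sightings: three in quick succession, then one an hour later.
sightings = [
    Sighting("cam-a", datetime(2020, 5, 2, 2, 0), 0.6),
    Sighting("cam-a", datetime(2020, 5, 2, 2, 2), 0.7),
    Sighting("cam-a", datetime(2020, 5, 2, 2, 4), 0.5),
    Sighting("cam-a", datetime(2020, 5, 2, 3, 0), 0.9),
]
incidents = cluster_sightings(sightings)
print(len(incidents), round(incidents[0].confidence, 2))
```

In this toy case four sightings become two alerts, and the first incident's combined confidence is higher than that of any single sighting in it.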

Is the MegaDetector a bad model? No: it fits the purpose it was intended for and can clearly identify animals and people. But it requires clear images, as it was never meant to be deployed at night looking for people. Perhaps after retraining it could yield better results at night, but for our purposes the WPS model fits the bill, and it can be made more accurate with certain tweaks that will provide more support for the guards working to arrest poachers. Furthermore, for applications that require the classification of animal species (e.g. population counting for conservation research or detecting dogs used by poachers), the WPS model is currently more suitable, as it distinguishes between the relevant species out of the box. MegaDetector would require custom retraining to accomplish this, which was not investigated as part of this evaluation.

Our recommendations

For the choice of model, either continue with the existing WPS model, or run the existing model and the MegaDetector in tandem, since this has other benefits (for example, the detections that both models agree on have a very low false positive rate).

To improve the workflow, consider retraining the WPS model on images from the park-entrance camera site taken at times when no guards were present, and clustering groups of sightings into single incidents.
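The tandem option in the first recommendation could be sketched as follows. The routing policy is our assumption, not something the evaluation tested: images tagged by both models are treated as high-confidence alerts, while images tagged by only one model are still queued for review, since a missed person is costly for WPS. Image IDs and names are hypothetical.

```python
from typing import Dict, Set


def combine_detections(wps_tagged: Set[str], mega_tagged: Set[str]) -> Dict[str, Set[str]]:
    """Split tagged image IDs by whether both models or only one flagged a person."""
    agreed = wps_tagged & mega_tagged            # both models tagged a person
    single = (wps_tagged | mega_tagged) - agreed  # exactly one model tagged a person
    return {"high_confidence": agreed, "needs_review": single}


# Hypothetical image IDs tagged as "person" by each model.
wps_tagged = {"img1", "img2", "img3", "img4"}
mega_tagged = {"img2", "img3", "img5"}
routed = combine_detections(wps_tagged, mega_tagged)
print(sorted(routed["high_confidence"]), sorted(routed["needs_review"]))
# prints ['img2', 'img3'] ['img1', 'img4', 'img5']
```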
