By Anne Nijsten - 21 February 2022
As machine learning models become more and more popular, the question naturally arises whether these models can also be misused. Here we are not concerned with the ethical considerations of machine learning itself, but with how existing models can be exploited to perform malicious attacks.
There are a variety of attack scenarios. One can, for example, poison the training data of a model by injecting wrong data, which is dangerous for models that are frequently retrained on new data. Another approach is to use adversarial examples: adding small perturbations to inputs that are normally classified correctly, causing them to be misclassified. Crucially, these perturbations may be imperceptible to a human observer, as in Figure 1 [3]. While this example may be funny, consider the case where researchers altered stop signs with pieces of tape that look like ordinary graffiti. This caused their model to misclassify the stop signs as speed limit signs, which shows that such perturbations can lead to dangerous real-world situations, for example for autonomous cars [2].

This type of attack is not limited to image processing, but is also found in, for example, speech recognition and natural language processing (NLP). By replacing some words with near-synonyms, the sentiment classification (positive/negative) of a sentence may flip for an NLP model, while humans hardly notice a difference. Note that these word changes are more visible to the human eye than the small pixel perturbations in images [1].
Figure 1: An adversarial example from [3]: adding an imperceptible perturbation to an image causes the model to misclassify it.
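To make the idea concrete, here is a minimal sketch of the fast gradient sign method from [3] in PyTorch. The model, images, labels, and the epsilon value are placeholders chosen for illustration, not details from the original example.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y_true, epsilon=0.01):
    """Fast gradient sign method [3]: nudge each input a tiny step
    in the direction that increases the model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)
    loss.backward()
    # For small epsilon the change is barely visible, yet it can flip the predicted label.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Hypothetical usage: `classifier` is any trained image classifier,
# `images` a batch of inputs scaled to [0, 1], `labels` the correct classes.
# adversarial_images = fgsm_perturb(classifier, images, labels, epsilon=0.01)
```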
While researching these attacks, people have also proposed explanations for why the models are vulnerable. Based on these insights and more general defensive techniques, countermeasures are being developed to make models more robust. For your own work: start by thinking about what goals an attacker might have concerning your model. Could you come up with an attack strategy?
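One widely studied countermeasure (not discussed in detail here) is adversarial training: training the model on perturbed inputs alongside clean ones. A rough sketch, reusing the hypothetical fgsm_perturb function above:

```python
def adversarial_training_step(model, optimizer, x, y, epsilon=0.01):
    """One training step that mixes clean and FGSM-perturbed examples,
    a common way to make a model more robust to these attacks."""
    x_adv = fgsm_perturb(model, x, y, epsilon)  # generate adversarial versions of the batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```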
The vulnerability of these machine learning models shows that they do not truly understand the task they are given, even if they generally label test data correctly. From that conclusion, it seems the intelligence in this form of AI still has some way to go.
References:
[1] Moustafa Alzantot et al. 2018. Generating natural language adversarial examples. arXiv preprint arXiv:1804.07998.
[2] Ivan Evtimov et al. 2017. Robust physical-world attacks on machine learning models. arXiv preprint arXiv:1707.08945.
[3] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.