Friday, December 2, 2016

AI, Machine Learning, Deep Learning and Cyber Security

Many of us have heard the terms AI, machine learning and deep learning, and some of us have also heard that they can have a big impact on cyber security. But what exactly are AI, machine learning and deep learning? And how can they improve cyber security? This article discusses both questions.

What is Artificial Intelligence?

Artificial Intelligence, or AI, is the science and engineering of making machines intelligent, so that they can perform tasks that normally require human intelligence. Some AI techniques also give machines the ability to learn without being explicitly programmed. For example, a machine can be given the facts about a specific situation and, based on them, decide on an action that achieves a goal. It can examine the moves played so far in a game of chess and decide on the best possible next move. Or a machine can be given general facts about the world, facts about a particular situation and a statement of a goal, and use AI to plan a strategy or sequence of actions that achieves that goal.
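To make the planning idea concrete, here is a minimal sketch in Python of one classic AI technique, breadth-first search over states. The jug puzzle, the `plan` function and the action names are illustrative assumptions, not part of any particular AI system; real planners and chess engines use far more sophisticated search.

```python
from collections import deque

def plan(start, is_goal, actions):
    """Breadth-first search for the shortest sequence of action names
    that turns start into a state satisfying is_goal (None if impossible)."""
    frontier = deque([start])
    came_from = {start: None}  # state -> (previous state, action that led here)
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            steps = []
            while came_from[state] is not None:  # walk back to the start state
                state, name = came_from[state]
                steps.append(name)
            return steps[::-1]
        for name, act in actions.items():
            nxt = act(state)
            if nxt not in came_from:
                came_from[nxt] = (state, name)
                frontier.append(nxt)
    return None

# Hypothetical toy puzzle: a 4-litre jug A and a 3-litre jug B; goal is exactly 2 litres in A.
A, B = 4, 3
ACTIONS = {
    "fill A": lambda s: (A, s[1]),
    "fill B": lambda s: (s[0], B),
    "empty A": lambda s: (0, s[1]),
    "empty B": lambda s: (s[0], 0),
    "pour A->B": lambda s: (s[0] - min(s[0], B - s[1]), s[1] + min(s[0], B - s[1])),
    "pour B->A": lambda s: (s[0] + min(s[1], A - s[0]), s[1] - min(s[1], A - s[0])),
}
print(plan((0, 0), lambda s: s[0] == 2, ACTIONS))
```

Because the search expands states in order of distance from the start, the first goal state it reaches gives a shortest plan (six actions for this puzzle).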

Artificial Intelligence is widely used in many areas, like:

  • Playing games like chess
  • Speech Recognition
  • Understanding natural language
  • Computer vision
  • Building expert systems

What is Machine Learning?

Machine learning is a sub-field of AI that gives machines the ability to learn from data and make predictions based on it. For example, a machine can learn from a set of inputs and their corresponding outputs and then predict the output for new input data. Applications of machine learning include spam filtering, optical character recognition (OCR), search engines, computer vision and cyber security.

Machine learning algorithms are commonly divided into three types:

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

Supervised Learning – In this technique, the machine is provided with a set of inputs and their corresponding outputs, and it learns general rules that map the inputs to the outputs. The algorithm typically makes predictions on the training inputs, compares them with the known outputs and adjusts itself based on the errors, repeating this until it reaches an acceptable level of performance. This is called supervised learning because the labeled training dataset supervises the learning process.
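A perceptron is one of the simplest supervised learners, and its training loop mirrors the description above: predict, compare with the label, adjust, and stop once performance is acceptable. The sketch below assumes a toy dataset (logical AND) and treats zero training errors as "acceptable"; both choices are illustrative.

```python
def predict(weights, bias, x):
    """Output 1 if the weighted sum of the inputs crosses zero, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, labels, lr=1.0, max_epochs=100):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            err = y - predict(weights, bias, x)  # feedback from the known label
            if err:
                errors += 1
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
        if errors == 0:  # "acceptable performance": no mistakes on the training set
            break
    return weights, bias

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]  # logical AND
w, b = train_perceptron(samples, labels)
print([predict(w, b, x) for x in samples])  # [0, 0, 0, 1]
```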

Unsupervised Learning – In unsupervised learning, the machine is provided with only the input data, with no labels. The goal is to learn the underlying structure or distribution of the data and use it to make inferences about similar data. For example, the algorithm can extract features from the input dataset and divide the data into groups of similar items, so that when new data arrives it can be assigned to the most similar group. A common application is on an e-commerce website, where machine learning can divide customers into segments and draw inferences about each segment for use in a marketing campaign.
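The customer segmentation example can be sketched with k-means clustering, one common unsupervised algorithm (others exist). The customer figures and the naive choice of initial centroids are hypothetical, chosen only to keep the sketch short.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, max_iter=100):
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids

# Hypothetical customers: (yearly spend in dollars, visits per year), no labels.
customers = [(200, 2), (5000, 40), (250, 3), (4800, 35), (300, 4), (5200, 45)]
segments = kmeans(customers, k=2)
print(segments)                   # [(250.0, 3.0), (5000.0, 40.0)]
print(nearest((270, 3), segments))  # 0: the new customer joins the low-spend segment
```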

In many applications, semi-supervised learning is used, where the machine learns from a training dataset that contains a small amount of labeled data and a larger amount of unlabeled data, combining supervised and unsupervised techniques.
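One simple semi-supervised technique is self-training: fit a model on the few labeled points, use it to pseudo-label the unlabeled ones, and refit. The sketch below assumes a nearest-centroid classifier and made-up 2-D points; practical self-training usually keeps only high-confidence pseudo-labels, which this sketch omits for brevity.

```python
import math

def centroids_from(labeled):
    """Mean point of each class in a list of (point, label) pairs."""
    groups = {}
    for point, label in labeled:
        groups.setdefault(label, []).append(point)
    return {label: tuple(sum(dim) / len(pts) for dim in zip(*pts))
            for label, pts in groups.items()}

def classify(point, centroids):
    """Return the label whose class centroid is nearest to point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

def self_train(labeled, unlabeled, rounds=3):
    """Pseudo-label the unlabeled points with the current model, then refit
    the model on the labeled plus pseudo-labeled data."""
    training = list(labeled)
    for _ in range(rounds):
        centroids = centroids_from(training)
        pseudo = [(p, classify(p, centroids)) for p in unlabeled]
        training = list(labeled) + pseudo
    return centroids_from(training)

# Hypothetical data: only two labeled points, several unlabeled ones.
labeled = [((1.0, 1.0), "benign"), ((8.0, 8.0), "malicious")]
unlabeled = [(0.5, 1.5), (1.5, 0.5), (2.0, 2.0), (7.0, 8.5), (8.5, 7.0), (9.0, 9.0)]
model = self_train(labeled, unlabeled)
print(classify((2.5, 1.0), model))  # benign
```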

Reinforcement Learning – In reinforcement learning, the machine interacts with a dynamic environment to achieve a certain goal, receiving rewards or penalties as feedback on its actions. A good example is playing chess: the machine learns from the outcomes of its previous moves to decide on its next move, and after the opponent responds, it decides on its next action again.
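Chess is far too large for a short example, but the same feedback loop can be sketched with tabular Q-learning, a standard reinforcement learning algorithm, on a hypothetical five-cell corridor where the agent is rewarded only for reaching the rightmost cell. The environment, reward and hyperparameters are all assumptions for illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # step left, step right

def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Epsilon-greedy: explore sometimes (and on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])  # learn from the feedback
            state = nxt
            if done:
                break
    return q

q = q_learning()
policy = [ACTIONS[max(range(2), key=lambda a: q[s][a])] for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should step right (+1) from every non-goal cell, because that is where the reward feedback has pushed the value estimates.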

What is Deep Learning?

There are several approaches to machine learning. One such approach is to use an artificial neural network, a machine learning model inspired by the structure and functional aspects of biological neural networks. The neurons in the network are connected to each other, and data propagates through these connections. In the simplest case, there are two sets of neurons: those that receive the input signals and those that send the output signals. Deep learning uses several hidden layers of neurons between the input layer and the output layer.

In deep learning, when an input is given to the input layer, it passes a transformed version of the input on to the next layer, and each following layer does the same. Each neuron assigns a weight to each of its inputs, computes the weighted sum, and applies an activation function to produce its own output; the output of the final layer is the network's result.
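The layer-by-layer computation described above can be sketched in plain Python. The network shape, weights, biases and the choice of ReLU and sigmoid activations are hypothetical; in a real network the weights are learned from data rather than written by hand.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One layer: neuron j outputs activation(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Propagate x through each (weights, biases, activation) layer in turn."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hypothetical weights: 3 inputs -> 2 hidden -> 2 hidden -> 1 output.
layers = [
    ([[0.2, 0.4, -0.5], [-0.3, 0.1, 0.2]], [0.0, 0.1], relu),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [-1.35], sigmoid),
]
output = forward([1.0, 0.5, -1.0], layers)
print(output)
```

Here the second neuron of the first hidden layer has a negative weighted sum, so ReLU outputs 0 and that neuron contributes nothing downstream.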

A simple example of deep learning is recognizing a stop sign in an image. Attributes of the image such as the octagonal shape, the red color, the letters and the size of the sign are examined by the neurons, and each neuron weights its inputs accordingly. Based on these weightings, the network outputs a probability vector indicating how likely the image is to be a stop sign.
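The final probability vector is commonly produced with a softmax function, which turns the last layer's raw scores into probabilities that sum to 1. The two classes and their scores below are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["stop sign", "not a stop sign"]
scores = [2.0, 0.5]  # hypothetical raw outputs of the last layer
probs = softmax(scores)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")  # stop sign: 0.82, then not a stop sign: 0.18
```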

So, to summarize, machine learning is a sub-field of artificial intelligence, and deep learning is in turn a sub-field of machine learning. Falling hardware prices and the development of GPUs have contributed to the growth of deep learning.

AI, Machine Learning, Deep Learning and Cyber Security

Let’s try to understand how AI, machine learning and deep learning can be used to improve cyber security.

Traditional Malware Detection Techniques

Traditional anti-malware programs detect malware in several ways. The most common are:

Signature Based Detection – In this technique, an unidentified piece of code is compared with a database of signatures of known malware. If a match is found, the code is identified as malware. The problem with this approach is that it cannot detect new malware whose signatures have not yet been added to the database. Moreover, it sometimes takes months for signatures of newly found malware to be released. As a result, this technique is very ineffective at detecting new malware, especially zero-day threats and advanced persistent threats (APTs).
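A greatly simplified version of signature matching compares a file's cryptographic hash with a set of known-bad hashes. Real anti-malware signatures are usually richer than whole-file hashes (byte patterns, rules and so on), so treat this as an illustration of the limitation only: changing a single byte is enough to avoid a match. The sample bytes and the `KNOWN_MALWARE_HASHES` set are hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical signature database: hashes of previously analyzed samples.
KNOWN_MALWARE_HASHES = {sha256_hex(b"pretend this is a known malware sample")}

def is_known_malware(data: bytes) -> bool:
    """Flag the data only if its exact hash is already in the database."""
    return sha256_hex(data) in KNOWN_MALWARE_HASHES

print(is_known_malware(b"pretend this is a known malware sample"))   # True
print(is_known_malware(b"pretend this is a known malware sample!"))  # False: one extra byte
```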

Heuristic Techniques – In this technique, the unidentified piece of code is run and its behavioral characteristics are observed. Because malware behavior is typically observed at runtime, only after the code starts executing, prevention is delayed, which can make this technique ineffective at times.
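Heuristic engines often score observed behaviors against rules. In the sketch below, the behavior names, weights and threshold are all hypothetical; note that the score can only be computed after the behaviors have been observed, which is the delay described above.

```python
# Hypothetical runtime behaviors and weights; real products use their own rules.
SUSPICIOUS_WEIGHTS = {
    "writes_to_system_directory": 3,
    "adds_autorun_registry_key": 4,
    "encrypts_many_user_files": 5,
    "opens_network_connection": 1,
}
THRESHOLD = 6  # hypothetical alert threshold

def heuristic_score(observed_behaviors):
    """Sum the weights of the distinct suspicious behaviors seen at runtime."""
    return sum(SUSPICIOUS_WEIGHTS.get(b, 0) for b in set(observed_behaviors))

def is_suspicious(observed_behaviors):
    return heuristic_score(observed_behaviors) >= THRESHOLD

print(is_suspicious(["opens_network_connection"]))  # False
print(is_suspicious(["adds_autorun_registry_key", "encrypts_many_user_files"]))  # True
```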

Sandbox – In sandbox solutions, the unidentified code is executed in a virtual environment and its behavior is observed to determine whether it is malware. This process is time consuming and ineffective for real-time protection. Moreover, some malware stalls or suppresses its malicious behavior once it detects a virtual environment, which can make detection challenging.

Malware Detection using AI, Machine Learning and Deep Learning

Machine learning can make malware detection more effective. In this technique, a file's characteristics and behavior are analyzed to determine whether it contains malware. This is done by training a machine learning algorithm on a set of manually selected features that help distinguish malicious files from legitimate ones.
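Below is a minimal sketch of the "manually selected features" approach, assuming two hand-picked features (byte entropy and the fraction of printable characters) and a logistic regression classifier trained by gradient descent. The features, the model and the synthetic training data are illustrative assumptions; real detectors use many more features and real labeled samples.

```python
import math
import random
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0 to 8); packed or encrypted code tends to be high."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII characters."""
    if not data:
        return 0.0
    return sum(32 <= b < 127 for b in data) / len(data)

def features(data: bytes) -> list:
    # Hand-picked features, scaled to roughly 0..1.
    return [byte_entropy(data) / 8.0, printable_ratio(data)]

def score(w, b, x):
    """Logistic regression: estimated probability that the file is malicious."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def train_logistic(xs, ys, lr=0.5, epochs=500):
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = score(w, b, x) - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Synthetic toy data (not real samples): random bytes stand in for packed
# malware (label 1), repetitive text stands in for legitimate files (label 0).
rng = random.Random(1)
malicious = [bytes(rng.randrange(256) for _ in range(512)) for _ in range(5)]
legit = [(f"config line {i}: value = {i * 7}\n" * 20).encode() for i in range(5)]
xs = [features(d) for d in malicious + legit]
ys = [1] * len(malicious) + [0] * len(legit)
w, b = train_logistic(xs, ys)

unknown = bytes(random.Random(2).randrange(256) for _ in range(512))
print(round(score(w, b, features(unknown)), 2))
```

The weakness described in the next paragraph is visible here: someone had to decide that entropy and printable characters matter.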

This is a better approach, but it has its own disadvantage: it requires human intervention to teach the machine the parameters, variables or features on which malware detection is based. To address that, a more advanced technique uses deep learning to detect malware.

In this technique, a huge dataset of malicious and legitimate files is fed into the machine, which uses deep learning to learn by itself the features needed for malware detection. Once training is complete, the model can detect malicious files of many different types. Threats can also be detected in real time, and potential threats can be blocked. This technique can be quite effective even against zero-day threats and APTs.

AI, machine learning and deep learning technologies are evolving rapidly, and if used properly, they can improve cyber security to a great extent.
