Machine learning for anomaly detection: a technical overview
February 9, 2023
ML-enabled anomaly detection applications differ from traditional software in terms of detection technique:
- Rule-based anomaly detection
Traditional anomaly detection solutions typically trigger an alert when one or more predefined conditions are violated.
Example: A credit card payment exceeding a certain threshold.
- ML-based anomaly detection
ML algorithms are trained to autonomously discover recurring patterns or clusters among key variables and data points by processing large datasets. Once an ML system runs into data that doesn't fit any existing pattern, it may have identified an anomaly.
Example: An unusual credit card payment deviating from its holder's typical purchasing patterns.
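The contrast between the two techniques can be sketched in a few lines of Python. This is only an illustration, not a production detector: the fixed limit `AMOUNT_LIMIT`, the sample purchase history, and the z-score check (a simple statistical stand-in for a trained model) are all assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical fixed limit for the rule-based check (an assumption for illustration).
AMOUNT_LIMIT = 1000.0

def rule_based_flag(amount, limit=AMOUNT_LIMIT):
    """Rule-based check: flag any payment above a predefined threshold."""
    return amount > limit

def pattern_based_flag(amount, history, z_threshold=3.0):
    """Flag a payment that deviates strongly from the holder's own history.

    Uses a z-score as a simple stand-in for a learned model of typical behavior.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_threshold

# Hypothetical holder who usually spends about 20 per purchase.
history = [18.0, 22.5, 19.9, 21.0, 20.3, 17.8, 23.1]
payment = 400.0

print(rule_based_flag(payment))              # below the fixed limit, so the rule stays silent
print(pattern_based_flag(payment, history))  # far outside this holder's usual range
```

The same payment passes the fixed rule but stands out once it is compared with the holder's own purchasing pattern, which is the behavior described above.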
According to Technavio's 2022 Anomaly Detection Market Forecast and Analysis, machine learning consulting represents one of the main tech and business trends in the anomaly detection market, which is estimated to grow by $4.23bn from 2021 to 2026 at a maximum CAGR of 15.08%.
Anomaly detection market stats
Data source: Technavio — Anomaly detection market by deployment and geography. Forecast and analysis 2022-2026
Infographic: market size growth (2021–2026), estimated year-over-year growth rate for 2022, share of growth originating from North America, and CAGR with accelerating momentum.
- Key market trends
- AI & machine learning
- Internet of Things
- Data analytics
Types of anomalies
Anomalies can be classified into three main archetypes according to their relationship to the majority of the data under consideration.
Point anomalies
An individual data point assumes an abnormal value compared to the common value range in the dataset.
Example: A suspiciously high-value card payment or bank deposit considering the account holder’s previous transactions.
Contextual anomalies
This type of anomaly is context-specific as it entails a data point that is anomalous compared to most data points in the same scenario (typically from a temporal perspective).
Example: A spike in network traffic overnight or a skyrocketing sales growth outside the holiday season.
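A contextual check can be sketched by keeping a separate baseline per context. In the example below the context is the hour of day; the traffic figures, the function names, and the z-score threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

def fit_context_baselines(records):
    """Compute the mean and standard deviation of the value per context (here, hour of day)."""
    by_context = defaultdict(list)
    for hour, value in records:
        by_context[hour].append(value)
    return {h: (mean(v), stdev(v)) for h, v in by_context.items() if len(v) >= 2}

def is_contextual_anomaly(hour, value, baselines, z_threshold=3.0):
    """Flag a value that is unusual for its own context, even if it is normal elsewhere."""
    mu, sigma = baselines[hour]  # raises KeyError for a context never seen before
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold

# Hypothetical traffic history as (hour, requests per minute) pairs.
history = [(3, 10), (3, 12), (3, 9), (3, 11),
           (14, 480), (14, 510), (14, 495), (14, 505)]
baselines = fit_context_baselines(history)

print(is_contextual_anomaly(14, 500, baselines))  # typical afternoon load
print(is_contextual_anomaly(3, 500, baselines))   # same value, unusual at 3 a.m.
```

The value 500 is ordinary in the afternoon and anomalous overnight, so the verdict depends on the context rather than on the value alone.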
Collective anomalies
These are subsets of data points that may not look anomalous individually but raise suspicion when they occur together.
Example: Multiple login attempts from the same account or a sequence of unusually expensive purchases.
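Collective anomalies can be approximated with a sliding window: no single event is suspicious, but too many of them in a short period are. The sketch below assumes login attempts are available as timestamps in seconds; the window length and the attempt limit are arbitrary illustrative values.

```python
from collections import deque

def find_login_bursts(timestamps, window_seconds=60, max_attempts=5):
    """Return the timestamps at which attempts inside the trailing window exceed the limit.

    Each attempt alone is unremarkable; the group inside one window is the anomaly.
    """
    window = deque()
    alerts = []
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop attempts that have fallen out of the trailing window.
        while window[0] <= ts - window_seconds:
            window.popleft()
        if len(window) > max_attempts:
            alerts.append(ts)
    return alerts

# Hypothetical login attempt times (in seconds) for one account.
spread_out = [0, 300, 900, 1500, 2400, 3600]
burst = [0, 5, 9, 14, 20, 26, 31, 2000]

print(find_login_bursts(spread_out))  # no window ever holds more than one attempt
print(find_login_bursts(burst))       # the sixth and seventh attempts trigger alerts
```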
11 ML-based anomaly detection use cases by industry
Let's take a look at some of the real-world data science and machine learning use cases for anomaly detection in key industry scenarios.
Finance and banking
1 Stock market manipulation
Machine learning-based anomaly detection systems, combined with financial predictive analytics tools, are commonly deployed by major financial players, such as Nasdaq. These applications of machine learning in the stock market can detect brokers' anomalous trading patterns to prevent fraud (including churning, spoofing, and wash trading) and ensure compliance with strict market regulations.
2 Money laundering
Machine learning-powered anomaly detection solutions can identify and report unusual transactions carried out by suspicious organizations, such as a small group of newly created companies located in tax havens and exchanging large sums of money despite their limited number of customers.
3 Tax fraud
Machine learning-based systems can examine the companies’ general ledgers and recognize signs of tax fraud. Among suspicious anomalies, we can include inconsistent itemized deductions, multiple tax refunds filed from the same IP address, and significant changes in corporate sales.
IT sector
4 Cyber attack
Hackers may try to breach corporate systems or networks to steal assets and data. A machine learning-powered intrusion detection system (IDS) using network behavior anomaly detection (NBAD) can hinder such attempts by tracing any atypical event, such as coordinated access via multiple accounts provoking a spike in traffic volume and bandwidth, and flagging it as a potential cyberattack.
5 Data preparation
Preparing high-quality training data for processing is essential in training an algorithm for anomaly detection. Meanwhile, an ML-based anomaly detection system can help perform the opposite procedure, spotting inconsistent or corrupted data and thus facilitating data cleaning.
Healthcare
6 Medical diagnostics
Machine learning systems can examine radiological images, body scans, and other medical sources to quickly identify patient condition anomalies that could be signs of upcoming health complications (including brain aneurysms and tumors). This allows physicians to speed up clinical procedures, set up suitable preventive treatments, and dedicate more time to patients’ psychological well-being.
7 Healthcare fraud
Insurance companies and healthcare institutions leverage machine learning techniques to prevent fraud. Along with natural language processing software, ML-based fraud detection solutions can scan medical reports and insurance claims to identify anomalies and inconsistencies, such as incorrect diagnoses or inflated medical coverage costs.
Image title: Example of unsupervised medical anomaly detection. A GAN is trained to reconstruct the next 3 healthy MRI slices from the previous 3; based on the reconstruction, MRI scans are classified as healthy or diseased.
Data source: bmcbioinformatics.biomedcentral.com — MADGAN: unsupervised medical anomaly detection GAN using multiple adjacent brain MRI slice reconstruction, 2021
Retail and ecommerce
8 Electronic payment fraud
This type of crime has become a major threat to retailers, shopping platforms, and their customers. Deploying machine learning-driven systems in retail can help mitigate this threat by spotting anomalous account behaviors (such as a rising transaction frequency and a change in IP addresses or login times), flagging suspicious users, and even blocking them.
9 Security
ML-powered video surveillance systems leverage machine learning and computer vision to distinguish anomalous behavioral patterns (such as a customer grabbing a product and hiding it in their pocket) and therefore ensure a safe shopping environment.
Manufacturing
10 Quality assurance
Combined with computer vision in manufacturing, ML-based anomaly detection allows manufacturers to double-check the quality of their products and packaging before they leave the factory. This involves accurate visual inspection via high-res cameras to spot design anomalies that may hinder product usability.
11 Predictive maintenance
Real-time condition monitoring relies on ML-based anomaly detection and IoT-powered sensors to collect data from industrial equipment, spot any shift from its standard performance, and predict impending failures. Based on such forecasts, manufacturers can perform maintenance operations to fix their assets.
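As a rough illustration of condition monitoring, the sketch below tracks a sensor signal with an exponentially weighted moving average (EWMA) and reports readings that drift away from it. The EWMA baseline is a deliberately simple substitute for a trained model, and the temperature values, smoothing factor, and tolerance are hypothetical.

```python
def ewma_drift_alerts(readings, alpha=0.3, tolerance=5.0, warmup=5):
    """Return the indices where a reading departs from the smoothed baseline by more than `tolerance`.

    The baseline is an exponentially weighted moving average of earlier readings.
    """
    baseline = readings[0]
    alerts = []
    for i, value in enumerate(readings[1:], start=1):
        if i >= warmup and abs(value - baseline) > tolerance:
            alerts.append(i)
        else:
            # Update the baseline during warm-up and with readings considered normal.
            baseline = alpha * value + (1 - alpha) * baseline
    return alerts

# Hypothetical bearing temperature readings in degrees Celsius.
temperatures = [70.1, 70.4, 69.8, 70.2, 70.0, 70.3, 69.9, 78.5, 79.2, 80.1]
print(ewma_drift_alerts(temperatures))  # indices of the readings that left the usual range
```

Flagged readings are excluded from the baseline update so that a developing fault does not gradually become the new "normal".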
Examples of ML-based anomaly detection
Anomaly detection approaches
An ML algorithm can learn to identify patterns and anomalies via three different training techniques:
Supervised anomaly detection
The anomaly detection algorithm is trained with labeled data, namely data points already marked as normal or anomalous.
Unsupervised anomaly detection
Our data scientists and ML engineers provide the algorithm with unlabeled datasets and let it discover patterns or anomalies on its own.
Scheme title: Unsupervised machine learning for anomaly detection
Data source: pwc.com—Using machine learning to identify unusual patterns in data
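A minimal sketch of the unsupervised approach, using k-means (one of the clustering techniques listed later in this overview): the algorithm receives unlabeled points, groups them, and flags those that sit far from every cluster. The sample points, the hand-picked initial centroids, and the "three times the median distance" cutoff are assumptions for illustration only.

```python
from math import dist
from statistics import median

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

def flag_outliers(points, centroids, factor=3.0):
    """Flag points whose distance to the nearest centroid exceeds `factor` times the median distance."""
    distances = [min(dist(p, c) for c in centroids) for p in points]
    cutoff = factor * median(distances)
    return [p for p, d in zip(points, distances) if d > cutoff]

# Hypothetical unlabeled data: two dense groups and one stray point.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 1.0),
          (8.0, 8.1), (7.9, 8.0), (8.1, 7.9), (8.0, 8.0),
          (4.5, 12.0)]
centroids = kmeans(points, [points[0], points[4]])  # initial centroids chosen by hand

print(flag_outliers(points, centroids))
```

No labels are involved at any step: the stray point is reported only because it does not fit either discovered cluster.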
Semi-supervised anomaly detection
This approach combines the previous anomaly detection techniques to maximize their pros. Data engineers provide an algorithm with a small amount of labeled data to partially train it, then use the same algorithm to label a larger dataset autonomously (pseudo-labeling). If the generated labels prove reliable, these newly labeled data points are added to the original set to fine-tune the algorithm.
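The pseudo-labeling loop described above can be sketched as follows. The nearest-centroid model, the single "amount" feature, and the confidence margin are assumptions chosen to keep the example short; a real system would use a stronger model and validate the generated labels before trusting them.

```python
from statistics import mean

def fit_centroids(labeled):
    """Train a nearest-centroid model: one mean value per label."""
    labels = {label for _, label in labeled}
    return {label: mean(v for v, l in labeled if l == label) for label in labels}

def predict_with_margin(value, centroids):
    """Return the nearest label and the margin between the two closest centroids."""
    ranked = sorted(centroids, key=lambda label: abs(value - centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(value - centroids[runner_up]) - abs(value - centroids[best])
    return best, margin

def pseudo_label(labeled, unlabeled, min_margin=50.0):
    """Label the unlabeled values, keep only confident labels, and retrain on the enlarged set."""
    centroids = fit_centroids(labeled)
    confident = []
    for value in unlabeled:
        label, margin = predict_with_margin(value, centroids)
        if margin >= min_margin:  # ambiguous points are left unlabeled
            confident.append((value, label))
    return fit_centroids(labeled + confident), confident

# Hypothetical payment amounts: a small labeled set and a larger unlabeled one.
labeled = [(20.0, "normal"), (25.0, "normal"), (30.0, "normal"),
           (400.0, "anomaly"), (450.0, "anomaly")]
unlabeled = [22.0, 27.0, 430.0, 220.0]

model, confident = pseudo_label(labeled, unlabeled)
print(confident)  # 220.0 lies between both groups and is not pseudo-labeled
print(model)
```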
ML algorithms for anomaly detection
Data engineers rely on several machine learning techniques and algorithms to build machine learning models for anomaly detection systems. Here's just a brief selection of the most common ones.
Data source: IEEE — Machine Learning for Anomaly Detection: A Systematic Review, 24 May 2021
Machine learning techniques
- Random tree (RT)
- Random forest (RF)
- J48/C4.5
- Entropy
- One-class SVM
- Two-class SVM
- Core vector machine (CVM)
- Kernel methods
- Genetic algorithm (GA)
- Linear embedding
- K-means
- Hierarchical clustering (HC)
- Fuzzy clustering
- Nearest clustering
- Logistic regression
- Linear regression
Support vector machine
A supervised learning algorithm that handles high-dimensional data well but requires high computing power on large datasets and is less reliable than other options when analyzing complex anomalies.
Decision tree
Another supervised learning algorithm following a tree-like decision-making model in which every branching represents the analysis of a specific variable to predict if a particular event is anomalous or not.
Random forest
Like the isolation forest, random forest is a powerful algorithm that combines multiple decision trees to analyze larger datasets and enhance pattern recognition and anomaly detection capabilities.
Logistic regression
A supervised learning algorithm designed to assess the probability of a certain outcome between two alternatives (normal event or anomaly) depending on a range of key variables.
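To make the idea concrete, here is a minimal logistic regression trained with batch gradient descent on a single feature. The scaled amounts, the learning rate, and the number of epochs are hypothetical; in practice a library implementation with regularization and several features would be the usual choice.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def train_logistic(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit weight and bias by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

def anomaly_probability(x, w, b):
    """Estimated probability that an event with feature value x is an anomaly."""
    return sigmoid(w * x + b)

# Hypothetical scaled transaction amounts; label 1 marks a known anomaly.
xs = [0.1, 0.2, 0.3, 0.4, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)

print(anomaly_probability(0.25, w, b))  # low probability for a typical amount
print(anomaly_probability(2.8, w, b))   # high probability for an unusually large one
```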
K-nearest neighbor
A distance-based, supervised learning algorithm that predicts the nature of a potentially anomalous event by comparing it with similar events recorded in the past and defined as "neighbors".
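A k-nearest neighbor classifier fits in a few lines. The sketch below assumes each past event is a pair of scaled features with a known label; the feature values and `k=3` are illustrative assumptions.

```python
from collections import Counter
from math import dist

def knn_predict(query, labeled_events, k=3):
    """Classify `query` by majority vote among its k closest labeled events."""
    neighbors = sorted(labeled_events, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical past events: (scaled amount, scaled time since last login) with labels.
events = [
    ((0.20, 0.50), "normal"), ((0.30, 0.60), "normal"), ((0.25, 0.40), "normal"),
    ((0.90, 0.10), "anomaly"), ((0.95, 0.05), "anomaly"), ((0.85, 0.15), "anomaly"),
]

print(knn_predict((0.28, 0.50), events))  # surrounded by normal events
print(knn_predict((0.88, 0.12), events))  # surrounded by known anomalies
```

Because the method relies on distances, features should be brought to comparable scales before use; the values above are assumed to be scaled already.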
Neural networks
Complex sets of deep learning algorithms comprising interconnected layers of artificial neurons that mimic the human brain's architecture, typically deployed to detect the most subtle patterns and anomalies via unsupervised learning. Examples include convolutional neural networks and Bayesian neural networks.
The roadmap for adopting ML-based anomaly detection software
These are the main steps required to build and deploy an anomaly detection software solution using machine learning algorithms.
1 Data strategy
2 Data source selection
3 Data collection
4 Data preparation
5 Data modeling
6 Software development
7 Data analysis
8 Ongoing support
Benefits of machine learning for anomaly detection
ML-powered anomaly detection systems offer several advantages over traditional solutions.
Superior reactivity
Enhanced scalability
Wider data pool
Greater accuracy
Solving ML-based anomaly detection challenges
Potential challenge
Recommendation
Training times
Algorithm training for anomaly detection is a time-consuming and computationally demanding process, as the datasets should be large enough to provide sufficient examples of outliers.
A common trick for training optimization is to select a smaller subset of essential features (such as IP address, transaction data, or payment method) and discard irrelevant attributes, depending on your scenario.
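One simple way to discard uninformative attributes is a variance filter: a column that barely changes across records cannot help separate normal from anomalous events. This is only one heuristic among many, and the column names and threshold below are hypothetical. Which features count as essential still depends on your scenario.

```python
from statistics import pvariance

def select_features(rows, min_variance=1e-6):
    """Keep only the numeric columns whose variance exceeds `min_variance`."""
    names = list(rows[0])
    kept = [n for n in names if pvariance([row[n] for row in rows]) > min_variance]
    reduced = [{n: row[n] for n in kept} for row in rows]
    return kept, reduced

# Hypothetical transaction records; `terminal_version` never changes.
rows = [
    {"amount": 20.0, "hour": 9, "terminal_version": 3},
    {"amount": 35.5, "hour": 14, "terminal_version": 3},
    {"amount": 410.0, "hour": 3, "terminal_version": 3},
]

kept, reduced = select_features(rows)
print(kept)     # the constant column is dropped
print(reduced)
```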
Compliance
The challenging trade-off between ML algorithms' data hunger and strict data management legislation can be a massive downside in highly regulated industries such as finance and medicine.
Ensure that your ML-based anomaly detection solution complies with all major standards and regulations applicable to your industry, such as GDPR, HIPAA, and PCI DSS.
Unbalanced datasets
Anomalies, by their very nature, are much less abundant than standard data points with normal behavior. This can make training datasets unbalanced and algorithms potentially biased.
You can use synthetic minority oversampling to artificially increase the number of outliers, or majority undersampling to reduce the number of normal data instances, and therefore ensure a more balanced dataset.
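Both resampling ideas can be sketched with the standard library. Note that the oversampling function below is a simplified, SMOTE-style interpolation between two randomly chosen minority samples; actual SMOTE interpolates toward nearest neighbors. The data, target sizes, and random seed are assumptions for illustration.

```python
import random

def undersample_majority(majority, target_size, rng):
    """Randomly keep `target_size` samples from the majority (normal) class."""
    return rng.sample(majority, target_size)

def oversample_minority(minority, target_size, rng):
    """Add synthetic minority samples by interpolating between pairs of existing ones.

    Simplified SMOTE-style idea: real SMOTE picks partners among nearest neighbors.
    """
    samples = list(minority)
    while len(samples) < target_size:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        samples.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return samples

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical (amount, hour) records: many normal events, few anomalies.
normal = [(20.0 + i, 9.0 + i % 8) for i in range(40)]
anomalies = [(400.0, 3.0), (450.0, 2.0), (500.0, 4.0)]

balanced_normal = undersample_majority(normal, 12, rng)
balanced_anomalies = oversample_minority(anomalies, 12, rng)

print(len(balanced_normal), len(balanced_anomalies))
```

Resampling should be applied to the training split only, so that evaluation still reflects the real, unbalanced distribution.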
Addressing risks with algorithms
ML-based anomaly detection systems have shown their potential in proactively addressing risks in different industries and applications, from fraud prevention and cybersecurity to advanced diagnostics and real-time asset monitoring. Furthermore, anomaly detection with machine learning has proved superior to its more traditional, rule-based counterparts, thanks to a successful mix of reactivity, scalability, and accuracy. Despite some algorithm training and compliance challenges, machine learning in anomaly detection can make the famous motto "prevention is better than cure" a reality. If you aim to enhance your risk management capabilities, consider implementing a machine learning-based solution expertly crafted by Itransition.
FAQs
Why do you need machine learning for anomaly detection?
Compared to traditional methods, machine learning solutions for anomaly detection show a lower rate of false positives, enhance performance as they process new data, and better deal with new types of anomalies.
What are the approaches to ML-based anomaly detection?
Anomaly detection with machine learning can take three approaches, depending on the training technique used to teach an algorithm to identify anomalies: supervised, unsupervised, or semi-supervised.
Which machine learning algorithm is used to detect anomalies?
Machine learning engineers can count on various machine learning and deep learning algorithms, including one-class support vector machines (one-class SVMs), DBSCAN, decision trees, random forests, logistic regression, k-nearest neighbor, and different types of neural networks. Several of these are available in open-source libraries such as Python Outlier Detection (PyOD).