Know the Limitations of your Machine
By Johannes Bauer, Data Scientist, IHS Markit
There have been considerable developments in the field of machine learning (ML). For instance, in 2013, speech recognition technology still delivered error rates of 1 in 5 words, but today it gives largely accurate results (< 5 percent error), as we can easily experience through our smartphones. Image classification is on a par with human performance. Tech optimists argue that ML and Artificial Intelligence (AI) will soon be solving most of our problems.
I will argue here that, despite this considerable progress, certain challenges will remain. These are fundamental in nature and cannot be easily overcome.
Structuring Problems and Providing Interpretations
How should a problem be framed in a given context, what data should be used, and which relationships should be expected? Answers to these questions are typically provided by humans with relevant domain expertise, and it is currently not clear how they could realistically be answered by ML.
Complex machine learning models deliver accurate performance in many contexts. This does not mean, however, that it is easy to understand and communicate how predictions are made. For human decision makers, it is often important to understand why an algorithm produces a certain result. In certain cases, as in the credit space, it is even mandatory to provide explanations, for example, for the rejection of a mortgage application.
There are tools that locally approximate complex models by simpler ones or use other criteria to measure the impact of features on predictions. Interpretability, however, remains a challenge that is unlikely to be easily overcome.
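To make the idea of a local approximation concrete, here is a minimal sketch and not a description of any particular tool. It assumes a hypothetical two-feature function, black_box, standing in for an opaque model, and fits a linear surrogate on a small symmetric grid around one point of interest. The function names, the grid design and the step size are all illustrative assumptions.

```python
from itertools import product


def black_box(x1, x2):
    """Stand-in for an opaque model (hypothetical, for illustration only)."""
    return x1 ** 2 + 3 * x2 + x1 * x2


def local_linear_surrogate(model, point, step=0.1):
    """Fit a linear model to `model` on a small symmetric grid around `point`.

    Because the grid offsets are centred and mutually orthogonal, the
    least-squares slope for each feature reduces to sum(d * y) / sum(d * d).
    """
    offsets = (-step, 0.0, step)
    grid = list(product(offsets, repeat=len(point)))
    outputs = [model(*(p + d for p, d in zip(point, deltas))) for deltas in grid]
    slopes = []
    for j in range(len(point)):
        num = sum(deltas[j] * y for deltas, y in zip(grid, outputs))
        den = sum(deltas[j] ** 2 for deltas in grid)
        slopes.append(num / den)
    return slopes


slopes = local_linear_surrogate(black_box, point=(2.0, 1.0))
print("Local effect of each feature near (2, 1):", slopes)
```

The slopes describe the model only in the neighbourhood of the chosen point; at a different point the same model can yield a different explanation, which is part of why interpretability remains hard.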
Separating Signal from Noise
How many people came to a large event? What is the true value of a company? How many aeroplanes are currently in the air? Complex questions like these often produce imprecise answers, which show up as noisy data. Noise is thus not restricted to imprecise sensor measurements, but is a ubiquitous feature of many data sets. In certain settings, such as financial data, noise even dominates the signal. This doesn’t mean, however, that nothing can be done. Indeed, a small edge in understanding can lead to formidable returns, as demonstrated by certain hedge funds. When the objective is, say, to anticipate a share price one month ahead, filtering techniques and a careful selection of features can help substantially to extract signal. However, the challenge of separating signal from noise will persist.
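As a rough illustration of what a filtering technique does, the sketch below applies exponential smoothing to a synthetic series in which the noise is much larger than the underlying drift. Exponential smoothing is just one simple filter chosen for this example; the series, the noise level and the smoothing factor alpha are all assumptions, and the sketch estimates the current level rather than forecasting a month ahead.

```python
import math
import random


def exponential_smoothing(values, alpha=0.1):
    """Exponentially weighted moving average; smaller alpha smooths more."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def rmse(estimates, truth):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth))


random.seed(0)
signal = [100 + 0.01 * t for t in range(500)]        # slow, synthetic drift
observed = [s + random.gauss(0, 5) for s in signal]  # noise dominates the drift
filtered = exponential_smoothing(observed)

print("RMSE raw:     ", rmse(observed, signal))
print("RMSE filtered:", rmse(filtered, signal))
```

The filter trades noise for lag: a smaller alpha suppresses more noise but reacts more slowly to genuine changes, so the choice has to be made with the application in mind.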
Managing Low Frequency or Small Data Sets and Rare Events
Machine learning is a greedy animal hungry for data. If not fed properly, it can behave in an uncontrolled fashion; this is referred to as a model with high variance, or simply overfitting. Some argue that this no longer occurs in the information age, where data is plentiful, but this is certainly not true. Some data naturally arrives at low frequency (quarterly or annually) or only rarely, such as economic figures, company accounts, defects, accidents or default events. In such small data situations, ML researchers should remember that this is exactly the setting in which many statistical techniques were developed. In the past, statisticians were not blessed with large data sets and had to come up with an ingenious toolbox to deal with this challenge. For instance, Bayesian approaches can include prior knowledge or carefully framed assumptions, which are then updated based on a few data points.
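A minimal sketch of such a Bayesian update, assuming a Beta prior on a default rate and a binomial count of defaults (the standard conjugate pair). The prior parameters and the sample of 20 loans are hypothetical numbers chosen purely for illustration.

```python
def beta_binomial_update(prior_a, prior_b, events, trials):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + events, prior_b + (trials - events)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior: roughly a 2% default rate, worth about 100 past loans.
prior_a, prior_b = 2.0, 98.0
# Hypothetical small sample: 1 default among 20 new loans.
events, trials = 1, 20

post_a, post_b = beta_binomial_update(prior_a, prior_b, events, trials)
print("Raw frequency: ", events / trials)
print("Prior mean:    ", beta_mean(prior_a, prior_b))
print("Posterior mean:", beta_mean(post_a, post_b))
```

With so few observations, the raw frequency swings widely on a single default, whereas the posterior moves only modestly away from the prior. How much weight the prior deserves is exactly the kind of carefully framed assumption that needs domain expertise.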
Dealing With Regime Shifts or Non-Stationarity
Many machine learning applications implicitly assume static relationships between a set of features and a target. Even if time is explicitly included in the modelling process, the underlying assumption is that of recurring patterns. Regime shifts and non-stationarity break this assumption and are hard to deal with. There are techniques to identify and model such situations, such as the use of latent state variables to encode regime switches; however, no generic solution exists.
As we increasingly develop and apply machine learning at IHS Markit and expand our product portfolio, we too face these challenges, whether in forecasting dividends, predicting global vehicle prices or demand, or estimating oil production curves, to name just a few examples. Boundaries are being pushed and many exciting developments are on the way, but these fundamental issues will remain. For me personally, this is a welcome challenge and it makes working in this space more interesting, since creative thought and critical thinking, in combination with ML, can lead to very powerful applications. A machine can only be operated safely and successfully when its limitations are clear.
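One way to picture a latent state variable is a two-regime hidden Markov model. The sketch below is a toy under strong assumptions: two regimes ("calm" and "turbulent") with zero-mean Gaussian observations, known volatilities, and a fixed probability of staying in the current regime. All names and parameter values are illustrative, and in practice these quantities would have to be estimated.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def filter_regimes(observations, sds=(1.0, 3.0), stay=0.95):
    """Forward filter for a two-state hidden Markov model.

    Returns P(turbulent regime | data so far) for each observation.
    """
    p_turbulent = 0.5  # uninformative starting belief
    history = []
    for x in observations:
        # Predict: the latent regime may have switched since the last step.
        prior = stay * p_turbulent + (1 - stay) * (1 - p_turbulent)
        # Update: weigh each regime by how well it explains the observation.
        like_calm = gaussian_pdf(x, 0.0, sds[0])
        like_turb = gaussian_pdf(x, 0.0, sds[1])
        evidence = prior * like_turb + (1 - prior) * like_calm
        p_turbulent = prior * like_turb / evidence
        history.append(p_turbulent)
    return history


# Toy series: 30 small moves followed by 30 large ones (synthetic, not market data).
series = [0.5 * (-1) ** t for t in range(30)] + [3.0 * (-1) ** t for t in range(30)]
probs = filter_regimes(series)
print("P(turbulent) at end of calm stretch:", round(probs[29], 3))
print("P(turbulent) at end of series:      ", round(probs[-1], 3))
```

The filter only works here because the two regimes were specified in advance. A regime that has never been seen before has no state to represent it, which is one reason no generic solution exists.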