DOCO Thoughts – Machine Learning

August 15, 2017

Advanced data analytics technology today can magnify text (or data) that is indistinct and bring it into more granular focus, with practical benefits for insurance companies and their clients.

The Power of Machine Learning 
What we mean by human knowledge is our understanding of the world around us. Today’s IoT sensors provide a platform that enables us to use Machine Learning to model the world in ways we can’t possibly imagine right now.

The implication for a sector such as insurance is that it will allow us to move away from a predictive modelling environment in which we only ask what might happen. We already have many sophisticated and established tools and models to help us do that.

The power of Machine Learning is that it should be able to move us into a world of preventive modelling, where we don’t just say what is going to happen or what could happen but also ask how we can influence it.

What are the things that we need to be able to do in order to prevent claims or manage our risk internally? The ultimate goal is a market that progresses to dynamic risk management, real time exposure modelling, and real time exposure management.

According to Jason Cabral, Chief Actuary, Markerstudy, speaking at a recent TOM event that was organised by Artelligen in the Lloyd’s building:

“At last year’s Gyro Conference for actuaries, delegates talked about Machine Learning and its applications for pricing, reserving, modelling and the future of the insurance industry. There is a certain amount of scaremongering about Machine Learning taking away jobs but I don’t think that is going to happen. Machine Learning will supplement Generalised Linear Models (GLMs) to actually provide more information.”

Machine Learning Myths 
A number of myths have grown up around Machine Learning’s potential applications. Driverless cars, drones and robotics are frequently hyped as the only areas where Machine Learning can have a huge impact, but that is not necessarily true.

These may well be areas in which Machine Learning principles can be applied, but the opportunity is not limited to these scenarios. Machine Learning is a set of tools, or rather algorithms, designed by computer scientists that can be applied to any problem, particularly one that you can’t specify well enough to solve with explicit rules. It’s not just for driverless cars!

Not a Black Box 
One myth is that Machine Learning is a black box. Compared to the traditional statistical models that actuaries generally tend to use, the world of Machine Learning may seem very opaque. I would say, however, that if you delve into the subject deeply enough it becomes possible to understand how a Machine Learning algorithm is coming up with a prediction, even if the prediction seems bizarre or even counterintuitive. So it is not the black box that some of the media represent it to be.

Another myth is that you can simply throw in some data, turn the handle, apply the algorithm and expect to find the answer! It’s not that easy, of course! It is also possible to feed the machine pre-existing knowledge, pre-existing experience about claim events, for example, which informs the machine even before it starts predicting.

The machine can incorporate this kind of knowledge before it even starts building complex algorithms. If you have some experience or inside knowledge about a line of business or particular problem it is possible to put that in mathematical terms before you run the machine.
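One way to put prior experience "in mathematical terms" is a Bayesian prior. The article does not name a specific method, so the sketch below is an assumption: it uses a Gamma prior on claim frequency, updated with Poisson claim counts. All figures are hypothetical.

```python
def update_claim_frequency(prior_shape, prior_rate, claims, exposure_years):
    """Conjugate Gamma-Poisson update; returns the posterior (shape, rate)."""
    return prior_shape + claims, prior_rate + exposure_years


# Hypothetical prior: expert belief of 0.10 claims per policy-year,
# held with the weight of roughly 20 policy-years of experience.
prior_shape, prior_rate = 2.0, 20.0

# Hypothetical new data: 3 claims observed over 10 policy-years.
claims, exposure_years = 3, 10.0

post_shape, post_rate = update_claim_frequency(
    prior_shape, prior_rate, claims, exposure_years
)

prior_mean = prior_shape / prior_rate    # belief before seeing any data
raw_rate = claims / exposure_years       # what the small sample alone says
posterior_mean = post_shape / post_rate  # blend of prior belief and data

print(f"prior {prior_mean:.3f}, raw {raw_rate:.3f}, posterior {posterior_mean:.3f}")
```

The posterior estimate sits between the expert's prior and the raw observed rate, and it moves closer to the data as more exposure accumulates.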

A Silver Bullet Machine? 
Machine Learning is not a silver bullet. It is not going to solve every problem. If you put garbage data in you will get garbage out. That does not necessarily mean the data is wrong; it may simply be uninformative. A commonly cited example is trying to predict the intelligence of students from data on their shoe size. Shoe size has no correlation with a person’s intelligence, so even if the information is correct it is uninformative data.
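The shoe-size point can be checked with a simple correlation test before any modelling starts. The sketch below uses made-up data: a hypothetical test score stands in for "intelligence", shoe size is generated independently of it, and hours studied is generated to be genuinely related to it.

```python
import math
import random


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_students = 1000

# Hypothetical data: shoe size is accurate but unrelated to the score.
shoe_size = [rng.gauss(42, 2) for _ in range(n_students)]
hours_studied = [rng.uniform(0, 20) for _ in range(n_students)]
test_score = [50 + 2 * h + rng.gauss(0, 5) for h in hours_studied]

r_shoe = pearson_correlation(shoe_size, test_score)
r_hours = pearson_correlation(hours_studied, test_score)

print(f"shoe size vs score:     r = {r_shoe:+.2f}")
print(f"hours studied vs score: r = {r_hours:+.2f}")
```

The shoe-size column is perfectly "clean" data, yet its correlation with the target is close to zero, which is what uninformative means here.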

Another myth is that Machine Learning is free of bias, but that is not true. Whoever sets up the Machine Learning pipeline has to take care to guard against biases that can creep in.

For example, a project organised ten years ago by the U.S. defence agency DARPA tasked computer vision researchers to design a Machine Learning algorithm to distinguish between U.S. tanks and non-U.S. tanks. The researchers created a new algorithm but the outcome was a disappointment.

The project failed because, as emerged subsequently, all the pictures of U.S. tanks used to train the model had been taken in broad daylight. The non-U.S. tank pictures had all been taken on cloudy days, so the machine had only learnt how to distinguish between sunny and cloudy days!

Machine Learning Only Works With Big Data 
Another myth is that Machine Learning only works with big data. Again, that is untrue. If you run classical statistical methods side by side with Machine Learning on the same data set, it is very likely that the machine will outperform the classical methods to a large degree. It does not have to be thousands of terabytes of data. Even on small data sets, the right Machine Learning tools will outperform traditional methods.

The final myth is that Machine Learning only applies to big companies. The point is that the software algorithms are all free to use and can easily be downloaded from the Internet. All you have to pay for is the hardware: if you rent cloud resources from MS Azure or Amazon you pay for those resources; if you buy a server you pay for the server. That’s it.

The key takeaway here is that smaller businesses don’t necessarily need a big budget or huge amounts of data.

The Advantages of Machine Learning 
What Machine Learning does is remove linearity assumptions. Machine Learning explores the vast area of non-linear models, a much larger set of models, which makes it a more powerful tool.

One of the bedrocks that traditional models have been built on is that you assume a statistical distribution. All those distributions have certain theoretical mathematical properties that provide a deceptively nice fit but real life data does not come from any textbook mathematical distribution.

As a result, people make approximations and try to fit the model, hoping that their assumptions and approximations are not too far off. Machine Learning tools can take account of any type of statistical distribution. Another issue with traditional statistical tools is their sensitivity to extreme values in data. We call them outliers.

The Claims Example 
Suppose you have claims and a few of them are very large – they probably go to the reinsurer. What a traditional practitioner would do for a Generalised Linear Model (GLM) is remove those claims from the analysis, run a separate analysis on the smaller, non-extreme claims and then do something special for those outlier values.

You have to do that, otherwise your model will go horribly wrong. That happens when you shoehorn data into a traditional model that does not know how much importance to attribute to each claim. The GLM treats every data point as equally important. It fits the data in a broad sense but fits badly on those extreme values. It will therefore try to adjust itself so that the prediction for those extreme values is better. As a result the model will mispredict the rest of the data set.
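The effect of a single large claim on an equally weighted fit can be seen in a toy example. The sketch below uses an ordinary least-squares line as a simplified stand-in for a GLM, with made-up claim costs by vehicle age; the names and figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical attritional claims: cost rises by 100 per year of vehicle age.
ages = list(range(1, 10))
costs = [500 + 100 * age for age in ages]

# The same data plus one very large claim on a 10-year-old vehicle.
ages_all = ages + [10]
costs_all = costs + [50_000]

slope_clean, intercept_clean = fit_line(ages, costs)
slope_all, intercept_all = fit_line(ages_all, costs_all)

# Predicted cost for a 1-year-old vehicle under each fit.
pred_clean = slope_clean * 1 + intercept_clean
pred_all = slope_all * 1 + intercept_all

print(f"without large claim: slope {slope_clean:.0f}, age-1 prediction {pred_clean:.0f}")
print(f"with large claim:    slope {slope_all:.0f}, age-1 prediction {pred_all:.0f}")
```

With the large claim included, the line tilts steeply towards it and the prediction for young vehicles becomes negative, which is the mispredicting of the rest of the data set described above.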

How Netflix Replaced Blockbuster 
Jason Cabral says: “Let’s take the Netflix example. Machine Learning is transforming well-established industries, for example, the way Netflix replaced Blockbuster. Netflix applied software that works out ‘because you like that, then you might like this’, so the company looks at what you’re watching and when, and how that can help it to be more profitable.

“Netflix has said that its modelling decreases customer churn by several percentage points and saves the company about $1 billion a year! Similarly, it will give insurance companies another tool for pricing.

“The applications and benefits for insurance companies using Machine Learning tools are huge. When my business got information from Experian about credit scores, CCJs and other data fields that had to be filled, we had about 1000 factors to test. Testing those factors manually took us 3 months, but using a Machine Learning tool it took us about 3 hours.”

Assessing Complex Multi-interactions 
Jason Cabral continues: “Machine Learning helps us to assess complex multi-interactions so it can include lots of factors at the same time that you wouldn’t be able to put into a GLM typically. In the future ML will allow underwriters to write to whatever loss ratio they are prepared to accept.”

Machine Learning is able to model all the data and will not try to adjust itself just because of a few rogue data points. It helps practitioners to really see what is going on in the world without having to create handcrafted solutions. Machine Learning elegantly handles extreme values (or outliers) in data.

So to summarise, advantages include:

  • Automatically filters out irrelevant predictors (without performing statistical tests)
  • Can handle incomplete or missing data
  • Highly scalable – capable of assimilating immense volumes of structured and unstructured data
  • Works with small data
  • Improves over time when exposed to new data

Risk Selection and Pricing 
Risk selection and pricing are also very important for insurers. People have used GLMs a lot for statistical modelling in Personal Lines, but in the same context and with the same data you can use Machine Learning to improve your model’s predictive power even if you don’t want to replace your GLM.

Another scenario is Monte Carlo Simulation, where underwriters are pricing a very high reinsurance layer that likely requires thousands of simulations. Machine Learning tools can establish how many times a reinsurance entity has taken a line on that layer, which helps with pricing and with managing aggregation risks and exposures.

Other potential uses for Machine Learning include:
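For readers unfamiliar with Monte Carlo pricing of a high layer, a minimal sketch follows. It is only an illustration: the Poisson claim frequency, the Pareto severity model and every parameter value are assumptions, not figures from the article.

```python
import math
import random


def layer_loss(loss, attachment, limit):
    """Portion of a single loss that falls in the layer `limit xs attachment`."""
    return min(max(loss - attachment, 0.0), limit)


def sample_poisson(rng, mean):
    """Draw a Poisson-distributed claim count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def expected_layer_loss(n_years, claim_rate, attachment, limit, seed=0):
    """Average annual loss to the layer over `n_years` simulated years."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_years):
        for _ in range(sample_poisson(rng, claim_rate)):
            # Hypothetical severity model: Pareto with a 100,000 minimum loss.
            severity = 100_000 * rng.paretovariate(1.5)
            total += layer_loss(severity, attachment, limit)
    return total / n_years


# Hypothetical layer: 5m xs 5m, with 5 claims expected per year.
estimate = expected_layer_loss(
    n_years=20_000, claim_rate=5.0, attachment=5_000_000, limit=5_000_000
)
print(f"estimated expected annual loss to the layer: {estimate:,.0f}")
```

Because only a small fraction of simulated claims ever reach a high layer, many thousands of simulated years are needed before the estimate settles down, which is why such layers are simulation-hungry.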

  • Customer retention modelling
  • Claims fraud detection
  • Cross-selling – which customer can I cross sell to?

How To Get Started with Machine Learning 
The complexities of Machine Learning are daunting and there is no getting away from that. However, the benefits will be transformative to businesses that make the effort to educate themselves on the subject. I recommend this six-step process to get started.

  • Identify the business problem and goal
  • Identify the relevant data (both private and public data)
  • Choose a modelling platform and open source tool (e.g. Amazon ML, Windows Azure ML, R)
  • Build a prototype model
  • Test the model and fine-tune it, e.g. change your model parameters or structure; what results in the best prediction?
  • Deploy the model, and then work out how to build a better model.
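The prototype, test and fine-tune steps above can be sketched in a few lines. The example is a toy under stated assumptions: the data is synthetic, the model is a simple k-nearest-neighbours regressor chosen for brevity, and the only parameter being tuned is the number of neighbours `k`.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def holdout_error(train, holdout, k):
    """Mean squared prediction error on data the model has not seen."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in holdout]
    return sum(errors) / len(errors)


rng = random.Random(0)

# Synthetic data (hypothetical): a non-linear signal plus noise.
data = []
for _ in range(200):
    x = rng.uniform(-3, 3)
    data.append((x, x ** 2 + rng.gauss(0, 1)))

# Build the prototype on one part of the data, test it on the rest.
train, holdout = data[:150], data[150:]

# Fine-tune: try several parameter values and keep the best predictor.
candidate_ks = [1, 3, 5, 10, 25, 50]
scores = {k: holdout_error(train, holdout, k) for k in candidate_ks}
best_k = min(scores, key=scores.get)

for k in candidate_ks:
    print(f"k = {k:>2}: holdout error {scores[k]:.2f}")
print(f"best k: {best_k}")
```

Scoring each candidate on held-out data rather than on the training data is what answers the question "what results in the best prediction?" honestly.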