Explanatory versus Predictive Modelling Approach

Today we will review different types of analyses and the many terms people like to give them (that's a downside of an emerging field; R.I.P. us). We're also going to look at what makes a prediction good or bad. So, today we'll explore some theoretical concepts that should give you a clearer picture of what data science is. Because we don't apply this theory to our paper directly, we won't do a mock assignment. You can, however, use your improved vocabulary to strengthen the discussion and conclusion sections of your conference poster.

Learning objectives:

understanding predictive analysis;
understanding explanatory analysis;
understanding the difference between descriptive, predictive and explanatory analysis;
understanding what Big Data is;
making good and bad predictions.

Table of contents:

Difference between descriptive, predictive and explanatory analysis: 0.5 hours
Descriptive analyses: 0.5 hours
Predictive analyses: 0.5 hours
Good and bad predictions: 0.5 hours

Questions or issues?

If you have any questions or issues regarding the course material, please first ask your peers or ask us in the daily Q&A at 16:00!

Good luck!

0) Differences between descriptive, predictive and explanatory analysis

In the last weeks, we mainly looked at different kinds of descriptive analyses: what the features of the data can tell us, for example through summary statistics, clustering of data or a visualisation of the distribution of the dataset. Descriptive statistics are like the foundations of a building: you need to lay them before you start building any models. Once they are in place, you can begin conducting explanatory or predictive analyses. These are distinct in that explanatory analyses start from a hypothesis, so you make assumptions about the data, whereas predictive models don't make these assumptions and just try to predict an outcome based on the dataset.

The following videos explain this difference in more detail:

Predictive analyses (or models) are also called predictive analytics or inferential statistics, depending on the context in which they are used; for our purposes, though, the terms are interchangeable.

Predictive analytics from Data Science Foundations: Fundamentals by Barton Poulson

Explanatory analyses (or models) are also called prescriptive analytics and rely on causal relationships and probability theory. Given a causal relationship that is likely to be true, one should observe X or Y. That's why you need a hypothesis and need to make assumptions about your data; otherwise, you're doing predictive analysis. "Explanatory" and "prescriptive" analysis are terms used more often in the business world; in the world of science, these analyses are called inferential (or inductive) statistics.

Prescriptive analytics from Data Science Foundations: Fundamentals by Barton Poulson

Next year, we will dive deeper into explanatory analyses and hypothesis testing. But, for now, all you need to know is the difference between the two.
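
To make the difference a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the data (hours studied and exam scores) are simulated, and fit_line is a hypothetical helper, not something from the videos. The same straight-line model is used twice: once to estimate an effect we hypothesised beforehand (explanatory), and once to see how well it predicts data it has not seen (predictive).

```python
import random
import statistics

random.seed(0)

# Hypothetical, simulated data: hours studied and exam score with random noise.
hours = [random.uniform(0, 10) for _ in range(200)]
scores = [50 + 4 * h + random.gauss(0, 5) for h in hours]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Explanatory view: we start from a hypothesis ("more study hours raise the
# score") and look at the estimated effect, using all the data.
intercept, slope = fit_line(hours, scores)
print(f"Estimated effect: {slope:.2f} points per extra hour")

# Predictive view: fit on one part of the data, then judge the model only by
# how close its predictions are on the part it has not seen.
train_x, test_x = hours[:150], hours[150:]
train_y, test_y = scores[:150], scores[150:]
b0, b1 = fit_line(train_x, train_y)
errors = [abs(yi - (b0 + b1 * xi)) for xi, yi in zip(test_x, test_y)]
mean_abs_error = statistics.fmean(errors)
print(f"Mean absolute error on unseen data: {mean_abs_error:.2f} points")
```

In the explanatory view the number we care about is the slope (is the hypothesised effect there?); in the predictive view we only care about the error on unseen data.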

1) Descriptive analyses

Descriptive statistics might seem a bit boring by now. So why don't we explain and predict all these cool and exciting things right now? What are we waiting for?!?! ☹ Well, that's because the data science landscape is getting more complex as technology gets more advanced. To illustrate, let's get a brief overview of Big Data. It's a fancy term, but it shows why descriptive analyses are so critical in this day and age.

Big data from Data Science Foundations: Fundamentals by Barton Poulson

So, let's review descriptive statistics again with renewed vigour and enthusiasm!

Descriptive analyses from Data Science Foundations: Fundamentals by Barton Poulson
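
As a quick refresher, descriptive statistics can be sketched in a few lines of Python with the built-in statistics module. The visits list below is invented for illustration (think of daily website visits over two weeks) and contains one deliberately odd day.

```python
import statistics

# Hypothetical measurements: daily website visits over two weeks.
visits = [120, 135, 128, 410, 131, 125, 119, 140, 138, 127, 133, 129, 122, 136]

# A small set of summary statistics describing the data.
summary = {
    "count": len(visits),
    "mean": statistics.fmean(visits),
    "median": statistics.median(visits),
    "stdev": statistics.stdev(visits),
    "min": min(visits),
    "max": max(visits),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

Here the mean (149.5) sits well above the median (130.0) because of the single day with 410 visits. Spotting things like this before you build a model is exactly what descriptive analyses are for.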

2) Predictive models

Predictive modelling is a flavour of data science where statistics are used to predict outcomes. It is not based on probability theory but on detection theory: trying to identify the meaningful information (the signal) that helps the prediction and to separate it from the unhelpful information (the noise) in the data.

Predictive models from Data Science Foundations: Fundamentals by Barton Poulson

Machine learning and deep learning are good examples of predictive models, and we'll dive into machine learning next block!
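
The idea of separating signal from noise can be sketched as follows. This is only an illustration under made-up assumptions: the data are simulated, one feature is constructed to be related to the outcome and the other is pure noise, and fit_line and holdout_error are hypothetical helper names. We fit a simple line on part of the data and check the error on the rest.

```python
import random
import statistics

random.seed(1)

n = 300
signal = [random.gauss(0, 1) for _ in range(n)]  # feature related to the outcome
noise = [random.gauss(0, 1) for _ in range(n)]   # feature unrelated to the outcome
outcome = [2.0 * s + random.gauss(0, 1) for s in signal]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def holdout_error(feature, target, n_train=200):
    """Fit on the first n_train rows, return mean absolute error on the rest."""
    b0, b1 = fit_line(feature[:n_train], target[:n_train])
    errors = [
        abs(t - (b0 + b1 * f))
        for f, t in zip(feature[n_train:], target[n_train:])
    ]
    return statistics.fmean(errors)


error_signal = holdout_error(signal, outcome)
error_noise = holdout_error(noise, outcome)
print(f"Error using the signal feature: {error_signal:.2f}")
print(f"Error using the noise feature:  {error_noise:.2f}")
```

The feature that carries signal should give clearly smaller errors on the held-out data than the noise feature, which is how a predictive model tells the two apart.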

3) Good and bad predictions

We're going to talk about why many predictions fail: specifically, we'll take a look at the 2008 financial crisis, the 2016 U.S. presidential election, and earthquake prediction in general. From inaccurate or just too little data to biased models and polling errors, knowing when and why we make inaccurate predictions can help us make better ones in the future. And even knowing what we can't predict can help us make better decisions too.

We're going to take a look at some of the times we've used statistics to gaze into our crystal ball, and actually got it right! We'll talk about how stores know what we want to buy (which can sometimes be a good thing), how baseball was changed forever when Paul DePodesta created a record-winning Oakland A's baseball team, and how statistics keeps us safe with the incredible strides we've made in weather forecasting.
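
One common way to put a number on how good a probabilistic prediction is (such as a rain forecast) is the Brier score: the average squared difference between the forecast probability and what actually happened, where lower is better. The sketch below is only an illustration; the outcomes and both forecasters are invented, and brier_score is a hypothetical helper name.

```python
# Hypothetical record of eight days: 1 = it rained, 0 = it stayed dry.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]

# Two invented forecasters giving a probability of rain for each day.
forecaster_a = [0.9, 0.1, 0.8, 0.7, 0.2, 0.3, 0.9, 0.1]  # commits to a view
forecaster_b = [0.5] * 8                                  # always says "50/50"


def brier_score(forecasts, observed):
    """Mean squared difference between forecast probability and outcome."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, observed)) / len(observed)


score_a = brier_score(forecaster_a, outcomes)
score_b = brier_score(forecaster_b, outcomes)
print(f"Forecaster A: {score_a:.4f}")
print(f"Forecaster B: {score_b:.4f}")
```

Forecaster B never says anything useful and scores 0.25; forecaster A scores 0.0375 on this made-up record. Comparing against an uninformative baseline like B is a simple check on whether a prediction is actually any good.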

4) Finished?

Use your extra time to start working on your poster!

Daily Q&A

At 16:00, there's an online meeting on our Microsoft Teams channel. You're encouraged to take part to ask questions, discuss our progress and reflect on today's activities.

Tomorrow we will recap block A, specifically descriptive analyses because they're so fundamental.