5 Data Analytics Reflections during COVID-19

It’s been more than four months since the COVID-19 outbreak started sweeping across the globe, affecting millions of lives in a way that we had only seen in films about catastrophic virus outbreaks.

Throughout this period, we’ve witnessed the progression from confirmed cases and deaths picking up in China during the Chinese New Year (with the media primarily focusing on the global economic impact the virus would have through the disrupted supply chain in China), to the WHO declaring COVID-19 a pandemic on 11 March, to global lockdowns with countries on different continents shutting their borders one after another. As of today (24 June), there are more than 9 million confirmed cases and more than 478,000 deaths due to COVID-19 reported on Worldometers.

Perhaps due to the nature of the event, combined with the current level of data literacy and technological advances in data collection and analytics (and my professional bias), I sense that we have also witnessed an explosion of quantitative information becoming accessible over the past few months, in the mainstream media and on other online platforms such as websites, blogs and social media. People are re-sharing the numbers they read or see in news and reports, formally and informally, which sometimes sparks intriguing discussions.

Seeing how numbers, charts and analytics have been used to inform the public, as well as decision-making on government and community policies, excites me and gives me a sense of familiarity, even though my field of work is far from epidemiology. I am a strong believer in using data analytics and Machine Learning to help us sift through misconceptions and noise, so as to tell the true story with facts and numbers; but I also think that, if we are not careful, using the wrong data or interpreting the results in a biased or partial way can be dangerous and might lead us to the wrong destination.

I thought I’d share a few reflections I had about data and analytics during this crisis. Not surprisingly, these reflections also apply to many business contexts.


Here are the 5 reflections I had on data analytics during the COVID-19 crisis:


1. Know the context of your data

The raw data collected forms the foundation of any subsequent analysis. It is essential to understand the context of the data that has been collected. The quality of the data anchors the quality of the analysis and the trustworthiness of the conclusions drawn. 

As my Econometrics professor often said: rubbish in, rubbish out.

A good (negative) example is the recent paper retraction scandal with The Lancet. 

The paper, led by a big-name Harvard professor but based on questionable data sources, claimed that using hydroxychloroquine on COVID-19 patients increases heartbeat irregularities and death rates. The Lancet, one of the oldest and most respected medical journals in the world, published the paper, which resulted in several major hydroxychloroquine trials being halted - studies that could determine how many people live or die from COVID-19 in the future. The paper was then retracted days after publication because its authors “can no longer vouch for the veracity of the primary data sources”.

The paper had already had a huge impact on the research and treatment landscape for COVID-19. Imagine if the flaws in the paper hadn’t been flagged. And how many more of these flawed analyses are out there?

I can’t stress enough how important it is to understand the source and the collection process of the data you are going to analyse, whether it’s system-collected, reported or survey-based. It will be worth the time and effort, especially if you know that key decisions will be made based on the insights from your analysis. Most importantly, once you understand what goes into the numbers (there will always be a level of ambiguity, uncertainty and noise), the next questions are what you should do to clean and process the data before the analysis, and whether this has any implications for the interpretation and conclusions you are going to draw.
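As a rough illustration of what a first look at the data might involve, here is a minimal Python sketch that profiles a small set of records for missing values, duplicates and collection method. Everything in it is an assumption made for illustration: the function name profile_records, the field names and the sample records are all hypothetical, and a real check would depend heavily on how your data was actually collected.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Summarise basic quality signals for a list of dict records."""
    # Count records where each required field is missing or empty.
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in required_fields
    }
    # Count records that repeat an earlier record on all required fields.
    keys = Counter(tuple(r.get(f) for f in required_fields) for r in records)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    # Tally how each record was collected, if the source was recorded.
    sources = Counter(r.get("source", "unknown") for r in records)
    return {
        "total": len(records),
        "missing": missing,
        "duplicates": duplicates,
        "sources": dict(sources),
    }


# Hypothetical sample records; the field names are illustrative only.
sample = [
    {"record_id": "A1", "site": "S1", "outcome": "recovered", "source": "system"},
    {"record_id": "A2", "site": "", "outcome": "recovered", "source": "survey"},
    {"record_id": "A1", "site": "S1", "outcome": "recovered", "source": "system"},
    {"record_id": "A3", "site": "S2", "outcome": None, "source": "reported"},
]

report = profile_records(sample, ["record_id", "site", "outcome"])
print(report)
```

A summary like this does not tell you whether the data is trustworthy, but it does give you concrete questions to take back to whoever collected it.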

We NEED to make sure the data is accurate and reliable, especially in healthcare where the margin for error is smaller. We need to ensure data-driven decisions are based on analyses that are scientifically rigorous and robust. As Bloomberg medical science and tech reporter Michelle Cortez said on the Prognosis podcast: We need to know the right answers, not just the fast ones.

Here is an article where you can see more examples of why it's important to know the context of the data, and how you can go about it.


2. Speed to insight - Availability and timeliness of data

I don’t think there is a need to point out how important the timeliness of data and insights is. Like many have said, “…the world is learning at a speed that we’ve never seen before”. Organisations and businesses are compelled to make data-driven decisions within hours or even minutes, not days and months. The value of data can diminish exponentially, like radioactive decay, leaving missed opportunities if we fail to capture them in time.
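To make the radioactive decay analogy concrete, here is a small Python sketch that treats the value of a piece of data as halving every fixed period. The function name data_value, the starting value of 100 and the 24-hour half-life are all hypothetical numbers chosen for illustration; how fast your own data loses value is something you would need to estimate for your context.

```python
def data_value(initial_value, age_hours, half_life_hours):
    """Value remaining after age_hours, halving every half_life_hours."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    return initial_value * 0.5 ** (age_hours / half_life_hours)


# Hypothetical example: an insight worth 100 when fresh, with a 24-hour half-life.
for age in (0, 24, 48, 168):
    print(f"{age:>4} h old: {data_value(100.0, age, 24.0):.2f}")
```

Under these assumed numbers, an insight delivered a week late retains less than 1% of its original value, which is the intuition behind acting on data while it is still fresh.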

 
 

As ThoughtSpot rightly pointed out: “The potential value of data within organisations is well known, but in the new environment, the ability to collaborate on and share data is also a competitive differentiator.” If I may add here, the ability to collaborate on and share data quickly and efficiently is a strong competitive differentiator. 

Economists are all talking about High-Frequency or Real-time Data now, such as job postings and weekly unemployment claims, because indicators like GDP have lags. We need to make decisions NOW, based on data that tells us how we, healthcare services, businesses, and organisations are doing NOW.

What is the High-Frequency Data for your organisation? 



3. Not all Data Science is about Machine Learning and AI

Data Science and Machine Learning are hot topics these days. They elevate organisations that perform data analysis from Descriptive and Diagnostic analytics to the more advanced and complex Predictive or even Prescriptive analytics space. More and more of the clients I work with, across all kinds of industries, are eager to learn how they can leverage Machine Learning to better plan for the future, whether it’s supply chain planning, marketing budget allocation, staff resourcing, talent management, or customer retention.

In this “unprecedented time”, naturally we want to utilise the power of Machine Learning to help us learn as quickly as we can, so that we can plan for the near and mid-term future and act now. 

However, the idea of Machine Learning is to let the “machine” LEARN from existing or historical data, so as to extract the underlying patterns or to predict the unknown. If we unpack this, it means that the technique is typically useful and relevant when you have a sufficient amount of data to feed to the machine, so that it can learn from the past behaviours and patterns in the data. When events like COVID-19 strike, businesses want to know how their customers, existing and future contracts, and cash flow will be affected, but there might not be historical data to show the machine how these things were affected by such an event, especially in the early days of the crisis.

There are two potential solutions to that. The first is to stay with Machine Learning and use other historical events as a proxy. The second is to consider alternative techniques, such as Scenario Modelling. I will talk more about these two options in the next post.
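To give a flavour of the second option, here is a minimal Python sketch of one very simple form of scenario modelling: projecting a cash balance under a few assumed revenue shocks rather than learning from historical data. This is only one way to set it up, not a description of any particular method; the function name project_cash, the scenario names and every figure below are hypothetical.

```python
def project_cash(opening_cash, monthly_revenue, monthly_costs, revenue_change, months):
    """Project month-end cash balances when revenue shifts by a fixed fraction."""
    balances = []
    cash = opening_cash
    # Apply the assumed shock once; revenue then stays flat at the new level.
    revenue = monthly_revenue * (1 + revenue_change)
    for _ in range(months):
        cash += revenue - monthly_costs
        balances.append(round(cash, 2))
    return balances


# Hypothetical scenarios: assumed fractional change in monthly revenue.
scenarios = {"mild": -0.10, "moderate": -0.30, "severe": -0.60}

results = {
    name: project_cash(500_000, 200_000, 180_000, change, months=6)
    for name, change in scenarios.items()
}
for name, balances in results.items():
    print(name, balances)
```

The point of a sketch like this is that the assumptions are explicit and easy to challenge: a subject matter expert can argue about whether a 60% revenue drop is plausible, which is hard to do with a model trained on data that does not contain a comparable event.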



4. Data visualisation ≠ storytelling.

We see charts and graphs everywhere, every day. Data analysis and visualisation skills have been largely democratised, thanks to modern technology such as desktop software like Tableau and Power BI, and online visualisation tools such as Google Data Studio and Chartbuilder. Putting “Data-driven” in front of everything has become a trend - Data-driven Marketing, Data-driven HR, Data-driven business transformation… Data literacy has become one of the hottest and most sought-after skills in the employment world.

However, data literacy is not just about creating beautiful charts, or coding some complex functions to wrangle data. “Analysing” data, apart from the often necessary cleaning, transformation and visualisation, also includes interpreting the results - the insights that can be extracted from lines, bars, maps, numbers, shades or whatever format is used. Visualisation DOES NOT EQUATE TO interpretation and storytelling, and the latter is typically much more challenging, often requiring subject matter expertise or years of industry experience.

For example, in his viral COVID-19 article “Coronavirus: The Hammer and the Dance”, Tomas Pueyo shows how different ways of interpreting the chart below, created by the Imperial College team studying COVID-19, can lead to very different conclusions. I will dive into this, and share more examples, in the detailed post later this week.

 
 


5. It’s all about actions.

This might be something that the C-suites have to keep reminding data teams, managers have to keep reminding analysts, and consultants have to keep reminding themselves. The SO WHAT, and NOW WHAT questions.

As common sense as it sounds, it’s easy to forget the primary goal of the analysis when you have been buried deep in the data, in and out of the waves of process and data quality challenges, and have finally managed to surf the wave and show off a few cool moves with a fancy Machine Learning algorithm and beautifully designed dashboards.

1. Data collected -> 2. Data cleaned and consolidated -> 3. Analysis, visualisation, modelling -> 4. Interpretation of results -> 5. Actions/Decisions

This is a simplified data analytics information flow (in reality, it’s much less linear than that). The majority of the time and effort an analytics team spends goes into the first three steps. But the biggest impact actually comes from the last two.

So, analysts and data scientists out there, when you finish a piece of analysis, ask yourself and your team these two questions to get to the heart of the business impact:

SO WHAT? 

NOW WHAT?



As always, please feel free to get in touch to share your thoughts.
