Guest blog – Data science and data governance: collaboration for trustworthy data insights

Data scientists are storytellers. They gather data from a variety of sources, clean and combine that data, and use their programming, math, statistical and analytical skills to interpret it. Data scientists can help businesses understand customers and market trends, forecast sales, improve processes, and make better, smarter, faster decisions. But data-driven results depend on good, reliable data. Advanced data models and algorithms are only as good as the data they are applied to. Without good quality data, even the best models and algorithms fail.

Strong data governance policies and practices ensure valid, quality data, so that data analytics and data science methods can arrive at meaningful and trustworthy conclusions. Collaboration is key for the best results.

As a data scientist, you can expect to spend up to 80% of your time cleaning, transforming, and checking your data. A discovery and understanding stage is important, but it should not take that long. If the data scientist is confident that the data has been verified by the business, is consistent, and complies with regulations, they can focus on bringing out the stories within the data rather than double-checking its content. Data governance plays the key role of checking validity, which prevents confusion or misunderstanding over the data and, in turn, meaningless data science results.
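To make the idea of validity checking concrete, here is a minimal sketch of the kind of checks that governance can take off a data scientist's plate: missing values, duplicate record IDs, and out-of-range values. The field names (`id`, `amount`, `sold_on`), the sample records, and the helper `check_records` are all hypothetical and chosen only for illustration; real rules would come from the business.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class QualityReport:
    missing: list       # (row index, field) pairs with no value
    duplicates: list    # row indexes that repeat an earlier record ID
    out_of_range: list  # row indexes with a negative amount or a future date


def check_records(records, required_fields, today):
    """Run basic validity checks on a list of dict records (hypothetical schema)."""
    report = QualityReport(missing=[], duplicates=[], out_of_range=[])
    seen_ids = set()
    for i, row in enumerate(records):
        # Completeness: every required field must hold a value.
        for field in required_fields:
            if row.get(field) in (None, ""):
                report.missing.append((i, field))
        # Uniqueness: a record ID should appear only once.
        record_id = row.get("id")
        if record_id is not None:
            if record_id in seen_ids:
                report.duplicates.append(i)
            seen_ids.add(record_id)
        # Plausibility: no negative amounts, no sales dated in the future.
        amount = row.get("amount")
        sold_on = row.get("sold_on")
        if (amount is not None and amount < 0) or (
            sold_on is not None and sold_on > today
        ):
            report.out_of_range.append(i)
    return report


# Invented sample data, for illustration only.
records = [
    {"id": 1, "amount": 120.0, "sold_on": date(2023, 1, 5)},
    {"id": 2, "amount": None, "sold_on": date(2023, 1, 6)},
    {"id": 2, "amount": 80.0, "sold_on": date(2023, 1, 7)},
    {"id": 3, "amount": -15.0, "sold_on": date(2023, 1, 8)},
]
report = check_records(records, ["id", "amount", "sold_on"], today=date(2023, 2, 1))
print(report)
```

If checks like these are agreed and run as part of governance, the data scientist can start from the report rather than rediscovering each problem by hand.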

What can a data scientist do for your business? Visualisations can be produced to understand and extract insights from data about the business and its customers. Machine learning (ML) models can be developed to continuously capture insights to help make more informed business decisions. Past performance can be studied and predictions made.

One example of the use of ML is the work I carried out for a research hospital in Rome to understand public opinion. Vaccine hesitancy was identified as one of the top threats to global health by the World Health Organization (WHO) in 2019, with the growth of online communications and misinformation about vaccines a growing cause for concern. The hospital wanted to understand people’s stance on vaccination. Concerns had been raised about the low maternal vaccine uptake in Italy. Social media is increasingly being used to express opinions and attitudes, so we decided to use Twitter as our source of data. I trained and fine-tuned a natural language processing machine learning model to classify the vaccine stance (promotional, neutral, or discouraging towards vaccines) of Italian tweets. This is now used on a web platform for medical professionals and policy makers to monitor vaccine stance in near real time.
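For readers who have not seen text classification before, the sketch below shows the general shape of a three-class stance classifier. It is not the model used in the project: it swaps in a very simple bag-of-words Naive Bayes classifier written in plain Python, and the short English training sentences are invented for illustration. The class name `NaiveBayesStance` and the helper `tokenize` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesStance:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per word.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented example sentences, for illustration only.
texts = [
    "vaccines protect children and save lives",
    "get vaccinated to protect your family",
    "the clinic opens at nine for vaccine appointments",
    "new vaccine schedule published today",
    "vaccines are dangerous and cause harm",
    "do not trust the vaccine it is unsafe",
]
labels = [
    "promotional", "promotional",
    "neutral", "neutral",
    "discouraging", "discouraging",
]

model = NaiveBayesStance().fit(texts, labels)
print(model.predict("vaccines protect lives"))
print(model.predict("the vaccine is unsafe and dangerous"))
```

A production system would need far more labelled data and a stronger language model, but the workflow is the same: labelled examples in, a stance label out for each new message.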

Another example of applying data science to a business is the work I have done for a local gin distillery, analysing sales data to predict future sales and profits. Data visualisations and predictions were an important part of the business plan for explaining the business and its potential to investors.
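As a rough illustration of what predicting future sales can mean at its simplest, the sketch below fits a straight-line trend to past monthly sales by ordinary least squares and extends it forward. The figures are invented, the function names `fit_linear_trend` and `forecast` are hypothetical, and this is not necessarily the method used for the distillery; real sales data usually also needs seasonality and uncertainty to be taken into account.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead):
    """Extend the fitted trend line for the next `periods_ahead` periods."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


# Invented monthly sales figures, for illustration only.
monthly_sales = [100.0, 110.0, 120.0, 130.0]
next_two_months = forecast(monthly_sales, 2)
print(next_two_months)
```

Even a simple trend line like this can anchor a conversation with investors, provided the sales history behind it can be trusted.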

Data governance can deliver high-quality, trusted, and compliant data. Data science can deliver insights into that data. Collaboration between data governance and data science professionals increases the level of certainty in the results, models and predictions, so you can make data-driven decisions for your business with confidence.

The author: Susan Cheatham is an independent consultant in data science and data mentoring. She gained her technical data skills and physics PhD from the European Organization for Nuclear Research (CERN). She enjoys communicating and sharing knowledge.