06 October 2020, 07:10 PM
Testing a Data Science Model
I’d like to share how I, as a tester, explored the world of data science while testing a model, and how others can apply the same approach if they find themselves in this situation. I’ll also describe how, as part of an emerging team, I contributed value in a field I had never tested before.
I have heard from other senior testers that they know of data science teams with no testers testing the models. How, then, can we be confident that what is produced is good enough? A model is a statistical black box: how do we test it so that we understand its behaviours well enough to test it properly? My main aim is to inspire testers to explore data science models.
I’d like to invite you to my talk, where we will go through my journey of discovering data science model testing. I hope you will find the following takeaways useful not just for testing a data science model but for day-to-day testing too.
1- Some background on what a data science model is, the role data plays in these models, and how to understand the vast amounts of data involved:
  - structured data
  - unstructured data
  - metadata
  - semi-structured data
2- Understanding data pipelines
3- The importance of pairing: following an SDLC process here may require more exploratory testing and investigation than usual, so pairing with a data scientist is a good way of working and of understanding the model.
4- Pre-testing thoughts: Is the model custom-made or off the shelf? How are we, as a team, training our own model to behave? What is my input and what is my output? Am I seeing the right behaviour? (Models contain some element of randomness, so how will we decide what is acceptable when testing the results?)
5- As testers, we expect input + a model that uses predictive analytics = output; for example, 5 + 3 = 8. For data scientists, however, 5 + 3 is not always exactly 8 but might be 8.1, 8.001 or 8.5: in simple words, the output is stochastic. So how do we bring in processes and strategies that make sure we capture the right output results and that consumers still benefit from them? In a nutshell: making sure the model’s quality is good and that we have confidence in what we provide to consumers.
6- Test directly the areas whose behaviour we are certain about; for the areas we are uncertain about, set expectations as bounds around averages.
7- Exploratory testing to look for edge cases, and regression testing to check that new features do not break baseline results.
8- Understanding what tests to perform: What is an acceptable test for the model? Have we found anomalies (results too far outside the threshold)? How do we know the results we produced are the right results? How accurate are the results from my model? What is an acceptable deviation?
9- I will share tips that helped me and that could help testers who want to explore testing models, so that a model’s quality gives the team enough confidence and helps the business.
10- Post-testing: Do we have a good understanding of what the model has provided? Are the predictive analytics working as expected? Does the shape of my data look as expected? (Testing the outputs will show whether the values coming from the data input stage are of the right type.)
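As a minimal sketch of points 5 to 8, assuming a hypothetical stochastic model `noisy_add` that returns 5 + 3 plus small Gaussian noise, the Python below replaces an exact-equality check with a tolerance around the average, compares that average with a hypothetical baseline from a previously accepted version, and flags individual results beyond a threshold as anomalies. The random generator is seeded so the test run itself is reproducible. The tolerance, threshold and baseline values are illustrative assumptions, not recommended settings.

```python
import math
import random
import statistics


def noisy_add(a: float, b: float, rng: random.Random) -> float:
    """Hypothetical stand-in for a stochastic model: the true answer plus small noise."""
    return a + b + rng.gauss(0.0, 0.05)


def check_within_tolerance(actual: float, expected: float, abs_tol: float) -> bool:
    """Exact equality is the wrong test for stochastic output; accept a bounded deviation."""
    return math.isclose(actual, expected, abs_tol=abs_tol)


def run_many(n_runs: int, seed: int) -> list[float]:
    """Run the model repeatedly with a fixed seed so the test is reproducible."""
    rng = random.Random(seed)
    return [noisy_add(5, 3, rng) for _ in range(n_runs)]


def summarise(samples: list[float], expected: float, threshold: float) -> dict:
    """Summarise the runs and collect results that deviate from `expected` by more than `threshold`."""
    anomalies = [x for x in samples if abs(x - expected) > threshold]
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
        "anomalies": anomalies,
    }


if __name__ == "__main__":
    expected = 8.0
    baseline_mean = 8.0  # hypothetical: mean recorded from the last accepted model version
    samples = run_many(n_runs=1000, seed=42)
    report = summarise(samples, expected=expected, threshold=0.5)
    # Point 6: the expectation is a bound around the average, not an exact value.
    assert check_within_tolerance(report["mean"], expected, abs_tol=0.05)
    # Point 7: regression check, the new mean should not drift from the baseline.
    assert check_within_tolerance(report["mean"], baseline_mean, abs_tol=0.05)
    # Point 8: no individual result should be too far outside the threshold.
    assert not report["anomalies"], report["anomalies"]
    print(f"mean={report['mean']:.3f} stdev={report['stdev']:.3f}")
```

Fixing the seed keeps a failing test repeatable while it is investigated; to check that the model behaves well across different random draws, the same assertions could be repeated with several seeds.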
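For the post-testing questions in point 10, a similar sketch can check that a model’s output has the expected shape and types. It assumes the output arrives as a list of dictionaries and uses a hypothetical schema (`customer_id`, `predicted_score`). In a real pipeline the columns, types and row count would come from the team’s own data definitions.

```python
from typing import Any

# Hypothetical schema for the model's output: column name -> expected Python type.
EXPECTED_SCHEMA: dict[str, type] = {
    "customer_id": int,
    "predicted_score": float,
}


def check_output_shape(rows: list[dict[str, Any]], schema: dict[str, type],
                       expected_rows: int) -> list[str]:
    """Return human-readable problems; an empty list means the output looks as expected."""
    problems: list[str] = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        # Type check: catches values that arrived as the wrong type from the input stage.
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                problems.append(
                    f"row {i}: {column} is {type(row[column]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return problems


if __name__ == "__main__":
    rows = [
        {"customer_id": 1, "predicted_score": 0.82},
        {"customer_id": 2, "predicted_score": "0.4"},  # a string slipped through
    ]
    for problem in check_output_shape(rows, EXPECTED_SCHEMA, expected_rows=2):
        print(problem)
```

Returning a list of problems rather than stopping at the first one gives a fuller picture of how the output differs from what was expected, which helps when discussing results with a data scientist.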
Takeaways:
1. Have a better understanding of what data science is.
2. Know how we can test models.
3. Know which existing skills we already have that we can apply in a data science team.
4. Leave with resources to help our teams structure themselves better, so that we have confidence in the data produced.
5. See what did and didn’t work.