Hello fellow data scientists,
I'm an analytics enthusiast and a newbie to the world of ML. I'm trying to implement a recommender system in one of my work areas.
- For one very large system, we have a regression test suite consisting of hundreds of test cases. For every new release we spend quite some time selecting which test cases to run, because time pressure means we cannot afford to run the whole suite.
I'm thinking of implementing a recommender system to suggest which test cases to run for a particular release. I have done a very rough POC in Excel with limited, fabricated data:
- I considered two types of release. Although there are releases all year round, each one is either minor or major. I define only these two types because that makes it easy to recommend test cases for any future minor or major release.
- I have 21 test cases, and I rated each one from 1 to 5 based on its importance in a minor release and in a major release.
- I then calculated the correlation between Minor and Major.
- For all test cases I calculated a score by taking the SUMPRODUCT of the correlation and the actual values.
Example for TC-1: score = SUMPRODUCT(first row of the correlation table, first row of the first table)
- I get a score for each test case. After sorting the scores from high to low, I select the top 10 and treat them as the recommended test cases.
- I have not seen any use case like this to verify my approach against. Does this make sense?
- Also, one of my colleagues asked why I'm using a correlation at all. I did not have an answer, and I'm still thinking about it: without the correlation I can still sort the test cases from high to low for Minor and recommend the top ones, right? Where is the correlation helping here?
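The Excel steps above can be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not necessarily what the spreadsheet does: it assumes the correlation table is the 2x2 Pearson correlation matrix between the Minor and Major columns, and that the row for the target release type (for example, the Minor row, which is 1 and r) is used as the weights. The ratings and the function names `pearson` and `recommend` are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def recommend(ratings, release_type, top_n):
    """Rank test cases for a release type ('minor' or 'major').

    ratings maps a test-case name to a (minor_rating, major_rating) pair.
    Assumption: the weights are the matching row of the 2x2 correlation
    matrix, i.e. (1, r) for a minor release and (r, 1) for a major one.
    """
    minor = [pair[0] for pair in ratings.values()]
    major = [pair[1] for pair in ratings.values()]
    r = pearson(minor, major)
    weights = (1.0, r) if release_type == "minor" else (r, 1.0)
    # Equivalent of Excel's SUMPRODUCT(correlation row, ratings row).
    scores = {
        tc: weights[0] * mi + weights[1] * ma
        for tc, (mi, ma) in ratings.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores


# Hypothetical ratings (1-5), invented purely for illustration.
example_ratings = {
    "TC-1": (5, 3),
    "TC-2": (2, 5),
    "TC-3": (4, 4),
    "TC-4": (1, 2),
    "TC-5": (3, 1),
    "TC-6": (5, 5),
}

top, all_scores = recommend(example_ratings, "minor", top_n=3)
print("Recommended for a minor release:", top)
```

Written this way, the score for a minor release is simply Minor + r * Major, so the correlation acts as a single global weight on the Major column rather than as anything specific to an individual test case.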
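Well, checking the correlation between features is essential, because the more strongly two features are correlated, the more redundant one of them becomes: it adds little information that the other does not already provide.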
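To make the redundancy point concrete, here is a small sketch with made-up ratings in which Major tracks Minor exactly (so their correlation is 1). It assumes the score is computed as Minor + r * Major, which is one possible reading of the SUMPRODUCT step in the question. The helper names `rank` and `weighted_scores` are hypothetical.

```python
def rank(scores):
    """Return test-case names sorted by score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)


def weighted_scores(ratings, r):
    """Score each test case as minor + r * major, where r is assumed to be
    the Minor/Major correlation."""
    return {tc: minor + r * major for tc, (minor, major) in ratings.items()}


# Hypothetical ratings where Major equals Minor for every test case,
# so the Pearson correlation between the two columns is exactly 1.
redundant = {"TC-1": (5, 5), "TC-2": (2, 2), "TC-3": (4, 4), "TC-4": (1, 1)}

minor_only = rank({tc: minor for tc, (minor, _) in redundant.items()})
with_corr = rank(weighted_scores(redundant, r=1.0))

print(minor_only)
print(with_corr)
print(minor_only == with_corr)  # True: the second column changes nothing
```

When the two columns are this strongly correlated, the weighted score only rescales the Minor ratings, so sorting by Minor alone gives the same recommendation. The weighting can only change the ranking when the Major ratings disagree with the Minor ones.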