"Thank you - this is the information I was looking for.
Were you able to automate the process so that, after establishing basic rules for quality, there was minimal human intervention prior to delivery of the data to the data science tool?"
"There are a number of tools out there, Alteryx being one. My previous company built our own customized tool which delivers the results that you are looking for. We created an automated tool that delivered three outputs on a given file:
"I believe that your synopsis is correct. The only difference here is that I would suggest doing the profiling prior to any iterations. I recently worked with a very large data set to determine the data quality before releasing the…"
"Well, take for instance the 'Tidyverse' package set in R: it has an option to read in a file, after which you can exercise an option called 'problems', which gives you summary information and also some initial pertinent…"
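For readers who do not work in R, the same idea can be sketched in Python using only the standard library. This is not the Tidyverse API: the function name `find_problems`, the sample data, and the expected column types are all assumptions made for illustration. The sketch reads a delimited file and reports the rows whose field count or values do not match what was expected.

```python
import csv
import io


def find_problems(text, expected_types):
    """Return a list of (row_number, column, expected, actual) tuples.

    expected_types maps a column name to a callable such as int or float.
    It is a hypothetical schema supplied by the caller, not inferred.
    """
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    for row_number, row in enumerate(reader, start=2):  # the header is row 1
        # A row with the wrong number of fields is reported once and skipped.
        if len(row) != len(header):
            problems.append(
                (row_number, None, f"{len(header)} columns", f"{len(row)} columns")
            )
            continue
        for column, value in zip(header, row):
            caster = expected_types.get(column)
            if caster is None or value == "":
                continue  # no expectation for this column, or value is missing
            try:
                caster(value)
            except ValueError:
                problems.append((row_number, column, caster.__name__, value))
    return problems


# Made-up sample data: row 3 has a non-numeric age, row 4 is missing a field.
sample = "id,age,city\n1,34,Austin\n2,abc,Boston\n3,41\n"
for problem in find_problems(sample, {"id": int, "age": int}):
    print(problem)
```

A report like this can be reviewed before any training run, so that malformed rows are caught at read time rather than during model iterations.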
"Yes, but it may also cut down on the time taken in training iterations to 'clean' the data or determine the relative quality of the data.
There have been a couple of blog entries and articles recently stating that Data Scientists are…"
"If I am understanding correctly, I think you are asking whether there is some set of 'exploratory' actions that are taken most of the time when a dataset is first presented and that, given what is learned, allow a direction to be determined for how to proceed?"
I am not a Data Scientist, although I do have an intermediate-level understanding of AI / ML. I am curious as to whether commercial or open source data profiling tools might be used to discover patterns in a dataset prior to using the dataset as training data. Big data has posed a volume challenge to some of the profiling tools (IBM, Informatica, Oracle, etc.), but most vendors have addressed or are continuing to address this issue and will connect to the big data platform directly (as opposed…
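To make the question concrete, here is a rough Python sketch of the kind of per-column summary a profiling pass might produce before a dataset is used as training data. It does not reproduce any of the vendor tools named above: the function names, the sample data, and the choice of statistics (missing count, distinct count, numeric count, most common value) are assumptions chosen for illustration.

```python
import csv
import io
from collections import Counter


def _is_number(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_columns(text):
    """Build a simple per-column profile from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames:
        values = [row[column] for row in rows]
        # Treat empty strings (and short rows, which yield None) as missing.
        present = [v for v in values if v not in ("", None)]
        profile[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "numeric": sum(1 for v in present if _is_number(v)),
            "most_common": Counter(present).most_common(1)[0][0] if present else None,
        }
    return profile


# Made-up sample data with one missing age.
sample = "id,age,city\n1,34,Austin\n2,,Boston\n3,41,Austin\n"
for column, stats in profile_columns(sample).items():
    print(column, stats)
```

Summaries like these are cheap to compute and can flag sparse or unexpectedly non-numeric columns before any model training starts.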