Are you running from one analysis to another? From one data visualization project, data modeling exercise, dashboard, or data quality analysis to the next because your skills are in high demand?
How big is your personal folder? Is it littered with spreadsheets, BI workbooks, and scripts that were used for a one-time project?
Dark Data Is Getting Darker
What is dark data?
Dark data is data and content that exists and is stored, but is not leveraged and analyzed for intelligence or used in forward-looking decisions. - Isaac
Many organizations associate dark data with legacy data or with data lakes meant to serve some future purpose. I call them data landfills: the result of accumulated database silos and artifacts left over from one-off analytics. For legacy data, I've proposed an agile approach to finding value in dark data.
But data scientists have more tools and far more capability today: R scripts, dashboards built in data visualization tools, software developed for Hadoop clusters, data processing pipelines, and so on. Ideally, most of this effort and its artifacts are implemented directly into the data warehouses and reference data, where they become core extensions. But some of this work is one-time, single-purpose analysis that will likely add to the organization's dark data unless some action and governance is adopted.
Simple Solutions To Avoid Dark Data
The simple answer to the accumulation of analytics assets is to catalog them. Develop a small database that records each analysis performed, its purpose, its owner, and the location of its assets. Develop a tagging taxonomy to make the catalog easier to navigate. Ensure these artifacts are stored in a source control repository like Git and are versioned whenever an analysis is updated. Schedule periodic reviews to identify artifacts that can serve as prototypes for enhancing enterprise data assets, and archive those that are no longer valid.
If your latest analytics are cataloged and governed by reasonable practices, you'll help avoid creating another generation of dark data.
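To make the catalog idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is one possible design under stated assumptions, not a prescribed one: the `analysis` and `tag` tables, the `last_reviewed` and `status` columns, the helper function names, and the 180-day review window are all hypothetical choices for illustration. It records each analysis's purpose, owner, and asset location (for example, a Git repository URL), supports lookups by tag, flags entries that are due for periodic review, and archives entries that are no longer valid.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: one row per analysis, plus a many-to-many tag table.
SCHEMA = """
CREATE TABLE analysis (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,
    purpose        TEXT NOT NULL,
    owner          TEXT NOT NULL,
    asset_location TEXT NOT NULL,                 -- e.g. a Git repository URL
    last_reviewed  TEXT NOT NULL,                 -- ISO date, YYYY-MM-DD
    status         TEXT NOT NULL DEFAULT 'active' -- 'active' or 'archived'
);
CREATE TABLE tag (
    analysis_id INTEGER NOT NULL REFERENCES analysis(id),
    tag         TEXT NOT NULL,
    PRIMARY KEY (analysis_id, tag)
);
"""


def open_catalog(path=":memory:"):
    """Open (and initialize) a catalog database; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_analysis(conn, name, purpose, owner, asset_location, last_reviewed, tags=()):
    """Register an analysis and its tags; returns the new row id."""
    cur = conn.execute(
        "INSERT INTO analysis (name, purpose, owner, asset_location, last_reviewed) "
        "VALUES (?, ?, ?, ?, ?)",
        (name, purpose, owner, asset_location, last_reviewed.isoformat()),
    )
    conn.executemany(
        "INSERT INTO tag (analysis_id, tag) VALUES (?, ?)",
        [(cur.lastrowid, t) for t in tags],
    )
    conn.commit()
    return cur.lastrowid


def find_by_tag(conn, tag):
    """Names of active analyses carrying the given tag, sorted by name."""
    rows = conn.execute(
        "SELECT a.name FROM analysis a JOIN tag t ON t.analysis_id = a.id "
        "WHERE t.tag = ? AND a.status = 'active' ORDER BY a.name",
        (tag,),
    )
    return [r[0] for r in rows]


def due_for_review(conn, today, max_age_days=180):
    """Active analyses not reviewed within max_age_days of `today`."""
    # ISO dates compare correctly as strings, so a text comparison suffices.
    cutoff = (today - timedelta(days=max_age_days)).isoformat()
    rows = conn.execute(
        "SELECT name FROM analysis WHERE status = 'active' AND last_reviewed < ? "
        "ORDER BY name",
        (cutoff,),
    )
    return [r[0] for r in rows]


def archive(conn, name):
    """Mark an analysis as archived so it drops out of searches and reviews."""
    conn.execute("UPDATE analysis SET status = 'archived' WHERE name = ?", (name,))
    conn.commit()
```

A single-file SQLite database keeps the sketch dependency-free; a team sharing the catalog would more likely host the same tables in an existing database server and keep the analysis artifacts themselves in Git.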