
From my observations working with data, few people have any formal training in basic data literacy. Most people think data are "facts". Few question the "truthiness" of data. Few people are aware of the inherent bias in data. Few assess their own trust and confidence in data. If you tell people that "all data lies", would they understand what that means?

Few people question the data they use. I believe this is true in the general population as well as in government, business and academia. People need to be data literate: aware of the data they touch, of how they affect that data, and of how they are affected by it.

I don't think data literacy is about how to use technical tools or conduct analytics. I think it's more about the meaning and understanding of data, in all its forms and not just electronic data.

 

A data literacy curriculum should include at least the following topics:

 

1. The origins of data

2. The semantics and meanings of data

3. The philosophy of data

4. Metadata

5. Data quality

6. The principles governing data

 

These are just a sample of topics I believe should be taught.

 

Do you believe people are data literate?

Should we be teaching data literacy, and if so, what topics should be in the curriculum?

Should we assess people's data literacy before allowing them to work with data?


Replies to This Discussion

While I agree that there is a great need for greater data literacy, I think perhaps some of your comments are too general.   I'm not sure that there is a single philosophy of data (please feel free to enlighten me - I'd love to know about it if there is).  I don't think one can teach the origins of data, but one should be taught to be certain to understand the origins of any data one uses - those are different things. 

In the work I do, epidemiological surveillance, there has been a real issue with the computerization of data. In the past, paper forms created a context around the data, and when it was changed to electronic submissions, quite a bit of context was lost without the discussions taking place that would have ensured it was replaced (for example, detailed discussions on what could be included in a particular field, and common understandings of certain terms that can be rather fuzzy). I would love it if all epidemiologists were educated on data issues before starting work - we'd have much better quality data then! But I'm not sure that the sorts of things they would need to look at could be generalized to all people working with data. Perhaps each discipline would have to identify which aspects of data in general need more discussion in its field, and incorporate them into the training for its workers.

 

Cas, thanks for your comment. I work mostly with business data, and I am usually accused of being too academic but seldom accused of being too general. You are correct that there is no single philosophy of data. There are numerous aspects of philosophy that apply to data, some going back to ancient philosophers, such as how to name things. Learning about this can help in understanding the human and systems limitations before embarking on solutions that are fundamentally flawed.

Scientific data can be somewhat less troublesome. In your work in epidemiological surveillance, I think some of the great work done by the National Cancer Institute might be of some help. Their data standards program is the most comprehensive I have seen. They addressed one of the root causes of poor data quality: naming and definitions.

In my work I have dealt with human and social services data, which also suffered from a lack of precision. We developed a framework to harmonize data for use in certain contexts while developing data standards for contexts requiring greater precision.

With regard to data literacy, I have been researching this subject area for some time, building on previous works, and have compiled a mind map of relevant topics, which I am expanding into a curriculum for the various roles and activities affecting data. Any ideas you have are appreciated.

Hi Richard,

It's interesting that you bring up standards.  We've found that this is one of our greatest challenges - when you scratch the surface, you discover that different participants in the system mean different things when using the same terms (even the term 'standard'), and it's a huge challenge to come to a common understanding.  This is complicated by the fact that our data sources for surveillance are usually collecting the data for a different purpose (case management), and also represent significant investments, so they aren't willing to change them.  Even things as simple as what is meant by a case of tuberculosis, or syphilis, can be challenging, as sometimes you want all of them, sometimes you want infectious ones, and the precision of the terms can be lost simply by the configuration of the electronic systems. 

 

The US seems to be ahead of us in doing some of the work to come to common understandings, but we're getting started!  If data literacy could include how exactly discussions have to take place to establish common understandings, and how important it is to have such discussions, I'd be all for it.  I'd be interested to see your mind map - I'm sure many of the things I've learned through bitter experience to look out for are already there!

Cas, I don't think the US is exactly a leader in standards. Having worked in other countries, my experience has been that the US is a reluctant partner in standards, which I attribute to a culture of individuality. But NCI is an exception.

 

The challenge you describe of achieving consensus across contexts is, however, universal. But there are techniques that can be used to help achieve consensus. I have developed and adopted these with some success in government and business. I have to admit upfront that it is not easy, but it has worked!

 

I would be happy to share my mind map for data literacy and the techniques I use, with conditions. Since these are my intellectual property, I would retain the rights to them, but I ask for no compensation if they are used in the public interest. Otherwise I use them in my practice and expect some form of compensation. If you would like me to send you the details of these best practices, just drop me a note on where and how they will be used, and if they meet my (rather simple) criteria for public good I will provide them to you. The only obligation is that if you use them, you provide feedback on how they worked (or did not work) in your environment. This way we can refine them together. My direct e-mail is [email protected]  I would be very happy and pleased to help.

I agree. Data literacy is a serious matter and hasn't received enough attention, especially in the big data domain. Many data scientists still consider big data to have only 3 Vs, while according to IBM it has at least 4, with veracity being the fourth.

Folks,

I was surfing the web while researching a data problem and happened to find this. Very few companies apply data science by starting with a true definition of the business problem and then solving it using data science techniques. This looks promising, do check:

http://bit.ly/datasciencetraining

Cheers

I would like to comment on perceived business value as a measure of an organization's Data Literacy. In my experience it always comes down to the business managers articulating the value. All efforts on Data management eventually have to translate into utility expressed in simple, meaningful terms, with the economic benefits of monetization where possible. I have faced business partners many times who not only ask 'show me what I have' but some even ask 'show me what can I do with it'. However unfortunate such illiteracy is, investment $$ for Data still need to be fought for and concepts sold. In summary, I would advocate a top-down approach in a standard Enterprise Data Reference framework that demonstrates a value chain proposition.




Data Literacy's primary purpose is to better understand data. The value is understanding. I liken Data Literacy to literacy. The intent of Data Literacy is not to monetize data but to get people to better understand the meaning of data: the basics of reading, writing and comprehending data. What is done with the learning is not part of Data Literacy.

It appears many jump to the exploitation of data without understanding the data. Exploitation of data without Data Literacy may be fine for uses in advertising, where the results are primarily puffery, but when using data for decision making, Data Literacy is needed.

A very well written article. I think data literacy is very subjective and domain specific.

The basic competencies of Data Literacy are independent of domain and are directed at identifying the subjectivity in data. Data Literacy is the set of basic skills needed to read, write and comprehend data.

 

Understanding that data is domain specific and subjective is also a Data Literacy skill. Data Literacy provides a better understanding of data.

 

 


© 2019 Data Science Central ®
