The subject of raw data has crept into the vocabulary of the media recently, following allegations of wrongdoing amongst scientists in connection with data related to climate change. The matter has unsurprisingly been dubbed ‘Climategate’ and is currently chalking up over 28,000,000 hits on Google. The underlying issues, if they are real, are political and, in the normal spirit of popular news reporting, why let the data get in the way of a good story? However, once you work your way through the political overtones, the issue does draw attention to something that must be close to the heart of any scientist: the preservation of data.
Within the context of laboratory data and information management, raw data tends to provoke some interesting debate: what constitutes raw data, how and where should we store it, and for how long?
Those of us involved in the business of laboratory systems and laboratory integration can be kept pretty busy with the technological challenges of acquiring, managing and storing ever-increasing volumes of raw data, but behind these challenges are some more fundamental questions that need to be answered before we can even start thinking about a solution.
So, if we assume that we know what the raw data is, the decision about how long we keep it is influenced by three different considerations.
Firstly, scientists are often hoarders of data and like to hang on to raw data as basic scientific evidence, for reference purposes, or for re-assessment in the light of future scientific or technological advances. This requirement has no definable timeframe.
Secondly, there is an ethical position, largely determined by regulatory bodies, to allow for the re-examination of data in the light of unforeseen defects, failures or adverse effects of products or processes. This timeframe may be determined by, or related to, the lifetime of the product or process.
Thirdly, there is a business requirement to address IP protection in terms of the underlying value of the data to the business. This may have a long timeframe if it is relevant to a patent, but could in other circumstances have a relatively short timeframe.
Making the decision on what to store and for how long has its complexities, but the combination of regulatory and legal guidance, business best practice, good technology and, hopefully, common sense is helping to shape a way forward. However, it is always good to remember that the scientific knowledge food-chain starts with the raw data. An item by Derek Lowe on In the Pipeline (Data, Raw and Otherwise), loosely connected to ‘Climategate’, serves as a good reminder of the importance of raw data from the scientist’s perspective.