Whichever way you look at laboratory integration, sooner or later the ‘data’ problem will rear its ugly head. For years LIMS projects have wrestled with the vagaries of proprietary interfaces and data formats, and it is increasingly becoming an issue within ELN projects. If you are a LinkedIn member, take a look at this discussion on handling large files in an ELN.
We generate more and more data in the pursuit of further scientific knowledge, and paradoxically, the more we know, the less certain we become, and so we need more data to seek the truth: the perpetual data machine. Data volumes grow to overwhelming levels, and we need to store the data somewhere for future reference and for re-interpretation. We may also need it for regulatory or legal purposes, which means we may need to keep it for a long time. We may need to transfer it to other programs for processing. We may need to reduce it to graphical images in order to understand it. We may need to derive a set of conclusions for inclusion in reports or presentations. We may need to import data from external agencies, such as CROs. And we know that data volumes will continue to grow.
And yet, we don’t have any accepted international standards for the interchange of laboratory data; we don’t have any agreed integration standards or strategies for data communication; we cannot be confident that today’s applications, operating systems and media will survive for as long as we need the data to survive. All in all, it is not a healthy situation or one that inspires confidence, so what hope is there that it could ever get better?
There are perhaps three means by which the situation could change. Firstly, by force: if a regulatory or legal agency were to demand that all data comply with international standards, then the industry would need to respond, but this seems highly unlikely, certainly in the short term. Secondly, by community action: there have been several worthy attempts to evolve international standards, either through the adoption of commercially based de facto standards, or by ‘standards’ associations formulating standards. In some instances, this has worked out well. For example, the adoption of PDF and PDF/A as ISO standards has shown that a global demand for document standards can be met by adopting a de facto standard. However, most initiatives with laboratory data have failed to acquire adequate uptake, and have therefore struggled to have any substantial impact. The third option is to sit back and wait for technology to provide us with a solution. To some extent this is already happening with the increasing use of XML for data interchange, but without suitable ontologies to give the data an agreed meaning, there are still limitations.
Two specific community initiatives are currently addressing the laboratory data problem. The first is the Pistoia Alliance, an initiative to provide an open foundation of data standards, ontologies and web-services to streamline the Pharmaceutical Drug Discovery workflow (Chemistry, Biological Screening, Logistics) through common business terms, relationships and processes. Member companies are currently working to develop open standards to support the interchange of data between CROs and their major customers in the Pharmaceutical industry. With a growing number of members, including vendors, the initiative faces a number of challenges, but probably represents the broadest approach to date to tackle the problem.
The other initiative is AnIML, the goal of which is to serve as the open-source development platform for a new XML standard for Analytical Chemistry Information. The project is a collaborative effort between many groups and individuals and is sanctioned by the ASTM under subcommittee E13.15. AnIML is receiving attention from vendors, a critical step towards adoption, but lacks the focused business case that is driving the Pistoia Alliance. Of course, if the Pistoia Alliance were to adopt AnIML for analytical data interchange…
A further consideration is how many of the Pistoia Alliance’s objectives have already been addressed within the clinical world, where a number of data interchange standards already exist. There are sufficient parallels with clinical chemistry at the laboratory level to believe that there may be no need to reinvent the wheel, only to adapt existing open standards to the requirements of non-clinical laboratories. The progress of Pistoia and AnIML will be observed closely.




